synthesisai package
Submodules
synthesisai.data_types module
synthesisai.dataset module
- class synthesisai.dataset.Grouping(value)[source]
Bases: Enum
Different ways of grouping items in a Synthesis AI dataset.
- CAMERA = 'CAMERA'
Items with the same camera are grouped into the list.
The size of the dataset is #cameras. Each element is a List[Item] with the same `CAMERA_NAME`.
- NONE = 'NONE'
Each image is treated independently.
The size of the dataset is #scenes * #cameras * #frames (assuming the same number of cameras and frames per scene).
- SCENE = 'SCENE'
Items with the same scene are grouped into the list.
The size of the dataset is #scenes. Each element is a List[Item] with the same `SCENE_ID`.
- SCENE_CAMERA = 'SCENE_CAMERA'
Items are grouped first by camera and then by scene; the list of frames for a particular scene is indexed by scene_id.
The size of the dataset is #scenes * #cameras. Each element is a List[Item] of consecutive frames for a given scene and camera (see the usage sketch after this list).
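For illustration, a minimal sketch of how the grouping changes what indexing returns. The module paths follow this page; the dataset root is a placeholder:

    from synthesisai.dataset import Grouping, SaiDataset
    from synthesisai.modality import Modality

    ROOT = "/data/sai_dataset"  # placeholder; point at a real Synthesis AI dataset

    # NONE: one Dict[Modality, Any] per image.
    flat = SaiDataset(ROOT, modalities=[Modality.RGB], grouping=Grouping.NONE)
    item = flat[0]

    # SCENE_CAMERA: one List[Item] of consecutive frames per (scene, camera) pair.
    grouped = SaiDataset(ROOT, modalities=[Modality.RGB], grouping=Grouping.SCENE_CAMERA)
    frames = grouped[0]

    print(len(flat), len(grouped))  # #scenes * #cameras * #frames vs. #scenes * #cameras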
- class synthesisai.dataset.SaiDataset(root: Union[str, PathLike], modalities: Optional[List[Modality]] = None, body_segmentation_mapping: Optional[Dict[str, int]] = None, clothing_segmentation_mapping: Optional[Dict[str, int]] = None, face_segmentation_classes: Optional[List[str]] = None, face_bbox_pad: int = 0, grouping: Grouping = Grouping.NONE, out_of_frame_landmark_strategy: OutOfFrameLandmarkStrategy = OutOfFrameLandmarkStrategy.IGNORE, transform: Optional[Callable[[Dict[Modality, Any]], Dict[Modality, Any]]] = None)[source]
Bases: Sequence
Synthesis AI dataset.
This class provides access to all the modalities available in Synthesis AI generated datasets.
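A hedged end-to-end sketch of reading items, requesting only modalities documented on this page; the root path is a placeholder:

    from synthesisai.dataset import SaiDataset
    from synthesisai.modality import Modality

    ds = SaiDataset(
        "/data/sai_dataset",  # placeholder root
        modalities=[Modality.RGB, Modality.DEPTH, Modality.BODY_SEGMENTATION],
    )
    for item in ds:
        rgb = item[Modality.RGB]                 # ndarray[uint8], 3 channels
        depth = item[Modality.DEPTH]             # ndarray[float16], background is 0
        seg = item[Modality.BODY_SEGMENTATION]   # ndarray[uint16], 1 channel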
- BODY_SEGMENTATION_MAPPING = {'arm_lower_left': 54, 'arm_lower_right': 55, 'arm_upper_left': 56, 'arm_upper_right': 57, 'background': 0, 'beard': 1, 'body': 104, 'brow': 2, 'caruncle_left': 29, 'caruncle_right': 30, 'cheek_left': 3, 'cheek_right': 4, 'chin': 5, 'clothing': 105, 'cornea_left': 106, 'cornea_right': 107, 'default': 0, 'ear_left': 6, 'ear_right': 7, 'eye_fluid': 114, 'eye_fluid_left': 35, 'eye_fluid_right': 36, 'eye_left': 115, 'eye_right': 116, 'eyebrow_left': 31, 'eyebrow_right': 32, 'eyebrows': 108, 'eyelashes': 109, 'eyelashes_left': 33, 'eyelashes_right': 34, 'eyelid': 110, 'eyelid_left': 112, 'eyelid_right': 113, 'eyes': 111, 'finger1_mid_bottom_left': 58, 'finger1_mid_bottom_right': 59, 'finger1_mid_left': 60, 'finger1_mid_right': 61, 'finger1_mid_top_left': 62, 'finger1_mid_top_right': 63, 'finger2_mid_bottom_left': 64, 'finger2_mid_bottom_right': 65, 'finger2_mid_left': 66, 'finger2_mid_right': 67, 'finger2_mid_top_left': 68, 'finger2_mid_top_right': 69, 'finger3_mid_bottom_left': 70, 'finger3_mid_bottom_right': 71, 'finger3_mid_left': 72, 'finger3_mid_right': 73, 'finger3_mid_top_left': 74, 'finger3_mid_top_right': 75, 'finger4_mid_bottom_left': 76, 'finger4_mid_bottom_right': 77, 'finger4_mid_left': 78, 'finger4_mid_right': 79, 'finger4_mid_top_left': 80, 'finger4_mid_top_right': 81, 'finger5_mid_bottom_left': 82, 'finger5_mid_bottom_right': 83, 'finger5_mid_left': 84, 'finger5_mid_right': 85, 'finger5_mid_top_left': 86, 'finger5_mid_top_right': 87, 'foot_left': 92, 'foot_right': 93, 'forehead': 8, 'glasses': 117, 'glasses_frame': 98, 'glasses_lens_left': 99, 'glasses_lens_right': 100, 'hair': 9, 'hand_left': 88, 'hand_right': 89, 'head': 10, 'headphones': 101, 'headwear': 102, 'iris_left': 37, 'iris_right': 38, 'jaw': 11, 'jowl': 12, 'leg_lower_left': 94, 'leg_lower_right': 95, 'leg_upper_left': 96, 'leg_upper_right': 97, 'lip_lower': 13, 'lip_upper': 14, 'lower_eyelid_left': 39, 'lower_eyelid_right': 40, 'mask': 103, 'mouth': 15, 'mouthbag': 16, 'mustache': 17, 'nails_left': 90, 'nails_right': 91, 'nape': 18, 'neck': 19, 'nose': 20, 'nose_outer': 21, 'nostrils': 22, 'orbit_left': 23, 'orbit_right': 24, 'pupil_left': 41, 'pupil_right': 42, 'sclera_left': 43, 'sclera_right': 44, 'shoulders': 47, 'smile_line': 25, 'teeth': 26, 'temples': 27, 'tongue': 28, 'torso_lower_left': 48, 'torso_lower_right': 49, 'torso_mid_left': 50, 'torso_mid_right': 51, 'torso_upper_left': 52, 'torso_upper_right': 53, 'undereye': 120, 'undereye_left': 118, 'undereye_right': 119, 'upper_eyelid_left': 45, 'upper_eyelid_right': 46}
Default body segmentation mapping.
- CLOTHING_SEGMENTATION_MAPPING = {'background': 0, 'default': 0, 'long sleeve dress': 2, 'long sleeve outerwear': 3, 'long sleeve shirt': 8, 'scarf': 12, 'shoe': 14, 'short sleeve dress': 6, 'short sleeve outerwear': 10, 'short sleeve shirt': 5, 'shorts': 9, 'skirt': 11, 'sling dress': 1, 'trousers': 4, 'vest': 13, 'vest dress': 7}
Default clothing segmentation mapping.
- FACE_SEGMENTATION_CLASSES = ['brow', 'cheek_left', 'cheek_right', 'chin', 'eye_left', 'eye_right', 'eyelid_left', 'eyelid_right', 'eyes', 'forehead', 'jaw', 'jowl', 'lip_lower', 'lip_upper', 'mouth', 'mouthbag', 'nose', 'nose_outer', 'nostrils', 'smile_line', 'teeth', 'undereye', 'eyelashes_left', 'eyelashes_right', 'eyebrow_left', 'eyebrow_right', 'undereye_left', 'undereye_right']
Segmentation classes included in the face bounding box.
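The mappings above can be overridden through the constructor. A sketch with class names taken from the default mapping; that 'default' acts as the fallback label for unlisted classes is an assumption based on the default mapping above:

    from synthesisai.dataset import SaiDataset

    # Collapse the label set: keep hair and head, send everything else to 0.
    # 'default' is assumed to be the fallback for classes not listed explicitly.
    custom_body = {"default": 0, "background": 0, "hair": 1, "head": 2}

    ds = SaiDataset("/data/sai_dataset", body_segmentation_mapping=custom_body)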
- property body_segmentation_mapping: Dict[str, int]
Body segmentation mapping for the dataset.
- Type
Dict[str, int]
- property clothing_segmentation_mapping: Dict[str, int]
Clothing segmentation mapping for the dataset.
- Type
Dict[str, int]
synthesisai.item_loader module
synthesisai.item_loader_factory module
synthesisai.item_loader_v1 module
synthesisai.item_loader_v2 module
synthesisai.modality module
- class synthesisai.modality.Modality(value)[source]
Bases: Enum
Different modalities of a Synthesis AI dataset. All image modalities are in [y][x][channel] format, with the axes oriented as follows:

    ┌-----> x
    |
    |
    v
    y
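A self-contained illustration of this convention, with a dummy array standing in for any image modality:

    import numpy as np

    img = np.zeros((480, 640, 3), dtype=np.uint8)  # height=480 (y), width=640 (x)
    img[10, 200] = (255, 0, 0)   # row y=10, column x=200: y indexes first
    pixel = img[10, 200]         # -> array([255, 0, 0], dtype=uint8)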
- BODY_SEGMENTATION = 5
Semantic segmentation map of various body parts.
Type: ndarray[uint16]. Channels: 1.
- CAMERA_NAME = 39
Camera name consisting of lowercase alphanumeric characters. Usually used when more than one camera is defined in a scene. Default is “cam_default”.
Type: str.
- CAM_INTRINSICS = 38
Camera intrinsics matrix in OpenCV format: https://docs.opencv.org/3.4.15/dc/dbb/tutorial_py_calibration.html.
Type: ndarray[float32]. Shape: (4, 4).
- CAM_TO_HEAD = 33
Transformation matrix from the camera to the head coordinate system.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (4, 4).
- CAM_TO_WORLD = 36
Transformation matrix from the camera to the world coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
- CLOTHING_SEGMENTATION = 6
Semantic segmentation map of different types of clothing.
Type: ndarray[uint16]. Channels: 1.
- DEPTH = 4
Depth image. All values are positive floats. Background has depth=0.
Type: ndarray[float16]. Channels: 1.
- EXPRESSION = 26
Expression and its intensity.
Format:
{ instance_id: { 'intensity': float64, 'name': str }, ... }
- FACE_BBOX = 31
Face bounding box in the format (left, top, right, bottom) in pixels.
Type: Dict[InstanceId, Tuple[int, int, int, int]].
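For example, the boxes can be used to crop faces out of the RGB image; a sketch assuming `item` is one dataset element with RGB and FACE_BBOX requested:

    from synthesisai.modality import Modality

    def crop_faces(item):
        rgb = item[Modality.RGB]
        crops = {}
        for instance_id, (left, top, right, bottom) in item[Modality.FACE_BBOX].items():
            crops[instance_id] = rgb[top:bottom, left:right]  # [y][x][channel]
        return crops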
- FACIAL_HAIR = 25
Facial hair metadata. If no facial hair is present for a human, None is provided.
Format:
{ instance_id: { 'relative_length': float64, 'relative_density': float64, 'style': str, 'color_seed': float64, 'color': str }, ... }
- FRAME_NUM = 40
Frame number for consecutive animation frames; used for animation.
Type: int.
- GAZE = 28
Gaze direction in camera space.
Format:
{ instance_id: { 'horizontal_angle': ndarray[float64] (shape (3,)), 'vertical_angle': ndarray[float64] (shape (3,)) }, ... }
- GAZE_TARGET = 29
The target that the gaze direction points to.
Format:
{ instance_id: str }
- GESTURE = 27
Name of the gesture.
Type: str.
- HAIR = 24
Hair metadata. If no hair is present, None is provided.
Format:
{ instance_id: { 'relative_length': float64, 'relative_density': float64, 'style': str, 'color_seed': float64, 'color': str }, ... }
- HEAD_TO_CAM = 32
Transformation matrix from the head to the camera coordinate system.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (4, 4).
- HEAD_TO_WORLD = 34
Transformation matrix from the head to the world coordinate system.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (4, 4).
- HEAD_TURN = 30
The direction that the head is turned, relative to the body, provided in degrees.
Format:
{ "roll": int, "pitch": int, "yaw": int }
- IDENTITY = 22
Mapping from instance ID (ranging from 0 to the number of humans in the image) to human ID (corresponding to the ID provided in the job config).
Type: Dict[InstanceId, int].
- IDENTITY_METADATA = 23
Additional metadata about the people in the image.
Format:
{ instance_id: { 'gender': 'female'|'male', 'age': int, 'weight_kg': int, 'height_cm': int, 'id': int, 'ethnicity': 'arab'|'asian'|'black'|'hisp'|'white' }, ... }
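This metadata is convenient for filtering; a sketch assuming `ds` is a SaiDataset with IDENTITY_METADATA requested:

    from synthesisai.modality import Modality

    def indices_with_adults(ds):
        # Indices of items containing at least one person aged 18 or older.
        return [
            i
            for i, item in enumerate(ds)
            if any(m["age"] >= 18 for m in item[Modality.IDENTITY_METADATA].values())
        ]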
- INSTANCE_SEGMENTATION = 7
Semantic segmentation map defining the various characters in the image.
Type: ndarray[uint16]. Channels: 1.
- LANDMARKS_3D_COCO = 18
COCO whole body landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark3d]]. Should have no more than 133 points.
- LANDMARKS_3D_IBUG68 = 15
iBUG-68 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark3d]]. Should have no more than 68 points.
- LANDMARKS_3D_KINECT_V2 = 16
Kinect v2 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark3d]]. Should have no more than 32 points.
- LANDMARKS_3D_MEDIAPIPE = 17
MediaPipe pose landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark3d]]. Should have no more than 33 points.
- LANDMARKS_3D_MEDIAPIPE_FACE = 41
MediaPipe dense face landmarks in 3D. Each landmark is given by three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (468, 3).
- LANDMARKS_3D_MPEG4 = 19
MPEG4 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark3d]].
- LANDMARKS_3D_SAI = 42
SAI dense face landmarks in 3D. Each landmark is given by three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (4840, 3).
- LANDMARKS_COCO = 13
COCO whole body landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark2d]]. Should have no more than 133 points.
- LANDMARKS_CONTOUR_IBUG68 = 10
iBUG-68 contour landmarks. Each landmark is given by name and two coordinates (x,y) in pixels. Each keypoint is defined in a similar manner to how human labelers mark 2D face keypoints.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark2d]]. Should have no more than 68 points.
- LANDMARKS_IBUG68 = 9
iBUG-68 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels. Each keypoint is a 2D projection of a 3D landmark.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark2d]]. Should have no more than 68 points.
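A sketch overlaying these points on the RGB image; it assumes each Landmark2d can be unpacked as an (x, y) pair, which is not guaranteed by this page:

    import matplotlib.pyplot as plt
    from synthesisai.modality import Modality

    def show_ibug68(item):
        plt.imshow(item[Modality.RGB])
        for landmarks in item[Modality.LANDMARKS_IBUG68].values():
            xs = [pt[0] for pt in landmarks.values()]
            ys = [pt[1] for pt in landmarks.values()]
            plt.scatter(xs, ys, s=4, c="red")
        plt.show()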
- LANDMARKS_KINECT_V2 = 11
Kinect v2 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark2d]]. Should have no more than 32 points.
- LANDMARKS_MEDIAPIPE = 12
MediaPipe pose landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark2d]]. Should have no more than 33 points.
- LANDMARKS_MEDIAPIPE_FACE = 43
MediaPipe dense face landmarks. Each landmark is given by two coordinates (x,y) in pixels.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (468, 2).
- LANDMARKS_MPEG4 = 14
MPEG4 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[InstanceId, Dict[LandmarkId, Landmark2d]].
- LANDMARKS_SAI = 44
SAI dense face landmarks. Each landmark is given by two coordinates (x,y) in pixels.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (4840, 2).
- NORMALS = 3
Normals image. All values are in the [-1,1] range.
Type: ndarray[float16]. Channels: 3.
- PUPILS = 20
Coordinates of pupils. Each pupil is given by name and two coordinates (x,y) in pixels.
Type: Dict[InstanceId, Dict[str, Landmark2d]].
- PUPILS_3D = 21
Coordinates of pupils in 3D. Each pupil is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[InstanceId, Dict[str, Landmark3d]].
- RGB = 2
RGB image modality.
Type: ndarray[uint8]. Channels: 3.
- SCENE_ID = 1
Scene ID (rendered scene number).
Type: int.
- UV = 8
UV image. This is a 2-channel image containing UV coordinates, where the first channel corresponds to the U coordinate and the second corresponds to the V coordinate.
Type: ndarray[uint16]. Channels: 2.
- WORLD_TO_CAM = 37
Transformation matrix from the world to the camera coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
- WORLD_TO_HEAD = 35
Transformation matrix from the world to the head coordinate system.
Type: Dict[InstanceId, ndarray[float32]]. Shape: (4, 4).
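The 4x4 matrices compose as ordinary homogeneous transforms. A sketch mapping the SAI dense landmarks (LANDMARKS_3D_SAI, camera space) into world space via CAM_TO_WORLD, with shapes as documented above; `item` is assumed to be one dataset element with both modalities requested:

    import numpy as np
    from synthesisai.modality import Modality

    def sai_landmarks_in_world(item):
        cam_to_world = item[Modality.CAM_TO_WORLD]  # (4, 4)
        out = {}
        for instance_id, pts in item[Modality.LANDMARKS_3D_SAI].items():
            # Append a homogeneous 1 to each (x, y, z) point, then transform.
            homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)  # (N, 4)
            out[instance_id] = (homo @ cam_to_world.T)[:, :3]             # (N, 3)
        return out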
Module contents
Top-level package for synthesisai.