Welcome to Face API Dataset’s documentation!¶
face_api_dataset.dataset module¶
-
class face_api_dataset.dataset.OutOfFrameLandmarkStrategy(value)¶
Bases: enum.Enum
An enumeration.
-
IGNORE= 'ignore'¶
-
CLIP= 'clip'¶
-
static clip_landmarks_(landmarks, height, width)¶
- Parameters
landmarks (Dict[int, Tuple[float, float]]) –
height (int) –
width (int) –
- Return type
Dict[int, Tuple[float, float]]
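The CLIP strategy presumably clamps out-of-frame coordinates to the image boundary. A minimal sketch of that behavior (illustrative only, not the library's implementation):

```python
from typing import Dict, Tuple

def clip_landmarks(
    landmarks: Dict[int, Tuple[float, float]], height: int, width: int
) -> Dict[int, Tuple[float, float]]:
    """Clamp each (x, y) landmark into the [0, width-1] x [0, height-1] frame."""
    return {
        idx: (min(max(x, 0.0), width - 1.0), min(max(y, 0.0), height - 1.0))
        for idx, (x, y) in landmarks.items()
    }

# point 0 is left of the frame, point 1 is right of it
clipped = clip_landmarks({0: (-5.0, 10.0), 1: (700.0, 300.0)}, height=480, width=640)
```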
-
-
class face_api_dataset.dataset.Grouping(value)¶
Bases: enum.Enum
Different ways of grouping items in a Synthesis AI dataset.
-
NONE= 'NONE'¶ Each image is treated independently.
The size of the dataset is #scenes * #cameras * #frames (assuming the same number of cameras/frames per scene).
-
SCENE= 'SCENE'¶ Items with the same scene are grouped into the list.
The size of the dataset is #scenes. Each element is a List[Item] with the same SCENE_ID.
-
CAMERA= 'CAMERA'¶ Items with the same camera are grouped into the list.
The size of the dataset is #cameras. Each element is a List[Item] with the same CAMERA_NAME.
-
SCENE_CAMERA= 'SCENE_CAMERA'¶ Items are grouped first by camera and then by scene. The list of frames for a particular scene is indexed by scene_id.
The size of the dataset is #scenes * #cameras; each element is a List[Item] containing consecutive frames for a given scene and camera.
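The SCENE grouping semantics can be sketched with itertools.groupby; the (scene_id, camera_name, frame_num) tuples below are hypothetical stand-ins for dataset items:

```python
from itertools import groupby

# hypothetical per-item metadata: (scene_id, camera_name, frame_num)
items = [
    (0, "cam_default", 1), (0, "cam_default", 2),
    (1, "cam_default", 1), (1, "cam_default", 2),
]

# Grouping.SCENE: one List[Item] per SCENE_ID (groupby needs sorted input)
by_scene = [list(group) for _, group in groupby(sorted(items), key=lambda it: it[0])]
```

With Grouping.NONE the same four items would be four independent elements; here the dataset size collapses to #scenes = 2.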
-
-
class face_api_dataset.dataset.FaceApiDataset(root, modalities=None, segments=None, face_segments=None, face_bbox_pad=0, grouping=<Grouping.NONE: 'NONE'>, out_of_frame_landmark_strategy=<OutOfFrameLandmarkStrategy.IGNORE: 'ignore'>, transform=None)¶
Bases: Sequence
Synthesis AI face dataset.
This class provides access to all the modalities available in Synthesis AI generated datasets.
-
SEGMENTS= {'arm_lower_left': 46, 'arm_lower_right': 47, 'arm_upper_left': 48, 'arm_upper_right': 49, 'background': 0, 'beard': 1, 'body': 2, 'brow': 3, 'cheek_left': 4, 'cheek_right': 5, 'chin': 6, 'clothing': 7, 'cornea_left': 103, 'cornea_right': 104, 'default': 0, 'ear_left': 8, 'ear_right': 9, 'eye_left': 10, 'eye_right': 11, 'eyebrow_left': 94, 'eyebrow_right': 95, 'eyebrows': 89, 'eyelashes': 12, 'eyelashes_left': 92, 'eyelashes_right': 93, 'eyelid': 13, 'eyelid_left': 92, 'eyelid_right': 93, 'eyes': 14, 'finger1_mid_bottom_left': 52, 'finger1_mid_bottom_right': 53, 'finger1_mid_left': 54, 'finger1_mid_right': 55, 'finger1_mid_top_left': 56, 'finger1_mid_top_right': 57, 'finger2_mid_bottom_left': 58, 'finger2_mid_bottom_right': 59, 'finger2_mid_left': 60, 'finger2_mid_right': 61, 'finger2_mid_top_left': 62, 'finger2_mid_top_right': 63, 'finger3_mid_bottom_left': 64, 'finger3_mid_bottom_right': 65, 'finger3_mid_left': 66, 'finger3_mid_right': 67, 'finger3_mid_top_left': 68, 'finger3_mid_top_right': 69, 'finger4_mid_bottom_left': 70, 'finger4_mid_bottom_right': 71, 'finger4_mid_left': 72, 'finger4_mid_right': 73, 'finger4_mid_top_left': 74, 'finger4_mid_top_right': 75, 'finger5_mid_bottom_left': 76, 'finger5_mid_bottom_right': 77, 'finger5_mid_left': 78, 'finger5_mid_right': 79, 'finger5_mid_top_left': 80, 'finger5_mid_top_right': 81, 'foot_left': 88, 'foot_right': 89, 'forehead': 15, 'glasses': 16, 'glasses_frame': 96, 'glasses_lens_left': 97, 'glasses_lens_right': 98, 'hair': 17, 'hand_left': 50, 'hand_right': 51, 'head': 18, 'headphones': 19, 'headwear': 20, 'jaw': 21, 'jowl': 22, 'leg_lower_left': 84, 'leg_lower_right': 85, 'leg_upper_left': 86, 'leg_upper_right': 87, 'lip_lower': 23, 'lip_upper': 24, 'mask': 25, 'mouth': 26, 'mouthbag': 27, 'mustache': 28, 'nails_left': 82, 'nails_right': 83, 'neck': 29, 'nose': 30, 'nose_outer': 31, 'nostrils': 32, 'pupil_left': 90, 'pupil_right': 91, 'sclera_left': 101, 'sclera_right': 102, 'shoulders': 33, 'smile_line': 
34, 'teeth': 35, 'temples': 36, 'tongue': 37, 'torso_lower_left': 40, 'torso_lower_right': 41, 'torso_mid_left': 42, 'torso_mid_right': 43, 'torso_upper_left': 44, 'torso_upper_right': 45, 'undereye': 38, 'undereye_left': 99, 'undereye_right': 100}¶ Default segmentation mapping.
-
FACE_SEGMENTS= ['brow', 'cheek_left', 'cheek_right', 'chin', 'eye_left', 'eye_right', 'eyelid_left', 'eyelid_right', 'eyes', 'jaw', 'jowl', 'lip_lower', 'lip_upper', 'mouth', 'mouthbag', 'nose', 'nose_outer', 'nostrils', 'smile_line', 'teeth', 'undereye', 'eyelashes_left', 'eyelashes_right', 'eyebrow_left', 'eyebrow_right', 'undereye_left', 'undereye_right']¶ Segments included in the bounding box.
-
N_LANDMARKS= 68¶
-
__init__(root, modalities=None, segments=None, face_segments=None, face_bbox_pad=0, grouping=<Grouping.NONE: 'NONE'>, out_of_frame_landmark_strategy=<OutOfFrameLandmarkStrategy.IGNORE: 'ignore'>, transform=None)¶
Initializes FaceApiDataset from the data in the root directory, loading the listed modalities. All dataset files should be located in the root directory:

root
├── metadata.jsonl
├── 0.cam_default.f_1.exr
├── 0.cam_default.f_1.rgb.png
├── 0.cam_default.f_1.info.json
├── 0.cam_default.f_1.segments.png
├── 0.cam_default.f_1.alpha.tif
├── 0.cam_default.f_1.depth.tif
├── 0.cam_default.f_1.normals.tif
├── 1.cam_default.f_1.exr
├── 1.cam_default.f_1.rgb.png
├── 1.cam_default.f_1.info.json
├── 1.cam_default.f_1.segments.png
├── 1.cam_default.f_1.alpha.tif
├── 1.cam_default.f_1.depth.tif
└── 1.cam_default.f_1.normals.tif

No extra files are allowed, but files that are not needed to load the listed modalities may be absent.
For instance, if only RGB and SEGMENTS modalities are needed, the following files are enough:

root
├── 0.cam_default.f_1.rgb.png
├── 0.cam_default.f_1.info.json
├── 0.cam_default.f_1.segments.png
├── 1.cam_default.f_1.rgb.png
├── 1.cam_default.f_1.info.json
└── 1.cam_default.f_1.segments.png

If any of the required files are absent for at least one image, or any redundant files are located in the directory, ValueError is raised.
To work with segment modalities, the segments parameter is used. It defines how segment names map to their integer representation. For example, to work with a background (0) / face (1) / hair (2) / body (3) segmentation it may look like this:

segments = {
    'default': 0, 'background': 0, 'beard': 1, 'body': 3, 'brow': 1,
    'cheek_left': 1, 'cheek_right': 1, 'chin': 1, 'clothing': 3,
    'ear_left': 1, 'ear_right': 1, 'eye_left': 1, 'eye_right': 1,
    'eyelashes': 1, 'eyelid': 1, 'eyes': 1, 'forehead': 1, 'glasses': 0,
    'hair': 2, 'head': 1, 'headphones': 0, 'headwear': 0, 'jaw': 1,
    'jowl': 1, 'lip_lower': 1, 'lip_upper': 1, 'mask': 0, 'mouth': 1,
    'mouthbag': 1, 'mustache': 1, 'neck': 3, 'nose': 1, 'nose_outer': 1,
    'nostrils': 1, 'shoulders': 3, 'smile_line': 1, 'teeth': 1,
    'temples': 1, 'tongue': 1, 'undereye': 1
}
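The remapping that such a segments dict implies can be sketched with NumPy; the raw ids below come from the default SEGMENTS table, but the code is illustrative, not the library's implementation:

```python
import numpy as np

# hypothetical raw segment map with default ids: background=0, hair=17, brow=3, clothing=7
raw = np.array([[0, 17], [3, 7]], dtype=np.uint16)

# target mapping: background->0, face->1, hair->2, body->3 (as in the example above)
remap = {0: 0, 17: 2, 3: 1, 7: 3}

# build a lookup table over all raw ids, then remap in one vectorized indexing step
lut = np.zeros(int(raw.max()) + 1, dtype=np.uint16)
for src, dst in remap.items():
    lut[src] = dst
result = lut[raw]
```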
In addition, a transform function may be provided. If it is not None, it will be applied to the modality dict after each __get__() call. For example, to flip an RGB image and its segmentation:

def transform(modalities: Dict[Modality, Any]) -> Dict[Modality, Any]:
    ret = modalities.copy()
    ret[Modality.RGB] = flip(modalities[Modality.RGB])
    ret[Modality.SEGMENTS] = flip(modalities[Modality.SEGMENTS])
    return ret
- Parameters
root (Union[str, bytes, os.PathLike]) – Dataset root. All image files (e.g. 0.cam_default.f_1.rgb.png) should be located directly in this directory.
modalities (Optional[List[Modality]]) – List of modalities to load. If None, all the modalities are loaded.
segments (Optional[Dict[str, int]]) – Mapping from object names to segmentation id. If None, the SEGMENTS mapping is used.
face_segments (Optional[List[str]]) – List of object names considered to incorporate a face. If None, the FACE_SEGMENTS list is used.
face_bbox_pad (int) – Extra area in pixels to pad around the height and width of the face bounding box.
transform (Optional[Callable[[Dict[Modality,Any]],Dict[Modality,Any]]]) – Additional transforms to apply to modalities.
grouping (face_api_dataset.dataset.Grouping) –
out_of_frame_landmark_strategy (face_api_dataset.dataset.OutOfFrameLandmarkStrategy) –
- Return type
None
-
property segments¶ Segment mapping for the dataset.
- Type
Dict[str, int]
-
get_group_index()¶ - Return type
pandas.core.frame.DataFrame
-
classmethod get_euler_angles(matrix, order, degrees)¶ Compute Euler angles from a 4x4 matrix.
- Parameters
matrix (np.ndarray) – matrix 4x4.
order (str) – axes order (‘xyz’,’yzx’,’zxy’).
degrees (bool) – whether to return degrees or radians.
- Return type
numpy.ndarray
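For the simplified case of a pure rotation about z, the extraction such a method performs can be sketched directly; this is an illustration of the math, not the library's implementation:

```python
import math

# hypothetical rotation-only 4x4 transform: 30 degrees about the z axis
theta = math.radians(30)
c, s = math.cos(theta), math.sin(theta)
m = [
    [c, -s, 0, 0],
    [s,  c, 0, 0],
    [0,  0, 1, 0],
    [0,  0, 0, 1],
]

# for a pure z rotation, the 'xyz' Euler z angle is atan2(R[1][0], R[0][0])
z_deg = math.degrees(math.atan2(m[1][0], m[0][0]))
```

For general matrices and the 'yzx'/'zxy' orders, a library routine such as scipy.spatial.transform.Rotation.as_euler handles the gimbal-lock cases.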
-
classmethod get_quaternion(matrix)¶ Compute a quaternion from a 4x4 matrix.
- Parameters
matrix (np.ndarray) – matrix 4x4
- Return type
numpy.ndarray
-
classmethod get_shift(matrix)¶ Return the shift vector from a 4x4 matrix which contains both rotation and shift.
- Parameters
matrix (np.ndarray) – matrix 4x4
- Return type
numpy.ndarray
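In a homogeneous 4x4 transform the shift is the translation column, so get_shift presumably reads it as follows; the matrix values are hypothetical:

```python
import numpy as np

# hypothetical 4x4 transform: identity rotation plus a translation
m = np.eye(4)
m[:3, 3] = [0.1, -0.2, 1.5]

# the shift is the first three entries of the last column
shift = m[:3, 3]
```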
-
-
class face_api_dataset.modality.Modality(value)¶
Bases: enum.Enum
Different modalities of the Synthesis AI dataset. All image modalities are in [y][x][channel] format, with axes going as follows:

┌-----> x
|
|
v
y
-
SCENE_ID= 1¶ Scene ID (rendered scene number).
Type: int.
-
RGB= 2¶ RGB image modality.
Type: ndarray[uint8]. Channels: 3.
-
NORMALS= 3¶ Normals image. All values are in [-1,1] range.
Type: ndarray[float16]. Channels: 3.
-
DEPTH= 4¶ Depth Image. All values are positive floats. Background has depth=0.
Type: ndarray[float16]. Channels: 1.
-
ALPHA= 5¶ Alpha image. 0 means complete transparency, 255 a solid object.
Type: ndarray[uint8]. Channels: 1.
-
SEGMENTS= 6¶ Segmentation map. The semantics of the different values are defined by the segments mapping.
Type: ndarray[uint16]. Channels: 1.
-
LANDMARKS_IBUG68= 7¶ iBUG-68 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels. Each keypoint is a 2D projection of a 3D landmark.
Type: Dict[int, Tuple[float, float]]. Should have no more than 68 points.
-
LANDMARKS_CONTOUR_IBUG68= 8¶ iBUG-68 contour landmarks. Each landmark is given by name and two coordinates (x,y) in pixels. Each keypoint is defined in a similar manner to how human labelers mark 2D face keypoints.
Type: Dict[int, Tuple[float, float]]. Should have no more than 68 points.
-
LANDMARKS_KINECT_V2= 9¶ Kinect v2 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[int, Tuple[float, float]]. Should have no more than 32 points.
-
LANDMARKS_MEDIAPIPE= 10¶ MediaPipe pose landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[int, Tuple[float, float]]. Should have no more than 33 points.
-
LANDMARKS_COCO= 11¶ COCO whole body landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[int, Tuple[float, float]]. Should have no more than 133 points.
-
LANDMARKS_MPEG4= 12¶ MPEG4 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.
Type: Dict[int, Tuple[float, float]].
-
LANDMARKS_3D_IBUG68= 13¶ iBUG-68 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[int, Tuple[float, float, float]]. Should have no more than 68 points.
-
LANDMARKS_3D_KINECT_V2= 14¶ Kinect v2 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[int, Tuple[float, float, float]]. Should have no more than 32 points.
-
LANDMARKS_3D_MEDIAPIPE= 15¶ MediaPipe pose landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[int, Tuple[float, float, float]]. Should have no more than 33 points.
-
LANDMARKS_3D_COCO= 16¶ COCO whole body landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[str, Tuple[float, float, float]]. Should have no more than 133 points.
-
LANDMARKS_3D_MPEG4= 17¶ MPEG4 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[int, Tuple[float, float, float]].
-
PUPILS= 18¶ Coordinates of pupils. Each pupil is given by name and two coordinates (x,y) in pixels.
Type: Dict[str, Tuple[float, float]].
-
PUPILS_3D= 19¶ Coordinates of pupils in 3D. Each pupil is given by name and three coordinates (x,y,z) in camera space.
Type: Dict[str, Tuple[float, float, float]].
-
IDENTITY= 20¶ Unique ID of the person in the image.
Type: int.
-
IDENTITY_METADATA= 21¶ Additional metadata about the person in the image.
Format:
{'gender': 'female'|'male', 'age': int, 'weight_kg': int, 'height_cm': int, 'id': int, 'ethnicity': 'arab'|'asian'|'black'|'hisp'|'white'}
-
HAIR= 22¶ Hair metadata. If no hair is present, None is returned.
Format:
{'relative_length': float64, 'relative_density': float64, 'style': str, 'color_seed': float64, 'color': str}
-
FACIAL_HAIR= 23¶ Facial hair metadata. If no facial hair is present, None is returned.
Format:
{'relative_length': float64, 'relative_density': float64, 'style': str, 'color_seed': float64, 'color': str}
-
EXPRESSION= 24¶ Expression and its intensity.
Format:
{'intensity': float64, 'name': str}
-
GAZE= 25¶ Gaze direction in camera space.
Format:
{'horizontal_angle': ndarray[float64] (shape (3,)), 'vertical_angle': ndarray[float64] (shape (3,))}
-
FACE_BBOX= 26¶ Face bounding box in the format (left, top, right, bottom) in pixels.
Type: Tuple[int, int, int, int].
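Cropping an image modality with a FACE_BBOX value can be sketched as follows (the image size and bbox numbers are hypothetical); note the [y][x][channel] axis order used by all image modalities:

```python
import numpy as np

# hypothetical RGB image and FACE_BBOX value (left, top, right, bottom) in pixels
img = np.zeros((480, 640, 3), dtype=np.uint8)
left, top, right, bottom = 200, 100, 400, 350

# rows are indexed by y (top:bottom) and columns by x (left:right)
face = img[top:bottom, left:right]
```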
-
HEAD_TO_CAM= 27¶ Transformation matrix from the head to the camera coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
-
CAM_TO_HEAD= 28¶ Transformation matrix from the camera to the head coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
-
HEAD_TO_WORLD= 29¶ Transformation matrix from the head to the world coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
-
WORLD_TO_HEAD= 30¶ Transformation matrix from the world to the head coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
-
CAM_TO_WORLD= 31¶ Transformation matrix from the camera to the world coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
-
WORLD_TO_CAM= 32¶ Transformation matrix from the world to the camera coordinate system.
Type: ndarray[float32]. Shape: (4, 4).
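These matrices compose by matrix multiplication, and each pair such as HEAD_TO_CAM / CAM_TO_HEAD is mutually inverse. A sketch with hypothetical values:

```python
import numpy as np

# hypothetical transforms: head shifted 1 unit along x in world space,
# camera at the world origin (identity world-to-camera transform)
head_to_world = np.eye(4)
head_to_world[0, 3] = 1.0
world_to_cam = np.eye(4)

# chaining head -> world -> camera yields the head-to-camera transform
head_to_cam = world_to_cam @ head_to_world

# the opposite direction is the matrix inverse
cam_to_head = np.linalg.inv(head_to_cam)
```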
-
CAM_INTRINSICS= 33¶ Camera intrinsics matrix in OpenCV format: https://docs.opencv.org/3.4.15/dc/dbb/tutorial_py_calibration.html.
Type: ndarray[float32]. Shape: (4, 4).
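A pinhole projection with such an intrinsics matrix can be sketched as follows; the fx, fy, cx, cy values are hypothetical, and only the top-left 3x3 block of the 4x4 matrix is used:

```python
import numpy as np

# hypothetical 4x4 intrinsics with fx = fy = 500, cx = 320, cy = 240
K = np.eye(4, dtype=np.float32)
K[0, 0] = K[1, 1] = 500.0
K[0, 2], K[1, 2] = 320.0, 240.0

# project a camera-space 3D point (e.g. a LANDMARKS_3D_* entry) to pixels
p = np.array([0.2, -0.1, 2.0])  # (x, y, z) in camera space
u = K[0, 0] * p[0] / p[2] + K[0, 2]
v = K[1, 1] * p[1] / p[2] + K[1, 2]
```

This is the standard OpenCV pinhole model: divide by depth, scale by focal length, then offset by the principal point.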
-
CAMERA_NAME= 34¶ Camera name consisting of lowercase alphanumeric characters. Usually used when more than one camera is defined in a scene. Default is “cam_default”.
Type: str.
-
FRAME_NUM= 35¶ Frame number, used for consecutive animation frames.
Type: int.
-
LANDMARKS_3D_MEDIAPIPE_FACE= 36¶ MediaPipe dense face landmarks in 3D. Each landmark is given by three coordinates (x,y,z) in camera space.
Type: ndarray[float32]. Shape: (468, 3).
-
LANDMARKS_3D_SAI= 37¶ SAI dense face landmarks in 3D. Each landmark is given by three coordinates (x,y,z) in camera space.
Type: ndarray[float32]. Shape: (4840, 3).
-
LANDMARKS_MEDIAPIPE_FACE= 38¶ MediaPipe dense face landmarks. Each landmark is given by two coordinates (x,y) in pixels.
Type: ndarray[float32]. Shape: (468, 2).
-
LANDMARKS_SAI= 39¶ SAI dense face landmarks. Each landmark is given by two coordinates (x,y) in pixels.
Type: ndarray[float32]. Shape: (4840, 2).
-