Welcome to Face API Dataset’s documentation!

face_api_dataset.dataset module

class face_api_dataset.dataset.OutOfFrameLandmarkStrategy(value)

Bases: enum.Enum

Strategy for handling landmarks that fall outside the image frame.

IGNORE = 'ignore'
CLIP = 'clip'
static clip_landmarks_(landmarks, height, width)
Parameters
  • landmarks (Dict[int, Tuple[float, float]]) –

  • height (int) –

  • width (int) –

Return type

Dict[int, Tuple[float, float]]
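As a minimal sketch of what the CLIP strategy implies (a hypothetical re-implementation, not the library's code; whether the upper bound is width-1 or width is an assumption):

```python
from typing import Dict, Tuple

def clip_landmarks(
    landmarks: Dict[int, Tuple[float, float]], height: int, width: int
) -> Dict[int, Tuple[float, float]]:
    """Clamp each (x, y) landmark into the image frame."""
    return {
        idx: (min(max(x, 0.0), width - 1.0), min(max(y, 0.0), height - 1.0))
        for idx, (x, y) in landmarks.items()
    }
```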

class face_api_dataset.dataset.Grouping(value)

Bases: enum.Enum

Different modes of grouping a Synthesis AI dataset.

NONE = 'NONE'

Each image is treated independently.

The size of the dataset is #scenes * #cameras * #frames (assuming the same number of cameras/frames per scene).

SCENE = 'SCENE'

Items with the same scene are grouped into the list.

The size of the dataset is #scenes. Each element is a List[Item] with the same SCENE_ID.

CAMERA = 'CAMERA'

Items with the same camera are grouped into the list.

The size of the dataset is #cameras. Each element is a List[Item] with the same CAMERA_NAME.

SCENE_CAMERA = 'SCENE_CAMERA'

Items are grouped first by camera and then by scene; the list of frames for a particular scene is indexed by scene_id.

The size of the dataset is #scenes * #cameras. Each element is a List[Item] of consecutive frames for the given scene and camera.
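As an illustrative sketch (not the library's internals), SCENE grouping amounts to collecting items by their scene id:

```python
from collections import defaultdict

# Hypothetical flat item list: (scene_id, camera_name, frame_num).
items = [
    (0, "cam_default", 1),
    (0, "cam_default", 2),
    (1, "cam_default", 1),
]

# Grouping.SCENE: one list of items per scene_id.
by_scene = defaultdict(list)
for item in items:
    by_scene[item[0]].append(item)

# The dataset size becomes #scenes; each element is a List[Item].
```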

class face_api_dataset.dataset.FaceApiDataset(root, modalities=None, segments=None, face_segments=None, face_bbox_pad=0, grouping=<Grouping.NONE: 'NONE'>, out_of_frame_landmark_strategy=<OutOfFrameLandmarkStrategy.IGNORE: 'ignore'>, transform=None)

Bases: Sequence

Synthesis AI face dataset.

This class provides access to all the modalities available in Synthesis AI generated datasets.

SEGMENTS = {'arm_lower_left': 46, 'arm_lower_right': 47, 'arm_upper_left': 48, 'arm_upper_right': 49, 'background': 0, 'beard': 1, 'body': 2, 'brow': 3, 'cheek_left': 4, 'cheek_right': 5, 'chin': 6, 'clothing': 7, 'cornea_left': 103, 'cornea_right': 104, 'default': 0, 'ear_left': 8, 'ear_right': 9, 'eye_left': 10, 'eye_right': 11, 'eyebrow_left': 94, 'eyebrow_right': 95, 'eyebrows': 89, 'eyelashes': 12, 'eyelashes_left': 92, 'eyelashes_right': 93, 'eyelid': 13, 'eyelid_left': 92, 'eyelid_right': 93, 'eyes': 14, 'finger1_mid_bottom_left': 52, 'finger1_mid_bottom_right': 53, 'finger1_mid_left': 54, 'finger1_mid_right': 55, 'finger1_mid_top_left': 56, 'finger1_mid_top_right': 57, 'finger2_mid_bottom_left': 58, 'finger2_mid_bottom_right': 59, 'finger2_mid_left': 60, 'finger2_mid_right': 61, 'finger2_mid_top_left': 62, 'finger2_mid_top_right': 63, 'finger3_mid_bottom_left': 64, 'finger3_mid_bottom_right': 65, 'finger3_mid_left': 66, 'finger3_mid_right': 67, 'finger3_mid_top_left': 68, 'finger3_mid_top_right': 69, 'finger4_mid_bottom_left': 70, 'finger4_mid_bottom_right': 71, 'finger4_mid_left': 72, 'finger4_mid_right': 73, 'finger4_mid_top_left': 74, 'finger4_mid_top_right': 75, 'finger5_mid_bottom_left': 76, 'finger5_mid_bottom_right': 77, 'finger5_mid_left': 78, 'finger5_mid_right': 79, 'finger5_mid_top_left': 80, 'finger5_mid_top_right': 81, 'foot_left': 88, 'foot_right': 89, 'forehead': 15, 'glasses': 16, 'glasses_frame': 96, 'glasses_lens_left': 97, 'glasses_lens_right': 98, 'hair': 17, 'hand_left': 50, 'hand_right': 51, 'head': 18, 'headphones': 19, 'headwear': 20, 'jaw': 21, 'jowl': 22, 'leg_lower_left': 84, 'leg_lower_right': 85, 'leg_upper_left': 86, 'leg_upper_right': 87, 'lip_lower': 23, 'lip_upper': 24, 'mask': 25, 'mouth': 26, 'mouthbag': 27, 'mustache': 28, 'nails_left': 82, 'nails_right': 83, 'neck': 29, 'nose': 30, 'nose_outer': 31, 'nostrils': 32, 'pupil_left': 90, 'pupil_right': 91, 'sclera_left': 101, 'sclera_right': 102, 'shoulders': 33, 
'smile_line': 34, 'teeth': 35, 'temples': 36, 'tongue': 37, 'torso_lower_left': 40, 'torso_lower_right': 41, 'torso_mid_left': 42, 'torso_mid_right': 43, 'torso_upper_left': 44, 'torso_upper_right': 45, 'undereye': 38, 'undereye_left': 99, 'undereye_right': 100}

Default segmentation mapping.

FACE_SEGMENTS = ['brow', 'cheek_left', 'cheek_right', 'chin', 'eye_left', 'eye_right', 'eyelid_left', 'eyelid_right', 'eyes', 'jaw', 'jowl', 'lip_lower', 'lip_upper', 'mouth', 'mouthbag', 'nose', 'nose_outer', 'nostrils', 'smile_line', 'teeth', 'undereye', 'eyelashes_left', 'eyelashes_right', 'eyebrow_left', 'eyebrow_right', 'undereye_left', 'undereye_right']

Segments included in the bounding box.
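How the face bounding box relates to these segments can be sketched as follows (a hypothetical reconstruction, not the library's code; the exact padding and inclusivity behavior is an assumption):

```python
import numpy as np

def face_bbox(segments_map: np.ndarray, face_ids, pad: int = 0):
    """Tight (left, top, right, bottom) box around pixels whose segment id
    is in face_ids, padded by `pad` pixels on each side."""
    mask = np.isin(segments_map, list(face_ids))
    ys, xs = np.nonzero(mask)
    h, w = segments_map.shape
    return (
        max(int(xs.min()) - pad, 0),
        max(int(ys.min()) - pad, 0),
        min(int(xs.max()) + pad, w - 1),
        min(int(ys.max()) + pad, h - 1),
    )
```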

N_LANDMARKS = 68
__init__(root, modalities=None, segments=None, face_segments=None, face_bbox_pad=0, grouping=<Grouping.NONE: 'NONE'>, out_of_frame_landmark_strategy=<OutOfFrameLandmarkStrategy.IGNORE: 'ignore'>, transform=None)

Initializes FaceApiDataset from the data in the root directory, loading the listed modalities. All dataset files should be located in the root directory:

root
├── metadata.jsonl
├── 0.cam_default.f_1.exr
├── 0.cam_default.f_1.rgb.png
├── 0.cam_default.f_1.info.json
├── 0.cam_default.f_1.segments.png
├── 0.cam_default.f_1.alpha.tif
├── 0.cam_default.f_1.depth.tif
├── 0.cam_default.f_1.normals.tif
├── 1.cam_default.f_1.exr
├── 1.cam_default.f_1.rgb.png
├── 1.cam_default.f_1.info.json
├── 1.cam_default.f_1.segments.png
├── 1.cam_default.f_1.alpha.tif
├── 1.cam_default.f_1.depth.tif
└── 1.cam_default.f_1.normals.tif

No extra files are allowed, but files that are not needed to load the listed modalities may be absent.

For instance, if only the RGB and SEGMENTS modalities are needed, the following files are enough:

root
├── 0.cam_default.f_1.rgb.png
├── 0.cam_default.f_1.info.json
├── 0.cam_default.f_1.segments.png
├── 1.cam_default.f_1.rgb.png
├── 1.cam_default.f_1.info.json
└── 1.cam_default.f_1.segments.png

If any of the required files is absent for at least one image, or any redundant files are present in the directory, a ValueError is raised.

To work with segment modalities, the segments parameter is used. It maps segment names to their integer representation. For example, to work with a background (0)/face (1)/hair (2)/body (3) segmentation it may look like this:

segments = { 'default': 0,
             'background': 0,
             'beard': 1,
             'body': 3,
             'brow': 1,
             'cheek_left': 1,
             'cheek_right': 1,
             'chin': 1,
             'clothing': 3,
             'ear_left': 1,
             'ear_right': 1,
             'eye_left': 1,
             'eye_right': 1,
             'eyelashes': 1,
             'eyelid': 1,
             'eyes': 1,
             'forehead': 1,
             'glasses': 0,
             'hair': 2,
             'head': 1,
             'headphones': 0,
             'headwear': 0,
             'jaw': 1,
             'jowl': 1,
             'lip_lower': 1,
             'lip_upper': 1,
             'mask': 0,
             'mouth': 1,
             'mouthbag': 1,
             'mustache': 1,
             'neck': 3,
             'nose': 1,
             'nose_outer': 1,
             'nostrils': 1,
             'shoulders': 3,
             'smile_line': 1,
             'teeth': 1,
             'temples': 1,
             'tongue': 1,
             'undereye': 1
            }
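Applying such a mapping to a raw segmentation map can be sketched with a lookup table (an illustration, not the library's implementation; the raw map is assumed to store the default ids from SEGMENTS):

```python
import numpy as np

# Default ids (from SEGMENTS) for a few names, and the target labels.
segments = {"default": 0, "background": 0, "hair": 17}
target = {"default": 0, "background": 0, "hair": 2}

# Build a lookup table from raw segment id -> target label.
lut = np.zeros(256, dtype=np.uint16)
for name, raw_id in segments.items():
    lut[raw_id] = target[name]

raw = np.array([[0, 17], [17, 0]], dtype=np.uint16)
remapped = lut[raw]  # hair (17) becomes class 2
```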

In addition, a transform function may be provided. If it is not None, it will be applied to the modality dict after each __getitem__() call.

For example, to flip an RGB image and its segmentation:

def transform(modalities: Dict[Modality, Any]) -> Dict[Modality, Any]:
    ret = modalities.copy()
    ret[Modality.RGB] = flip(modalities[Modality.RGB])
    ret[Modality.SEGMENTS] = flip(modalities[Modality.SEGMENTS])
    return ret
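A concrete `flip` for the example above could be a horizontal mirror (a sketch, assuming RGB and SEGMENTS arrive as numpy arrays in [y][x][channel] / [y][x] layout):

```python
import numpy as np

def flip(image: np.ndarray) -> np.ndarray:
    """Mirror an image along the x (width) axis."""
    return np.flip(image, axis=1)
```

Note that only pixel modalities flip this way; landmark coordinates would additionally need x mapped to width - 1 - x.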
Parameters
  • root (Union[str, bytes, os.PathLike]) – Dataset root. All image files (e.g. 0.cam_default.f_1.rgb.png) should be located directly in this directory.

  • modalities (Optional[List[Modality]]) – List of modalities to load. If None, all modalities are loaded.

  • segments (Optional[Dict[str, int]]) – Mapping from object names to segmentation ids. If None, the SEGMENTS mapping is used.

  • face_segments (Optional[List[str]]) – List of object names considered to constitute the face. If None, FACE_SEGMENTS is used.

  • face_bbox_pad (int) – Extra area in pixels to pad around the height and width of the face bounding box.

  • transform (Optional[Callable[[Dict[Modality, Any]], Dict[Modality, Any]]]) – Additional transform to apply to the modalities.

  • grouping (face_api_dataset.dataset.Grouping) – How items are grouped; see Grouping.

  • out_of_frame_landmark_strategy (face_api_dataset.dataset.OutOfFrameLandmarkStrategy) – How landmarks outside the image frame are handled; see OutOfFrameLandmarkStrategy.

Return type

None

property segments

Segment mapping for the dataset.

Type

Dict[str, int]

property modalities

List of the loaded modalities.

Type

List[Modality]

get_group_index()
Return type

pandas.core.frame.DataFrame

classmethod get_euler_angles(matrix, order, degrees)

Compute Euler angles from a 4x4 transformation matrix.

Parameters
  • matrix (np.ndarray) – 4x4 transformation matrix.

  • order (str) – axes order ('xyz', 'yzx', 'zxy').

  • degrees (bool) – whether to return degrees (otherwise radians).

Return type

numpy.ndarray
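The extraction can be sketched with SciPy (an illustration under the assumption that the upper-left 3x3 block of the 4x4 matrix is a pure rotation; not the library's implementation):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def euler_from_matrix(
    matrix: np.ndarray, order: str = "xyz", degrees: bool = True
) -> np.ndarray:
    """Euler angles of the rotation part of a 4x4 transformation matrix."""
    return Rotation.from_matrix(matrix[:3, :3]).as_euler(order, degrees=degrees)
```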

classmethod get_quaternion(matrix)

Compute a quaternion from a 4x4 transformation matrix.

Parameters

matrix (np.ndarray) – 4x4 matrix

Return type

numpy.ndarray
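A sketch of the same operation with SciPy (again assuming the upper-left 3x3 block is a pure rotation; the quaternion component order is SciPy's scalar-last convention and may differ from the library's):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def quaternion_from_matrix(matrix: np.ndarray) -> np.ndarray:
    """Quaternion (x, y, z, w) of the rotation part of a 4x4 matrix."""
    return Rotation.from_matrix(matrix[:3, :3]).as_quat()
```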

classmethod get_shift(matrix)

Extract the shift (translation) vector from a 4x4 matrix that contains both rotation and shift.

Parameters

matrix (np.ndarray) – 4x4 matrix

Return type

numpy.ndarray
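For a homogeneous 4x4 transform, the shift is simply the last column of the rotation-and-translation block (a sketch):

```python
import numpy as np

def shift_from_matrix(matrix: np.ndarray) -> np.ndarray:
    """Translation column of a homogeneous 4x4 transform."""
    return matrix[:3, 3]
```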

class face_api_dataset.modality.Modality(value)

Bases: enum.Enum

Different modalities of the Synthesis AI dataset. All image modalities are in [y][x][channel] format, with axes oriented as follows:

┌-----> x
|
|
v
y
SCENE_ID = 1

Scene ID (rendered scene number).

Type: int.

RGB = 2

RGB image modality.

Type: ndarray[uint8]. Channels: 3.

NORMALS = 3

Normals image. All values are in [-1,1] range.

Type: ndarray[float16]. Channels: 3.

DEPTH = 4

Depth image. All foreground values are positive floats; background pixels have depth 0.

Type: ndarray[float16]. Channels: 1.

ALPHA = 5

Alpha image. 0 means complete transparency, 255 a fully solid object.

Type: ndarray[uint8]. Channels: 1.

SEGMENTS = 6

Segmentation map. The semantics of the values are defined by the segments mapping.

Type: ndarray[uint16]. Channels: 1.

LANDMARKS_IBUG68 = 7

iBUG-68 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels. Each keypoint is a 2D projection of a 3D landmark.

Type: Dict[int, Tuple[float, float]]. Should have no more than 68 points.

LANDMARKS_CONTOUR_IBUG68 = 8

iBUG-68 contour landmarks. Each landmark is given by name and two coordinates (x,y) in pixels. Each keypoint is placed similarly to how human labelers mark 2D face keypoints.

Type: Dict[int, Tuple[float, float]]. Should have no more than 68 points.

LANDMARKS_KINECT_V2 = 9

Kinect v2 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.

Type: Dict[int, Tuple[float, float]]. Should have no more than 32 points.

LANDMARKS_MEDIAPIPE = 10

MediaPipe pose landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.

Type: Dict[int, Tuple[float, float]]. Should have no more than 33 points.

LANDMARKS_COCO = 11

COCO whole body landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.

Type: Dict[int, Tuple[float, float]]. Should have no more than 133 points.

LANDMARKS_MPEG4 = 12

MPEG4 landmarks. Each landmark is given by name and two coordinates (x,y) in pixels.

Type: Dict[int, Tuple[float, float]].

LANDMARKS_3D_IBUG68 = 13

iBUG-68 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.

Type: Dict[int, Tuple[float, float, float]]. Should have no more than 68 points.

LANDMARKS_3D_KINECT_V2 = 14

Kinect v2 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.

Type: Dict[int, Tuple[float, float, float]]. Should have no more than 32 points.

LANDMARKS_3D_MEDIAPIPE = 15

MediaPipe pose landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.

Type: Dict[int, Tuple[float, float, float]]. Should have no more than 33 points.

LANDMARKS_3D_COCO = 16

COCO whole body landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.

Type: Dict[int, Tuple[float, float, float]]. Should have no more than 133 points.

LANDMARKS_3D_MPEG4 = 17

MPEG4 landmarks in 3D. Each landmark is given by name and three coordinates (x,y,z) in camera space.

Type: Dict[int, Tuple[float, float, float]].

PUPILS = 18

Coordinates of pupils. Each pupil is given by name and two coordinates (x,y) in pixels.

Type: Dict[str, Tuple[float, float]].

PUPILS_3D = 19

Coordinates of pupils in 3D. Each pupil is given by name and three coordinates (x,y,z) in camera space.

Type: Dict[str, Tuple[float, float, float]].

IDENTITY = 20

Unique ID of the person in the image.

Type: int.

IDENTITY_METADATA = 21

Additional metadata about the person in the image.

Format:

{'gender': 'female'|'male',
 'age': int,
 'weight_kg': int,
 'height_cm': int,
 'id': int,
 'ethnicity': 'arab'|'asian'|'black'|'hisp'|'white'}
HAIR = 22

Hair metadata. If no hair is present, None is returned.

Format:

{'relative_length': float64,
 'relative_density': float64,
 'style': str,
 'color_seed': float64,
 'color': str}
FACIAL_HAIR = 23

Facial hair metadata. If no facial hair is present, None is returned.

Format:

{'relative_length': float64,
 'relative_density': float64,
 'style': str,
 'color_seed': float64,
 'color': str}
EXPRESSION = 24

Expression and its intensity.

Format:

{'intensity': float64,
 'name': str}
GAZE = 25

Gaze direction in camera space.

Format:

{'horizontal_angle': ndarray[float64],  # Shape: (3,).
 'vertical_angle': ndarray[float64]}    # Shape: (3,).
FACE_BBOX = 26

Face bounding box in the format (left, top, right, bottom) in pixels.

Type: Tuple[int, int, int, int].
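Cropping an image with such a box can be sketched as follows (assuming, which is not stated above, that right/bottom are inclusive pixel coordinates):

```python
import numpy as np

def crop_face(rgb: np.ndarray, bbox):
    """Crop an image to a (left, top, right, bottom) box, bounds inclusive."""
    left, top, right, bottom = bbox
    return rgb[top : bottom + 1, left : right + 1]
```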

HEAD_TO_CAM = 27

Transformation matrix from the head to the camera coordinate system.

Type: ndarray[float32]. Shape: (4, 4).

CAM_TO_HEAD = 28

Transformation matrix from the camera to the head coordinate system.

Type: ndarray[float32]. Shape: (4, 4).

HEAD_TO_WORLD = 29

Transformation matrix from the head to the world coordinate system.

Type: ndarray[float32]. Shape: (4, 4).

WORLD_TO_HEAD = 30

Transformation matrix from the world to the head coordinate system.

Type: ndarray[float32]. Shape: (4, 4).

CAM_TO_WORLD = 31

Transformation matrix from the camera to the world coordinate system.

Type: ndarray[float32]. Shape: (4, 4).

WORLD_TO_CAM = 32

Transformation matrix from the world to the camera coordinate system.

Type: ndarray[float32]. Shape: (4, 4).
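These matrices compose by multiplication: a point in head coordinates can reach camera space either via HEAD_TO_CAM directly or through world space. A sketch with hypothetical transforms (head at (0, 0, 1) in world space, camera at the world origin):

```python
import numpy as np

head_to_world = np.eye(4)
head_to_world[:3, 3] = [0.0, 0.0, 1.0]
world_to_cam = np.eye(4)

# Composition: first head -> world, then world -> camera.
head_to_cam = world_to_cam @ head_to_world

point_head = np.array([0.0, 0.0, 0.0, 1.0])  # head origin, homogeneous
point_cam = head_to_cam @ point_head
```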

CAM_INTRINSICS = 33

Camera intrinsics matrix in OpenCV format: https://docs.opencv.org/3.4.15/dc/dbb/tutorial_py_calibration.html.

Type: ndarray[float32]. Shape: (4, 4).
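Projecting a camera-space 3D point to pixel coordinates with the intrinsics can be sketched as follows (using only the 3x3 pinhole part of the matrix; the fx, fy, cx, cy values are hypothetical):

```python
import numpy as np

# Hypothetical pinhole intrinsics.
fx = fy = 100.0
cx, cy = 64.0, 48.0
K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])

def project(point_cam: np.ndarray) -> np.ndarray:
    """Project a 3D camera-space point to (x, y) pixel coordinates."""
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]  # perspective divide
```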

CAMERA_NAME = 34

Camera name consisting of lowercase alphanumeric characters. Usually used when more than one camera is defined in a scene. Default is "cam_default".

Type: str.

FRAME_NUM = 35

Frame number, used for consecutive animation frames.

Type: int.

LANDMARKS_3D_MEDIAPIPE_FACE = 36

MediaPipe dense face landmarks in 3D. Each landmark is given by three coordinates (x,y,z) in camera space.

Type: ndarray[float32]. Shape: (468, 3).

LANDMARKS_3D_SAI = 37

SAI dense face landmarks in 3D. Each landmark is given by three coordinates (x,y,z) in camera space.

Type: ndarray[float32]. Shape: (4840, 3).

LANDMARKS_MEDIAPIPE_FACE = 38

MediaPipe dense face landmarks. Each landmark is given by two coordinates (x,y) in pixels.

Type: ndarray[float32]. Shape: (468, 2).

LANDMARKS_SAI = 39

SAI dense face landmarks. Each landmark is given by two coordinates (x,y) in pixels.

Type: ndarray[float32]. Shape: (4840, 2).