Media Streams

All media streams support the following properties:

Property	Type	Description
`name`	string	Uniquely identifies the stream.
`format`	string	Four-character code that identifies the media sample format.
`language`	string	ISO 639 alpha-2 (`en`) or alpha-3 (`eng`) code that identifies the language.
`region`	string	ISO 3166 alpha-2 (`US`) or alpha-3 (`USA`) code that identifies the country or region.
`sample_rate`	object	Media sample rate specified as a rational number (e.g. 30000/1001, 48000/1024).
`duration`	integer	Duration of the stream in ticks.
`offset`	integer	Offset from the start of the stream in ticks.
`bit_rate`	integer	Average bit rate of the media stream.
`optional`	boolean	Indicates whether a composition input stream is optional. If the input stream is not present, all output stream references are removed from the composition.
`properties`	object	Stream specific properties.
`extension`	array	Lists the stream property extensions.
`sample`	array	Lists the media samples in the stream.

`sample_rate`

Property	Type	Description
`numerator`	integer	Number of ticks per second. For video streams this is the frame rate (e.g. 25) or a multiple of the frame rate (e.g. 30000). For audio streams this is the audio sample rate in Hz (e.g. 48000).
`denominator`	integer	Nominal duration of a media sample in ticks (default is 1).
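The following sketch shows how ticks relate to time. It assumes that `duration` and `offset` are counted in units of 1/`numerator` seconds, which is what the `numerator` description implies. The helper names are illustrative and are not part of this specification.

```python
from fractions import Fraction


def ticks_to_seconds(ticks: int, numerator: int) -> Fraction:
    """Convert a tick count to seconds; `numerator` is ticks per second."""
    return Fraction(ticks, numerator)


def ticks_to_samples(ticks: int, denominator: int = 1) -> Fraction:
    """Convert a tick count to a nominal number of media samples."""
    return Fraction(ticks, denominator)


def samples_per_second(numerator: int, denominator: int = 1) -> Fraction:
    """Nominal sample (frame) rate, e.g. 30000/1001 for 29.97 fps video."""
    return Fraction(numerator, denominator)


# Hypothetical 29.97 fps stream: sample_rate = 30000/1001, duration = 300300 ticks.
print(ticks_to_seconds(300300, 30000))  # 1001/100 (10.01 s)
print(ticks_to_samples(300300, 1001))   # 300 (frames)
```

Exact rationals avoid the drift that floating-point frame rates such as 29.97 accumulate over long durations.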

`video`

Represents a stream of video samples. Video streams support the following additional properties:

Property	Type	Description
`width`	integer	Stored image width in pixels.
`height`	integer	Stored image height in pixels.
`clean_aperture`	object	A rectangle that identifies clean image dimensions.
`pixel_aspect_ratio`	object	Pixel Aspect Ratio (PAR) of the displayed image as a rational number.
`orientation`	object	Indicates the orientation of the stored image.
`field_order`	enum	Indicates whether the frame is progressive or interlaced (upper or lower field first).
`bit_depth`	integer	Number of bits per color component.
`chroma_subsampling`	object	Subsampling of the chroma components.
`chroma_location`	enum	Location of the chroma samples relative to the luminance samples.
`color_primaries`	enum	Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W).
`color_primaries`	integer	Integer form, using the ISO/IEC 23008-2 enumeration.
`transfer_characteristics`	enum	Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values.
`transfer_characteristics`	integer	Integer form, using the ISO/IEC 23008-2 enumeration.
`matrix_coefficients`	enum	Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values.
`matrix_coefficients`	integer	Integer form, using the ISO/IEC 23008-2 enumeration.
`video_range`	enum	Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white).
`reference_black`	number	Specifies the normalized signal value that represents 0% reflectivity.
`reference_white`	number	Specifies the normalized signal value that represents 100% reflectivity.
`dynamic_range`	number	Specifies the dynamic range compression `ratio`.
`dynamic_range`	object	Specifies the dynamic range compression `black` and `white` levels.

`clean_aperture`

A rectangle that identifies clean image dimensions.

Property	Type	Description
`top`	integer	Inset from the top of the image in pixels.
`bottom`	integer	Inset from the bottom of the image in pixels.
`left`	integer	Inset from the left edge of the image in pixels.
`right`	integer	Inset from the right edge of the image in pixels.
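The insets reduce the stored image to its clean dimensions, as in this minimal sketch. The function name and the 1920x1088 example frame are hypothetical.

```python
def clean_dimensions(width: int, height: int, top: int = 0, bottom: int = 0,
                     left: int = 0, right: int = 0) -> tuple[int, int]:
    """Return the (width, height) of the clean aperture given the edge insets."""
    clean_width = width - left - right
    clean_height = height - top - bottom
    if clean_width <= 0 or clean_height <= 0:
        raise ValueError("insets leave no visible image")
    return clean_width, clean_height


# Hypothetical 1920x1088 coded frame with 8 rows of padding at the bottom.
print(clean_dimensions(1920, 1088, bottom=8))  # (1920, 1080)
```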

`pixel_aspect_ratio`

Pixel Aspect Ratio (PAR) of the displayed image as a rational number.

Property	Type	Description
`numerator`	integer	The pixel width.
`denominator`	integer	The pixel height.
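The display aspect ratio follows from the stored dimensions and the PAR. This sketch uses hypothetical helper names. The example assumes a 1440x1080 frame with a 4/3 PAR, as used by HDV.

```python
from fractions import Fraction


def display_aspect_ratio(width: int, height: int, par_num: int, par_den: int) -> Fraction:
    """Display aspect ratio = (width * PAR) / height."""
    return Fraction(width * par_num, height * par_den)


def display_width(width: int, par_num: int, par_den: int) -> int:
    """Equivalent width in square pixels, rounded to the nearest integer."""
    return round(Fraction(width * par_num, par_den))


print(display_aspect_ratio(1440, 1080, 4, 3))  # 16/9
print(display_width(1440, 4, 3))               # 1920
```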

`orientation`

Indicates the orientation of the stored image.

Property	Type	Description
`rotation`	integer	Counterclockwise image rotation in degrees (0, 90, 180 or 270).
`mirrored`	boolean	Indicates whether the image is mirrored.
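Here is a sketch of how `rotation` affects the displayed image. It assumes the rotation is applied at display time, with pixel coordinates measured from the top-left corner and y pointing down. Both function names are hypothetical.

```python
def displayed_size(width: int, height: int, rotation: int) -> tuple[int, int]:
    """Size of the image after applying a rotation of 0, 90, 180 or 270 degrees."""
    if rotation not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation: {rotation}")
    # Quarter turns swap the axes; mirroring never changes the size.
    return (height, width) if rotation in (90, 270) else (width, height)


def rotate_point_ccw90(x: int, y: int, width: int) -> tuple[int, int]:
    """Map a stored pixel (x, y) to its position after a 90 degree CCW rotation."""
    # The stored top-right corner moves to the top-left, top-left to bottom-left.
    return y, width - 1 - x


print(displayed_size(1920, 1080, 90))  # (1080, 1920)
```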

`field_order`

Indicates whether the frame is progressive or interlaced (upper or lower field first).

Enum	Value	Description
`unknown`	0
`progressive`	1	Progressive frame
`upper`	2	Interlaced upper field first
`top`	2	Same as `upper`.
`lower`	3	Interlaced lower field first
`bottom`	3	Same as `lower`.

`chroma_subsampling`

Subsampling of the chroma components horizontally and vertically. The human visual system is less sensitive to variations in color (chrominance) than in brightness (luminance). Chroma subsampling takes advantage of this difference to reduce the data rate of a video stream.

Property	Type	Description
`horizontal`	integer	Subsampling on the horizontal axis (1, 2 or 4).
`vertical`	integer	Subsampling on the vertical axis (1, 2 or 4).

Chroma subsampling is commonly expressed as a three-part ratio J:a:b that describes the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high:

J: horizontal sampling reference (usually 4).
a: number of chroma samples (Cr, Cb) in the first row of J pixels.
b: number of additional chroma samples in the second row (either 0 or a).

The following table describes the common chroma subsampling schemes:

Subsampling	Horizontal	Vertical
4:4:4	1	1
4:4:0	1	2
4:2:2	2	1
4:2:0	2	2
4:1:1	4	1
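The J:a:b notation maps to the `horizontal` and `vertical` factors in the table. This minimal sketch derives them and computes the chroma plane size. The function names are hypothetical. Rounding odd luma dimensions up is an assumed convention, not something this specification states.

```python
def subsampling_factors(j: int, a: int, b: int) -> tuple[int, int]:
    """Map a J:a:b ratio to (horizontal, vertical) subsampling factors."""
    if a <= 0 or j % a != 0:
        raise ValueError("a must be a positive divisor of J")
    if b not in (0, a):
        raise ValueError("b must be 0 or equal to a")
    horizontal = j // a
    # b == a: the second row has its own chroma samples (no vertical subsampling).
    # b == 0: the second row has no chroma samples (vertical factor 2).
    vertical = 1 if b == a else 2
    return horizontal, vertical


def chroma_plane_size(width: int, height: int, horizontal: int, vertical: int) -> tuple[int, int]:
    """Chroma plane dimensions, rounding up for odd luma dimensions (assumed)."""
    return -(-width // horizontal), -(-height // vertical)


print(subsampling_factors(4, 2, 0))          # (2, 2)
print(chroma_plane_size(1920, 1080, 2, 2))   # (960, 540)
```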

`chroma_location`

Location of the chroma samples relative to the luminance samples.

Enum	Value	Description
unknown	0
cosited	1	Chroma samples are co-sited with the luminance samples on each line (MPEG-2 4:2:2).
interstitial	2	Chroma samples are sited horizontally midway between luminance samples and vertically midway between adjacent lines (MPEG-1 4:2:0).
quincunx	2	Same as `interstitial`.
vertical_midpoint	3	Chroma samples are sited vertically midway between the luminance samples in each column (MPEG-2 4:2:0).
horizontally_cosited	3	Same as `vertical_midpoint`.
horizontal_midpoint	4	Chroma samples are sited horizontally midway between luminance samples on each line.
vertically_cosited	4	Same as `horizontal_midpoint`.
line_alternating	5	Chroma samples are co-sited horizontally. Vertically, the Cr and Cb samples are co-sited on alternating pairs of lines (DV 4:2:0).
cosited_out_of_phase	5	Same as `line_alternating`.
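Several names share a value, so they behave as aliases. In Python, an `IntEnum` with duplicate values models this directly, because the second name becomes an alias of the first. This is a sketch only. The class and parser names are hypothetical.

```python
from enum import IntEnum
from typing import Union


class ChromaLocation(IntEnum):
    # Duplicate values make the second name an alias of the first.
    UNKNOWN = 0
    COSITED = 1
    INTERSTITIAL = 2
    QUINCUNX = 2
    VERTICAL_MIDPOINT = 3
    HORIZONTALLY_COSITED = 3
    HORIZONTAL_MIDPOINT = 4
    VERTICALLY_COSITED = 4
    LINE_ALTERNATING = 5
    COSITED_OUT_OF_PHASE = 5


def parse_chroma_location(value: Union[str, int]) -> ChromaLocation:
    """Accept either an enum name (any case) or its integer value."""
    if isinstance(value, str):
        return ChromaLocation[value.upper()]  # KeyError for unknown names
    return ChromaLocation(value)              # ValueError for unknown values


print(parse_chroma_location("quincunx").name)  # INTERSTITIAL
print(parse_chroma_location(3).name)           # VERTICAL_MIDPOINT
```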

`color_primaries`

Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W).

Enum	Value	Description
unknown	0
bt709	1	ITU-R BT.709
unspecified	2
bt470	4	ITU-R BT.470-6 System M
pal	5	ITU-R BT.601 625
ntsc	6	ITU-R BT.601 525
bt2020	9	ITU-R BT.2020
xyz	10	SMPTE ST 428-1 (CIE 1931 XYZ)
p3dci	11	SMPTE RP 431-2 (2011, P3-DCI)
p3d65	12	SMPTE EG 432-1 (2010, P3-D65)
p3d60	131	P3-D60 (ACES Cinema)
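Because `color_primaries` may be given either as an enum name or as an integer, a reader needs to normalize it. This sketch resolves names using the table above. It assumes integers are passed through unchanged, since the ISO/IEC 23008-2 enumeration defines more values than the table lists. The dictionary and function names are hypothetical.

```python
from typing import Union

# Names and values from the color_primaries table above.
COLOR_PRIMARIES = {
    "unknown": 0, "bt709": 1, "unspecified": 2, "bt470": 4, "pal": 5, "ntsc": 6,
    "bt2020": 9, "xyz": 10, "p3dci": 11, "p3d65": 12, "p3d60": 131,
}


def resolve_color_primaries(value: Union[str, int]) -> int:
    """Return the integer code for an enum name, or validate an integer code."""
    if isinstance(value, bool):
        raise TypeError("a boolean is not a color_primaries value")
    if isinstance(value, int):
        if value < 0:
            raise ValueError(f"negative color_primaries value: {value}")
        return value  # assumed: integer codes are passed through unchanged
    if isinstance(value, str):
        try:
            return COLOR_PRIMARIES[value]
        except KeyError:
            raise ValueError(f"unknown color_primaries name: {value!r}") from None
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(resolve_color_primaries("bt2020"))  # 9
```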

`transfer_characteristics`

Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values.

Enum	Value	Description
unknown	0
bt709	1	ITU-R BT.709
unspecified	2
bt601	6	ITU-R BT.601-6
linear	8	Linear
bt2020	14	ITU-R BT.2020
pq	16	SMPTE ST 2084
st428	17	SMPTE ST 428-1
hlg	18	ARIB STD-B67
slog3	130	Sony S-LOG3
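As a concrete example of an OETF, this sketch implements the ITU-R BT.709 curve (`bt709`) and its mathematical inverse. The display-side EOTF for BT.709 content is a different function, defined in ITU-R BT.1886, and is not shown. The function names are hypothetical.

```python
def bt709_oetf(l: float) -> float:
    """ITU-R BT.709 OETF: scene linear light L in [0, 1] -> nonlinear signal V."""
    if l < 0.018:
        return 4.5 * l  # linear segment near black
    return 1.099 * l ** 0.45 - 0.099


def bt709_inverse_oetf(v: float) -> float:
    """Mathematical inverse of bt709_oetf (not the BT.1886 display EOTF)."""
    if v < 4.5 * 0.018:
        return v / 4.5
    return ((v + 0.099) / 1.099) ** (1 / 0.45)
```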

`matrix_coefficients`

Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values.

Enum	Value	Description
identity	0	IEC 61966-2-1 (RGB), SMPTE ST 428-1 (XYZ)
bt709	1	ITU-R BT.709
unspecified	2
pal	5	ITU-R BT.601-6 625
ntsc	6	ITU-R BT.601-6 525
bt2020	9	ITU-R BT.2020 non-constant luminance
bt2020_2	10	ITU-R BT.2020 constant luminance
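The non-constant-luminance entries reduce to a pair of luma weights (Kr, Kb). This sketch converts nonlinear R'G'B' to Y'CbCr and back for `bt709`, `pal`/`ntsc` (both BT.601) and `bt2020`. It deliberately omits `identity` and the constant-luminance `bt2020_2`, which need different handling. The dictionary and function names are hypothetical.

```python
# (Kr, Kb) luma weights for the matrix_coefficients names handled here.
LUMA_WEIGHTS = {
    "bt709": (0.2126, 0.0722),
    "pal": (0.299, 0.114),       # ITU-R BT.601
    "ntsc": (0.299, 0.114),      # ITU-R BT.601
    "bt2020": (0.2627, 0.0593),  # non-constant luminance
}


def rgb_to_ycbcr(r: float, g: float, b: float, matrix: str = "bt709") -> tuple[float, float, float]:
    """Nonlinear R'G'B' in [0, 1] -> Y' in [0, 1], Cb and Cr in [-0.5, 0.5]."""
    kr, kb = LUMA_WEIGHTS[matrix]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float, matrix: str = "bt709") -> tuple[float, float, float]:
    """Inverse of rgb_to_ycbcr."""
    kr, kb = LUMA_WEIGHTS[matrix]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b
```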

`video_range`

Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white).

Enum	Value	Description
unknown	0
narrow	1	64 - 940 (10 bit)
full	2	0 - 1023 (10 bit)
sony	3	512 - 65535 (16 bit)
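This minimal sketch maps integer code values to the normalized range and back, with clipping to the representable codes. It assumes the conventional narrow range of 16 to 235 at 8 bits, scaled by powers of two at higher bit depths (64 to 940 at 10 bits, as in the table). That range applies to luma and R'G'B' components. Narrow-range chroma uses different limits. The `sony` range is not handled because its mapping is not described here. The function names are hypothetical.

```python
def normalize(code: int, bit_depth: int, video_range: str) -> float:
    """Map an integer luma/R'G'B' code to the normalized range (0.0 black, 1.0 peak white)."""
    if video_range == "full":
        return code / ((1 << bit_depth) - 1)
    if video_range == "narrow":
        scale = 1 << (bit_depth - 8)  # 16..235 at 8 bits, 64..940 at 10 bits
        black, white = 16 * scale, 235 * scale
        return (code - black) / (white - black)
    raise ValueError(f"unsupported video_range: {video_range!r}")


def denormalize(value: float, bit_depth: int, video_range: str) -> int:
    """Inverse of normalize(), rounding and clipping to the representable code range."""
    max_code = (1 << bit_depth) - 1
    if video_range == "full":
        code = value * max_code
    elif video_range == "narrow":
        scale = 1 << (bit_depth - 8)
        code = 16 * scale + value * (219 * scale)
    else:
        raise ValueError(f"unsupported video_range: {video_range!r}")
    return min(max(round(code), 0), max_code)


print(normalize(940, 10, "narrow"))   # 1.0
print(denormalize(1.2, 10, "full"))   # 1023 (clipped)
```

Under this mapping, 10-bit narrow-range codes 0 and 1023 normalize to about -0.073 and 1.095, which is where the approximate narrow-range legal range quoted in the `dynamic_range` section comes from.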

`reference_black`

Specifies the normalized component value that represents 0% reflectivity. This value is used to scale the black level between different dynamic range systems.

For example, a reference black level of 0.1 candelas per square meter corresponds to 0.0623 (6.23%) for SMPTE ST 2084 (PQ) and 0.0 for standard dynamic range systems (e.g. BT.709).

Note that this value is a function of the current `transfer_characteristics`; for example, 0.0623 = PQ(0.1).

`reference_white`

Specifies the normalized component value that represents 100% reflectivity. This value is used to scale the white level between different dynamic range systems.

For example, a reference white level of 203 candelas per square meter corresponds to 0.58 (58%) for SMPTE ST 2084 (PQ), 0.75 (75%) for ARIB STD-B67 (HLG) and 1.0 (100%) for standard dynamic range systems (e.g. BT.709).

Note that this value is a function of the current `transfer_characteristics`; for example, 0.58 = PQ(203).
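The PQ figures for reference black and white can be checked with the SMPTE ST 2084 inverse EOTF, which maps absolute luminance in cd/m² (0 to 10000) to a normalized signal. This sketch implements that curve and its inverse with the standard's constants. The function names are hypothetical.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_encode(nits: float) -> float:
    """Inverse EOTF: absolute luminance in cd/m^2 (0..10000) -> normalized PQ signal."""
    y = max(nits, 0.0) / 10000.0
    ym1 = y ** M1
    return ((C1 + C2 * ym1) / (1.0 + C3 * ym1)) ** M2


def pq_decode(signal: float) -> float:
    """EOTF: normalized PQ signal -> absolute luminance in cd/m^2."""
    e = max(signal, 0.0) ** (1.0 / M2)
    y = max(e - C1, 0.0) / (C2 - C3 * e)
    return 10000.0 * y ** (1.0 / M1)


print(round(pq_encode(0.1), 4))  # 0.0623 (reference black)
print(round(pq_encode(203), 2))  # 0.58 (reference white)
```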

`dynamic_range`

Specifies the compression used to convert from a high dynamic range system to a lower (or standard) dynamic range system. Dynamic range compression preserves some or all of the highlights (and low lights) in the original system.

In any dynamic range system a normalized linear light level value of 0.0 represents the minimum black level and 1.0 represents the maximum white level. When the linear light levels are scaled to a lower dynamic range the normalized values may be less than 0 or greater than 1.0.

By default, light levels outside the legal range are clipped when converted to integer signal values (at a specific `bit_depth`). For a full range signal the legal range is [0.0, 1.0]. For a narrow range signal the legal range is approximately [-0.07, 1.09].

Property	Type	Description
`ratio`	number	Specifies the dynamic range compression ratio for the Extended Reinhard tone mapping operator.
`black`	number	Specifies the normalized linear black level. Values below this level are compressed into the legal range.
`white`	number	Specifies the normalized linear white level. Values above this level are compressed into the legal range.
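For reference, the Extended Reinhard operator compresses a normalized linear value L as L(1 + L/Lw²)/(1 + L). This maps the white point Lw to exactly 1.0. The sketch below assumes that `ratio` is used as Lw. That is how a compression ratio would naturally plug into the operator, but this specification does not say so explicitly. The function names are hypothetical.

```python
def extended_reinhard(l: float, l_white: float) -> float:
    """Extended Reinhard operator: maps l_white to 1.0 and compresses values below it."""
    l = max(l, 0.0)  # the operator is applied to non-negative linear light
    return l * (1.0 + l / (l_white * l_white)) / (1.0 + l)


def clip(value: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Default behavior for out-of-range values: clip to the legal range."""
    return min(max(value, lo), hi)


# Assumed: ratio = 4.0, i.e. the source white is 4x the target peak.
print(extended_reinhard(1.0, 4.0))  # 0.53125
```

Values above Lw still map above 1.0, so clipping is still applied after tone mapping.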

`audio`

Represents a stream of audio samples. Audio streams support the following media-specific properties:

Property	Type	Description
`channels`	integer	Number of audio channels in the stream.
`label`	array	Identifies the speaker label for each audio channel. A value of zero indicates the channel label is unspecified.
`program`	array	Identifies the program number for each audio channel. A value of zero indicates the program number is unspecified.
`coding`	array	Identifies the coding of each audio channel. A value of zero indicates the channel contains PCM audio.
`content`	array	Identifies the audio content of each channel. A value of zero indicates the audio content is unknown.

`label`

Value	Speaker	Label	Speaker	Label	Speaker	Label
1	Left	`L`	Front Left	`FL`
2	Right	`R`	Front Right	`FR`
3	Center	`C`	Front Center	`FC`
4	Low Frequency Enhancement	`LFE`
5	Right Surround	`Rs`	Back Right	`BR`
6	Left Surround	`Ls`	Back Left	`BL`
7	Left Center	`Lc`	Front Left Center	`FLC`
8	Right Center	`Rc`	Front Right Center	`FRC`
9	Center Surround	`Cs`	Back Center	`BC`
10	Left Surround Direct	`Lsd`	Side Left	`SL`
11	Right Surround Direct	`Rsd`	Side Right	`SR`
12	Top Center Surround	`Ts`	Top Center	`TC`
13	Left Top Front	`Ltf`	Top Front Left	`TFL`
14	Center Top Front	`Ctf`	Top Front Center	`TFC`
15	Right Top Front	`Rtf`	Top Front Right	`TFR`
16	Left Top Rear	`Ltr`	Top Rear Left	`Trl`	Top Back Left	`TBL`
17	Center Top Rear	`Ctr`	Top Rear Center	`Trc`	Top Back Center	`TBC`
18	Right Top Rear	`Rtr`	Top Rear Right	`Trr`	Top Back Right	`TBR`
19	Left Top Side	`Lts`	Top Side Left	`TSL`
20	Right Top Side	`Rts`	Top Side Right	`TSR`
33	Left Total	`Lt`
34	Right Total	`Rt`
35	Mono	`M`
65	Left Wide	`Lw`
66	Right Wide	`Rw`
68	Low Frequency Enhancement	`LFE2`
70	Left Rear Surround	`Lrs`
71	Right Rear Surround	`Rrs`

`program`

The program number differentiates channels that have the same speaker label. For example, the following table illustrates some common multi-program channel assignments from SMPTE ST 2035:

Channel	9f	11c
1	L1	L1
2	R1	R1
3	M2	C1
4	M3	LFE1
5		Ls1
6		Rs1
7		L2
8		R2
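Together, the `label` and `program` arrays address each channel by a (label, program) pair. This sketch builds that lookup for the "11c" column above, using the numeric values from the `label` table (`L` = 1, `R` = 2, `C` = 3, `LFE` = 4, `Rs` = 5, `Ls` = 6). Skipping channels with an unspecified (zero) label or program is an assumption of this sketch. The function name is hypothetical.

```python
def channel_map(labels: list[int], programs: list[int]) -> dict[tuple[int, int], int]:
    """Map each (label, program) pair to its 1-based channel number."""
    if len(labels) != len(programs):
        raise ValueError("label and program arrays must have one entry per channel")
    mapping: dict[tuple[int, int], int] = {}
    for channel, key in enumerate(zip(labels, programs), start=1):
        if 0 in key:
            continue  # an unspecified label or program cannot be addressed uniquely
        if key in mapping:
            raise ValueError(f"label/program pair {key} repeated on channel {channel}")
        mapping[key] = channel
    return mapping


# "11c" layout: L1 R1 C1 LFE1 Ls1 Rs1 L2 R2.
L, R, C, LFE, RS, LS = 1, 2, 3, 4, 5, 6
labels = [L, R, C, LFE, LS, RS, L, R]
programs = [1, 1, 1, 1, 1, 1, 2, 2]
channels = channel_map(labels, programs)
print(channels[(L, 2)])   # 7
print(channels[(LS, 1)])  # 5
```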

`coding`

For audio streams with a PCM format, the `coding` property indicates whether a channel contains non-audio data. The following channel coding formats are supported:

Format	Description
`ac-3`	Dolby Digital
`ec-3`	Dolby Digital Plus, Dolby Atmos
`dlbe`	Dolby E

`content`

The `content` property is a four-character code that identifies the audio content of each channel. An audio program consists of multiple audio elements (dialog, music, effects, etc.). Each element may span one or more channels. The audio elements listed below are defined in EBU R 123 and SMPTE ST 2035:

Enum	Value	Description
commentary	1	A speech element that is combined with an internal sound element to form a complete mix.
complete_mix	2	A mix consisting of all the elements required to form a standalone audio program.
dialog	3	The primary speech element of a program.
effects	4	Sound effects.
hearing_impaired	5	A mix of the program prepared for the hearing impaired.
international_sound	6	A mix consisting of all elements required to form a program except for the commentary element. International sound is usually defined as including all "on screen" sound elements and excluding any "off screen" commentary.
music	7	Music sound track.
clean_effects	8	A mix consisting of all elements required to form a program except for the dialog element(s).
secondary_audio	9	An alternate mix of the program typically containing a second language.
visually_impaired	10	A complete mix of the program including a narrative description of the video, or verbal description of the visual scene.

`subtitle`

Represents a stream of subtitle samples.