Media Streams
All media streams support the following properties:
Property | Type | Description |
---|---|---|
name | string | Uniquely identifies the stream. |
format | string | Four character code that identifies the media sample format. |
language | string | ISO 639 alpha-2 (e.g. en) or alpha-3 (e.g. eng) code that identifies the language. |
region | string | ISO 3166 alpha-2 (e.g. US) or alpha-3 (e.g. USA) code that identifies the country or region. |
sample_rate | object | Media sample rate specified as a rational number (e.g. 30000/1001, 48000/1024). |
duration | integer | Duration of the stream in ticks. |
offset | integer | Offset from the start of the stream in ticks. |
bit_rate | integer | Average bit rate of the media stream. |
optional | boolean | Indicates whether a composition input stream is optional. If the input stream is not present then all output stream references are removed from the composition. |
properties | object | Stream specific properties. |
extension | array | Lists the stream property extensions. |
sample | array | Lists the media samples in the stream. |
sample_rate
Property | Type | Description |
---|---|---|
numerator | integer | Number of ticks per second. For video streams this is the frame rate (e.g. 25) or a multiple of the frame rate (e.g. 30000). For audio streams this is the audio sample rate in Hz (e.g. 48000). |
denominator | integer | Nominal duration of a media sample in ticks (default is 1). |
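As a rough illustration of how the rational sample_rate relates ticks to time, the sketch below uses Python's `fractions.Fraction`. The helper names `samples_per_second` and `ticks_to_seconds` are hypothetical and not part of this API, and the sketch assumes that duration and offset are counted in the same ticks as numerator.

```python
from fractions import Fraction

def samples_per_second(numerator, denominator=1):
    """Media samples per second: ticks per second / ticks per sample."""
    return Fraction(numerator, denominator)

def ticks_to_seconds(ticks, numerator):
    """Convert a tick count (assumed: duration or offset) to seconds."""
    return Fraction(ticks, numerator)

# Video at 30000/1001: 30000 ticks per second, 1001 ticks per frame.
print(float(samples_per_second(30000, 1001)))  # roughly 29.97 frames per second

# Audio at 48000/1024: 48000 Hz, 1024 ticks per media sample.
print(float(samples_per_second(48000, 1024)))  # 46.875

# 300 frames of 1001 ticks each, at 30000 ticks per second.
print(float(ticks_to_seconds(300 * 1001, 30000)))  # 10.01
```

Keeping the rate as an exact fraction avoids the rounding drift that accumulates when 30000/1001 is stored as a floating point frame rate.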
video
Represents a stream of video samples. Video streams support the following additional properties:
Property | Type | Description |
---|---|---|
width | integer | Stored image width in pixels. |
height | integer | Stored image height in pixels. |
clean_aperture | object | A rectangle that identifies clean image dimensions. |
pixel_aspect_ratio | object | Pixel Aspect Ratio (PAR) of the displayed image as a rational number. |
orientation | object | Indicates the orientation of the stored image. |
field_order | enum | Indicates whether the frame is progressive or interlaced (upper or lower field first). |
bit_depth | integer | Number of bits per color component. |
chroma_subsampling | object | Subsampling of the chroma components. |
chroma_location | enum | Location of the chroma samples relative to the luminance samples. |
color_primaries | enum | Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W). |
color_primaries | integer | ISO 23008-2 enumeration. |
transfer_characteristics | enum | Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values. |
transfer_characteristics | integer | ISO 23008-2 enumeration. |
matrix_coefficients | enum | Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values. |
matrix_coefficients | integer | ISO 23008-2 enumeration. |
video_range | enum | Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white). |
reference_black | number | Specifies the normalized signal value that represents 0% reflectivity. |
reference_white | number | Specifies the normalized signal value that represents 100% reflectivity. |
dynamic_range | number | Specifies the dynamic range compression ratio. |
dynamic_range | object | Specifies the dynamic range compression black and white levels. |
clean_aperture
A rectangle that identifies clean image dimensions.
Property | Type | Description |
---|---|---|
top | integer | Inset from the top of the image in pixels. |
bottom | integer | Inset from the bottom of the image in pixels. |
left | integer | Inset from the left edge of the image in pixels. |
right | integer | Inset from the right edge of the image in pixels. |
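The insets can be turned into clean image dimensions as in the following sketch. It assumes each inset is measured inward from the corresponding edge of the stored image; `clean_dimensions` is a hypothetical helper, and the 720x486 frame is an illustrative value rather than one taken from this reference.

```python
def clean_dimensions(width, height, top=0, bottom=0, left=0, right=0):
    """Return (clean_width, clean_height) after removing the four insets."""
    clean_width = width - left - right
    clean_height = height - top - bottom
    if clean_width <= 0 or clean_height <= 0:
        raise ValueError("insets exceed the stored image size")
    return clean_width, clean_height

# Illustrative stored image of 720x486 with 8 pixels trimmed on each side
# and 3 lines trimmed at the top and bottom.
print(clean_dimensions(720, 486, top=3, bottom=3, left=8, right=8))  # (704, 480)
```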
pixel_aspect_ratio
Pixel Aspect Ratio (PAR) of the displayed image as a rational number.
Property | Type | Description |
---|---|---|
numerator | integer | The pixel width. |
denominator | integer | The pixel height. |
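A minimal sketch of how the pixel aspect ratio combines with the image dimensions to give a display aspect ratio. The function name is hypothetical, and whether the stored or the clean dimensions should be used is an assumption left to the caller.

```python
from fractions import Fraction

def display_aspect_ratio(width, height, par_numerator, par_denominator):
    """Display aspect ratio = (width * pixel width) / (height * pixel height)."""
    return Fraction(width * par_numerator, height * par_denominator)

# Anamorphic 1440x1080 image with a 4:3 pixel aspect ratio.
print(display_aspect_ratio(1440, 1080, 4, 3))  # 16/9

# Square pixels leave the stored ratio unchanged.
print(display_aspect_ratio(1920, 1080, 1, 1))  # 16/9
```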
orientation
Indicates the orientation of the stored image.
Property | Type | Description |
---|---|---|
rotation | integer | Image rotation in degrees (0, 90, 180 or 270 CCW). |
mirrored | boolean | Indicates whether the image is mirrored. |
field_order
Indicates whether the frame is progressive or interlaced (upper or lower field first).
Enum | Value | Description |
---|---|---|
unknown | 0 | |
progressive | 1 | Progressive frame |
upper | 2 | Interlaced upper field first |
top | 2 | Interlaced upper field first |
lower | 3 | Interlaced lower field first |
bottom | 4 | Interlaced lower field first |
chroma_subsampling
Subsampling of the chroma components horizontally and vertically. The human visual system is less sensitive to variations in color (chrominance) than in brightness (luminance). Chroma subsampling takes advantage of this difference to reduce the data rate of a video stream.
Property | Type | Description |
---|---|---|
horizontal | integer | Subsampling on the horizontal axis (1, 2 or 4). |
vertical | integer | Subsampling on the vertical axis (1, 2 or 4). |
Chroma subsampling is commonly expressed as a three-part ratio J:a:b that describes the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high:
- J: horizontal sampling reference (usually 4).
- a: number of chroma samples (Cr, Cb) in the first row of J pixels.
- b: number of additional chroma samples in the second row (either 0 or a).
The following table describes the common chroma subsampling schemes:
Subsampling | Horizontal | Vertical |
---|---|---|
4:4:4 | 1 | 1 |
4:4:0 | 1 | 2 |
4:2:2 | 2 | 1 |
4:2:0 | 2 | 2 |
4:1:1 | 4 | 1 |
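The relationship between the J:a:b notation and the horizontal and vertical properties can be sketched as follows. The helper names are hypothetical, and rounding the chroma plane size up for odd image dimensions is an assumption (a common convention, not something this reference specifies).

```python
def subsampling_factors(j, a, b):
    """Convert a J:a:b ratio into (horizontal, vertical) subsampling factors."""
    if a <= 0 or j % a != 0:
        raise ValueError("a must be a positive divisor of J")
    if b not in (0, a):
        raise ValueError("b must be either 0 or a")
    horizontal = j // a              # J luma samples share a chroma samples per row
    vertical = 2 if b == 0 else 1    # b == 0: the second row has no chroma of its own
    return horizontal, vertical

def chroma_plane_size(width, height, horizontal, vertical):
    """Chroma plane dimensions, rounded up (assumed convention)."""
    return -(-width // horizontal), -(-height // vertical)

print(subsampling_factors(4, 2, 0))            # (2, 2)
print(chroma_plane_size(1920, 1080, 2, 2))     # (960, 540)
```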
chroma_location
Location of the chroma samples relative to the luminance samples.
Enum | Value | Description |
---|---|---|
unknown | 0 | |
cosited | 1 | Chroma samples are co-sited with the luminance samples on each line (MPEG-2 4:2:2). |
interstitial | 2 | Chroma samples are sited horizontally midway between luminance samples and midway between adjacent lines (MPEG-1 4:2:0). |
quincunx | 2 | Same as interstitial. |
vertical_midpoint | 3 | Chroma samples are sited vertically midway between the luminance samples in each column (MPEG-2 4:2:0). |
horizontally_cosited | 3 | Same as vertical_midpoint. |
horizontal_midpoint | 4 | Chroma samples are sited horizontally midway between luminance samples on each line. |
vertically_cosited | 4 | Same as horizontal_midpoint. |
line_alternating | 5 | Chroma samples are co-sited horizontally. Vertically the Cr and Cb samples are co-sited on alternating pairs of lines (DV 4:2:0). |
cosited_out_of_phase | 5 | Same as line_alternating. |
color_primaries
Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W).
Enum | Value | Description |
---|---|---|
unknown | 0 | |
bt709 | 1 | ITU-R BT.709 |
unspecified | 2 | |
bt470 | 4 | ITU-R BT.470-6 System M |
pal | 5 | ITU-R BT.601 625 |
ntsc | 6 | ITU-R BT.601 525 |
bt2020 | 9 | ITU-R BT.2020 |
xyz | 10 | SMPTE ST 428-1 (CIE 1931 XYZ) |
p3dci | 11 | SMPTE RP 431-2 (2011, P3-DCI) |
p3d65 | 12 | SMPTE EG 432-1 (2010, P3-D65) |
p3d60 | 131 | P3-D60 (ACES Cinema) |
transfer_characteristics
Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values.
Enum | Value | Description |
---|---|---|
unknown | 0 | |
bt709 | 1 | ITU-R BT.709 |
unspecified | 2 | |
bt601 | 6 | ITU-R BT.601-6 |
linear | 8 | Linear |
bt2020 | 14 | ITU-R BT.2020 |
pq | 16 | SMPTE ST 2084 |
st428 | 17 | SMPTE ST 428-1 |
hlg | 18 | ARIB STD-B67 |
slog3 | 130 | Sony S-LOG3 |
matrix_coefficients
Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values.
Enum | Value | Description |
---|---|---|
identity | 0 | IEC 61966-2-1 (RGB), SMPTE ST 428-1 (XYZ) |
bt709 | 1 | ITU-R BT.709 |
unspecified | 2 | |
pal | 5 | ITU-R BT.601-6 625 |
ntsc | 6 | ITU-R BT.601-6 525 |
bt2020 | 9 | ITU-R BT.2020 non-constant luminance |
bt2020_2 | 10 | ITU-R BT.2020 constant luminance |
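Because color_primaries, transfer_characteristics and matrix_coefficients each accept either an enum name or an integer code, a client may want to normalize them to integers. The sketch below copies the values from the three tables above. The `to_code` helper is hypothetical, and passing unlisted integers through unchanged is an assumption made because the integer form refers to the wider ISO 23008-2 enumeration.

```python
COLOR_PRIMARIES = {
    "unknown": 0, "bt709": 1, "unspecified": 2, "bt470": 4, "pal": 5,
    "ntsc": 6, "bt2020": 9, "xyz": 10, "p3dci": 11, "p3d65": 12, "p3d60": 131,
}
TRANSFER_CHARACTERISTICS = {
    "unknown": 0, "bt709": 1, "unspecified": 2, "bt601": 6, "linear": 8,
    "bt2020": 14, "pq": 16, "st428": 17, "hlg": 18, "slog3": 130,
}
MATRIX_COEFFICIENTS = {
    "identity": 0, "bt709": 1, "unspecified": 2, "pal": 5, "ntsc": 6,
    "bt2020": 9, "bt2020_2": 10,
}

def to_code(table, value):
    """Return the integer code for an enum name; integers pass through as-is."""
    if isinstance(value, int):
        return value
    try:
        return table[value]
    except KeyError:
        raise ValueError(f"unknown enum name: {value!r}") from None

# Illustrative PQ stream with BT.2020 primaries and non-constant luminance.
print((
    to_code(COLOR_PRIMARIES, "bt2020"),
    to_code(TRANSFER_CHARACTERISTICS, "pq"),
    to_code(MATRIX_COEFFICIENTS, "bt2020"),
))  # (9, 16, 9)
```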
video_range
Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white).
Enum | Value | Description |
---|---|---|
unknown | 0 | |
narrow | 1 | 64 - 940 (10 bit) |
full | 2 | 0 - 1023 (10 bit) |
sony | 3 | 512 - 65535 (16 bit) |
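To make the mapping between signal values and the normalized range concrete, the following sketch normalizes 10 bit code values using the black and white levels listed in the table. The `normalize` helper and the `LEVELS_10_BIT` name are hypothetical, and the sketch only covers the 10 bit rows.

```python
# (black level, peak white level) for 10 bit signals, taken from the table above.
LEVELS_10_BIT = {
    "narrow": (64, 940),
    "full": (0, 1023),
}

def normalize(code_value, video_range):
    """Map a 10 bit code value to the normalized range where black = 0.0, white = 1.0."""
    black, white = LEVELS_10_BIT[video_range]
    return (code_value - black) / (white - black)

print(normalize(64, "narrow"))    # 0.0
print(normalize(940, "narrow"))   # 1.0
print(normalize(1023, "full"))    # 1.0

# Narrow range code values below 64 or above 940 fall outside [0.0, 1.0].
print(round(normalize(0, "narrow"), 2), round(normalize(1023, "narrow"), 2))  # -0.07 1.09
```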
reference_black
Specifies the normalized component value that represents 0% reflectivity. This value is used to scale the black level between different dynamic range systems.
For example, the reference black level of 0.1 candelas per square meter is 0.0623 (6.23%) for SMPTE ST 2084 (PQ) and 0.0 for standard dynamic range systems (e.g. BT.709).
Note that this value is a function of the current transfer_characteristics, for example, 0.0623 = PQ(0.1).
reference_white
Specifies the normalized component value that represents 100% reflectivity. This value is used to scale the white level between different dynamic range systems.
For example, the reference white level of 203 candelas per square meter is 0.58 (58%) for SMPTE ST 2084 (PQ), 0.75 (75%) for ARIB STD-B67 (HLG) and 1.0 (100%) for standard dynamic range systems (e.g. BT.709).
Note that this value is a function of the current transfer_characteristics, for example, 0.58 = PQ(203).
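The PQ figures quoted for reference_black and reference_white can be reproduced with the SMPTE ST 2084 inverse EOTF, which maps an absolute luminance in candelas per square meter to a normalized signal value. The sketch below is a direct transcription of that published formula; the function name `pq_inverse_eotf` is hypothetical, and the HLG figure is not covered because it depends on additional display parameters.

```python
# Constants defined by SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_inverse_eotf(nits):
    """Normalized PQ signal value for a luminance in cd/m^2 (peak is 10000)."""
    y = max(nits, 0.0) / 10000.0
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2

print(f"{pq_inverse_eotf(0.1):.4f}")   # 0.0623 (reference black)
print(f"{pq_inverse_eotf(203):.2f}")   # 0.58 (reference white)
```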
dynamic_range
Specifies the compression used to convert from a high dynamic range system to a lower (or standard) dynamic range system. Dynamic range compression preserves some or all of the highlights (and low lights) in the original system.
In any dynamic range system a normalized linear light level value of 0.0 represents the minimum black level and 1.0 represents the maximum white level. When the linear light levels are scaled to a lower dynamic range the normalized values may be less than 0 or greater than 1.0.
By default, light levels outside the legal range are clipped when converted to integer signal values (at a specific bit_depth). For a full range signal the legal range is [0.0, 1.0]. For a narrow range signal the legal range is approximately [-0.07, 1.09].
Property | Type | Description |
---|---|---|
ratio | number | Specifies the dynamic range compression ratio for the Extended Reinhard tone mapping operator. |
black | number | Specifies the normalized linear black level. Values below this level are compressed into the legal range. |
white | number | Specifies the normalized linear white level. Values above this level are compressed into the legal range. |
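For orientation, the textbook form of the Extended Reinhard operator is sketched below. This reference does not state how ratio, black and white map onto the operator's parameters, so the `white_point` argument (the input level that should map to exactly 1.0) is a hypothetical stand-in and the function is not a description of the service's actual implementation.

```python
def extended_reinhard(level, white_point):
    """Textbook Extended Reinhard curve: 0 maps to 0 and white_point maps to 1.0."""
    if white_point <= 0:
        raise ValueError("white_point must be positive")
    return level * (1.0 + level / (white_point * white_point)) / (1.0 + level)

# Compress linear light levels in [0, 4] into [0, 1].
for level in (0.0, 1.0, 4.0):
    print(level, extended_reinhard(level, white_point=4.0))
```

Levels well below the white point are compressed gently, while levels near it are rolled off so that highlights are preserved instead of clipped.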
audio
Represents a stream of audio samples. Audio streams support the following media specific properties:
Property | Type | Description |
---|---|---|
channels | integer | Number of audio channels in the stream. |
label | array | Identifies the speaker label for each audio channel. A value of zero indicates the channel label is unspecified. |
program | array | Identifies the program number for each audio channel. A value of zero indicates the program number is unspecified. |
coding | array | Identifies the coding of each audio channel. A value of zero indicates the channel contains PCM audio. |
content | array | Identifies the audio content of each channel. A value of zero indicates the audio content is unknown. |
label
Value | Speaker | Label | Speaker | Label | Speaker | Label |
---|---|---|---|---|---|---|
1 | Left | L | Front Left | FL | ||
2 | Right | R | Front Right | FR | ||
3 | Center | C | Front Center | FC | ||
4 | Low Frequency Enhancement | LFE | ||||
5 | Right Surround | Rs | Back Right | BR | ||
6 | Left Surround | Ls | Back Left | BL | ||
7 | Left Center | Lc | Front Left Center | FLC | ||
8 | Right Center | Rc | Front Right Center | FRC | ||
9 | Center Surround | Cs | Back Center | BC | ||
10 | Left Surround Direct | Lsd | Side Left | SL | ||
11 | Right Surround Direct | Rsd | Side Right | SR | ||
12 | Top Center Surround | Ts | Top Center | TC | ||
13 | Left Top Front | Ltf | Top Front Left | TFL | ||
14 | Center Top Front | Ctf | Top Front Center | TFC | ||
15 | Right Top Front | Rtf | Top Front Right | TFR | ||
16 | Left Top Rear | Ltr | Top Rear Left | Trl | Top Back Left | TBL |
17 | Center Top Rear | Ctr | Top Rear Center | Trc | Top Back Center | TBC |
18 | Right Top Rear | Rtr | Top Rear Right | Trr | Top Back Right | TBR |
19 | Left Top Side | Lts | Top Side Left | TSL | ||
20 | Right Top Side | Rts | Top Side Right | TSR | ||
33 | Left Total | Lt | ||||
34 | Right Total | Rt | ||||
35 | Mono | M | ||||
65 | Left Wide | Lw | ||||
66 | Right Wide | Rw | ||||
68 | Low Frequency Enhancement | LFE2 | ||||
70 | Left Rear Surround | Lrs | ||||
71 | Right Rear Surround | Rrs | ||||
program
The program number differentiates channels having the same speaker label. For example, the following table illustrates some common multi-program channel assignments from SMPTE ST 2035:
Channel | 9f | 11c |
---|---|---|
1 | L1 | L1 |
2 | R1 | R1 |
3 | M2 | C1 |
4 | M3 | LFE1 |
5 | Ls1 | |
6 | Rs1 | |
7 | L2 | |
8 | R2 |
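The sketch below shows one way the label and program arrays might be derived from assignments such as those in the table. It assumes that the trailing digit of each entry is the program number and that the leading letters are the speaker label from the label table; the `channel_arrays` helper is hypothetical, and only the first four channels of each column are used.

```python
# Subset of the speaker label values from the label table.
SPEAKER_LABEL = {"L": 1, "R": 2, "C": 3, "LFE": 4, "Rs": 5, "Ls": 6, "M": 35}

def channel_arrays(assignments):
    """Split entries such as 'M2' into parallel label and program arrays."""
    labels, programs = [], []
    for entry in assignments:
        name = entry.rstrip("0123456789")     # speaker label, e.g. 'M'
        number = entry[len(name):]            # program number, e.g. '2'
        labels.append(SPEAKER_LABEL[name])
        programs.append(int(number) if number else 0)  # 0 = unspecified
    return labels, programs

# First four channels of the 9f column: one stereo pair plus two mono programs.
print(channel_arrays(["L1", "R1", "M2", "M3"]))     # ([1, 2, 35, 35], [1, 1, 2, 3])

# First four channels of the 11c column.
print(channel_arrays(["L1", "R1", "C1", "LFE1"]))   # ([1, 2, 3, 4], [1, 1, 1, 1])
```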
coding
For audio streams with a PCM format the coding property indicates whether a channel contains non-audio data. The following channel coding formats are supported:
Format | Description |
---|---|
ac-3 | Dolby Digital |
ec-3 | Dolby Digital Plus, Dolby Atmos |
dlbe | Dolby E |
content
The content property is a four character code that identifies the audio content for each channel. An audio program consists of multiple audio elements (dialog, music, effects, etc). Each element may span one or more channels. The audio elements listed below are defined in EBU R123 and SMPTE ST 2035:
Enum | Value | Description |
---|---|---|
commentary | 1 | A speech element that is combined with an international sound element to form a complete mix. |
complete_mix | 2 | A mix consisting of all the elements required to form a standalone audio program. |
dialog | 3 | The primary speech element of a program. |
effects | 4 | Sound effects. |
hearing_impaired | 5 | A mix of the program prepared for the hearing impaired. |
international_sound | 6 | A mix consisting of all elements required to form a program except for the commentary element. International sound is usually defined as including all "on screen" sound elements and excluding any "off screen" commentary. |
music | 7 | Music sound track. |
clean_effects | 8 | A mix consisting of all elements required to form a program except for the dialog element(s). |
secondary_audio | 9 | An alternate mix of the program typically containing a second language. |
visually_impaired | 10 | A complete mix of the program including a narrative description of the video, or verbal description of the visual scene. |
subtitle
Represents a stream of subtitle samples.