Media Streams

All media streams support the following properties:

| Property | Type | Description |
| --- | --- | --- |
| name | string | Uniquely identifies the stream. |
| format | string | Four character code that identifies the media sample format. |
| language | string | ISO 639 alpha-2 (e.g. en) or alpha-3 (e.g. eng) code that identifies the language. |
| region | string | ISO 3166 alpha-2 (e.g. US) or alpha-3 (e.g. USA) code that identifies the country or region. |
| sample_rate | object | Media sample rate specified as a rational number (e.g. 30000/1001, 48000/1024). |
| duration | integer | Duration of the stream in ticks. |
| offset | integer | Offset from the start of the stream in ticks. |
| bit_rate | integer | Average bit rate of the media stream. |
| optional | boolean | Indicates whether a composition input stream is optional. If the input stream is not present then all output stream references are removed from the composition. |
| properties | object | Stream specific properties. |
| extension | array | Lists the stream property extensions. |
| sample | array | Lists the media samples in the stream. |

sample_rate

| Property | Type | Description |
| --- | --- | --- |
| numerator | integer | Number of ticks per second. For video streams this is the frame rate (e.g. 25) or a multiple of the frame rate (e.g. 30000). For audio streams this is the audio sample rate in Hz (e.g. 48000). |
| denominator | integer | Nominal duration of a media sample in ticks (default is 1). |
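
As a rough illustration of how the two fields combine, the sketch below derives the sample (frame) rate and converts a tick count to seconds. The helper names are hypothetical and not part of the format; the dictionary layout simply mirrors the property names above.

```python
from fractions import Fraction

def samples_per_second(sample_rate: dict) -> Fraction:
    """Ticks per second divided by ticks per sample (denominator defaults to 1)."""
    return Fraction(sample_rate["numerator"], sample_rate.get("denominator", 1))

def ticks_to_seconds(ticks: int, sample_rate: dict) -> Fraction:
    """Convert a duration or offset expressed in ticks to seconds."""
    return Fraction(ticks, sample_rate["numerator"])

ntsc = {"numerator": 30000, "denominator": 1001}
print(float(samples_per_second(ntsc)))        # approximately 29.97 frames per second
print(float(ticks_to_seconds(30 * 1001, ntsc)))  # 30 frames last 1.001 seconds
```

Exact rational arithmetic avoids the drift that accumulates when 30000/1001 is rounded to 29.97.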

video

Represents a stream of video samples. Video streams support the following additional properties:

| Property | Type | Description |
| --- | --- | --- |
| width | integer | Stored image width in pixels. |
| height | integer | Stored image height in pixels. |
| clean_aperture | object | A rectangle that identifies clean image dimensions. |
| pixel_aspect_ratio | object | Pixel Aspect Ratio (PAR) of the displayed image as a rational number. |
| orientation | object | Indicates the orientation of the stored image. |
| field_order | enum | Indicates whether the frame is progressive or interlaced (upper or lower field first). |
| bit_depth | integer | Number of bits per color component. |
| chroma_subsampling | object | Subsampling of the chroma components. |
| chroma_location | enum | Location of the chroma samples relative to the luminance samples. |
| color_primaries | enum | Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W). |
| color_primaries | integer | ISO 23008-2 enumeration. |
| transfer_characteristics | enum | Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values. |
| transfer_characteristics | integer | ISO 23008-2 enumeration. |
| matrix_coefficients | enum | Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values. |
| matrix_coefficients | integer | ISO 23008-2 enumeration. |
| video_range | enum | Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white). |
| reference_black | number | Specifies the normalized signal value that represents 0% reflectivity. |
| reference_white | number | Specifies the normalized signal value that represents 100% reflectivity. |
| dynamic_range | number | Specifies the dynamic range compression ratio. |
| dynamic_range | object | Specifies the dynamic range compression black and white levels. |

clean_aperture

A rectangle that identifies clean image dimensions.

| Property | Type | Description |
| --- | --- | --- |
| top | integer | Inset from the top of the image in pixels. |
| bottom | integer | Inset from the bottom of the image in pixels. |
| left | integer | Inset from the left edge of the image in pixels. |
| right | integer | Inset from the right edge of the image in pixels. |
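
A minimal sketch of how the clean image size could be derived from the stored size and the four insets. It assumes the insets are measured inward from the stored image edges; the function name and the example frame are hypothetical.

```python
def clean_dimensions(width: int, height: int, aperture: dict) -> tuple:
    """Return (clean_width, clean_height) after removing the four insets."""
    clean_width = width - aperture.get("left", 0) - aperture.get("right", 0)
    clean_height = height - aperture.get("top", 0) - aperture.get("bottom", 0)
    if clean_width <= 0 or clean_height <= 0:
        raise ValueError("clean aperture insets exceed the stored image size")
    return clean_width, clean_height

# Hypothetical 720x486 stored frame trimmed by 8 pixels on each side and 3 lines top and bottom.
print(clean_dimensions(720, 486, {"top": 3, "bottom": 3, "left": 8, "right": 8}))  # (704, 480)
```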

pixel_aspect_ratio

Pixel Aspect Ratio (PAR) of the displayed image as a rational number.

| Property | Type | Description |
| --- | --- | --- |
| numerator | integer | The pixel width. |
| denominator | integer | The pixel height. |
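
To show what the ratio is used for, the following sketch computes the display aspect ratio and the square-pixel display width of an image. The helper names are hypothetical; the 10/11 value is the common PAR for 4:3 material at 704x480.

```python
from fractions import Fraction

def display_aspect_ratio(width: int, height: int, par: dict) -> Fraction:
    """Storage aspect ratio multiplied by the pixel aspect ratio."""
    return Fraction(width, height) * Fraction(par["numerator"], par["denominator"])

def display_width(width: int, par: dict) -> int:
    """Width in square pixels when the height is left unchanged."""
    return round(width * Fraction(par["numerator"], par["denominator"]))

par = {"numerator": 10, "denominator": 11}
print(display_aspect_ratio(704, 480, par))  # 4/3
print(display_width(704, par))              # 640
```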

orientation

Indicates the orientation of the stored image.

| Property | Type | Description |
| --- | --- | --- |
| rotation | integer | Image rotation in degrees (0, 90, 180 or 270 CCW). |
| mirrored | boolean | Indicates whether the image is mirrored. |

field_order

Indicates whether the frame is progressive or interlaced (upper or lower field first).

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| progressive | 1 | Progressive frame |
| upper | 2 | Interlaced upper field first |
| top | 2 | Interlaced upper field first |
| lower | 3 | Interlaced lower field first |
| bottom | 4 | Interlaced lower field first |

chroma_subsampling

Subsampling of the chroma components horizontally and vertically. The human visual system is less sensitive to variations in color (chrominance) than in brightness (luminance). Chroma subsampling takes advantage of this difference to reduce the data rate of a video stream.

| Property | Type | Description |
| --- | --- | --- |
| horizontal | integer | Subsampling on the horizontal axis (1, 2 or 4). |
| vertical | integer | Subsampling on the vertical axis (1, 2 or 4). |

Chroma subsampling is commonly expressed as a three-part ratio J:a:b that describes the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high:

  • J: horizontal sampling reference (usually 4).
  • a: number of chroma samples (Cr, Cb) in the first row of J pixels.
  • b: number of additional chroma samples in the second row (either 0 or a).

The following table describes the common chroma subsampling schemes:

| Subsampling | Horizontal | Vertical |
| --- | --- | --- |
| 4:4:4 | 1 | 1 |
| 4:4:0 | 1 | 2 |
| 4:2:2 | 2 | 1 |
| 4:2:0 | 2 | 2 |
| 4:1:1 | 4 | 1 |
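
The relationship between the J:a:b notation and the horizontal and vertical factors can be sketched as follows. The function names are hypothetical, and rounding the chroma plane size up for odd dimensions is an assumption rather than something this format specifies.

```python
def factors_from_ratio(j: int, a: int, b: int) -> tuple:
    """Convert a J:a:b ratio to (horizontal, vertical) subsampling factors."""
    if a <= 0 or j % a != 0 or b not in (0, a):
        raise ValueError(f"unsupported subsampling ratio {j}:{a}:{b}")
    horizontal = j // a            # J luma samples share a chroma samples per row
    vertical = 2 if b == 0 else 1  # b == 0 means the second row adds no chroma samples
    return horizontal, vertical

def chroma_plane_size(width: int, height: int, horizontal: int, vertical: int) -> tuple:
    """Chroma plane dimensions, rounded up (assumption) for odd image sizes."""
    return -(-width // horizontal), -(-height // vertical)

print(factors_from_ratio(4, 2, 0))             # (2, 2)
print(chroma_plane_size(1920, 1080, 2, 2))     # (960, 540)
```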

chroma_location

Location of the chroma samples relative to the luminance samples.

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| cosited | 1 | Chroma samples are co-sited with the luminance samples on each line (MPEG-2 4:2:2). |
| interstitial | 2 | Chroma samples are sited horizontally midway between luminance samples and midway between adjacent lines (MPEG-1 4:2:0). |
| quincunx | 2 | Same as interstitial. |
| vertical_midpoint | 3 | Chroma samples are sited vertically midway between the luminance samples in each column (MPEG-2 4:2:0). |
| horizontally_cosited | 3 | Same as vertical_midpoint. |
| horizontal_midpoint | 4 | Chroma samples are sited horizontally midway between luminance samples on each line. |
| vertically_cosited | 4 | Same as horizontal_midpoint. |
| line_alternating | 5 | Chroma samples are co-sited horizontally. Vertically the Cr and Cb samples are co-sited on alternating pairs of lines (DV 4:2:0). |
| cosited_out_of_phase | 5 | Same as line_alternating. |

color_primaries

Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W).

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| bt709 | 1 | ITU-R BT.709 |
| unspecified | 2 | |
| bt470 | 4 | ITU-R BT.470-6 System M |
| pal | 5 | ITU-R BT.601 625 |
| ntsc | 6 | ITU-R BT.601 525 |
| bt2020 | 9 | ITU-R BT.2020 |
| xyz | 10 | SMPTE ST 428-1 (CIE 1931 XYZ) |
| p3dci | 11 | SMPTE RP 431-2 (2011, P3-DCI) |
| p3d65 | 12 | SMPTE EG 432-1 (2010, P3-D65) |
| p3d60 | 131 | P3-D60 (ACES Cinema) |

transfer_characteristics

Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values.

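| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| bt709 | 1 | ITU-R BT.709 |
| unspecified | 2 | |
| bt601 | 6 | ITU-R BT.601-6 |
| linear | 8 | Linear |
| bt2020 | 14 | ITU-R BT.2020 |
| pq | 16 | SMPTE ST 2084 |
| st428 | 17 | SMPTE ST 428-1 |
| hlg | 18 | ARIB STD-B67 |
| slog3 | 130 | Sony S-LOG3 |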

matrix_coefficients

Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values.

| Enum | Value | Description |
| --- | --- | --- |
| identity | 0 | IEC 61966-2-1 (RGB), SMPTE ST 428-1 (XYZ) |
| bt709 | 1 | ITU-R BT.709 |
| unspecified | 2 | |
| pal | 5 | ITU-R BT.601-6 625 |
| ntsc | 6 | ITU-R BT.601-6 525 |
| bt2020 | 9 | ITU-R BT.2020 non-constant luminance |
| bt2020_2 | 10 | ITU-R BT.2020 constant luminance |
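
For illustration, the sketch below shows how a set of matrix coefficients is typically applied to normalized nonlinear R'G'B' values. The luma weights (Kr, Kb) are the published values from BT.709, BT.601 and BT.2020; the lookup table and function name are hypothetical, and the constant luminance variant (bt2020_2) is deliberately left out because it needs a different derivation.

```python
# (Kr, Kb) luma weights keyed by the enum names in the table above.
LUMA_WEIGHTS = {
    "bt709": (0.2126, 0.0722),
    "pal": (0.299, 0.114),
    "ntsc": (0.299, 0.114),
    "bt2020": (0.2627, 0.0593),
}

def rgb_to_ycbcr(r: float, g: float, b: float, matrix: str = "bt709") -> tuple:
    """Convert normalized R'G'B' (0.0 to 1.0) to Y' (0.0 to 1.0) and Cb, Cr (-0.5 to 0.5)."""
    kr, kb = LUMA_WEIGHTS[matrix]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr

# Pure red has a luma equal to Kr and sits at the positive extreme of Cr.
print(rgb_to_ycbcr(1.0, 0.0, 0.0, "bt709"))
```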

video_range

Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white).

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| narrow | 1 | 64 - 940 (10 bit) |
| full | 2 | 0 - 1023 (10 bit) |
| sony | 3 | 512 - 65535 (16 bit) |
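
A rough sketch of how a normalized component value might be mapped to an integer code value for the narrow and full ranges. Scaling the 8-bit limits 16 and 235 by the bit depth reproduces the 64 - 940 figures quoted for 10 bit; the function names are hypothetical and the sony range is omitted because only its 16-bit limits are given here.

```python
def code_range(video_range: str, bit_depth: int) -> tuple:
    """Return the (black, white) code values for a given range and bit depth."""
    if video_range == "narrow":
        shift = bit_depth - 8
        return 16 << shift, 235 << shift
    if video_range == "full":
        return 0, (1 << bit_depth) - 1
    raise ValueError(f"unsupported video_range: {video_range}")

def quantize(value: float, video_range: str, bit_depth: int) -> int:
    """Map a normalized value (0.0 black, 1.0 peak white) to a clipped integer code."""
    black, white = code_range(video_range, bit_depth)
    code = round(black + value * (white - black))
    return max(0, min((1 << bit_depth) - 1, code))

print(quantize(0.0, "narrow", 10), quantize(1.0, "narrow", 10))  # 64 940
print(quantize(0.0, "full", 10), quantize(1.0, "full", 10))      # 0 1023
```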

reference_black

Specifies the normalized component value that represents 0% reflectivity. This value is used to scale the black level between different dynamic range systems.

For example the reference black level of 0.1 candelas per square meter is 0.0623 (6.23%) for SMPTE ST2084 (PQ) and 0.0 for standard dynamic range systems (e.g. BT.709).

Note that this value is a function of the current transfer_characteristics, for example, 0.0623 = PQ (0.1).

reference_white

Specifies the normalized component value that represents 100% reflectivity. This value is used to scale the white level between different dynamic range systems.

For example the reference white level of 203 candelas per square meter is 0.58 (58%) for SMPTE ST2084 (PQ), 0.75 (75%) for ARIB STD-B67 (HLG) and 1.0 (100%) for standard dynamic range systems (e.g. BT.709).

Note that this value is a function of the current transfer_characteristics, for example, 0.58 = PQ (203).
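
The PQ figures quoted for reference black and reference white can be reproduced with the SMPTE ST 2084 inverse EOTF. The sketch below uses the constants published in that standard; the function name is hypothetical, and the HLG figure is not derived here.

```python
def pq_encode(nits: float) -> float:
    """SMPTE ST 2084 inverse EOTF: luminance in cd/m^2 to a normalized signal value."""
    m1 = 2610 / 16384
    m2 = 2523 / 4096 * 128
    c1 = 3424 / 4096
    c2 = 2413 / 4096 * 32
    c3 = 2392 / 4096 * 32
    y = max(nits, 0.0) / 10000.0  # PQ is defined relative to a 10000 cd/m^2 peak
    yp = y ** m1
    return ((c1 + c2 * yp) / (1.0 + c3 * yp)) ** m2

print(pq_encode(0.1))    # approximately 0.0623 (reference black)
print(pq_encode(203.0))  # approximately 0.58 (reference white)
```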

dynamic_range

Specifies the compression used to convert from a high dynamic range system to a lower (or standard) dynamic range system. Dynamic range compression preserves some or all of the highlights (and low lights) in the original system.

In any dynamic range system a normalized linear light level value of 0.0 represents the minimum black level and 1.0 represents the maximum white level. When the linear light levels are scaled to a lower dynamic range the normalized values may be less than 0 or greater than 1.0.

By default light levels outside the legal range are clipped when converted to integer signal values (at a specific bit_depth). For a full range signal the legal range is [0.0, 1.0]. For a narrow range signal the legal range is approximately [-0.07, 1.09].

| Property | Type | Description |
| --- | --- | --- |
| ratio | number | Specifies the dynamic range compression ratio for the Extended Reinhard tone mapping operator. |
| black | number | Specifies the normalized linear black level. Values below this level are compressed into the legal range. |
| white | number | Specifies the normalized linear white level. Values above this level are compressed into the legal range. |
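
The approximate legal range quoted above for narrow range signals follows from the headroom and footroom around the nominal black and white codes. A minimal sketch, assuming the narrow range limits are the 8-bit values 16 and 235 scaled to the bit depth (function names are hypothetical):

```python
def legal_range(bit_depth: int = 10, narrow: bool = True) -> tuple:
    """Normalized values that correspond to the lowest and highest integer codes."""
    max_code = (1 << bit_depth) - 1
    if narrow:
        shift = bit_depth - 8
        black, white = 16 << shift, 235 << shift
    else:
        black, white = 0, max_code
    span = white - black
    return (0 - black) / span, (max_code - black) / span

def clip_to_legal(value: float, bit_depth: int = 10, narrow: bool = True) -> float:
    """Default behavior described above: clip values outside the legal range."""
    low, high = legal_range(bit_depth, narrow)
    return min(max(value, low), high)

print([round(x, 2) for x in legal_range(10, narrow=True)])   # [-0.07, 1.09]
print(legal_range(10, narrow=False))                         # (0.0, 1.0)
```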

audio

Represents a stream of audio samples. Audio streams support the following media specific properties:

| Property | Type | Description |
| --- | --- | --- |
| channels | integer | Number of audio channels in the stream. |
| label | array | Identifies the speaker label for each audio channel. A value of zero indicates the channel label is unspecified. |
| program | array | Identifies the program number for each audio channel. A value of zero indicates the program number is unspecified. |
| coding | array | Identifies the coding of each audio channel. A value of zero indicates the channel contains PCM audio. |
| content | array | Identifies the audio content of each channel. A value of zero indicates the audio content is unknown. |

label

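| Value | Speaker | Label | Speaker | Label | Speaker | Label |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Left | L | Front Left | FL | | |
| 2 | Right | R | Front Right | FR | | |
| 3 | Center | C | Front Center | FC | | |
| 4 | Low Frequency Enhancement | LFE | | | | |
| 5 | Right Surround | Rs | Back Right | BR | | |
| 6 | Left Surround | Ls | Back Left | BL | | |
| 7 | Left Center | Lc | Front Left Center | FLC | | |
| 8 | Right Center | Rc | Front Right Center | FRC | | |
| 9 | Center Surround | Cs | Back Center | BC | | |
| 10 | Left Surround Direct | Lsd | Side Left | SL | | |
| 11 | Right Surround Direct | Rsd | Side Right | SR | | |
| 12 | Top Center Surround | Ts | Top Center | TC | | |
| 13 | Left Top Front | Ltf | Top Front Left | TFL | | |
| 14 | Center Top Front | Ctf | Top Front Center | TFC | | |
| 15 | Right Top Front | Rtf | Top Front Right | TFR | | |
| 16 | Left Top Rear | Ltr | Top Rear Left | Trl | Top Back Left | TBL |
| 17 | Center Top Rear | Ctr | Top Rear Center | Trc | Top Back Center | TBC |
| 18 | Right Top Rear | Rtr | Top Rear Right | Trr | Top Back Right | TBR |
| 19 | Left Top Side | Lts | Top Side Left | TSL | | |
| 20 | Right Top Side | Rts | Top Side Right | TSR | | |
| 33 | Left Total | Lt | | | | |
| 34 | Right Total | Rt | | | | |
| 35 | Mono | M | | | | |
| 65 | Left Wide | Lw | | | | |
| 66 | Right Wide | Rw | | | | |
| 68 | Low Frequency Enhancement | LFE2 | | | | |
| 70 | Left Rear Surround | Lrs | | | | |
| 71 | Right Rear Surround | Rrs | | | | |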

program

The program number differentiates channels having the same speaker label. For example the following table illustrates some common multi-program channel assignments from SMPTE ST2035:

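| Channel | 9f | 11c |
| --- | --- | --- |
| 1 | L1 | L1 |
| 2 | R1 | R1 |
| 3 | M2 | C1 |
| 4 | M3 | LFE1 |
| 5 | | Ls1 |
| 6 | | Rs1 |
| 7 | | L2 |
| 8 | | R2 |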
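
To make the pairing of the label and program arrays concrete, the sketch below rebuilds the channel names shown in the table from two parallel arrays. The lookup covers only a handful of the label values listed earlier, and the function name is hypothetical.

```python
# Subset of the speaker label values from the label table.
LABELS = {1: "L", 2: "R", 3: "C", 4: "LFE", 5: "Rs", 6: "Ls", 35: "M"}

def channel_names(label: list, program: list) -> list:
    """Combine per-channel speaker labels and program numbers into names such as 'L1'."""
    if len(label) != len(program):
        raise ValueError("label and program must have one entry per channel")
    names = []
    for speaker, number in zip(label, program):
        text = LABELS.get(speaker, "?")           # zero or unlisted label: unspecified
        suffix = str(number) if number else ""    # zero program number: unspecified
        names.append(text + suffix)
    return names

print(channel_names([1, 2, 35, 35], [1, 1, 2, 3]))  # ['L1', 'R1', 'M2', 'M3']
print(channel_names([1, 2, 3, 4, 6, 5, 1, 2], [1, 1, 1, 1, 1, 1, 2, 2]))
```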

coding

For audio streams with a PCM format the coding property indicates whether a channel contains non-audio data. The following channel coding formats are supported:

| Format | Description |
| --- | --- |
| ac-3 | Dolby Digital |
| ec-3 | Dolby Digital Plus, Dolby Atmos |
| dlbe | Dolby E |

content

The content property is a four character code that identifies the audio content for each channel. An audio program consists of multiple audio elements (dialog, music, effects, etc). Each element may span one or more channels. The audio elements listed below are defined in EBU R123 and SMPTE ST2035:

| Enum | Value | Description |
| --- | --- | --- |
| commentary | 1 | A speech element that is combined with an internal sound element to form a complete mix. |
| complete_mix | 2 | A mix consisting of all the elements required to form a standalone audio program. |
| dialog | 3 | The primary speech element of a program. |
| effects | 4 | Sound effects. |
| hearing_impaired | 5 | A mix of the program prepared for the hearing impaired. |
| international_sound | 6 | A mix consisting of all elements required to form a program except for the commentary element. International sound is usually defined as including all "on screen" sound elements and excluding any "off screen" commentary. |
| music | 7 | Music sound track. |
| clean_effects | 8 | A mix consisting of all elements required to form a program except for the dialog element(s). |
| secondary_audio | 9 | An alternate mix of the program typically containing a second language. |
| visually_impaired | 10 | A complete mix of the program including a narrative description of the video, or verbal description of the visual scene. |

subtitle

Represents a stream of subtitle samples.