Media Streams

All media streams support the following properties:

| Property | Type | Description |
| --- | --- | --- |
| name | string | Uniquely identifies the stream. |
| format | string | Four character code that identifies the media sample format. |
| language | string | ISO 639 two-letter (e.g. en) or three-letter (e.g. eng) code that identifies the language. |
| region | string | ISO 3166 alpha-2 (e.g. US) or alpha-3 (e.g. USA) code that identifies the country or region. |
| sample_rate | object | Media sample rate specified as a rational number (e.g. 30000/1001, 48000/1024). |
| duration | integer | Duration of the stream in ticks. |
| offset | integer | Offset from the start of the stream in ticks. |
| bit_rate | integer | Average bit rate of the media stream. |
| optional | boolean | Indicates whether a composition input stream is optional. If the input stream is not present, all output stream references are removed from the composition. |
| properties | object | Stream specific properties. |
| extension | array | Lists the stream property extensions. |
| sample | array | Lists the media samples in the stream. |

sample_rate

| Property | Type | Description |
| --- | --- | --- |
| numerator | integer | Number of ticks per second. For video streams this is the frame rate (e.g. 25) or a multiple of the frame rate (e.g. 30000). For audio streams this is the audio sample rate in Hz (e.g. 48000). |
| denominator | integer | Nominal duration of a media sample in ticks (default is 1). |
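As a minimal sketch of how these two values relate to the tick-based duration and offset properties, the helpers below convert ticks to seconds and to a sample count. The function names are hypothetical and not part of any API described here; only the numerator and denominator semantics come from the table above.

```python
from fractions import Fraction


def ticks_to_seconds(ticks: int, numerator: int) -> Fraction:
    """Convert ticks to seconds; numerator is the number of ticks per second."""
    return Fraction(ticks, numerator)


def ticks_to_samples(ticks: int, denominator: int = 1) -> Fraction:
    """Convert ticks to a sample count; denominator is ticks per sample."""
    return Fraction(ticks, denominator)


def samples_per_second(numerator: int, denominator: int = 1) -> Fraction:
    """Nominal sample rate: ticks per second divided by ticks per sample."""
    return Fraction(numerator, denominator)


# 1800 frames of video with sample_rate 30000/1001 (29.97 frames per second).
duration_ticks = 1800 * 1001
print(float(ticks_to_seconds(duration_ticks, 30000)))  # 60.06
print(ticks_to_samples(duration_ticks, 1001))          # 1800
```

Exact rational arithmetic avoids the rounding drift that accumulates when 30000/1001 is stored as a floating point frame rate.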

video

Represents a stream of video samples. Video streams support the following additional properties:

| Property | Type | Description |
| --- | --- | --- |
| width | integer | Stored image width in pixels. |
| height | integer | Stored image height in pixels. |
| clean_aperture | object | A rectangle that identifies clean image dimensions. |
| pixel_aspect_ratio | object | Pixel Aspect Ratio (PAR) of the displayed image as a rational number. |
| orientation | object | Indicates the orientation of the stored image. |
| field_order | enum | Indicates whether the frame is progressive or interlaced (upper or lower field first). |
| bit_depth | integer | Number of bits per color component. |
| chroma_subsampling | object | Subsampling of the chroma components. |
| chroma_location | enum | Location of the chroma samples relative to the luminance samples. |
| color_primaries | enum | Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W). |
| color_primaries | integer | ISO 23008-2 enumeration. |
| transfer_characteristics | enum | Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values. |
| transfer_characteristics | integer | ISO 23008-2 enumeration. |
| matrix_coefficients | enum | Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values. |
| matrix_coefficients | integer | ISO 23008-2 enumeration. |
| video_range | enum | Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white). |
| reference_black | number | Specifies the normalized signal value that represents 0% reflectivity. |
| reference_white | number | Specifies the normalized signal value that represents 100% reflectivity. |
| dynamic_range | number | Specifies the dynamic range compression ratio. |
| dynamic_range | object | Specifies the dynamic range compression black and white levels. |

clean_aperture

A rectangle that identifies clean image dimensions.

| Property | Type | Description |
| --- | --- | --- |
| top | integer | Inset from the top of the image in pixels. |
| bottom | integer | Inset from the bottom of the image in pixels. |
| left | integer | Inset from the left edge of the image in pixels. |
| right | integer | Inset from the right edge of the image in pixels. |

pixel_aspect_ratio

Pixel Aspect Ratio (PAR) of the displayed image as a rational number.

| Property | Type | Description |
| --- | --- | --- |
| numerator | integer | The pixel width. |
| denominator | integer | The pixel height. |
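The sketch below shows one way the stored dimensions, the clean_aperture insets and the pixel_aspect_ratio might be combined into a square-pixel display size. The function name and the dictionary shape are hypothetical, and scaling the width while leaving the height unchanged is an assumed convention rather than something this page specifies.

```python
from fractions import Fraction


def display_size(width, height, par=(1, 1), aperture=None):
    """Return (display_width, display_height) in square pixels.

    aperture is a dict of insets (top, bottom, left, right) in pixels;
    missing insets are treated as 0. par is (numerator, denominator).
    """
    insets = aperture or {}
    clean_width = width - insets.get("left", 0) - insets.get("right", 0)
    clean_height = height - insets.get("top", 0) - insets.get("bottom", 0)
    # Assumption: only the horizontal axis is scaled by the pixel aspect ratio.
    ratio = Fraction(par[0], par[1])
    return round(clean_width * ratio), clean_height


# 720x480 stored image, 8 pixel insets left and right, PAR 10/11.
print(display_size(720, 480, (10, 11), {"left": 8, "right": 8}))  # (640, 480)
```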

orientation

Indicates the orientation of the stored image.

| Property | Type | Description |
| --- | --- | --- |
| rotation | integer | Image rotation in degrees (0, 90, 180 or 270 CCW). |
| mirrored | boolean | Indicates whether the image is mirrored. |

field_order

Indicates whether the frame is progressive or interlaced (upper or lower field first).

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| progressive | 1 | Progressive frame |
| upper | 2 | Interlaced upper field first |
| top | 2 | Interlaced upper field first |
| lower | 3 | Interlaced lower field first |
| bottom | 4 | Interlaced lower field first |

chroma_subsampling

Subsampling of the chroma components horizontally and vertically. The human visual system is less sensitive to variations in color (chrominance) than in brightness (luminance). Chroma subsampling takes advantage of this difference to reduce the data rate of a video stream.

| Property | Type | Description |
| --- | --- | --- |
| horizontal | integer | Subsampling on the horizontal axis (1, 2 or 4). |
| vertical | integer | Subsampling on the vertical axis (1, 2 or 4). |

Chroma subsampling is commonly expressed as a three-part ratio J:a:b that describes the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high:

  • J: horizontal sampling reference (usually 4).
  • a: number of chroma samples (Cr, Cb) in the first row of J pixels.
  • b: number of additional chroma samples in the second row (either 0 or a).

The following table describes the common chroma subsampling schemes:

| Subsampling | Horizontal | Vertical |
| --- | --- | --- |
| 4:4:4 | 1 | 1 |
| 4:4:0 | 1 | 2 |
| 4:2:2 | 2 | 1 |
| 4:2:0 | 2 | 2 |
| 4:1:1 | 4 | 1 |
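The relationship between the J:a:b notation and the horizontal and vertical properties can be sketched as follows. The function names are hypothetical; the mapping itself follows from the definitions above (a chroma samples across J pixels, and b equal to 0 when the second row carries no additional chroma samples).

```python
def subsampling_factors(j: int, a: int, b: int) -> tuple[int, int]:
    """Map a J:a:b ratio to (horizontal, vertical) subsampling factors."""
    if a <= 0 or j % a != 0 or b not in (0, a):
        raise ValueError(f"unsupported ratio {j}:{a}:{b}")
    horizontal = j // a             # luma samples per chroma sample in a row
    vertical = 2 if b == 0 else 1   # b == 0: chroma is shared by both rows
    return horizontal, vertical


def chroma_plane_size(width: int, height: int, horizontal: int, vertical: int):
    """Size of each chroma plane, rounding up for odd luma dimensions."""
    return -(-width // horizontal), -(-height // vertical)


print(subsampling_factors(4, 2, 0))           # (2, 2)
print(chroma_plane_size(1920, 1080, 2, 2))    # (960, 540)
```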

chroma_location

Location of the chroma samples relative to the luminance samples.

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| cosited | 1 | Chroma samples are co-sited with the luminance samples on each line (MPEG-2 4:2:2). |
| interstitial | 2 | Chroma samples are sited horizontally midway between luminance samples and midway between adjacent lines (MPEG-1 4:2:0). |
| quincunx | 2 | Same as interstitial. |
| vertical_midpoint | 3 | Chroma samples are sited vertically midway between the luminance samples in each column (MPEG-2 4:2:0). |
| horizontally_cosited | 3 | Same as vertical_midpoint. |
| horizontal_midpoint | 4 | Chroma samples are sited horizontally midway between luminance samples on each line. |
| vertically_cosited | 4 | Same as horizontal_midpoint. |
| line_alternating | 5 | Chroma samples are co-sited horizontally. Vertically the Cr and Cb samples are co-sited on alternating pairs of lines (DV 4:2:0). |
| cosited_out_of_phase | 5 | Same as line_alternating. |

color_primaries

Identifies the location (in XYZ space) of the red (R), green (G) and blue (B) color primaries and reference white point (W).

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| bt709 | 1 | ITU-R BT.709 |
| unspecified | 2 | |
| bt470 | 4 | ITU-R BT.470-6 System M |
| pal | 5 | ITU-R BT.601 625 |
| ntsc | 6 | ITU-R BT.601 525 |
| bt2020 | 9 | ITU-R BT.2020 |
| xyz | 10 | SMPTE ST 428-1 (CIE 1931 XYZ) |
| p3dci | 11 | SMPTE RP 431-2 (2011, P3-DCI) |
| p3d65 | 12 | SMPTE EG 432-1 (2010, P3-D65) |
| p3d60 | 131 | P3-D60 (ACES Cinema) |

transfer_characteristics

Identifies the Opto-Electronic Transfer Function (OETF) used to convert between scene linear light levels and nonlinear component values.

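| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| bt709 | 1 | ITU-R BT.709 |
| unspecified | 2 | |
| bt601 | 6 | ITU-R BT.601-6 |
| linear | 8 | Linear |
| bt2020 | 14 | ITU-R BT.2020 |
| pq | 16 | SMPTE ST 2084 |
| st428 | 17 | SMPTE ST 428-1 |
| hlg | 18 | ARIB STD-B67 |
| slog3 | 130 | Sony S-LOG3 |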

matrix_coefficients

Identifies a set of matrix coefficients used to convert between color primary (RGB) and color difference (YUV) values.

| Enum | Value | Description |
| --- | --- | --- |
| identity | 0 | IEC 61966-2-1 (RGB), SMPTE ST 428-1 (XYZ) |
| bt709 | 1 | ITU-R BT.709 |
| unspecified | 2 | |
| pal | 5 | ITU-R BT.601-6 625 |
| ntsc | 6 | ITU-R BT.601-6 525 |
| bt2020 | 9 | ITU-R BT.2020 non-constant luminance |
| bt2020_2 | 10 | ITU-R BT.2020 constant luminance |
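The three color enumerations are usually read together. As a rough illustration, the sketch below models a subset of the values from the tables above as Python enums and names a few common combinations. The class names, the function and the descriptive strings are hypothetical and are not part of the API described here.

```python
from enum import IntEnum


class ColorPrimaries(IntEnum):
    BT709 = 1
    BT2020 = 9


class TransferCharacteristics(IntEnum):
    BT709 = 1
    PQ = 16
    HLG = 18


class MatrixCoefficients(IntEnum):
    BT709 = 1
    BT2020 = 9  # non-constant luminance


# Illustrative labels for common combinations (assumed names, not API values).
KNOWN_COMBINATIONS = {
    (1, 1, 1): "SDR (BT.709)",
    (9, 16, 9): "HDR PQ (BT.2020)",
    (9, 18, 9): "HDR HLG (BT.2020)",
}


def describe_color(primaries: int, transfer: int, matrix: int) -> str:
    """Return a label for a (primaries, transfer, matrix) triple."""
    return KNOWN_COMBINATIONS.get((primaries, transfer, matrix), "other")


print(describe_color(ColorPrimaries.BT2020,
                     TransferCharacteristics.PQ,
                     MatrixCoefficients.BT2020))  # HDR PQ (BT.2020)
```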

video_range

Identifies the range of signal values that represent the real component values in the normalized range of 0.0 (black) to 1.0 (peak white).

| Enum | Value | Description |
| --- | --- | --- |
| unknown | 0 | |
| narrow | 1 | 64 - 940 (10 bit) |
| full | 2 | 0 - 1023 (10 bit) |
| sony | 3 | 512 - 65535 (16 bit) |
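As an illustration of what these ranges mean, the sketch below maps an integer code value to the normalized range where 0.0 is black and 1.0 is peak white. The function name is hypothetical, and scaling the 10 bit narrow levels to other bit depths by powers of two is an assumption based on common practice (for example 16 - 235 at 8 bits), not something stated in the table.

```python
def normalize(code: int, video_range: str = "narrow", bit_depth: int = 10) -> float:
    """Map an integer code value to a normalized component value."""
    if video_range == "narrow":
        # Assumption: the 10 bit levels 64 and 940 scale with bit depth.
        scale = 2.0 ** (bit_depth - 10)
        black, white = 64 * scale, 940 * scale
    elif video_range == "full":
        black, white = 0, 2**bit_depth - 1
    else:
        raise ValueError(f"range not handled by this sketch: {video_range}")
    return (code - black) / (white - black)


print(normalize(64))             # 0.0
print(normalize(940))            # 1.0
print(normalize(1023, "full"))   # 1.0
```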

reference_black

Specifies the normalized component value that represents 0% reflectivity. This value is used to scale the black level between different dynamic range systems.

For example the reference black level of 0.1 candelas per square meter is 0.0623 (6.23%) for SMPTE ST2084 (PQ) and 0.0 for standard dynamic range systems (e.g. BT.709).

Note that this value is a function of the current transfer_characteristics, for example, 0.0623 = PQ (0.1).

reference_white

Specifies the normalized component value that represents 100% reflectivity. This value is used to scale the white level between different dynamic range systems.

For example the reference white level of 203 candelas per square meter is 0.58 (58%) for SMPTE ST2084 (PQ), 0.75 (75%) for ARIB STD-B67 (HLG) and 1.0 (100%) for standard dynamic range systems (e.g. BT.709).

Note that this value is a function of the current transfer_characteristics, for example, 0.58 = PQ (203).
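The PQ figures quoted for reference_black and reference_white can be reproduced with the SMPTE ST 2084 inverse EOTF. The sketch below uses the constants published in that standard; the function name is hypothetical and the code is an illustration, not the implementation used by this API.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_inverse_eotf(nits: float) -> float:
    """Map absolute luminance in cd/m2 (0 to 10000) to a normalized PQ value."""
    y = max(nits, 0.0) / 10000.0
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2


print(f"{pq_inverse_eotf(0.1):.3f}")   # reference black, close to 0.0623
print(f"{pq_inverse_eotf(203):.3f}")   # reference white, close to 0.58
```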

dynamic_range

Specifies the compression used to convert from a high dynamic range system to a lower (or standard) dynamic range system. Dynamic range compression preserves some or all of the highlights (and low lights) in the original system.

In any dynamic range system a normalized linear light level value of 0.0 represents the minimum black level and 1.0 represents the maximum white level. When the linear light levels are scaled to a lower dynamic range the normalized values may be less than 0 or greater than 1.0.

By default light levels outside the legal range are clipped when converted to integer signal values (at a specific bit_depth). For a full range signal the legal range is [0.0, 1.0]. For a narrow range signal the legal range is approximately [-0.07, 1.09].
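The default clipping behavior can be sketched as below. The function names are hypothetical. The sketch assumes that a narrow range signal uses black and white levels of 16 and 235 scaled to the bit depth (64 and 940 at 10 bits) and that the lowest and highest code values (0 - 3 and 1020 - 1023 at 10 bits) are reserved, which is where the approximate [-0.07, 1.09] legal range comes from.

```python
def levels(video_range: str = "narrow", bit_depth: int = 10):
    """Return (black, white, lowest, highest) integer code values."""
    if bit_depth < 8:
        raise ValueError("this sketch assumes bit_depth >= 8")
    max_code = 2**bit_depth - 1
    if video_range == "full":
        return 0, max_code, 0, max_code
    if video_range == "narrow":
        scale = 2 ** (bit_depth - 8)
        # Assumption: `scale` codes are reserved at each end of the range.
        return 16 * scale, 235 * scale, scale, max_code - scale
    raise ValueError(f"range not handled by this sketch: {video_range}")


def quantize(value: float, video_range: str = "narrow", bit_depth: int = 10) -> int:
    """Convert a normalized value to a code value, clipping to the legal range."""
    black, white, lowest, highest = levels(video_range, bit_depth)
    code = round(black + value * (white - black))
    return min(max(code, lowest), highest)


def legal_range(video_range: str = "narrow", bit_depth: int = 10):
    """Normalized values that correspond to the lowest and highest legal codes."""
    black, white, lowest, highest = levels(video_range, bit_depth)
    span = white - black
    return (lowest - black) / span, (highest - black) / span


print(quantize(1.0), quantize(2.0))   # 940 1019
print(legal_range())                  # roughly (-0.07, 1.09)
```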

| Property | Type | Description |
| --- | --- | --- |
| ratio | number | Specifies the dynamic range compression ratio for the Extended Reinhard tone mapping operator. |
| black | number | Specifies the normalized linear black level. Values below this level are compressed into the legal range. |
| white | number | Specifies the normalized linear white level. Values above this level are compressed into the legal range. |
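For orientation, the commonly published form of the Extended Reinhard operator is sketched below. This page does not say how the ratio, black and white properties parameterize the operator, so the white_point argument here is an assumption: it is taken to be the largest linear input level, which the operator maps to exactly 1.0. The function name is hypothetical.

```python
def extended_reinhard(level: float, white_point: float) -> float:
    """Compress a non-negative linear light level.

    Inputs between 0 and white_point map to outputs between 0 and 1,
    and an input equal to white_point maps to exactly 1.0.
    """
    if white_point <= 0:
        raise ValueError("white_point must be positive")
    level = max(level, 0.0)
    return level * (1.0 + level / white_point**2) / (1.0 + level)


# A scene that is four times brighter than the target white level.
for value in (0.0, 1.0, 4.0):
    print(value, extended_reinhard(value, white_point=4.0))
```

Unlike a hard clip, the curve compresses highlights progressively, so detail above the target white level is reduced in contrast rather than discarded.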

audio

Represents a stream of audio samples. Audio streams support the following media specific properties:

| Property | Type | Description |
| --- | --- | --- |
| channels | integer | Number of audio channels in the stream. |
| label | array | Identifies the speaker label for each audio channel. A value of zero indicates the channel label is unspecified. |
| program | array | Identifies the program number for each audio channel. A value of zero indicates the program number is unspecified. |
| coding | array | Identifies the coding of each audio channel. A value of zero indicates the channel contains PCM audio. |
| content | array | Identifies the audio content of each channel. A value of zero indicates the audio content is unknown. |

label

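| Value | Speaker | Label | Speaker | Label | Speaker | Label |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Left | L | Front Left | FL | | |
| 2 | Right | R | Front Right | FR | | |
| 3 | Center | C | Front Center | FC | | |
| 4 | Low Frequency Enhancement | LFE | | | | |
| 5 | Right Surround | Rs | Back Right | BR | | |
| 6 | Left Surround | Ls | Back Left | BL | | |
| 7 | Left Center | Lc | Front Left Center | FLC | | |
| 8 | Right Center | Rc | Front Right Center | FRC | | |
| 9 | Center Surround | Cs | Back Center | BC | | |
| 10 | Left Surround Direct | Lsd | Side Left | SL | | |
| 11 | Right Surround Direct | Rsd | Side Right | SR | | |
| 12 | Top Center Surround | Ts | Top Center | TC | | |
| 13 | Left Top Front | Ltf | Top Front Left | TFL | | |
| 14 | Center Top Front | Ctf | Top Front Center | TFC | | |
| 15 | Right Top Front | Rtf | Top Front Right | TFR | | |
| 16 | Left Top Rear | Ltr | Top Rear Left | Trl | Top Back Left | TBL |
| 17 | Center Top Rear | Ctr | Top Rear Center | Trc | Top Back Center | TBC |
| 18 | Right Top Rear | Rtr | Top Rear Right | Trr | Top Back Right | TBR |
| 19 | Left Top Side | Lts | Top Side Left | TSL | | |
| 20 | Right Top Side | Rts | Top Side Right | TSR | | |
| 33 | Left Total | Lt | | | | |
| 34 | Right Total | Rt | | | | |
| 35 | Mono | M | | | | |
| 65 | Left Wide | Lw | | | | |
| 66 | Right Wide | Rw | | | | |
| 68 | Low Frequency Enhancement | LFE2 | | | | |
| 70 | Left Rear Surround | Lrs | | | | |
| 71 | Right Rear Surround | Rrs | | | | |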

program

The program number differentiates channels having the same speaker label. For example the following table illustrates some common multi-program channel assignments from SMPTE ST2035:

| Channel | 9f | 11c |
| --- | --- | --- |
| 1 | L1 | L1 |
| 2 | R1 | R1 |
| 3 | M2 | C1 |
| 4 | M3 | LFE1 |
| 5 | | Ls1 |
| 6 | | Rs1 |
| 7 | | L2 |
| 8 | | R2 |
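The channel names in the table above appear to combine a speaker label abbreviation with a program number. As a sketch under that assumption, the function below derives such names from parallel label and program arrays. The function name and the LABELS dictionary are hypothetical, and the dictionary covers only a few values from the label table.

```python
# Subset of the label table: value -> abbreviation.
LABELS = {1: "L", 2: "R", 3: "C", 4: "LFE", 5: "Rs", 6: "Ls", 35: "M"}


def channel_names(label, program):
    """Combine per-channel label and program values into names such as 'L1'."""
    if len(label) != len(program):
        raise ValueError("label and program must have one entry per channel")
    names = []
    for label_value, program_number in zip(label, program):
        abbreviation = LABELS.get(label_value, "?")
        # A program number of zero means unspecified, so no suffix is added.
        suffix = str(program_number) if program_number else ""
        names.append(abbreviation + suffix)
    return names


# Stereo program plus two mono programs (the 9f column above).
print(channel_names([1, 2, 35, 35], [1, 1, 2, 3]))  # ['L1', 'R1', 'M2', 'M3']
```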

coding

For audio streams with a PCM format the coding property indicates whether a channel contains non-audio data. The following channel coding formats are supported.

| Format | Description |
| --- | --- |
| ac-3 | Dolby Digital |
| ec-3 | Dolby Digital Plus, Dolby Atmos |
| dlbe | Dolby E |

content

The content property is a four character code that identifies the audio content for each channel. An audio program consists of multiple audio elements (dialog, music, effects, etc). Each element may span one or more channels. The audio elements listed below are defined in EBU R123 and SMPTE ST2035:

| Enum | Value | Description |
| --- | --- | --- |
| commentary | 1 | A speech element that is combined with an international sound element to form a complete mix. |
| complete_mix | 2 | A mix consisting of all the elements required to form a standalone audio program. |
| dialog | 3 | The primary speech element of a program. |
| effects | 4 | Sound effects. |
| hearing_impaired | 5 | A mix of the program prepared for the hearing impaired. |
| international_sound | 6 | A mix consisting of all elements required to form a program except for the commentary element. International sound is usually defined as including all "on screen" sound elements and excluding any "off screen" commentary. |
| music | 7 | Music sound track. |
| clean_effects | 8 | A mix consisting of all elements required to form a program except for the dialog element(s). |
| secondary_audio | 9 | An alternate mix of the program typically containing a second language. |
| visually_impaired | 10 | A complete mix of the program including a narrative description of the video, or verbal description of the visual scene. |

subtitle

Represents a stream of subtitle samples.

