Amendment 3 to MPEG-2 Systems

ISO/IEC 13818-1:200X/PDAM 3

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC1/SC29/WG11

CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11 MPEG2007/N9274

Title: / Text of ISO/IEC 13818-1:2006/PDAM 3 Transport of Scalable Video over ITU-T Rec H.222.0 | ISO/IEC 13818-1
Editors: / Thomas Wiegand (Fraunhofer HHI), Thomas Schierl (Fraunhofer HHI), Bertrand Berthelot (Orange)
Source: / Systems

ISO/IEC JTC1/SC29/WG11N5771

Trondheim, July 2003

Systems

Title: Text of ISO/IEC 13818-1/2000/FDAM-3

INFORMATION TECHNOLOGY -

GENERIC CODING OF MOVING PICTURES AND AUDIO: SYSTEMS

Amendment 3: Transport of Scalable Video over ITU-T Rec. H.222.0 | ISO/IEC 13818-1

ISO/IEC 13818-1:200X/PDAM 3

International Standard

ITU-T Rec. H.222.0 (200X)/PDAM 3

INTERNATIONAL STANDARD

ITU-T RECOMMENDATION

INFORMATION TECHNOLOGY -- GENERIC CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION: SYSTEMS

AMENDMENT 3: Transport of Scalable Video over ITU-T Rec H.222.0 | ISO/IEC 13818-1

Introduction

This document specifies the transport of bit-streams conforming to profiles defined in Annex G of ISO/IEC 14496-10 Amd.3 over MPEG-2 Transport Streams as defined in ISO/IEC 13818-1:2006. It proposes extensions to the MPEG-2 Systems standard, ISO/IEC 13818-1:2006.

A number of use cases can be supported if video sub-bitstreams of the scalable bit-stream with different values of dependency_id are transported as different elementary streams (ES). This allows de-multiplexing at the Transport Stream (TS) level, which is the prerequisite for selective access to a certain set of layers, selective content protection, or unequal error protection mechanisms. MPEG-2 TS already specifies the transport of Network Abstraction Layer (NAL) units conforming to profiles defined in Annex A of ISO/IEC 14496-10. This document extends that specification to NAL units and bit-streams according to Amd.3 of ISO/IEC 14496-10 with a small set of additions: a new stream type, a new video descriptor, an amended hierarchy descriptor, and extensions to the STD models.

Definitions

Add a definition of the term video sub-bitstream to section 2.1:

A “video sub-bitstream” is defined as all VCL NAL units with one particular value of dependency_id (DID), together with the associated non-VCL NAL units, as defined in Annex G of ISO/IEC 14496-10. NAL units conforming to profiles defined in Annex A of ISO/IEC 14496-10 have nal_unit_type equal to 1 or 5 when combined with video sub-bitstreams that conform to Annex G of ISO/IEC 14496-10.

SVC Elementary Streams

Merge text and specifications into Subclause 2.14.2 ISO/IEC 13818-1:2006:

An SVC Elementary Stream is an ES containing a video sub-bitstream conforming to one or more profiles defined in Annex G of ISO/IEC 14496-10. All NAL units of an SVC Elementary Stream have the same dependency_id (DID) as defined in Annex G of ISO/IEC 14496-10.

For correct re-assembly of the bit-stream, Decoding Time Stamps (DTS) in the PES (Packetized ES) headers in front of each Access Unit shall be used.

A hierarchy descriptor as described in section 7 shall be used when one or more video sub-bitstreams conforming to one or more profiles defined in Annex G of ISO/IEC 14496-10 are contained in the assigned ESs.

SVC Stream Type

The stream type assigned to an elementary stream (ES), as defined in ISO/IEC 13818-1:2006, which contains the video sub-bitstream conforming to one or more profiles defined in Annex A of ISO/IEC 14496-10, shall be 0x1B. A stream type 0x1F shall be added to Table 2-29 in sub-clause 2.4.4.10 of ISO/IEC 13818-1:2006 for SVC elementary streams that contain a video sub-bitstream conforming to one or more profiles defined in Annex G of ISO/IEC 14496-10.

Extend Sub-clause 2.4.4.10:

Segment of Table 2-29 — Stream type assignments, in sub-clause 2.4.4.10, after insertion of the proposed stream type:

Value / Description
… / …
0x1B / AVC video stream as defined in ITU-T Rec. H.264 | ISO/IEC 14496-10 Video
… / …
… / …
0x1F / SVC video sub-bitstream as defined in Amendment 3 to ITU-T Rec. H.264 | ISO/IEC 14496-10 Video, Annex G
0x20-0x7E / ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Reserved
… / …

Add the following note to that table entry:

An ES that contains any NAL units conforming to one or more profiles defined in Annex G of ISO/IEC 14496-10 shall be assigned the stream type 0x1F.
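The stream type assignment above can be illustrated with a small lookup. This is only a sketch: the function name `describe_stream_type` and the returned labels are hypothetical, and only the values shown in the Table 2-29 segment are covered.

```python
# Values taken from the Table 2-29 segment above; everything else is illustrative.
STREAM_TYPE_AVC = 0x1B  # AVC video stream (Annex A profiles)
STREAM_TYPE_SVC = 0x1F  # SVC video sub-bitstream (Annex G profiles)


def describe_stream_type(stream_type: int) -> str:
    """Return a short label for the stream types discussed in this amendment."""
    if stream_type == STREAM_TYPE_AVC:
        return "AVC video stream"
    if stream_type == STREAM_TYPE_SVC:
        return "SVC video sub-bitstream"
    if 0x20 <= stream_type <= 0x7E:
        return "reserved"
    # Rows elided with "…" in the table segment are deliberately not guessed here.
    return "not covered by this table segment"
```

A demultiplexer could use such a check to route every ES with stream type 0x1F to the SVC re-assembly path while leaving the 0x1B base layer decodable on its own.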

Use of Descriptors

Add the following note to Subclause 2.6:

When ES descriptors are used with stream_type 0x1F, the semantics of the descriptors apply to all the data contained in the particular ES.

SVC Video Descriptor

Replace Table 2-39 in Subclause 2.6.2:

Table 2-39 - Program and program element descriptors

descriptor_tag / TS / PS / Identification
0 / n/a / n/a / Reserved
1 / n/a / n/a / Reserved
2 / X / X / video_stream_descriptor
3 / X / X / audio_stream_descriptor
4 / X / X / hierarchy_descriptor
5 / X / X / Registration_descriptor
6 / X / X / data_stream_alignment_descriptor
7 / X / X / target_background_grid_descriptor
8 / X / X / Video_window_descriptor
9 / X / X / CA_descriptor
10 / X / X / ISO_639_language_descriptor
11 / X / X / System_clock_descriptor
12 / X / X / Multiplex_buffer_utilization_descriptor
13 / X / X / Copyright_descriptor
14 / X / / Maximum_bitrate_descriptor
15 / X / X / Private_data_indicator_descriptor
16 / X / X / Smoothing_buffer_descriptor
17 / X / / STD_descriptor
18 / X / X / IBP_descriptor
19-26 / X / / Defined in ISO/IEC 13818-6
27 / X / X / MPEG-4_video_descriptor
28 / X / X / MPEG-4_audio_descriptor
29 / X / X / IOD_descriptor
30 / X / / SL_descriptor
31 / X / X / FMC_descriptor
32 / X / X / External_ES_ID_descriptor
33 / X / X / MuxCode_descriptor
34 / X / X / FmxBufferSize_descriptor
35 / X / / MultiplexBuffer_descriptor
36 / X / X / Content_labeling_descriptor
37 / X / X / Metadata_pointer_descriptor
38 / X / X / Metadata_descriptor
39 / X / X / Metadata_STD_descriptor
40 / X / X / AVC video descriptor
41 / X / X / IPMP_descriptor (defined in ISO/IEC 13818-11, MPEG-2 IPMP)
42 / X / X / AVC timing and HRD descriptor
43 / X / X / SVC video descriptor
44-63 / n/a / n/a / ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Reserved
64-255 / n/a / n/a / User Private

Add after Subclause 2.6.56:

2.6.57 SVC video descriptor

Table 1: SVC_video_descriptor syntax:

Syntax / No. of bits / Mnemonic
SVC_video_descriptor() {
descriptor_tag / 8 / uimsbf
descriptor_length / 8 / uimsbf
profile_idc / 8 / uimsbf
constraint_set0_flag / 1 / bslbf
constraint_set1_flag / 1 / bslbf
constraint_set2_flag / 1 / bslbf
constraint_set3_flag / 1 / bslbf
AVC_compatible_flags / 4 / bslbf
level_idc / 8 / uimsbf
width / 16 / uimsbf
height / 16 / uimsbf
frame_rate / 16 / uimsbf
average_bitrate / 16 / uimsbf
maximum_bitrate / 16 / uimsbf
}

Proposed semantic definition of fields in the SVC_video_descriptor:

profile_idc, constraint_set0_flag, constraint_set1_flag, constraint_set2_flag, constraint_set3_flag, AVC_compatible_flags, level_idc: These values are copied from the Sequence Parameter Set of the ES.

width, height: maximum image resolution, in pixels, of the ES.

frame_rate: maximum frame rate, in frames per 256 seconds, of the Operation Point of the ES.

average_bitrate: average bit-rate of the whole ES, in kbit per second, for the Operation Point of the ES.

maximum_bitrate: maximum bit-rate of the ES, in kbit within a window of 1 second, for the Operation Point of the ES.
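As an illustration of the field layout above, the following sketch unpacks an SVC_video_descriptor from its 15 bytes (2 header bytes plus 13 payload bytes, following the bit widths in the syntax table). The class and function names are hypothetical, and the placement of constraint_set0_flag in the most significant bit of its byte is an assumption that follows from the listed field order.

```python
import struct
from dataclasses import dataclass

SVC_VIDEO_DESCRIPTOR_TAG = 43  # descriptor_tag value listed in Table 2-39


@dataclass
class SvcVideoDescriptor:
    profile_idc: int
    constraint_set_flags: tuple  # (set0, set1, set2, set3)
    avc_compatible_flags: int    # 4-bit field
    level_idc: int
    width: int
    height: int
    frame_rate: int              # frames per 256 seconds
    average_bitrate: int         # kbit per second
    maximum_bitrate: int         # kbit within a 1 second window

    def frames_per_second(self) -> float:
        # frame_rate counts frames per 256 seconds.
        return self.frame_rate / 256.0


def parse_svc_video_descriptor(data: bytes) -> SvcVideoDescriptor:
    """Unpack the fixed-size descriptor; big-endian, no padding."""
    if len(data) < 15:
        raise ValueError("need at least 15 bytes")
    (tag, length, profile_idc, flags, level_idc,
     width, height, frame_rate, average, maximum) = struct.unpack(
        ">BBBBBHHHHH", data[:15])
    if tag != SVC_VIDEO_DESCRIPTOR_TAG or length < 13:
        raise ValueError("not an SVC_video_descriptor")
    # Assumed bit order: constraint_set0_flag is the first (most significant) bit.
    constraint = tuple((flags >> shift) & 1 for shift in (7, 6, 5, 4))
    return SvcVideoDescriptor(profile_idc, constraint, flags & 0x0F, level_idc,
                              width, height, frame_rate, average, maximum)
```

For example, a frame_rate value of 7680 corresponds to 7680 / 256 = 30 frames per second.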

Hierarchy Descriptor Syntax and Semantics

Replace in Subclause 2.6.7 ‘Semantic definition of fields in hierarchy descriptor’:

hierarchy_type – The hierarchical relation between the associated hierarchy layer and its hierarchy embedded layer is defined in Table 2-44.

hierarchy_layer_index – The hierarchy_layer_index is a 6-bit field that defines a unique index of the associated program element in a table of coding layer hierarchies. Indices shall be unique within a single program definition.

hierarchy_embedded_layer_index – The hierarchy_embedded_layer_index is a 6-bit field that defines the hierarchy table index of the program element that needs to be accessed before decoding of the elementary stream associated with this hierarchy_descriptor. This field is undefined if the hierarchy_type value is 15 (base layer).

hierarchy_channel – The hierarchy_channel is a 6-bit field that indicates the intended channel number for the associated program element in an ordered set of transmission channels. The most robust transmission channel is defined by the lowest value of this field with respect to the overall transmission hierarchy definition.

NOTE – A given hierarchy_channel may at the same time be assigned to several program elements.

Table 2-43 – Hierarchy descriptor

Syntax / No. of bits / Mnemonic
hierarchy_descriptor() {
descriptor_tag / 8 / uimsbf
descriptor_length / 8 / uimsbf
reserved / 4 / bslbf
hierarchy_type / 4 / uimsbf
reserved / 2 / bslbf
hierarchy_layer_index / 6 / uimsbf
reserved / 2 / bslbf
hierarchy_embedded_layer_index / 6 / uimsbf
reserved / 2 / bslbf
hierarchy_channel / 6 / uimsbf
}

Table 2-44 – Hierarchy_type field values

Value / Description
0 / Reserved
1 / Spatial Scalability
2 / SNR Scalability
3 / Temporal Scalability
4 / Data partitioning
5 / Extension bit-stream
6 / Private Stream
7 / Multi-view Profile
8-14 / Reserved
15 / Base layer

by

temporal_scalability_flag — This bit is set to 1 if the associated program element enhances the frame rate of the bit-stream resulting from the program element referenced by the hierarchy_embedded_layer_index.

spatial_scalability_flag — This bit is set to 1 if the associated program element enhances the spatial resolution of the bit-stream resulting from the program element referenced by the hierarchy_embedded_layer_index.

quality_scalability_flag — This bit is set to 1 if the associated program element enhances the SNR quality of the bit-stream resulting from the program element referenced by the hierarchy_embedded_layer_index.

hierarchy_type — The hierarchical relation between the associated hierarchy layer and its hierarchy embedded layer is defined in Table 2-44. If scalability applies in more than one dimension, this field is set to "Combined Scalability", and the flags temporal_scalability_flag, spatial_scalability_flag and quality_scalability_flag shall be set accordingly.

hierarchy_layer_index – The hierarchy_layer_index is a 6-bit field that defines a unique index of the associated program element in a table of coding layer hierarchies. Indices shall be unique within a single program definition. For stream type 0x1F this is the ES index, which is assigned such that the bit-stream order is correct if the associated NAL units of the video sub-bitstreams of the same AU are appended in increasing order of hierarchy_layer_index.

hierarchy_embedded_layer_index — The hierarchy_embedded_layer_index is a 6-bit field that defines the hierarchy table index of the program element that needs to be accessed before decoding of the elementary stream associated with this hierarchy_descriptor. This field is undefined if the hierarchy_type value is 15 (base layer).

hierarchy_channel — The hierarchy_channel is a 6-bit field that indicates the intended channel number for the associated program element in an ordered set of transmission channels. The most robust transmission channel is defined by the lowest value of this field with respect to the overall transmission hierarchy definition.

NOTE – A given hierarchy_channel may at the same time be assigned to several program elements.

Table 2-43 – Hierarchy descriptor

Syntax / No. of bits / Mnemonic
hierarchy_descriptor() {
descriptor_tag / 8 / uimsbf
descriptor_length / 8 / uimsbf
reserved / 1 / bslbf
temporal_scalability_flag / 1 / bslbf
spatial_scalability_flag / 1 / bslbf
quality_scalability_flag / 1 / bslbf
hierarchy_type / 4 / uimsbf
reserved / 2 / bslbf
hierarchy_layer_index / 6 / uimsbf
reserved / 2 / bslbf
hierarchy_embedded_layer_index / 6 / uimsbf
reserved / 2 / bslbf
hierarchy_channel / 6 / uimsbf
}

Table 2-44 – Hierarchy_type field values

Value / Description
0 / Reserved
1 / Spatial Scalability
2 / SNR Scalability
3 / Temporal Scalability
4 / Data partitioning
5 / Extension bit-stream
6 / Private Stream
7 / Multi-view Profile
8 / Combined Scalability
9-14 / Reserved
15 / Base layer
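The amended hierarchy_descriptor layout can be sketched as a parser. This is an illustrative reading of the amended Table 2-43 only: the function name and the dictionary keys are hypothetical, and descriptor_tag 4 is taken from Table 2-39.

```python
HIERARCHY_DESCRIPTOR_TAG = 4   # from Table 2-39
HIERARCHY_TYPE_COMBINED = 8    # "Combined Scalability" in the amended Table 2-44
HIERARCHY_TYPE_BASE_LAYER = 15


def parse_hierarchy_descriptor(data: bytes) -> dict:
    """Unpack the amended hierarchy_descriptor (2 header + 4 payload bytes)."""
    if len(data) < 6 or data[0] != HIERARCHY_DESCRIPTOR_TAG or data[1] < 4:
        raise ValueError("not a hierarchy_descriptor")
    first = data[2]
    return {
        # Bit 7 of the first payload byte is reserved and ignored here.
        "temporal_scalability_flag": (first >> 6) & 1,
        "spatial_scalability_flag": (first >> 5) & 1,
        "quality_scalability_flag": (first >> 4) & 1,
        "hierarchy_type": first & 0x0F,
        # Each remaining byte carries 2 reserved bits followed by a 6-bit field.
        "hierarchy_layer_index": data[3] & 0x3F,
        "hierarchy_embedded_layer_index": data[4] & 0x3F,
        "hierarchy_channel": data[5] & 0x3F,
    }
```

Note that the three new flags occupy bits that were reserved in the original layout, so the descriptor length and the position of hierarchy_type are unchanged.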

STD Extensions for SVC

Add after Subclause 2.14.3.2:

2.14.3.3 STD Extensions for SVC

The T-STD model described in Subclause 2.14.3.1 of ISO/IEC 13818-1:2006 is applied for ES of stream_type 0x1F as shown in Figure 1.

For each ES with stream_type 0x1F, a chain of buffers is used which consists of TB, MB and EB. Subscripts i to i+n are associated with the different SVC ES. For each ES, the specifications of 2.14.3.1 apply. If the AVC_timing_and_HRD_descriptor is used, it shall signal appropriate HRD parameters for the SVC ES to which it applies.

The ji-th NAL unit of an SVC ES is removed from the EBi buffer and sent to the SVC decoder at the time tdi(ji) indicated by the DTS in the respective PES header. If more than one NAL unit is associated with the same DTS, re-ordering shall be applied before the NAL units enter the SVC decoder. The NAL unit order is determined by the hierarchy_layer_index field in the hierarchy_descriptor specified in sub-clause 2.6.6. Within one ES, all NAL units associated with the same DTS enter the decoder in transport order. When all NAL units of an ES for a given DTS have been removed from the EB, NAL units that belong to the same DTS are taken from the next EB in increasing order of hierarchy_layer_index. Decoded pictures are written into a common DPB by the SVC decoder.

Figure 1: T-STD Buffer Model for SVC
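The re-ordering rule for NAL units that share a DTS can be sketched as follows. This is a simplified illustration, not a model of the buffers themselves: the function name and the tuple layout of the input are hypothetical, and DTS wrap-around is ignored.

```python
from collections import defaultdict


def reassemble_access_units(nal_units):
    """Group NAL units by DTS and order them for the SVC decoder.

    nal_units: iterable of (dts, hierarchy_layer_index, payload) tuples in
    arrival (transport) order. Returns a list of (dts, [payload, ...]).
    """
    by_dts = defaultdict(list)
    for arrival, (dts, layer_index, payload) in enumerate(nal_units):
        by_dts[dts].append((layer_index, arrival, payload))
    result = []
    for dts in sorted(by_dts):
        # Increasing hierarchy_layer_index first; transport order within one ES.
        ordered = sorted(by_dts[dts], key=lambda item: (item[0], item[1]))
        result.append((dts, [payload for _, _, payload in ordered]))
    return result
```

With this ordering, the base layer NAL units of an access unit always precede the enhancement NAL units that depend on them, regardless of the order in which the ESs were multiplexed.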

The P-STD model is applied for ES of stream_type 0x1F as shown in Figure 2. There is only one multiplex buffer Bi for each ESi. For each ESi, the specifications of 2.14.3.2 apply.

Figure 2: P-STD Buffer Model for SVC

References

Add the following references for the SVC standard to section 1.2 of ISO/IEC 13818-1:200X:

[1] Amendment 3 to ITU-T Rec. H.264 | ISO/IEC 14496-10 Video.

ITU-T Recommendation H.264 (2003), Advanced Video Coding for generic audiovisual services.

ISO/IEC 14496-10 (2003), Information technology, Advanced Video Coding.

Add to the definition for access unit in section 2.1.1 in subclause 2:

“For the definition of an access unit for ITU-T Recommendation H.264 | ISO/IEC 14496-10 video see the AVC access unit definition in 2.1.3.”

“2.1.2 AVC 24-hour picture (system): An AVC access unit with a presentation time that is more than 24 hours in the future. For the purpose of this definition, AVC access unit n has a presentation time that is more than 24 hours in the future if the difference between the initial arrival time tai(n) and the DPB output time to,dpb(n) is more than 24 hours.”

“2.1.3 AVC access unit (system): An access unit as defined for byte streams in ITU-T Recommendation H.264 | ISO/IEC 14496-10 with the constraints specified in 2.14.1.”

“2.1.4 AVC Slice (system): A byte_stream_nal_unit as defined in ITU-T Recommendation H.264 | ISO/IEC 14496-10 with nal_unit_type values of 1 or 5, or a byte_stream_nal_unit data structure with nal_unit_type value of 2 and any associated byte_stream_nal_unit data structures with nal_unit_type equal to 3 and/or 4.”

“2.1.5 AVC still picture (system): An AVC still picture consists of an AVC access unit containing an IDR picture, preceded by SPS and PPS NAL units that carry sufficient information to correctly decode the IDR picture. Preceding an AVC still picture, there shall be another AVC still picture or an End of Sequence NAL unit terminating a preceding coded video sequence.”

“2.1.6 AVC video sequence (system): Coded video sequence as defined in Clause 3.27 in ITU-T Recommendation H.264 | ISO/IEC 14496-10.”

“2.1.7 AVC video stream (system): An ITU-T Recommendation H.264 | ISO/IEC 14496-10 stream. An AVC video stream consists of one or more AVC video sequences.”

“2.1.52 still picture: A coded still picture consists of a video sequence containing exactly one coded picture which is intra-coded. This picture has an associated PTS and the presentation time of succeeding pictures, if any, is later than that of the still picture by at least two picture periods.”

by

“2.1.52 still picture: A still picture consists of a video sequence, coded as defined in ITU-T Rec. H.262 | ISO/IEC 13818-2, ISO/IEC 11172-2 or ISO/IEC 14496-2, that contains exactly one coded picture which is intra-coded. This picture has an associated PTS and in case of coding according to ISO/IEC 11172-2, ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 14496-2, the presentation time of succeeding pictures, if any, is later than that of the still picture by at least two picture periods.”

“For the purpose of this clause, an elementary stream access point is defined as follows:

Video – The first byte of a video sequence header.

Audio – The first byte of an audio frame.

After a continuity counter discontinuity in a Transport packet which is designated as containing elementary stream data, the first byte of elementary stream data in a Transport Stream packet of the same PID shall be the first byte of an elementary stream access point or in the case of video, the first byte of an elementary stream access point or a sequence_end_code followed by an access point.”

by

“For the purpose of this clause, an elementary stream access point is defined as follows:

ISO/IEC 11172-2 video and ITU-T Rec. H.262 | ISO/IEC 13818-2 video – The first byte of a video sequence header.

ISO/IEC 14496-2 visual – The first byte of the visual object sequence header.

ITU-T Rec. H.264 | ISO/IEC 14496-10 video – The first byte of an AVC access unit. The SPS and PPS parameter sets referenced in this and all subsequent AVC access units in the coded video stream shall be provided after this access point in the byte stream and prior to their activation.

Audio – The first byte of an audio frame.

After a continuity counter discontinuity in a Transport packet which is designated as containing elementary stream data, the first byte of elementary stream data in a Transport Stream packet of the same PID shall be the first byte of an elementary stream access point. In the case of ISO/IEC 11172-2, or ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 14496-2 video, the first byte of an elementary stream access point may also be the first byte of a sequence_end_code followed by an elementary stream access point.”

“Specifically, when the bit is set to '1', the next PES packet to start in the payload of Transport Stream packets with the current PID shall contain the first byte of a video sequence header if the PES stream type (refer to Table 2-29) is 1 or 2, or shall contain the first byte of an audio frame if the PES stream type is 3 or 4. In addition, in the case of video, a presentation timestamp shall be present in the PES packet containing the first picture following the sequence header.”

by

“Specifically, when the bit is set to '1', the next PES packet to start in the payload of Transport Stream packets with the current PID shall contain an elementary stream access point as defined in the semantics for the discontinuity_indicator field. In addition, in the case of video, a presentation timestamp shall be present for the first picture following the elementary stream access point.”

“In the case of video, this field may be set to '1' only if the payload contains one or more bytes from an intra-coded slice.”

by

“In the case of ISO/IEC 11172-2 or ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 14496-2 video, this field may be set to '1' only if the payload contains one or more bytes from an intra-coded slice.

In the case of ITU-T Rec. H.264 | ISO/IEC 14496-10 video, this field may be set to ‘1’ only if the payload contains one or more bytes from a slice with slice_type set to 2, 4, 7, or 9.”
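The ITU-T Rec. H.264 | ISO/IEC 14496-10 condition above reduces to a membership test on slice_type. The sketch below is illustrative only: the function name is hypothetical, and it assumes the caller has already extracted the slice_type of every slice that contributes bytes to the payload.

```python
# slice_type values named in the text above; in H.264 these are the I and SI types.
AVC_INTRA_SLICE_TYPES = frozenset({2, 4, 7, 9})


def may_set_indicator(slice_types_in_payload) -> bool:
    """True if at least one slice in the payload permits setting the field to '1'."""
    return any(t in AVC_INTRA_SLICE_TYPES for t in slice_types_in_payload)
```

The rule is permissive rather than mandatory: a result of True means the field may be set, not that it has to be.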

“For the purpose of this clause, an elementary stream access point is defined as follows:

Video – The first byte of a video sequence header.

Audio – The first byte of an audio frame.”

by

“For the definition of an elementary stream access point, see the semantics of discontinuity_indicator in section 2.4.3.5.”

“When this flag is set, if the elementary stream carried in this PID is an audio stream, the splice_type field shall be set to '0000'. If the elementary stream carried in this PID is a video stream, it shall fulfill the constraints indicated by the splice_type value.”

by

“When this flag is set, and if the elementary stream carried in this PID is not an ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream, then the splice_type field shall be set to '0000'. If the elementary stream carried in this PID is an ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream, it shall fulfill the constraints indicated by the splice_type value.”

“If the elementary stream carried in that PID is an audio stream, this field shall have the value '0000'. If the elementary stream carried in that PID is a video stream, this field indicates the conditions that shall be respected by this elementary stream for splicing purposes.”

by

“If the elementary stream carried in that PID is not an ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream, then this field shall have the value '0000'. If the elementary stream carried in that PID is an ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream, then this field indicates the conditions that shall be respected by this elementary stream for splicing purposes.”

Table 2-18 -- Stream_id assignments

stream_id / Note / stream coding
1011 1100 / 1 / program_stream_map
1011 1101 / 2 / private_stream_1
1011 1110 / / padding_stream
1011 1111 / 3 / private_stream_2
110x xxxx / / ISO/IEC 13818-3 or ISO/IEC 11172-3 or ISO/IEC 13818-7 or ISO/IEC 14496-3 audio stream number x xxxx
1110 xxxx / / ITU-T Rec. H.262 | ISO/IEC 13818-2, ISO/IEC 11172-2, ISO/IEC 14496-2 or ITU-T Rec. H.264 | ISO/IEC 14496-10 video stream number xxxx
1111 0000 / 3 / ECM_stream
1111 0001 / 3 / EMM_stream
1111 0010 / 5 / ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Annex A or ISO/IEC 13818-6_DSMCC_stream
1111 0011 / 2 / ISO/IEC_13522_stream
1111 0100 / 6 / ITU-T Rec. H.222.1 type A
1111 0101 / 6 / ITU-T Rec. H.222.1 type B
1111 0110 / 6 / ITU-T Rec. H.222.1 type C
1111 0111 / 6 / ITU-T Rec. H.222.1 type D
1111 1000 / 6 / ITU-T Rec. H.222.1 type E
1111 1001 / 7 / ancillary_stream
1111 1010 / / ISO/IEC14496-1_SL-packetized_stream
1111 1011 / / ISO/IEC14496-1_FlexMux_stream
1111 1100 / / metadata stream
1111 1101 / / extended_stream_id
1111 1110 / / reserved data stream
1111 1111 / 4 / program_stream_directory
The notation x means that the values '0' and '1' are both permitted and result in the same stream type. The stream number is given by the values taken by the x’s.
NOTES
1 PES packets of type program_stream_map have unique syntax specified in 2.5.4.1.
2 PES packets of type private_stream_1 and ISO/IEC_13522_stream follow the same PES packet syntax as those for ITU-T Rec. H.262 | ISO/IEC 13818-2 video and ISO/IEC 13818-3 audio streams.
3 PES packets of type private_stream_2, ECM_stream and EMM_stream are similar to private_stream_1 except no syntax is specified after PES_packet_length field.
4 PES packets of type program_stream_directory have a unique syntax specified in 2.5.5.
5 PES packets of type DSM-CC_stream have a unique syntax specified in ISO/IEC 13818-6.
6 This stream_id is associated with stream_type 0x09 in Table 2-29.
7 This stream_id is only used in PES packets, which carry data from a Program Stream or an ISO/IEC 11172-1 System Stream, in a Transport Stream (refer to 2.4.3.7).
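The x notation in Table 2-18 amounts to a bit mask on stream_id. The following sketch shows one way to classify a stream_id byte; the function name and the returned strings are hypothetical, and the dictionary deliberately lists only a subset of the fixed-value rows.

```python
# Partial, illustrative subset of the fixed stream_id rows in Table 2-18.
FIXED_STREAM_IDS = {
    0xBC: "program_stream_map",
    0xBD: "private_stream_1",
    0xBE: "padding_stream",
    0xBF: "private_stream_2",
    0xF0: "ECM_stream",
    0xF1: "EMM_stream",
    0xFF: "program_stream_directory",
}


def classify_stream_id(stream_id: int) -> str:
    """Classify a stream_id byte using the bit patterns of Table 2-18."""
    if stream_id & 0xE0 == 0xC0:   # 110x xxxx: audio, number in the low 5 bits
        return f"audio stream number {stream_id & 0x1F}"
    if stream_id & 0xF0 == 0xE0:   # 1110 xxxx: video, number in the low 4 bits
        return f"video stream number {stream_id & 0x0F}"
    return FIXED_STREAM_IDS.get(stream_id, "not listed in this sketch")
```

Because the video range 1110 xxxx is shared by all listed video codings, the stream_id alone does not distinguish AVC from MPEG-2 video; the stream type in the program map is needed for that.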