INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC 1/SC 29/WG 11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC 1/SC 29/WG 11 MPEG2008/N9709
January 2008, Antalya, Turkey
Source: Systems
Title: Text of ISO/IEC CD 23000-11 for Stereoscopic Video Application Format
Editors: Kyuheon Kim, Kugjin Yun, Yongtae Kim, Seo-Young Hwang
Status: Approved
ISO/IEC JTC 1/SC 29 N
Date: 2008-01-18
ISO/IEC CD 23000-11
ISO/IEC JTC 1/SC 29/WG 11
Secretariat:
Information technology — Multimedia application format (MPEG-A) — Part 11: Stereoscopic video application format
Élément introductif — Élément central — Partie 11: Titre de la partie
Warning
This document is not an ISO International Standard. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an International Standard.
Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation.
Copyright notice
This ISO document is a working draft or committee draft and is copyright-protected by ISO. While the reproduction of working drafts or committee drafts in any form for use by participants in the ISO standards development process is permitted without prior permission from ISO, neither this document nor any extract from it may be reproduced, stored or transmitted in any form for any other purpose without prior written permission from ISO.
Requests for permission to reproduce this document for the purpose of selling it should be addressed as shown below or to ISO's member body in the country of the requester:
[Indicate the full address, telephone number, fax number, telex number, and electronic mail address, as appropriate, of the Copyright Manager of the ISO member body responsible for the secretariat of the TC or SC within the framework of which the working document has been prepared.]
Reproduction for sales purposes may be subject to royalty payments or a licensing agreement.
Violators may be prosecuted.
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 23000-11 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
ISO/IEC 23000 consists of the following parts, under the general title Information technology — Multimedia application format (MPEG-A):
- Part 1: Application format framework
- Part 2: Music player application format
- Part 3: Photo player application format
- Part 4: Musical slide show player application format
- Part 5: Media streaming player application format
- Part 6: Professional archiving application format
- Part 7: Open release application format
- Part 8: Portable video player application format
- Part 9: MAF for Digital Multimedia Broadcasting
- Part 10: Video surveillance application format
- Part 11: Stereoscopic video application format
Introduction
In today’s technological arena, there is an abundance of digital content for digital imaging devices such as cell phones, digital cameras and other mobile devices. Stereoscopic contents provide natural three-dimensional scenes and are produced using acquisition or generation techniques. The market for stereoscopic content on such devices has formed and is ready for deployment. Consequently, there is a need for a standard data format for the acquisition/creation, storage and playback of stereoscopic contents. This document presents the technical specification of the stereoscopic video application format, which satisfies the proposed requirements (N9413).
© ISO/IEC 2008 – All rights reserved
Information technology — Multimedia application format (MPEG-A) — Part 11: Stereoscopic video application format
1 Stereoscopic contents composition type
This specification supports the following composition types for stereoscopic contents.
Editor's note: the figure below is to be revised.
1.1 One ES type
1.1.1 Side-by-side format
1.2 Vertical line interleaved format
1.3 Frame sequential format
1.4 Stereoscopic left/right view sequence
2 Stereoscopic Video Application Format File Structures
The stereoscopic video application format file structure complies with the ISO base media file format [4] and is applicable to content consisting of one or multiple elementary streams (ESes), with or without scene description.
Figure 1 — A file structure for a single stereoscopic track (pure stereoscopic)
Editor's note: add the one-ES case to Figure 1.
Figure 2 — A file structure for multiple stereoscopic tracks (e.g. one for the left and the other for the right view sequence), where each track has one view sequence (pure stereoscopic)
Editor's note: revisions to the figure above:
- Replace 'snmi' with 'scdi' and 'svmi'.
- Show them as boxes defined inside the meta box.
- In mdat, change "Left view 1" to "Left view".
- Represent the LASeR data in mdat as one box.
- Change "Track" to "track (Stereoscopic video)".
- Delete "Non stereoscopic".
Figure 3 — A file structure for a single stereoscopic track that includes both stereoscopic and monoscopic fragments
Editor's note: revisions:
- Replace the S/M notation in mdat with the terms defined in Terms and Definitions.
- Simplify the LASeR parts under moov and mdat to "LASeR".
- Represent the LASeR data in mdat as one box as well.
Figure 4 — A file structure for multiple stereoscopic tracks (e.g. one for the left and the other for the right view sequence), where each track has one view sequence
Editor's note: revisions:
- Replace the S/M notation in mdat with the terms defined in Terms and Definitions.
- Simplify the LASeR parts under moov and mdat to "LASeR".
In the case of temporally partial stereoscopic contents, such as those in Figure 2 and Figure 4, each ES has both stereoscopic and monoscopic fragments. For storage efficiency, the part of the monoscopic fragments that is common to the individual ESes can be stored in only one of the ESes, as follows:
Editor's note: add a figure showing ES1: S/M/S, ES2: S/S.
As shown in the figure below, temporally partial stereoscopic contents may also have different monoscopic fragments (e.g. a stereoscopic music video, a stereoscopic soap drama and two monoscopic commercial films (CFs)):
In this case, a file structure for multiple stereoscopic ESes can be designed as follows:
Therefore, temporally partial stereoscopic contents can be stored in a file structure chosen according to the user’s preference. Some tracks can be composed of only stereoscopic fragments, and the lengths of the monoscopic fragments can differ.
A stereoscopic track can have one or more monoscopic fragments, and the monoscopic fragments of each track can differ. For the efficiency of the terminal procedure, when the tracks contain the same monoscopic fragments with the same TimeStamp values, it is better to decode only one of those monoscopic fragments.
Stereoscopic contents stored in the above file structure are decoded by using the same TimeStamp for the stereoscopic left and right fragments in the two ESes, as explained in 2.2.2.
Generally, the terminal procedure for temporally partial 3D contents should be as follows:
- Check whether the stereoscopic contents are composed of multiple stereoscopic ESes.
- Distinguish stereoscopic fragments from monoscopic fragments using the stereo_flag field.
- Check the TimeStamp of each stereoscopic and monoscopic fragment of the ESes.
- Check whether the monoscopic fragments of the ESes have the same TimeStamp values.
- If so, check which ES has the is_left_first field set, and use the monoscopic fragment of that ES as the main media.
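The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Fragment` and `ElementaryStream` classes and the function name are hypothetical, each fragment is represented by the TimeStamp of its first sample, and "has the is_left_first field" is read as "is_left_first equals 1".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """One run of consecutive samples in an ES (hypothetical model)."""
    timestamp: int    # TimeStamp of the first sample of the fragment
    stereo_flag: int  # 1 = stereoscopic, 0 = monoscopic


@dataclass
class ElementaryStream:
    """Hypothetical in-memory view of one stereoscopic ES."""
    name: str
    is_left_first: int
    fragments: List[Fragment]


def select_monoscopic_source(streams: List[ElementaryStream]) -> Optional[str]:
    """Return the name of the ES whose monoscopic fragments are decoded,
    or None when the procedure does not apply."""
    # Step 1: the procedure concerns multiple stereoscopic ESes only.
    if len(streams) < 2:
        return None
    # Steps 2-3: use stereo_flag to pick the monoscopic fragments of each ES
    # and collect their TimeStamps.
    mono_times = [
        sorted(f.timestamp for f in es.fragments if f.stereo_flag == 0)
        for es in streams
    ]
    # Step 4: do all ESes carry monoscopic fragments with equal TimeStamps?
    if not mono_times[0] or any(t != mono_times[0] for t in mono_times[1:]):
        return None
    # Step 5 (assumed reading): the ES with is_left_first == 1 is main media.
    for es in streams:
        if es.is_left_first == 1:
            return es.name
    return None
```

When the function returns None, each track would simply be decoded on its own; that fallback is also an assumption rather than something this text specifies.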
The following is an example of temporally partial 3D contents composed of one or more monoscopic fragments, where the monoscopic fragments of each track differ (e.g. a stereoscopic music video, a stereoscopic soap drama and two monoscopic commercial films (CFs)):
In this use case, a file structure for multiple stereoscopic tracks that contain stereoscopic and monoscopic fragments can be composed as follows:
Temporally partial 3D contents can be composed as in the above example (consecutive CFs), or with only one monoscopic CF, as in Figure 4.
3.2 Table for Boxes
The stereoscopic video application format file contains various boxes based on the ISO base media file format. The stereoscopic video application format proposed in this contribution provides two new boxes, ‘scdi’ and ‘svmi’, which carry stereoscopic camera and display information and stereoscopic video media information, respectively. Table 1 briefly shows the structure of the boxes and their descriptions.
Table 1 — Boxes of the stereoscopic video application format
ftyp / file type and compatibility
pdin / progressive download information
moov / container for all the metadata
  mvhd / movie header, overall declarations
  trak / container for an individual track or stream
    tkhd / track header, overall information about the track
    tref / track reference container
    edts / edit list container
      elst / an edit list
    mdia / container for the media information in a track
      mdhd / media header, overall information about the media
      hdlr / handler, declares the media (handler) type
        “soun” for audio data
        “vide” for visual data
        “sdsm” for LASeR data
      minf / media information container
        vmhd / video media header, overall information (video track only)
        smhd / sound media header, overall information (sound track only)
        hmhd / hint media header, overall information (hint track only)
        nmhd / null media header, overall information (some tracks only)
        dinf / data information box, container
          dref / data reference box, declares source(s) of media data in track
        stbl / sample table box, container for the time/space map
          stsd / sample descriptions (codec types, initialization etc.)
            “lsr1” for LASeR data
          stts / (decoding) time-to-sample
          stsc / sample-to-chunk, partial data-offset information
          stsz / sample sizes (framing)
          stz2 / compact sample sizes (framing)
          stco / chunk offset, partial data-offset information
          co64 / 64-bit chunk offset
  ipmc / IPMP Control Box
mdat / media data container
meta / metadata
  hdlr / handler, declares the metadata (handler) type
  iloc / item location
  iinf / item information
  xml / XML container
  bxml / binary XML container
  svmi / stereoscopic video media information
  scdi / stereoscopic camera and display information
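As an illustration of how a reader might locate the boxes listed in Table 1, the following Python sketch walks the box headers of a byte buffer. It relies only on the generic box header layout of the ISO base media file format (32-bit size, four-character type, optional 64-bit largesize); the function name `iter_boxes` is hypothetical and the sketch is not a complete file reader.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit largesize follows the type field.
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size
```

To descend into a container such as 'moov', the function can be called again with the payload range it returned. Note that 'meta' is a FullBox in the ISO base media file format, so its child boxes begin four bytes (version and flags) after the start of its payload.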
3.3 Syntax and Semantics of the Boxes
As described in Table 1, the boxes in the stereoscopic video application format comply with the ISO base media file format. Their definitions and syntax are described as follows:
3.3.1 ftyp (File Type Box)
The brands that identify files conforming to this specification are “ss01” and “ss02”, as shown in Table 2.
Table 2 — The brands of stereoscopic contents
Types / Specifications
ss01 / Stereoscopic content without partial monoscopic data
ss02 / Stereoscopic content with partial monoscopic data
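A reader could check for these brands with a sketch like the one below. It assumes the 'ftyp' payload layout of the ISO base media file format (major brand, minor version, then a list of compatible brands) and the lowercase spelling used in Table 2. Whether the brand must appear as the major brand or only among the compatible brands is not stated here, so the hypothetical function `stereoscopic_brands` simply reports both.

```python
import struct
from typing import List

# Brands and descriptions as listed in Table 2.
STEREO_BRANDS = {
    "ss01": "Stereoscopic content without partial monoscopic data",
    "ss02": "Stereoscopic content with partial monoscopic data",
}


def stereoscopic_brands(ftyp_payload: bytes) -> List[str]:
    """Return the Table 2 brands found in an 'ftyp' payload, major brand first."""
    if len(ftyp_payload) < 8:
        raise ValueError("ftyp payload too short")
    major, _minor_version = struct.unpack_from(">4sI", ftyp_payload, 0)
    # Compatible brands are consecutive four-character codes after byte 8.
    brands = [major] + [
        ftyp_payload[i:i + 4] for i in range(8, len(ftyp_payload) - 3, 4)
    ]
    found: List[str] = []
    for brand in brands:
        name = brand.decode("latin-1")
        if name in STEREO_BRANDS and name not in found:
            found.append(name)
    return found
```

An empty result would mean the file does not declare either brand of this specification.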
3.3.2 svmi (Stereoscopic Video Media Information)
Container: Meta Box ('meta')
Mandatory: Yes
Quantity: Exactly one
The ‘svmi’ box provides the stereoscopic visual type and fragment information. The visual type information signals the composition type of the stereoscopic video, along with the position information of the left/right view; there is only one composition type for the stereoscopic fragments. The fragment information represents the number of fragments, the number of consecutive samples and whether the current sample is stereoscopic or not.
3.3.2.1 Syntax
aligned(8) class StereoscopicVideoMediaInformationBox extends
    FullBox('svmi', version = 0, 0){
    // stereoscopic visual type information
    unsigned int(8) stereoscopic_composition_type;
    unsigned int(1) is_left_first;
    unsigned int(7) reserved;
    // stereo_mono_change information
    unsigned int(32) entry_count;
    for(i=0; i<entry_count; i++){
        unsigned int(32) sample_count;
        unsigned int(1) stereo_flag;
        unsigned int(7) reserved;
    }
}
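The syntax above can be read with a short Python sketch such as the following. It assumes that stereo_flag occupies one bit followed by seven reserved bits, that the loop count field is entry_count, and that bit fields are packed most significant bit first; the class and function names are hypothetical.

```python
import struct
from typing import List, NamedTuple


class SvmiEntry(NamedTuple):
    sample_count: int
    stereo_flag: int


class Svmi(NamedTuple):
    stereoscopic_composition_type: int
    is_left_first: int
    entries: List[SvmiEntry]


def parse_svmi(payload: bytes) -> Svmi:
    """Parse the payload of an 'svmi' box (the bytes after size and type)."""
    # FullBox version (8 bits) and flags (24 bits), composition type,
    # the byte holding is_left_first plus 7 reserved bits, and entry_count.
    version_flags, comp_type, packed, entry_count = struct.unpack_from(
        ">IBBI", payload, 0
    )
    if version_flags >> 24 != 0:
        raise ValueError("unsupported svmi version")
    offset = 10
    entries = []
    for _ in range(entry_count):
        # Each entry: 32-bit sample_count, then stereo_flag in the top bit.
        sample_count, flag_byte = struct.unpack_from(">IB", payload, offset)
        entries.append(SvmiEntry(sample_count, flag_byte >> 7))
        offset += 5
    return Svmi(comp_type, packed >> 7, entries)
```

A truncated payload makes `struct.unpack_from` raise `struct.error`, which a caller may want to treat as a malformed box.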
3.3.2.2 Semantics
stereoscopic_composition_type – the composition type of the stereoscopic contents, as defined in Table 3.
Table 3 — Stereoscopic composition type
stereoscopic_composition_type / Identification
0 / Side-by-side
1 / Vertical line interleaved
2 / Frame sequential
3 / Stereoscopic left view sequence
4 / Stereoscopic right view sequence
is_left_first – indicates which of the left and right images is encoded first. This determines the positions of the left and right view sequences, as defined in Table 4.
Table 4 — The positions of the stereoscopic left/right view sequences according to the is_left_first value
Identification / Left view sequence (is_left_first = 1) / Right view sequence (is_left_first = 1) / Left view sequence (is_left_first = 0) / Right view sequence (is_left_first = 0)
Side-by-side / Left side / Right side / Right side / Left side
Vertical line interleaved / Odd line / Even line / Even line / Odd line
Frame sequential / Odd frame / Even frame / Even frame / Odd frame
n ES / Main media / Sub media / Sub media / Main media
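The mapping in this table can be expressed as a small lookup, sketched below in Python. The dictionary keys reuse the identification strings of the table rows; the function name `view_positions` and the returned dictionary shape are hypothetical.

```python
# Positions for is_left_first = 1, as (left view, right view).
VIEW_POSITIONS = {
    "Side-by-side": ("Left side", "Right side"),
    "Vertical line interleaved": ("Odd line", "Even line"),
    "Frame sequential": ("Odd frame", "Even frame"),
    "n ES": ("Main media", "Sub media"),
}


def view_positions(identification: str, is_left_first: int) -> dict:
    """Return where the left and right views are located for one table row."""
    first, second = VIEW_POSITIONS[identification]
    if is_left_first == 1:
        return {"left": first, "right": second}
    # is_left_first = 0 swaps the two positions.
    return {"left": second, "right": first}
```

For example, `view_positions("Frame sequential", 0)` places the left view on even frames and the right view on odd frames, which matches the corresponding row of the table.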
entry_count – an integer that gives the number of fragments in the ES.