© ISO/IEC ISO/IEC 14496-1:1999(E)
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC 1/SC 29/WG 11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC 1/SC 29/WG11 N3204
December 1999
Source: / MPEG-4 Systems
Title: / ISO/IEC 14496-1/PDAM2 (MPEG-4 version 3 BIFS Part)
Author: / Michelle Kim (editors)
Status: / Committee Draft
TABLE OF CONTENTS
0 Introduction
This document is an amendment to the ISO/IEC 14496-1 PDAM 2. This document specifies:
- TemporalForm for the FlexTime Model,
- MediaControl,
- ServerCommand,
- EXTERNPROTO.
0.1 Organization of this document
This document is written so that it can be easily integrated into the corresponding sections of ISO/IEC 14496-1:1999. Clause 1 specifies the TemporalForm node for the Advanced Synchronization Model (FlexTime Model), Clause 2 specifies ServerCommand and MediaControl, and Clause 3 specifies EXTERNPROTO. Annex B is an informative annex on the FlexTime model.
1 TemporalForm Node
The TemporalForm node carries the information necessary for realizing the FlexTime Model described in Annex B. It provides a framework for temporal groupings: it allows Elementary Stream time intervals to align their start times and end times. It also provides a mechanism for specifying the flexibility of the time segments, as the TemporalForm node carries the minDurations, maxDurations, optDurations, and endSyncPriorities fields for each flexed time segment.
1.1 Node Interface
TemporalForm {
  exposedField MFInt32  groups             []
  exposedField MFString constraints        []
  exposedField MFInt32  ODIDs              []
  exposedField MFTime   startTimeStamps    []
  exposedField MFTime   endTimeStamps      []
  exposedField MFTime   minDurations       []
  exposedField MFTime   maxDurations       []
  exposedField MFTime   optDurations       []
  exposedField MFInt32  endSyncPriorities  []
  exposedField MFString stretchModes       []
  exposedField MFString shrinkModes        []
  exposedField SFTime   maxSceneStartDelay 0
}
1.2 Functionality and semantics
The TemporalForm node specifies the temporal placement of time intervals on audio-visual objects according to a relative alignment in a time line.
The groups field specifies the list of groups of indices in the lists of ODIDs, startTimeStamps, …, through shrinkModes, to which the constraints must be applied. One group is a list of indices that is terminated by a -1.
The constraints field specifies the list of temporal relationships amongst the lists of ODIDs, startTimeStamps, …, through shrinkModes. Each constraint applies to the group (as terminated by a -1) with the same index as the constraint.
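The pairing of groups with constraints described above can be illustrated with a short sketch. It is written in Python purely for illustration; the function names split_groups and pair_constraints are hypothetical and not part of this specification, and treating an unterminated trailing group or a count mismatch as an error is an assumption.

```python
from typing import List, Tuple


def split_groups(groups: List[int]) -> List[List[int]]:
    """Split the flat 'groups' list into index lists, each terminated by -1."""
    result: List[List[int]] = []
    current: List[int] = []
    for value in groups:
        if value == -1:
            result.append(current)
            current = []
        else:
            current.append(value)
    if current:
        # Assumption: a trailing group without its -1 terminator is malformed.
        raise ValueError("last group is not terminated by -1")
    return result


def pair_constraints(groups: List[int],
                     constraints: List[str]) -> List[Tuple[str, List[int]]]:
    """Associate the constraint at position i with the group at position i."""
    split = split_groups(groups)
    if len(split) != len(constraints):
        # Assumption: every group needs exactly one constraint.
        raise ValueError("number of groups and constraints differ")
    return list(zip(constraints, split))


# Two groups: indices 0 and 1 start together, index 2 follows index 1.
pairs = pair_constraints([0, 1, -1, 1, 2, -1], ["CoStart", "Meet"])
print(pairs)
```

Each index in a group refers to the same position in ODIDs, startTimeStamps, and the other per-segment lists.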
The ODIDs field specifies the list of Object Descriptor IDs for the Elementary Streams that are described by the Object Descriptor and where the time stamps can be found. To specify time intervals ("objects") on the BIFS and OD streams that are within the Initial Object Descriptor name scope, the ODIDs field value shall be 0.
The startTimeStamps field specifies the list of start time stamps that uniquely identify the starts to be used as synchronization points.
The endTimeStamps field specifies the list of end time stamps that uniquely identify the ends to be used as synchronization points.
The minDurations field specifies the minimum rendering durations, the maxDurations field specifies the maximum rendering durations, the optDurations field specifies the optimum rendering durations of the objects identified by the elementary streams and the time stamps respectively.
The endSyncPriorities field specifies the priority that determines the end synchronization order amongst a group. The highest priority object will determine the end and cause other objects to be ended at that time.
The stretchModes field specifies the preferred modes of stretching (increasing the length of the object rendering times) according to Table 2.
The shrinkModes field specifies the preferred modes of shrinking (decreasing the length of the object rendering times) according to Table 2.
The maxSceneStartDelay field specifies the maximum delay allowed before rendering the first object in the scene.
Table 1: Temporal Alignment Constraints
Alignment Constraints / Type Index / Effect
CoStart: Align Start Times / "CoStart" / The startTime of the second and following components become equal to the startTime of the first component.
CoEnd: Align End Times / "CoEnd" / The endTime of the second and following components become equal to the endTime of the first component.
Meet: Align End Time to Start Time / "Meet" / The startTime of the second and following components become equal to the endTime of the components right before them.
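The effects listed in Table 1 can be sketched as follows. This is a minimal Python illustration, not a normative algorithm: the Interval class and apply_constraint function are hypothetical names, and the sketch only moves the time named in the table, leaving the other end of each interval untouched. How a terminal reconciles the result with the minimum and maximum durations is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    """A time segment with a start time and an end time, in seconds."""
    start: float
    end: float


def apply_constraint(kind: str, members: List[Interval]) -> None:
    """Apply one Table 1 constraint to the members of one group, in place."""
    if not members:
        return
    first = members[0]
    if kind == "CoStart":
        for member in members[1:]:
            member.start = first.start
    elif kind == "CoEnd":
        for member in members[1:]:
            member.end = first.end
    elif kind == "Meet":
        # Each component starts when the component right before it ends.
        for previous, current in zip(members, members[1:]):
            current.start = previous.end
    else:
        raise ValueError("unknown constraint: " + kind)


a, b = Interval(0.0, 5.0), Interval(2.0, 9.0)
apply_constraint("Meet", [a, b])
print(b)  # b now starts where a ends
```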
Table 2: Preferred Mode of Stretch/Shrink Values
StretchMode value / StretchMode description / ShrinkMode value / ShrinkMode description
"Hold" / Hold rendering of the last Access Unit / "Stop" / Stop rendering
"Linear" / Linear Access Unit composition rate decrease / "Linear" / Linear Access Unit composition rate increase
"Loop" / Repeat
1.3 Additional Implementation Considerations
To accommodate FlexTime, the decoder buffer size may have to be increased. The required size is calculated by the server/encoder and sent to the client via the ES descriptor. The value is needed to instantiate the decoder buffer with the correct expanded size, and is calculated such that the buffer model holds (the buffer does not overflow) even when FlexTime imposes the longest possible delay on this stream.
1.4 TemporalForm Encoding Table
TemporalForm / SFWorldNode, SF2DNode, SF3DNode / xxxxxxx, xxxxx, xxxxxx
Field name / Field type / DEF id / IN id / OUT id / DYN id / [m, M] / Q / A
groups / MFInt32 / 0000 / 0000 / 0000 / [-1, 4294967294] / 13,32
constraints / MFString / 0001 / 0001 / 0001
ODIDs / MFInt32 / 0010 / 0010 / 0010 / [-1, 4294967294] / 13,32
startTimeStamps / MFTime
endTimeStamps / MFTime
minDurations / MFTime / 0011 / 0011 / 0011
maxDurations / MFTime / 0100 / 0100 / 0100
optDurations / MFTime / 0101 / 0101 / 0101
endSyncPriorities / MFInt32 / 1000 / 1000 / 1000 / [-1, 1022] / 13,10
stretchModes / MFString / 1001 / 1001 / 1001
shrinkModes / MFString / 1010 / 1010 / 1010
maxSceneStartDelay / SFTime / 1110 / 1110 / 1110
2 ServerCommand for Application Signalling
The application-signaling framework allows an application to communicate application-signaling messages, or commands, to one or more servers. The CommandNode in BIFS enables application signaling in MPEG-4 Systems. Commands are sent to servers upon the occurrence of events (synchronous events specified in the scene description, or asynchronous events resulting from user interaction). The CommandNode framework consists of two elements: a CommandNode node and a CommandNodeRequest structure. While the CommandNode enables event routing to the server, the CommandNodeRequest structure specifies the syntax for the messages communicated to the server over a back channel.
2.1.1 CommandNode
2.1.1.1 Node Interface
CommandNode {
eventIn / SFBool / trigger / FALSE
exposedField / SFBool / enable / FALSE
exposedField / MFString / url / []
exposedField / SFString / command / ""
}
Functionality and Semantics
This node is used to communicate application-signaling messages (commands) from the client back to the server. The CommandNode is processed only when trigger receives a TRUE event and enable is TRUE. When the CommandNode is processed, the command is sent to the servers indicated by the specified url. A url identifies the object descriptor that contains an elementary stream that flows from the terminal back to the server. If that object descriptor has more than one such elementary stream, then the one specified will be used. The command field contains the information that is transmitted back to the server. The syntax and semantics of the command string are application specific and not specified. The syntax of the CommandNodeRequest structures used to communicate the command to a server is specified in clause XXXXX.
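The processing rule above (a command is sent only when trigger receives TRUE while enable is TRUE) can be sketched as follows. This is an illustrative Python sketch; the CommandNode class, the on_trigger function, the send callback, and the "od=7" url value are all hypothetical and only stand in for the terminal's actual event handling and back channel.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CommandNode:
    """Hypothetical in-memory model of the CommandNode fields."""
    enable: bool = False
    url: List[str] = field(default_factory=list)
    command: str = ""


def on_trigger(node: CommandNode, value: bool,
               send: Callable[[str, str], None]) -> bool:
    """Handle an event on 'trigger'; return True if the node was processed."""
    if not (value and node.enable):
        return False  # FALSE events, or a disabled node, do nothing
    for target in node.url:
        send(target, node.command)  # one request per url entry
    return True


sent = []
node = CommandNode(enable=True, url=["od=7"], command="PAUSE")
processed = on_trigger(node, True, lambda target, cmd: sent.append((target, cmd)))
print(processed, sent)
```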
2.1.2 CommandNodeRequest
When the CommandNode is processed, the associated command is communicated to the servers specified in the url using CommandNodeRequest structures. The CommandNodeRequest is encapsulated into SL packets, using the SLConfigDescriptor contained in the ESDescriptor of the upchannel elementary stream that carries the commands. If a timestamp is provided in the SL layer (either decoding or composition), then it is directly derived from the System Time Base of the terminal.
Syntax
class CommandNodeRequest(BIFSConfig cfg) {
  bit(cfg.nodeIDbits) nodeID;
  SFString command;
}
where nodeID is the node ID of the CommandNode node that triggered the command (all such nodes must have IDs in order to route events into them), and command is the string contained in the CommandNode node's command field.
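A possible serialization of this structure is sketched below in Python. It is illustrative only: the function name is hypothetical, and the SFString layout used here (a 5-bit lengthBits field, a length field of lengthBits bits, then the string bytes) is an assumption that should be checked against the SFString syntax of ISO/IEC 14496-1. The result is a string of '0' and '1' characters and is not byte-aligned.

```python
def encode_command_node_request(node_id: int, node_id_bits: int,
                                command: str) -> str:
    """Return the request as a string of '0'/'1' characters."""
    if node_id_bits < 1 or not 0 <= node_id < (1 << node_id_bits):
        raise ValueError("nodeID does not fit in nodeIDbits")
    data = command.encode("utf-8")
    # Assumed SFString layout: 5-bit lengthBits, length, then the bytes.
    length_bits = max(1, len(data).bit_length())
    if length_bits > 31:
        raise ValueError("command string too long")
    bits = format(node_id, "0{}b".format(node_id_bits))    # nodeID
    bits += format(length_bits, "05b")                     # lengthBits
    bits += format(len(data), "0{}b".format(length_bits))  # length
    bits += "".join(format(byte, "08b") for byte in data)  # string bytes
    return bits


request = encode_command_node_request(node_id=5, node_id_bits=10, command="GO")
print(len(request), request)
```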
Node coding table
A.1.1
CommandNode / SFWorldNode, SF2DNode, SF3DNode / 1100, 10, 101
Field name / Field type / DEF id / IN id / OUT id / DYN id / [m, M] / Q / A
trigger / SFBool / 00
enable / SFBool / 00 / 01 / 00
url / MFUrl / 01 / 10 / 01
command / SFString / 10 / 11 / 10
2.1.3 MediaControl Node
2.1.3.1 Node interface
MediaControl {
exposedField / SFTime / duration / -1
exposedField / SFBool / enabled / TRUE
exposedField / SFBool / loop / FALSE
exposedField / SFNode / media / NULL
exposedField / SFFloat / mediaRate / 1.0
exposedField / SFTime / mediaStartTime / -1
exposedField / SFTime / mediaStopTime / -1
exposedField / SFTime / playTime / 0
exposedField / SFTime / startTime / 0
exposedField / SFTime / stopTime / 0
eventOut / SFBool / isActive
eventOut / SFTime / mediaCurrentTime
}
NOTE — For the binary encoding of this node see Annex xxx
2.1.3.1.1.1 Functionality and semantics
The MediaControl node follows the semantics of time-dependent nodes. Whether the node becomes active or inactive is determined by the rules described in section 9.2.1.6.1 of ISO/IEC 14496-1. The fields enabled, loop, startTime, stopTime and isActive follow the semantics described there.
When the node becomes active, the media referenced by the media field starts playing in the range [mediaStartTime, mediaStopTime]. If mediaStartTime is equal to –1, the media starts at the first composition unit available. If mediaStopTime is equal to –1, the media stops at the last composition unit available. While the node is active, mediaCurrentTime events are generated and provide the current CTS of the media expressed in seconds.
All the elementary streams linked to the url field of the MediaControl are affected by the commands. Furthermore, if the Object Time Base of the controlled object was driving other
The playTime field has the same effect as the startTime field. However, if the node is activated with the playTime field, it will have the same effect as playing the media from the first composition unit available, regardless of the value of mediaStartTime. If both playTime and startTime have the same value and activate the node, then startTime takes precedence. This may be used in particular for creating a “pause and restart effect”.
playTime and startTime events are ignored while the node is active; that is, to restart the media, it must first be stopped and then started again.
When the duration is unknown, the duration field is left at –1. When the media has a known duration, the duration field is set to the duration of the media expressed in seconds. This information can be used, for instance, for designing user interfaces. When the duration is known and mediaStopTime – mediaStartTime > duration, the media is played until the last available media composition unit.
The mediaRate field is a multiplication factor applied to the normal speed of the media. It instructs the server to slow down or speed up the playback of a particular media. Negative values are not allowed for mediaRate.
The following table summarizes the different actions taken when the node has just become active due to a startTime or playTime event:
Condition / Control command
currentTime startTime ≥ playTime; mediaStartTime = -1; mediaStopTime = -1 / Default value: Play whatever media is coming to the composition buffer.
currentTime startTime ≥ playTime; mediaStartTime = t0; mediaStopTime = -1 / Play the media from t0 while there are available composition units.
currentTime startTime ≥ playTime; mediaStartTime = -1; mediaStopTime = t1 / Play the media composition units available until t1.
currentTime startTime ≥ playTime; mediaStartTime = t0; mediaStopTime = t1 / Play the media from t0 to t1.
currentTime playTime startTime; mediaStopTime = -1 / Play the media from the current Composition Unit until the last available Composition Unit.
currentTime playTime startTime; mediaStopTime = t1 / Play the media from the current Composition Unit until t1.
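The rows of the table above can be read as a small decision function. The Python sketch below is illustrative only: resolve_play_range, its parameters, and the use of None for "no bound" are hypothetical conventions, not part of this specification. For activation by playTime it follows the table, where playback continues from the current Composition Unit and mediaStartTime is not consulted.

```python
from typing import Optional, Tuple

UNSPECIFIED = -1.0  # -1 means "not specified" for mediaStartTime/mediaStopTime


def resolve_play_range(activated_by: str, media_start: float,
                       media_stop: float, current_cu_time: float
                       ) -> Tuple[Optional[float], Optional[float]]:
    """Return (begin, end) media times; None means no bound on that side."""
    if activated_by == "startTime":
        # None: begin with whatever composition unit is available first.
        begin = None if media_start == UNSPECIFIED else media_start
    elif activated_by == "playTime":
        # mediaStartTime is ignored; continue from the current unit.
        begin = current_cu_time
    else:
        raise ValueError("activated_by must be 'startTime' or 'playTime'")
    # None: play until the last available composition unit.
    end = None if media_stop == UNSPECIFIED else media_stop
    return begin, end


print(resolve_play_range("startTime", 10.0, 20.0, 3.0))
print(resolve_play_range("playTime", 10.0, -1.0, 3.0))
```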
EXAMPLE The following scene shows how to control a video from a MediaControl node. The Script also receives the current video time and can trigger events according to the video time:
[....]
Shape {
  appearance Appearance {
    texture DEF M MovieTexture { url "od=5" }
  }
  geometry Bitmap {}
}
DEF MC MediaControl {
  media USE M
  mediaStartTime 10.0
  mediaStopTime 20.0
}
DEF S Script {
  eventIn SFTime videoTime
  ....
}
ROUTE MC.mediaCurrentTime TO S.videoTime
3 EXTERNPROTO
EXTERNPROTO is an authoring facility that allows PROTOs to be distributed in external libraries and reused across scenes. It is an extension of the PROTO v2 functionality. Adding this functionality requires an update to the PROTOcode syntax and semantics.
3.1.1 PROTOcode
3.1.1.1 Syntax
class PROTOcode(isedNodeData protoData) {
  bit(1) isExtern;
  if (isExtern) {
    MFUrl locations;
  } else {
    PROTOlist subProtos;
  }
  do {
    SFNode node(SFWorldNodeType,protoData);
    bit(1) moreNodes;
  } while (moreNodes);
  bit(1) hasROUTEs;
  if (hasROUTEs) {
    ROUTEs routes();
  }
}
3.1.1.2 Semantics
First, a flag signals whether the prototype is a PROTO, in which case its code is included in the proto declaration, or an EXTERNPROTO, in which case only an external reference is provided.
The EXTERNPROTO opens a BIFSCommand stream that contains a ReplaceScene command with a BIFSScene carrying the PROTO definitions. The code of the EXTERNPROTO is taken from the PROTO in this new scene that has the same ID as the EXTERNPROTO in the current scene. Any nodes contained in the new scene are ignored.
In the case of a PROTO, the PROTOcode contains a (possibly empty) list of the sub-PROTOs of this PROTO in subProtos, followed by the code to execute the PROTO. The code is specified as a set of SFNodes, using a standard SFNode definition with the additional possibility to declare an IS field. Moreover, the PROTO body may contain ROUTEs if the hasROUTEs flag is set to 1.
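The lookup of EXTERNPROTO code by ID can be sketched as follows. This Python sketch is illustrative only: resolve_extern_proto and the fetch_protos callback are hypothetical, the callback stands in for opening the BIFSCommand stream and decoding its ReplaceScene command, and trying the entries of locations in order until one succeeds is an assumption.

```python
from typing import Callable, Dict, List, Optional


def resolve_extern_proto(proto_id: int, locations: List[str],
                         fetch_protos: Callable[[str], Optional[Dict[int, str]]]
                         ) -> str:
    """Return the code of the PROTO with the same ID in the external scene."""
    for location in locations:
        # Hypothetical callback: maps PROTO IDs to PROTO code for the scene
        # found at 'location', or returns None if it cannot be opened.
        protos = fetch_protos(location)
        if protos is not None and proto_id in protos:
            return protos[proto_id]  # nodes in that scene are ignored
    raise LookupError("no PROTO with ID {} found".format(proto_id))


# A stand-in library: one location is unavailable, one holds two PROTOs.
library = {"od=12": {3: "code of PROTO 3", 4: "code of PROTO 4"}}
code = resolve_extern_proto(4, ["od=11", "od=12"], library.get)
print(code)
```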
Annex B
(Informative)
Advanced Synchronization Model (FlexTime Model)
The FlexTime model allows multiple "objects" to be connected temporally using temporal relationships such as co-start, co-end, and meet. Objects are defined as whole Elementary Stream life cycles, but can also be time segments within the duration of an Elementary Stream. The FlexTime Model recognizes that two Elementary Streams, or intervals thereof, can either start at the same time, end at the same time, or be arranged so that the end time of one coincides with the start time of another (they follow each other). By describing the connectivity with time stamps, the synchronization points are not restricted to whole Elementary Stream life cycles, but also apply to time intervals thereof.
Flexible duration
The FlexTime model is based upon a so-called "spring" metaphor. A spring comes with a set of 3 constants: the minimum length beyond which it won’t shrink, the maximum length beyond which it will break, and the optimal length at which it may rest comfortably. Following this spring model, MPEG-4 objects are viewed temporally as springs, each with the corresponding 3 spring constants. The optimal spring length (object playback duration) can be viewed as a hint to aid a receiver to choose a particular duration when more than one value is possible. Note that whereas stretching or shrinking the duration for video implies respectively slowing down or speeding up playback, for a still image, shrinking or stretching is merely holding the display shorter or longer.
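The spring metaphor can be made concrete with a small sketch that picks one playback duration for several objects that must start and end together. It is written in Python for illustration; the Spring class and common_duration function are hypothetical, and averaging the optimal lengths before clamping is an assumption, since the model only treats the optimal length as a hint.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Spring:
    """The three spring constants of one object, in seconds."""
    min_len: float
    opt_len: float
    max_len: float


def common_duration(springs: List[Spring]) -> Optional[float]:
    """Pick a duration every spring can reach, or None if there is none."""
    if not springs:
        return None
    lowest = max(s.min_len for s in springs)   # nobody may shrink below this
    highest = min(s.max_len for s in springs)  # nobody may stretch above this
    if lowest > highest:
        return None  # the springs cannot be reconciled
    # Assumption: aim for the mean of the optimal lengths, then clamp.
    preferred = sum(s.opt_len for s in springs) / len(springs)
    return min(max(preferred, lowest), highest)


video = Spring(min_len=8.0, opt_len=10.0, max_len=12.0)
image = Spring(min_len=11.0, opt_len=14.0, max_len=20.0)
print(common_duration([video, image]))
```

In this example the video would be slowed down and the still image held for a shorter time than its optimum so that both end together.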
Relative start and end time
Two "objects" are synchronized with respect to each other by defining that they either start at the same time, end at the same time, or that the end time of one coincides with the start time of another (they follow each other). Additionally, it is possible to insert a delay between object start and end times, for example to let one object start some length of time after the start of another.
Time Stamp Based TemporalForm
The TemporalForm node carries the information necessary for realizing the FlexTime Model. It provides a framework for temporal groupings: it allows Elementary Stream time intervals to align their start times and end times. It also provides a mechanism for specifying the flexibility of the time segments, as the TemporalForm node carries the minDurations, maxDurations, optDurations, and endSyncPriorities fields for each flexed time segment.
The time segments are identified by Object Descriptor ID and by start and end time stamps in the Elementary Streams described by the Object Descriptor. In the case of BIFS and OD streams, there are two ways to ensure uniqueness of these time stamps. One is to carry conflicting BIFS and OD commands in separate Elementary Streams and index each BIFS and OD command by its time. The other is to make each time stamp unique by varying the timing in a negligible way. Both methods can be applied together: BIFS and OD commands need only be split across separate Elementary Streams when they flex independently of each other and have identical time stamps.
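The second method, varying the timing in a negligible way, can be sketched as follows. This Python sketch is illustrative only: make_unique_ticks is a hypothetical name, time stamps are assumed to be integer clock ticks, and a shift of one tick is assumed to be negligible for the stream's time stamp resolution.

```python
from typing import List


def make_unique_ticks(stamps: List[int]) -> List[int]:
    """Shift repeated time stamps forward by whole ticks until all differ."""
    seen = set()
    unique: List[int] = []
    for stamp in stamps:
        while stamp in seen:
            stamp += 1  # assumption: one tick is a negligible variation
        seen.add(stamp)
        unique.append(stamp)
    return unique


print(make_unique_ticks([900, 900, 900, 1000]))
```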