1. Opening

The 36th WG11 meeting was held in Chicago, IL, US from 1996/09/30 to 1996/10/02 at the kind invitation of ANSI, the American National Standards Body, and was hosted by Motorola.

2. Roll call of participants

Annex 1 gives the attendance list.

3. Approval of agenda

Annex 2 gives the approved agenda.

4. Allocation of contributions

Annex 3 gives the list of submitted documents.

5. Communications from Convenor

There were no special communications.

6. Report of previous meeting

The Convenor apologised for his inability to provide a report of the previous meeting.

7. Processing of NB Position Papers

These papers, from DE, FR, JP and US, were discussed and a response was provided.

8. MPEG Phase 2

8.1 Audio

No activity took place.

8.2 Verification of MPEG-2

8.2.1 Video Quality

No activity took place.

8.2.2 Audio Quality

No activity took place.

8.3 Amendments

8.3.1 Private Data (System #3)

No activity took place.

8.3.2 Multi View Profile (Video #3)

The Disposition of Comments of DAM 3 to ISO/IEC 13818-2 (WG11 N1367) and the text of ISO/IEC AM 3 (WG11 N1366) were approved.

8.4 Part 7 (NBC Audio)

No activity took place.

8.5 Part 10 (DSM-CC Conformance)

No activity took place.

8.6 Workplan

This was approved.

9. MPEG Phase 4

9.1 Requirements

Ver. 1.1 of the MPEG-4 Requirements document (WG11 N1395) and the MPEG-4 profiles document (WG11 N1394) were approved.

9.2 Syntax

This activity took place in the context of the different VMs.

9.3 Tools

9.3.1 Systems

A considerable amount of activity took place in the area of multiplexing.

9.3.2 Natural Audio

A considerable amount of activity took place in the area of synthetic speech.

9.3.3 Synthetic Audio

A considerable amount of activity took place in the area of text-to-speech tools.

9.3.4 Natural Video

A considerable amount of activity took place in the area of video coding tools.

9.3.5 Synthetic Video

A considerable amount of activity took place in the area of face and body animation.

9.4 Verification Models

9.4.1 System

A System VM has not yet been developed.

9.4.2 Video

A further version (4.0) of the Video VM was produced (WG11 N1380). The SNHC part is contained in WG11 N1364.

9.4.3 Audio

A further version of the Audio VM was produced (WG11 N1378). The SNHC part is contained in WG11 N1364.

9.5 Tests

A draft document on MPEG-4 test procedures for July 1997 was approved.

9.6 Call for proposals

Fifteen submissions in response to the SNHC Call were received. A new Call for Synthetic Audio was approved (WG11 N1397).

9.7 Simulation software

A Verification Model Development and Core Experiments document (WG11 N1375) extending over previous documents was approved.

9.8 Working Draft

MSDL WD 1.3 was approved.

9.9 Workplan

This was approved.

10. MPEG Phase 8

A disposition of comments to the Man to Multimedia Service Interface NP was produced (WG11 N1400). In response to an NB comment the NP was retitled as "Multimedia Content Description Interface" and nicknamed MPEG-7.

11. Overall WG11 workplan

This was approved.

12. Terms of Reference

This could not be done because of the short duration of the meeting.

13. Liaison matters

Input documents were considered and liaison letters approved.

14. Administrative matters

14.1 Schedule of future MPEG meetings

The April 1997 meeting will be held in Bristol, UK. Further, the meeting recognised that the critical phase of development of the MPEG-4 standard required an extra meeting in the January-February time frame. The Convenor was asked to take all appropriate measures to secure an invitation for such a meeting.

14.2 Promotion of MPEG

A general document describing MPEG-4 and a press release were approved.

15. Organisation of this meeting

15.1 Tasks for subgroups

Implementation Studies

15.2 Finalisation of meeting allocation

The following joint meetings were held:
SNHC-MSDL Mon 11:00-13:00
Video-ISG Tue 09:00-10:00
Video (MVP)-Test Mon 11:00-13:00
Video-Test-Req-MSDL-SNHC-ISG Tue 10:00-11:00
Test-Audio Tue 12:00-13:00
MSDL-Req Tue 09:00-10:00
Video-Req Wed 09:00-09:30

16. Planning of future activities

The following ad-hoc groups were established:
1372 Adhoc Group MPEG-4 July 1997 Audio/Visual Tests
1390 Adhoc Group on Coding Efficiency
1371 Adhoc Group on Computational Graceful Degradation
1377 Adhoc Group on Core Experiments for MPEG-4 Audio
1389 Adhoc Group on Core Experiments on Multifunctional Coding
1407 Adhoc Group on Definition and Measurement of Statistical Performance Parameters of MPEG-4 Video VM
1392 Adhoc Group on Editing MPEG-4 Video VM Document
1363 Adhoc Group on Editing of SNHC VM Specifications
1406 Adhoc Group on Editing Video WD
1388 Adhoc Group on Error Resilience
1360 Adhoc Group on Face and Body Animation
1370 Adhoc Group on Investigating reduced complexity padding for the Video VM
1409 Adhoc Group on Joint Video-SNHC Technical Issues
1361 Adhoc Group on Media Integration of Text and Graphics
1376 Adhoc Group on MPEG-4 Audio WD Editing and VM Software Implementation
1393 Adhoc Group on MPEG-4 Requirements
1405 Adhoc Group on MSDL Architecture Evolution
1403 Adhoc Group on MSDL Verification Model
1402 Adhoc Group on MSDL Working Draft editing
1404 Adhoc Group on Multiplex Specification and Signaling
1391 Adhoc Group on Region Oriented Texture Coding
1387 Adhoc Group on Shape and Alpha Coding
1362 Adhoc Group on SNHC
1359 Adhoc Group on SNHC Audio
1408 Adhoc Group on Video Low Delay Evaluations

Details of each ad-hoc group can be found in WG11 N1381.

17. Resolutions of this meeting

These were approved (WG11 N1352).

18. A.O.B

There was no other business.

19. Closing

The meeting closed on 1996/10/02 22:00.

Annex 1
Attendance list

Annex 2

Document Register

Source: Pete Schirling

Annex 4
Requirements Meeting Report

Source: Rob Koenen, Chairman MPEG

The Requirements Group had a useful meeting in Chicago, during which many questions were addressed. Not all were answered yet, but good progress was made. It was also good to see the attendance doubling from the last meeting, although the attendance level of the Video Group was not quite matched yet.
The following issues were discussed:
General Requirements Issues
A new version of the Requirements Document was issued (WG11 N1395, MPEG-4 Requirements version 1.1), and it was decided to create a separate Profiles Document (N1394, MPEG-4 Profiles version 1.1). To keep the Profiles and Requirements documents synchronised, the numbering of the Profiles Document starts with version 1.1. In addition, there were minor revisions to the general requirements; these are marked with revision marks, and so are easy to spot.
In a joint meeting with the Systems Group, priorities were set for interworking and compatibility:

Discussion about Profiles (@ flex_0)
A discussion about how profiles should be organised at flex_0 resulted in the following conclusions:

Latency Issues
The Requirements Group recommended after a brief discussion that the Video, Audio, Systems and SNHC groups address latency issues in their core experiments and proposal evaluations.

The Profiles Document
The Real-Time Communications Profile and Object based Storage and Retrieval Profile were discussed. Also, there might be a need for a multimedia broadcast profile. The interested parties were invited to bring requirements to the next meeting. It was noted that people interested in profiles should bring ALL the relevant requirements to MPEG, even when they think these are already covered in the general requirements.

Real-Time Communications Profile
The three levels of the Real-Time Communications Profile were condensed to one. The Requirements Group feels confident with the current profile definition, and the profile was approved. The requirements for it will develop further, following the technical progress in the Video, Audio and Systems Groups. The requirements were essentially provided by the ITU, but the possibility is left open to include object-based capabilities.

Object based Storage and Retrieval Profile
The Requirements Group was happy to receive a request for an "Object based Storage and Retrieval Profile". This is a rough, yet very useful, start for a new profile. It will need many of MPEG-4's object based capabilities, and it will be further developed. It is very likely that a requirement exists for high quality video and audio. The profile was discussed in the joint meeting with the Video Group. A draft is included in the Profiles Document (WG11 N1394), and it will be further developed in the ad hoc group.

Copyright Issues
Copyright Issues were briefly discussed. They are very likely important in MPEG-4, but how to deal with them is not yet entirely clear. The discussion will continue in the Ad Hoc Group for MPEG-4 requirements. The goal of this work will be to derive requirements for copyright protection of MPEG-4 objects and composited scenes. The Requirements Group was glad to receive, through the French National Body, a contribution on copyright issues, and looks forward to meeting with experts during the Brazil meeting to further discuss the issue.

Intra and Still Picture Modes
The Japanese and American National Bodies had requests for good quality intra and still picture modes. The result of the discussions, also derived in the joint meeting with Video:

Annex 5
Systems Meeting Report
Source: O. Avaro, Chairman

1. Architecture

The main results of the discussion on architecture are the following:
- MSDL and SNHC agreed that it will be a great benefit to have one architecture for MPEG-4.
- The current architecture has been adopted by SNHC to develop their APIs.
- MSDL takes into account the SNHC requirement to support no less than VRML 2.0 capabilities.

This means that the current architecture is adopted. The verification model is built on it. The evolution needed to support VRML 2.0 capabilities will be developed in the AHG on Architecture (see below).

The architecture discussion also covered the format of what will be standardised: a procedural format (language+APIs) or a textual format. There were not enough inputs to make progress on this issue. This point will therefore be addressed in the architecture AHG.

2. Flexibility

The current approach (language+APIs) seems to be stable and to provide for the requested flexibility.

3. Composition

The description of the composition of the AV Objects (currently achieved by the description of the render method) seems to be sufficient.

An alternative to a procedural description has been proposed. The current description and the additional benefit of the proposal (e.g. the ability to go easily through scene nodes) should be merged in the new architecture. Phil proposed to achieve this by adding new AV objects called nodes. This proposal will be documented and validated in the architecture group.

The link between the composition and the multiplex has been discussed. Two possible solutions have been proposed:
- the first one is an implicit association of multiplex streams to AV objects, based on the logical structure of the scenes and the logical structure of the multiplex.
- the second one is based on an explicit association through fixed identification of objects. A similar mechanism is used in DSM-CC.
These two solutions need more investigation and more technical proposals to allow an educated choice.
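
The difference between the two options can be sketched as follows. The data structures are hypothetical and only serve as an illustration (nothing of the kind had been specified at this stage): a scene is reduced to a collection of named objects and the multiplex to a collection of stream labels. Implicit association pairs them by their position in the two logical structures; explicit association looks up a fixed identifier known to both sides.

```python
def associate_implicit(scene_objects, mux_streams):
    """Pair objects and streams by their position in the two structures."""
    if len(scene_objects) != len(mux_streams):
        raise ValueError("scene and multiplex structures do not match")
    return dict(zip(scene_objects, mux_streams))


def associate_explicit(scene_objects, mux_streams):
    """Pair objects and streams through a fixed object identifier.

    scene_objects: dict mapping object name -> identifier (hypothetical)
    mux_streams:   dict mapping identifier -> stream label (hypothetical)
    """
    return {name: mux_streams[obj_id] for name, obj_id in scene_objects.items()}


# Implicit: the order of the two lists carries the association.
implicit = associate_implicit(["background", "speaker"], ["stream0", "stream1"])

# Explicit: the streams may arrive in any order, the identifier decides.
explicit = associate_explicit(
    {"background": 7, "speaker": 3},
    {3: "stream1", 7: "stream0"},
)

print(implicit == explicit)
```

In this sketch the implicit variant needs no identifiers but requires both structures to stay aligned, whereas the explicit variant tolerates reordering at the cost of managing identifiers.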

4. Decompression

It was decided during this meeting to freeze the decoding flexibility, since there are no clear requirements from other MPEG groups for flexibility below the level of algorithms and above the level of syntax. The feasibility and the technical approach are however well defined, and studies in this area can be continued if needed.

The flexibility at the level of algorithms is achieved through the definition of the interfaces of decoding process objects. Such definitions have been produced and documented in the WD. The same kind of interfaces will be defined for high level tools such as shape decoders, mesh decoders ...

The encapsulation of an audio decoding algorithm has been provided and extends the current approach to Audio algorithms and tools.

The flexibility at the level of the syntax decoding is achieved with MSDL-S (see below).

Syntax decoding

The main issue for syntax decoding was the following:

The integration of the syntax decoding within the architecture has been looked at in close detail. We therefore now have a good understanding of how to use MSDL-S in the VM, which mainly consists of:
- The description of the syntax in MSDL-S (e.g. Foo)
- The compilation of the description into a C++ or Java class (e.g. Foo)
- The declaration of this syntax class in objects that need to parse data from the bitstream (parsable objects).
- Data will be made available to the parsable objects either through the interface of a get method or at the instantiation of the object.
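
The steps above can be sketched as follows. This is only an illustration, written in Python rather than MSDL-S, C++ or Java: the `BitReader` helper, the field layout of `Foo` (a 4-bit `kind` followed by a 12-bit `length`) and the `FooUser` object are hypothetical and are not taken from the Working Draft. The sketch shows a syntax class standing in for a compiled description, a parsable object that declares it, and the two ways of making data available (through a get method, or at instantiation).

```python
class BitReader:
    """Reads big-endian bit fields from a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            if self.pos >= 8 * len(self.data):
                raise EOFError("bitstream exhausted")
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


class Foo:
    """Stand-in for a class compiled from a syntax description.

    Assumed layout (illustrative only): 4-bit kind, 12-bit length.
    """

    def __init__(self, kind: int = 0, length: int = 0):
        self.kind = kind
        self.length = length

    @classmethod
    def get(cls, reader: BitReader) -> "Foo":
        # Option 1: the data is obtained through a get method.
        kind = reader.read(4)
        length = reader.read(12)
        return cls(kind, length)


class FooUser:
    """A parsable object: it declares the syntax class it needs."""

    def __init__(self, syntax: Foo):
        # Option 2: the parsed data is handed over at instantiation.
        self.syntax = syntax

    def describe(self) -> str:
        return f"kind={self.syntax.kind} length={self.syntax.length}"


reader = BitReader(bytes([0x31, 0x2C]))  # bits: 0011 000100101100
user = FooUser(Foo.get(reader))
print(user.describe())  # kind=3 length=300
```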

This close analysis reveals technical issues that are still unsolved and will be examined in future work, such as:
- Storage of the data for syntax objects containing loops.
- Sequencing of parsing and decoding operations
- Binary description of the MSDL-S ...

A compiler from MSDL-S to C++ has been provided by Yihan. It will allow for validation and testing and will be the basis of a set of useful tools for other groups (such as syntax checker, compliance testing...).


Extensive discussions were held by the mux experts. The general outputs are:
- A prioritization of interworking requirements between multimedia terminals. The priority is first with MPEG-4 terminals through various networks, then with previous MPEG standards, then with others.
- The clarification of mux goals. Mux will provide the ability to multiplex content information streams and will develop the appropriate signaling to configure the mux. Error protection does not substitute for network tools but complements them if necessary.
- There is no clear need for an object oriented description of the mux. However, the integration of the mux in the current VM may be facilitated by such an approach. This approach also fits well with mux tools such as error protection or encryption. Further studies are needed to establish the cost/added value of the approach.
- Synchronisation of mixed media types can be achieved using MPEG-2 like mechanisms within the context of VRML 2.0. The evolution of the architecture should define how these mechanisms can take place.
- Close collaboration with ITU is desired, since there is still some room to make H.223/A evolve to take into account MPEG-4 requirements. In any case, interworking between the two standards should be facilitated as much as possible.

More information can be provided by Carsten who chaired this mainly parallel meeting.


The need for signaling was raised within the multiplex activities, which foresee first defining how the configuration is achieved. More generally speaking, MPEG-4 has a strong need for signaling. These needs have been drafted and will be forwarded to the DSM-CC group, which took responsibility for signaling in the previous MPEG standard. A close collaboration between the two groups should be defined.

Concerning multiplex signaling, two existing standards meet the goals: DSM-CC and H.245. The latter currently has the preference of the mux experts, since the coding of its messages is efficient and its semantics are closer to the multiplex needs. In any case, before making any choice, the MSDL group expects expertise from, and coordination with, DSM-CC.

Working draft and Verification Model

All these discussions and decisions will be made available in the edition of a working draft and a definition of a complete verification model.

Annex 6
Video Meeting Report
Source: T. Sikora, Chairman

The primary focus in the video group was dedicated to the review of Core Experiments and the progression of the MPEG-4 Video Verification Model (VM) to version 4.0. Joint meetings were held with the test group, the requirement group, as well as with the MSDL group to align activities.

The MPEG-2 Multiview Profile was promoted to the status of an Amendment (AM).

Following the results of the Core Experiments or problems detected, a new version 4.0 of the MPEG-4 Video VM (doc. N1380) was released. The main improvements concern the coding efficiency of Intra frames, the coding efficiency of motion compensated video at very low bit rates, the computationally efficient implementation of the Macroblock Padding procedure for arbitrarily shaped VOPs, and error robustness in error prone environments.

In particular:

* The Chrominance Subsampling description was improved, namely the way chrominance samples are computed in the borders of arbitrary shaped objects.

* The Motion Vectors Difference range was corrected.

* The Padding of arbitrarily shaped VOPs was changed from frame-based padding to Macroblock-based padding.

* Improved error resilience through the introduction of an error resilient syntax and error resilience tools, with the main changes:
- A flag to enable/disable Error Resilience Mode.
- Byte alignment for the session start code.
- Resynchronisation markers are introduced in a row by row fashion.

* The Intra DC prediction was changed and a DCT-domain AC prediction was introduced to increase coding efficiency in Intra VOPs. In particular:
- At the MB layer a new flag (ACpred_flag) was introduced to enable/disable AC prediction at MB level.

* The VOP formation was changed to increase coding efficiency in arbitrarily shaped VOPs, in particular the computation of the bounding box changed in order to maximize the number of transparent MBs.

* A deblocking filter was introduced in the coding loop to minimize blocking artifacts at very low bit rates. A flag is available in the VOL to disable the filter if required.

* The Temporal Referencing section has now a better description.

* Spatial scalability was revised in order to eliminate inconsistencies.

* The DBQUANT semantics were revised.

* The Bitstream syntax section starts with improved definitions.

* The description of the Separate mode for I, P, and B VOPs was updated.

Promising techniques were identified for coding efficiency of video. Discussions on additional functionalities took place in this context. In particular the Sprite Coding technique currently investigated in Core Experiments provides significant improvement in terms of coding efficiency for VOP-layered approaches and provides important additional functionality for additional object-based manipulation of video.

Major improvements have been demonstrated for the efficiency of shape coding in various Core Experiments. It is foreseen to adopt major changes for the VM related to shape coding at the Brazil meeting.

Many Core Experiments will continue until the Brazil meeting. It is expected that the number of Core Experiments related to new techniques will reduce in future meetings.

Annex 7
SNHC Meeting Report
Source: G. Rajan, Acting Chairman

1. Evaluation of Proposals Received in Response to CFP

The group received 1 proposal in Geometry Compression, 7 contributions in the area of Face Animation, 2 relevant to Body Animation, 2 in Text-to-Speech synthesis, and 3 in the area of Media integration.

Apart from evaluating the technical merits and innovations of the contributions, we also decided to work on the integration of these contributions into one architectural framework. For the purpose of developing one right away, the proposed Systems architecture was deemed to be suitable for the moment.

3 areas of work for VM 1.0 were identified, viz., Face and Body Animation, Media Integration of Text and Graphics, Text-to-Speech and its possible interface to Face animation (there was one contribution from AT&T in this particular area).

SNHC agreed to adopt the MSDL architecture for the development of their VM 1.0 realizing that it would be useful to have a common architecture among the two groups.
MSDL and SNHC also agreed that, at the least, VRML 2.0 functionalities should eventually be supported within MPEG-4. In that regard, the SNHC group would drive the architecture requirements for the MSDL group and contribute to its evolution via the AHG on Architecture.

2. Work on SNHC VM 1.0

Four main areas were concentrated on for this document: Face and Body parameter description and animation, Media integration of text and graphics, Text-to-Speech synthesis and associated interfaces with face animation.

SNHC audio functionality architecture was included in this document (w1364) for generating discussion and encouraging participation/contributions.

A separate document (w1365) was generated for the enumeration of the face and body {description, animation} parameters.

The text and graphics related functionalities generated a lot of discussion. Added to the mixture was a brief note from ITU-LBC group asking for a joint effort on exactly the same functionalities. At this stage, it was decided to confine the functionality to 2D text and graphics related with further extensions to be explored in the appropriate AHG. As one might notice, a reasonable part of the Media Integration section in the VM document was "borrowed" from the T.126 files.

Although a lot of the APIs are documented in the SNHC VM, they need to be integrated into the Systems architecture in order to ratify their functionalities.

3. New CFP on SNHC Audio

Since we had not received any contributions in the area of SNHC audio, a fresh Call for Proposals on the Integration of Synthetic audio within the MPEG-4 framework was issued. Some initially understood the Call to be asking for fresh contributions in the area of synthetic audio coding algorithms, but that was clarified subsequently.

4. Ad Hoc Groups

Five SNHC AHGs were established.

Annex 8
Audio Meeting Report
Source: B. Edler, Acting Chairman

Opening of the meeting
The MPEG/Audio Subgroup meeting was held during the 36th meeting of WG11 in Chicago, USA on Sept 30 to Oct 2, 1996. The list of participants is given in Annex A-I. The acting chairman welcomed the delegates to the meeting and outlined the work for the three days.

Approval of agenda
The agenda as presented in Annex A-II was approved.

Allocation of contributions
All contributions were listed (see Annex A-VI) and allocated to the agenda. All contributions were presented in Audio plenary.

Communications from the Chair
Mr. Edler reported on the Sunday evening Chairman's meeting.

Tampere meeting report
The Audio Subgroup portion of the Tampere meeting report, July 1996, had been previously distributed and its MPEG-4 relevant parts were approved.

Report of ad hoc group activities
The reports of the ad hoc groups (ad hoc group on MPEG-4 audio VM, M1296 - Edler, and ad hoc group on MPEG-4 audio core experiments, M1307 - Brandenburg) were given in the opening plenary. The table of currently available VM modules in M1296 was updated and included in output document N1379 in order to reflect the latest changes.

Disposition on National Body Comments (DoC)

The input documents listed in Annex A-VI were discussed in the audio plenary. Recommendations were prepared in reaction to the USNB contributions and to M1362. Document M1271 will be taken into consideration for the design of core experiments on error resilience. Three task groups were formed in order to prepare a textual description of the Audio VM as indicated in Annex A-IV. In addition some flex-0 possibilities were summarized. The main results of the task groups were presented to and discussed by the audio plenary and their work resulted in the Audio VM 2.0 document N1378.

Mr. Kaneko reported the status of SNHC audio.

In a joint meeting with most of the other subgroups the important issues of the July 1997 test were discussed.

Preparation of a press statement
Contributions to the meeting press statement were prepared and approved by the Audio Subgroup.

Liaison matters

Discussion of unallocated contributions
A document entitled "Use of 'Mixed Voiced' mode for 2.0 kbps parametric core of the VM" which was not registered as an input document was presented by Mr. Nishiguchi.

Recommendations for final plenary
A list of recommendations was prepared for approval at the final MPEG plenary meeting. The following ad-hoc groups were established:
a) Ad Hoc Group on SNHC Audio, N1359 - Kaneko
b) Ad Hoc Group on MPEG-4 Audio WD Editing and VM Software Implementation, N1376 - Grill
c) Ad Hoc Group on Core Experiments for MPEG-4 Audio, N1377 - Brandenburg

The output documents given in Annex A-VII were produced by the Audio Subgroup.

Agenda for next meeting
The agenda for the MPEG Audio Subgroup meeting in November '96 in Maceio, Brazil was already approved during the meeting in July '96 in Tampere (see Annex A-III).


Closing of the meeting

Annex A-I
Annex A-II
Agenda for the 36th MPEG/Audio Subgroup Meeting in Chicago, September 1996

Annex A-III
Annex A-IV
Audio Task Groups

T/F tool description - Brandenburg
Bosi Brandenburg Iwakami
Koike Kroon Lindqvist
Lueck Mainard Moriya
Oomen Schnurr Thi
LPC tool description - N.N.
Kroon Lindqvist Nomura
Su Tan Tanaka
Parametric tool description - Nishiguchi
Iijima Kroon Matsumoto
Nishiguchi Purnhagen Su

Annex A-VI
Input Documents

Annex A-VII
Output Documents
No.    Authors         Title
N1378  Audio Subgroup  MPEG-4 Audio Verification Model 2.0
N1379  Edler           MPEG-4 Audio Software Library Overview
N1398  Moriya          Prescreening Listening Test Procedure for Core Experiments of MPEG-4 Audio

Annex 9
Test Meeting Report

Source: Laura Contin, Chairman


The MPEG Test Subgroup met in Chicago during the 36th meeting of WG11.
The following items were addressed:
1. Results of the verification tests on MPEG-2 Multiview profile
2. Test procedures to be used in July '97 tests.

MPEG-2 Multiview profile tests

The results of tests carried out on stereo sequences coded with the MPEG-2 multiview profile (ISO/IEC 13818-2/AM3) were presented and discussed. The tests were carried out at three different test sites located in Japan (NHK), Germany (HHI) and Canada (CRC). Taking into account the different equipment used for displaying the sequences, a considerable consistency among the test sites was observed.

From the results it can be concluded that, generally speaking, at the tested bit rates viewers did not perceive overly annoying coding artifacts. Details about test procedures, laboratory set up and experimental results can be found in document WG11/N1373. This concludes the activities on the MPEG-2 multiview profile.

July '97 tests

Representatives of all the MPEG subgroups participated in a meeting to discuss goals and experimental conditions for the MPEG-4 tests scheduled for July '97.

These tests will be aimed on the one hand at comparing the audio and video VMs with both existing standards and newly emerging technology, and on the other hand at checking the status of the VMs against the requirements for the standard. In other words, the purposes of the July '97 tests should be both competition between the VMs and new proposals and verification of the standard itself.

Concerning the competition tests, the assessment methods and procedures will basically be those already used in the previous audio and video tests, apart from some exceptions such as the introduction of the Double Stimulus Continuous Quality Evaluation (DSCQE) for testing error robustness. Document WG11/N1374 provides a preliminary description of the test methods and procedures to be used in the July '97 tests. The test subgroup has asked for the support of audio and video experts to revise this document, in particular concerning the definition of testing conditions (e.g. source material, pre-processing, coding parameters, error conditions, etc.). The following individuals will coordinate the revision of particular sections of the document:
Responsible   Sections of doc. WG11/N1374 to be revised
B. Edler      Section 2 - Audio tests
J. Muller     Section 3.3, 3.4, 3.5.1 - Video compression tests
M. Frater     Section 3.3, 3.4, 3.5.2 - Video error robustness tests
J. Osterman   Section 3.3, 3.4, 3.5.3 - Video content-based interactivity tests

A revised version of the document will be prepared by the next MPEG meeting.

Concerning the verification tests, several possible evaluations have been taken into account. The Test subgroup proposed audiovisual tests, the Implementation subgroup made a proposal to evaluate graceful degradation, and SNHC proposed tests to compare MPEG-2 and MPEG-4 on text overlay. The Test subgroup also suggested task-based tests to evaluate facial animation performance (a possible task is, for example, the recognition of emotions). If a real-time decoder is available by July '97, interactive tests on audio, video, MSDL and maybe also SNHC could be carried out. For the verification tests, it would be advisable to focus attention on particular profiles and levels and to tailor the tests to a representative application of such profiles/levels. More thought is needed on the verification tests, and suggestions from the MSDL, Implementation and Requirements subgroups are expected.

The last, but very important, point discussed was the need for new sequences.

The following material is absolutely needed:
1. audio-visual sequences lasting at least 10 seconds.
2. (audio-)visual sequences lasting at least 2 minutes.

Moreover, to fairly evaluate codecs' performance, it would be advisable to use new audio/speech and video material. This is because the codecs under test are likely to have been tuned on the currently available source material.

The Test subgroup has invited all MPEG members to bring to Maceio any kind of material that could be used for the July '97 tests. A material screening session will be arranged during the next meeting.

Annex 10
Implementation Studies Meeting Report

Source: Paul Fellows, Chairman

The meeting had a broader attendance in terms of expertise and application profiles and benefited as a result. As there were a number of new members to both the group and MPEG-4, some time was spent describing the work of the group in the past and the general approach that had been taken.

This was the first meeting at which the ISG for MPEG-4 was able to deliver some concrete results in terms of complexity assessment of the Video VM. The chairman would like to thank in particular Simon Winder, Peter Kuhn, Franck Mamelet and Jean Gobert for the excellent work carried out before the meeting.

Using a performance orientated ANSI C implementation of the video verification model (VM2.2) produced by ACTS EMPHASIS, a tenfold increase in performance has been achieved against the existing software implementations of the VM. Detailed profiling of this software identified the key performance critical components of the standard. Based upon this information, the video and implementation studies groups jointly started an activity to reduce the complexity of the identified modules. There is still considerable scope to further improve the performance of the software and then in the future to include platform-dependent optimizations. The exercise will be repeated at some later date when a more stable version of the Video VM becomes available, i.e. the group will not provide results for each MPEG meeting.

Another key activity performed by the implementation studies group was the identification of ways to gracefully degrade the computational complexity under conditions of high processing demands from either MPEG-4 itself or other co-existent applications, for instance when the decoder runs in software on a personal computer. These techniques, if proven feasible, will lead to increased service availability to the user.

During the meeting the following implementation documents were reviewed:

1192 Marco Mattavelli Report of the Ad-hoc group on computational graceful degradation
1193 Marco Mattavelli, Sylvain Brunetton Measures of the range of computational based scalability
1199 Keith Kenemer, Dmitriy Korchev, Michael Zeug Complexity Analysis of the Decoder used in the P5 Core Experiment
1257 Peter Kuhn Complexity Analysis of the MPEG-4 Video Verification Model Decoder
1258 Peter Kuhn AHG Report on Video Verification Model Complexity Assessment
1288 Gilles PRIVAT, Ivan LE HIN Hardware evaluation of shape decoding APIs

Graceful degradation
Many issues remain open for contributions and/or discussion. A joint meeting between Test, Requirements and Implementation was held to discuss including graceful degradation in the July 97 Tests. Further work will continue on this subject and will be integrated within an Implementation Model (IM) of the Verification Model (VM).

Complexity Analysis of MPEG-4 Video Verification Model Decoder
Document M1257.doc describes the experimental conditions and the results of the complexity analysis. The results of most interest are given below. The key issue identified was that the alpha padding technique described for VM2.2 consumed 43% of the computation time as well as by far the largest share of memory bandwidth.

Instruction-level profiling of a speed-optimized VM 2.2 decoder (ACTS EMPHASIS) was performed, showing that QCIF real-time decoding of 4 VOPs (sequence coastguard) is possible on current Pentium and Ultrasparc architectures. The results show that 280 RISC-MIPS and 298 MByte/s of memory (i.e. cache) access bandwidth are required for this scenario. It can also be seen that clearly written, flexible, extensible but not speed-optimized code (e.g. ACTS Momusys VM) is not suited to implementation complexity analysis (but is extremely valuable for other core experiments).

The results also show that the CPU instructions used by a software implementation on a real system consist of about one third arithmetic instructions, one third memory access instructions and about 25% control instructions.

Distribution of Instruction Usage and Memory Bandwidth Usage

Sequence coastguard, 4 Vops, 30 fps, 10s, QCIF, Ultrasparc

Columns: % of time from gprof; calls from iprof (= gprof); Arith., Control, Memory and Sum (incl. other) in mega instructions per second from iprof; Mem. BW is memory bandwidth in MByte/s from iprof.

Function                 % time   Calls    Arith.  Control  Memory  Sum     Mem. BW
pad_alpha                43.17    3588     55      42       69      191     215
internal_mcount          17.18    (library function used by gprof only)
add_one_vop_alpha        9.26     1200     10      0.3      9.7     21      18
idct                     5.76     84613    5.7     0.4      3.0     9.6     7.4
decode_binary_shape      4.23     49125    3.2     1.2      2.8     8.5     7.8
interp_2h                2.34     109936   2.6     0.1      1.7     4.6     2.2
render_inter_texture     2.16     84082    1.5     0.1      2.8     4.5     5.7
get_macroblock_texture   2.16     43434    1.2     1        1.7     5.4     5.5
interp_c                 1.44     99251    0.5     0.1      1.4     2.0     1.9
pad_noalpha_umv          1.26     3588     0.8     1        1.0     2.1     3.6
decode_level_0_to_2      1.17     49125    1.3     0.7      1.3     3.9     3.6
showbits                 1.08     864096   2.0     0.7      1.7     5.0     4.4
interp_4                 0.90     20782    0.8     0.02     0.5     1.3     0.6
test_and_swap            0.63     221976   0.4     0.3      0.6     1.5     1.4
get_coded_prediction     0.63     43280    0.4     0.3      0.8     1.8     2.4
mcount                   0.63     (library function used by gprof only)
clear_viewport           0.54     301      0.2     0.1      0.4     0.8     1.6
get_non_obmc_block       0.45     142821   0.2     0.2      0.4     0.9     1.1
fill_16x16               0.45     22623    0.151   0.04     0.6     0.9     0.8
flushbits                0.36     862895   0.4     0.3      0.7     1.8     2.9
get_TCOEF                0.36     229038   0.6     0.3      0.6     1.8     1.6
fill_4x4                 0.27     290684   0.1     0.06     0.5     0.7     0.6

The data stated above are the time and instructions spent in the listed functions. Subfunctions are accounted for separately and are not included in the statistics of their calling functions. GNU gprof delivered sampled function execution times and call counts, and iprof delivered exact function call numbers and exact instruction usage statistics.
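The per-function figures can be tallied to see how the instruction classes divide up and how far a single function dominates. The sketch below uses only the five most expensive profiled functions from the table above; the helper names are illustrative, and shares computed over this subset are not the whole-decoder figures quoted in the text.

```python
# Figures copied from the table above (coastguard, QCIF, Ultrasparc).
# Tuple layout: arithmetic, control, memory, sum incl. other (all in
# mega instructions per second), memory bandwidth (MByte/s).
ROWS = {
    "pad_alpha": (55, 42, 69, 191, 215),
    "add_one_vop_alpha": (10, 0.3, 9.7, 21, 18),
    "idct": (5.7, 0.4, 3.0, 9.6, 7.4),
    "decode_binary_shape": (3.2, 1.2, 2.8, 8.5, 7.8),
    "interp_2h": (2.6, 0.1, 1.7, 4.6, 2.2),
}


def instruction_mix(rows):
    """Return each instruction class as a fraction of the summed MIPS."""
    total = sum(r[3] for r in rows.values())
    return {
        "arith": sum(r[0] for r in rows.values()) / total,
        "control": sum(r[1] for r in rows.values()) / total,
        "memory": sum(r[2] for r in rows.values()) / total,
    }


def share_of_total(rows, name, column):
    """Fraction of one column's total that is due to a single function."""
    return rows[name][column] / sum(r[column] for r in rows.values())


mix = instruction_mix(ROWS)
pad_alpha_mips_share = share_of_total(ROWS, "pad_alpha", 3)
pad_alpha_bandwidth_share = share_of_total(ROWS, "pad_alpha", 4)
```

Extending ROWS with the remaining table rows gives the mix over all listed functions; the two gprof-only library entries carry no instruction counts and would be left out.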

Simulation time
With gcc 2.7.2 the uninstrumented decoder runtime on a 167 MHz Ultrasparc was 15 seconds (20 fps) when writing to the X11 display and 10 seconds (30 fps) when writing into memory. Note, however, that these figures are for QCIF and that there is still a long way to go to standard-definition TV resolution.
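The frame rates follow directly from the sequence length (10 s at 30 fps, i.e. 300 frames) and the measured runtimes. The sketch below reproduces that arithmetic and, as a rough indication of the remaining gap, compares pixel counts per frame. The 720 x 576 figure is an assumption standing in for "standard definition" (the 625-line ITU-R BT.601 active picture); the report does not name a target format, and decoding cost need not scale exactly with pixel count.

```python
FRAMES = 30 * 10  # coastguard test sequence: 30 fps for 10 s


def achieved_fps(frames, runtime_s):
    """Average decoded frames per second over the whole run."""
    return frames / runtime_s


QCIF = (176, 144)      # width, height in luminance samples
SDTV_625 = (720, 576)  # assumption: ITU-R BT.601 625-line active picture


def pixel_ratio(big, small):
    """How many times more pixels per frame 'big' has than 'small'."""
    return (big[0] * big[1]) / (small[0] * small[1])


fps_display = achieved_fps(FRAMES, 15)  # writing to the X11 display
fps_memory = achieved_fps(FRAMES, 10)   # writing into memory
gap = pixel_ratio(SDTV_625, QCIF)       # roughly 16 times more pixels
```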

Complexity Analysis of the Decoder used in the P5 Core Experiment

The results of a preliminary complexity analysis of the Iterated Systems decoder used for Core Experiment P5 were reviewed (M1199.DOC). The analysis indicated that this scheme is particularly inexpensive to implement and that real-time performance is possible on a 166 MHz Pentium (the implementation was optimised for this platform).

Execution Times

The data presented below shows the measured execution times of the complete decoder (including unpacking and color conversion) on a 166 MHz Pentium.
Bit Rate (bps)    Frame Rate (fps)
Table 1: Decoder performance vs. bit rate at QCIF resolution

Bit Rate (bps)    Frame Rate (fps)
Table 2: Decoder performance vs. bit rate at CIF resolution

The decoding algorithm is a very simple paste function with an intensity adjustment. No floating point variables are required. The greatest memory usage involves storing the current and previous frames. Increases in computational power requirements are directly proportional to image size. Profiling of the decoder showed that the computational complexity of the paste function in the decoder is significantly lower than the computational complexity of YUV to RGB color conversion.
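To make the description concrete, the following is a minimal sketch of what an integer-only "paste with intensity adjustment" could look like: a block is copied from the previous frame into the current frame with a constant offset added and the result clamped to the 8-bit range. This is an illustration under assumptions, not the decoder described in M1199; the function name, the block addressing and the use of a single additive offset are all hypothetical.

```python
def paste_block(prev, cur, src_xy, dst_xy, size, offset):
    """Copy a size x size block from prev into cur with an intensity offset.

    prev, cur -- frames stored as lists of rows of integer samples (0..255)
    src_xy    -- (x, y) of the block's top-left corner in prev
    dst_xy    -- (x, y) of the block's top-left corner in cur
    offset    -- integer added to every sample; no floating point is used
    """
    sx, sy = src_xy
    dx, dy = dst_xy
    for row in range(size):
        for col in range(size):
            value = prev[sy + row][sx + col] + offset
            cur[dy + row][dx + col] = max(0, min(255, value))  # clamp to 8 bits


# Tiny 4 x 4 example frames.
previous = [[10, 20, 30, 40],
            [50, 60, 70, 80],
            [90, 100, 110, 120],
            [130, 140, 250, 255]]
current = [[0] * 4 for _ in range(4)]
paste_block(previous, current, src_xy=(2, 2), dst_xy=(0, 0), size=2, offset=10)
```

The work per block is one addition and one clamp per sample, so the cost grows in proportion to the number of pixels, and the only large buffers are the current and previous frames.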

Hardware evaluation of shape decoding APIs

Document M1288.DOC discussed the merits of formalising the API interfaces between the decoder and a compositor tool, and cited the benefits summarised here. The contribution studied a set of shape representations which could be used to isolate higher-level 2D object descriptions from pixel-level back-end rendering/compositing operations. Rather than a single interface, these descriptions provide a consistent set of intermediate representations which could go from the higher to the lower levels, from contour/skeleton to binary masks and addressing patterns. Either a traditional processor-memory or an associative-processing/logic-enhanced memory model can be used to support the lowest levels of these representations in hardware, whereas higher-level representations could be software-converted. In both cases, the inclusion of these representations in standardized APIs makes it possible to leverage all capabilities of the underlying hardware while maintaining cross-platform interoperability.

The idea is that a shape representation included in an API is a pivotal representation to which others are converted before they access hardware resources. To date, the only equivalent to this for video is the raster-scan format, which unifies other formats by their lowest common denominator and precludes parallel processing.

Annex 10
Liaison meeting report

The Liaison group considered the following input documents:

SC29/N1744 from ITU-T SG15 on video and audio issues. Most of this document was discussed in Tampere, where a reply WG11 N1305 was produced.

SC29/N1730 from SC21 on progress in ASN.1. The document was distributed to the MSDL Mux group.

SC29/N1746 from IEC/TC100 was noted. No action was taken.

SC29/N1749 from DAVIC was discussed in Tampere. No further action.

MPEG96/1171 from ITU-T LBC group regarding text and graphics overlays was discussed. SNHC will provide these features.

MPEG96/1172 from ITU-T LBC group regarding cooperation and consideration of existing ITU-T standards.

MPEG96/1173 from ITU-T LBC group regarding MSDL-M specification. MSDL decided to try to propose changes to H.223A for its needs.

WG11/N1368 was produced for sending to ITU-T SG15 on the subjects of overlays and cooperation.

WG11/N1369 was produced for sending to ITU-T SG15 on the subject of multiplexing.

Karlheinz Brandenburg was approved as temporary liaison to ITU-R WP 10C.
Philip Chou and Ganesh Rajan were approved as liaisons to VRML.