Looking inside an MPEG meeting

Introduction

In There is more to say about MPEG standards I presented the entire spectrum of MPEG standards. No one should deny that it is an impressive set of disparate technologies integrated to cover fields connected by the common thread of Data Compression: Coding of Video, Audio, 3D Graphics, Fonts, Digital Items, Sensors and Actuators Data, Genome, and Neural Networks; Media Description and Composition; Systems support; Intellectual Property Management and Protection (IPMP); Transport; Application Formats; API; and Media Systems.

How on earth can all these technologies be specified and integrated in MPEG standards to respond to industry needs?

This article will try and answer this question. It will do so by starting, as many novels do, from the end (of an MPEG meeting).

Let’s start from the end (of an MPEG meeting)

When an MPEG meeting closes, the plenary approves the results of the week, marking the end of formal collaborative work within the meeting. Back in 1990 MPEG developed a mechanism – called “ad hoc group” (AhG) – that lets a form of collaboration continue between meetings. This mechanism allows MPEG experts to keep working together, albeit with limitations:

  1. In the scope, i.e. an AhG may only work on the areas identified by the mandates (in Latin ad hoc means “for a specific purpose”). Of course experts are free to work individually on anything and in any way that pleases them;
  2. In the purpose, i.e. an AhG may only prepare recommendations – in the scope of its mandates – to be submitted to MPEG. This is done at the beginning of the following meeting, after which the AhG is disbanded;
  3. In the method of work, i.e. an AhG operates under the leadership of one or more Chairs. Clearly, though, the success of an AhG depends very much on the attitude and activity of its members.

On average some 25 AhGs are established at each meeting. There is no one-to-one correspondence between MPEG activities and AhGs. Actually AhGs are great opportunities to explore new and possibly cross-subgroup ideas.

Examples of AhG titles are

  1. Scene Description for MPEG-I
  2. System technologies for Point Cloud Coding (PCC)
  3. Network Based Media Processing (NBMP)
  4. Compression of Neural Networks (NNR).

What happens between MPEG meetings

An AhG carries out collaborative work by different means: reflectors, teleconferences and physical meetings. Physical meetings can only be held if they were scheduled in the AhG establishment form. Unscheduled physical meetings may only be held if there is unanimous agreement of those who subscribed to the AhG.

Most AhGs hold scheduled meetings on the weekend that precedes the next MPEG meeting. These are very useful to coordinate the results of the work done and to prepare the report that all AhGs must make to the MPEG plenary on the following Monday.

AhG meetings, including those in the weekend preceding the MPEG meeting, are not formally part of an MPEG meeting.

An MPEG meeting at a glance

Chairs meeting

MPEG chairs meet three times during an MPEG week:

  1. On Sunday evening to review the progress of AhG work, coordinate activities impacting more than one Subgroup and plan activities to be carried out during the week including the need for joint meetings;
  2. On Tuesday evening to assess the result of the first two days of work, review the work plan and time lines based on the expected outcomes and identify the need for new joint meetings;
  3. On Thursday evening to wrap up the expected results and review the preliminary results of the week.

Plenary meetings

During an MPEG week MPEG holds 3 plenaries:

  1. On Monday morning to make everybody aware of the results of work carried out since the last meeting and to plan the work of the week. AhG reports are a main part of it, as they are presented and, when necessary, discussed;
  2. On Wednesday morning to make everybody aware of the work done in all subgroups in the first two days and to plan work for the next two days;
  3. On Friday afternoon to approve the results of the work of Subgroups, including liaison letters, to establish new AhGs etc.

Subgroup, Breakout Group and Joint meetings

Subgroups start their meetings on Monday afternoon. They review their own activities and kick off work in their areas. Each subgroup assigns activities to breakout groups (BoG) who meet with their own schedules to achieve the goals assigned. Each Subgroup may hold other brief meetings to keep everybody in the Subgroup in sync with the general progress of the work.

For instance, the activities of the Systems Subgroup are currently: File format, DASH, OMAF, OMAF and DASH, OMAF and MIAF, MPEG Media Transport, Network Based Media Processing and PCC Systems.

The MPEG structure is designed to facilitate interactions between different Subgroups, and between BoGs from different Subgroups, on matters that affect more than one of them because they sit at the interface of MPEG subsystems. For example, the table below lists the joint meetings that the Systems Subgroup held with other Subgroups at the January 2019 meeting.

Table 1 – Joint meetings of the Systems Subgroup with other Subgroups

Systems meeting with | Topics
Reqs, Video, VCEG    | SEI messages in VVC
Audio, 3DG           | Scene Description
3DG                  | Systems for Point Cloud Compression
3DG                  | API for multiple decoders
Audio                | Uncompressed Audio
Reqs, JVET, VCEG     | Immersive Decoding Interface

NB: VCEG is the Video Coding Experts Group of ITU-T Study Group 16. It is not an MPEG Subgroup.

On Friday morning all Subgroups approve their own results. These are automatically integrated in the general document to be approved by the MPEG Plenary on Friday afternoon.

Advisors meeting

On Monday evening, an informal group of experts from different countries examines issues of general (non-technical) interest. In particular it calls for meeting hosts, reviews proposals of meeting hosts, makes recommendations of meeting hosts to the plenary etc.

A bird’s eye view of an MPEG meeting

Figure 1 depicts the workflow described in the paragraphs above, starting from the end of the N-1 th meeting to the end of the N-th meeting.

Figure 1 – A snapshot of MPEG works from the end of a meeting to the end of the next meeting

What is “done” at an MPEG meeting?

There are around 500 of the best worldwide experts attending an MPEG meeting. It is an incredible amount of brain power that is mobilised at an MPEG meeting. In the following I will try and explain how this brain power is directed.

An example – the video area

Let’s take as example the work done in the Video Coding area at the March 2019 meeting.

The table below has 3 columns:

  1. The standards on which work is done (Video has worked on MPEG-H, MPEG-I, MPEG-CICP, MPEG-5 and Explorations)
  2. The names of the activities and
  3. The types of documents resulting from the activities (see the following legend for an explanation of the acronyms).

Table 2 – Documents produced in the video coding area

Std  | Activity                                        | Document type
H    | High Efficiency Video Coding (HEVC)             | TM, CE, CTC
I    | Versatile Video Coding (VVC)                    | WD, TM, CE, CTC
I    | 3 Degrees of Freedom + (3DoF+) coding           | CfP, WD, TM, CE, CTC
CICP | Usage of video signal type code points (Ed. 1)  | TR
CICP | Usage of video signal type code points (Ed. 2)  | WD
5    | Essential Video Coding                          | WD, TM, CE, CTC
5    | Low Complexity Enhancement Video Coding         | CfP, WD, TM, CE, CTC
Expl | 6 Degrees of Freedom (6DoF) coding              | EE, Tools
Expl | Coding of dense representation of light fields  | EE, CTC

Legend

TM: Test Model, software implementing the standard (encoder & decoder)

WD: Working Draft

CE: Core Experiment, i.e. definition of an experiment that should improve performance

CTC: Common Test Conditions, to be used by all CE participants

CfP: Call for Proposals (this time no new CfP produced, but reports and analyses of submissions in response to CfPs)

TR: Technical Report (ISO document)

EE: Exploration Experiment, an experiment to explore an issue because it is not mature enough to be a CE

Tools: other supporting material, e.g. software developed for common use in CEs/EEs

What is produced by an MPEG meeting

Figure 2 gives the number of activities for each type of activity defined in the legend (and others that were not part of the work in the video area). For instance, out of a total of 97 activities:

  1. 29 relate to processing of standards through the canonical stages of Committee Draft (CD), Draft International Standard (DIS) and Final Draft International Standard (FDIS) and the equivalent for Amendments, Technical Reports and Corrigenda. In other words, at every meeting MPEG is working on ~10 “deliverables” (i.e. standards, amendments, technical reports or corrigenda) in the approval stages;
  2. 22 relate to working drafts, i.e. “new” activities that have not entered the approval stages;
  3. 8 relate to Technologies under Consideration, i.e. new technologies that are being considered to enhance existing standards;
  4. 8 relate to requirements, typically for new standards;
  5. 6 relate to Core Experiments;
  6. Etc.

 

Figure 2 – Activities at an MPEG meeting

Figure 2 does not provide a quantitative measure of “how many” documents were produced for each activity or “how big” they were.  As an example, Point Cloud Compression has 20 Core Experiments and 8 Exploration Experiments under way, while MPEG-5 EVC has only one large CE.

An average level of activity at the March 2019 meeting can be obtained by dividing the number of output documents (212) by the number of activities (97), i.e. about 2.2 documents per activity.

Conclusions

MPEG holds quarterly meetings with an attendance of ~500 experts. If we assume that the average salary of an MPEG expert is 500 $/working day and that every expert stays 6 days (to account for attendance at AhG meetings), the industry investment in attending MPEG meetings is 1.5 M$/meeting or 6 M$/year. Of course, the total investment is more than that and probably in excess of 1B$ a year.
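The estimate above is simple arithmetic, and it can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are the round figures quoted in the text, and the variable names are my own.

```python
# Back-of-the-envelope estimate using the round figures quoted in the text.
EXPERTS_PER_MEETING = 500   # approximate attendance
COST_PER_DAY_USD = 500      # assumed average cost of an expert per working day
DAYS_PER_MEETING = 6        # the MPEG week plus AhG meetings
MEETINGS_PER_YEAR = 4       # quarterly meetings

cost_per_meeting = EXPERTS_PER_MEETING * COST_PER_DAY_USD * DAYS_PER_MEETING
cost_per_year = cost_per_meeting * MEETINGS_PER_YEAR

print(f"Per meeting: {cost_per_meeting / 1e6:.1f} M$")
print(f"Per year:    {cost_per_year / 1e6:.1f} M$")
```

Changing any of the four constants shows how sensitive the estimate is to the assumptions, e.g. to the number of days an expert actually stays.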

With the meeting organisation described above MPEG tries to get the most out of the industry investment in MPEG standards.


MPEG and ISO

Introduction

The article MPEG: what it did, is doing, will do recounts my statistically not insignificant experience of asking taxi drivers in different cities of the world if they know MPEG. I do not have a similar amount of data to report for ISO, but I am pretty sure that if I asked taxi drivers if they know ISO, the yes rate would be considerably lower than for MPEG.

This is neither a merit of MPEG nor a demerit of ISO as organisations. MPEG – Moving Picture Experts Group – is lucky to deal with things that let people make content that other people can see and hear in ever new ways. ISO – International Organisation for Standardisation – is an organisation with the mission to develop international standards for anything that is neither telecommunication – the purview of the International Telecommunication Union (ITU) – nor electrotechnical – the purview of the International Electrotechnical Commission (IEC).

The ISO organisation

The above may seem rather abstract, so let’s see what the difference means in practice. ISO is a huge organisation structured in Technical Committees (TC). Actually, the structure is more complex than that (see Figure 1), but for the purpose of what I want to say, this is enough.

The first 3 – still active – TCs in ISO are: TC 1 Screw threads, TC 2 Fasteners and TC 4 Rolling bearings. The standards produced by these TCs are industrially very important, but the topics hardly make people’s hearts beat faster. The last 3 TCs in order of establishment are TC 322 Sustainable finance, TC 323 Circular economy and TC 324 Sharing economy. The standards produced by these TCs are important for the financial industries, but probably little known even in financial circles. Between these two extremes we have a large number of TCs, e.g., TC 35 Paints and varnishes, TC 186 Cutlery and table and decorative metal hollow-ware, TC 249 Traditional Chinese medicine, TC 282 Water reuse, TC 297 Waste collection and transportation management, etc.

ISO TCs work on areas of human endeavour that are extremely important to industrial and social life. Many of these activities, however, do not say much to the man in the street.

Where is MPEG in this picture? To answer this question I need to dig deeper into the ISO organisation. Most TCs do not have a monolithic structure. They are organised in working groups (WG). TCs retain key functions such as strategy and management, and WGs are tasked to develop standards. In quite a few cases the area of responsibility is so broad that a horizontal organisation would not be functional. In this case a TC may decide to establish Subcommittees (SC). They are like mini TCs, with WGs developing standards under them.

Figure 1 – ISO governance structure

In 1987 ISO/TC 97 Data Processing merged with IEC/TC 83 Information technology equipment. The resulting (joint) technical committee was and is called ISO/IEC JTC 1 Information Technology. One of the JTC 1 SCs, SC 2 Character sets and Information Coding, included WG 8 Coding of Audio and Picture Information. WG 8 established the Moving Picture Experts Group (MPEG) in January 1988. In 1991, when SC 2/WG 8 seceded from SC 2 and became SC 29, MPEG became WG 11 Coding of audio, picture, multimedia and hypermedia information (but everybody calls it MPEG).

MPEG changed the world of media

Those who have survived the description of the ISO organigram will now have the opportunity to understand how this group of experts, in the depths of the ISO (and IEC) organisation changed the world of media and impacted the lives of billions of people, probably all of those on the face of the Earth, if we exclude some hermits in the few surviving tropical forests, the many deserts or the frozen lands.

The main reason for the success of MPEG is that for 30 years it had carte blanche to implement its ideas. Some of them were clear at the outset, others took shape by learning on the job.

Let’s revisit MPEG’s ideas of standardisation to understand what it did and why.

Idea #1 – Single standards for all countries and industries

The first idea relates to the scope of MPEG standards. In the analogue world, the absence or scarce availability of broadband communication, deliberate policies, or the natural separation between industries that traditionally had little in common favoured the definition of country-based or industry-based standards. The first steps toward digital video trod similar paths: different countries and industries tried their own way independently.

MPEG jumped onto the scene at a time when the different trials had not yet had the time to solidify, and the epochal analogue-to-digital transition gave MPEG a unique opportunity to effect its disruptive action.

MPEG knew that it was technically possible to develop generic standards that could be used in all countries of the world and in all industries that needed compressed digital media. MPEG saw that all actors affected – manufacturers, service providers and end users – would gain if such a bold action was taken. When MPEG began to tread its adventurous path, it did not know whether it was procedurally possible to achieve that goal. But it gambled and gave it a try. It used the Requirements subgroup to develop generic requirements, worked on the major countries and the trade/standards associations of the main industries and magically got their agreement.

The network of liaisons and, sometimes, joint activities is the asset that allowed MPEG to implement idea #1 and helped achieve many of the subsequent goals.

Idea #2 – Standards for the market, not the other way around

Standards are ethereal entities, but their impact is very concrete. This was true and well understood in the world of analogue media. At that time a company that had developed a successful product would try to get a “standard” stamp on it, share the technology with its competitors and enjoy the economic benefits of its “standard” technology.

With its second idea MPEG reshuffled the existing order of steps. Instead of waiting for the market to decide which technology would win – an outcome that very often had little to do with the value of the technology – MPEG offered its standard development process where the collaboratively defined “best” is developed and assessed by MPEG experts who decide which individual technology wins. Then the “standard” technology package developed by MPEG is taken over by the market.

MPEG standards are consistently the best standards at a given time. Those who have technologies selected to be part of MPEG standards reap the benefits and most likely will continue investing in new technologies for future standards.

Idea #3 – Standards anticipate the future

The third idea is a consequence of the first two. MPEG-1 was driven by the expected possibilities of the audio and video compression technologies of the time. It was a bet on silicon making it possible to execute the complex operations implied by the standard so that industry could build products of which there was no evidence but only educated guesses: interactive video on CD and digital audio broadcasting. Ironically, neither really took off, but other products that relied on the MPEG-1 technologies – Video CD and MP3 – were (the former) and still are (the latter) extremely successful.

MPEG standards anticipate market needs. They are regularly bets that a certain standard technology will be adopted. In More standards – more successes – more failures you can see how some MPEG standards are extremely successful and others less so.

Idea #4 – Industry-friendly standards

The fourth idea was simple and disruptive. Since the first instances of television in the 1920s, industry and governments have created tens of television formats, mostly around the basic NTSC, PAL and SECAM families. Even in the late 1960’s, when the Picturephone was developed, AT&T invented a new 267-line format, with no obvious connection with any of the existing video formats.

MPEG never wanted to define its own format. With its fourth idea, propped up by the nature of digital technologies, it just decided that it would support any format. Here is how it did it:

  1. One standard with no options (this should be obvious, because it is what a standard should be about)
  2. Standards apply only to decoders; encoders are implicitly defined and have ample margins of implementation freedom
  3. Profiles (hierarchical, if possible) to accommodate special industry needs within the same standard
  4. Decoders are defined by their ability to process data, quantised in levels (based on bitrate, resolution etc.)
  5. How different formats are handled is outside of MPEG standards.
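The interplay of profiles and levels in the list above can be made concrete with a toy conformance check. This is a minimal sketch and not taken from any MPEG specification: the profile names, the level limits and the classes `Level`, `Stream` and `Decoder` are all hypothetical, chosen only to show the principle that a decoder is characterised by the profile it implements and the amount of data it can process.

```python
from dataclasses import dataclass

# Hypothetical hierarchical profiles: each one includes the tools of those before it.
PROFILES = ["simple", "main", "high"]


@dataclass(frozen=True)
class Level:
    """Hypothetical level: an upper bound on what a decoder must be able to process."""
    max_bitrate_kbps: int
    max_width: int
    max_height: int


@dataclass(frozen=True)
class Stream:
    profile: str
    bitrate_kbps: int
    width: int
    height: int


@dataclass(frozen=True)
class Decoder:
    profile: str
    level: Level

    def can_decode(self, stream: Stream) -> bool:
        # Hierarchical profiles: a decoder handles its own profile and all lower ones.
        profile_ok = PROFILES.index(stream.profile) <= PROFILES.index(self.profile)
        # Levels: the stream must stay within the decoder's processing limits.
        level_ok = (stream.bitrate_kbps <= self.level.max_bitrate_kbps
                    and stream.width <= self.level.max_width
                    and stream.height <= self.level.max_height)
        return profile_ok and level_ok


decoder = Decoder(profile="main", level=Level(15_000, 1920, 1080))
print(decoder.can_decode(Stream("simple", 4_000, 1280, 720)))  # within profile and level
print(decoder.can_decode(Stream("high", 4_000, 1280, 720)))    # profile above the decoder's
print(decoder.can_decode(Stream("main", 40_000, 3840, 2160)))  # exceeds the level limits
```

Note that nothing in the sketch says how the stream was produced: in line with point 2, only the decoder side is constrained.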

Idea #5 – Audio and video come together

The fifth idea was kind of obvious but no less disruptive. Because of the way audio and video industries had developed – audio for a century and video for half a century – people working on the corresponding technologies tended to operate in “watertight compartments”, be they in academia, research or companies. That attitude had some justification in the analogue world because the relevant technologies were indeed different and there was not so much added value in keeping the technologies together, considering the big effort needed to keep the experts together.

However, the digital world, with its commonality of technologies, no longer justified keeping the two domains separate. That is why MPEG, just 6 months after its first meeting, kicked off the Audio subgroup after successfully assembling the best experts in a few months.

This injection of new technology with the experts that carried it was not effortless. When transformed into digital, audio and video signals are bits and bits and bits, but the sources are different and influence how they are compressed. Audio experts shared some (at a high level) compression technologies – Subband and Discrete Cosine Transform – but video is (was) a 2D signal changing in time often with “objects” in it, while audio is (was) a 1D signal. More importantly, audio experts were driven by other concerns such as the way the human hearing process handles the data coming out of the frequency analysis carried out by the human cochlea.

The audio work was never “dependent” on the video work. MPEG audio standards can have a stand-alone use (i.e. they do not assume that there is a video associated with it), but there is no MPEG video standard that is without an MPEG Audio standard. So it was necessary to keep the two together and it is even more important to do so now when video and audio are both 3D signals changing in time.

Idea #6 – Don’t forget the glue that keeps audio and video together

The sixth idea can be described by the formula

Audio and Video ≠ (Audio + Video)

This may look cryptic but it states the obvious. Having audio and video together does not necessarily mean that audio and video will play together in the right way if they are stored on a disk or transmitted over a channel.

The fact that MPEG established a Digital Storage Media subgroup and a Systems subgroup 18 months after its foundation signals that MPEG has always been keenly aware that a bitstream composed of MPEG audio and video bitstreams needs to be transported so that it is played back as intended by the bitstream creator. In MPEG-1 it was a bitstream in a controlled environment, in MPEG-2 it was a bitstream in a noisy environment, from MPEG-4 on it was on IP, and in MPEG-DASH it had to deal with the unpredictability of the Internet Protocol in the real world.

Throughout MPEG’s existence the issues of multiplexing and transport formats have shaped its standards. Without a Systems subgroup, efficiently compressed audio and video bitstreams would have remained floating in space without a standard means to plug them into real systems.
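To give a flavour of what the “glue” has to do, here is a toy multiplexer that interleaves audio and video access units by time stamp. It is only an illustration under my own assumptions: the `AccessUnit` type, the millisecond time stamps and the `multiplex` function are invented for this sketch, and real MPEG Systems layers additionally deal with clock references, buffer models and packetisation, none of which are shown.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class AccessUnit(NamedTuple):
    pts_ms: int      # presentation time stamp, in milliseconds (illustrative unit)
    stream: str      # "audio" or "video"
    payload: bytes   # compressed data, opaque to the multiplexer


def multiplex(*streams: Iterable[AccessUnit]) -> Iterator[AccessUnit]:
    """Interleave streams that are each already ordered by time stamp."""
    return heapq.merge(*streams, key=lambda au: au.pts_ms)


# Hypothetical timing: one video frame every 40 ms, one audio frame every 24 ms.
video = [AccessUnit(t, "video", b"v") for t in (0, 40, 80)]
audio = [AccessUnit(t, "audio", b"a") for t in (0, 24, 48, 72)]

muxed = list(multiplex(video, audio))
for au in muxed:
    print(au.pts_ms, au.stream)
```

The receiver can then present each unit at the time its stamp indicates, which is what keeps lips and voice together.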

Idea #7 – Integrated standards as toolkits

Most MPEG standards are composed of the 3 key elements – audio, video and systems – that make an audio-visual system and some, such as MPEG-4 and MPEG-I, even include 3D Graphic information. These standards are integrated in the sense that, if you need a complete solution, you can get what you need from the package offered by MPEG.

The world is more complicated than that. Some users want to cherry pick technologies. In the case of MPEG-I, most likely MPEG will not standardise a Scene Description technology but will just indicate how externally defined technologies can be plugged into the system.

With its seventh idea MPEG is ready to satisfy the needs of all customers. It defines the means to signal how an external technology can be plugged into a set of other native MPEG technologies. With one caveat: the customer has to take care of the integration of the external technology. That MPEG will not do.

Idea #8 – Technology is always on the move

To describe the eighth idea, I will seek help from the Greek philosopher Heraclitus (or whoever was the person who said it): τὰ πάντα ῥεῖ καὶ οὐδὲν μένει (everything flows and nothing stays). Digital technologies move fast and actually accelerate. By applying ideas #3, #4, #5, #6 and #7, MPEG standards accelerated the orderly transition of analogue to digital media. By applying ideas #1 and #2, MPEG standards prompted technology convergence with its merging of industry segments and appearance of new players.

The eighth idea reminds MPEG that the technology landscape is constantly changing and this awareness must inform its standards. Until HEVC – one can even say, including the upcoming Versatile Video Coding (VVC) – video meant coding a 2D rectangular area (in MPEG-4, a flat area of any shape). The birth of immersive visual experiences is not without pain, but they are becoming possible and MPEG must be ready with solutions that take the change in this basic assumption into account. This means that, in the technology scenario that is shaping up, the MPEG role of “anticipatory standards” is ever more important and ever more challenging to achieve.

Idea #9 – The nature and borders of compression

The ninth idea goes down to the very nature of compression. What is the meaning of compression? Is it “less bits is always good” or can it also be “as few meaningful bits as possible is also good”? The former is certainly desirable but, as the nature of information consumption changes and compression digs deeper into the nature of information, compressed representations that offer easier access to the information embedded in the data become more valuable.

What is the scope of application of MPEG compression? When MPEG started the MPEG-1 standards work, the gap that separated the telecom from the consumer electronics (CE) industries (the first two industries in attendance at that time) was as wide as the gap between the media industry and, say, the genomic industry today. Both are digital now and the dialogue gets easier.

With patience and determination MPEG has succeeded in creating a common language and mind set in the media industries. This is an important foundation of MPEG standards. The same amalgamation will continue between MPEG and other industries.

Now the results

Figure 2 intends to attach some concreteness to the nine ideas illustrated above by showing some of the most successful MPEG standards issued from 31 years of MPEG activity.

 

Figure 2 – Some successful MPEG standards

Conclusion

An entity at the lowest layer of the ISO hierarchy has masterminded the transition of media from the analogue to the digital world. Its standards underpin the evolution of digital media, foster the creation of new industries and offer unrelenting growth to old and new industries worth in excess of 1 trillion USD per year.

Many thanks to the parent body SC 29 for managing the balloting of MPEG standards.


Data compression in MPEG

That video is a high profile topic to people interested in MPEG is obvious – MP stands for Moving Pictures – and is shown by the most visited article in this blog, Forty years of video coding and counting. That audio is also a high profile topic should not be a surprise, given that the official MPEG title is “Coding of Moving Pictures and Audio”, and is confirmed by the fact that Thirty years of audio coding and counting has received almost the same number of visits as the previous one.

What is less known, but potentially very important, is the fact that MPEG has already developed a few standards for compression of a wide range of other data types. Point Cloud is the data type that is acquiring a higher profile by the day, but there are many more types, as represented by the table below.

Figure 1 – Data types and relevant MPEG standards

 

Video

The articles Forty years of video coding and counting and More video with more features provide a detailed history of video compression in MPEG from two different perspectives. Here I will briefly list the video-coding related standards produced or being produced by MPEG mentioned in the table.

  • MPEG-1 and MPEG-2 both produced widely used video coding standards.
  • MPEG-4 has been much more prolific.
    • It started with Part 2 Visual
    • It continued with Part 9 Reference Hardware Description, a standard that supports a reference hardware description of the standard expressed in VHDL (VHSIC Hardware Description Language), a hardware description language used in electronic design automation.
    • Part 10 is the still high-riding Advanced Video Coding standard.
    • Parts 29, 31 and 33 are the result of three attempts at developing Option 1 video compression standards (in a simple but imprecise way, standards that do not require payment of royalties).
  • MPEG-5 is currently expected to be a standard with 2 parts:
    • Part 1 Essential Video Coding will have a base layer/profile which is expected to be Option 1 and a second layer/profile with a performance ~25% better than HEVC. Licensing terms are expected to be published by patent holders within 2 years.
    • Part 2 Low Complexity Enhancement Video Coding (LCEVC) will be a two-layer video coding standard. The lower layer is not tied to any specific technology and can be any video codec; the higher layer is used to extend the capability of an existing video codec.
  • MPEG-7 is about Multimedia Content Description. There are different tools to describe visual information:
    • Part 3 Visual is a form of compression as it provides tools to describe Color, Texture, Shape, Motion, Localisation, Face Identity, Image signature and Video signature.
    • Part 13 Compact Descriptors for Visual Search can be used to compute compressed visual descriptors of an image. An application is to get further information about an image captured e.g. with a mobile phone.
    • Part 15 Compact Descriptors for Video Analysis makes it possible to manage and organise large scale databases of video content, e.g. to find content containing a specific object instance or location.
  • MPEG-C is a collection of video technology standards that do not fit with other standards. Part 4 – Media Tool Library is a collection of video coding tools (called Functional Units) that can be assembled using the technology standardised in MPEG-B Part 4 Codec Configuration Representation.
  • MPEG-H part 2 High Efficiency Video Coding is the latest MPEG video coding standard with an improved compression of 60% compared to AVC.
  • MPEG-I is the new standard, mostly under development, for immersive technologies
    • Part 3 Versatile Video Coding is the ongoing project to develop a video compression standard with an expected 50% more compression than HEVC.
    • Part 7 Immersive Media Metadata is the current project to develop a standard for compressed Omnidirectional Video that allows limited translational movements of the head.
    • Exploration in 6 Degrees of Freedom (6DoF) and Lightfield are ongoing.

Audio

The article Thirty years of audio coding and counting provides a detailed history of audio compression in MPEG. Here I will briefly list the audio-coding related standards produced or being produced by MPEG mentioned in the table.

  • MPEG-1 part 3 Audio produced, among others, the foundational digital audio standard better known as MP3.
  • MPEG-2
    • Part 3 Audio extended the stereo user experience of MPEG-1 to Multichannel.
    • Part 7 Advanced Audio Coding is the foundational standard on which MPEG-4 AAC is based.
  • MPEG-4 part 3 Advanced Audio Coding (AAC) currently supports some 10 billion devices and software applications, growing by half a billion units every year.
  • MPEG-D is a collection of different audio technologies:
    • Part 1 MPEG Surround provides an efficient bridge between stereo and multi-channel presentations in low-bitrate applications as it can transmit 5.1 channel audio within the same 48 kbit/s transmission budget.
    • Part 2 Spatial Audio Object Coding (SAOC) allows very efficient coding of a multi-channel signal that is a mix of objects (e.g. individual musical instruments).
    • Part 3 Unified Speech and Audio Coding (USAC) combines the tools for speech coding and audio coding into one algorithm with a performance that is equal to or better than AAC at all bit rates. USAC can code multichannel audio signals, and can also optimally encode speech content.
    • Part 4 Dynamic Range Control is a post-processor for any type of MPEG audio coding technology. It can modify the dynamic range of the decoded signal as it is being played.

2D/3D Meshes

Polygon meshes can be used to represent the approximate shape of a 2D image or a 3D object. 3D mesh models are used in various multimedia applications such as computer games, animation, and simulation. MPEG-4 provides various compression technologies:

  • Part 2 Visual provides a standard for 2D and 3D Mesh Compression (3DMC) of generic, but static, 3D objects represented by first-order (i.e., polygonal) approximations of their surfaces. 3DMC has the following characteristics:
    • Compression: Near-lossless to lossy compression of 3D models
    • Incremental rendering: No need to wait for the entire file to download to start rendering
    • Error resilience: 3DMC has a built-in error-resilience capability
    • Progressive transmission: Depending on the viewing distance, a reduced accuracy may be sufficient
  • Part 16 Animation Framework eXtension (AFX) provides a set of compression tools for Shape, Appearance and Animation.

Face/Body Animation

Imagine you have a face model that you want to animate remotely. How do you represent the information that animates the model in a bit-thrifty way? MPEG-4 Part 2 Visual has an answer to this question with its Facial Animation Parameters (FAP). FAPs are defined at two levels.

  • High level
    • Viseme (visual equivalent of phoneme)
    • Expression (joy, anger, fear, disgust, sadness, surprise)
  • Low level: 66 FAPs associated with the displacement or rotation of the facial feature points.
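
The two levels listed above can be pictured as a small data structure. The sketch below is only an illustration of how one frame of animation data might be organised in an application; it is not the MPEG-4 bitstream syntax. The names FapFrame, Expression and set_low_level, the numeric values of the enumeration and the local 1 to 66 numbering of low-level parameters are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional

NUM_LOW_LEVEL_FAPS = 66  # number of low-level FAPs given in the text


class Expression(Enum):
    # The six expressions named in the text; the numeric values are arbitrary.
    JOY = 1
    ANGER = 2
    FEAR = 3
    DISGUST = 4
    SADNESS = 5
    SURPRISE = 6


@dataclass
class FapFrame:
    """One frame of face animation data (hypothetical layout)."""
    viseme: Optional[int] = None              # high level: mouth shape identifier
    expression: Optional[Expression] = None   # high level: facial expression
    # Low level: parameter index (1..66, local numbering) -> displacement or rotation.
    low_level: Dict[int, int] = field(default_factory=dict)

    def set_low_level(self, index: int, value: int) -> None:
        if not 1 <= index <= NUM_LOW_LEVEL_FAPS:
            raise ValueError(f"low-level FAP index out of range: {index}")
        self.low_level[index] = value


frame = FapFrame(expression=Expression.JOY)
frame.set_low_level(3, 120)  # move one feature point; others stay untouched
```

Storing only the parameters that are actually set is a design choice of this sketch: a frame that moves a few feature points stays small, which is in the spirit of the bit-thrifty representation mentioned above.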

In the figure, feature points affected by FAPs are indicated as black dots. Other feature points are indicated as small circles.

Figure 2 – Facial Animation Parameters

It is possible to animate a default face model in the receiver with a stream of FAPs, or a custom face can be initialised by downloading Face Definition Parameters (FDP) with specific background images, facial textures and head geometry.

MPEG-4 Part 2 uses a similar approach for Body Animation.

Scene Graphs

So far MPEG has never developed a Scene Description technology of its own. In 1996, when the development of the MPEG-4 standard required it, MPEG took the Virtual Reality Modelling Language (VRML) and extended it to support MPEG-specific functionalities. Of course compression could not be absent from the list. So the Binary Format for Scenes (BiFS), specified in MPEG-4 Part 11 Scene description and application engine, was born to allow for efficient representation of dynamic and interactive presentations, comprising 2D & 3D graphics, images, text and audiovisual material. The representation of such a presentation includes the description of the spatial and temporal organisation of the different scene components as well as user-interaction and animations.

In MPEG-I scene description is again playing an important role. However, this time MPEG does not even intend to pick a scene description technology. Instead, it will define some interfaces to scene description parameters.

Font

Many thousands of fonts are available today for use as components of multimedia content. Content often uses custom-designed fonts that may not be available on a remote terminal. In order to ensure faithful appearance and layout of content, the font data have to be embedded with the text objects as part of the multimedia presentation.

MPEG-4 part 18 Font Compression and Streaming defines and provides two main technologies:

  • OpenType and TrueType font formats
  • Font data transport mechanism – the extensible font stream format, signaling and identification

Multimedia

Multimedia is a combination of multiple media in some form. Probably the closest multimedia “thing” in MPEG is the standard called Multimedia Application Formats (MPEG-A). However, MPEG-A is an integrated package of media for specific applications and does not define any specific media format. It only specifies how you can combine MPEG (and sometimes other) formats.

MPEG-7 part 5 Multimedia Description Schemes (MDS) specifies the different description tools that are not visual and audio, i.e. generic and multimedia. By comprising a large number of MPEG-7 description tools from the basic audio and visual structures MDS enables the creation of the structure of the description, the description of collections and user preferences, and the hooks for adding the audio and visual description tools. This is depicted in Figure 3.

Figure 3 – The different functional groups of MDS description tools

Neural Networks

Requirements for neural network compression have been presented in Moving intelligence around. After 18 months of intense preparation, with development of requirements, identification of test material, definition of test methodology and drafting of a Call for Proposals (CfP), MPEG analysed nine technologies submitted by industry leaders at the March 2019 (126th) meeting. The technologies proposed compress neural network parameters to reduce their size for transmission, while reducing their performance in specific multimedia applications not at all or only moderately. The new standard will be MPEG-7 Part 17 Neural Network Compression for Multimedia Description and Analysis.
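
The trade-off between parameter size and performance can be illustrated with one generic technique, uniform quantisation of weights. This is a minimal sketch and is not claimed to be any of the nine submitted technologies or the method of the future standard; the function names quantize and dequantize and the 8-bit default are assumptions.

```python
def quantize(weights, bits=8):
    """Map float weights to integer codes in [0, 2**bits - 1].

    Returns the codes plus the offset and step needed to reconstruct them.
    Hypothetical illustration of lossy parameter compression.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate weights from integer codes."""
    return [lo + c * step for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, step = quantize(weights)
restored = dequantize(codes, lo, step)
# Each restored weight differs from the original by at most half a step.
```

With 8-bit codes in place of 32-bit floats the parameters take a quarter of the space, and the reconstruction error per weight is bounded by half the quantisation step. Whether that error reduces performance "not at all or only moderately" depends on the network and the application.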

XML

MPEG-B part 1 Binary MPEG Format for XML (BiM) is the current endpoint of an activity that started some 20 years ago when MPEG-7 Descriptors defined by XML schemas were compressed in a standard fashion by MPEG-7 Part 1 Systems. Subsequently MPEG-21 needed XML compression and the technology was extended in Part 15 Binary Format.

In order to reach high compression efficiency BiM relies on schema knowledge between encoder and decoder. It also provides fragmentation mechanisms to provide transmission and processing flexibility, and defines means to compile and transmit schema knowledge information to enable decompression of XML documents without a priori schema knowledge at the receiving end.
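
Why shared schema knowledge helps can be shown with a toy example: if encoder and decoder agree on the list of element names in advance, each name can be sent as a small index instead of as text. The sketch below is not the BiM coding, which is far more elaborate; the SCHEMA_TAGS table, the byte layout and the function names encode and decode are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema knowledge shared by encoder and decoder.
SCHEMA_TAGS = ["Description", "Title", "Creator"]


def encode(xml_text, tags=SCHEMA_TAGS):
    """Encode each element as: tag index, child count, text length, text bytes.

    Toy layout: every count must fit in one byte, and attributes are ignored.
    """
    def walk(elem, out):
        text = (elem.text or "").strip().encode("utf-8")
        out.append(tags.index(elem.tag))  # element name sent as an index
        out.append(len(elem))             # number of child elements
        out.append(len(text))             # length of the text payload
        out.extend(text)
        for child in elem:
            walk(child, out)

    out = bytearray()
    walk(ET.fromstring(xml_text), out)
    return bytes(out)


def decode(data, tags=SCHEMA_TAGS):
    """Rebuild the XML text from the toy binary form."""
    def read(pos):
        elem = ET.Element(tags[data[pos]])
        n_children, n_text = data[pos + 1], data[pos + 2]
        pos += 3
        text = data[pos:pos + n_text].decode("utf-8")
        elem.text = text if text else None
        pos += n_text
        for _ in range(n_children):
            child, pos = read(pos)
            elem.append(child)
        return elem, pos

    root, _ = read(0)
    return ET.tostring(root, encoding="unicode")


doc = "<Description><Title>Demo</Title><Creator>MPEG</Creator></Description>"
binary = encode(doc)
```

The binary form is much shorter than the text form because the tag names never travel. A decoder without the SCHEMA_TAGS table cannot interpret the indices, which is why a scheme of this kind also needs a way to transmit schema knowledge to receivers that lack it.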

Genome

Genome is digital, and can be compressed presents the technology used in MPEG-G Genomic Information Representation. Many established compression technologies developed for compression of other MPEG media have found good use in genome compression. MPEG is currently busy developing the MPEG-G reference software and is investigating other genomic areas where compression is needed. More concretely, MPEG plans to issue a Call for Proposals for Compression of Genome Annotation at its July 2019 (128th) meeting.

Point Clouds

3D point clouds can be captured with multiple cameras and depth sensors. The points can number from a few thousand up to a few billion, and carry attributes such as colour, material properties etc.

MPEG is developing two different standards whose choice depends on whether the point cloud is dense (this is done in MPEG-I Part 5 Video-based Point Cloud Compression) or less so (MPEG-I Part 9 Geometry-based PCC). The algorithms in both standards are lossy, scalable, progressive and support random access to subsets of the point cloud.

MPEG plans to release Video-based PCC as FDIS in October 2019 and Geometry-based PCC as FDIS in April 2020.

Sensors/Actuators

MPEG felt the need to address compression of data from sensors and data to actuators when it considered the exchange of information taking place between the physical world where the user is located and any sort of virtual world generated by MPEG media.

So MPEG undertook the task to provide standard interactivity technologies that allow a user to

  • Map their real-world sensor and actuator context to a virtual-world sensor and actuator context, and vice-versa, and
  • Achieve communication between virtual worlds.

Figure 4 describes the context of the MPEG-V Media context and control standard.

Figure 4 – Communication between real and virtual worlds

The MPEG-V standard defines several data types and their compression:

  • Part 2 – Control information specifies control devices interoperability (actuators and sensors) in real and virtual worlds
  • Part 3 – Sensory information specifies the XML Schema-based Sensory Effect Description Language to describe actuator commands such as light, wind, fog, vibration, etc. that trigger human senses
  • Part 4 – Virtual world object characteristics defines a base type of attributes and characteristics of the virtual world objects shared by avatars and generic virtual objects
  • Part 5 – Data formats for interaction devices specifies syntax and semantics of data formats for interaction devices – Actuator Commands and Sensed Information – required to achieve interoperability in controlling interaction devices (actuators) and in sensing information from interaction devices (sensors) in real and virtual worlds
  • Part 6 – Common types and tools specifies syntax and semantics of data types and tools used across MPEG-V parts.

MPEG-IoMT Internet of Media Things is the mapping of the general IoT context to MPEG media developed by MPEG. MPEG-IoMT Part 3 – IoMT Media Data Formats and API also addresses the issue of media-based sensors and actuators data compression.

What is next in data compression?

In Compression standards for the data industries I reported the proposal made by the Italian ISO member body to establish a Technical Committee on Data Compression Technologies. The proposal was rejected on the grounds that Data Compression is part of Information Technology.

It was a big mistake because it has stopped the coordinated development of standards that would have fostered the move of different industries to the digital world. The article identified a few such industries, among them Automotive, Industry Automation and Geographic information.

MPEG has done some exploratory work and found that quite a few of its existing standards could be extended to serve new application areas. One example is the conversion of MPEG-21 Contracts to Smart Contracts. An area of potential interest is data generated by machine tools in industry automation.

Conclusions

MPEG audio and video compression standards are the staples of the media industry. MPEG continues to develop those standards while investigating compression of other data types in order to be ready with standards when the market matures. Point clouds and DNA reads from high speed sequencing machines are just two examples of how, by anticipating market needs, MPEG prepares to serve the industry with its compression standards in a timely manner.

Posts in this thread