Audio Coding
The companion Audio Coding standards had a much different evolution. Unlike Video, whose bitrate was and is so high and growing to justify new generations of compression standards, Audio Coding was driven to large extent by applications and functionality, with compression always playing a role. Thirty years of audio coding and counting talks about the first MP3, then of MPEG-4 AAC in all its shapes, and then of MPEG Surround, Spatial Audio Object Coding (SAOC), Unified Speech and Audio Coding (USAC), Dynamic Range Control (DRC), MPEG-H 3D Audio and the coming MPEG-I Immersive Audio. MPEG-7 Audio should not be forgotten even though compression is appl;ied to descriptors, not to audio itself. The birth of each of these Audio Coding standards is a history in itself, ranging from being a part of a bigger plan developed at MPEG Plenary level, to specific Audio standards agreed by the Audio group, to a specific functionality to be added to a standard either developed or being developed.
3D Graphics Coding
The 3DG group worked on 2D/3D graphic compression, and then on Animation Framework eXtension (AFX). In the case of 2D/3D graphic compression, the birth of the standards was the result of an MPEG plenary decision, but the standards kept on evolving by adding new technologies for new functionalities, most often at the instigation or individual experts and companies.
Talking of 3D Graphics Coding, I could quote Rudyard Kipling’s verse
Oh, East is East, and West is West, and never the twain shall meet
and associate West to Video Coding and East to 3D Graphics Coding (or vice versa). Indeed, it looked like Video and 3D Graphics Coding would comply with Kipling’s verse until Point Cloud Compression (PCC) came to the fore. Proposed by an individual expert and under exploration for quite some time, it suddenly became one of the sexiest MPEG developments merging, in an intricated way, Video and 3D Graphics.
We can indeed say with Kipling
But there is neither East nor West, Border, nor Breed, nor Birth,
When two strong men stand face to face, though they come from the ends of the earth!
Font Coding
Font Coding is again a new story. This time the standard was proposed by the 3 companies – Adobe, Apple and Microsoft – who had developed the OpenType specification. The reason was that it had become burdensome for them to maintain and expand the specification in response to market needs. The MPEG plenary accepted the request, took over the task and developed several parts of several standards in multiple editions. As participants in the Font Coding activity do not typically attend MPEG meetings, new functionalities are mostly added at the request of experts or companies on the email reflector of the Font Coding ad hoc group.
Digital Item Coding
The initiative to start MPEG-21 was taken by the MPEG plenary, but the need to develop the 22 parts of the standards were largely identified by the subgroups – Requirements, Multimedia Description Schemes (MDS) and Systems,
Sensors and Actuators Data Coding
The birth of MPEG-V was the decision of the MPEG plenary, but the parts of the standard kept on evolving at the instigation or individual experts and companies. Four editions of the standard were produced.
Genome Coding
Development of Part 1 Storage and Transport and Part 2 Compression of a Genome Coding standard was obviously a major decision of the MPEG plenary. The need for other parts of the MPEG-G standard, namely Part 3 Metadata and API, and Part 6 Compression of Annotations, was identified by the Requirements group working on the standard.
Neural Network Coding
Neural Network Coding was proposed at the October 2017 meeting. The MPEG plenary was in doubt whether to call this “compression of another data type” (neural networks) or something in line with its “media” mandate. Eventually it opted to call it “Compression of neural networks for multimedia content description and analysis”, which is partly what it really is, an extended new version of CDVS and CDVA with embedded compression of descriptors. Neural Network Compression (NNC) is now (October 2019) at Working Draft 2 and is planned to reach FDIS in April 2021. Experts are too busy working on the current scope to have the time to think of more features, but we know there will be more because the technologies in the current draft do not support all the requirements.
Media Description
As mentioned above, the MPEG plenary decided to develop MPEG-7 seeing the inability of SC 29 to jump on the opportunity of a standard that would describe media in a standard way to help user access the content of their interest. The need for the early parts of MPEG-7 were largely identified by the groups working on MPEG-7. The need for later standards, such as CDVS and CDVA, was identified by a company (CDVS) and by a consortium (CDVS). The need for the latest standard (Neural Network Compression) was identified by the MPEG plenary.
Media Composition
Until 1995 MPEG was really a group working mostly for Broadcasting and Consumer Electronics (but MP3 was a sign of things to come) and did not have the need for a standard Media Description. MPEG-4 was the standard that extended the MPEG domain to the IT world and the “Coding of audio-visual objects” title of MPEG-4 meant that a Media Description technology was needed.
The MPEG plenary took the decision to extend the Virtual Reality Mark-up Language (VRML) establishing contacts with that group. The MPEG plenary did the same when a company proposed a new W3C recommendation-based Media Description technology. A company proposed to develop the MPEG Orchestration standard. After much investigation and debate, the Requirements and Systems groups have recently come to the conclusion that the MPEG-I Scene Description should be based on an extension to Khronos’ glTF2.
Systems support
Systems support was the first non-video need after audio identified by the MPEG plenary a few months after the establishment of MPEG. Today Systems support standards are mostly, but not exclusively, in MPEG-B. This standard contains parts that were the result of a decision of the MPEG plenary (e.g. Binary MPEG format for XML and Common encryption), the proposal of the Systems group (e.g. Sample Variants and Partial File Format) or the proposal of a company (e.g. Green metadata).
Intellectual Property Management and Protection (IPMP)
IPMP parts appear in MPEG-2, MPEG-4 and MPEG-21. They were triggered by the same context that produced MPEG-21, i.e. how to combine the liquidity of digital content with the need to guarantee a return to rights holders.
The need for MPEG-4 IPMP was identified by the MPEG plenary, but MPEG-2 and MPEG 21 IPMP was proposed by the Systems and MDS groups, respectively.
Transport
Although partly already in MPEG-1, Transport technology in MPEG flourished in MPEG-2 with Transport Stream and Program Stream. The development of MPEG-2 Systems was a major MPEG plenary decision, as was the case for MPEG-4 System, that at the time included the Scene description technology. MPEG-2 TS has dominated the broadcasting market (and is even used by AOM). As said above, MPEG-H MPEG Media Transport (MMT) and DASH are two major transport technologies whose development was identified and decided by the MPEG plenary. All 3 standards have been published several times (MPEG-2 Systems 7 times) as a result of needs identified by the Systems group or individual companies.
Application Formats
The first Application Formats were launched by the MPEG Plenary and the following Application Formats by different MPEG groups. Later individual companies or consortia proposed several Application Formats. The Common Media Application Format (CMAF) proposed by Apple and Microsoft is one of the most successful MPEG standards.
Application Programming Interfaces (API)
MPEG Extensible Middleware (MXM) was the first MPEG API standard. The decision to do this was made by the MPEG plenary but the proposal was made by a consortium. The MPEG-G Metadata and API standard was proposed by the Requirements group. The IoMT API standard was proposed by the 3D Graphics group.
Media Systems
This area collects the parts of standards that describe or specify the architecture underpinning MPEG standards. This is the case of part 1 of MPEG-7, MPEG-E, MPEG-V, MPEG-M, MPEG-I and MPEG-IoMT. These parts are typically kicked off by the MPEG plenary.
Reference implementation and Conformance
MPEG takes seriously its statement that MPEG standards should be published in two languages – one that is understood by humans and another that is understood by a machine – and that the two should be equivalent in terms of specification. The reference software – the version of the specification understood by a machine – is used to test conformance of an implementation to the specification. For these reasons the need for reference software and conformance is identified by the MPEG plenary.
With the understanding that all decision are formally made by the MPEG plenary, the trigger of an MPEG decision happens at different levels. Very often – and more often as MPEG matures and its organisation more solid – the trigger is in the hand of an individual experts or group of experts, or of an MPEG subgroup.
Posts in this thread