Is there a logic in MPEG standards?

So far MPEG has developed, is completing or is planning to develop 22 standards for a total of 201 specifications. For those not in MPEG, and even for some active in MPEG, there is a natural question: what is the purpose of all these standards? Assuming that this question has been answered, a second one pops up: is there a logic in all these MPEG standards?

Depending on how well you understand the MPEG phenomenon, you can receive answers ranging from

“There is no logic. MPEG started its first standard with a vision of giving the telco and CE industries a single format. Later it exploited the opportunities that its growing expertise allowed.”

to

“There is a logic. The driver of MPEG work was to extend its vision to more industries leveraging its assets while covering more functionalities.”

I leave it to readers to decide where they stand on this continuum after reading this article, which deals only with the first 5 standards.

MPEG-1

The goal of MPEG-1 was to leverage the manufacturing power of the Consumer Electronics (CE) industry to develop the basic audio and video compression technology for an application that was considered particularly attractive when MPEG was established (1988), namely interactive audio and video on CD-ROM. This was the logic of the telco industry, which thought that its future would be “real time audio-visual communication” but had no friendly industry it could ask to develop the terminal equipment.

The bitrate of 1.5 Mbit/s mentioned in the official title of MPEG-1, Coding of moving pictures and associated audio at up to about 1,5 Mbit/s, was an excellent common point for two industries: the telecom industry, whose first-generation ADSL technology targeted that bitrate, and the CE industry, whose Compact Disc had a throughput of about 1.41 Mbit/s (about 1.2 Mbit/s for the CD-ROM). At that bitrate, the compression technology of the late 1980’s could only deal with a rather low, but still acceptable, resolution: ½ the horizontal and ½ the vertical resolution of standard-definition TV, the vertical halving being obtained by keeping only every other field so that the input video is progressive. Considering that audio had to be impeccable (that is what humans want), at least 200 kbit/s had to be assigned to audio.
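The arithmetic behind these figures can be checked in a few lines. The sketch below uses the usual Compact Disc figures (44.1 kHz, 16-bit stereo audio; CD-ROM Mode 1 at 75 sectors of 2048 user bytes per second) and assumes 8-bit 4:2:0 sampling (12 bits per pixel on average) to estimate how much SIF video must be compressed to fit in what is left after audio. It ignores the Systems multiplex overhead, so the ratio is only indicative.

```python
# Worked arithmetic for the MPEG-1 bitrate budget (a sketch, not normative).
# CD-DA and CD-ROM Mode 1 rates are the usual Compact Disc figures;
# the 1.5 Mbit/s total and the 200 kbit/s audio share come from the text.

CD_DA_RATE = 44_100 * 16 * 2          # samples/s x bits/sample x 2 channels
CD_ROM_MODE1_RATE = 75 * 2048 * 8     # sectors/s x user bytes/sector x bits/byte

TOTAL_BUDGET = 1_500_000              # bit/s, "up to about 1,5 Mbit/s"
AUDIO_BUDGET = 200_000                # bit/s reserved for audio
VIDEO_BUDGET = TOTAL_BUDGET - AUDIO_BUDGET  # ignores Systems multiplex overhead


def pixel_rate(width: int, height: int, fps: float) -> float:
    """Luma samples per second of a progressive picture sequence."""
    return width * height * fps


def raw_bitrate(samples_per_s: float, bits_per_pixel: float = 12.0) -> float:
    """Uncompressed bitrate of 8-bit 4:2:0 video (12 bits per pixel on average)."""
    return samples_per_s * bits_per_pixel


# SIF for 625-line (25 Hz) and 525-line (~30 Hz, rounded here) systems
SIF_FORMATS = {"SIF 625": (352, 288, 25), "SIF 525": (352, 240, 30)}

for name, (w, h, fps) in SIF_FORMATS.items():
    raw = raw_bitrate(pixel_rate(w, h, fps))
    ratio = raw / VIDEO_BUDGET
    print(f"{name}: raw {raw / 1e6:.1f} Mbit/s, about {ratio:.0f}:1 compression needed")
```

Both SIF variants carry the same 2,534,400 luma samples per second, so in either case roughly 30 Mbit/s of raw video had to be squeezed into about 1.3 Mbit/s, a compression of roughly 23:1.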

Figure 1 below depicts the model of an MPEG-1 decoder.

 

Figure 1 – Model of the MPEG-1 standard

The structure adopted for MPEG-1 set the pattern for most MPEG standards:

  1. Part 1 – Systems specifies how to combine one or more audio and video data streams with timing information to form a single stream
  2. Part 2 – Video specifies the video coding algorithm applied to so-called SIF video, i.e. ¼ of the standard-definition TV resolution
  3. Part 3 – Audio specifies the audio compression. Audio is mono or stereo and can be compressed with 3 different performance “layers”: layer 1 is for entry-level digital audio, layer 2 for digital broadcasting and layer 3, aka MP3, for digital music. The MPEG-1 Audio layers were the predecessors of the profiles of MPEG-2 (and of most subsequent MPEG standards)
  4. Part 4 – Compliance testing
  5. Part 5 – Software simulation.
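The core job of Part 1 can be illustrated with a toy multiplexer: each elementary stream is a time-ordered list of access units stamped in ticks of a 90 kHz clock (the clock MPEG Systems uses for presentation timestamps), and the multiplexer interleaves them into one stream in timestamp order. This is a minimal sketch under strong assumptions: the real MPEG-1 Systems syntax (packs, packets, system clock references, decoder buffer constraints) is not modelled, and the class and function names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, List

PTS_CLOCK_HZ = 90_000  # MPEG Systems timestamps count ticks of a 90 kHz clock


def to_pts(seconds: float) -> int:
    """Convert a presentation time in seconds to 90 kHz ticks."""
    return round(seconds * PTS_CLOCK_HZ)


@dataclass(order=True)
class AccessUnit:
    pts: int                                   # the only field used for ordering
    stream: str = field(compare=False)
    payload: bytes = field(compare=False, default=b"")


def multiplex(*streams: Iterable[AccessUnit]) -> List[AccessUnit]:
    """Interleave time-ordered elementary streams into one stream ordered by timestamp."""
    return list(heapq.merge(*streams))


# 25 frame/s video and 48 kHz audio frames of 1152 samples (Layer II)
video = [AccessUnit(to_pts(i / 25), "video") for i in range(3)]
audio = [AccessUnit(to_pts(i * 1152 / 48_000), "audio") for i in range(4)]
mux = multiplex(video, audio)
print([(au.stream, au.pts) for au in mux])
```

A video frame lasts 3600 ticks and an audio frame 2160 ticks, so the merged sequence alternates irregularly between the two streams while staying in time order.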

 MPEG-2

MPEG-2 was a more complex beast to deal with. Once used for digital transmission, a TV channel can carry 20-24 Mbit/s, depending on the delivery system (terrestrial/satellite broadcasting or cable TV). Digital stereo audio can take 0.2 Mbit/s and standard-definition video 4 Mbit/s (a little less with more compression). Audio could be multichannel (say, 5.1), and better compression could hopefully keep the total bitrate of a TV program at about 4 Mbit/s. Hence the bandwidth taken by one analogue TV program can carry 5-6 digital TV programs.
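The channel arithmetic can be made explicit. The sketch assumes, for illustration only, a split of 3.8 Mbit/s for video and 0.2 Mbit/s for audio per program (the text gives only the ~4 Mbit/s total) and ignores Systems and data-service overheads. Integer kbit/s values avoid floating-point surprises in the division.

```python
def programs_per_channel(channel_kbps: int, video_kbps: int, audio_kbps: int) -> int:
    """Number of whole digital TV programs that fit in one delivery channel.

    Bitrates are in kbit/s; Systems-layer and data-service overheads are ignored.
    """
    per_program = video_kbps + audio_kbps
    if per_program <= 0:
        raise ValueError("per-program bitrate must be positive")
    return channel_kbps // per_program


# A channel carrying 20-24 Mbit/s, programs at ~4 Mbit/s in total (assumed 3.8 video + 0.2 audio)
for channel in (20_000, 24_000):
    print(channel // 1000, "Mbit/s ->", programs_per_channel(channel, 3_800, 200), "programs")
```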

The fact that the digital TV programs in a multiplex may come from independent sources, and that real-world digital channels are subject to errors, forced the design of an entirely different Systems layer for MPEG-2. The facts that users need to access other data sent in a carousel, that interactive scenarios (with a return channel) need session management, and that a user may interact with a server forced MPEG to add a new stream for user-to-network and user-to-user protocols.
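One of the answers of MPEG-2 Systems to independent sources and error-prone channels is the Transport Stream, made of fixed 188-byte packets. Each packet starts with the sync byte 0x47 and a 4-byte header whose 13-bit packet identifier (PID) lets packets from independently produced programs share one multiplex, while the transport error indicator and the 4-bit continuity counter help a receiver notice corrupted or lost packets. The sketch below decodes only that header; adaptation fields, program tables and DSM-CC are left out, and the function name is hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header of one MPEG-2 Transport Stream packet.

    The adaptation field and payload are not parsed in this sketch.
    """
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte Transport Stream packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),         # set when uncorrectable errors were seen
        "payload_unit_start": bool(b1 & 0x40),      # a PES packet or section starts here
        "transport_priority": bool(b1 & 0x20),
        "pid": ((b1 & 0x1F) << 8) | b2,             # 13-bit packet identifier
        "scrambling": (b3 >> 6) & 0x03,
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,            # per-PID counter, reveals lost packets
    }


# A hand-made packet: payload unit start, PID 0x100, payload only, counter 0
packet = bytes([0x47, 0x41, 0x00, 0x10]) + bytes(184)
print(parse_ts_header(packet))
```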

In conclusion, the MPEG-2 model is a natural extension of the MPEG-1 model: superficially, the difference is the added DSM-CC line, but the impact is more pervasive.

Figure 2 – Model of the MPEG-2 standard

The official title of MPEG-2 is Generic coding of moving pictures and associated audio information. It was originally intended for coding of standard definition television (MPEG-3 was expected to deal with coding of High Definition Television). As the work progressed, however, it became clear that a single format for both standard and high definition was not only desirable but possible. Therefore the MPEG-3 project never took off.

The standard is not tied to a specific video resolution (this was already the case for MPEG-1 Video), but it rationalises the notions of profiles, i.e. assemblies of coding tools, and levels, i.e. limits on parameters such as resolution and bitrate. Profiles and levels have subsequently been adopted in most MPEG standardisation areas.
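Profiles and levels can be thought of as two independent checks on a bitstream: the profile bounds which tools may be used, the level bounds the numbers. The sketch below models that idea under simplifying assumptions: tool sets are reduced to picture types (Simple Profile leaves out B-pictures), the level limits are the commonly cited MPEG-2 Video values for the Low, Main and High levels, and the normative tables in ISO/IEC 13818-2 contain more constraints. Not every profile/level combination is defined in the standard either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    max_width: int
    max_height: int
    max_fps: int
    max_bitrate_mbps: int


# Profiles as sets of coding tools (greatly simplified: picture types only).
PROFILES = {
    "Simple": frozenset({"I", "P"}),        # no B-pictures
    "Main": frozenset({"I", "P", "B"}),
}

# Upper bounds as commonly cited for MPEG-2 Video levels; the normative tables
# contain further limits (sample rate, buffer size, ...).
LEVELS = {
    "Low": Level(352, 288, 30, 4),
    "Main": Level(720, 576, 30, 15),
    "High": Level(1920, 1152, 60, 80),
}


def conforms(tools, width, height, fps, bitrate_mbps, profile, level) -> bool:
    """True if a stream uses only the profile's tools and stays within the level's limits."""
    lim = LEVELS[level]
    return (
        set(tools) <= PROFILES[profile]
        and width <= lim.max_width
        and height <= lim.max_height
        and fps <= lim.max_fps
        and bitrate_mbps <= lim.max_bitrate_mbps
    )


sd = ({"I", "P", "B"}, 720, 576, 25, 6)
hd = ({"I", "P", "B"}, 1920, 1080, 25, 18)
print(conforms(*sd, "Main", "Main"), conforms(*sd, "Simple", "Main"))  # B-pictures not in Simple
print(conforms(*hd, "Main", "Main"), conforms(*hd, "Main", "High"))
```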

The standard is composed of 10 parts, some of which are:

  1. Part 1 – Systems specifies the Systems layer that enables the transport of a multichannel digital TV stream on a variety of delivery media
  2. Part 2 – Video specifies the video coding algorithm. Video can be interlaced and may have a wide range of resolutions, with support for scalability and multiview in appropriate profiles
  3. Part 3 – Audio specifies an MPEG-1 Audio backward-compatible multichannel audio coding algorithm. This means that an MPEG-1 Audio decoder is capable of extracting and decoding the MPEG-1 Audio bitstream embedded in an MPEG-2 Audio bitstream
  4. Part 6 – Extensions for DSM-CC specifies User-to-User and User-to-Network protocols for both broadcasting and interactive applications. For instance, DSM-CC can be used to enable such functionalities as carousel or session set-up
  5. Part 7 – Advanced Audio Coding (AAC) specifies a non-backward-compatible multichannel audio coding algorithm. This was done because backward compatibility imposes too big a penalty on applications that do not need it. It was the first time MPEG was forced to develop two standards for apparently the same applications.

MPEG-4

MPEG-4 had the ambition of bringing interactive 3D spaces to every home. Media objects such as audio, video and 2D graphics were an enticing notion in the mid-1990’s. The WWW had shown that it was possible to implement interactivity inexpensively, and the extension to media interactivity looked like the next step. Hence the official title of MPEG-4: Coding of audio-visual objects.

This vision did not come true, and one could say that even today it is not entirely clear what interactivity is and what interactive media experience users are seeking, assuming that just one exists.

Is this a signal that MPEG-4 was a failure?

  • Yes, it was a failure, and so what? MPEG operates like a company. Its “audio-visual objects” product looked like a great idea, but the market thought differently.
  • No, it was a success, because 6 years after MPEG-2, MPEG-4 Visual yielded some 30% improvement in terms of compression.
  • Yes, it was a failure because a patent pool dealt a fatal blow with its “content fee” (i.e. “you pay royalties by the amount of time you stream”).
  • No, it was a success because MPEG-4 has 34 parts, the largest number ever achieved by MPEG in a standard. They include some of the most foundational and successful standards, such as the AAC audio coding format, the MP4 File Format, the Open Font Format and, of course, the still ubiquitous Advanced Video Coding (AVC) format, whose success was due not so much to the 20% more compression it delivers compared with MPEG-4 Visual (always nice to have) as to the industry-friendly licence released by a patent pool. Most importantly, the development of most MPEG standards is driven by a vision. Therefore users have a packaged solution available, but they can also take just the pieces they need.

Figure 3 – Model of the MPEG-4 standard

An overview of the entire MPEG-4 standard is available here. The standard is composed of 34 parts, some of which are:

  1. Part 1 – Systems specifies the means to interactively and synchronously represent and deliver audio-visual content composed of various objects
  2. Part 2 – Visual specifies the coded representation of visual information in the form of natural objects (video sequences of rectangular or arbitrarily shaped pictures) and synthetic visual objects (moving 2D meshes, animated 3D face and body models, and texture)
  3. Part 3 – Audio specifies a multichannel perceptual audio coder that provides transparent-quality compression of Compact Disc music at 128 kbit/s, which made it the standard of choice for many streaming and downloading applications
  4. Part 6 – Delivery Multimedia Integration Framework (DMIF) specifies interfaces to virtualise the network
  5. Part 9 – Reference hardware description specifies the VHDL representation of MPEG-4 Visual
  6. Part 10 – Advanced Video Coding provides about 20% more compression than Part 2
  7. Part 11 – Scene description and application engine provides a time-dependent interactive 3D environment building on VRML
  8. Part 12 – ISO base media file format specifies a file format that has been enriched with many functionalities over the years to satisfy the needs of the multiple MPEG client industries
  9. Part 16 – Animation Framework eXtension (AFX) specifies a range of 3D Graphics technologies, including 3D mesh compression
  10. Part 22 – Open Font Format (OFF) is the result of an MPEG effort that took over an industry initiative (the OpenType font format specification), brought it into the fold of international standardisation and has expanded and maintained it in response to evolving industry needs
  11. Part 29 – Web video coding (WebVC) specifies the Constrained Baseline Profile of AVC in a separate document
  12. Part 30 – Timed text and other visual overlays in ISO base media file format supports applications that need to overlay other media on video
  13. Part 31 – Video coding for browsers (VCB) specifies a video compression format (unpublished)
  14. Part 33 – Internet Video Coding (IVC) specifies a video compression format.
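The ISO base media file format (Part 12) organises a file as a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type; a size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the enclosing data. The sketch below walks the boxes of a buffer. It is a minimal illustration: 'uuid' extended types, full-box version/flags fields and the meaning of individual boxes are not handled, and the helper names are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[offset:end].

    Only box headers are read; nested boxes can be walked by calling
    iter_boxes again on a payload range.
    """
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                      # 64-bit largesize follows the type
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:                    # box runs to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"corrupt box at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a compact 32-bit size field (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = make_box(b"ftyp", b"isom" + struct.pack(">I", 0) + b"isom") + make_box(b"free", b"")
for box_type, start, stop in iter_boxes(sample):
    print(box_type, stop - start, "payload bytes")
```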

Parts 29, 31 and 33 are the results of 3 attempts made by MPEG to develop Option 1 (royalty-free) video coding standards with good performance. None reached the goal because ISO rules allowed a company to make a patent declaration without specifying which patented technology it alleged to be affected by the standard. The allegedly infringing technologies could not be removed because MPEG had no clue which they were.

MPEG-7

In the late 1990’s the industry had been captured by the vision of “500 channels”, and telcos thought they could offer interactive media services. With MPEG-1 and MPEG-2 being deployed and MPEG-4 under development, MPEG expected that users would have zillions of media items.

MPEG-7 started with the idea of providing a standard that would enable users to find the media content of their interest in a sea of media content. MPEG-7 definitely deviates from the logic of the previous standards, and its technologies reflect that: it provides formats for data extracted from multimedia content (called metadata) to facilitate searching in multimedia items. As shown in the figure, metadata can be classified as Descriptors (metadata extracted from the media items, especially audio and video) and Description Schemes (compositions of Descriptors and other Description Schemes). The figure also shows two additional key MPEG-7 technologies. The first is the Description Definition Language (DDL), used to define new Descriptors, and the second is XML Compression. With Descriptors and Description Schemes represented in verbose XML, MPEG clearly needed a technology to compress XML effectively.
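To see why XML compression mattered, the sketch below builds a toy description in which each video segment has a Description Scheme grouping two Descriptors, and then compresses it. Everything here is an assumption made for illustration: the element names only echo MPEG-7 concepts and do not follow the normative schema, and zlib is a generic stand-in, whereas MPEG-7 Systems specifies its own schema-aware binary format (BiM).

```python
import xml.etree.ElementTree as ET
import zlib


def describe_segments(segments):
    """Serialise a simplified, hypothetical description: one Description Scheme
    per video segment grouping two Descriptors (dominant colour, motion activity).
    Element names are illustrative, not the normative MPEG-7 schema."""
    root = ET.Element("VideoDescription")
    for seg_id, (rgb, activity) in enumerate(segments):
        ds = ET.SubElement(root, "VideoSegmentDS", id=f"seg{seg_id}")
        ET.SubElement(ds, "DominantColorDescriptor").text = " ".join(map(str, rgb))
        ET.SubElement(ds, "MotionActivityDescriptor").text = str(activity)
    return ET.tostring(root, encoding="utf-8")


# 100 segments alternating between two looks
xml_bytes = describe_segments([((200, 30, 30), 3), ((10, 10, 120), 1)] * 50)
packed = zlib.compress(xml_bytes, 9)
print(len(xml_bytes), "bytes of XML ->", len(packed), "bytes after generic compression")
```

Most of the XML bytes are repeated tag names, which is exactly the redundancy a dedicated compression tool can remove.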

 

Figure 4 – Components of the MPEG-7 standard

 An overview of the entire MPEG-7 standard is available here. The official title of MPEG-7 is Multimedia content description interface and the standard is composed of 16 parts, some of which are:

  1. Part 1 – Systems has functions similar to those of Part 1 of previous standards. In addition, it specifies a compression method for the XML used to represent MPEG-7 Descriptions and Description Schemes.
  2. Part 2 – Description definition language breaks the traditional Systems-Video-Audio sequence of previous standards by providing a language to define descriptions
  3. Part 3 – Visual specifies a wide variety of visual descriptors such as colour, texture, shape, motion etc.
  4. Part 4 – Audio specifies a wide variety of audio descriptors such as signature, instrument timbre, melody description, spoken content description etc.
  5. Part 5 – Multimedia description schemes specifies description tools that are neither visual nor audio, i.e., generic and multimedia description tools such as the description of the structural aspects of content
  6. Part 8 – Extraction and use of MPEG-7 descriptions explains how MPEG-7 descriptions can be practically extracted and used
  7. Part 12 – Query format defines a format to query multimedia repositories
  8. Part 13 – Compact descriptors for visual search specifies a format that can be used to search images
  9. Part 15 – Compact descriptors for video analysis specifies a format that can be used to analyse video clips.

 MPEG-21

In 1999 MPEG understood that its technologies were having a disruptive impact on the media business. MPEG thought that the industry should not fend off a new threat with old repressive tools. The industry should convert the threat into an opportunity, but there were no standard tools to do that.

MPEG-21 is the standard resulting from the effort by MPEG to create a framework that would facilitate electronic commerce of digital media. It is a suite of specifications for end-to-end multimedia creation, delivery and consumption that can be used to enable open media markets.

This is represented in the figure below. The basic MPEG-21 element is the Digital Item, a structured digital object with a standard representation, identification and metadata, around which a number of specifications were developed. MPEG-21 also includes specifications of Rights and Contracts and basic technologies such as the file format.
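A Digital Item can be pictured as a small tree: an Item carries metadata in Descriptors and groups the actual content in Components, each pointing to a Resource. The sketch below builds such a tree with Python's standard XML library, loosely following the element names of the MPEG-21 Digital Item Declaration Language (Item, Descriptor, Statement, Component, Resource). It is a simplification: namespaces, the normative schema and the Part 3 identifier syntax are omitted, and the identifier and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def digital_item(identifier, title, resources):
    """Build a simplified Digital Item loosely following DIDL element names.

    identifier, title: metadata carried in Descriptor/Statement pairs.
    resources: (mime_type, uri) pairs, one Component per resource.
    Namespaces and the normative schema are omitted in this sketch.
    """
    didl = ET.Element("DIDL")
    item = ET.SubElement(didl, "Item")
    for text in (identifier, title):
        descriptor = ET.SubElement(item, "Descriptor")
        ET.SubElement(descriptor, "Statement", mimeType="text/plain").text = text
    for mime_type, uri in resources:
        component = ET.SubElement(item, "Component")
        ET.SubElement(component, "Resource", mimeType=mime_type, ref=uri)
    return didl


# Hypothetical identifier and URLs, for illustration only
album = digital_item(
    "urn:example:album:0001",
    "Example album",
    [("audio/mpeg", "https://example.com/track1.mp3"),
     ("image/jpeg", "https://example.com/cover.jpg")],
)
print(ET.tostring(album, encoding="unicode"))
```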

Figure 5 – Components of the MPEG-21 standard

An overview of the entire MPEG-21 standard, whose official title is Multimedia Framework, is available here. Some of the 21 MPEG-21 parts are briefly described below:

  1. Part 2 – Digital Item Declaration specifies how a Digital Item is declared
  2. Part 3 – Digital Item Identification specifies identification methods for Digital Items and their components
  3. Part 4 – Intellectual Property Management and Protection (IPMP) Components specifies how to include management and protection information and protected parts in a Digital Item
  4. Part 5 – Rights Expression Language specifies a language to express rights
  5. Part 6 – Rights Data Dictionary specifies a dictionary of rights-related data
  6. Part 7 – Digital Item Adaptation specifies description tools to enable optimised adaptation of multimedia content
  7. Part 15 – Event Reporting specifies a format to report events
  8. Part 17 – Fragment Identification of MPEG Resources specifies a syntax for URI Fragment Identifiers
  9. Part 19 – Media Value Chain Ontology specifies an ontology for Media Value Chains
  10. Part 20 – Contract Expression Language specifies a language to express digital contracts
  11. Part 21 – Media Contract Ontology specifies an ontology for media-related digital contracts.

 Conclusions

The standards from MPEG-1 to MPEG-21 contain 86 specifications covering the entire 30 years of MPEG activity. They should give a rough idea of how MPEG started from the vision of single standards for all industries belonging to what today we can call the “media industry” and has kept on adapting – without disowning – its vision. The original vision has been a seed that has grown – and continues to grow – into a tree. MPEG tracks the evolution of technologies to provide more efficient standards, and the needs of industry to provide refurbished old standards and brand-new ones.

Posts in this thread (in bold this post)

Forty years of video coding and counting

Introduction

For about 150 years, the telephone service has provided a socially important means of communication to billions of people. For at least a century the telecom industry has wanted to offer a more complete user experience (as we would call it today) by adding the visual to the speech component.

Probably the first large-scale attempt at offering such an audio-visual service was AT&T’s PicturePhone in the mid-1960’s. The service was eventually discontinued, but the idea of expanding the telephone service with video caught the attention of telephone companies. Many expected that digital video-phone or video-conference services on the emerging digital networks would enjoy the success that the PicturePhone service did not have, and research in video coding was funded in many telephone company research labs.

This article tells the story of how this original investment, seconded by other industries, gave rise to the ever-improving digital video experience that ever greater numbers of people enjoy today.

First Video Coding Standard

The first international standard that used video coding techniques – ITU-T Recommendation H.120 – originated from the European research project called COST 211. H.120 was intended for video-conference services, especially on satellite channels, was approved in 1984 and was implemented in a limited number of units.

Second Video Coding Standard

The second international standard that used video coding techniques – ITU-T Recommendation H.261 – was intended for audio-visual services and was approved in 1988. It signalled the maturity of video coding standardisation, which left the old and inefficient algorithms behind to enter the DCT/motion compensation age.

For several reasons H.261 was implemented by a limited number of manufacturing companies for a limited number of customers.

Third Video Coding Standard

Television broadcasting has always been – and, with challenges, continues to be today – a socially important communication tool. Unlike audio-visual services, which were mostly a strategic target of the telecom industry, television broadcasting in the 1980’s was a thriving industry served by the Consumer Electronics (CE) industry, which provided devices to hundreds of millions of consumers.

ISO MPEG-1, the third international standard that used video coding techniques, was intended for interactive video applications on CD-ROM and was approved by MPEG in November 1992. Besides the declared goal, the intention was to popularise video coding technologies by relying on the manufacturing prowess of the CE industry. MPEG-1 was the first example of a video coding standard developed by two industries that until then had had very little in common: telecom and CE (terminals for the telecom market were developed by a special industry with little contact with the CE industry).

Fourth Video Coding Standard

Even though MPEG-1 Video, under the name “Video CD”, eventually reached 1 billion units sold in the late 1990’s, especially in the Far East, the big game started with the fourth international standard that used video coding techniques – ISO MPEG-2 – whose original target was “digital television”. The number of industries interested in it made MPEG crowded: telecom had always sought to have a role in television; CE was obviously interested in having existing analogue TV sets replaced by shining digital TV sets, or at least supplemented by a set-top box; satellite broadcasters and cable operators were very keen on the idea of hundreds of TV programs in their bouquets; terrestrial broadcasters had different strategies in different regions but eventually joined, as did the package media sector of the CE industry, with its tight contacts with the movie industry. This explains why the official title of MPEG-2 is “Generic coding of moving pictures and associated audio information”: it signals that MPEG-2 could be used by all the industries that, at that time, had an interest in digital video, a unique feat in the industry.

Fifth and Sixth Video Coding Standards

Remarkably, MPEG-2 Video (and Systems) was jointly developed by MPEG and ITU-T. The world, however, follows the dictum of the Romance of Three Kingdoms (三國演義): 話說天下大勢.分久必合,合久必分. Adapted to the context, this can be translated as “the general trend of the world is that things divided for a long time shall unite, and things united for a long time shall divide”. So the MPEG and ITU paths divided in the following phase. ITU-T developed its own H.263 Recommendation “Video coding for low bit rate communication” and MPEG developed its own MPEG-4 Visual standard, Part 2 of “Coding of audio-visual objects”. The only link between the two standards is a tiny code that tells the decoder whether a bitstream is H.263 or MPEG-4 Visual. The two standards share many coding tools, but not at the bitstream level.

H.263 focused on low bitrate video communication, while MPEG-4 Visual continued to pursue the vision of extending video coding to more industries: this time Information Technology and Mobile. MPEG-4 Visual was released in 2 versions in 1999 and 2000, while H.263 went through a series of updates documented in a series of Annexes to the H.263 Recommendation. H.263 enjoyed some success thanks to the common belief that it was “royalty free”, while MPEG-4 Visual was dealt a devastating blow by a patent pool that decided to include “content fees” in its licensing terms.

Seventh Video Coding Standard

The year 2001 marked the return to the first half of the Romance of Three Kingdoms’ dictum: 分久必合 (things divided for a long time shall unite), even though it was not too 久 (long time) since they had divided, certainly not on the scale intended by the Romance of Three Kingdoms. MPEG and ITU-T (through its Video Coding Experts Group – VCEG) joined forces again in 2001 and produced the seventh international standard in 2003. Both MPEG and ITU call the standard Advanced Video Coding, but MPEG labels it AVC and ITU-T labels it H.264. Reasonable licensing terms (of course always considered unreasonable by licensees) ensured AVC’s long-lasting success in the marketplace, which continues to this day (for another 4 years and 3 months, I mean).

Eighth Video Coding Standard

The eighth international standard that used video coding techniques stands by itself because it does not contain “new” video coding technologies. Instead, it enables a receiver to build a decoder matching a bitstream from standardised tools that are represented in a standard form available at the decoder. The technique is called Reconfigurable Video Coding (RVC) or, more generally, Reconfigurable Media Coding (RMC), because MPEG has applied the same technology to 3D Graphics Coding. It is enabled by two standards: ISO/IEC 23001-4 Codec configuration representation and ISO/IEC 23002-4 Video tool library (VTL). The former defines the methods and general principles to describe codec configurations. The latter describes the MPEG VTL and specifies the Functional Units that are required to build a complete decoder for the following standards: MPEG-4 Simple Profile, AVC Constrained Baseline Profile and Progressive High Profile, MPEG-4 SC3DMC, and HEVC Main Profile.
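The RVC principle, a decoder assembled from a library of tools according to a configuration description, can be illustrated with a deliberately small sketch. Here each "Functional Unit" is a hypothetical Python function doing toy arithmetic, and a configuration is an ordered list of unit names chained into a decoder. In the actual standards, Functional Units are dataflow components and a configuration describes a network of them rather than a simple chain, so this only shows the idea.

```python
from functools import reduce
from typing import Callable, Dict, List

Samples = List[int]
FunctionalUnit = Callable[[Samples], Samples]

# Hypothetical toy tool library: each entry stands in for a Functional Unit.
# Real RVC Functional Units are dataflow components, not Python functions.
TOOL_LIBRARY: Dict[str, FunctionalUnit] = {
    "inverse_quant": lambda xs: [x * 2 for x in xs],          # toy step size of 2
    "add_prediction": lambda xs: [x + 128 for x in xs],       # toy flat predictor
    "clip": lambda xs: [min(255, max(0, x)) for x in xs],     # keep the 8-bit range
}


def build_decoder(configuration: List[str]) -> FunctionalUnit:
    """Chain the library's Functional Units named in a decoder configuration."""
    missing = [name for name in configuration if name not in TOOL_LIBRARY]
    if missing:
        raise KeyError(f"not in the tool library: {missing}")
    units = [TOOL_LIBRARY[name] for name in configuration]
    return lambda data: reduce(lambda acc, unit: unit(acc), units, data)


# Two "codecs" built from the same library by different configurations
decoder_a = build_decoder(["inverse_quant", "add_prediction", "clip"])
decoder_b = build_decoder(["add_prediction", "clip"])
print(decoder_a([1, 3, -5, 7]), decoder_b([1, 3, -5, 7]))
```

The design point is that the library is installed once at the decoder, while the configuration can travel with the content.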

Ninth Video Coding Standard

In 2010 MPEG and VCEG extended their collaboration to a new project: High Efficiency Video Coding (HEVC). A few months after the HEVC FDIS had been released, the HEVC Verification Tests showed that the standard had achieved a 60% bitrate reduction compared with AVC, 10% more than the originally planned 50%. Since then, HEVC has been enriched with features that previous standards did not support at the time of development, such as High Dynamic Range (HDR), Wide Colour Gamut (WCG), screen content and omnidirectional video (video 360). Unfortunately, technical success did not translate into full market success because adoption of HEVC is still hampered – 6 years after its approval by MPEG – by an unclear licensing situation. The articles “IP counting or revenue counting?”, “Business model based ISO/IEC standards”, “Can MPEG overcome its Video “crisis”?” and “A crisis, the causes and a solution” analyse the reasons for the currently stalled situation and propose possible remedies.

Tenth Video Coding Standard

ISO, IEC and ITU share a common policy vis-à-vis patents in their standards. Using few imprecise but clear words (where a patent attorney would use many precise but unclear words), the policy is: it is good if a standard has no patents or if the patent holders allow use of their patents for free (Option 1); it is tolerable if a standard has patents whose holders allow their use on fair, reasonable and non-discriminatory terms (Option 2); it is not permitted to have a standard with patents whose holders do not allow their use (Option 3).

The target of MPEG standards until AVC had always been “best performance no matter what IPR is involved” (provided, of course, that the IPR holders allow its use). However, as the use of AVC extended to many domains, it became clear that there was so much “old” IP (i.e. more than 20 years old, and hence with expired patents) that it was technically possible to make a standard whose IP components were all Option 1.

In 2013 MPEG released the FDIS of WebVC, strictly speaking not a new standard because MPEG had simply extracted the Constrained Baseline Profile of AVC and made it a separate standard with the intention of making it Option 1. The attempt failed because some companies confirmed the Option 2 patent declarations they had already made against the AVC standard.

Eleventh Video Coding Standard

WebVC has not been the only effort made by MPEG to develop an Option 1 video coding standard (i.e. a standard for which only Option 1 patent declarations have been made). A second effort, called Internet Video Coding (IVC), was concluded in 2017 with the release of the IVC FDIS. Verification Tests showed that the performance of IVC exceeded that of the best profile of AVC, by then a 14-year-old standard. However, three companies made Option 2 patent declarations that contained no details, so MPEG could not remove the technologies in IVC that the companies claimed infringed their patents.

Twelfth Video Coding Standard

MPEG achieved a different result with its third attempt at developing an Option 1 video coding standard. The proposal made by a company in response to an MPEG Call for Proposals was reviewed by MPEG and achieved FDIS with the name of Video Coding for Browsers (VCB). However, a company made an Option 3 patent declaration that, like those made against IVC, did not contain any detail that would enable MPEG to remove the allegedly infringing technologies. Eventually ISO did not publish VCB.

Today ISO and IEC no longer allow companies to make Option 3 patent declarations without details (a practice that ITU did not allow). Because the VCB approval process has been completed, VCB cannot be reconsidered unless MPEG restarts the process. VCB is therefore likely to remain unpublished and thus not an ISO standard.

Thirteenth Video Coding Standard

For the fourth time MPEG and ITU are collaborating in the development of a new video coding standard, with the target of a 50% bitrate reduction compared with HEVC. The development of Versatile Video Coding (VVC), as the new standard is called, is still under way and involves close to 300 experts attending VVC sessions. MPEG expects VVC to reach FDIS in October 2020.
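Percentages of bitrate reduction compound multiplicatively, and a 50% reduction is the same as doubling the compression efficiency. The small sketch below makes both points explicit, taking as inputs the 50% targets quoted for HEVC (relative to AVC) and for VVC (relative to HEVC); the function names are hypothetical.

```python
from math import prod


def remaining_bitrate(*reductions: float) -> float:
    """Fraction of the original bitrate left after successive reductions at equal quality."""
    return prod(1.0 - r for r in reductions)


def compression_gain(reduction: float) -> float:
    """How many times better a codec compresses if it cuts bitrate by `reduction`."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Design targets: HEVC 50% below AVC, VVC 50% below HEVC
print(remaining_bitrate(0.5, 0.5))   # 0.25: fraction of the AVC bitrate VVC would need
print(compression_gain(0.5))         # 2.0: halving the bitrate = twice the compression
```

Under those targets, VVC would need about a quarter of the bitrate of AVC for the same quality; likewise, a 25% gain over HEVC, as expected for EVC, corresponds to compression_gain(0.25), i.e. about 1.33 times.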

Fourteenth Video Coding Standard

Thirteen is a large number for video coding standards but this number should be measured against the number of years covered – close to 40. In this long period of time we have gone from 3 initial standards that were mostly application/industry-specific (H.120, MPEG-1 and H.261) to a series of generic (i.e. industry-neutral) standards (MPEG-2, MPEG-4 Visual, MPEG-4 AVC and HEVC) and then to a group of standards that sought to achieve Option 1 status (WebVC, IVC and VCB). Other proprietary video coding formats that have found significant use in the market point to the fact that MPEG cannot stay forever in its ivory tower of “best video coding standards no matter what”. MPEG has to face the reality of a market that becomes more and more diversified and where – unlike the golden age of a single coding standard – there is no longer one size that fits all.

At its 125th meeting MPEG reviewed the responses to its Call for Proposals on a new video coding standard, which sought proposals with a simplified coding structure and an accelerated development time of 12 months from working draft to FDIS. The new standard will be called MPEG-5 Essential Video Coding (EVC) and is expected to reach FDIS in January 2020.

The new video coding project will have a base layer/profile, which is expected to be Option 1, and a second layer/profile that already performs about 25% better than HEVC. Licensing terms are expected to be published by patent holders within 2 years.

VCEG has decided not to work with MPEG on this coding standard. Are we back to the 合久必分 (things combined for a long time must split) situation? This is only half true, because the MPEG-VCEG collaboration on VVC is continuing. In any case, VVC is expected to reduce bitrate by 50% compared with HEVC.

Fifteenth Video Coding Standard

If there was a need to prove that there is no longer “one size fits all” in video coding, just look at the Call for Proposals for a “Low Complexity Video Coding Enhancements” standard issued by MPEG. This Call is not for a “new video codec”, but for a technology capable of extending the capabilities of an existing video codec. A typical usage scenario is adding, say, high-definition capability to set-top boxes that are typically deployed by the millions and cannot be recalled. Proposals are due at the March 2019 meeting and FDIS is expected in April 2020.
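The general principle of enhancing an existing codec can be sketched as follows: the deployed decoder keeps producing a lower-resolution picture, and a separate enhancement step upsamples it and adds a residual that the encoder computed against the full-resolution source. This is a generic illustration of base-plus-enhancement layering under simple assumptions (one row of samples, nearest-neighbour 2x upsampling, 8-bit clipping); it is not the algorithm of any MPEG standard, and the function names are hypothetical.

```python
from typing import List


def upsample_2x(row: List[int]) -> List[int]:
    """Nearest-neighbour 2x upsampling of one row of samples."""
    return [s for s in row for _ in range(2)]


def residual_for(source: List[int], base_row: List[int]) -> List[int]:
    """Encoder side: what the upsampled base layer misses."""
    return [s - u for s, u in zip(source, upsample_2x(base_row))]


def enhance(base_row: List[int], residual: List[int]) -> List[int]:
    """Decoder side: upsampled base-codec output plus the enhancement residual."""
    up = upsample_2x(base_row)
    if len(residual) != len(up):
        raise ValueError("residual must match the upsampled length")
    return [min(255, max(0, u + r)) for u, r in zip(up, residual)]


source = [10, 14, 20, 24]   # a row of the high-resolution picture
base = [12, 22]             # what the existing decoder outputs at half resolution
res = residual_for(source, base)
print(res, enhance(base, res))   # [-2, 2, -2, 2] [10, 14, 20, 24]
```

The existing decoder is untouched; only the small enhancement step has to be added, which is what makes the approach attractive for devices already in the field.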

Sixteenth Video Coding Standard

Point clouds are not really “video” content as we traditionally know it, namely sequences of “frames” at a frequency high enough to fool the eye into believing that the motion is natural. In point clouds, motion is given by dynamic point clouds that represent the surface of objects moving in the scene. For the eye, however, the end result is the same: moving pictures displayed on a 2D surface, whose objects can be manipulated by the viewer (this, however, requires a system layer that MPEG is already developing).

MPEG is working on two different technologies: the first one uses HEVC to compress projections of portions of a point cloud (and is therefore well-suited for entertainment applications because it can rely on an existing HEVC decoder) and the second one uses computer graphics technologies (and is currently more suited to automotive applications). The former will achieve FDIS in January 2020 and the latter in April 2020.
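The first approach rests on turning 3D points into 2D images that an ordinary video encoder can compress. The sketch below shows the simplest form of that idea: an orthographic projection of integer points onto a depth image that keeps the nearest point per pixel. It assumes non-negative integer coordinates and a single projection plane; real projection-based point cloud coding splits the cloud into many patches and adds further layers and attribute images, so this is only an illustration with hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]   # non-negative integer (x, y, z) coordinates


def project_to_depth_map(points: List[Point], width: int, height: int) -> List[List[int]]:
    """Orthographic projection along z onto a width x height image.

    Each pixel keeps the depth of the nearest point; -1 marks empty pixels.
    Points hidden behind nearer ones are simply dropped in this sketch.
    """
    depth = [[-1] * width for _ in range(height)]
    for x, y, z in points:
        if 0 <= x < width and 0 <= y < height:
            if depth[y][x] == -1 or z < depth[y][x]:
                depth[y][x] = z
    return depth


cloud = [(0, 0, 5), (0, 0, 3), (1, 0, 7), (1, 1, 2)]
print(project_to_depth_map(cloud, 2, 2))   # [[3, 7], [-1, 2]]
```

The resulting depth image is an ordinary 2D picture, which is why an existing video decoder can be reused on the receiving side.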

Seventeenth and Eighteenth Video Coding Standards

Unfortunately, the crystal ball gets blurred as we move into the future. Therefore MPEG is investigating several technologies capable of providing solutions for alternative immersive experiences. After providing HEVC and OMAF for 3DoF experiences (where the user can only have roll, pitch and yaw movement of the head), MPEG is working on OMAF v2 for 3DoF+ experiences (where the user can also have a limited translation of the head). A Call for Proposals has been issued, responses are due in March 2019 and the FDIS is expected in July 2020. Investigations are being carried out on 6DoF (where the user can have full translation of the head) and on light fields.

Conclusions

The last 40 years have seen digital video converted from a dream into a reality that involves billions of users every day. This long ride is represented in the figure that ventures into the next steps of the ride.

MPEG keeps working to make sure that manufacturers and content/services providers have access to more and better standard visual technologies for an increasingly diversified market of increasingly demanding users.

Posts in this thread (in bold this post)

 

The MPEG ecosystem

Introduction

An ecosystem is composed of elements variously interconnected and variously dependent on one another. Standardisation is a particular type of ecosystem. The purpose of this article is to analyse the elements of the MPEG ecosystem and their relationships.

Standardisation in the past

In days long bygone, standardisation in what today we would call the “media industry” followed a rather simple process. A company wishing to attach a “standard” label to a product that had become successful in the market made a request to a standards committee whose members, typically from companies in the same industry, had an interest in getting an open specification of what had until then been a closed system. A good example is offered by the video cassette player, for which two products from two different companies, ostensibly for the same functionality – VHS and Beta – were approved by the same standards organisation – the International Electrotechnical Commission (IEC) – and by the same committee – SC 60 B at that time.

Things were a little different in the International Telecommunication Union (ITU) where ITU-T (then called CCITT) had a Study Group where the telecommunication industry – represented by the Post and Telecommunication Administrations of the member countries, at that time the only ones admitted to the committee – requested a standard (called recommendation in the ITU) for digital telephony speech. ITU-T ended up with two different specifications in the same standard: one called A-law and the other called µ-law.

In ITU-R (then called CCIR) National Administrations were operating, or had authorised various entities to operate, television broadcasting services (some had even started doing so before WW II) and were therefore unable to settle on even a limited number of television systems. The only thing they could do was to produce a document called Report 624 Television Systems that collected the 3 main television systems (NTSC, PAL and SECAM) with tens of pages where country A selected, e.g., a different frequency or a different tolerance of the colour subcarrier than country B or C.

Standardisation, à la MPEG

Not unaware of past failures of standardisation and taking advantage of the radical technology discontinuity, MPEG took a different approach to standardisation which can be summarised by the expression “one functionality – one tool”. To apply this expression to the example of ITU-T’s A-law – µ-law dichotomy, if MPEG had to decide on a standard for digital speech, it would

  1. Develop requirements
  2. Select speech samples to be used for tests
  3. Issue a Call for Proposals (CfP)
  4. Run the selected test speech with the proposals
  5. Subjectively assess the quality
  6. Check the proposals for any issue such as complexity etc.
  7. Create a Test Model with the proposals
  8. Create Core Experiments (CE)
  9. Iterate the Test Model with the results of CEs
  10. Produce WD, CD, DIS and FDIS

The process would be long – an overkill in this case because a speech digitiser is a simple analogue-to-digital (A/D) converter – but not necessarily longer than having a committee decide on competing proposals with the goal of accepting only one. The result would be a single standard providing seamless bitstream interoperability without the need to convert speech from one format to another when speech moves from one environment (country, application etc.) to another.
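To make the cost of the A-law – µ-law dichotomy concrete, here is a minimal sketch of the two companding curves. It uses the continuous textbook formulas with the usual parameters (µ = 255, A = 87.6); ITU-T G.711 actually specifies 8-bit segmented approximations of these curves, so the sketch illustrates the principle rather than the bit-exact standard. The function names are illustrative only. A sample compressed with one law and expanded with the other comes out several percent off, which is why speech crossing from an A-law to a µ-law environment has to be converted.

```python
import math

# Parameters of the two companding laws (continuous form, not the G.711 segments).
MU = 255.0   # µ-law: North America and Japan
A = 87.6     # A-law: Europe and most other countries


def mu_law_compress(x: float) -> float:
    """Continuous µ-law curve for a sample normalised to [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Exact inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def a_law_compress(x: float) -> float:
    """Continuous A-law curve for a sample normalised to [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))                     # linear part near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))   # logarithmic part
    return math.copysign(y, x)


if __name__ == "__main__":
    x = 0.1  # a quiet speech sample
    same_law = mu_law_expand(mu_law_compress(x))  # encoder and decoder agree
    mixed = mu_law_expand(a_law_compress(x))      # A-law encoder, µ-law decoder
    print(f"original {x:.4f}, same law {same_law:.4f}, mixed laws {mixed:.4f}")
```

A single standard removes the "mixed laws" case altogether: there is only one curve, so no transcoding step is needed at the boundary between environments.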

If there were only the 10 points listed above, the MPEG process would not be much more complex than the ITU’s. The real difference is that MPEG does not have the mindset of the telecom industry that decided on A-law – µ-law digital speech 50+ years ago. MPEG is different because it would address speech digitisation taking into consideration the needs of a range of other industries that intend to use the standard and hence want to have a say in how it is made: Consumer Electronics (CE), Information Technology (IT), broadcasting, gaming and probably more. Taking into account so many views is a burden for those developing the standard (actually, not necessarily), but the standard eventually produced is abstracted from the little (or big) needs that are specific to individual industries. Profiles and Levels allow an industry not to be overburdened by technologies introduced to satisfy requirements from other industries that are irrelevant (and possibly costly) to it. Those who need the functionality, no matter what the cost, can get it with different profiles and levels.

Exploring the MPEG ecosystem

The article It worked twice and will work again contains a figure, reproduced below, that colourfully depicts how MPEG has succeeded in its role of “abstracting” the needs of the client “digital media” industries currently served by MPEG. The figure does not include other industries, such as genomics, that MPEG has begun to serve.

Figure 1 – MPEG and its client “digital media” industries

Figure 1, however, does not describe all the ecosystem actors. In MPEG-1 the Consumer Electronics industry was typically able to develop by itself (or found it more convenient to develop) the technology needed to make products that used the MPEG-1 standard. With MPEG-2 this was less the case as pointed out in the paragraph “A standard for all” in Why is MPEG successful? Today the industry implementing (as opposed to using or selling products based on) MPEG standards has grown to be a very important element of the MPEG ecosystem. This industry typically provides components to companies who actually manufacture a complete product (sometimes this happens inside the same company, but the logic is the same).

MPEG standards can be implemented using various combinations of software, hardware and hybrid software/hardware technologies. The choice for hardware is very wide: from various integrated circuit architectures to analogue technologies. The latter choice is for devices with extremely low power consumption, although with limited compression. Just about to come are devices that use neural networks. Other technologies are likely to find use in the future, such as quantum computing or even genomic technologies.

Figure 2 identifies 3 “layers” in the MPEG ecosystem, and the arrows show their relationships.

Figure 2 – MPEG, its Client Industries and Implementation Industries

Client industries in need of a standard provide requirements. However, the “Implementation layer” industries, examples of which have been provided above, also provide requirements. The MPEG layer eventually develops standards that are fed to the Client Industry layer that requested them, but also to the Implementation layer. Requests to implement a standard are generated by companies in the Client Industry layer and directed to companies in the Implementation layer, who eventually deliver the implementation to the companies requesting it. Conformance testing typically plays a role in assessing conformance of an implementation to the standard.

Figure 2, however, is not a full description of the MPEG ecosystem. More elements are provided by Figure 3 which describes how the MPEG process actually takes place.

Figure 3 – The MPEG process

The new elements highlighted by the Figure are

  1. The MPEG Toolkit assembling all technologies that have been used in MPEG standards
  2. The MPEG Competence Centres mastering specific technology areas and
  3. The Technology industries providing new technologies to MPEG by responding to Calls for Proposals (CfP).

In the early days the Technology Industries did not have a clear identity and were largely merged with the Client Industries and Implementation Industries. Today, as highlighted above, the providers of basic technologies are well identified and separate industries.

Revisiting the MPEG process

Using Figure 3 it is possible to describe how the MPEG process unfolds (the elements of the MPEG ecosystem are in italic).

  1. MPEG receives a request for a standard from a Client Industry
  2. The MPEG Requirements Competence Centre develops requirements by interacting with Client Industries and Implementation Industries
  3. MPEG issues CfPs (Calls for technologies in the figure)
  4. Technology Industries respond to CfP by submitting technologies
  5. MPEG mobilises appropriate Competence Centres
  6. Competence Centres, coordinated by MPEG, develop standards by selecting/adapting existing technologies (drawn from the toolkit) and submitted technologies
  7. MPEG updates the toolkit with new technologies

It should be clear now that MPEG cannot be described by the simple “Standards Provider – Client Industry” relationship. MPEG is a complex ecosystem that works because all its entities play the role proper to them.

Dividing MPEG by Client Industries means losing the commonality of technologies. Dividing MPEG by Implementation Industries makes no sense, because in principle any MPEG standard must be implementable with different technologies. Dividing MPEG by Competence Centres means losing the interaction between them.

Conclusions

MPEG is a complex ecosystem that has successfully operated for decades serving the needs of the growing number of its component industries. Just as you would not allow a child to open a toy with complicated gears inside just to see “how it works”, industry should not allow sorcerer’s apprentices to undo this wonderful ecosystem called MPEG.


Why is MPEG successful?

There are people who do not like MPEG (I wonder why), but so far I have not found anybody disputing the success of MPEG. Some people claim that only a few MPEG standards are successful, but maybe that is because some MPEG standards are so successful.

In this article the reasons for MPEG’s success are identified and analysed using the 18 elements of the figure below.

A standard for all. In the late 1980’s many industries, regions and countries had understood that the state of digital technologies justified the switch from analogue to digital (some acted against that understanding and paid dearly for it). At that time several companies had developed prototypes, regional initiatives were attempting to develop formats for specific countries and industries, some companies were planning products and some standards organisations were actually developing standards for their industries. The MPEG proposal of a generic standard, i.e. a common technology for all industries, caught the attention because it offered global interoperability, created a market that was global – geographically and across industries – and placed the burden of developing the very costly VLSI technology on a specific industry accustomed to doing that. The landscape today has changed beyond recognition, and the revolutionary idea of that time is now taken for granted.

One step at a time. Even before MPEG came to the fore many players were trying to be “first” and “impose” their early solution on other countries or industries or companies. If the newly-born MPEG had proposed itself as the developer of an ambitious generic digital media standard for all industries, the proposal would have been seen as far-fetched. So, MPEG started with a moderately ambitious project: a video coding standard for interactive applications on digital storage media (CD-ROM) at a rather low bitrate (1.5 Mbit/s) targeting the market covered by the video cassette (VHS/Beta). Moving one step at a time has been MPEG policy for all its subsequent standards.

Complete standards. Within 6 months of its inception MPEG had already realised the obvious, namely that digital media is not just video (although this is the first component that catches the attention), but it is also audio (no less challenging and with special quality requirements). Within 12 months it had realised that bits do not flow in the air and that a stream of bits needs some means to adapt it to the mechanism that carries it (originally the CD-ROM). If the transport mechanism is analogue (as it was 25 years ago and, to a large extent, still is today), the adaptation is even more challenging. Later MPEG also realised that a user interacts with the bits (even though it is so difficult to understand what exactly is the interaction that the user wants). With its MPEG-2 standard MPEG was able to provide the industry with a complete Audio-Video-Systems (and DSM-CC) solution whose pieces could also be used independently. That was possible because MPEG could attract, organise and retain the necessary expertise to address such a broad problem area and provide not just a solution that worked, but the best that technology could offer at the time.

Requirements first. Clarifying to yourself the purpose of something you want to make is a rule that should apply to any human endeavour. This rule is a must when you are dealing with a standard developed by a committee of like-minded people. When the standard is not designed by and for a single industry but by and for many, keeping this rule is vital for the success of the effort. When the standard involves disparate technologies whose practitioners are not even accustomed to talking to one another, complying with this rule is a prerequisite. Starting from its early days MPEG has developed a process designed to achieve the common understanding that lies at the basis of the technical work to follow: describe the environment (context and objectives), single out a set of exemplary uses of the target standard (use cases), and identify requirements.

Leveraging research. In the late 1980s compression of video and audio (and other data, e.g. facsimile) had been the subject of research for a quarter of a century, but how could MPEG access that wealth of technologies and know-how? The choice was the Call for Proposals (CfP) mechanism, because an MPEG CfP is open to anybody (not just the members of the committee) – see How does MPEG actually work? All respondents are given the opportunity to present their proposals, which can, at their choice, address individual technologies, subsystems or full systems, and to defend them (by becoming MPEG members). MPEG does not do research; MPEG uses the best research results to assemble the system specified in the requirements that always accompany a CfP. Therefore MPEG has a symbiotic relationship with research. MPEG could not operate without a tight relationship with research, and research would certainly lose a big customer if that relationship did not exist.

Minimum standards. Industries can be happy to share the cost of an enabling technology but not at the cost of compromising their individual needs. MPEG develops standards so that the basic technology can be shared, but it must allow room for customisation. The notion of Profiles and Levels provides the necessary flexibility to the many different users of MPEG standards. With profiles MPEG defines subsets of the general interoperability; with levels it defines different levels of performance within a profile. Further, by restricting standardisation to the decoding functionality MPEG extends the life of its standards because it allows industry players to compete on the basis of their constantly improved encoders.
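The profile/level idea can be illustrated with a minimal sketch. Everything in it is hypothetical: the profile names, tool names and level limits are invented for illustration and are not taken from any MPEG standard. A profile is modelled as a subset of the coding tools, and a level as a cap on one performance parameter (luma samples per second); a decoder can handle a bitstream only if the stream uses no tool outside the decoder's profile and stays within the decoder's level.

```python
from dataclasses import dataclass

# Hypothetical profiles: each one is a subset of the tools of a fictitious standard.
PROFILES = {
    "ProfileA": frozenset({"intra", "inter"}),
    "ProfileB": frozenset({"intra", "inter", "bidirectional", "interlace"}),
}

# Hypothetical levels: each caps the luma samples per second a decoder must process.
LEVELS = {1: 3_000_000, 2: 15_000_000, 3: 60_000_000}


@dataclass(frozen=True)
class Decoder:
    profile: str  # key into PROFILES
    level: int    # key into LEVELS


@dataclass(frozen=True)
class Bitstream:
    tools_used: frozenset    # coding tools the encoder actually used
    samples_per_second: int  # luma samples per second


def can_decode(decoder: Decoder, stream: Bitstream) -> bool:
    """True if the stream fits the decoder's profile (tools) and level (performance)."""
    tools_ok = stream.tools_used <= PROFILES[decoder.profile]
    performance_ok = stream.samples_per_second <= LEVELS[decoder.level]
    return tools_ok and performance_ok


if __name__ == "__main__":
    small = Bitstream(frozenset({"intra", "inter"}), 352 * 288 * 25)
    large = Bitstream(frozenset({"intra", "inter", "bidirectional"}), 720 * 576 * 25)
    print(can_decode(Decoder("ProfileA", 1), small))  # tools and rate both fit
    print(can_decode(Decoder("ProfileA", 3), large))  # uses a tool outside ProfileA
    print(can_decode(Decoder("ProfileB", 1), large))  # exceeds the level 1 rate
    print(can_decode(Decoder("ProfileB", 2), large))  # fits
```

Real MPEG levels constrain several parameters at once (e.g. picture size, frame rate, bitrate, buffer size), and in some standards profiles are nested so that a decoder for one profile also decodes another; the sketch keeps only the two ideas stated in the text.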

Best out of good. When the responses to a CfP are on the table, how can MPEG select the best from the good? MPEG uses five tools:

  1. Comprehensive description of the technology proposed in each response (no black box allowed)
  2. Assessment of the performance of the technology proposed (e.g. subjective or objective tests)
  3. Line-up of aggressive “judges” (meeting participants, especially other proponents)
  4. Test Model assembling the candidate components selected by the “judges”
  5. Core Experiments to improve the Test Model.

By using these tools MPEG is able to provide the best standard in a given time frame.

Competition & collaboration. MPEG favours competition to the maximum extent possible. Many participants in the meeting are actual proponents of technologies in response to a CfP and obviously keen to have their proposals accepted. Extending competition beyond a certain point, however, is counterproductive and prevents the group from reaching the goal with the best results. MPEG uses the Test Model as the platform that helps participants to collaborate by improving different areas of the Test Model. Improvements are obtained through

  1. Core Experiments, first defined in March 1992 as “a technical experiment where the alternatives considered are fully documented as part of the test model, ensuring that the results of independent experimenters are consistent”, a definition that applies unchanged to the work being done today;
  2. Reference Software, which today is a shared code base that is progressively improved with the addition of all the software validated through Core Experiments.

IPR (un)aware. MPEG uses the process described above and seeks to produce the best performing standards that satisfy the requirements independently of the existence of IPR. Should an IPR in the standard turn out not to be available, the technology protected by that IPR should be removed. Since only the best technologies are adopted and MPEG members are uniquely adept at integrating them, industry knows that the latest MPEG standards are top of the range. Of course one should not think that the best is free. In general it has a cost because IP holders need to be remunerated. The market (outside of MPEG) decides how that can be achieved.

Internal competition. If competition is the engine of innovation, why should those developing MPEG standards be shielded from it? The MPEG mission is not to please its members but to provide the best standards to industry. Probably the earliest example of this tenet is provided by MPEG-2 part 3 (Audio). When backward compatibility requirements did not allow the standard to yield the performance of algorithms not constrained by compatibility, MPEG issued a CfP and developed MPEG-2 part 7 (Advanced Audio Coding) that eventually evolved and became the now ubiquitous MPEG-4 AAC. Had MPEG not made this decision, probably we would still have MP3 everywhere, but no other MPEG Audio standards. MPEG values all its standards but cannot afford not to provide the best technology to those who demand it.

Separation of concerns. Even for the purpose of developing its earliest standards such as MPEG-1 and MPEG-2, MPEG needed to assemble disparate technological competences that had probably never worked together in a project (with its example MPEG has favoured the organisational aggregation of audio and video research in many institutions where the two were separate). To develop MPEG-4 (a standard with 34 parts whose development continues unabated), MPEG has assembled the largest ever number of competences ranging from audio and video to scene description, to XML compression, to font, timed text and many more. MPEG keeps competences organisationally separate in different MPEG subgroups, but retains all flexibility to combine and deploy the needed resources to respond to specific needs.

New ways for new standards. MPEG works at the forefront of digital media technologies and it would be odd if it had not innovated the way it makes its own standards. Since its early days, MPEG has made massive use of ad hoc groups to progress collaborative work, innovated the way input and output documents are shared in the community and changed the way documents are discussed at meetings and edited in groups.

Talk to the world. Using extreme words, MPEG does not have an industry of its own. It only has the industry that develops the technologies used to make standards for the industries it serves. Therefore MPEG needs to communicate its plans, the progress of its work and the results achieved more actively than other groups. See MPEG communicates for the many ways MPEG uses to achieve this goal.

Standards are for a time. Digital media is one of the fastest-evolving digital technology areas because most of the developers of good technologies incorporated in MPEG standards invest the royalties earned from previous standards to develop new technologies for new standards. As soon as a new technology shows interesting performance (which MPEG assesses by issuing Calls for Evidence – CfE) or the context changes offering new opportunities, MPEG swiftly examines the case, develops requirements and issues CfPs. For instance, this has happened for its many video and audio compression standards. A paradigmatic case of a standard addressing a change of context is MPEG Media Transport (MMT), which MPEG designed having in mind a broadcasting system for which the layer below it is IP, unlike MPEG-2 Transport Stream, originally designed for a digitised analogue channel (but also used today for transport over IP as in IPTV).

Standards lead. When technology moves fast, as in the case of digital media, waiting is a luxury MPEG cannot afford. MPEG-1 and MPEG-2 were standards whose enabling technologies were already considered by some industries, and MPEG-4 (started in 1993) was a bold and successful attempt to bring media into the IT world (or the other way around). That it is no longer possible to wait is shown by MPEG-I, a challenging undertaking where MPEG is addressing standards for interfaces that are still shaky or just hypothetical. Having standards that lead, as opposed to trail, is a tough trial-and-error game, but the only possible game today. The alternative is to stop making standards for digital media, because if MPEG waits until market needs are clear, the market is already full of incompatible solutions and there is no room left for standards.

Standards as enablers. An MPEG standard cannot be “owned” by an industry. Therefore MPEG, keeping faith with its “generic standards” mission, tries to accommodate all legitimate functional requirements when it develops a new standard. MPEG assesses each requirement for its merit (value of functionality, cost of implementation, possibility to aggregate the functionality with others etc.). Ditto if an industry comes with a legitimate request to add a functionality to an existing standard. The decision to accept or reject a request is only driven by a value substantiated by use cases, not because an industry gets an advantage or another is penalised.

What is “standard”? In human societies there are laws and entities (tribunals) with the authority to decide if a specific human action conforms to the law. In certain regulated environments (e.g. terrestrial broadcasting in many countries) there are standards and entities (authorised test laboratories) with the authority to decide if a specific implementation conforms to the standard. MPEG has neither but, in keeping with its “industry-neutral” mission, it provides the technical means – tools for conformance assessment, e.g. bitstreams and reference software – for industries to use in case they want to establish authorised test laboratories for their own purposes.

Rethinking what we are. MPEG started as a “club” of Telecommunication and Consumer Electronics companies. With MPEG-2 the “club” was enlarged to Terrestrial and Satellite Broadcasters, and Cable concerns. With MPEG-4, IT companies joined forces. Later, a large number of research institutions and academia joined (today they account for ~25% of the total membership). With MPEG-I, MPEG faces new challenges because the demand for standards for immersive services and applications is there, but technology immaturity deprives MPEG of its usual “anchors”. Thirty years ago MPEG was able to invent itself and, subsequently, to morph itself to adapt to changed conditions while keeping its spirit intact. If MPEG is able to continue doing what it did in the last 30 years, it can continue to support the industry it serves in the future, no matter what the changes of context may be.
I mean, provided some mindless industry elements do not get in the way.

Suggestions? If you have comments or suggestions about MPEG, please write to leonardo@chiariglione.org.
