The MPEG frontier


MPEG has developed standards in many areas. One of the latest is the compression of DNA reads from high-speed sequencing machines, and MPEG is now working on Compression of Neural Networks for Multimedia Content Description and Analysis.

How could a group that was originally tasked with developing standards for video coding for interactive applications on digital storage media (CD-ROM) get to this point?

This article posits that the answer lies in the same driving force that pushed the original settlers on the East Coast of the North American continent to reach the West Coast. It also posits that, unlike the Pacific Ocean beyond the West Coast, which put an end to the frontier and prompted John F. Kennedy to propose the New Frontier, MPEG has an endless series of frontiers in sight. Unless, that is, some mindless industry elements declare that there is no longer a frontier to overcome.

The MPEG “frontiers”

The ideal that has driven MPEG experts over the last 31 years finds its match in the ideal that defined the American frontier. Just as “the frontier”, according to Frederick Jackson Turner, was the defining process of American civilisation, so the development of a series of 180 standards for Coding of Moving Pictures and Audio, which extended the capability of compression to deliver better and new services, has been the defining process of MPEG, the Moving Picture Experts Group.

The only difference is that the MPEG frontier is a collection of frontiers held together by the title “Coding of Moving Pictures and Audio”. It is difficult to impose a rational order on a field undergoing such tumultuous development, but I count 10 frontiers:

  1. Making rewarding visual experiences possible by reducing the number of bits required to digitally represent video while keeping the same visual quality and adding more features (see Forty years of video coding and counting and More video with more features)
  2. Making rewarding audio experiences possible by reducing the number of bits required to digitally represent audio with enhanced user experiences (see Thirty years of audio coding and counting)
  3. Making rewarding user experiences possible by reducing the number of bits required by other non-audio-visual data such as computer-generated or sensor data
  4. Adding infrastructure components to 1, 2 and 3 so as to provide a viable user experience
  5. Making rewarding spatially remote or time-shifted user experiences possible by developing technologies that enable the transport of compressed data of 1, 2 and 3 (see What would MPEG be without Systems?)
  6. Making possible user experiences involving combinations of different media
  7. Giving users the means to search for the experiences of 1, 2, 3 and 4 of interest to them
  8. Enabling users to interact with the experiences made possible by 1, 2, 3 and 4
  9. Enabling electronic commerce of the user experiences made possible by 1, 2, 3, 4, 5 and 6
  10. Defining interfaces to facilitate the development of services.

The table below provides a mapping between MPEG frontiers and MPEG standards, both completed and under development.


No end to the MPEG frontier

Thirty-one years and 180 standards later, MPEG has not yet accomplished its mandate. That is not because it has not tried hard enough, but because an unceasing stream of new technologies keeps providing new opportunities to accomplish its mandate better and with improved user satisfaction.

While developing its standards, MPEG has substantially clarified the content of its mandate, still within the title of “Coding of Moving Pictures and Audio”. The following topics highlight what the main directions of work will be in the next few (I must say, quite a few) years.

More of the same

The quest for new solutions that do better than, or simply differently from, what has been done in the past 31 years will continue unabated. The technologies that achieve the stated goals will change and new ones will be added. However, old needs are not going to disappear just because solutions exist today.

Immersive media

This is the area that many expect will host the bulk of MPEG standards due to appear in the next few years. Point Cloud Compression is one of the first standards that will provide immediately usable 3D objects – both static and dynamic – for a variety of applications. But other, more traditional, video-based approaches are also being investigated. Immersive Audio will also be needed to provide complete user experiences. The Omnidirectional MediA Format (OMAF) will probably be, at least for some time, the platform where the different technologies will be integrated. Other capturing and presentation technologies may require new approaches at later stages.

Media for all types of users

MPEG has almost completed the development of the Internet of Media Things (IoMT) standard: parts 2 and 3 reached FDIS in March 2019 and part 1 will become FDIS in October 2019. IoMT is still an Internet of Things, but Media Things are much more demanding because the amount of information transmitted and processed is typically huge (Mbit/s if not Gbit/s), which is at odds with the paradigm of distributed Things expected to stay unattended, possibly for years. In the IoMT paradigm information is typically processed by machines. Sometimes, however, human users are also involved. Can we develop standards that provide satisfactory (compressed) information to human and machine users alike?

Digital data are natively unsuitable for processing

The case of audio and video compression has always been clear. In 1992/1994 industry could only thank MPEG for providing standards that made it economically feasible to deliver audio-visual services to millions (at that time), and billions (today), of users. Awareness of other data types took time to percolate, but industry now realises that point clouds are an excellent vehicle for delivering entertainment content and 3D environments for automotive; that DNA reads from high-speed sequencing machines can be compressed and made easier to process; and that large neural networks can be compressed for delivery to millions of devices. There is an endless list of use cases all following the same paradigm: huge amounts of data that can hardly be distributed as they are but can be compressed, with or without loss of information, depending on the application.

MPEG is now exploring the use case of machine tools, where signals are sampled at more than 40 kHz with 10-bit accuracy. Under these conditions a machine tool generates about 1 TByte/year. The stored data are a valuable resource for machine manufacturers and operators because they can be used to optimise machine operation, schedule on-demand maintenance and optimise the factory.
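The 1 TByte/year figure can be checked with back-of-the-envelope arithmetic. The following is a minimal Python sketch, assuming a single channel sampled at exactly 40 kHz, 10-bit samples packed with no overhead and, unless stated otherwise, round-the-clock operation. The channel count and duty cycle are hypothetical parameters, not figures from the use case.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def raw_bytes_per_year(sample_rate_hz, bits_per_sample, channels=1, duty_cycle=1.0):
    """Uncompressed sensor data volume produced in one year, in bytes.

    channels and duty_cycle are illustrative assumptions, not figures from the use case.
    """
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second / 8 * SECONDS_PER_YEAR * duty_cycle


# One channel at 40 kHz and 10 bit, running around the clock
print(f"{raw_bytes_per_year(40_000, 10) / 1e12:.2f} TB/year")  # 1.58 TB/year

# The same channel with a hypothetical two-thirds duty cycle
print(f"{raw_bytes_per_year(40_000, 10, duty_cycle=2/3) / 1e12:.2f} TB/year")  # 1.05 TB/year
```

Running continuously, one channel comes to roughly 1.6 TB/year; with a duty cycle of about two thirds the result is close to the 1 TByte/year quoted in the text, and every additional channel multiplies it. Whatever compression ratio a future standard achieves divides this volume directly.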

The keyword here is Industry 4.0. To limit the amount of data stored on the factory floor, a standard for the compression of machine tool data would be invaluable.

If you are interested in this promising new area, please subscribe and join the email reflector.

Is MPEG going to change?

MPEG is going to change but, actually, nothing needs to change because the notions outlined above are already part of MPEG’s cultural heritage. In the future we will probably use neural networks for different purposes. This, however, is already a reality today: the Compact Descriptors for Video Analysis (CDVA) standard uses neural networks, and many proposals to use neural networks in the Versatile Video Coding (VVC) standard have already been made. We will certainly need neural network compression, but we are already working on it in MPEG-7 part 17, Neural Networks for Multimedia Content Description and Analysis.

MPEG has been working on MPEG-I Coded representation of immersive media for some time. One standard has already been produced (OMAF), two more (V-PCC and NBMP) have reached DIS level and another two (G-PCC and VVC) have reached CD level. Many parallel activities are under way at different stages of maturity.

I have already mentioned that MPEG has produced standards in the IoMT space, but the 25-year-old MPEG-7 notion of describing, i.e. coding, content in compressed form is just a precursor of the current exploratory work on Video Coding for Machines (reflector: , subscription: ).

In the Italian novel The Leopard, better known through its 1963 film version (directed by Luchino Visconti and starring Burt Lancaster, Claudia Cardinale and Alain Delon), the protagonist’s nephew says: “Se vogliamo che tutto rimanga come è, bisogna che tutto cambi” (if we want everything to stay as it is, everything needs to change).

The MPEG version of this cryptic sentence is “if we want everything to change, everything needs to stay the same”.

Guidance for the future

MPEG is driven by a 31-year-long ideal that it has pursued following guidelines worth revisiting here, as we are about to enter a new phase:

  1. MPEG standards are designed to serve multiple industries. MPEG does not – and does not want to – have a “reference industry”. MPEG works with the same dedication for all industries, trying to extract the requirements of each without favouring any.
  2. MPEG standards are provided to the market, not the other way around. At a time when de facto standards keep popping up in the market, it is proper to reassert the policy that international standards should be developed by experts in a committee.
  3. MPEG standards anticipate the future. MPEG standards cannot trail technology development. If they did, MPEG would be forced to adopt solutions that a particular company in a particular industry has already developed.
  4. MPEG standards are the result of a competition followed by collaboration. Competition is at the root of progress. MPEG should continue publishing its work plan so that companies can develop their solutions. MPEG will assess the proposals, select and integrate the best technologies and develop its standards in a collaborative fashion.
  5. MPEG standards thrive on industry research. MPEG is not in the research business, but MPEG would go nowhere if it were not constantly fed with research, in responses to Calls for Proposals and in the execution of Core Experiments.
  6. MPEG standards are enablers, not disablers. As MPEG standards are not “owned” by a specific industry, MPEG will continue assessing and accommodating all legitimate functional requirements, from whichever source they come.
  7. MPEG standards need a business model. MPEG standards have been successful also because those who contributed good technologies to MPEG standards have been handsomely remunerated and could invest in new technologies. This business model will not be sufficient to sustain MPEG in its new endeavours.


Leonardo da Vinci, an accomplished performer in all arts and probably the greatest inventor of all time, lived in the unique age of history called the Renaissance, when European literati became aware that knowledge was boundless and that they had the capability to know everything. The dictum “Homo sum, humani nihil a me alienum puto” (I am human, and nothing human I consider alien to me), borrowed from the Roman playwright Terence, well represents the new consciousness of the intellectual power of humans in the Renaissance age.

MPEG does not have the power to know everything – but it knows quite a few useful things for its mission. MPEG does not have the power to do everything – but it knows how to make the best standards in the area of Coding of Moving Pictures and Audio (and in a few nearby areas as well).

It would indeed be a great disservice if MPEG could not continue serving industry and humankind in the challenges to come as it has successfully done in the challenges of the last 31 years.

Posts in this thread

Tranquil 7+ days of hard work in Gothenburg


The purpose of this article is to offer some glimpses of 7 (actually 12, counting JVET activity) days of hard work at the 127th MPEG meeting (8 to 12 July 2019) in Sweden.

MPEG 127 was an interesting conjunction of the stars because the first MPEG meeting in Sweden (Stockholm, July 1989) was #7 (111 binary) and the last meeting in Sweden (Gothenburg, July 2019) was #127 (1111111 binary). Will there be a 255th (11111111 binary) meeting in Sweden in July 2049? Maybe not, looking at some odd – one would call them suicidal – proposals for the future of MPEG.

Let’s first have the big – logistic, but emblematic – news. For a few years the number of MPEG participants has been hovering around 500, but in Gothenburg it crossed the 600 mark for the first time. Clearly MPEG remains a highly attractive business proposition if it has mobilised such a huge mass of experts.

It is not my intention to talk about everything that happened at MPEG 127. I will concentrate on some major results starting from, guess what, video.

MPEG-I Versatile Video Coding (VVC)

Versatile Video Coding (VVC) reached the first formal stage in the ISO/IEC standards approval process: Committee Draft (CD). This is the stage of a technical document that has been developed by experts but has not undergone any official scrutiny outside the committee.

The VVC standard has been designed to be applicable to a very broad range of applications, with substantial improvements compared to older standards but also with new functionalities. It is too early to announce a definite level of improvement in coding efficiency, but the current estimate is in the range of 35–60% bitrate reduction compared to HEVC in a range of video types going from 1080p HD to 4K and 8K, for both standard and high dynamic range video, and also for wide colour gamut.

Beyond such “flash news”-like announcements, it is important to highlight that some 800 documents were reviewed at MPEG 127 to produce the VVC CD. Many worked until close to midnight to process all input documents.

MPEG-5 Essential Video Coding (EVC)

Another video coding standard reached CD level at MPEG 127. Why did two video coding standards reach CD at the same meeting? The answer is simple: as a provider of digital media standards, MPEG has VVC as its top-of-the-line video compression “product”, but it has other “products” under development that are meant to satisfy different needs.

One of these needs concerns “complexity”, a multi-dimensional entity, and VVC is “complex” in several respects. Therefore EVC does not aim to provide the best video quality money can buy, which is what VVC does, but a standard video coding solution for business needs in cases, such as video streaming, where MPEG video coding standards have hitherto not had the wide adoption that their technical characteristics suggested they should have.

Currently EVC includes two profiles:

  1. A baseline profile that contains only technologies that are over 20 years old or are otherwise expected to be obtainable royalty-free by a user.
  2. A main profile with a small number of additional tools, each providing significant performance gain. All main profile tools are capable of being individually switched off or individually switched over to a corresponding baseline tool.

Worth noting is the fact that organisations making proposals for the main profile have agreed to publish applicable licensing terms within two years of FDIS stage, either individually or as part of a patent pool.
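The switching rule for the main profile can be pictured as a small configuration function. The Python sketch below is illustrative only: the tool names are hypothetical placeholders, not actual EVC coding tools, and it models just the rule stated in the profile list, namely that each main profile tool is either used, switched over to a corresponding baseline tool or, where no such counterpart is listed, switched off.

```python
# Hypothetical main-profile tools mapped to their baseline counterparts.
# None means the tool has no baseline counterpart and can only be switched off.
MAIN_TOOLS = {
    "tool_a": "baseline_tool_a",
    "tool_b": "baseline_tool_b",
    "tool_c": None,
}


def active_toolset(enabled):
    """Return the tools actually used, given the set of enabled main-profile tools."""
    tools = []
    for main_tool, baseline in MAIN_TOOLS.items():
        if main_tool in enabled:
            tools.append(main_tool)   # main-profile tool in use
        elif baseline is not None:
            tools.append(baseline)    # switched over to the baseline tool
        # otherwise the tool is simply switched off
    return tools


print(active_toolset({"tool_a", "tool_c"}))  # ['tool_a', 'baseline_tool_b', 'tool_c']
print(active_toolset(set()))                 # ['baseline_tool_a', 'baseline_tool_b']
```

With every main tool disabled, only baseline tools remain, which mirrors the idea that the baseline profile should be usable on its own.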

MPEG-5 Low Complexity Enhancement Video Coding (LCEVC)

LCEVC is another video compression technology MPEG is working on. This is still at Working Draft (WD) level, but the plans call for achieving CD level at the next meeting.

LCEVC specifies a data stream structure made up of two component streams: a base stream decodable by a hardware decoder, and an enhancement stream suitable for software processing implementation with sustainable power consumption. The enhancement stream will provide new features, such as compression capability extension to existing codecs and lower encoding and decoding complexity. The standard is intended for on-demand and live streaming applications.

It should be noted that LCEVC is not, stricto sensu, a video coding standard like VVC or EVC, but does cover the business need of enhancing a large number of deployed set top boxes with new capabilities without replacing them.
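The two-stream idea can be illustrated with a deliberately simplified Python sketch. It assumes that the base layer is decoded at half resolution, that it is upscaled with nearest-neighbour upsampling and that a single enhancement layer carries per-pixel residuals. The actual LCEVC upsampling filters and residual coding are not modelled, and all sample values are hypothetical.

```python
def upsample_2x(frame):
    """Nearest-neighbour 2x upsampling of a 2-D list of pixel values."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]  # duplicate each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # duplicate each row vertically
    return out


def reconstruct(base_frame, residuals):
    """Add enhancement-layer residuals to the upsampled base picture, clipped to 8 bits."""
    up = upsample_2x(base_frame)
    return [[max(0, min(255, p + r)) for p, r in zip(up_row, res_row)]
            for up_row, res_row in zip(up, residuals)]


base = [[100, 120],
        [140, 160]]           # low-resolution output of the (hardware) base decoder
residuals = [[0, 2, -1, 0],
             [1, 0, 0, -2],
             [3, 0, 0, 0],
             [0, -3, 0, 5]]   # hypothetical enhancement-layer corrections
print(reconstruct(base, residuals))
```

The point of the layering is that the base decoder never needs to change: the enhancement step is a light addition that software can perform on top of whatever the installed hardware already outputs.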

3 Degrees of Freedom+ (3DoF+) Video

This activity, still at an early (WD) stage, will reach CD stage in January 2020. The standard will allow an encoder to send a limited number of views of a scene so that a decoder can display specific views at the request of the user. If the request is for a view that is actually available in the bitstream, the decoder will simply display it. If the request is for a view that is not in the bitstream, the decoder will synthesise the view using all available information.
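The decoder behaviour just described (display a transmitted view, otherwise synthesise one) can be sketched in a few lines of Python. The sketch assumes, hypothetically, that views lie on a single horizontal baseline identified by camera position and that a view is a flat list of pixel values. Linear blending of the two nearest views is used as a crude stand-in for real view synthesis, which relies on depth and geometry information.

```python
def render_view(position, views):
    """Return the view at `position`, synthesising it if it was not transmitted.

    `views` maps camera positions (on a hypothetical 1-D baseline) to pixel lists.
    """
    if position in views:
        return views[position]                       # transmitted view: display as is
    left = max((p for p in views if p < position), default=None)
    right = min((p for p in views if p > position), default=None)
    if left is None or right is None:                # outside the transmitted range
        nearest = min(views, key=lambda p: abs(p - position))
        return views[nearest]
    w = (position - left) / (right - left)           # weight of the right-hand view
    return [(1 - w) * a + w * b for a, b in zip(views[left], views[right])]


views = {0.0: [10, 20, 30], 1.0: [30, 40, 50]}
print(render_view(1.0, views))   # [30, 40, 50]
print(render_view(0.25, views))  # [15.0, 25.0, 35.0]
```

The fewer views are transmitted, the further the requested position tends to be from an available one, which is consistent with artifacts growing as the number of views drops.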

Figure 1 shows the effect of decreasing the number of views available at the decoder. With 32 views the image looks perfect, with 8 views there are barely visible artifacts on the tube on the floor, but with only two views artifacts become noticeable.

Of course this is an early-stage result that will be further improved until the standard reaches Final Draft International Standard (FDIS) stage in October 2020.

Figure 1 – Quality of synthesised video as a function of the number of views

Video coding for machines

This is an example of work that looks brand new to MPEG but has roots in the past.

In 1996 MPEG started working on MPEG-7, a standard to describe images, video, audio and multimedia data. The idea was that a user would tell a machine what was being looked for. The machine would then convert the request into some standard descriptors and use them to search the database where the descriptors of all content of interest had been stored.

I should probably not have to say that the descriptors had a compressed representation because moving data is always “costly”.
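The search step described above can be sketched as a nearest-neighbour lookup. The Python below assumes, for illustration only, that every descriptor is a fixed-length numeric vector and that similarity is measured by Euclidean distance. Real MPEG-7 descriptors come in many forms, each with its own matching rules, and the 3-bin colour descriptors and file names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query_descriptor, database, k=1):
    """Return the ids of the k items whose stored descriptors are closest to the query."""
    ranked = sorted(database, key=lambda item_id: distance(query_descriptor, database[item_id]))
    return ranked[:k]


# Hypothetical 3-bin colour descriptors extracted ahead of time for each item
database = {
    "sunset.mp4": [0.7, 0.2, 0.1],
    "forest.mp4": [0.1, 0.8, 0.1],
    "ocean.mp4":  [0.1, 0.2, 0.7],
}
print(search([0.6, 0.3, 0.1], database))  # ['sunset.mp4']
```

Only the descriptors, not the content itself, are compared, which is why their size, and hence their compression, matters so much.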

Some years ago, MPEG revisited the issue and developed Compact Descriptors for Visual Search (CDVS). The standard was meant to provide a unified and interoperable framework for devices and services in the area of visual search and object recognition.

Soon after CDVS, MPEG revisited the issue for video and developed Compact Descriptors for Video Analysis (CDVA). The standard is intended to enable the design of interoperable applications for object search, minimise the size of video descriptors and ensure high matching performance of objects (both in accuracy and complexity).

As the “compact” adjective in both standards signals, CDVS and CDVA descriptors are compressed, with a user-selectable compression ratio.
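A user-selectable compression ratio can be illustrated with plain uniform scalar quantisation. The Python sketch below is a stand-in chosen for simplicity, not the compression scheme that CDVS or CDVA actually use: each component of a hypothetical descriptor, assumed to lie in [0, 1] and originally stored as a 32-bit float, is coded with a chosen number of bits.

```python
def quantise(descriptor, bits, lo=0.0, hi=1.0):
    """Uniformly quantise each component in [lo, hi] to an integer code of `bits` bits."""
    levels = (1 << bits) - 1
    return [round((min(max(x, lo), hi) - lo) / (hi - lo) * levels) for x in descriptor]


def dequantise(codes, bits, lo=0.0, hi=1.0):
    """Map integer codes back to approximate component values."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]


descriptor = [0.12, 0.87, 0.45, 0.33]  # hypothetical 4-component descriptor, 32-bit floats

for bits in (8, 4, 2):                 # user-selected precision per component
    codes = quantise(descriptor, bits)
    error = max(abs(a - b) for a, b in zip(descriptor, dequantise(codes, bits)))
    ratio = 32 / bits                  # compression ratio versus 32-bit floats
    print(f"{bits} bit: codes={codes} ratio={ratio:.0f}:1 max error={error:.3f}")
```

Fewer bits mean a higher compression ratio but a coarser descriptor, so the selectable ratio trades transmission cost against matching accuracy.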

Recently MPEG has defined requirements, issued a call for evidence and a call for proposals, and developed a WD of a standard whose long name is “Compression of neural networks for multimedia content description and analysis”. Let’s call it for simplicity Neural Network Representation (NNR).

Artificial neural networks are already used for extraction of descriptors, classification and encoding of multimedia content. A case in point is CDVA, which already uses several neural networks in its algorithm.

The efficient transmission and deployment of neural networks for multimedia applications require methods to compress these large data structures. NNR defines tools for compressing neural networks for multimedia applications and representing the resulting bitstreams for efficient transport. Distributing a neural network to billions of people may be hard to achieve if the neural network is not compressed.
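Why compression pays off for neural networks can be seen with one of the simplest techniques in that family, uniform quantisation of weights. The Python sketch below converts hypothetical 32-bit float weights to 8-bit integers plus one scale factor per tensor. It only illustrates the principle and does not reproduce the tools that NNR defines.

```python
def quantise_weights(weights, bits=8):
    """Symmetric uniform quantisation of a weight tensor to signed `bits`-bit integers."""
    qmax = (1 << (bits - 1)) - 1                        # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale for all-zero tensors
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantise_weights(codes, scale):
    """Approximate reconstruction of the original weights."""
    return [c * scale for c in codes]


def storage_bytes(num_weights, bits):
    """Payload size: packed integer codes plus one 32-bit scale factor."""
    return (num_weights * bits + 7) // 8 + 4


weights = [0.5, -1.27, 0.031, 0.0, 1.0]  # hypothetical layer weights
codes, scale = quantise_weights(weights)
print(codes)                             # [50, -127, 3, 0, 100]
print(dequantise_weights(codes, scale))

n = 1_000_000                            # a hypothetical one-million-weight layer
print(4 * n, "bytes as float32 vs", storage_bytes(n, 8), "bytes at 8 bit")
```

Even this naive scheme cuts the payload by about four times; practical neural network compression typically combines such steps with others, for example entropy coding of the integer codes.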

I am now ready to introduce the new, but also old, idea behind video coding for machines. The MPEG “bread and butter” video coding technology is a sort of descriptor extraction: DCT (or other) coefficients provide the average value of a region and its frequency analysis, motion vectors describe how certain areas in the image move from frame to frame, etc.
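The statement that a transform coefficient gives the average value of a region can be made concrete. For the orthonormal 2-D DCT-II of an N×N block, the DC coefficient equals N times the block mean, while the other coefficients describe its frequency content. The Python sketch below evaluates the transform directly on a hypothetical 8×8 block. Real codecs use fast integer approximations and several block sizes, so this only illustrates the principle.

```python
import math

N = 8


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) evaluation)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * s
    return coeffs


# A hypothetical block: flat grey level plus a ramp along the second index
block = [[100 + 4 * y for y in range(N)] for x in range(N)]
coeffs = dct2(block)
mean = sum(map(sum, block)) / (N * N)
print(round(coeffs[0][0], 6), N * mean)  # DC coefficient and N times the block mean
```

Because the block varies only along one direction, all coefficients with u > 0 are (numerically) zero: the transform concentrates the description of the block into a few numbers, which is what makes the coefficients behave like compact descriptors.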

So far, video coding “descriptors” were designed to achieve the best visual quality – as assessed by humans – at a given bitrate. The question asked by video coding for machines is: “what descriptors provide the best performance for use by a machine at a given bitrate?”

A tough question, for which there is currently no answer.

If you want to contribute to the answer, you can join the email reflector after subscribing here. MPEG 128 will be eager to know how the question has been addressed.

A consideration

MPEG is a unique organisation with activities covering the entire scope of a standard: idea, requirements, technologies, integration and specification.

How can mindless industry elements think they can do better than a Darwinian process that has shaped the MPEG machine for 30 years?

Maybe they think they are God, because only He can perform better than Darwin.


Hamlet in Gothenburg: one or two ad hoc groups?

In The Mule, Foundation and MPEG, the latest article published on this blog, I wrote: “In 30 years of MPEG, and counting? I somehow referred to the MPEG Mule when I wrote ‘Another thirty years await MPEG, if some mindless industry elements will not get in the way’. We may be close to knowing the fate of the MPEG Mule.”

We are nowhere close to knowing the fate of MPEG and in this article I will tell another episode of the saga.

In Which future for MPEG? I set out my ideas about the future of MPEG based on very simple reasoning. MPEG has developed global industry-agnostic digital-media standards that have led the media industry from analogue to digital, given opportunities to develop new business models and enabled the continuous expansion of the media industry. This is not a tale of the past but a reality that continues today with sustained production of digital media standards. The proof is in the record attendance last week of more than 600 MPEG members in Gothenburg.

Finally, as I wrote in MPEG and ISO, even taxi drivers know MPEG, demonstrating that the name MPEG does not just refer to a technology hidden in devices no one knows about but is imprinted in people’s minds.

Alongside my proposal to leverage such a unique organisation by making official the strategic role that MPEG has played for the last 30 years, there are many other proposals, which can be summarised as follows.

The first of these other proposals says: JPEG and MPEG are two working groups in the parent Subcommittee (SC). The former is in charge of coding of images and the latter is in charge of coding of moving pictures. By making MPEG an SC, JPEG remains alone in the parent SC and there will be no more collaboration.

The problem with this argument is that, especially in the last few years, for whatever reasons, JPEG and MPEG have not collaborated. JPEG used to meet collocated with MPEG, but then decided to meet separately. This does not mean that MPEG has not worked for JPEG: it developed two standards for the transport of JPEG2000 and JPEG XS images on MPEG-2 Transport Stream (TS), the standard that transports digital television.

Since 1992 MPEG has developed 5 standards jointly with ITU-T Study Group 16 (SG16) and is now developing a 6th. Yet ITU-T SG16 is not even part of ISO! Another example is that MPEG has developed 3 standards, and is developing 3 more, jointly with TC 276 Biotechnology. Here we are talking about an ISO Technical Committee whose mission is to develop standards for biotechnology that have nothing to do with digital media (but the jointly developed standard – MPEG-G – is very much needed by TC 276 for their workflows)!

This proves that collaboration happens when there is a common interest, not because the parties in the collaboration belong to the same organisational structure. The latter is a bureaucratic view of collaboration that is unfortunately prevalent in ineffective organisations. Indeed, bureaucrats find it so difficult to understand the essence of a common problem across organisational borders, while it is so easy for them to understand what happens inside an organisation (if they understand it, I mean).

The second of these proposals is a further attempt at creating organisational bindings where none existed before because they were never needed. In a few words the proposal is: instead of becoming an independent SC of 600 members (larger than many Technical Committees) the MPEG subgroups should melt in the parent SC.

This proposal demonstrates that the proponents miss the basic understanding of what MPEG is. MPEG is an ecosystem of groups developing integrated standards whose parts can also be used independently. To achieve this result, MPEG has developed the organisation described in More standards – more successes – more failures.

Figure 1 – Independent parts making an integrated standard

The parts of an MPEG standard (Blue circles) are typically “owned” (i.e. developed) by different groups, but there is a need to provide a “glue” (red lines in Figure 1) between the different parts of a standard if the standard is to be used as an integrated whole. The glue is provided by MPEG subgroups assisted by ad hoc groups, breakout groups and joint meetings and orchestrated by studies made at chairs meetings.

Dumping the MPEG organisation to melt it into the parent SC will lead to the loss of the glue that makes MPEG standards uniquely effective and successful in the industry. The components of a disbanded MPEG will not deliver as before in the new environment. Sure, given time, a new structure can emerge, but it is vital that a structure operate now at the same level of performance as MPEG, not in some years. Competition to MPEG is at the gates.

The third of these proposals is to give the parent SC the role of strategic planning, technical coordination and external relations that MPEG has – successfully – carried out for the last 30 years. This proposal is so naïve that not many words are needed to kill it (in Japanese you would use the word 黙殺, literally meaning “killing with silence”). For 30 years the parent organisation has performed administrative functions. Just as you cannot turn a horse that has pulled a cart for years into a racehorse because its master so decides, the parent SC cannot become a strategic planner, a technical coordinator or an external relations manager. Over the years a new structure and modus operandi may very well settle (MPEG did not become what it is in a day), but in the meantime the cohesion that has kept MPEG components together will wither, never to come back, and industry will simply spurn its standards.

The fourth and last proposal (in this article, because there are many more) comes from a Non-Performing Entity (NPE): appoint a new parent committee chair, disband what exists today and create a new organisation from scratch. Sure, if the intention is to keep on a leash a tame committee whose sole purpose is to add IP to standards without any consideration for their industrial value, this is an excellent proposal.

In Gothenburg these and other proposals were discussed. How to make progress? One proposal was to create two ad hoc groups: one studying the first, well documented, proposal and the other trying to bring order to the patchwork of ideas, parts of which I have described above. Another proposal was to create only one ad hoc group combining the mandates of the two.

The matter was discussed for hours. Hamlet had to be called from neighbouring Denmark to decide. Whose skull did he use?


The Mule, Foundation and MPEG

What do the three entities of the title have to do with each other?

The second entity is Isaac Asimov’s Foundation Trilogy, the tale of an organisation, actually more than one, established by Hari Seldon, the inventor of psychohistory. According to that fictional theory, the responses of large human populations to certain stimuli remain the same over time if conditions remain as planned. Then, according to Asimov, psychohistory can predict the main elements of the evolution of society over the centuries, and the Foundation is the organisation created to make sure that the future unfolds as Hari Seldon had planned it.

The first entity is the Mule, a character of the trilogy: a mutant who quickly conquers the Galactic Empire with the power of his mental capabilities. He is an element of that fictional society whose appearance Hari Seldon’s psychohistory could not predict. The Mule was not expected to appear, but did.

The third is the MPEG data compression – especially media – group that I have been writing about for some time on this blog: a group whose appearance in the media industry could not be predicted because it was completely outside the rules of that industry, maybe the best real-world equivalent of Hari Seldon’s psychohistory.

What were those rules? At certain points in history, several discoveries were made that rendered a range of inventions possible. Very often the only merit of the person who made the invention was that he put together a process whose elements were either known or already “floating in the air”. Regularly the invention was patented, giving the inventor the right to exploit it for the number of years granted by the law of his country.

In spite of this often chaotic process, several media types converged on the same technology. The photographic industry settled on a limited number of film sizes and the cinematographic industry on a limited number of formats: number of frames per second and film sizes. Sound was recorded on vinyl records that were played at a limited number of speeds. All this happened according to a series of steps that could not individually be predicted, but whose general outcome could.

Use of magnetics and electronics allowed more effective recording and, more importantly, enabled the instantaneous transmission of sound and images to remote places. Here chaos reigned supreme, with a large and growing number of formats for sound and television, real time and stored. If there had been a Hari Seldon of the media industry, he could have applied his psychohistory.

In the Media Empire the Foundation materialised as a growing number of standards organisations that tried to keep some order in the field. Table 1 shows just those at the international level, but others popped up at regional, national and industry level.

Table 1 – Media-related standards committees (1980s)

Organisation | Area | Committee
ITU-T | Speech | SG XV WP 1
ITU-T | Video | SG XV WP 2
ITU-R | Audio | SG 10
ITU-R | Video | SG 11
IEC | Recording of audio | SC 60 A
IEC | Recording of video | SC 60 B
IEC | Audio-visual equipment | TC 84
IEC | Receivers | SC 12A and G
ISO | Photography | TC 42
ISO | Cinematography | TC 36

In “Foundation”, Hari Seldon had anticipated a number of “crises”. In the Media Empire, too, one crisis was due: the advent of digital technologies. Normally, this crisis would have been absorbed by making some cosmetic changes while keeping the system unchanged.

This is not what happened in the Media Empire, because the Mule appeared in the form of a wild group of experts banding together under the MPEG flag. In the early days their very existence was not even detected by the most sophisticated devices, but soon the Mule’s onslaught was unstoppable. In a sequence of strikes the MPEG Mule conquered the Media Empire: interactive video on compact disc, portable music, digital audio broadcasting, digital television, audio and video on the internet, file format, common encryption, IP-based television, 3D Audio, streaming on the unreliable internet and more. Billions of people were lured, without complaint but with joy, into the new world.

The forces of the MPEG Mule have clearly triumphed over the forces of darkness and anarchy. The Mule – the ultimate outsider – has exploited the confusion and brought order to everybody’s satisfaction, except perhaps that of the forces of the Foundation, who have been designing their comeback.

What will then be the eventual fate of the MPEG Mule?

In the Foundation, the Mule is eventually wiped out, not because his powers disappear but because others learned some of the methods of the Mule and applied them for their own sake, i.e. to re-instate confusion.

In 30 years of MPEG, and counting? I somehow referred to the MPEG Mule when I wrote “Another thirty years await MPEG, if some mindless industry elements will not get in the way”.

We may be close to knowing the fate of the MPEG Mule.
