Matching technology supply with demand


There have always been people in need of technology and, most of the time, people ready to provide something in response to the demand. In Book XVIII of the Iliad, Thetis, Achilles’s mother, asks Hephaestus, the god of blacksmiths, craftsmen, artisans, sculptors and more, to provide new armour for her son, who had lost his to Hector. Hephaestus duly complied. Still in the fictional domain, but in more recent years, Agent 007 visits Q Branch to get the latest gadgets for his next spy mission, and they are inevitably put to good use.

Wars have always been times when the need for technologies stretches the ability to supply them. In our supposedly peaceful age there is a lot of technology around, but it is often difficult for companies needing a particular technology to find the solution that matches their needs and budget.

Supply and demand in standardisation

Standardisation is an interesting case of an entity, typically non-commercial and non-governmental, needing technologies to make a standard. Like companies, standards organisations need technologies to accomplish their mission. How can they access them?

A few decades back, if industry needed a standard for, say, a video cassette recorder, the process was definitely supply-driven: a company that had developed a successful product (call it Sony or JVC) submitted a proposal (call it Betamax or VHS) to a standards committee (call it IEC SC 60B). Too bad if the process produced two standards.

In the second half of the 1980’s, ITU SG XV (ITU numbering of that time) started developing the H.261 recommendation. Experts developed the standard piece by piece in the committee (so-called Okubo group) by acquiring the necessary technologies in a process where the roles of demand and supply were rather blurred. Only participants were entitled to provide their technologies to fulfill the needs of the standard.

A handful of years later, MPEG further innovated technology procurement in a standardisation environment. To get the technologies needed to make a standard, it used a demand-driven tool – MPEG’s Call for Proposals (CfP). Since then, technologies provided by respondents and assessed for relevance by the group are used to 1) create the initial reference model (RM0) and 2) initiate a first round of Core Experiments (CE). CEs result from the agreement among participating experts that there is room for improving the performance of the standard under development by opening a particular area to optimisation. CEs are continued until the available room for optimisation is exhausted. While anybody is entitled to respond to a CfP and contribute technology to RM0, only experts participating in the standardisation project can provide technology for CEs. This, however, is not really a limitation because the process is open to anybody wishing to join a recognised standards organisation that is a member of ISO.

The MPEG process of standards development has allowed the industry to maintain a sustained development and expansion for many years. In fairness, this is not entirely MPEG’s merit. Patent pools have played a synergistic role to MPEG’s by providing (industry) users with the means to practice MPEG standards and IP holders the means to be remunerated for the use of their IP.

The situation today

The HEVC case has shown that the cooperation of different parties to achieve the common goal of enabling the use of a standard can no longer be taken for granted (see, e.g., A crisis, the causes and a solution). There are several reasons for this: the increasing number of individual technologies needed to make a high-performance MPEG standard, the increasing number of IP holders, the increasing number of Non-Practising Entities (NPE) among technology providers and the increasing number of patent pools, each acting as an independent licensor of only a portion of the relevant patents.

I have already made several proposals with the intention of helping MPEG out of this stalemate (see, e.g., Business model based ISO/IEC standards). Here I would like to present an additional idea that extends the MPEG process of standards development (see How does MPEG actually work?).

A new process proposal

A possible implementation of the proposal, applying to an entity (a company, an industry forum or a standards organisation) that wishes to develop a specification or a standard, could run like this (some details could be fine-tuned or changed on a case-by-case basis):

  1. An entity (a company, an industry forum or a standards organisation) wishes to develop a specification or a standard
  2. The entity issues a Call for Proposals (CfP) whose requirements ask proponents to accept the process (as defined in this numbered list) and to commit to RAND licensing of their technologies for the specification or standard
  3. The entity assesses the proposals received
  4. The entity sets aside a certain number of tokens for the entire standard (e.g. 1,000)
  5. The entity builds a “minimal” Reference Model zero (RM0) using technologies contained in the proposals in a conservative way so as to create ample space for a healthy Core Experiment (CE) process
  6. The entity
    1. Assigns a percentage of tokens to RM0
    2. Establishes the number of tokens that will be given to the proponent who achieves the highest performance in a CE (e.g. 1 token for each 0.1% improvement)
  7. The entity identifies and publishes CEs at the pace required by the specification or standard
  8. For each CE the entity makes a call containing
    1. A description of the CE
    2. A minimum performance target (say, at least 1% improvement)
    3. A deadline for submission of 1) results and 2) code that proves that the target has been achieved
    4. The maximum level of associated complexity
  9. CE proponents should do due diligence and
    1. Make proposals that contain only their own technologies, or
    2. Ask any third party to join in the response (and accept the conditions of the CfP)
  10. If the tokens are all used and there is still room for optimisation, new tokens are created and token holders have their tokens scaled down so that the total number of tokens is still 1,000
  11. If room for RM optimisation is exhausted but there are still tokens unassigned, token holders have their tokens scaled up so that the total number of tokens is 1,000.
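To make the token arithmetic of steps 4 to 11 concrete, here is a minimal Python sketch. It assumes the example figures given in the list (1,000 tokens, a 1% minimum target, 1 token per 0.1% of improvement); the function names `ce_award` and `rescale` and the proponents A, B and C are hypothetical, and proportional rescaling is only one possible reading of steps 10 and 11.

```python
import math
from typing import Dict

TOTAL_TOKENS = 1000  # step 4: tokens set aside for the entire standard (example value)


def ce_award(improvement_pct: float, min_target_pct: float = 1.0,
             tokens_per_step: float = 1.0, step_pct: float = 0.1) -> float:
    """Tokens for the best-performing CE proponent (steps 6.2 and 8.2).

    No tokens below the minimum performance target; otherwise
    `tokens_per_step` tokens for each full `step_pct` of improvement.
    """
    if improvement_pct < min_target_pct:
        return 0.0
    # a small epsilon guards against float results such as 24.999999...
    steps = math.floor(improvement_pct / step_pct + 1e-9)
    return tokens_per_step * steps


def rescale(holdings: Dict[str, float], total: float = TOTAL_TOKENS) -> Dict[str, float]:
    """Scale every holding by the same factor so that the sum equals `total`.

    Covers step 10 (more tokens awarded than exist: scale down)
    and step 11 (room exhausted with tokens left over: scale up).
    """
    current = sum(holdings.values())
    if current == 0:
        return dict(holdings)
    factor = total / current
    return {holder: amount * factor for holder, amount in holdings.items()}


# Hypothetical proponents: A and B share the RM0 allocation, C wins a CE with +2.5%.
holdings = {"A": 300.0, "B": 200.0}
holdings["C"] = holdings.get("C", 0.0) + ce_award(2.5)   # 25 tokens
holdings = rescale(holdings)                              # 525 -> 1000 tokens
print({k: round(v, 1) for k, v in holdings.items()})
```

Keeping the ratios between holders fixed while rescaling means that no proponent gains or loses relative to the others when tokens are created or redistributed.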

Depending on the nature of the entity (company, industry forum or standards organisation), another entity, which can be the same entity that managed the process or a patent pool:

  1. Identifies the IP holders in RM0 and CEs
  2. Removes technology in case the status of IP has not been clarified
  3. After completing steps 1 and 2, pays royalties to IP holders based on the number of tokens they have acquired in the process (i.e. RM0 and CEs)
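The royalty step can be sketched in the same spirit. This hypothetical Python function assumes that tokens are recorded per holder and per area (RM0 or a named CE), that switching off an area with unclear IP simply drops its tokens, and that the royalty pool is then shared among the remaining tokens. The proposal does not say how the unclaimed share is handled, so that last choice is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def distribute_royalties(tokens: Dict[Tuple[str, str], float], pool: float,
                         unclear_areas: Iterable[str] = ()) -> Dict[str, float]:
    """Split `pool` among IP holders in proportion to their tokens.

    `tokens` maps (holder, area) -> tokens, where area is "RM0" or a CE name.
    Areas whose IP status is unclear are switched off and earn nothing; the
    pool is shared among the remaining tokens (an assumption).
    """
    excluded = set(unclear_areas)
    per_holder: Dict[str, float] = defaultdict(float)
    for (holder, area), count in tokens.items():
        if area not in excluded:
            per_holder[holder] += count
    total = sum(per_holder.values())
    if total == 0:
        return {}
    return {holder: pool * count / total for holder, count in per_holder.items()}


# Hypothetical token holdings after the RM0 and CE phases (sum = 1000).
tokens = {("A", "RM0"): 500.0, ("B", "RM0"): 100.0,
          ("B", "CE1"): 200.0, ("C", "CE2"): 200.0}
print(distribute_royalties(tokens, pool=80_000.0, unclear_areas={"CE2"}))
# {'A': 50000.0, 'B': 30000.0}
```

If the IP status of CE2 is later clarified, calling the function again without `unclear_areas` switches that area back on.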

Merits and limits of the proposal

The proposal achieves the goals of

  1. Associating patent holders with RM0 and CE areas, as opposed to associating them just with the standard
  2. Enabling the technologies of a CE area to be turned off if their IP status is unclear, and turned on again once the status is clarified

In case the entity developing the specification is a standards organisation, more than one patent pool can develop a licence using the results of the process.


This idea was developed in collaboration with Malvika Rao, the founder of Incentives Research and holder of a PhD from Harvard University, and Don Marti, an open source expert and an advisor at Incentives Research.

Converting the basic concept described above into a workable market design requires further work. There may be opportunities to game the system, and the design must consider issues such as how to attract and retain participation. In addition the design must be tested (e.g., via simulation or usability study) to understand its performance.

Please send comments to Leonardo.





What would MPEG be without Systems?

The most visited articles on this blog, Forty years of video coding and counting and Thirty years of audio coding and counting, prove that MPEG is known for its audio and video coding standards. But I will never tire of saying that MPEG would not be what it has become if the Systems aspects had not been part of most of its standards. This is what I intend to talk about in this article.

It is hard to acknowledge, but MPEG was not the first to deal with the problem of putting together digital audio and video for delivery purposes. In the second half of the 1980’s ITU-T dealt with the problem of handling audio-visual services using the basic ISDN access at 2B+D (2×64 kbit/s) or the primary ISDN access, the first digital streams made possible by ITU-T Recommendations.

Figure 1 depicts the solution specified in ITU Recommendation H.221. Let’s assume that we have 2 B channels at 64 kbit/s (Basic Access ISDN). H.221 creates on each B channel a Frame Structure of 80 bytes, i.e. 640 bits repeating itself 100 times per second. Each bit position in an octet can be considered as an 8 kbit/s sub-channel. The 8th bit in each octet represents the 8th sub-channel, called the Service Channel.

Within the Service Channel bits 1-8 are used by the Frame Alignment Signal (FAS) and bits 9-16 are used by the Bit Alignment Signal (BAS). Audio is always carried by the first B channel, e.g. by the first 2 subchannels, and Video and Data by the other subchannels (less the bitrate allocated to FAS and BAS).

Figure 1 – ITU Recommendation H.221
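The H.221 frame arithmetic above can be checked with a few lines of Python. The sketch derives the B-channel, sub-channel, FAS and BAS rates from the 80-octet, 100 frames-per-second structure and extracts one sub-channel from a frame. Mapping bit position 1 to the most significant bit of each octet is an assumption made for illustration, and the frame content is dummy data.

```python
from typing import List

FRAME_OCTETS = 80        # octets per H.221 frame on one B channel
FRAMES_PER_SECOND = 100  # the frame repeats 100 times per second

frame_bits = FRAME_OCTETS * 8                        # 640 bits
b_channel_rate = frame_bits * FRAMES_PER_SECOND      # 64,000 bit/s
subchannel_rate = FRAME_OCTETS * FRAMES_PER_SECOND   # 1 bit per octet -> 8,000 bit/s
fas_rate = 8 * FRAMES_PER_SECOND                     # 8 FAS bits per frame -> 800 bit/s
bas_rate = 8 * FRAMES_PER_SECOND                     # 8 BAS bits per frame -> 800 bit/s


def subchannel_bits(frame: bytes, position: int) -> List[int]:
    """Collect sub-channel `position` (1..8) from one 80-octet frame.

    Assumption: bit position 1 is mapped to the most significant bit of each
    octet, so position 8 (the Service Channel) is the least significant bit.
    """
    if len(frame) != FRAME_OCTETS:
        raise ValueError("an H.221 frame has 80 octets")
    if not 1 <= position <= 8:
        raise ValueError("bit positions run from 1 to 8")
    shift = 8 - position
    return [(octet >> shift) & 1 for octet in frame]


frame = bytes(range(FRAME_OCTETS))           # dummy frame content, not real H.221 data
service_channel = subchannel_bits(frame, 8)  # 80 Service Channel bits per frame
fas, bas = service_channel[:8], service_channel[8:16]
print(b_channel_rate, subchannel_rate, fas, bas)
```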

MPEG-1 Systems

The solution depicted in Figure 1 bears the mark of the transmission part of the telecom industry, which had never been very friendly to packet communication. That is why MPEG in the late 1980’s had an opportunity to bring some fresh air into this space. Starting from a blank sheet of paper (at that time MPEG still used paper 😊), MPEG designed a flexible packet-based multiplexer to convey compressed audio and video, together with clock information, in a single stream in such a way as to enable audio‑video synchronisation (Figure 2).

Figure 2 – MPEG-1 Systems

The MPEG Systems revolution took time to take effect. Indeed the European Eureka 147 project used MPEG-1 Audio Layer 2, but designed a frame-based multiplexer for the Digital Audio Broadcasting service.

MPEG-2 Systems

In the early 1990’s MPEG started working on another blank sheet of paper. MPEG had the experience of the MPEG-1 Systems design, but the requirements were significantly different. In MPEG-1, audio and video (possibly many of them in the same stream) had a common time base, but the main users of MPEG-2 wanted a system that could deliver a plurality of TV programs, possibly coming from different sources (i.e. with different time bases) and possibly with a lot of metadata related to the programs, not to mention some key business enablers such as conditional access information. Moreover, unlike MPEG-1, where it was safe to assume that the bits coming from a Compact Disc would travel without errors to a demultiplexer, in MPEG-2 it was mandatory to assume that the transmission channel was anything but error-free.

MPEG-2 Transport Stream (TS) provides efficient mechanisms to multiplex multiple audio-visual data streams into one delivery stream. Audio-visual data streams are packetised into small fixed-size packets and interleaved to form a single stream. Information about the multiplexing structure is interleaved with the data packets so that the receiving entity can efficiently identify a specific stream. Sequence numbers help identify missing packets at the receiving end, and timing information is assigned after multiplexing with the assumption that the multiplexed stream will be delivered and played in sequential order.
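As an illustration of how a receiver can spot missing packets, the following Python sketch decodes the 4-byte header of 188-byte Transport Stream packets (sync byte 0x47, 13-bit PID, 4-bit continuity counter) and reports counter jumps per PID. It is deliberately simplified: it ignores duplicate packets, discontinuity indicators and adaptation field contents, and `make_packet` is a hypothetical test helper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass
class TsHeader:
    transport_error: bool
    payload_unit_start: bool
    pid: int                       # 13-bit stream identifier
    adaptation_field_control: int  # 0b01 payload, 0b10 adaptation field, 0b11 both
    continuity_counter: int        # 4-bit per-PID sequence number


def parse_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of one Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("expected a 188-byte packet starting with 0x47")
    return TsHeader(
        transport_error=bool(packet[1] & 0x80),
        payload_unit_start=bool(packet[1] & 0x40),
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        adaptation_field_control=(packet[3] >> 4) & 0x03,
        continuity_counter=packet[3] & 0x0F,
    )


def find_gaps(packets: Iterable[bytes]) -> List[Tuple[int, int, int]]:
    """Return (pid, expected_cc, received_cc) for every continuity jump.

    Simplified: the counter only advances on packets that carry payload;
    legitimate duplicate packets and discontinuity indicators are ignored.
    """
    last: Dict[int, int] = {}
    gaps = []
    for packet in packets:
        header = parse_header(packet)
        if header.adaptation_field_control not in (0b01, 0b11):
            continue
        if header.pid in last:
            expected = (last[header.pid] + 1) % 16
            if header.continuity_counter != expected:
                gaps.append((header.pid, expected, header.continuity_counter))
        last[header.pid] = header.continuity_counter
    return gaps


def make_packet(pid: int, cc: int) -> bytes:
    """Build a payload-only test packet filled with 0xFF bytes (hypothetical helper)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + b"\xff" * (TS_PACKET_SIZE - len(header))


stream = [make_packet(0x100, cc) for cc in (14, 15, 0, 2)]  # packet with cc=1 lost
print(find_gaps(stream))  # [(256, 1, 2)]
```

The modulo-16 comparison is what lets the counter wrap from 15 to 0 without being reported as a loss.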

MPEG-2 Systems is actually two specifications in one (Figure 3). The Transport Stream (TS) is a fixed-length packet-based transmission system designed to work for digital television distribution on error-prone physical channels, while the Program Stream (PS) is a packet-based multiplexer with many points in common with MPEG-1 Systems. While TS and PS share significant information, moving from one to the other may not be straightforward.

Figure 3 – MPEG-2 Systems

MPEG-4 Systems

MPEG-4 gave MPEG the opportunity to experience an epochal transition in data delivery. When MPEG-2 Systems was designed, Asynchronous Transfer Mode (ATM) was high on the agenda of the telecom industry and was considered the vehicle to transport MPEG-2 TS streams on telecommunication networks. Indeed, the Digital Audio-Visual Council (DAVIC) designed its specifications on that assumption. At that time, however, IP was still unknown to the telecom (at least to its transmission part), broadcast and consumer electronics worlds.

The MPEG-4 Systems work was a completely different story from MPEG-2 Systems. An MPEG-4 Mux (M4Mux) was developed along the lines of MPEG-1 and MPEG-2 Systems, but MPEG had to face an unknown world where many transports were emerging as possible candidates. MPEG was obviously unable to make choices (today, 25 years later, the choice is clear) and developed the notion of Delivery Multimedia Integration Framework (DMIF), where all communications and data transfers between the data source and the terminal were abstracted through a logical API called the DAI (DMIF Application Interface), independent of the transport type (broadcast, network, storage).

MPEG-4 Systems, however, was about more than interfacing with transport and multiplexing. The MPEG-4 model was a 3D space populated with dynamic audio, video and 3D Graphics objects. Binary Format for Scenes (BIFS) was the technology designed to provide the needed functionality.

Figure 4 shows the 4 MPEG-4 layers: Transport, Synchronisation, Compression and Composition.

Figure 4 – MPEG-4 Systems

MPEG-4 File Format

For almost 10 years, until 1997, MPEG was a group that made intense use of IT tools (in the form of computer programs that simulated the encoding and decoding operations of the standards it was developing) but was not an “IT group”. The proof? Until that time it had not developed a single file format. Today MPEG can add “IT group” to the many other attributes it has.

In those years MP3 files were already being created and exchanged by the millions, but the files did not provide any structure. The MP4 File Format, officially called ISO Base Media File Format (ISO BMFF), filled that gap as it can be used for editing, HTTP streaming and broadcasting.

Let’s take a high-level look to understand the sea that separates MP3 files from the MP4 FF. MP4 FF contains tracks for each media type (audio, video etc.), with additional information: a four-character code identifying the media type, together with all the parameters needed by the media type decoder. “Track selection data” helps a decoder identify which aspects of a track can be used and determine which alternatives are available.

Data are stored in a basic structure called a box, with attributes of length, type (4 printable characters) and possibly version and flags. No data can be found outside of a box. Figure 5 shows a possible organisation of an MP4 file.


Figure 5 – Boxes in an MP4 File
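The box structure lends itself to a simple recursive reader. The Python sketch below walks 32-bit size plus 4-character type headers, handles the 64-bit `largesize` (size 1) and the run-to-end case (size 0), and reads the version and flags of a full box. The set of container types it descends into is a hand-picked assumption, extended types such as `uuid` are not handled, and the sample file is synthetic.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Box types treated as plain containers in this sketch (not an exhaustive list).
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl", "dinf", "edts", "mvex", "moof", "traf"}


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                    # 64-bit 'largesize' follows the type
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:                  # box runs to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), offset + header, offset + size
        offset += size


def full_box_version_flags(data: bytes, payload_start: int) -> Tuple[int, int]:
    """A full box payload starts with 1 byte of version and 3 bytes of flags."""
    version = data[payload_start]
    flags = int.from_bytes(data[payload_start + 1:payload_start + 4], "big")
    return version, flags


def walk(data: bytes, start: int = 0, end: Optional[int] = None, depth: int = 0) -> List[str]:
    """Return an indented outline of the box tree."""
    lines = []
    for box_type, payload_start, box_end in iter_boxes(data, start, end):
        lines.append("  " * depth + box_type)
        if box_type in CONTAINERS:
            lines.extend(walk(data, payload_start, box_end, depth + 1))
    return lines


def box(box_type: str, payload: bytes) -> bytes:
    """Serialise a box with a 32-bit size (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Tiny synthetic file: ftyp + moov/mvhd with a zero-filled (not meaningful) body.
mvhd = box("mvhd", bytes([0, 0, 0, 0]) + bytes(96))
sample = box("ftyp", b"isom" + bytes(4) + b"isom") + box("moov", mvhd)
print("\n".join(walk(sample)))
```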

MP4 FF can store:

  1. Structural and media data information for timed presentations of media data (e.g. audio, video, subtitles);
  2. Un-timed data (e.g. meta-data);
  3. Elementary stream encryption and encryption parameter (CENC);
  4. Media for adaptive streaming (e.g. DASH);
  5. High Efficiency Image Format (HEIF);
  6. Omnidirectional Media Format (OMAF);
  7. Files partially received over lossy links for further processing such as playback or repair (Partial File Format);
  8. Web resources (e.g. HTML, JavaScript, CSS, …).

Save for the first two features, all others were added in the years following 2001 when MP4 FF was approved. The last two are still under development.

MPEG-7 Systems

With MPEG-7, MPEG made the first big departure from media compression and turned its attention to media description, including ways to compress that information. In addition to descriptors for visual, audio and multimedia information, MPEG-7 includes a Systems layer used by an application, say, navigation of a multimedia information repository, to access coded information coming from a delivery layer in the form of coded descriptors (in XML or in BiM, MPEG’s XML compression technology). Figure 6 illustrates the operation of an MPEG-7 Systems decoder.

Figure 6 – MPEG-7 Systems

An MPEG-7 Systems decoder operates in two phases:

  1. Initialisation when DecoderInit initialises the decoder by conveying description format information (textual or binary), a list of URIs that identifies schemas, parameters to configure the Fragment Update decoder, and an initial description. The list of URIs is passed to a schema resolver that associates the URIs with schemas to be passed to Fragment Update Decoder.
  2. Main operation, when the Description Stream (composed of Access Units containing fragment updates) is fed to the decoder which processes
    1. Fragment Update Command specifying the update type (i.e., add, replace or delete content or a node, or reset the current description tree);
    2. Fragment Update Context that identifies the data type in a given schema document, and points to the location in the current description tree where the fragment update command applies; and
    3. Fragment Update Payload conveying the coded description fragment to be added to or replaced in the description.
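The main-operation phase can be mimicked with Python's `xml.etree.ElementTree`. In this sketch each access unit carries one command (add, replace, delete or reset), a context and a payload. Note the big simplification: the context here is an ElementTree path assumed to address a unique node, not the schema-aware context encoding of MPEG-7 Systems, and payloads are textual XML rather than BiM. `FragmentUpdate` and `apply_update` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class FragmentUpdate:
    command: str                   # "add", "replace", "delete" or "reset"
    context: str                   # simplified ElementTree path, e.g. "./Segment/Title"
    payload: Optional[str] = None  # XML text of the fragment (not needed for delete)


def _node(root: ET.Element, path: str) -> Optional[ET.Element]:
    return root if path in ("", ".") else root.find(path)


def apply_update(root: ET.Element, update: FragmentUpdate) -> ET.Element:
    """Apply one fragment update and return the (possibly new) description root."""
    if update.command == "reset":
        return ET.fromstring(update.payload)
    if update.command == "add":
        node = _node(root, update.context)
        if node is None:
            raise KeyError(update.context)
        node.append(ET.fromstring(update.payload))
        return root
    # replace/delete need the parent of the addressed node
    parent_path = update.context.rpartition("/")[0]
    parent, target = _node(root, parent_path), root.find(update.context)
    if parent is None or target is None:
        raise KeyError(update.context)
    if update.command == "delete":
        parent.remove(target)
    elif update.command == "replace":
        index = list(parent).index(target)
        parent.remove(target)
        parent.insert(index, ET.fromstring(update.payload))
    else:
        raise ValueError(f"unknown command {update.command!r}")
    return root


description = ET.fromstring("<Description><Segment><Title>Old</Title></Segment></Description>")
stream = [
    FragmentUpdate("replace", "./Segment/Title", "<Title>New</Title>"),
    FragmentUpdate("add", "./Segment", "<Creator>MPEG</Creator>"),
]
for access_unit in stream:
    description = apply_update(description, access_unit)
print(ET.tostring(description, encoding="unicode"))
# <Description><Segment><Title>New</Title><Creator>MPEG</Creator></Segment></Description>
```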


MPEG Multimedia Middleware (M3W), also called MPEG-E, is an 8-part standard defining the protocol stack of consumer-oriented multimedia devices, as depicted in Figure 7.

Figure 7 – MPEG Multimedia Middleware (M3W)

The M3W model includes 3 layers:

  1. Applications, not part of the specification but enabled by the M3W Middleware API;
  2. Middleware consisting of
    1. M3W middleware exposing the M3W Middleware API;
    2. Multimedia platform supporting the M3W Middleware by exposing the M3W Multimedia API;
    3. Support platform providing the means to manage the lifetime of, and interaction with, realisation entities by exposing the M3W Support API (it also enables management of support properties, e.g. resource management, fault management and integrity management);
  3. Computing platform, whose APIs are outside the M3W scope.


Multimedia service platform technologies (MPEG-M) specifies two main components of a multimedia device, called peer in MPEG-M.

As shown in Figure 8, the first component is a set of APIs: High-Level API for applications and Low-Level API for network, energy and security.

Figure 8 – High Level and Low Level API

The second component is a middleware called MXM that relies on MPEG multimedia technologies (Figure 9).

Figure 9 – The MXM architecture

The Middleware is composed of two types of engine. Technology Engines are used to call functionalities defined by MPEG standards, such as creating or interpreting a licence attached to a content item. Protocol Engines are used to communicate with other peers, e.g. in case a peer does not have a particular Technology Engine that another peer has. For instance, a peer can use a Protocol Engine to call a licence server to get a licence to attach to a multimedia content item. The MPEG-M middleware has the ability to create chains of Technology Engines (Orchestration) or Protocol Engines (Aggregation).
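A toy model may help picture the two engine types and Orchestration. In this Python sketch a peer runs a Technology Engine locally when it has one and otherwise forwards the call to a known peer, which stands in for a Protocol Engine. The `Peer` class, the engine names and the dictionary-in, dictionary-out signature are all hypothetical and do not reflect the actual MPEG-M API.

```python
from typing import Callable, Dict, List

Engine = Callable[[dict], dict]  # hypothetical engine signature: data in, data out


class Peer:
    """A toy MPEG-M peer with named Technology Engines (hypothetical API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.technology_engines: Dict[str, Engine] = {}
        self.known_peers: List["Peer"] = []

    def call(self, engine: str, data: dict) -> dict:
        if engine in self.technology_engines:
            return self.technology_engines[engine](data)
        # Stand-in for a Protocol Engine: ask a peer that has the Technology Engine.
        for peer in self.known_peers:
            if engine in peer.technology_engines:
                return peer.call(engine, data)
        raise LookupError(f"no peer offers engine {engine!r}")

    def orchestrate(self, engines: List[str], data: dict) -> dict:
        """Chain engines so that each one's output feeds the next."""
        for engine in engines:
            data = self.call(engine, data)
        return data


licence_server = Peer("licence-server")
licence_server.technology_engines["create_licence"] = (
    lambda d: {**d, "licence": f"licence-for-{d['item']}"})

device = Peer("device")
device.technology_engines["package"] = lambda d: {**d, "packaged": True}
device.known_peers.append(licence_server)

print(device.orchestrate(["package", "create_licence"], {"item": "song"}))
```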


MPEG Media Transport (MMT) is part 1 of High efficiency coding and media delivery in heterogeneous environments (MPEG-H). It is the solution for the new world of broadcasting where delivery of content can take place over different channels each with different characteristics, e.g. one-way (traditional broadcasting) and two-way (the ever more pervasive broadband network). MMT assumes that the Internet Protocol is common to all channels.

Figure 10 depicts the MMT protocol stack

Figure 10 – The MMT protocol stack

Figure 11 focuses on the MMT Payload, i.e. on the content structure.

Figure 11 – Structure of MMT Payload

The MMT Payload has an onion-like structure:

  1. Media Fragment Unit (MFU), the atomic unit which can be independently decoded;
  2. Media Processing Unit (MPU), the atomic unit for storage and consumption of MMT content (structured according to ISO BMFF), containing one or more MFUs;
  3. MMT Asset, the logical unit for elementary streams of a multimedia component, e.g. audio, video and data, containing one or more MPU files;
  4. MMT Package, a logical unit of multimedia content such as a broadcasting program, containing one or more MMT Assets, also containing
    1. Composition Information (CI), describing the spatio-temporal relationships among MMT Assets
    2. Delivery Information, describing the network characteristics.
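The onion-like containment can be expressed as nested Python dataclasses. This is a hypothetical data model, not the MMT syntax: field names such as `sequence_number` and `asset_id` are illustrative, CI and Delivery Information are reduced to plain strings, and `validate` only enforces the "one or more" rule stated in the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MFU:
    """Media Fragment Unit: an independently decodable piece of media."""
    data: bytes


@dataclass
class MPU:
    """Media Processing Unit: unit of storage and consumption, one or more MFUs."""
    sequence_number: int
    mfus: List[MFU] = field(default_factory=list)


@dataclass
class Asset:
    """MMT Asset: elementary stream of a component, one or more MPUs."""
    asset_id: str
    media_type: str  # e.g. "audio", "video", "data"
    mpus: List[MPU] = field(default_factory=list)


@dataclass
class Package:
    """MMT Package: e.g. a programme, one or more Assets plus CI and delivery info."""
    package_id: str
    assets: List[Asset] = field(default_factory=list)
    composition_info: str = ""  # placeholder for spatio-temporal relationships (CI)
    delivery_info: str = ""     # placeholder for network characteristics

    def validate(self) -> None:
        """Enforce the 'one or more' containment rule at every level."""
        if not self.assets:
            raise ValueError("a package needs at least one asset")
        for asset in self.assets:
            if not asset.mpus:
                raise ValueError(f"asset {asset.asset_id} has no MPU")
            for mpu in asset.mpus:
                if not mpu.mfus:
                    raise ValueError(f"MPU {mpu.sequence_number} of {asset.asset_id} has no MFU")

    def payload_bytes(self) -> int:
        return sum(len(mfu.data) for asset in self.assets
                   for mpu in asset.mpus for mfu in mpu.mfus)


programme = Package("news-at-nine", assets=[
    Asset("video-1", "video", [MPU(0, [MFU(b"\x00" * 1000), MFU(b"\x00" * 400)])]),
    Asset("audio-1", "audio", [MPU(0, [MFU(b"\x00" * 200)])]),
])
programme.validate()
print(programme.payload_bytes())  # 1600
```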


Dynamic adaptive streaming over HTTP (DASH) is another MPEG Systems standard that was motivated by the popularity of HTTP streaming and the existence of different protocols used in different streaming platforms, e.g. different manifest and segment formats. By developing the DASH standard for HTTP streaming of multimedia content, MPEG has enabled a standard-based client to stream content from any standard-based server, thereby enabling interoperability between servers and clients of different make.

Figure 12 – DASH model

As depicted in Figure 12, the multimedia content is stored on an HTTP server in two components: 1) Media Presentation Description (MPD) which describes a manifest of the available content, its various alternatives, their URL addresses and other characteristics, and 2) Segments which contain the actual multimedia bitstreams in form of chunks, in single or multiple files.

A typical operation of the system would follow these steps:

  1. DASH client obtains the MPD;
  2. Parses the MPD;
  3. Gets information on several parameters, e.g. program timing, media content availability, media types, resolutions, min/max bandwidths, existence of alternatives of multimedia components, accessibility features and the required protection mechanism, the location of each media component on the network and other content characteristics;
  4. Selects the appropriate encoded alternative and starts streaming the content by fetching the segments using HTTP GET requests;
  5. Fetches the subsequent segments after appropriate buffering to allow for network throughput variations;
  6. Monitors the network bandwidth fluctuations;
  7. Decides how to adapt to the available bandwidth depending on its measurements by fetching segments of different alternatives (with lower or higher bitrate) to maintain an adequate buffer.

DASH only defines the MPD and the segment formats. MPD delivery and media encoding formats containing the segments as well as client behavior for fetching, adaptation heuristics and content playing are outside of MPEG-DASH’s scope.
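Since adaptation heuristics are left to implementers, the following Python sketch shows one simple possibility for the steps above: it parses a minimal, hand-written MPD in the DASH namespace, smooths throughput measurements with an exponentially weighted moving average, and picks the highest bitrate within a safety margin, falling back to the lowest one when the buffer runs low. The safety factor, smoothing weight and 5-second threshold are arbitrary assumptions, not values from the standard.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" minBufferTime="PT2S"
     mediaPresentationDuration="PT30S" profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v1" bandwidth="500000" width="640" height="360"/>
      <Representation id="v2" bandwidth="1500000" width="1280" height="720"/>
      <Representation id="v3" bandwidth="4000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def representation_bandwidths(mpd_xml: str) -> List[int]:
    """Steps 1-3: parse the MPD and list the advertised bitrates, lowest first."""
    root = ET.fromstring(mpd_xml.strip())
    return sorted(int(rep.get("bandwidth"))
                  for rep in root.findall(".//mpd:Representation", NS))


def update_throughput(estimate: Optional[float], segment_bytes: int,
                      seconds: float, alpha: float = 0.3) -> float:
    """Step 6: smooth per-segment throughput samples (bit/s) with an EWMA."""
    sample = segment_bytes * 8 / seconds
    return sample if estimate is None else alpha * sample + (1 - alpha) * estimate


def choose_bandwidth(bandwidths: List[int], throughput_bps: float, buffer_s: float,
                     safety: float = 0.8, panic_buffer_s: float = 5.0) -> int:
    """Step 7: highest bitrate within a safety margin, lowest if the buffer is nearly empty."""
    if buffer_s < panic_buffer_s:
        return bandwidths[0]
    fitting = [b for b in bandwidths if b <= safety * throughput_bps]
    return fitting[-1] if fitting else bandwidths[0]


bandwidths = representation_bandwidths(SAMPLE_MPD)
estimate = update_throughput(None, segment_bytes=1_000_000, seconds=2.0)  # 4 Mbit/s
print(choose_bandwidth(bandwidths, estimate, buffer_s=12.0))  # 1500000
```

The safety margin keeps the client from selecting a representation whose bitrate matches the measured throughput exactly, which would leave no headroom for the next fluctuation.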


The reader should not think that this is an exhaustive presentation of MPEG’s Systems work. I hope the description will reveal the amount of work that MPEG has invested in Systems aspects, sometimes per se, and sometimes to provide adequate support to users of its media coding standards. This article also describes some of the most successful MPEG standards. At the top certainly towers MPEG-2 Systems of which 9 editions have been produced to keep up with continuous user demands for new functionalities.

Not to mention that MPEG-2 Systems has received an Emmy Award 😉.




MPEG: what it did, is doing, will do


If I exchange words with taxi drivers in a city somewhere in the world, one of the questions I am usually asked is: “where are you from?”. As I do not like straight answers, I usually ask back “where do you think I am from?” It usually takes time before the driver gets the information he asked for. Then the next question is: “what is your job?”. Again, instead of giving a straight answer, I ask the question: “do you know MPEG?” Well, believe it or not, 9 out of 10 times the answer is “yes”, often supplemented by an explanation decently connected with what MPEG is.

Wow! Do we need a more convincing proof that MPEG has conquered the minds of the people of the world?

The interesting side of the story, though, is that, even if the name MPEG is known by billions of people, it is not a trademark. Officially, the word MPEG does not even exist. When talking to ISO you should say “ISO/IEC JTC 1/SC 29/WG 11” (next time, ask your taxi driver if they know this letter soup). The last insult is that the domain is owned by somebody who just keeps it without using it.

Should all this be of concern? Maybe for some, but not for me. What I have just talked about is just one aspect of what MPEG has always been. Do you think that MPEG was the result of high-level committees made of luminaries advising governments to take action on the future of media? You are going to be disappointed. MPEG was born haphazardly (read here, if you want to know how). Its strength is that it has been driven by the idea that the epochal transition from analogue to digital should not become another PAL-SECAM-NTSC or VHS-Betamax trap.

In 30 years MPEG has grown 20-fold, changed the way companies do business with media, made music liquid, multiplied the size of TV screens, brought media where there were stamp-size displays, made internet the primary delivery for media, created new experiences, shown that its technologies can successfully be applied beyond media…

There is no sign that its original driving force is abating, unless… Read until the end if you want to know more.

What did MPEG do?


MPEG was the first standards group that brought digital media to the masses. In the 2nd half of the 1990’s the MPEG-1 and MPEG-2 standards were converted to products and services as the list below will show (not that the use of MPEG-1 and MPEG-2 is confined to the 1990’s).

  • Digital Audio Broadcasting: in 1995, just 3 years after MPEG-1 was approved, DAB services began to appear in Europe with DAB receivers becoming available some time later.
  • Portable music: in 1997, 5 years after MPEG-1 was approved, Saehan Information Systems launched MPMan, probably the first portable digital audio player for the mass market that used MP3. This was followed by a long list of competing players until the mobile handset largely took over that function.
  • Video CD: in the second half of the 1990’s VCD spread especially in South East Asia until the MPEG-2 based DVD, with its superior quality, slowly replaced it. It uses all 3 parts of MPEG-1 (layer 2 for audio).
  • Digital Satellite broadcasting: in June 1994 DirecTV launched its satellite TV broadcasting service for the US market, even before MPEG released the MPEG-2 standard in November of that year! It used MPEG-2 and its lead was followed by many other regions who gradually converted their analogue broadcast services to digital.
  • Digital Cable distribution: in 1992 John Malone launched the “500-channel” vision for future cable services and MPEG gave the cable industry the means to make that vision real.
  • Digital Terrestrial broadcasting:
    • In 1996 the USA Federal Communications Commission adopted the ATSC A/53 standard. It took some time, however, before wide coverage of the country, and of other countries following the ATSC standards, was achieved.
    • In 1998 the UK introduced Digital Terrestrial Television (DTT).
    • In 2003 Japan started DTT services using MPEG-2 AAC for audio in addition to MPEG-2 Video and TS.
    • DTT is not deployed in all countries yet, and there is regular news of a country switching to digital, the MPEG way of course.
  • Digital Versatile Disc (DVD): toward the end of the 1990’s the first DVD players were put to market. They used MPEG-2 Program Stream (part 1 of MPEG-2) and MPEG-2 Video, and a host of audio formats, some from MPEG.


In the 1990s the Consumer Electronics industry provided devices to the broadcasting and telecom industries, and devices for package media. The shift to digital services called for the IT industry to join as a provider of big servers for broadcasting and interactive services (even though in the 1990’s the latter did not take off). The separate case of portable audio players provided by startups did not fit the established categories.

MPEG-4 played the fundamental role of bringing the IT industry under the folds of MPEG as a primary player in the media space.

  • Internet-based audio services: The great original insight of Steve Jobs and other industry leaders transformed Advanced Audio Coding (AAC) from a promising technology to a standard that dominates mobile devices and internet services
  • Internet video: MPEG-4 Visual, with the MP4 nickname, did not repeat the success of MP3 for video. Still it was the first example of digital media on the internet as DivX (a company name). Its hopes to become the streaming video format for the internet were dashed by the licensing terms of MPEG-4 Visual, the first example of ill-influence of technology rights on an MPEG standard
  • Video for all: MPEG-4 Advanced Video Coding (AVC) became a truly universal standard adopted in all areas and countries. Broadcasting, internet distribution, package media (Blu-ray) and more.
  • Media files: the MP4 File Format is the general structure for time-based media files, that has become another ubiquitous standard at the basis of modern digital media.
  • Advanced text and graphics: the Open Font Format (OFF), based on the OpenType specification, revised and extended by MPEG, is universally used.



  • Format for encrypted, adaptable multimedia presentation: is provided by the Common Media Application Format (CMAF), a format optimised for large scale delivery of protected media with a variety of adaptive streaming, broadcast, download, and storage delivery methods including DASH and MMT.
  • Interoperable image format: the Multi-Image Application Format (MIAF) enables precise interoperability points for creating, reading, parsing, and decoding images embedded in HEIF.


  • Generic binary format for XML: is provided by Binary format for XML (BiM), a standard used by products and services designed to work according to ARIB and DVB specifications.
  • Common encryption for files and streams: is provided by Common Encryption (CENC), defined in two MPEG-B standards, Part 7 for MP4 Files and Part 9 for MPEG-2 Transport Stream. CENC is widely used for the delivery of video to billions of devices capable of accessing internet-delivered stored files, MPEG-2 Transport Stream and live adaptive streaming.


  • IP-based television: MPEG Media Transport (MMT) is the “transport layer” of IP-based television. MMT assumes that delivery is achieved by an IP network with in-network intelligent caches close to the receiving entities. Caches adaptively packetise and push the content to receiving entities. MMT has been adopted by the ATSC 3.0 standard and is currently being deployed in countries adopting ATSC standards and also used in low-delay streaming applications.
  • More video compression, siempre!: has been provided by High Efficiency Video Coding (HEVC), the AVC successor, which improves compression by up to 60% compared to AVC. Natively, HEVC supports High Dynamic Range (HDR) and Wide Colour Gamut (WCG). However, its use is plagued by a confused licensing landscape as described, e.g., in A crisis, the causes and a solution.
  • Not the ultimate audio experience, but close: MPEG-H 3D Audio is a comprehensive audio compression standard capable of providing very satisfactory immersive audio experiences in broadcast and interactive applications. It is part of the ATSC 3.0 standard.
  • Comprehensive image file format: High Efficiency Image File Format (HEIF) is a file format for individual HEVC-encoded images and sequences of images. It is a container capable of storing HEVC intra-images and constrained HEVC inter-images, together with other data such as audio in a way that is compatible with the MP4 File Format. HEIF is widely used and supported by major OSs and image editing software.
  • Streaming on the unreliable internet: Dynamic Adaptive Streaming over HTTP (DASH) is the widely used standard that enables a media client connected to a media server via the internet to obtain, instant by instant, the version, among those available on the server, that best suits the momentary network conditions.
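DASH standardises the manifest (MPD) and the segment formats, while the rule a client uses to switch between versions is left to implementers. The following is a minimal sketch of one such rule, assuming the client knows the bitrates of the available Representations and measures the download rate of recent segments; the harmonic-mean estimator, the 0.8 safety margin and the function names are illustrative assumptions, not part of the standard.

```python
from statistics import harmonic_mean


def estimate_throughput(samples_kbps):
    """Estimate available throughput as the harmonic mean of recent download rates.

    The harmonic mean is dominated by the slowest samples, which keeps the
    estimate cautious when the network fluctuates.
    """
    return harmonic_mean(samples_kbps)


def select_representation(bitrates_kbps, throughput_kbps, safety=0.8):
    """Return the highest bitrate that fits within a fraction of the throughput.

    If no Representation fits, fall back to the lowest available bitrate.
    """
    budget = throughput_kbps * safety
    fitting = [b for b in sorted(bitrates_kbps) if b <= budget]
    return fitting[-1] if fitting else min(bitrates_kbps)


# Hypothetical bitrate ladder (kbit/s) and measured segment download rates.
ladder = [500, 1200, 2500, 5000, 8000]
recent = [4000, 3500, 6000]
print(select_representation(ladder, estimate_throughput(recent)))
```

With these numbers the estimate is about 4271 kbit/s, so the 2500 kbit/s Representation is requested. A real client could also take its buffer occupancy into account.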

What is MPEG doing now?

In the preceding chapter I singled out only MPEG standards that have been (and often still continue to be) extremely successful.

I am unable to single out those that will be successful in the future 😊, so the reasonable thing to do is to show the entire MPEG work plan.

At the risk of making the wrong bet 😊, let me introduce some of the most high-profile standards under development, subdivided into the three categories Media Coding, Systems and Tools, and Beyond Media. But you had better become acquainted with all ongoing activities. In MPEG sometimes the last become the first.

Media Coding

  • Versatile Video Coding (VVC): is the flagship video compression activity that will deliver another round of improved video compression. It is expected to be the platform on which MPEG will build new technologies for immersive visual experiences (see below).
  • Enhanced Video Coding (EVC): is the shorter-term project with less ambitious goals. EVC is designed to satisfy the urgent needs of those who require a standard with a less complex IP landscape.
  • Immersive visual technologies: investigations on technologies applicable to visual information captured by different camera arrangements are under way, as described in The MPEG drive to immersive visual experiences.
  • Point Cloud Compression (PCC): refers to two standards capable of compressing 3D point clouds captured with multiple cameras and depth sensors. The algorithms in both standards are lossy, scalable, progressive and support random access to point cloud subsets. See The MPEG drive to immersive visual experiences for more details.
  • Immersive audio: MPEG-H 3D Audio supports a 3 Degrees of Freedom or 3DoF (yaw, pitch, roll) experience at the movie “sweet spot”. More complete user experiences, however, are needed, i.e. 6DoF (adding x, y, z). These can be achieved with additional metadata and rendering technology.

Systems and Tools

  • Omnidirectional media format: Omnidirectional Media Application Format (OMAF) v1 is a format supporting the interoperable exchange of omnidirectional (VR 360) content for a user who can only Yaw, Pitch and Roll their head. OMAF v2 will support some head translation movements. See The MPEG drive to immersive visual experiences for more details.
  • Storage of PCC data in MP4 FF: MPEG is developing systems support to enable storage and transport of compressed point clouds with DASH, MMT etc.
  • Scene Description Interface: MPEG is investigating the interface to the scene description (not the technology) to enable rich immersive experiences.
  • Service interface for immersive media: Network-based Media Processing will enable a user to obtain potentially very sophisticated processing functionality from a network service via a standard API.
  • IoT when Things are Media Things: Internet of Media Things (IoMT) will enable the creation of networks of intelligent Media Things (i.e. sensors and actuators).

Beyond Media

  • Standards for biotechnology applications: MPEG is finalising all 5 parts of the MPEG-G standard and establishing new liaisons to investigate new opportunities.
  • Coping with neural networks everywhere: shortly (25 March 2019) MPEG will receive responses to its Call for Proposals for Neural Network Compression as described in Moving intelligence around.

What will MPEG do in the future?

At the risk of being considered boastful, I would think that MPEG deserved attention from some of the business schools that study socio-economic phenomena. Why? Because many have talked about media convergence, but they have forgotten that MPEG, with its standards, has actually triggered that convergence. MPEG people know the ecosystem at work in MPEG and I, for one, see how unique it is.

This has not happened. Let’s say that it is better to be neglected than to receive unwanted attention.

I would also think that a body that started from a Subcommittee on character sets and has become the reference standards group for the media industry, i.e. devices, content, services and applications, worth hundreds of billions of USD, with potent influences on a nearby industry such as telecommunication, should have prompted standards organisations to study its work method and possibly apply it to other domains.

This has not happened. Let’s say, again, that it is better to be neglected than to receive unwanted attention.

So can we expect MPEG to continue its mission, and apply its technologies and know-how to continue delivering compression standards for immersive experiences and new compression standards for other domains?

Maybe this time MPEG will attract attention. But don’t count on it.

The MPEG drive to immersive visual experiences


In How does MPEG actually work? I described the MPEG process: once an idea is launched, its context and objectives are identified; use cases are submitted and analysed; requirements are derived from use cases; and technologies are proposed and validated for their effectiveness, for eventual incorporation into the standard.

Some people complain that MPEG standards contain too many technologies supporting “non-mainstream” use cases. Such complaints are understandable but misplaced. MPEG standards are designed to satisfy the needs of different industries, and what is a must for some may well not be needed by others.

To avoid burdening a significant group of users of the standard with technologies they consider irrelevant, MPEG adopted the “profile approach” from the very beginning. This makes it possible to retain a technology for those who need it without encumbering those who do not.

It is true that there are a few examples where some technologies in an otherwise successful standard go unused. Was adding such technologies a mistake? In hindsight yes, but at the time a standard is developed the future is anybody’s guess, and MPEG does not want to find out later that one of its standards lacks a functionality that was deemed necessary for some use cases and that technology could support at the time the standard was developed.

For sure there is a cost in adding the technology to the standard – and this is borne by the companies proposing the technology – but there is no burden to those who do not need it because they can use another profile.

Examples of such “non-mainstream” technologies are provided by those supporting stereo vision. Since as early as MPEG-2 Video, multiview and/or 3D profile(s) have been present in most MPEG video coding standards. Therefore, this article will review the attempts made by MPEG at developing new and better technologies to support what are called today immersive experiences.

The early days

MPEG-1 did not have big ambitions (but the outcome was not modest at all ;-). MPEG-2 was ambitious because it included scalability – a technology that reached maturity only some 10 years later – and multiview. As depicted in Figure 1, multiview was possible because if you have two close cameras pointing to the same scene, you can exploit intraframe, interframe and interview redundancy.
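The interview redundancy of Figure 1 can be illustrated with a toy example: if the second camera sees the same scene shifted horizontally, the second view can be predicted from the first view displaced by a disparity, and only the (small) difference needs to be coded. The sketch below works on a single row of samples and searches integer disparities by sum of absolute differences; it is an illustrative assumption, not the actual MPEG-2 multiview or MVC coding tools, which operate on 2-D blocks inside a full codec. The function names are hypothetical.

```python
def shift(row, d):
    """Shift a row of samples right by d positions, repeating the first sample at the edge."""
    return [row[max(i - d, 0)] for i in range(len(row))]


def best_disparity(reference, target, max_d=4):
    """Find the disparity (0..max_d) that minimises the sum of absolute differences."""
    def sad(d):
        return sum(abs(t - p) for t, p in zip(target, shift(reference, d)))
    return min(range(max_d + 1), key=sad)


def interview_residual(reference, target, max_d=4):
    """Predict the target view from the disparity-shifted reference; return (d, residual)."""
    d = best_disparity(reference, target, max_d)
    prediction = shift(reference, d)
    return d, [t - p for t, p in zip(target, prediction)]


left = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical row from the first camera
right = shift(left, 2)                    # second camera: same content, 2 samples over
print(interview_residual(left, right))
```

Here the residual is all zeros because the second view is an exact shift of the first; with real cameras it is small but not zero, and that is what makes interview prediction worthwhile.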

Figure 1 – Redundancy in multiview

Both MPEG-2 scalability and multiview saw little take-up.

Both MPEG-4 Visual and AVC had multiview profiles. AVC had 3D profiles in addition to multiview profiles. Multiview Video Coding (MVC) of AVC was adopted by the Blu-ray Disc Association, but the rest of the industry took another turn as depicted in Figure 2.

Figure 2 – Frame packing in AVC and HEVC

If the left and right frames of two video streams are packed in one frame, regular AVC compression can be applied to the packed frame. At the decoder, the frames are de-packed after decompression and the two video streams are obtained.
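Side-by-side packing, one common arrangement, can be sketched as follows. The sketch assumes frames represented as lists of rows with an even width, and uses plain decimation (keeping every other column) and sample repetition for upsampling; real systems would filter before decimating and interpolate after de-packing. The function names are hypothetical.

```python
def pack_side_by_side(left, right):
    """Keep every other column of each view and place the two halves in one frame."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def unpack_side_by_side(packed):
    """Split a side-by-side frame and restore full width by repeating each sample."""
    half = len(packed[0]) // 2

    def upsample(row):
        return [s for s in row for _ in range(2)]

    left = [upsample(row[:half]) for row in packed]
    right = [upsample(row[half:]) for row in packed]
    return left, right


left_view = [[1, 2, 3, 4]]
right_view = [[5, 6, 7, 8]]
packed = pack_side_by_side(left_view, right_view)   # one frame, same size as a single view
print(packed)
print(unpack_side_by_side(packed))
```

The packed frame has the size of a single view, so an unmodified codec can compress it, but the recovered views have lost half of their horizontal samples (2, 4, 6 and 8 are gone): this is the resolution compromise discussed in the next paragraph.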

This is a practical but less than optimal solution. Unless the frame size of the codec is doubled, you compromise either the horizontal or the vertical resolution, depending on the frame-packing method used. Because of this, a host of other, more sophisticated but eventually unsuccessful, frame packing methods have been introduced into the AVC and HEVC standards. The relevant information is carried by Supplemental Enhancement Information (SEI) messages, because the specific frame packing method used is not normative.

The HEVC standard, too, supports 3D vision with tools that efficiently compress depth maps and exploit the redundancy between video pictures and associated depth maps. Unfortunately, use of HEVC for 3D video has also been limited.


The MPEG-I project – ISO/IEC 23090 Coded representation of immersive media – was launched at a time when the word “immersive” was prominent in many news headlines. Figure 3 gives three examples of immersivity where technology challenges increase moving from left to right.

Figure 3 – 3DoF (left), 3DoF+ (centre) and 6DoF (right)

In 3 Degrees of Freedom (3DoF) the user is static but the head can Yaw, Pitch and Roll. In 3DoF+ the user can, in addition, make limited head movements in the three directions. In 6 Degrees of Freedom (6DoF) the user can freely walk in a 3D space.
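The difference between the three cases can be made concrete with a viewer pose made of three rotation angles and three translation components: in 3DoF only the angles change, in 3DoF+ the translation is limited to small head movements and in 6DoF it is unrestricted. The sketch below expresses a scene point in the viewer’s coordinate frame; the Z-Y-X (yaw, pitch, roll) rotation order, the axis orientation and the names are assumptions for illustration, not the conventions defined in MPEG-I specifications.

```python
import math
from dataclasses import dataclass


def rotation(yaw, pitch, roll):
    """3x3 rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


@dataclass
class Pose:
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0   # translation: always 0 in 3DoF,
    y: float = 0.0   # small in 3DoF+,
    z: float = 0.0   # unrestricted in 6DoF


def world_to_viewer(point, pose):
    """Express a world point in the viewer's frame: p_v = R^T (p - t)."""
    R = rotation(pose.yaw, pose.pitch, pose.roll)
    d = [point[0] - pose.x, point[1] - pose.y, point[2] - pose.z]
    # Multiply by the transpose of R: component i is column i of R dotted with d.
    return [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]


print(world_to_viewer([1.0, 0.0, 0.0], Pose(yaw=math.pi / 2)))   # head turned, not moved
print(world_to_viewer([1.0, 0.0, 0.0], Pose(x=0.5)))             # head moved, not turned
```

A 3DoF renderer only needs the rotation part. Once x, y and z change, the content must show the scene from a new position, which is where the technology challenges of 3DoF+ and 6DoF come from.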

Currently there are several activities in MPEG that aim at developing standards that support some form of immersivity. While they had different starting points, they are likely to converge to one or, at least, a cluster of points (hopefully not to a cloud😊).


Omnidirectional Media Application Format (OMAF) is not a way to compress immersive video but a storage and delivery format. Its main features are:

  1. Support of several projection formats in addition to the equirectangular one
  2. Signalling of metadata for rendering of 360ᵒ monoscopic and stereoscopic audio-visual data
  3. Use of MPEG-H video (HEVC) and audio (3D Audio)
  4. Several ways to arrange video pixels to improve compression efficiency
  5. Use of the MP4 File Format to store data
  6. Delivery of OMAF content with MPEG-DASH and MMT.
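The equirectangular projection (ERP), the baseline that the other projection formats complement, maps longitude (yaw) and latitude (pitch) linearly to the horizontal and vertical pixel axes. A minimal sketch follows; the sign conventions (yaw growing to the right, top row at +90° pitch) are assumptions and may differ from those specified in OMAF, and the function names are hypothetical. The last function hints at why alternatives are attractive: every ERP row has the same number of samples, so rows near the poles are heavily oversampled.

```python
import math


def sphere_to_erp(yaw, pitch, width, height):
    """Map a direction (yaw in [-pi, pi), pitch in [-pi/2, pi/2], radians) to ERP pixel coordinates."""
    u = (0.5 + yaw / (2 * math.pi)) * width
    v = (0.5 - pitch / math.pi) * height
    return u, v


def erp_to_sphere(u, v, width, height):
    """Inverse mapping: ERP pixel coordinates back to (yaw, pitch) in radians."""
    yaw = (u / width - 0.5) * 2 * math.pi
    pitch = (0.5 - v / height) * math.pi
    return yaw, pitch


def horizontal_oversampling(pitch):
    """Samples per unit of arc on a row at latitude `pitch`, relative to the equator.

    Every ERP row has `width` samples, but the circle it represents shrinks by cos(pitch).
    """
    return 1.0 / math.cos(pitch)


print(sphere_to_erp(0.0, 0.0, 3840, 1920))             # centre of the picture
print(round(horizontal_oversampling(math.radians(60)), 3))
```

At 60° of latitude a row already carries twice as many samples as the sphere needs, which is the kind of waste that other projections and pixel arrangements try to reduce.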

MPEG released OMAF in 2018 and it is now published as an ISO standard (ISO/IEC 23090-2).


If the current version of OMAF is applied to a 3DoF+ scenario, the user may perceive parallax errors, which become more annoying the larger the head movement.
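A back-of-the-envelope calculation shows why. Assume an object straight ahead at distance d and a lateral head shift t: the object’s true direction changes by atan(t/d), but a renderer that only handles rotations keeps showing it in the original direction. The function below, a simplified geometric model that ignores all rendering details (its name is hypothetical), computes that angular error.

```python
import math


def parallax_error_deg(head_shift_m, object_distance_m):
    """Angular error (degrees) for an object straight ahead when a lateral head
    shift is ignored by a rotation-only (3DoF) renderer."""
    return math.degrees(math.atan2(head_shift_m, object_distance_m))


# Error for a few head shifts (metres) and object distances (0.5, 2 and 10 metres).
for shift in (0.05, 0.10, 0.20):
    errors = [round(parallax_error_deg(shift, d), 2) for d in (0.5, 2.0, 10.0)]
    print(f"shift {shift:.2f} m -> errors at 0.5/2/10 m: {errors} deg")
```

In this model the error grows with the head movement and also as objects get closer, since the same head shift subtends a larger angle.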

To address this problem, at the January 2019 meeting MPEG issued a call for proposals requesting appropriate metadata (see the red blocks in Figure 4) to help the post-processor present the best image based on the viewer’s position, if available, or synthesise a missing one, if not.

Figure 4 – 3DoF+ use scenario

The 3DoF+ standard will be added to OMAF, which will be published as a 2nd edition. Both standards are planned to be completed in October 2020.


Versatile Video Coding (VVC) is the latest in the line of MPEG video compression standards supporting 3D vision. Currently VVC does not specifically include full-immersion technologies, as it only supports omnidirectional video as HEVC does. However, VVC could not only replace HEVC in Figure 4, but also be the target of other immersive technologies, as will be explained later.

Point Cloud Compression

3D point clouds can be captured with multiple cameras and depth sensors; the number of points can range from a few thousand to a few billion, with attributes such as colour, material properties etc. MPEG is developing two different standards, the choice between which depends on whether the points are dense (Video-based PCC) or less so (Graphic-based PCC). The algorithms in both standards are lossy, scalable, progressive and support random access to subsets of the point cloud. See here for an example of a Point Cloud test sequence being used by MPEG for developing the V-PCC standard.
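To illustrate what random access to subsets of a point cloud means, here is a hypothetical sketch that groups points (each with a colour attribute) into cubic tiles, so that a region of interest can be retrieved by visiting only the tiles it overlaps. This is not how V-PCC or G-PCC partition or code data (V-PCC, for instance, projects the points onto 2D patches compressed with video codecs, while G-PCC codes geometry with an octree); the class, the function names and the tile size are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    z: float
    rgb: tuple  # example attribute; real content may carry other attributes too


def build_tile_index(points, tile_size):
    """Group points by the cubic tile (integer tile coordinates) that contains them."""
    index = defaultdict(list)
    for p in points:
        key = (int(p.x // tile_size), int(p.y // tile_size), int(p.z // tile_size))
        index[key].append(p)
    return index


def query_box(index, tile_size, lo, hi):
    """Return the points inside the box [lo, hi], visiting only the overlapping tiles."""
    t_lo = [int(c // tile_size) for c in lo]
    t_hi = [int(c // tile_size) for c in hi]
    result = []
    for tx in range(t_lo[0], t_hi[0] + 1):
        for ty in range(t_lo[1], t_hi[1] + 1):
            for tz in range(t_lo[2], t_hi[2] + 1):
                for p in index.get((tx, ty, tz), []):
                    if all(l <= c <= h for c, l, h in zip((p.x, p.y, p.z), lo, hi)):
                        result.append(p)
    return result


pts = [Point(0.1, 0.1, 0.1, (255, 0, 0)),
       Point(1.5, 0.2, 0.3, (0, 255, 0)),
       Point(5.0, 5.0, 5.0, (0, 0, 255))]
idx = build_tile_index(pts, 1.0)
print(query_box(idx, 1.0, (0, 0, 0), (2, 1, 1)))   # the far point's tile is never visited
```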

MPEG plans to release Video-based Point Cloud Compression as FDIS in October 2019 and Graphic-based Point Cloud Compression as FDIS in April 2020.

In addition to PCC compression, MPEG is working on Carriage of Point Cloud Data, with the goal of specifying how PCC data can be stored in ISOBMFF and transported with DASH, MMT etc.

Other immersive technologies


MPEG is carrying out explorations on technologies that enable 6 degrees of freedom (6DoF). The reference diagram for that work looks like a minor extension of the 3DoF+ reference model (see Figure 5), but the extension may have huge technology implications.

Figure 5 – 6DoF use scenario

To enable a viewer to freely move in a space and enjoy a 3D virtual experience that matches the one in the real world, we still need some metadata as in 3DoF+ but also additional video compression technologies that could be plugged into the VVC standard.

Light field

The MPEG Video activity is all about standardising efficient technologies that compress digital representations of sampled electromagnetic fields in the visible range captured by digital cameras. Roughly speaking we have 4 types of camera:

  1. Conventional cameras with a 2D array of sensors receiving the projection of a 3D scene
  2. An array of cameras, possibly supplemented by depth maps
  3. Point cloud cameras
  4. Plenoptic cameras whose sensors capture the intensity of the light rays reaching them from a number of directions.

Technologically speaking, #4 is an area that has not been short of promises and is delivering on some of them. However, economic sustainability has been a challenge for companies developing products for the entertainment market.

MPEG is currently engaged in Exploration Experiments (EE) to check

  1. The coding performance of Multiview Video Data (#2) for 3DoF+ and 6DoF, and Lenslet Video Data (#4) for Light Field
  2. The relative coding performance of Multiview coding and Lenslet coding, both for Lenslet Video Data (#4).
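Comparing multiview coding and lenslet coding on the same Lenslet Video Data implies turning a lenslet image into a set of views. Under a strongly idealised model, assumed here, in which every microlens covers an aligned k x k block of sensor pixels, the view from sub-aperture position (u, v) is obtained by taking pixel (v, u) under every microlens. Real plenoptic data also require calibration, demosaicing and handling of hexagonal lens arrays, which this sketch ignores; the function name is hypothetical.

```python
def lenslet_to_views(lenslet, k):
    """Rearrange an idealised lenslet image into k*k sub-aperture views.

    `lenslet` is a list of pixel rows; each microlens covers an aligned k x k block.
    View (u, v) collects pixel (row v, column u) from every microlens block.
    """
    rows, cols = len(lenslet), len(lenslet[0])
    ny, nx = rows // k, cols // k      # number of microlenses vertically and horizontally
    views = {}
    for v in range(k):
        for u in range(k):
            views[(u, v)] = [[lenslet[j * k + v][i * k + u] for i in range(nx)]
                             for j in range(ny)]
    return views


# A 4x4 lenslet image with 2x2 pixels per microlens gives 4 views of 2x2 pixels each.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
views = lenslet_to_views(image, 2)
print(views[(0, 0)], views[(1, 0)])
```

Each view resembles a picture taken from a slightly different camera position, so the collection can be fed to a multiview encoder, while lenslet coding compresses the original image directly.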

However, MPEG is not engaged in checking the relative coding performance of #2 data and #4 data because there are no #2 and #4 test data for the same scene.


In the good(?) old times MPEG could develop video coding standards – from MPEG-1 to VVC – by relying on established input video formats. This still largely holds for Point Clouds as well. Light Field, on the other hand, is a different matter because the capture technologies are still evolving and the format in which the data are provided has an impact on the processing that MPEG applies to reduce the bitrate.

MPEG has bravely picked up the gauntlet and its machine is grinding data to provide answers that will eventually lead to one or more visual compression standards to enable rewarding immersive user experiences.

MPEG is planning a “Workshop on standard coding technologies for immersive visual experiences” in Gothenburg (Sweden) on 10 July 2019. The workshop, open to the industry, will be an opportunity for MPEG to meet its client industries, report on its results and discuss industries’ needs for immersive visual experiences standards.
