The discontinuity of digital technologies


Last week I published the Executive summary of my book A vision made real – Past, present and future of MPEG as an article on this blog. This time I publish the first chapter of the book, which covers four aspects of the media distribution business and their enabling technologies:

  1. Analogue media distribution describes the vertical businesses of analogue media distribution;
  2. Digitised media describes media digitisation and why it was largely irrelevant to distribution;
  3. Compressed digital media describes how industry tried to use compression for distribution;
  4. Digital technologies for media distribution describes the potential structural impact of compressed digital media for distribution.

Analogue media distribution

In the 1980’s media were analogue, the sole exception being music on compact disc (CD). Different industries were engaged in the business of distributing media: telecommunication companies distributed music, cable operators distributed television via cable, terrestrial and satellite broadcasters did the same via terrestrial and satellite networks, and different types of businesses distributed all sorts of recorded media on physical supports (film, laser discs, compact cassette, VHS/Betamax cassette, etc.).

Even if the media content was exactly the same, say a movie, the baseband signals that represented the media content were all different and specific to the delivery medium: film for theatrical viewing, television for the terrestrial or satellite network or for cable, and a different format for video cassette. On top of these technological differences caused by the physical nature of the delivery media, there were often substantial differences that depended on countries or manufacturers.

Figure 1 depicts the vertical businesses of the analogue world when media distribution was a collection of industry-dependent distribution systems, each using its own technologies for the baseband signal. The figure is simplified because it does not take into account the country- or region-based differences within each industry.

Figure  1 – Analogue media distribution

Digitised media

Since the 1930’s the telecom industry had investigated digitisation of signals (speech at that time). In the 1960’s technology could support digitisation and ITU created G.711, the standard for digital speech, i.e. analogue speech sampled at 8 kHz with nonlinear 8-bit quantisation. For several decades digital speech only existed in the (fixed) network, but few were aware of it because the speech did not leave the network as bits.

It was necessary to wait until 1982 for Philips and Sony to develop the Compact Disc (CD), which carried digital stereo audio as specified in the “Red Book”: analogue stereo audio sampled at 44.1 kHz with 16-bit linear quantisation. It was a revolution because consumers could have an audio quality that did not deteriorate with time.

In 1980 a digital video standard was issued by ITU-R. The luminance and the two colour-difference signals were sampled at 13.5 and 6.75 MHz, respectively, at 8 bits per sample, yielding an exceedingly high bitrate of 216 Mbit/s. It was a major achievement, but digital television never left the studio except as bulky magnetic tapes.

The network could carry 64 kbit/s of digital speech, but no consumer-level delivery media of that time could carry the 1.41 Mbit/s of digital audio, much less the 216 Mbit/s of digital video. Therefore, in the 1960s studies on compression of digitised media began in earnest.
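The three bitrates quoted above follow directly from the sampling parameters. As a quick check, here is a minimal Python sketch. The helper name pcm_bitrate is mine, not something taken from the standards, and it assumes plain sampled data with no framing or error-correction overhead.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) bitrate in bit/s of a sampled signal."""
    return sample_rate_hz * bits_per_sample * channels

# Digital speech: 8 kHz sampling, 8 bits per sample, one channel
speech = pcm_bitrate(8_000, 8)

# CD audio: 44.1 kHz sampling, 16 bits per sample, two channels
cd_audio = pcm_bitrate(44_100, 16, channels=2)

# Digital video: luminance at 13.5 MHz plus two colour-difference
# signals at 6.75 MHz each, all at 8 bits per sample
video = pcm_bitrate(13_500_000, 8) + 2 * pcm_bitrate(6_750_000, 8)

print(speech / 1e3, "kbit/s")    # digital speech
print(cd_audio / 1e6, "Mbit/s")  # CD audio
print(video / 1e6, "Mbit/s")     # studio digital video
```

The gap between the first figure and the other two is the whole story: the network was dimensioned for speech, and nothing available to consumers could carry the rest without compression.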

Compressed digital media

In the 1980’s compression research yielded its first fruits:

  1. In 1980 ITU approved Recommendation T.4: Standardization of Group 3 facsimile terminals for document transmission. In the following decades hundreds of millions of Group 3 facsimile devices were installed worldwide because, thanks to compression, transmission time of an A4 sheet was cut from 6 min (Group 1 facsimile), or 3 min (Group 2 facsimile), to about 1 min.
  2. In 1982 ITU approved H.100 (11/88) Visual telephone systems for transmission of videoconference at 1.5/2 Mbit/s. Analogue videoconferencing was not unknown at that time because several companies had trials, but many hoped that H.100 would enable widespread business communication.
  3. In 1984 ITU started the standardisation activity that would give rise to Recommendation H.261: Video codec for audio-visual services at p x 64 kbit/s, approved in 1988.
  4. In the mid-1980s several CE laboratories were studying digital video recording for magnetic tapes. One example was the European Digital Video Recorder (DVS) project that people expected would provide a higher-quality alternative to the analogue VHS or Betamax videocassette recorder, much as CDs were supposed to be a higher-quality alternative to LP records.
  5. Still in the area of recording, but for a radically new type of application – interactive video on compact disc – Philips and RCA were independently studying methods to encode video signals at bitrates of 1.41 Mbit/s (the output bitrate of CD).
  6. In the same years CMTT, a special Group of the ITU dealing with transmission of radio and television programs on telecommunication networks, had started working on a standard for transmission of digital television for “primary contribution” (i.e. transmission between studios).
  7. In 1987 the Advisory Committee on Advanced Television Service was formed to devise a plan to introduce HDTV in the USA, and Europe was doing the same with its HD-MAC project.
  8. At the end of the 1980’s RAI and Telettra had developed an HDTV codec for satellite broadcasting that was used for demonstrations during the Soccer World Cup in 1990, and General Instrument had shown its Digicipher II system for terrestrial HDTV broadcasting in the bandwidth of 6 MHz used by American terrestrial television.

Digital technologies for media distribution

The above shows how companies, industries and standards committees were jockeying for a position in the upcoming digital world. These disparate and often uncorrelated initiatives betrayed the mindset that guided them: future distribution of digital media would have an arrangement similar to the one sketched in Figure 1 for analogue media. The “baseband signal” of each delivery medium would be digital, thus using new technology, but different for each industry and possibly for each country/region.

In the analogue world these scattered roles and responsibilities were not particularly harmful because the delivery media and the baseband signals were so different that unification had never been attempted. But in the digital world unification made a lot of sense.

MPEG was conceived as the organisation that would achieve unification and provide generic, i.e. domain-independent, digital media compression. In other words, MPEG envisaged the completely different setup depicted in Figure 2.

Figure  2 – Digital media distribution (à la MPEG)

In retrospect that was a daunting task. If its magnitude had been realised, it would probably never have started.

A vision made real – Past, present and future of MPEG

Why this book?

Within a single generation, the life of the large majority of human beings has become incredibly different from the life of the generation before. The ability to communicate made possible by the ubiquitous internet, and the ability to convey media content to others made possible by MPEG standards, are probably among the most important factors of change. However, unlike the internet, about which a lot has been written, little is known about the MPEG group besides its name.

This book wants to make up for this lack of information.

It will talk about the radical transformation that MPEG standards wrought on the media distribution business by replacing a multitude of technologies owned by different businesses with a single technology shared by all; the environment in which it operates; the radically new philosophy that underpins this transformation; the means devised to put the philosophy into practice; the industrial and economic impact of MPEG standards; what new standards are being developed; and what future the author conceives for MPEG as an organisation that plays such an important industrial and social role.

Bottom line, MPEG is about technology. Therefore, the book offers an overview of all MPEG standards and, in particular, video, audio, media quality, systems and data. This is for those more (but not a lot more) technology-minded.

Important – there are short Conclusions worth reading.

Leonardo Chiariglione

Table of Contents of A vision made real – Past, present and future of MPEG

Introduction of A vision made real – Past, present and future of MPEG

The impact of MPEG standards

I suppose that few visitors to this blog need to be convinced that MPEG is important, because they have some personal experience of its importance. Then again, I suppose that not all visitors have full visibility of all the application areas where MPEG is important.

This article describes different application domains showing how applications have benefited from MPEG standards. The list is not exhaustive and the order in which applications are presented follows approximately the time in which MPEG enabled the application.

Digital Television for distribution

MPEG-2 was the first integrated digital television standard. It was first deployed in 1994, even before the MPEG-2 standard was approved. While most countries have adopted MPEG-2 Video for their terrestrial broadcasting services, with one notable major exception, countries have made different selections for the audio component.

MPEG-2 Transport Stream is the Systems layer of Digital Television. The Systems layer can carry the “format identifier”. In case the media (audio or video) carried by the Systems layer are different from MPEG, the format identifier indicates which of the registered formats is actually being used.

Digital Television exploits Digital Storage Media Command and Control (DSM-CC) to set up a network connection (used by CATV services) and the carousel to send the content of a slowly changing information source that each receiver that happens to “tune-in” can acquire after some time.

MPEG-4 AVC has replaced MPEG-2 Video in many instances because of its superior compression performance. MPEG-H HEVC is also being used in different countries, especially for Ultra High Definition (UHD) distribution. HEVC has the advantage of providing better compression than AVC. Additionally, it supports High Dynamic Range (HDR) and Wider Colour Gamut (WCG).

MPEG-B Part 9 provides a specification for Common Encryption of MPEG-2 Transport Streams.

MPEG-H Part 1, MPEG Media Transport (MMT), replaces the original MPEG-2 Transport Stream. MMT is part of the ATSC 3.0 specification.

Digital Audio Broadcasting

In the mid-1990’s different European countries began to launch Digital Audio Broadcasting services based on the specifications of the Eureka 147 (EU 147) research project. EU 147 used MPEG-1 Audio Layer II as compressed audio format, in addition to other EU 147-proper specifications. The specifications were widely adopted in countries outside Europe, promoted by the non-government organisation WorldDAB.

In 2006 the DAB+ specifications were released. DAB+ includes HE-AAC v2 and MPEG Surround (MPEG-D Part 1).

Technologically connected to DAB for the transport layer, but addressing video (AVC), is the Digital Multimedia Broadcasting (DMB) system developed by Korea for video transmission on mobile handsets.

Other audio services, such as XM, use HE-AAC.

Digital Audio

MP3 (MPEG-1 Audio Layer III) brought a revolution in the music world because it triggered new ways to distribute and enjoy music content. MP3 players continued the revolution brought about by the Walkman. Different versions of AAC continued that trend and triggered the birth of music distribution over the internet. Today most music is distributed via the internet using MPEG standards.

Digital Video for package media distribution

Video Compact Disc (VCD)

The original target of MPEG-1 – interactive video on compact disc – did not happen but, especially in Far East markets, VCD was a big success – probably 1 billion devices sold – anticipating the arrival of the better-performing but more complex MPEG-2 based DVD. VCD used MPEG-1 Systems, Video and Audio Layer II.

Digital Versatile Disc (DVD)

The widely successful DVD specification used MPEG-2 Video, MPEG-2 Program Stream and a selection of audio codecs for different world regions.

Blu-ray Disc (BD)

The BD specification makes reference to AVC and to Multiview Video Coding. MPEG-2 TS is used instead of MPEG-2 PS. Apparently, no MPEG audio codecs are supported.

Ultra HD Blu-ray

The specification supports 4K UHD video encoded in HEVC with 10-bit High Dynamic Range and Wider Colour Gamut.

Digital video for the studio

MPEG was born to serve the “last mile” of video distribution, but some companies requested a version of MPEG-2 targeting studio use. This is the origin of the MPEG-2 4:2:2 profile which only supports intraframe coding and a higher number of bits per pixel.

All standards following MPEG-2, starting from MPEG-4 Visual, have had a few profiles dedicated to use in the studio.

Not strictly in the video coding area is the Audio-Visual Description Profile (AVDP), defined in MPEG-7 Part 9. AVDP was developed to facilitate the introduction of automatic information extraction tools in media production, through the definition of a common format for the exchange of the metadata they generate, e.g. shot/scene detection, face recognition/tracking, speech recognition, copy detection and summarisation, etc.

Digital video

Repeating the “MP3 use case for video” was the ambition of many. MPEG-4 Visual provided the standard technology for doing it. DivX (a company) took over the spec and triggered the birth of “DVD-to-video file” industry that attracted significant attention for some time.

Video distribution over the internet

MPEG-4 Visual was the first video coding standard designed to be “IT-friendly”. Some companies started plans to deliver video over the internet, which was then growing (in bitrate). Those plans suffered a deadly blow with the publication of the MPEG-4 Visual licensing terms with the “content fee” clause.

The more relaxed AVC licensing terms favoured the development of internet-based video distribution based on MPEG standards. Unfortunately, the years lost with the MPEG-4 Visual licensing terms gave alternative proprietary video codecs time to consolidate their position in the market.

A similar story continues with HEVC whose licensing terms are of concern to many not for what they say, but for what some patent holders do not say (because they do not provide licensing terms).

Not strictly in the video coding area, but extremely important for video distribution over the internet, is Dynamic Adaptive Streaming over HTTP (DASH). DASH enables a client to request a server to send a video segment of the quality that can be streamed on the bandwidth available at a particular time, as measured by the client.

In the same space MPEG produced the Common Media Application Format (CMAF) standard. Several technologies drawn from different MPEG standards are restricted and integrated to enable efficient delivery of large scale, possibly protected, video applications, e.g. streaming of televised events. CMAF Segments can be delivered once to edge servers in content delivery networks (CDN), then accessed from cache by streaming video players without additional network backbone traffic or transmission delay.
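Here is a minimal sketch of the client-side choice described above, assuming the client knows the bitrates of the available versions of a segment and has its own throughput estimate. The function name and the 0.8 safety margin are hypothetical. Real players use more elaborate adaptation logic, and the DASH standard does not mandate any particular one.

```python
def pick_representation(bitrates_bps, measured_bps, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of the
    measured throughput; fall back to the lowest one if none fits."""
    budget = measured_bps * safety
    fitting = [b for b in bitrates_bps if b <= budget]
    return max(fitting) if fitting else min(bitrates_bps)

# Three hypothetical versions of the same content
available = [500_000, 1_500_000, 4_000_000]

good_link = pick_representation(available, measured_bps=2_500_000)
poor_link = pick_representation(available, measured_bps=300_000)
print(good_link, poor_link)
```

The point of the design is that the decision sits entirely in the client: the server only has to hold the different versions and answer ordinary HTTP requests.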

File Format

To be “IT-friendly” MPEG-4 needed a file format and this is exactly what MPEG has provided.

The MP4 File Format, officially called ISO Base Media File Format (ISO BMFF), was the MPEG response to the need. It can be used for editing, HTTP streaming and broadcasting.

MP4 FF contains tracks for each media type (audio, video etc.), with additional information: a four-character media type ‘name’ with all parameters needed by the media type decoder. “Track selection data” helps a decoder identify what aspect of a track can be used and determine which alternatives are available.
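To give an idea of what “IT-friendly” means in practice: an ISO BMFF file is a sequence of boxes, each starting with a 32-bit big-endian size followed by a four-character type, and boxes can nest. The sketch below walks the boxes of a byte string. It is an illustration only: make_box and iter_boxes are hypothetical helper names, the sample bytes are hand-made rather than a playable file, and the 64-bit “largesize” variant is deliberately not handled.

```python
import struct

def make_box(box_type, payload=b""):
    """Build a box: 32-bit big-endian size, four-character type, payload."""
    header = struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii"))
    return header + payload

def iter_boxes(data):
    """Yield (type, payload) for each box found back to back in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size == 0:
            # size 0 means "this box runs to the end of the data"
            size = len(data) - offset
        if size < 8 or offset + size > len(data):
            # malformed, or a 64-bit largesize box (not handled here)
            break
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size

# Hand-made sample: a file-type box followed by a movie box with one track box
sample = make_box("ftyp", b"isom") + make_box("moov", make_box("trak"))

top_level = [name for name, _ in iter_boxes(sample)]
moov_payload = dict(iter_boxes(sample))["moov"]
inside_moov = [name for name, _ in iter_boxes(moov_payload)]
print(top_level, inside_moov)
```

A reader that does not know a given four-character type can use the size field to skip over the box, which is a large part of why the format has been so easy to extend.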

An important support to the file format is the Common Encryption for files provided by MPEG-B Part 7.

Still more to say about MPEG standards

In Is there a logic in MPEG standards? and There is more to say about MPEG standards I have made an overview of the first 11 MPEG standards (white squares in Figure 1). In this article I would like to continue the overview and briefly present the remaining 11 MPEG standards, including those that are still being developed. Using the same convention as before, those marked yellow indicate that no work was done on them for a few years.

Figure 1 – The 22 MPEG standards. Those in colour are presented in this article


When MPEG began the development of the Augmented Reality Application Format (ARAF) it also started a specification called Augmented Reality Reference Model. Later it became aware that SC 24 Computer graphics, image processing and environmental data representation was doing similar work, and joined forces with it to develop a standard called Mixed and Augmented Reality Reference Model.

In the Mixed and Augmented Reality (MAR) paradigm, representations of physical and computer mediated virtual objects are combined in various modalities. The MAR standard has been developed to enable

  1. The design of MAR applications or services. The designer may refer to and select the needed components from those specified in the MAR model architecture, taking into account the given application/service requirements.
  2. The development of a MAR business model. Value chain and actors are identified in the Reference Model and implementors may map them to their business models or invent new ones.
  3. The extension of existing or creation of new MAR standards. MAR is interdisciplinary and creates ample opportunities for extending existing technology solutions and standards.

MAR-RM and ARAF paradigmatically express the differences between MPEG standardisation and “regular” IT standardisation. MPEG defines interfaces and technologies while IT standards typically define architectures and reference models. This explains why the majority of patent declarations that ISO receives relate to MPEG standards. It is also worth noting that in the 6 years it took to develop the standard, MPEG developed 3 editions of its ARAF standard.

The Reference architecture of the MAR standard is depicted in the figure below.

Information from the real world is sensed and enters the MAR engine either directly or after being “understood”. The engine can also access media assets or external services. All information is processed by the engine which outputs the result of its processing and manages the interaction with the user.

Figure 2 – MAR Reference System Architecture

Based on this model, the standard elaborates the Enterprise Viewpoint with classes of actors, roles, business model and success criteria, the Computational Viewpoint with functionalities at the component level, and the Informational Viewpoint with data communication between components.

MAR-RM is a one-part standard.


Multimedia service platform technologies (MPEG-M) specifies two main components of a multimedia device, called peer in MPEG-M.

As shown in Figure 3, the first component is the API: a High-Level API for applications and a Low-Level API for network, energy and security.

Figure 3 – High Level and Low Level MPEG-M API

The second component is a middleware called MXM that relies specifically on MPEG multimedia technologies (Figure 4).

Figure 4 – The MXM architecture

The Middleware is composed of two types of engine. Technology Engines are used to call functionalities defined by MPEG standards, such as creating or interpreting a licence attached to a content item. Protocol Engines are used to communicate with other peers, e.g. in case a peer does not have a particular Technology Engine that another peer has. For instance, a peer can use a Protocol Engine to call a licence server to get a licence to attach to a multimedia content item. The MPEG-M middleware has the ability to create chains of Technology Engines (Orchestration) or Protocol Engines (Aggregation).

MPEG-M is a 5-part standard

  • Part 1 – Architecture specifies the architecture, and High and Low level API of Figure 3
  • Part 2 – MPEG extensible middleware (MXM) API specifies the API of Figure 4
  • Part 3 – Conformance and reference software
  • Part 4 – Elementary services specifies the elementary services provided by the Protocol Engines
  • Part 5 – Service aggregation specifies how elementary services can be aggregated.


The development of the MPEG-U standards was motivated by the evolution of User Interfaces that integrate advanced rich media content such as 2D/3D, animations and video/audio clips and aggregate dedicated small applications called widgets. These are standalone applications embedded in a Web page and rely on Web technologies (HTML, CSS, JS) or equivalent.

With its MPEG-U standard, MPEG sought to have a common UI on different devices, e.g. TV, Phone, Desktop and Web page.

Therefore MPEG-U extends W3C recommendations to

  1. Cover non-Web domains (Home network, Mobile, Broadcast)
  2. Support MPEG media types (BIFS and LASeR) and transports (MP4 FF and MPEG-2 TS)
  3. Enable Widget Communications with restricted profiles (without scripting)

The MPEG-U architecture is depicted in Figure 5.

Figure 5 – MPEG-U Architecture

The normative behaviour of the Widget Manager includes the following elements of a widget

  1. Packaging formats
  2. Representation format (manifest)
  3. Life Cycle handling
  4. Communication handling
  5. Context and Mobility management
  6. Individual rendering (i.e. scene description normative behaviour)

Figure 6 depicts the operation of an MPEG-U widget for TV in a DLNA environment.

Figure 6 – MPEG-U for TV in a DLNA environment

MPEG-U is a 3-part standard

  • Part 1 – Widgets
  • Part 2 – Additional gestures and multimodal interaction
  • Part 3 – Conformance and reference software


High efficiency coding and media delivery in heterogeneous environments (MPEG-H) is an integrated standard that resumes the original MPEG “one and trine” Systems-Video-Audio standards approach. In the wake of those standards, the 3 parts can be and are actually used independently, e.g. in video streaming applications. On the other hand, ATSC have adopted the full Systems-Video-Audio triad with extensions of their own.

MPEG-H has 15 parts, as follows

  1. Part 1 – MPEG Media Transport (MMT) is the solution for the new world of broadcasting where delivery of content can take place over different channels each with different characteristics, e.g. one-way (traditional broadcasting) and two-way (the ever more pervasive broadband network). MMT assumes that the Internet Protocol is common to all channels.
  2. Part 2 – High Efficiency Video Coding (HEVC) is the latest approved MPEG video coding standard supporting a range of functionalities: scalability, multiview, from 4:2:0 to 4:4:4, up to 16 bits, Wider Colour Gamut and High Dynamic Range, and Screen Content Coding.
  3. Part 3 – 3D Audio is the latest approved audio coding standard supporting enhanced 3D audio experiences.
  4. Parts 4, 5 and 6 Reference software for MMT, HEVC and 3D Audio
  5. Parts 7, 8, 9 Conformance testing for MMT, HEVC and 3D Audio
  6. Part 10 – MPEG Media Transport FEC Codes specifies several Forward Error Correcting Codes for use by MMT.
  7. Part 11 – MPEG Composition Information specifies an extension to HTML 5 for use with MMT
  8. Part 12 – Image File Format specifies a file format for individual images and image sequences
  9. Part 13 – MMT Implementation Guidelines collects useful guidelines for MMT use
  10. Parts 14 – Conversion and coding practices for high-dynamic-range and wide-colour-gamut video and 15 – Signalling, backward compatibility and display adaptation for HDR/WCG video are technical reports to guide users in supporting HDR/WCG.


Dynamic adaptive streaming over HTTP (DASH) is a suite of standards for the efficient and easy streaming of multimedia using available HTTP infrastructure (particularly servers and CDNs, but also proxies, caches, etc.). DASH was motivated by the popularity of HTTP streaming and the existence of different protocols used in different streaming platforms, e.g. different manifest and segment formats.

By developing the DASH standard for HTTP streaming of multimedia content, MPEG has enabled a standard-based client to stream content from any standard-based server, thereby enabling interoperability between servers and clients of different vendors.

As depicted in Figure 7, the multimedia content is stored on an HTTP server in two components: 1) Media Presentation Description (MPD) which describes a manifest of the available content, its various alternatives, their URL addresses and other characteristics, and 2) Segments which contain the actual multimedia bitstreams in form of chunks, in single or multiple files.
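To make the MPD side of this concrete, here is a minimal sketch that lists the alternatives advertised in a manifest using Python's standard XML parser. The sample MPD is hand-written and heavily simplified (it is not schema-valid and omits the segment URLs), and list_representations is a hypothetical helper name. The namespace and the Representation element with its id and bandwidth attributes are those used by DASH.

```python
import xml.etree.ElementTree as ET

# Namespace used by DASH MPD documents
MPD_NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Hand-written, simplified manifest: one video with two alternatives
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="low" bandwidth="500000"/>
      <Representation id="high" bandwidth="4000000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

def list_representations(mpd_text):
    """Return (id, bandwidth in bit/s) for every Representation in the MPD."""
    root = ET.fromstring(mpd_text.strip())
    return [
        (rep.get("id"), int(rep.get("bandwidth")))
        for rep in root.iterfind(".//dash:Representation", MPD_NS)
    ]

representations = list_representations(SAMPLE_MPD)
print(representations)
```

With this list in hand, a client only needs a throughput estimate to decide which alternative to request next.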

Figure 7 – DASH model

Currently DASH is composed of 8 parts

  1. Part 1 – Media presentation description and segment formats specifies 1) the Media Presentation Description (MPD) which provides sufficient information for a DASH client to adaptively stream the content by downloading the media segments from an HTTP server, and 2) the segment formats which specify the formats of the entity body of the request response when issuing an HTTP GET request or a partial HTTP GET.
  2. Part 2 – Conformance and reference software is the regular component of an MPEG standard
  3. Part 3 – Implementation guidelines provides guidance to implementors
  4. Part 4 – Segment encryption and authentication specifies encryption and authentication of DASH segments
  5. Part 5 – Server and Network Assisted DASH specifies asynchronous network-to-client and network-to-network communication of quality-related assisting information
  6. Part 6 – DASH with Server Push and WebSockets specifies the carriage of MPEG-DASH media presentations over full duplex HTTP-compatible protocols, particularly HTTP/2 and WebSockets
  7. Part 7 – Delivery of CMAF content with DASH specifies how the content specified by the Common Media Application Format can be carried by DASH
  8. Part 8 – Session based DASH operation will specify a method for MPD to manage DASH sessions for the server to instruct the client about some operation continuously applied during the session.


Coded representation of immersive media (MPEG-I) represents the current MPEG effort to develop a suite of standards to support immersive media products, services and applications.

Currently MPEG-I has 11 parts but more parts are likely to be added.

  1. Part 1 – Immersive Media Architectures outlines possible architectures for immersive media services.
  2. Part 2 – Omnidirectional MediA Format specifies an application format that enables consumption of omnidirectional video (aka Video 360). Version 2 is under development
  3. Part 3 – Immersive Video Coding will specify the emerging Versatile Video Coding standard
  4. Part 4 – Immersive Audio Coding will specify metadata to enable enhanced immersive audio experiences compared to what is possible today with MPEG-H 3D Audio
  5. Part 5 – Video-based Point Cloud Compression will specify a standard to compress dense static and dynamic point clouds
  6. Part 6 – Immersive Media Metrics will specify different parameters useful for immersive media services and their measurability
  7. Part 7 – Immersive Media Metadata will specify systems, video and audio metadata for immersive experiences. One example is the current 3DoF+ Video activity
  8. Part 8 – Network-Based Media Processing will specify APIs to access remote media processing services
  9. Part 9 – Geometry-based Point Cloud Compression will specify a standard to compress sparse static and dynamic point clouds
  10. Part 10 – Carriage of Point Cloud Data will specify how to accommodate compressed point clouds in the MP4 File Format
  11. Part 11 – Implementation Guidelines for Network-based Media Processing is the usual collection of guidelines


Coding-Independent Code-Points (MPEG-CICP) is a collection of code points that have been assembled in single media- and technology-specific documents because they are not standard-specific.

Part 1 – Systems, Part 2 – Video and Part 3 – Audio collect the respective code points and Part 4 – Usage of video signal type code points contains guidelines for their use.


Genomic Information Representation (MPEG-G) is a suite of specifications developed jointly with TC 276 Biotechnology that makes it possible to reduce the amount of information required to losslessly store and transmit DNA reads from high-speed sequencing machines.

Figure 8 depicts the encoding process

An MPEG-G file can be created with the following sequence of operations:

  1. Put the reads in the input file (aligned or unaligned) in bins corresponding to segments of the reference genome
  2. Classify the reads in each bin in 6 classes: P (perfect match with the reference genome), M (reads with variants), etc.
  3. Convert the reads of each bin to a subset of 18 descriptors specific to the class: e.g., a class P descriptor is the start position of the read etc.
  4. Put the descriptors in the columns of a matrix
  5. Compress each descriptor column (MPEG-G uses the very efficient CABAC compressor already present in several video coding standards)
  6. Put compressed descriptors of a class of a bin in an Access Unit (AU) for a maximum of 6 AUs per bin
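The sequence of operations above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the MPEG-G algorithm: it handles only two of the six classes (P and M), uses three made-up descriptors (pos, len, mismatch) instead of the real set of 18, assumes reads lie entirely within the reference, and uses zlib as a stand-in for CABAC. BIN_SIZE, classify and encode are hypothetical names.

```python
import zlib
from collections import defaultdict

BIN_SIZE = 8  # hypothetical bin width, in bases of the reference genome

def classify(seq, start, reference):
    """'P' if the read matches the reference perfectly, otherwise 'M'."""
    return "P" if reference[start:start + len(seq)] == seq else "M"

def encode(reads, reference):
    """reads: list of (start, sequence) pairs aligned to the reference.
    Returns {(bin, class): {descriptor: compressed column}}, i.e. one
    toy 'Access Unit' per class per bin."""
    columns = defaultdict(lambda: defaultdict(list))
    for start, seq in reads:
        # Steps 1 and 2: bin by position, then classify within the bin
        key = (start // BIN_SIZE, classify(seq, start, reference))
        # Steps 3 and 4: one descriptor value per read, one column per descriptor
        columns[key]["pos"].append(str(start))
        columns[key]["len"].append(str(len(seq)))
        if key[1] == "M":
            diffs = [f"{i}{base}" for i, base in enumerate(seq)
                     if reference[start + i] != base]
            columns[key]["mismatch"].append("|".join(diffs))
    # Steps 5 and 6: compress each column (zlib here, CABAC in MPEG-G)
    return {
        key: {name: zlib.compress(",".join(values).encode("ascii"))
              for name, values in cols.items()}
        for key, cols in columns.items()
    }

reference = "ACGTACGTACGTACGT"
reads = [(0, "ACGT"), (2, "GTAC"), (9, "CGTT")]
access_units = encode(reads, reference)
print(sorted(access_units))
```

Note how a perfectly matching read costs only its position and length, because the bases themselves can be recovered from the reference. That is where most of the saving comes from.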

Figure 8 – MPEG-G compression

MPEG-G currently includes 6 parts

  1. Part 1 – Transport and Storage of Genomic Information specifies the file and streaming formats
  2. Part 2 – Genomic Information Representation specifies the algorithm to compress DNA reads from high-speed sequencing machines
  3. Part 3 – Genomic information metadata and application programming interfaces (APIs) specifies metadata and API to access an MPEG-G file
  4. Part 4 – Reference Software and Part 5 – Conformance are the usual components of a standard
  5. Part 6 – Genomic Annotation Representation will specify how to compress annotations.


Internet of Media Things (MPEG-IoMT) is a suite of specifications:

  1. API to discover Media Things,
  2. Data formats and API to enable communication between Media Things.

A Media Thing (MThing) is the media “version” of IoT’s Things.

The IoMT reference model is represented in Figure 9

Figure 9: IoT in MPEG is for media – IoMT

Currently MPEG-IoMT includes 4 parts

  1. Part 1 – IoMT Architecture will specify the architecture
  2. Part 2 – IoMT Discovery and Communication API specifies Discovery and Communication API
  3. Part 3 – IoMT Media Data Formats and API specifies Media Data Formats and API
  4. Part 4 – Reference Software and Conformance is the usual part of MPEG standards


General Video Coding (MPEG-5) is expected to contain video coding specifications. Currently two specifications are envisaged

  1. Part 1 – Essential Video Coding is expected to be the specification of a video codec with two layers. The first layer will provide a significant improvement over AVC but significantly less than HEVC, and the second layer will provide a significant improvement over HEVC but significantly less than VVC.
  2. Part 2 – Low Complexity Enhancement Video Coding (LCEVC) is expected to be the specification of a data stream structure defined by two component streams: a base stream decodable by a hardware decoder, and an enhancement stream suitable for software processing implementation with sustainable power consumption. The enhancement stream will provide new features such as compression capability extension to existing codecs and lower encoding and decoding complexity, for on demand and live streaming applications. The LCEVC decoder is depicted in Figure 18.

Figure 18: Low Complexity Enhancement Video Coding

That’s all?

Well, yes, in terms of standards that have been developed, are being developed or being extended, or for which MPEG thinks that a standard should be developed. Well, no, because MPEG is a forge of ideas and new proposals may come at every meeting.

Currently MPEG is investigating the following topics

  1. In advance signalling of MPEG containers content is motivated by scenarios where the full content of a file is not available to a player but the player needs to take a decision to retrieve the file or not. Therefore the player needs to have sufficient information to determine if it can/cannot play the entire content or only a part.
  2. Data Compression continues the exploration in search for non typical media areas that can benefit from MPEG’s compression expertise. Currently MPEG is investigating Data compression for machine tools.
  3. MPEG-21 Based Smart Contracts investigates the benefits of converting MPEG-21 contract technologies, which can be human readable, to smart contracts for execution on blockchains.
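
The decision described in the first topic can be pictured with a small sketch. It assumes, purely for illustration, that the signalled information boils down to a list of codec labels needed by the tracks of a file, and that the player knows which codecs it supports. The function name, the labels and the three outcomes are hypothetical and are not taken from any MPEG specification.

```python
def playability(required_codecs, supported_codecs):
    """Decide, before retrieving a file, how much of it a player could play.

    required_codecs: codec labels announced for the tracks of the file (hypothetical).
    supported_codecs: codec labels the player is able to decode (hypothetical).
    Returns "full", "partial" or "none".
    """
    playable = [codec for codec in required_codecs if codec in supported_codecs]
    if not playable:
        return "none"
    if len(playable) == len(required_codecs):
        return "full"
    return "partial"


# Hypothetical player that supports one video codec and one audio codec.
player = {"hevc", "aac"}

print(playability(["hevc", "aac"], player))
print(playability(["hevc", "mpeg-h-3d-audio"], player))
print(playability(["vvc"], player))
```

With this kind of information a player can skip a download it cannot use, or fetch a file knowing that only some tracks will be rendered.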

The MPEG work plan (March 2019)


In Life inside MPEG I introduced the MPEG work plan. The clock in MPEG moves fast and that work plan is now obsolete. Here is a new re-formatted version of the MPEG work plan as of March 2019.

 The MPEG work plan at a glance

Figure 1 shows the main standards that MPEG has developed or is developing in the 2017-2023 period. The figure is organised in 3 main sections:

  • Media Coding (e.g. AAC and AVC)
  • Systems and Tools (e.g. MPEG-2 TS and File Format)
  • Beyond Media (currently Genome Compression).

Figure 1 – The MPEG work plan (March 2019)

Disclaimer: dates in the figure and in the following are all planned.

 Navigating the areas of the MPEG work plan

The 1st column in Figure 2 gives the currently active MPEG standardisation areas. The first row gives the currently active MPEG standards. The non-empty white cells give the number of “deliverables” (Standards, Amendments and Technical Reports) currently identified in the work plan.

Figure 2 – Standards (S), Amendments (A) and Technical Reports (T)  in the MPEG work plan (as of March 2019)

Video coding

In the Video coding area MPEG is currently developing specifications for 4 standards (MPEG-H, -I, -5 and -CICP) and is conducting explorations in advanced technologies for immersive visual experiences.


Part 2 – High Efficiency Video Coding 4th edition specifies a profile of HEVC that will have an encoding of a single (i.e. monochrome) colour plane and will be restricted to a maximum of 10 bits per sample, as done in past HEVC range extensions profiles, and additional Supplemental Enhancement Information (SEI) messages, e.g. fisheye video, SEI manifest, and SEI prefix messages.


Part 3 – Versatile Video Coding is the new video compression standard after HEVC, which MPEG is developing jointly with VCEG. VVC is expected to reach FDIS stage in July 2020 for the core compression engine. Other parts, such as high level syntax and SEI messages, will follow later.


Part 4 – Usage of video signal type code points 2nd edition will document additional combinations of commonly used code points and baseband signalling.


This standard is still awaiting approval, but MPEG has already obtained all technologies necessary to develop standards with the intended functionalities and performance from the Calls for Proposals (CfP).

  1. Part 1 – Essential Video Coding will specify a video codec with two layers: layer 1 significantly improves over AVC but performs significantly less than HEVC and layer 2 significantly improves over HEVC but performs significantly less than VVC.
  2. Part 2 – Low Complexity Video Coding Enhancements will specify a data stream structure defined by two component streams: stream 1 is decodable by a hardware decoder, stream 2 can be decoded in software with sustainable power consumption. Stream 2 provides new features such as compression capability extension to existing codecs, lower encoding and decoding complexity, for on demand and live streaming applications.


MPEG experts are collaborating in the development of support tools, acquisition of test sequences and understanding of technologies required for 6DoF and lightfields.

  1. Compression of 6DoF visual will enable a user to move more freely than in 3DoF+, eventually, allowing any translation and rotation in space.
  2. Compression of dense representation of light fields is stimulated by new devices that capture light fields with both spatial and angular light information. As the size of data is large and different from traditional images, effective compression schemes are required.

Audio coding

In the Audio coding area MPEG is working on 2 standards (MPEG-D, and -I).


In Part 5 – Uncompressed Audio in MP4 File Format, MPEG extends MP4 to enable carriage of uncompressed audio (e.g. PCM). At the moment MP4 only carries compressed audio.


Part 4 – Immersive Audio. As MPEG-H 3D Audio already supports a 3DoF user experience, MPEG-I builds upon it to provide a 6DoF immersive audio experience. A Call for Proposals will be issued in October 2019. Submissions are expected in October 2021 and FDIS stage is expected to be reached in April 2022. Even though this standard will not be about compression, but about metadata as for 3DoF+ Visual, we have kept this activity under Audio Coding.

3D Graphics Coding

In the 3D Graphics Coding area MPEG is developing two parts of MPEG-I.

  • Part 5 – Video-based Point Cloud Compression (V-PCC) for which FDIS stage is planned to be reached in October 2019.
  • Part 9 – Geometry-based Point Cloud Compression (G-PCC) for which FDIS stage is planned to be reached in January 2020.

The two PCC standards employ different technologies and target different application areas: generally speaking, entertainment and automotive/unmanned aerial vehicles.

Font Coding

In the Font coding area MPEG is working on a new edition of MPEG-4 part 22.

Part 22 – Open Font Format. 4th edition specifies support of complex layouts and additional support for new layout features. FDIS stage will be reached in April 2020.

Genome Coding

In the Genome coding area MPEG has achieved FDIS level for  the 3 foundational parts of the MPEG-G standard:

  • Part 1 – Transport and Storage of Genomic Information
  • Part 2 – Genomic Information Representation
  • Part 3 – Genomic information metadata and application programming interfaces (APIs).

In October 2019 MPEG will complete Part 4 – Reference Software and Part 5 – Conformance. In July 2019 MPEG will issue a Call for Proposals for Part 6 – Genomic Annotation Representation.

Neural Network Coding

Compression of this type of data is motivated by the increasing use of neural networks in many applications that require the deployment of a particular trained network instance to a potentially large number of devices, which may have limited processing power and memory.

MPEG has restricted the general field to neural networks trained with media data, e.g. for the object identification and content description, and is therefore developing the standard in MPEG-7 which already contains two standards – CDVS and CDVA – which offer similar functionalities achieved with different technologies (and therefore the standard should be classified under Media description).


Part 17 – Compression of neural networks for multimedia content description and analysis. MPEG is developing a standard that enables compression of artificial neural networks trained with audio and video data. FDIS is expected in January 2021.

Media Description

Media description is the goal of the MPEG-7 standard which contains technologies for describing media, e.g. for the purpose of searching media.

In the Media Description area MPEG has completed Part 15 Compact descriptors for video analysis (CDVA) in October 2018 and is now working on 3DoF+ visual.


Part 7 – Immersive Media Metadata will specify a set of metadata that enable a decoder to provide a more realistic user experience in OMAF v2. The FDIS is planned for July 2021.

System support

In the System support area MPEG is working on MPEG-4 and -I.


Part 34 – Registration Authorities aligns the existing MPEG-4 Registration Authorities to current ISO practice.


In MPEG-H MPEG is working on

Part 10 – MPEG Media Transport FEC Codes. This is being enhanced with the Window-based FEC code. FDAM is expected to be reached in January 2020.


Part 6 – Immersive Media Metrics specifies the metrics and measurement framework in support of immersive media experiences. FDIS stage is planned to be reached in July 2020.


In the Transport area MPEG is working on MPEG-2, -4, -B, -H, -DASH, -I and Explorations.


Part 2 – Systems continues to be a lively area of work 25 years after MPEG-2 Systems reached FDIS. After producing Edition 7, MPEG is working on two amendments to carry two different types of content

  • Carriage of JPEG XS in MPEG-2 TS
  • Carriage of associated CMAF boxes for audio-visual elementary streams in MPEG-2 TS


Part 12 – ISO Base Media File Format continues to be a lively area of work 20 years after MP4 File Format reached FDIS. MPEG is working on two amendments

  • Corrected audio handling, expected to reach FDAM in July 2019
  • Compact movie fragment is expected to reach FDAM stage in January 2020


In MPEG-B MPEG is working on two new standards

  • Part 14 – Partial File Format provides a standard mechanism to store HTTP entities and the partial file in broadcast applications for later cache population. The standard is planned to reach FDIS stage in July 2020.
  • Part 15 – Carriage of Web Resources in ISOBMFF will make it possible to enrich audio/video content, as well as audio-only content, with synchronised, animated, interactive web data, including overlays. The standard is planned to reach FDIS stage in January 2020.


In MPEG-DASH MPEG is working on

  • Part 1 – Media presentation description and segment formats will see a new edition in July 2019 and will be enhanced with an Amendment on Client event and timed metadata processing. FDAM is planned to be reached in January 2020.
  • Part 3 – MPEG-DASH Implementation Guidelines 2nd edition will become TR in July 2019
  • Part 5 – Server and network assisted DASH (SAND) will be enriched by an Amendment on Improvements on SAND messages. FDAM to be reached in July 2019.
  • Part 7 – Delivery of CMAF content with DASH is a Technical Report with guidelines on the use of the most popular delivery schemes for CMAF content using DASH. TR is planned to be reached in March 2019
  • Part 8 – Session based DASH operation will reach FDIS in July 2020.


Part 2 – Omnidirectional Media Format (OMAF) released in October 2017 is the first standard format for delivery of omnidirectional content. With OMAF 2nd Edition Interactivity support for OMAF, planned to reach FDIS in July 2020, MPEG is extending OMAF with 3DoF+ functionalities.

Application Formats

MPEG-A ISO/IEC 23000 Multimedia Application Formats is a suite of standards for combinations of MPEG and other standards (only if there is no suitable MPEG standard for the purpose). MPEG is working on

Part 19 – Common Media Application Format 2nd edition with support of new formats

Application Programming Interfaces

The Application Programming Interfaces area comprises standards that make possible effective use of some MPEG standards.


Part 8 – Network-based Media Processing (NBMP), a framework that will allow users to describe media processing operations to be performed by the network. The standard is expected to reach FDIS stage in January 2020.

Media Systems

Media Systems includes standards or Technical Reports targeting architectures and frameworks.


Part 1 – IoMT Architecture, expected to reach FDIS stage in October 2019. The architecture used in this standard is compatible with the IoT architecture developed by JTC 1/SC 41.

Reference Implementation

MPEG is working on the development of standards for reference software of MPEG-4, -7, -A, -B, -V, -H, -DASH, -G, -IoMT.


MPEG is working on the development of standards for conformance of MPEG-4, -7, -A, -B, -V, -H, -DASH, -G, -IoMT.

The MPEG standards

MPEG uses acronyms for its standards and industry knows the standards by these acronyms. Here you will find the full list of MPEG standards ordered by the 5-digit ISO numbers.

MPEG-1 ISO/IEC 11172 Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s

MPEG-2 ISO/IEC 13818 Generic coding of moving pictures and associated audio information

MPEG-4 ISO/IEC 14496 Coding of audio-visual objects

MPEG-7 ISO/IEC 15938 Multimedia content description interface

MPEG-21 ISO/IEC 21000 Multimedia Framework

MPEG-A ISO/IEC 23000 Multimedia Application Formats

MPEG-B ISO/IEC 23001 MPEG systems technologies

MPEG-C ISO/IEC 23002 MPEG video technologies

MPEG-D ISO/IEC 23003 MPEG audio technologies

MPEG-E ISO/IEC 23004 Multimedia Middleware

MPEG-V ISO/IEC 23005 Media context and control

MPEG-M ISO/IEC 23006 Multimedia service platform technologies

MPEG-U ISO/IEC 23007 Rich media user interfaces

MPEG-H ISO/IEC 23008 High efficiency coding and media delivery in heterogeneous environments

MPEG-DASH ISO/IEC 23009 Dynamic adaptive streaming over HTTP (DASH)

MPEG-I ISO/IEC 23090 Coded representation of immersive media

MPEG-CICP ISO/IEC 23091 Coding-Independent Code-Points

MPEG-G ISO/IEC 23092 Genomic Information Representation

MPEG-IoMT ISO/IEC 23093 Internet of Media Things

MPEG-5 ISO/IEC 23094 General Video Coding

Looking inside an MPEG meeting


In There is more to say about MPEG standards I presented the entire spectrum of MPEG standards. No one should deny that it is an impressive set of disparate technologies integrated to cover  fields connected by the common thread of Data Compression: Coding of Video, Audio, 3D Graphics, Fonts, Digital Items, Sensors and Actuators Data, Genome, and Neural Networks; Media Description and Composition; Systems support; Intellectual Property Management and Protection (IPMP); Transport; Application Formats; API; and Media Systems.

How on earth can all these technologies be specified and integrated in MPEG standards to respond to industry needs?

This article will try and answer this question. It will do so by starting, as many novels do, from the end (of an MPEG meeting).

Let’s start from the end (of an MPEG meeting)

When an MPEG meeting closes, the plenary approves the results of the week, marking the end of formal collaborative work within the meeting. Back in 1990 MPEG developed a mechanism – called “ad hoc group” (AhG) – to let a form of collaboration continue after the meeting. This mechanism allows MPEG experts to continue working together, albeit with limitations:

  1. In the scope, i.e. an AhG may only work on the areas identified by the mandates (in Latin ad hoc means “for a specific purpose”). Of course experts are free to work individually on anything and in any way that pleases them;
  2. In the purpose, i.e. an AhG may only prepare recommendations – in the scope of its mandates – to be submitted to MPEG. This is done at the beginning of the following meeting, after which the AhG is disbanded;
  3. In the method of work, i.e. an AhG operates under the leadership of one or more Chairs. Clearly, though, the success of an AhG depends very much on the attitude and activity of its members.

On average some 25 AhGs are established at each meeting. There is not one-to-one correspondence between MPEG activities and AhGs. Actually AhGs are great opportunities to explore new and possibly cross-subgroup ideas.

Examples of AhG titles are

  1. Scene Description for MPEG-I
  2. System technologies for Point Cloud Coding (PCC)
  3. Network Based Media Processing (NBMP)
  4. Compression of Neural Networks (NNR).

What happens between MPEG meetings

An AhG uses different means to carry out collaborative work: reflectors, teleconferences and physical meetings. The last can only be held if they were scheduled in the AhG establishment form. Unscheduled physical meetings may only be held if there is unanimous agreement of those who subscribed to the AhG.

Most AhGs hold scheduled meetings on the weekend that precedes the next MPEG meeting. These are very useful to coordinate the results of the work done and to prepare the report that all AhGs must make to the MPEG plenary on the following Monday.

AhG meetings, including those in the weekend preceding the MPEG meeting, are not formally part of an MPEG meeting.

An MPEG meeting at a glance

Chairs meeting

MPEG chairs meet three times during an MPEG week:

  1. On Sunday evening to review the progress of AhG work, coordinate activities impacting more than one Subgroup and plan activities to be carried out during the week including the need for joint meetings;
  2. On Tuesday evening to assess the result of the first two days of work, review the work plan and time lines based on the expected outcomes and identify the need of new joint meetings;
  3. On Thursday evening to wrap up the expected results and review the preliminary results of the week.

Plenary meetings

During an MPEG week MPEG holds 3 plenaries

  1. On Monday morning to make everybody aware of the results of work carried out since the last meeting and to plan work of the week. AhG reports are a main part of it as they are presented and, when necessary, discussed;
  2. On Wednesday morning to make everybody aware of the work done in all subgroups in the first two days and to plan work for the next two days;
  3. On Friday afternoon to approve the results of the work of Subgroups, including liaison letters, to establish new AhGs etc.

Subgroup, Breakout Group and Joint meetings

Subgroups start their meetings on Monday afternoon. They review their own activities and kick off work in their areas. Each subgroup assigns activities to breakout groups (BoG) who meet with their own schedules to achieve the goals assigned. Each Subgroup may hold other brief meetings to keep everybody in the Subgroup in sync with the general progress of the work.

For instance, the activities of the Systems Subgroup are currently: File format, DASH, OMAF, OMAF and DASH, OMAF and MIAF, MPEG Media Transport, Network Based Media Processing and PCC Systems.

The MPEG structure is designed to facilitate interactions between different Subgroups and BoGs from different Subgroups to discuss matters that affect different Subgroups and BoGs, because they are at the interface of MPEG subsystems. For example, the table below lists the joint meetings that the Systems Subgroup held with other Subgroups at the January 2019 meeting.

Table 1 – Joint meeting of Systems Subgroup with other Subgroups

Systems meeting with | Topics
Reqs, Video, VCEG | SEI messages in VVC
Audio, 3DG | Scene Description
3DG | Systems for Point Cloud Compression
3DG | API for multiple decoders
Audio | Uncompressed Audio
Reqs, JVET, VCEG | Immersive Decoding Interface

NB: VCEG is the Video Coding Experts Group of ITU-T Study Group 16. It is not an MPEG Subgroup.

On Friday morning all Subgroups approve their own results. These are automatically integrated in the general document to be approved by the MPEG Plenary on Friday afternoon.

Advisors meeting

On Monday evening, an informal group of experts from different countries examines issues of general (non-technical) interest. In particular it calls for meeting hosts, reviews proposals of meeting hosts, makes recommendations of meeting hosts to the plenary etc.

A bird’s eye view of an MPEG meeting

Figure 1 depicts the workflow described in the paragraphs above, starting from the end of the N-1 th meeting to the end of the N-th meeting.

Figure 1 – A snapshot of MPEG works from the end of a meeting to the end of the next meeting

What is “done” at an MPEG meeting?

There are around 500 of the best worldwide experts attending an MPEG meeting. It is an incredible amount of brain power that is mobilised at an MPEG meeting. In the following I will try and explain how this brain power is directed.

An example – the video area

Let’s take as example the work done in the Video Coding area at the March 2019 meeting.

The table below has 3 columns:

  1. The standards on which work is done (Video has worked on MPEG-H, MPEG-I, MPEG-CICP, MPEG-5 and Explorations)
  2. The names of the activities and
  3. The types of documents resulting from the activities (see the following legend for an explanation of the acronyms).

Table 2 – Documents produced in the video coding area

Std | Activity | Document type
H | High Efficiency (HEVC) | TM, CE, CTC
I | Versatile Video Coding (VVC) | WD, TM, CE, CTC
I | 3 Degrees of Freedom + (3DoF+) coding | CfP, WD, TM, CE, CTC
CICP | Usage of video signal type code points (Ed. 1) | TR
CICP | Usage of video signal type code points (Ed. 2) | WD
5 | Essential Video Coding | WD, TM, CE, CTC
5 | Low Complexity Enhancement Video Coding | CfP, WD, TM, CE, CTC
Expl | 6 Degrees of Freedom (6DoF) coding | EE, Tools
Expl | Coding of dense representation of light fields | EE, CTC


TM: Test Model, software implementing the standard (encoder & decoder)

WD: Working Draft

CE: Core Experiment, i.e. definition of an experiment that should improve performance

CTC: Common Test Conditions, to be used by all CE participants

CfP: Call for Proposals (this time no new CfP produced, but reports and analyses of submissions in response to CfPs)

TR: Technical Report (ISO document)

EE: Exploration Experiment, an experiment to explore an issue because it is not mature enough to be a CE

Tools: other supporting material, e.g. software developed for common use in CEs/EEs

What is produced by an MPEG meeting

Figure 2 gives the number of activities for each type of activity defined in the legend (and others that were not part of the work in the video area). For instance, out of a total of 97 activities:

  1. 29 relate to processing of standards through the canonical stages of Committee Draft (CD), Draft International Standard (DIS) and Final Draft International Standard (FDIS) and the equivalent for Amendments, Technical Reports and Corrigenda. In other words, at every meeting MPEG is working on ~10 “deliverables” (i.e. standards, amendments, technical reports or corrigenda) in the approval stages;
  2. 22 relate to working drafts, i.e. “new” activities that have not entered the approval stages;
  3. 8 relate to Technologies under Consideration, i.e. new technologies that are being considered to enhance existing standards;
  4. 8 relate to requirements, typically for new standards;
  5. 6 relate to Core Experiments;
  6. Etc.


Figure 2 – Activities at an MPEG meeting

Figure 2 does not provide a quantitative measure of “how many” documents were produced for each activity or “how big” they were.  As an example, Point Cloud Compression has 20 Core Experiments and 8 Exploration Experiments under way, while MPEG-5 EVC has only one large CE.

The average activity at the March 2019 meeting can be measured by dividing the number of output documents (212) by the number of activities (97), which gives about 2.2 documents per activity.
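
As a quick check of this figure, the division can be reproduced in a few lines. The variable names are chosen here for illustration; only the two counts come from the text.

```python
# Counts quoted in the text for the March 2019 meeting.
output_documents = 212
activities = 97

# Average number of output documents produced per activity.
docs_per_activity = output_documents / activities

print(f"{docs_per_activity:.1f} output documents per activity")
```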


MPEG holds quarterly meetings with an attendance of ~500 experts. If we assume that the average salary of an MPEG expert is 500 $/working day and that every expert stays 6 days (to account for attendance at AhG meetings), the industry investment in attending MPEG meetings is 1.5 M$/meeting or 6 M$/year. Of course, the total investment is more than that and probably in excess of 1B$ a year.
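
The estimate above is simple arithmetic and can be spelled out as follows. The variable names are illustrative, and the inputs are the assumptions stated in the paragraph (attendance, daily cost, days per meeting, four meetings a year), not measured data.

```python
# Assumptions taken from the paragraph above.
experts_per_meeting = 500   # approximate attendance
cost_per_expert_day = 500   # assumed average cost in $ per working day
days_per_meeting = 6        # includes attendance at AhG meetings
meetings_per_year = 4       # quarterly meetings

# Cost of attendance for one meeting and for one year.
cost_per_meeting = experts_per_meeting * cost_per_expert_day * days_per_meeting
cost_per_year = cost_per_meeting * meetings_per_year

print(f"{cost_per_meeting / 1e6:.1f} M$ per meeting")
print(f"{cost_per_year / 1e6:.1f} M$ per year")
```

The result covers attendance only; the much larger total mentioned in the text is an estimate that this calculation does not attempt to reproduce.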

With the meeting organisation described above MPEG tries to get the most out of the industry investment in MPEG standards.



The article MPEG: what it did, is doing, will do recounts my statistically not insignificant experience of asking taxi drivers across different cities of the world if they know MPEG. I do not have a similar amount of data to report for ISO, but I am pretty sure that if I asked a taxi driver if they know ISO, the yes rate would be considerably lower than for MPEG.

This is neither a merit of MPEG nor a demerit of ISO as organisations. MPEG – Moving Picture Experts Group – is lucky to deal with things that let people make content that other people can see and hear in ever new ways. ISO – International Organisation for Standardisation – is an organisation with the mission to develop international standards for anything that is not telecommunication – the purview of the International Telecommunication Union (ITU) – and electrotechnical – the purview of the International Electrotechnical Commission (IEC).

The ISO organisation

The above may seem rather abstract, so let’s see what the difference means in practice. ISO is a huge organisation structured in Technical Committees (TC). Actually, the structure is more complex than that (see Figure 1), but for the purpose of what I want to say, this is enough.

The first 3 – still active – TCs in ISO are: TC 1 Screw threads, TC 2 Fasteners and TC 4 Rolling bearings. The standards produced by these TCs are industrially very important, but the topics hardly make peoples’ hearts beat faster. The last 3 TCs in order of establishment are TC 322 Sustainable finance, TC 323 Circular economy and TC 324 Sharing economy. The standards produced by these TCs are important for the financial industries, but probably little known even in financial circles. Between these two extremes we have a large number of TCs, e.g., TC 35 Paints and varnishes, TC 186 Cutlery and table and decorative metal hollow-ware, TC 249 Traditional Chinese medicine, TC 282 Water reuse, TC 297 Waste collection and transportation management, etc.

ISO TCs work on areas of human endeavour that are extremely important to industrial and social life. Many of these activities, however, do not say much to the man in the street.

Where is MPEG in this picture? To answer this question I need to dig deeper in the ISO organisation. Most TCs do not have a monolithic structure. They are organised in working groups (WG). TCs retain key functions such as strategy and management, and WGs are tasked to develop standards. In quite a few cases the area of responsibility is so broad that a horizontal organisation would not be functional. In this case a TC may decide to establish Subcommittees (SC). They are like mini TCs where WGs develop standards under them.

Figure 1 – ISO governance structure

In 1987 ISO/TC 97 Data Processing merged with IEC/TC 83 Information technology equipment. The resulting (joint) technical committee was and is called ISO/IEC JTC 1 Information Technology. One of its SCs, SC 2 Character sets and Information Coding, included WG 8 Coding of Audio and Picture Information. WG 8 established the Moving Picture Experts Group (MPEG) in January 1988. In 1991, when SC 2/WG 8 seceded from SC 2 and became SC 29, MPEG became WG 11 Coding of audio, picture, multimedia and hypermedia information (but everybody calls it MPEG).

MPEG changed the world of media

Those who have survived the description of the ISO organigram will now have the opportunity to understand how this group of experts, in the depths of the ISO (and IEC) organisation, changed the world of media and impacted the lives of billions of people, probably all of those on the face of the Earth, if we exclude some hermits in the few surviving tropical forests, the many deserts or the frozen lands.

The main reason for the success of MPEG is that for 30 years it had carte blanche to implement its ideas. Some of them were clear at the outset, others took shape from a process of learning on the job.

Let’s revisit MPEG’s ideas of standardisation to understand what it did and why.

Idea #1 – Single standards for all countries and industries

The first idea relates to the scope of MPEG standards. In the analogue world, the absence or scarce availability of broadband communication, deliberate policies and the natural separation between industries that traditionally had little in common favoured the definition of country-based or industry-based standards. The first steps toward digital video undertaken by countries and industries trod similar paths: different countries and industries tried their own way independently.

MPEG jumped onto the scene at a time when the different trials had not had the time to solidify, and the epochal analogue-to-digital transition gave MPEG a unique opportunity to effect its disruptive action.

MPEG knew that it was technically possible to develop generic standards that could be used in all countries of the world and in all industries that needed compressed digital media. MPEG saw that all actors affected – manufacturers, service providers and end users – would gain if such a bold action was taken. When MPEG began to tread its adventurous path, MPEG did not know  whether it was procedurally possible to achieve that goal. But it gambled and gave it a try. It used the Requirements subgroup to develop generic requirements, acted on the major countries and trade/standards associations of the main industries and magically got their agreement.

The network of liaisons and, sometimes, joint activities is the asset that allowed MPEG to implement idea #1 and helped achieve many of the subsequent goals.

Idea #2 – Standards for the market, not the other way around

Standards are ethereal entities, but their impact is very concrete. This was true and well understood in the world of analogue media. At that time a company that had developed a successful product would try to get a “standard” stamp on it, share the technology with its competitors and enjoy the economic benefits of their “standard” technology.

With its second idea MPEG reshuffled the existing order of steps. Instead of waiting for the market to decide which technology would win – an outcome that very often had little to do with the value of the technology – MPEG offered its standard development process where the collaboratively defined “best” is developed and assessed by MPEG experts who decide which individual technology wins. Then the “standard” technology package developed by MPEG is taken over by the market.

MPEG standards are consistently the best standards at a given time. Those who have technologies selected to be part of MPEG standards reap the benefits and most likely will continue investing in new technologies for future standards.

Idea #3 – Standards anticipate the future

The third idea is a consequence of the first two. MPEG-1 was driven by the expected possibilities of the audio and video compression technologies of the time. It was a bet on silicon making it possible to execute the complex operations implied by the standard so that industry could build products of which there was no evidence but only educated guesses: interactive video on CD and digital audio broadcasting. Ironically, neither really took off, but other products that relied on the MPEG-1 technologies – Video CD and MP3 – were (the former) and still are (the latter) extremely successful.

MPEG standards anticipate market needs. They are regularly bets that a certain standard technology will be adopted. In More standards – more successes – more failures you can see how some MPEG standards are extremely successful and others less so.

Idea #4 – Industry-friendly standards

The fourth idea was simple and disruptive. Since the first instances of television in the 1920s, industry and governments have created tens of television formats, mostly around the basic NTSC, PAL and SECAM families. Even in the late 1960’s, when the Picturephone was developed, AT&T invented a new 267-line format, with no obvious connection with any of the existing video formats.

MPEG never wanted to define its own format. With its fourth idea, propped up by the nature of digital technologies, it just decided that it would support any  format. Here is how it did it:

  1. One standard with no options (this should be obvious, because it is what a standard should be about)
  2. Standards apply only to decoders; encoders are implicitly defined and have ample margins of implementation freedom
  3. Profiles (hierarchical, if possible) to accommodate special industry needs within the same standard
  4. Decoders are defined by their ability to process data, quantised in levels (based on bitrate, resolution etc.)
  5. How different formats are handled is outside of MPEG standards.
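
The profile and level mechanism in points 3 and 4 can be illustrated with a small sketch. The profile names, tool sets and level limits below are invented for illustration and are not taken from any MPEG standard. The only point is that a decoder claims a (profile, level) pair, and a bitstream is decodable when its tools and its processing demands fit within that claim.

```python
from dataclasses import dataclass

# Hypothetical, hierarchical profiles: each profile is a set of coding tools,
# and "main" contains everything in "simple".
PROFILES = {
    "simple": {"intra", "inter"},
    "main": {"intra", "inter", "bidirectional"},
}

# Hypothetical levels: upper bounds on what a conforming decoder must process.
LEVELS = {
    1: {"max_pixels_per_frame": 352 * 288, "max_bitrate": 1_500_000},
    2: {"max_pixels_per_frame": 720 * 576, "max_bitrate": 15_000_000},
}


@dataclass
class Bitstream:
    tools: set      # coding tools the bitstream actually uses
    width: int
    height: int
    bitrate: int    # bits per second


def can_decode(profile: str, level: int, stream: Bitstream) -> bool:
    """Return True if a decoder conforming to (profile, level) can handle the stream."""
    limits = LEVELS[level]
    return (
        stream.tools <= PROFILES[profile]
        and stream.width * stream.height <= limits["max_pixels_per_frame"]
        and stream.bitrate <= limits["max_bitrate"]
    )


sd_stream = Bitstream({"intra", "inter"}, 720, 576, 6_000_000)
print(can_decode("main", 2, sd_stream))    # the tools and the data rate both fit
print(can_decode("simple", 1, sd_stream))  # too many pixels and bits for level 1
```

Nothing in the check constrains how an encoder produced the stream, which mirrors point 2: only the decoder side is pinned down.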

Idea #5 – Audio and video come together

The fifth idea was kind of obvious but no less disruptive. Because of the way audio and video industries had developed – audio for a century and video for half a century – people working on the corresponding technologies tended to operate in “watertight compartments”, be they in academia, research or companies. That attitude had some justification in the analogue world because the relevant technologies were indeed different and there was not so much added value in keeping the technologies together, considering the big effort needed to keep the experts together.

However, the digital world, with its commonality of technologies, no longer justified keeping the two domains separate. That is why MPEG, just 6 months after its first meeting, kicked off the Audio subgroup after successfully assembling the best experts in a few months.

This injection of new technology with the experts that carried it was not effortless. When transformed into digital, audio and video signals are bits and bits and bits, but the sources are different and influence how they are compressed. Audio experts shared some (at a high level) compression technologies – Subband and Discrete Cosine Transform – but video is (was) a 2D signal changing in time often with “objects” in it, while audio is (was) a 1D signal. More importantly, audio experts were driven by other concerns such as the way the human hearing process handles the data coming out of the frequency analysis carried out by the human cochlea.

The audio work was never “dependent” on the video work. MPEG audio standards can have a stand-alone use (i.e. they do not assume that there is a video associated with them), but there is no MPEG video standard that is without an MPEG Audio standard. So it was necessary to keep the two together, and it is even more important to do so now that video and audio are both 3D signals changing in time.

Idea #6 – Don’t forget the glue that keeps audio and video together

The sixth idea can be described by the formula

Audio and Video ≠ (Audio + Video)

This may look cryptic but it states the obvious. Having audio and video together does not necessarily mean that audio and video will play together in the right way if they are stored on a disk or transmitted over a channel.

The fact that MPEG established a Digital Storage Media subgroup and a Systems subgroup 18 months after its foundation signals that MPEG has always been keenly aware that a bitstream composed of MPEG audio and video bitstreams needs to be transported so that it is played back as intended by the bitstream creator. In MPEG-1 it was a bitstream in a controlled environment, in MPEG-2 it was a bitstream in a noisy environment, from MPEG-4 on it was on IP, and in MPEG-DASH it had to deal with the unpredictability of the Internet Protocol in the real world.

During its existence the issue of multiplexing and transport formats has shaped MPEG standards. Without a Systems subgroup, efficiently compressed audio and video bitstreams would have remained floating in space without a standard means to plug them into real systems.
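
To make the "glue" a little more concrete, here is a minimal sketch of what a systems layer has to do at its simplest: interleave audio and video access units by a common time stamp so that a receiver can present them in sync. The `AccessUnit` type, the `multiplex` function and the frame durations are assumptions made for this illustration; real MPEG Systems formats carry far more (clock references, buffer models, packetisation).

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class AccessUnit(NamedTuple):
    pts: int        # presentation time stamp, in ticks of a shared clock (90 kHz assumed here)
    stream: str     # "audio" or "video"
    payload: bytes  # compressed data, opaque to the multiplexer


def multiplex(audio: Iterable[AccessUnit], video: Iterable[AccessUnit]) -> Iterator[AccessUnit]:
    """Interleave two streams, each already ordered by pts, into one stream ordered by pts."""
    return heapq.merge(audio, video, key=lambda au: au.pts)


# Assumed durations: 25 Hz video (3600 ticks per frame), 24 ms audio frames (2160 ticks).
video = [AccessUnit(i * 3600, "video", b"") for i in range(3)]
audio = [AccessUnit(i * 2160, "audio", b"") for i in range(4)]

for unit in multiplex(audio, video):
    print(unit.pts, unit.stream)
```

The multiplexer never looks inside the payloads: the time stamps alone tell the receiver when each unit is to be presented.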

Idea #7 – Integrated standards as toolkits

Most MPEG standards are composed of the 3 key elements – audio, video and systems – that make an audio-visual system and some, such as MPEG-4 and MPEG-I, even include 3D Graphic information. These standards are integrated in the sense that, if you need a complete solution, you can get what you need from the package offered by MPEG.

The world is more complicated than that. Some users want to cherry pick technologies. In the case of MPEG-I, most likely MPEG will not standardise a Scene Description technology but will just indicate how externally defined technologies can be plugged into the system.

With its seventh idea MPEG is ready to satisfy the needs of all customers. It defines the means to signal how an external technology can be plugged into a set of other native MPEG technologies. With one caveat: the customer has to take care of the integration of the external technology. That MPEG will not do.

Idea #8 – Technology is always on the move

To describe the eighth idea, I will seek help from the Greek philosopher Heraclitus (or whoever was the person who said it): τὰ πάντα ῥεῖ καὶ οὐδὲν μένει (everything flows and nothing stays). Digital technologies move fast and actually accelerate. By applying ideas #3, #4, #5, #6 and #7, MPEG standards accelerated the orderly transition of analogue to digital media. By applying ideas #1 and #2, MPEG standards prompted technology convergence with its merging of industry segments and appearance of new players.

The eighth idea reminds MPEG that the technology landscape is constantly changing and this awareness must inform its standards. Until HEVC – one can even say, including the upcoming Versatile Video Coding (VVC) – video meant coding a 2D rectangular area (in MPEG-4, a flat area of any shape). The birth of immersive visual experiences is not without pain, but they are becoming possible and MPEG must be ready with solutions that take this basic assumption into account. This means that, in the technology scenario that is shaping up, the MPEG role of “anticipatory standards” is ever more important and ever more challenging to achieve.

Idea #9 – The nature and borders of compression

The ninth idea goes down to the very nature of compression. What is the meaning of compression? Is it “fewer bits is always good” or can it also be “as few meaningful bits as possible is also good”? The former is certainly desirable but, as the nature of information consumption changes and compression digs deeper into the nature of information, compressed representations that offer easier access to the information embedded in the data become more valuable.

What is the scope of application of MPEG compression? When MPEG started the MPEG-1 standards work, the gap that separated the telecom from the CE industries (the first two industries in attendance at that time) was as wide as the one that separates the media industry from, say, the genomic industry today. Both are digital now and the dialogue gets easier.

With patience and determination MPEG has succeeded in creating a common language and mind set in the media industries. This is an important foundation of MPEG standards. The same amalgamation will continue between MPEG and other industries.

Now the results

Figure 2 intends to attach some concreteness to the nine ideas illustrated above by showing some of the most successful MPEG standards issued from 31 years of MPEG activity.


Figure #2 – Some successful MPEG standards


An entity at the lowest layer of the ISO hierarchy has masterminded the transition of media from the analogue to the digital world. Its standards underpin the evolution of digital media, foster the creation of new industries and offer unrelenting growth to old and new industries worth in excess of 1 trillion USD per year.

Many thanks to the parent body SC 29 for managing the balloting of MPEG standards.

Data compression in MPEG

That video is a high profile topic to people interested in MPEG is obvious – MP stands for Moving Pictures – and is shown by the most visited article in this blog, Forty years of video coding and counting. That audio is also a high profile topic should not be a surprise, given that the official MPEG title is “Coding of Moving Pictures and Audio”, and is confirmed by the fact that Thirty years of audio coding and counting has received almost as many visits as the former.

What is less known, but potentially very important, is the fact that MPEG has already developed a few standards for compression of a wide range of other data types. Point Cloud is the data type that is acquiring a higher profile by the day, but there are many more types, as represented by the table below.

Figure 1 – Data types and relevant MPEG standards



The articles Forty years of video coding and counting and More video with more features provide a detailed history of video compression in MPEG from two different perspectives. Here I will briefly list the video-coding related standards produced or being produced by MPEG mentioned in the table.

  • MPEG-1 and MPEG-2 both produced widely used video coding standards.
  • MPEG-4 has been much more prolific.
    • It started with Part 2 Visual
    • It continued with Part 9 Reference Hardware Description, a standard that supports a reference hardware description of the standard expressed in VHDL (VHSIC Hardware Description Language), a hardware description language used in electronic design automation.
    • Part 10 is the still high-riding Advanced Video Coding standard.
    • Parts 29, 31 and 33 are the result of three attempts at developing Option 1 video compression standards (in a simple but imprecise way, standards that do not require payment of royalties).
  • MPEG-5 is currently expected to be a standard with 2 parts:
    • Part 1 Essential Video Coding will have a base layer/profile which is expected to be Option 1 and a second layer/profile with a performance ~25% better than HEVC. Licensing terms are expected to be published by patent holders within 2 years.
    • Part 2 Low Complexity Enhancement Video Coding (LCEVC) will be a two-layer video coding standard. The lower layer is not tied to any specific technology and can be any video codec; the higher layer is used to extend the capability of an existing video codec.
  • MPEG-7 is about Multimedia Content Description. There are different tools to describe visual information:
    • Part 3 Visual is a form of compression as it provides tools to describe Color, Texture, Shape, Motion, Localisation, Face Identity, Image signature and Video signature.
    • Part 13 Compact Descriptors for Visual Search can be used to compute compressed visual descriptors of an image. An application is to get further information about an image captured e.g. with a mobile phone.
    • Part 15 Compact Descriptors for Video Analysis allows to manage and organise large scale data bases of video content, e.g. to find content containing a specific object instance or location.
  • MPEG-C is a collection of video technology standards that do not fit with other standards. Part 4 – Media Tool Library is a collection of video coding tools (called Functional Units) that can be assembled using the technology standardised in MPEG-B Part 4 Codec Configuration Representation.
  • MPEG-H part 2 High Efficiency Video Coding is the latest MPEG video coding standard with an improved compression of 60% compared to AVC.
  • MPEG-I is the new standard, mostly under development, for immersive technologies
    • Part 3 Versatile Video Coding is the ongoing project to develop a video compression standard with an expected 50% more compression than HEVC.
    • MPEG-I part 7 Immersive Media Metadata is the current project to develop a standard for compressed Omnidirectional Video that allows limited translational movements of the head.
    • Exploration in 6 Degrees of Freedom (6DoF) and Lightfield are ongoing.


The article Thirty years of audio coding and counting provides a detailed history of audio compression in MPEG. Here I will briefly list the audio-coding related standards produced or being produced by MPEG mentioned in the table.

  • MPEG-1 part 3 Audio produced, among others, the foundational digital audio standard better known as MP3.
  • MPEG-2
    • Part 3 Audio extended the stereo user experience of MPEG-1 to Multichannel.
    • Part 7 Advanced Audio Coding is the foundational standard on which MPEG-4 AAC is based.
  • MPEG-4 part 3 Advanced Audio Coding (AAC) currently supports some 10 billion devices and software applications, growing by half a billion units every year.
  • MPEG-D is a collection of different audio technologies:
    • Part 1 MPEG Surround provides an efficient bridge between stereo and multi-channel presentations in low-bitrate applications as it can transmit 5.1 channel audio within the same 48 kbit/s transmission budget.
    • Part 2 Spatial Audio Object Coding (SAOC) allows very efficient coding of a multi-channel signal that is a mix of objects (e.g. individual musical instruments).
    • Part 3 Unified Speech and Audio Coding (USAC) combines the tools for speech coding and audio coding into one algorithm with a performance that is equal to or better than AAC at all bit rates. USAC can code multichannel audio signals, and can also optimally encode speech content.
    • Part 4 Dynamic Range Control is a post-processor for any type of MPEG audio coding technology. It can modify the dynamic range of the decoded signal as it is being played.

2D/3D Meshes

Polygon meshes can be used to represent the approximate shape of a 2D image or a 3D object. 3D mesh models are used in various multimedia applications such as computer games, animation, and simulation. MPEG-4 provides various compression technologies:

  • Part 2 Visual provides a standard for 2D and 3D Mesh Compression (3DMC) of generic, but static, 3D objects represented by first-order (i.e., polygonal) approximations of their surfaces. 3DMC has the following characteristics:
    • Compression: Near-lossless to lossy compression of 3D models
    • Incremental rendering: No need to wait for the entire file to download to start rendering
    • Error resilience: 3DMC has a built-in error-resilience capability
    • Progressive transmission: Depending on the viewing distance, a reduced accuracy may be sufficient
  • Part 16 Animation Framework eXtension (AFX) provides a set of compression tools for Shape, Appearance and Animation.

Face/Body Animation

Imagine you have a face model that you want to animate remotely. How do you represent the information that animates the model in a bit-thrifty way? MPEG-4 Part 2 Visual has an answer to this question with its Facial Animation Parameters (FAP). FAPs are defined at two levels.

  • High level
    • Viseme (visual equivalent of phoneme)
    • Expression (joy, anger, fear, disgust, sadness, surprise)
  • Low level: 66 FAPs associated with the displacement or rotation of the facial feature points.

In the figure, feature points affected by FAPs are indicated as black dots. Other feature points are indicated as small circles.

Figure 2 – Facial Animation Parameters

It is possible to animate a default face model in the receiver with a stream of FAPs, or a custom face can be initialised by downloading Face Definition Parameters (FDP) with specific background images, facial textures and head geometry.

MPEG-4 Part 2 uses a similar approach for Body Animation.

Scene Graphs

So far MPEG has never developed a Scene Description technology. In 1996, when the development of the MPEG-4 standard required it, it took the Virtual Reality Modelling Language (VRML) and extended it to support MPEG-specific functionalities. Of course compression could not be absent from the list. So the Binary Format for Scenes (BiFS), specified in MPEG-4 Part 11 Scene description and application engine was born to allow for efficient representation of dynamic and interactive presentations, comprising 2D & 3D graphics, images, text and audiovisual material. The representation of such a presentation includes the description of the spatial and temporal organisation of the different scene components as well as user-interaction and animations.

In MPEG-I scene description is again playing an important role. However, this time MPEG does not even intend to pick a scene description technology. It will instead define some interfaces to scene description parameters.


Many thousands of fonts are available today for use as components of multimedia content. Such content often utilises custom-designed fonts that may not be available on a remote terminal. In order to ensure faithful appearance and layout of content, the font data have to be embedded with the text objects as part of the multimedia presentation.

MPEG-4 part 18 Font Compression and Streaming defines and provides two main technologies:

  • OpenType and TrueType font formats
  • Font data transport mechanism – the extensible font stream format, signaling and identification


Multimedia is a combination of multiple media in some form. Probably the closest multimedia “thing” in MPEG is the standard called Multimedia Application Formats (MPEG-A). However, MPEG-A is an integrated package of media for specific applications and does not define any specific media format. It only specifies how you can combine MPEG (and sometimes other) formats.

MPEG-7 part 5 Multimedia Description Schemes (MDS) specifies the different description tools that are not visual and audio, i.e. generic and multimedia. By comprising a large number of MPEG-7 description tools from the basic audio and visual structures MDS enables the creation of the structure of the description, the description of collections and user preferences, and the hooks for adding the audio and visual description tools. This is depicted in Figure 3.

Figure 3 – The different functional groups of MDS description tools

Neural Networks

Requirements for neural network compression have been presented in Moving intelligence around. After 18 months of intense preparation, with development of requirements, identification of test material, definition of test methodology and drafting of a Call for Proposals (CfP), at the March 2019 (126th) meeting MPEG analysed nine technologies submitted by industry leaders. The technologies proposed compress neural network parameters to reduce their size for transmission, while not reducing, or only moderately reducing, their performance in specific multimedia applications. MPEG-7 Part 17 Neural Network Compression for Multimedia Description and Analysis is the standard, the part and the title given to the new standard.
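
As a rough illustration of the trade-off just described (smaller parameters, slightly less accurate), the sketch below applies plain uniform quantisation to a list of weights. This is a generic textbook technique chosen for the example, not one of the nine technologies submitted to MPEG; the function names and the sample weights are hypothetical.

```python
def quantise(weights, bits=8):
    """Map floating-point weights to integer indices on a uniform grid of 2**bits levels."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels or 1.0   # fall back to 1.0 when all weights are equal
    indices = [round((w - lo) / step) for w in weights]
    return indices, lo, step


def dequantise(indices, lo, step):
    """Rebuild approximate weights from the indices and the two side parameters."""
    return [lo + i * step for i in indices]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]   # made-up parameters of a tiny layer
indices, lo, step = quantise(weights, bits=8)
restored = dequantise(indices, lo, step)

# Each weight now costs 8 bits instead of, say, 32, at the price of a bounded error.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(indices, worst_error)
```

Whether such an error is acceptable depends on the application, which is why the requirement is expressed in terms of performance in specific multimedia tasks rather than in terms of the parameters themselves.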


MPEG-B part 1 Binary MPEG Format for XML (BiM) is the current endpoint of an activity that started some 20 years ago when MPEG-7 Descriptors defined by XML schemas were compressed in a standard fashion by MPEG-7 Part 1 Systems. Subsequently MPEG-21 needed XML compression and the technology was extended in Part 15 Binary Format.

In order to reach high compression efficiency BiM relies on schema knowledge between encoder and decoder. It also provides fragmentation mechanisms to provide transmission and processing flexibility, and defines means to compile and transmit schema knowledge information to enable decompression of XML documents without a priori schema knowledge at the receiving end.
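
The role of shared schema knowledge can be shown with a toy example. It is not the BiM algorithm: the element names and the fixed-length codes are assumptions made for illustration. It only shows why an encoder and a decoder that both know the schema can replace verbose textual tags with a few bits.

```python
import math

# Toy schema shared by encoder and decoder (hypothetical element names).
SCHEMA = ["Title", "Creator", "Duration", "Genre"]
BITS_PER_CODE = math.ceil(math.log2(len(SCHEMA)))  # 2 bits are enough for 4 names


def encode(elements):
    """Replace each element name by its index in the shared schema."""
    return [(SCHEMA.index(name), text) for name, text in elements]


def decode(coded):
    """Recover the element names from the indices using the same schema."""
    return [(SCHEMA[code], text) for code, text in coded]


doc = [("Title", "A vision made real"), ("Genre", "Non-fiction")]
coded = encode(doc)
assert decode(coded) == doc

# Cost of the element names alone: textual open/close tags versus schema codes.
text_bits = sum(8 * len(f"<{name}></{name}>") for name, _ in doc)
code_bits = BITS_PER_CODE * len(doc)
print(text_bits, "bits of tags versus", code_bits, "bits of codes")
```

A receiver without the schema cannot interpret the codes, which is why the standard also defines a way to transmit schema knowledge.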


The article Genome is digital, and can be compressed presents the technology used in MPEG-G Genomic Information Representation. Many established compression technologies developed for compression of other MPEG media have found good use in genome compression. MPEG is currently busy developing the MPEG-G reference software and is investigating other genomic areas where compression is needed. More concretely, MPEG plans to issue a Call for Proposals for Compression of Genome Annotation at its July 2019 (128th) meeting.

Point Clouds

3D point clouds can be captured with multiple cameras and depth sensors with points that can number a few thousands up to a few billions, and with attributes such as colour, material properties etc.

MPEG is developing two different standards whose choice depends on whether the point cloud is dense (this is done in MPEG-I Part 5 Video-based Point Cloud Compression) or less so (MPEG-I Part 9 Graphic-based PCC). The algorithms in both standards are lossy, scalable, progressive and support random access to subsets of the point cloud.

MPEG plans to release Video-based PCC as FDIS in October 2019 and Graphic-based PCC as FDIS in April 2020.


MPEG felt the need to address compression of data from sensors and data to actuators when it considered the exchange of information taking place between the physical world where the user is located and any sort of virtual world generated by MPEG media.

So MPEG undertook the task to provide standard interactivity technologies that allow a user to

  • Map their real-world sensor and actuator context to a virtual-world sensor and actuator context, and vice-versa, and
  • Achieve communication between virtual worlds.

Figure 4 describes the context of the MPEG-V Media context and control standard.

Figure 4 – Communication between real and virtual worlds

The MPEG-V standard defines several data types and their compression:

  • Part 2 – Control information specifies control devices interoperability (actuators and sensors) in real and virtual worlds
  • Part 3 – Sensory information specifies the XML Schema-based Sensory Effect Description Language to describe actuator commands such as light, wind, fog, vibration, etc. that trigger human senses
  • Part 4 – Virtual world object characteristics defines a base type of attributes and characteristics of the virtual world objects shared by avatars and generic virtual objects
  • Part 5 – Data formats for interaction devices specifies syntax and semantics of data formats for interaction devices – Actuator Commands and Sensed Information – required to achieve interoperability in controlling interaction devices (actuators) and in sensing information from interaction devices (sensors) in real and virtual worlds
  • Part 6 – Common types and tools specifies syntax and semantics of data types and tools used across MPEG-V parts.

MPEG-IoMT Internet of Media Things is the mapping of the general IoT context to MPEG media developed by MPEG. MPEG-IoMT Part 3 – IoMT Media Data Formats and API also addresses the issue of media-based sensors and actuators data compression.

What is next in data compression?

In Compression standards for the data industries I reported the proposal made by the Italian ISO member body to establish a Technical Committee on Data Compression Technologies. The proposal was rejected on the grounds that Data Compression is part of Information Technology.

It was a big mistake because it has stopped the coordinated development of standards that would have fostered the move of different industries to the digital world. The article identified a few such as Automotive, Industry Automation, Geographic information and more.

MPEG has done some exploratory work and found that quite a few of its existing standards could be extended to serve new application areas. One example is the conversion of MPEG-21 Contracts to Smart Contracts. An area of potential interest is data generated by machine tools in industry automation.


MPEG audio and video compression standards are the staples of the media industry. MPEG continues to develop those standards while investigating compression of other data types in order to be ready with standards when the market matures. Point clouds and DNA reads from high speed sequencing machines are just two examples of how, by anticipating market needs, MPEG prepares to serve the industry with its compression standards in a timely manner.

More video with more features

In Forty years of video coding and counting I presented a short but intense history of ITU and MPEG video compression standards. In this article I will focus on how more functionalities were added to MPEG video compression standards over the years and how the next generation of standards will add even more.

The table below gives an overview of all MPEG video compression standards – past, present and planned. Those in italic have not reached Final Draft International Standard (FDIS) level.

Figure 1 – Video coding standards and functionalities

In 1988 MPEG started its first video coding project for interactive video applications on compact disc (MPEG-1). Input video was assumed to be progressive (25/29.97 Hz, but it also supported more frame rates) and spatial resolution was Source Input Format (SIF), i.e. 240 or 288 lines x 352 pixels. The syntax supported spatial resolutions up to 4095 x 4095 pixels (about 16 Mpixels). Obviously progressive scanning is a feature that all MPEG video coding standards have supported since MPEG-1. The (obvious) exception is point clouds because there are no “frames”.

In 1990 MPEG started its second video coding project targeting digital television (MPEG-2). Therefore the input was assumed to be interlaced (frame rate of 50/59.94 Hz, but it also supported more frame rates) and spatial resolution was standard/high definition, and up. The resolution space was quantised by means of levels, the second dimension after profiles. MPEG-4 Visual and AVC are the two last standards with specific interlace tools. An attempt was made to introduce interlace tools in HEVC but the technologies presented did not show appreciable improvements when compared with progressive tools. HEVC does have some indicators (SEI/VUI) to tell the decoder that the video is interlaced.

MPEG-2 was the first standard to tackle scalability (High Profile), multiview (Multiview Profile) and higher chroma resolution (4:2:2 Profile). Several subsequent video coding standards (MPEG-4 Visual, AVC and HEVC) also support these new features. VVC is expected to do the same, probably not in version 1.

MPEG-4 Visual supports coding of video objects and error resilience. The first feature has remained specific to MPEG-4 Visual. Most video codecs allow for some error resilience (e.g. starting from slices in MPEG-1). However, MPEG-4 Visual – mobile communication being one relevant use case – was the first to specifically consider error resilience as a tool.

MPEG-2 first tried to develop 10-bit support and the empty part 8 is what is left of that attempt.

Wide Colour Gamut (WCG), High Dynamic Range (HDR) and 3 Degrees of Freedom (3DoF) are all supported by AVC. These functionalities were first introduced in HEVC, later added to AVC, and are planned to be supported in VVC as well. WCG allows a wider gamut of colours to be displayed, HDR allows pictures to be displayed with brighter regions and with more visible detail in dark areas, SCC achieves better compression of non-natural (synthetic) material such as characters and graphics, and 3DoF (also called Video 360) allows pictures projected on a sphere to be represented.

AVC supports more than 8 quantisation bits, extended up to 14 bits. HEVC even supports 16 bits. VVC, EVC and LCEVC are expected to also support more than 8 quantisation bits.

WebVC was the first MPEG attempt at defining a video coding standard that would not require a licence that involves payment of fees (Option 1 in ISO language, legal language more complex than this). Strictly speaking, WebVC is not a new standard because MPEG has simply extracted what was the Constrained Baseline Profile in AVC (originally, AVC tried to define an Option 1 profile but did not achieve the goal and did not define the profile) with the hope that WebVC could achieve Option 1 status. The attempt failed because some companies confirmed their Option 2 patent declarations (i.e. a licence is required to use the standard) already made against the AVC standard. The brackets in the figure convey this fact.

Video Coding for Browsers (VCB) is the result of a proposal made by a company in response to an MPEG Call for Proposals for Option 1 video coding technology. Another company made an Option 3 patent declaration (i.e. unavailability to license the technology). As the declaration did not contain any detail that could allow MPEG to remove the allegedly infringing technologies, ISO did not publish VCB as a standard. The square brackets in the figure convey this fact.

Internet Video Coding (IVC) is the third video coding standard intended to be Option 1. Three Option 2 patent declarations were received and MPEG has declared its availability to remove patented technology from the standard if specific technology claims are made. The brackets convey this fact.

Finally, Essential Video Coding (EVC), part 1 of MPEG-5 (however, the project has not been formally approved by ISO yet), is expected to be a two-layer video coding standard. The EVC Call for Proposals requested that the technologies provided in response to the Call for the first (lower) layer of the standard be Option 1. Technologies for the second (higher) layer are Option 2. The curly brackets in the figure convey this fact.

Screen Content Coding (SCC) achieves better compression of non-natural (synthetic) material such as characters and graphics. It is supported by HEVC and is planned to be supported in VVC and possibly EVC.

Low Complexity Enhancement Video Coding (LCEVC) is another two-layer video coding standard. Unlike EVC, however, in LCEVC the lower layer is not tied to any specific technology and can be any video codec. The goal of the second layer is to extend the capability of an existing video codec. A typical usage scenario is to give a large number of already deployed standard definition set top boxes that cannot be recalled the ability to decode high definition pictures. The LCEVC decoder is depicted in Figure 2.

Figure 2 – Low Complexity Enhancement Video Coding
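
The two-layer principle can be sketched in a few lines: take the picture produced by any base decoder, upsample it, and add the residuals carried by the enhancement layer. The sketch below is only an illustration of that principle. The nearest-neighbour upsampler, the function names and the sample values are assumptions, and the actual LCEVC tools are not represented.

```python
def upsample_2x(picture):
    """Nearest-neighbour 2x upsampling of a 2D list of samples (stand-in for a real upsampler)."""
    out = []
    for row in picture:
        wide = [sample for sample in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                               # repeat each row vertically
    return out


def reconstruct(base_picture, residuals):
    """Enhancement layer: upsampled base picture plus decoded residuals, clipped to 8 bits."""
    predicted = upsample_2x(base_picture)
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residuals)
    ]


base = [[100, 200]]                           # output of any base-layer decoder (1 x 2 samples)
residuals = [[0, 5, -5, 0], [1, 0, 0, 60]]    # decoded enhancement data (2 x 4 samples)
print(reconstruct(base, residuals))
```

Because `reconstruct` only sees decoded samples, the base layer can come from any codec, which is the property that lets deployed decoders be extended rather than replaced.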

Today technologies are available to capture 3D point clouds, typically with multiple cameras and depth sensors producing up to billions of points for realistically reconstructed scenes. Point clouds can have attributes such as colors, material properties and/or other attributes and are useful for real-time communications, GIS, CAD and cultural heritage applications. MPEG-I part 5 will specify lossy compression of 3D point clouds employing efficient geometry and attributes compression, scalable/progressive coding, and coding of point clouds sequences captured over time with support of random access to subsets of the point cloud.

Other technologies capture point clouds, potentially with a low density of points, to allow users to freely navigate in multi-sensory 3D media experiences. Such representations require a large amount of data, not feasible for transmission on today’s networks. MPEG is developing a second, graphics-based PCC standard, as opposed to the previous one which is video-based, for efficient compression of sparse point clouds.

3DoF+ is a term used by MPEG to indicate a usage scenario where the user can make limited translational movements of the head. In a 3DoF scenario, if the user moves the head too much, an annoying parallax error is felt. In March 2019 MPEG received responses to its Call for Proposals requesting appropriate metadata (see the red blocks in Figure 3) to help the Post-processor present the best image based on the viewer’s position if available, or to synthesise a missing one, if not available.

Figure 3 – 3DoF+ use scenario

6DoF indicates a use scenario where the user can freely move in a space and enjoy a 3D virtual experience that matches the one in the real world. Light field refers to new devices that can capture a spatially sampled version of a light field that has both spatial and angular light information in one shot. The captured data are not only larger than traditional camera images but also different in nature. MPEG is investigating new and compatible compression methods for potential new services.

In 30 years compressed digital video has made a lot of progress, e.g., bigger and brighter pictures with less bitrate and other features. The end point is nowhere in sight.

Thanks to Gary Sullivan and Jens-Rainer Ohm for useful comments.

Matching technology supply with demand


There have always been people in need of technology and, most of the time, people ready to provide something in response to the demand. In book XVIII of the Iliad, Thetis, Achilles’s mother, asks Hephaestus, the god of blacksmiths, craftsmen, artisans, sculptors and more, to provide a new armour to her son, who had lost it to Hector. Hephaestus duly complies. Still in the fictional domain, but in more recent years, Agent 007 visits Q Branch to get the latest gadgets for his next spy mission, which are inevitably put to good use in the mission.

Wars have always been times when the need for technologies stretches the ability to supply them. In our supposedly peaceful age there is a lot of technology around, but it is often difficult for companies needing a particular technology to find the solution matching their needs and budget.

Supply and demand in standardisation

Standardisation is an interesting case of an entity, typically non-commercial and non-governmental, needing technologies to make a standard. Standards organisations, too, often need technologies to accomplish their mission. How can they access the needed technologies?

A few decades back, if industry needed a standard for, say, a video cassette recorder, the process was definitely supply-driven: a company that had developed a successful product (call it Sony or JVC) submitted a proposal (call it Betamax or VHS) to a standards committee (call it IEC SC 60B). Too bad if the process produced two standards.

In the second half of the 1980’s, ITU SG XV (ITU numbering of that time) started developing the H.261 recommendation. Experts developed the standard piece by piece in the committee (the so-called Okubo group) by acquiring the necessary technologies in a process where the roles of demand and supply were rather blurred. Only participants were entitled to provide their technologies to fulfil the needs of the standard.

A handful of years later, MPEG further innovated technology procurement in a standardisation environment. To get the technologies needed to make a standard, it used a demand-driven tool: MPEG’s Call for Proposals (CfP). Since then, technologies provided by respondents and assessed for relevance by the group have been used to 1) create the initial reference model (RM0) and 2) initiate a first round of Core Experiments (CE). CEs result from the agreement among participating experts that there is room for improving the performance of the standard under development by opening a particular area to optimisation. CEs continue until the available room for optimisation is exhausted. While anybody is entitled to respond to a CfP and contribute technology to RM0, only experts participating in the standardisation project can provide technology for CEs. This, however, is not really a limitation because the process is open to anybody wishing to join a recognised standards organisation that is a member of ISO.

The MPEG process of standards development has allowed the industry to maintain a sustained development and expansion for many years. In fairness, this is not entirely MPEG’s merit. Patent pools have played a synergistic role to MPEG’s by providing (industry) users with the means to practice MPEG standards and IP holders the means to be remunerated for the use of their IP.

The situation today

The HEVC case has shown that the cooperation of different parties to achieve the common goal of enabling the use of a standard cannot be taken for granted (see, e.g., A crisis, the causes and a solution). There are several reasons for this: the increasing number of individual technologies needed to make a high-performance MPEG standard, the increasing number of IP holders, the increasing number of Non-Practicing Entities (NPE) as providers of technology and the increasing number of patent pools that act as independent licence providers, each for a portion of the patents.

I have already made several proposals with the intention of helping MPEG out of this stalemate (see, e.g., Business model based ISO/IEC standards). Here I would like to present an additional idea that extends the MPEG process of standards development (see How does MPEG actually work?).

A new process proposal

A possible implementation of the proposal, applying to an entity (a company, an industry forum or a standards organisation) wishing to develop a specification or a standard, could run like this (some details could be fine-tuned or changed on a case-by-case basis):

  1. An entity (a company, an industry forum or a standards organisation) wishes to develop a specification or a standard
  2. The entity issues a Call for Proposals (CfP) including requirements requesting proponents to accept the process (as defined in this numbered list) and to commit to RAND licensing of their technologies for the specification or standard
  3. The entity assesses the proposals received
  4. The entity sets aside a certain amount of tokens for the entire standard (e.g. 1,000)
  5. The entity builds a “minimal” Reference Model zero (RM0) using technologies contained in the proposals in a conservative way so as to create ample space for a healthy Core Experiment (CE) process
  6. The entity
    1. Assigns a percentage of tokens to RM0
    2. Establishes the amount of tokens that will be given to the proponent who achieves the highest performance in a CE (e.g. 1 token for each 0.1% of improvement)
  7. The entity identifies and publishes CEs at the pace required by the specification or standard
  8. For each CE the entity makes a call containing
    1. A description of the CE
    2. A minimum performance target (say, at least 1% improvement)
    3. A deadline for submission of 1) results and 2) code that proves that the target has been achieved
    4. The maximum level of associated complexity
  9. CE proponents should do due diligence and
    1. Make proposals that contain only their own technologies, or
    2. Ask any third party to join in the response (and accept the conditions of the CfP)
  10. If the tokens are all used and there is still room for optimisation, new tokens are created and token holders have their tokens scaled down so that the total number of tokens is still 1,000
  11. If room for RM optimisation is exhausted but there are still tokens unassigned, token holders have their tokens scaled up so that the total number of tokens is 1,000.
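
The token bookkeeping in steps 4, 6, 10 and 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definition of the process: the class `TokenLedger`, its method names, the proportional rescaling rule and the rule that a CE result below the minimum target earns nothing are all hypothetical choices made for the example. The 1,000-token total, the rate of 1 token per 0.1% improvement and the 1% minimum target are the example values given in the list.

```python
from dataclasses import dataclass, field

TOTAL_TOKENS = 1000.0      # step 4: example total for the entire standard
TOKENS_PER_PERCENT = 10.0  # step 6.2: 1 token per 0.1% = 10 tokens per 1%
MIN_TARGET_PERCENT = 1.0   # step 8.2: example minimum performance target


@dataclass
class TokenLedger:
    """Hypothetical ledger of tokens held by each proponent."""
    total: float = TOTAL_TOKENS
    holdings: dict = field(default_factory=dict)

    def assigned(self) -> float:
        """Number of tokens currently held by all proponents."""
        return sum(self.holdings.values())

    def credit(self, holder: str, tokens: float) -> None:
        """Add tokens to a holder (used here for the RM0 share, step 6.1)."""
        self.holdings[holder] = self.holdings.get(holder, 0.0) + tokens

    def award_ce(self, holder: str, improvement_percent: float) -> float:
        """Award tokens to the winner of a CE; nothing below the target."""
        if improvement_percent < MIN_TARGET_PERCENT:
            return 0.0  # assumption: a missed target earns no tokens
        tokens = TOKENS_PER_PERCENT * improvement_percent
        self.credit(holder, tokens)
        return tokens

    def rescale(self) -> None:
        """Scale all holdings proportionally so they sum to the total.

        Covers step 10 (more tokens awarded than planned: scale down)
        and step 11 (tokens left unassigned: scale up).
        """
        assigned = self.assigned()
        if assigned == 0:
            return
        factor = self.total / assigned
        for holder in self.holdings:
            self.holdings[holder] *= factor


# Illustrative run with made-up proponents and figures.
ledger = TokenLedger()
ledger.credit("Proponent A", 400.0)    # assumed: 40% of tokens go to RM0
ledger.award_ce("Proponent B", 2.0)    # a 2% improvement earns 20 tokens
ledger.award_ce("Proponent C", 0.5)    # below the 1% target: no tokens
ledger.rescale()                       # step 11: scale up to 1,000
print(ledger.holdings)
```

Proportional rescaling keeps the relative weight of every holder unchanged, which is one plausible reading of “scaled down” and “scaled up” in steps 10 and 11. How the RM0 share is split among several RM0 contributors is not specified in the list, so the sketch credits it to a single proponent.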

Depending on the nature of the entity (company, industry forum or standards organisation), another entity, which can be the same entity that has managed the process or a patent pool:

  1. Identifies who are IP holders in RM0 and CEs
  2. Removes technology in case the status of IP has not been clarified
  3. After completing steps 1 and 2, pays royalties to IP holders based on the number of tokens they have acquired in the process (i.e. RM0 and CEs)
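
The three steps above amount to a pro-rata split of royalties over the areas whose IP status is clear. The Python sketch below shows one way this could work. It rests on assumptions that are not part of the proposal: the names `Area` and `royalty_shares` are hypothetical, each area is given a single IP holder for simplicity, and the tokens of a removed area are simply left out of the split, with the whole pool shared among the remaining areas.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical record for RM0 or one CE area."""
    name: str                # "RM0" or a CE identifier
    holder: str              # IP holder identified in step 1
    tokens: float            # tokens acquired in the process
    ip_cleared: bool = True  # False if the IP status is not clarified


def royalty_shares(areas, royalty_pool):
    """Split royalty_pool among holders in proportion to their tokens.

    Areas with unclear IP status are removed first (step 2), so their
    tokens do not count towards the split (an assumption of this sketch).
    """
    active = [a for a in areas if a.ip_cleared]
    total = sum(a.tokens for a in active)
    payouts = {}
    if total == 0:
        return payouts
    for a in active:
        share = royalty_pool * a.tokens / total
        payouts[a.holder] = payouts.get(a.holder, 0.0) + share
    return payouts


# Illustrative figures only.
areas = [
    Area("RM0", "Holder A", 400.0),
    Area("CE1", "Holder B", 100.0),
    Area("CE2", "Holder C", 500.0, ip_cleared=False),  # status unclear
]
print(royalty_shares(areas, 1000.0))
```

Because the split is recomputed from the list of cleared areas, an area whose IP status is clarified later can be brought back by setting `ip_cleared` to `True` and running the function again.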

Merits and limits of the proposal

The proposal achieves two goals:

  1. Associating patent holders with RM0 and CE areas, as opposed to associating patent holders just with the standard
  2. Enabling the technologies of a CE area to be turned off if their IP status is unclear, and turned on again once the status is clarified

In case the entity developing the specification is a standards organisation, more than one patent pool can develop a licence using the results of the process.


This idea was developed in collaboration with Malvika Rao, the founder of Incentives Research and holder of a PhD from Harvard University, and Don Marti, an open source expert and an advisor at Incentives Research.

Converting the basic concept described above into a workable market design requires further work. There may be opportunities to game the system, and the design must consider issues such as how to attract and retain participation. In addition the design must be tested (e.g., via simulation or usability study) to understand its performance.

Please send comments to Leonardo.
