Still more to say about MPEG standards

In “Is there a logic in MPEG standards?” and “There is more to say about MPEG standards” I gave an overview of the first 11 MPEG standards (white squares in Figure 1). In this article I continue the overview and briefly present the remaining 11 MPEG standards, including those that are still being developed. Using the same convention as before, those marked in yellow indicate that no work has been done on them for a few years.

Figure 1 – The 22 MPEG standards. Those in colour are presented in this article


When MPEG began the development of the Augmented Reality Application Format (ARAF), it also started a specification called Augmented Reality Reference Model. Later it became aware that SC 24 Computer graphics, image processing and environmental data representation was doing similar work, and the two committees joined forces to develop a standard called Mixed and Augmented Reality Reference Model.

In the Mixed and Augmented Reality (MAR) paradigm, representations of physical objects and of computer-mediated virtual objects are combined in various modalities. The MAR standard has been developed to enable:

  1. The design of MAR applications or services. The designer may refer to, and select, the needed components from those specified in the MAR model architecture, taking into account the given application/service requirements.
  2. The development of a MAR business model. The value chain and the actors are identified in the Reference Model, and implementors may map them to their business models or invent new ones.
  3. The extension of existing or creation of new MAR standards. MAR is interdisciplinary and creates ample opportunities for extending existing technology solutions and standards.

MAR-RM and ARAF paradigmatically express the differences between MPEG standardisation and “regular” IT standardisation. MPEG defines interfaces and technologies, while IT standards typically define architectures and reference models. This explains why the majority of patent declarations that ISO receives relate to MPEG standards. It is also worth noting that in the 6 years it took to develop the MAR-RM standard, MPEG developed 3 editions of its ARAF standard.

The reference architecture of the MAR standard is depicted in Figure 2.

Information from the real world is sensed and enters the MAR engine either directly or after being “understood”. The engine can also access media assets or external services. All information is processed by the engine which outputs the result of its processing and manages the interaction with the user.

Figure 2 – MAR Reference System Architecture

Based on this model, the standard elaborates the Enterprise Viewpoint, with classes of actors, roles, business model and success criteria; the Computational Viewpoint, with functionalities at the component level; and the Informational Viewpoint, with data communication between components.

MAR-RM is a one-part standard.


Multimedia service platform technologies (MPEG-M) specifies two main components of a multimedia device, called a peer in MPEG-M.

As shown in Figure 3, the first component is the API: a High-Level API for applications and a Low-Level API for network, energy and security.

Figure 3 – High Level and Low Level MPEG-M API

The second component is a middleware called MXM that relies specifically on MPEG multimedia technologies (Figure 4).

Figure 4 – The MXM architecture

The Middleware is composed of two types of engine. Technology Engines are used to call functionalities defined by MPEG standards, such as creating or interpreting a licence attached to a content item. Protocol Engines are used to communicate with other peers, e.g. in case a peer does not have a particular Technology Engine that another peer has. For instance, a peer can use a Protocol Engine to call a licence server to get a licence to attach to a multimedia content item. The MPEG-M middleware has the ability to create chains of Technology Engines (Orchestration) or Protocol Engines (Aggregation).
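
To make the division of labour between the two engine types concrete, here is a minimal Python sketch. All names in it (`Peer`, `run_chain`, `attach_metadata`, `create_licence`) are hypothetical and are not part of the MPEG-M (ISO/IEC 23006) API. It only illustrates the idea that a peer runs a Technology Engine locally when it has one, otherwise falls back on a Protocol Engine to ask another peer, and that engines can be chained.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical illustration of MPEG-M style engines; this is not the MXM API.
Engine = Callable[[dict], dict]


class Peer:
    def __init__(self, name: str, technology_engines: Dict[str, Engine]):
        self.name = name
        self.technology_engines = technology_engines

    def call(self, engine: str, item: dict, remote: Optional["Peer"] = None) -> dict:
        # Technology Engine: the functionality is available locally.
        if engine in self.technology_engines:
            return self.technology_engines[engine](item)
        # Protocol Engine: ask another peer that has the Technology Engine.
        if remote is not None and engine in remote.technology_engines:
            return remote.call(engine, item)
        raise LookupError(f"{self.name}: no engine {engine!r} available")

    def run_chain(self, engines: List[str], item: dict,
                  remote: Optional["Peer"] = None) -> dict:
        # Chaining engines, in the spirit of Orchestration/Aggregation.
        for engine in engines:
            item = self.call(engine, item, remote)
        return item


def attach_metadata(item: dict) -> dict:
    return {**item, "title": "Demo clip"}


def create_licence(item: dict) -> dict:
    return {**item, "licence": f"licence-for-{item['id']}"}


client = Peer("client", {"attach_metadata": attach_metadata})
licence_server = Peer("licence-server", {"create_licence": create_licence})
result = client.run_chain(["attach_metadata", "create_licence"],
                          {"id": "clip-1"}, remote=licence_server)
print(result)
```

Here the client adds metadata with its own Technology Engine and obtains the licence from the licence server, mirroring the licence example in the text.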

MPEG-M is a 5-part standard

  • Part 1 – Architecture specifies the architecture, and High and Low level API of Figure 3
  • Part 2 – MPEG extensible middleware (MXM) API specifies the API of Figure 4
  • Part 3 – Conformance and reference software
  • Part 4 – Elementary services specifies the elementary services provided by the Protocol Engines
  • Part 5 – Service aggregation specifies how elementary services can be aggregated.


The development of the MPEG-U standard was motivated by the evolution of User Interfaces that integrate advanced rich media content, such as 2D/3D, animations and video/audio clips, and that aggregate dedicated small applications called widgets. Widgets are standalone applications that are embedded in a Web page and rely on Web technologies (HTML, CSS, JS) or equivalent.

With its MPEG-U standard, MPEG sought to have a common UI on different devices, e.g. TV, Phone, Desktop and Web page.

Therefore MPEG-U extends W3C recommendations to

  1. Cover non-Web domains (Home network, Mobile, Broadcast)
  2. Support MPEG media types (BIFS and LASeR) and transports (MP4 FF and MPEG-2 TS)
  3. Enable Widget Communications with restricted profiles (without scripting)

The MPEG-U architecture is depicted in Figure 5.

Figure 5 – MPEG-U Architecture

The normative behaviour of the Widget Manager includes the following elements of a widget

  1. Packaging formats
  2. Representation format (manifest)
  3. Life Cycle handling
  4. Communication handling
  5. Context and Mobility management
  6. Individual rendering (i.e. scene description normative behaviour)

Figure 6 depicts the operation of an MPEG-U widget for TV in a DLNA environment.

Figure 6 – MPEG-U for TV in a DLNA environment

MPEG-U is a 3-part standard

  • Part 1 – Widgets
  • Part 2 – Additional gestures and multimodal interaction
  • Part 3 – Conformance and reference software


High efficiency coding and media delivery in heterogeneous environments (MPEG-H) is an integrated standard that resumes the original MPEG “one and trine” Systems-Video-Audio standards approach. As with those earlier standards, the 3 parts can be, and actually are, used independently, e.g. in video streaming applications. On the other hand, ATSC has adopted the full Systems-Video-Audio triad with extensions of its own.

MPEG-H has 15 parts, as follows

  1. Part 1 – MPEG Media Transport (MMT) is the solution for the new world of broadcasting where delivery of content can take place over different channels each with different characteristics, e.g. one-way (traditional broadcasting) and two-way (the ever more pervasive broadband network). MMT assumes that the Internet Protocol is common to all channels.
  2. Part 2 – High Efficiency Video Coding (HEVC) is the latest approved MPEG video coding standard, supporting a range of functionalities: scalability, multiview, chroma formats from 4:2:0 to 4:4:4, up to 16 bits per sample, Wide Colour Gamut, High Dynamic Range and Screen Content Coding
  3. Part 3 – 3D Audio is the latest approved MPEG audio coding standard, supporting enhanced 3D audio experiences
  4. Parts 4, 5 and 6 – Reference software for MMT, HEVC and 3D Audio
  5. Parts 7, 8 and 9 – Conformance testing for MMT, HEVC and 3D Audio
  6. Part 10 – MPEG Media Transport FEC Codes specifies several Forward Error Correcting Codes for use by MMT.
  7. Part 11 – MPEG Composition Information specifies an extension to HTML5 for use with MMT
  8. Part 12 – Image File Format specifies a file format for individual images and image sequences
  9. Part 13 – MMT Implementation Guidelines collects useful guidelines for MMT use
  10. Parts 14 – Conversion and coding practices for high-dynamic-range and wide-colour-gamut video and 15 – Signalling, backward compatibility and display adaptation for HDR/WCG video are technical reports to guide users in supporting HDR/WCG.
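
Part 10 is about forward error correction. The sender adds redundant data so that the receiver can rebuild lost packets without asking for a retransmission, which matters on one-way broadcast channels. The simplest instance of the idea is a single XOR parity packet computed over a block of equal-length source packets, sketched below in Python. This only illustrates the principle. It is not one of the FEC codes specified in MPEG-H Part 10, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    # All source packets in a block are assumed to have the same length.
    if len({len(p) for p in packets}) != 1:
        raise ValueError("source packets must have equal length")
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> bytes:
    # A single parity packet can repair at most one missing packet per block.
    if received.count(None) != 1:
        raise ValueError("exactly one packet must be missing")
    present = [p for p in received if p is not None]
    return reduce(xor_bytes, present, parity)


block = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(block)
print(recover([b"abcd", None, b"ijkl"], parity))
```

A code like this repairs only one loss per block. Practical FEC codes generally send more repair data so that several losses per block can be recovered.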


Dynamic adaptive streaming over HTTP (DASH) is a suite of standards for the efficient and easy streaming of multimedia using available HTTP infrastructure (particularly servers and CDNs, but also proxies, caches, etc.). DASH was motivated by the popularity of HTTP streaming and the existence of different protocols used in different streaming platforms, e.g. different manifest and segment formats.

By developing the DASH standard for HTTP streaming of multimedia content, MPEG has enabled a standard-based client to stream content from any standard-based server, thereby enabling interoperability between servers and clients of different vendors.

As depicted in Figure 7, the multimedia content is stored on an HTTP server in two components: 1) the Media Presentation Description (MPD), which describes a manifest of the available content, its various alternatives, their URL addresses and other characteristics, and 2) Segments, which contain the actual multimedia bitstreams in the form of chunks, in single or multiple files.

Figure 7 – DASH model
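
As a rough illustration of the client side of this model, the Python sketch below uses the standard library to parse a minimal, hand-written MPD. It lists the available Representations with their `bandwidth` attribute and picks one that fits a measured throughput. The element names and the `urn:mpeg:dash:schema:mpd:2011` namespace come from the MPD schema. The sample MPD, the 0.8 safety factor and the selection rule are assumptions of this sketch, because DASH deliberately leaves the adaptation logic to the client.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace of the MPD schema in ISO/IEC 23009-1.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hand-written, simplified MPD (segment information omitted) used only for illustration.
SAMPLE_MPD = """<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"
     mediaPresentationDuration="PT10S" minBufferTime="PT2S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v480" bandwidth="1000000" width="854" height="480"/>
      <Representation id="v720" bandwidth="2500000" width="1280" height="720"/>
      <Representation id="v1080" bandwidth="5000000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_text: str) -> List[Tuple[str, int]]:
    """Return (id, bandwidth in bit/s) for every Representation, lowest bandwidth first."""
    root = ET.fromstring(mpd_text)
    reps = [
        (rep.get("id"), int(rep.get("bandwidth")))
        for rep in root.iterfind("mpd:Period/mpd:AdaptationSet/mpd:Representation", NS)
    ]
    return sorted(reps, key=lambda rep: rep[1])


def pick_representation(reps: List[Tuple[str, int]], throughput_bps: float,
                        safety: float = 0.8) -> Tuple[str, int]:
    """Client heuristic (not normative): highest bandwidth below a share of throughput."""
    affordable = [rep for rep in reps if rep[1] <= throughput_bps * safety]
    return affordable[-1] if affordable else reps[0]


reps = list_representations(SAMPLE_MPD)
print(pick_representation(reps, throughput_bps=4_000_000))
```

The safety factor leaves headroom for throughput fluctuations. Real players typically also take the buffer level into account, which this sketch does not attempt.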

Currently DASH is composed of 8 parts

  1. Part 1 – Media presentation description and segment formats specifies 1) the Media Presentation Description (MPD), which provides sufficient information for a DASH client to adaptively stream the content by downloading the media segments from an HTTP server, and 2) the segment formats, which specify the formats of the entity body of the response to an HTTP GET or partial HTTP GET request.
  2. Part 2 – Conformance and reference software is the regular component of an MPEG standard
  3. Part 3 – Implementation guidelines provides guidance to implementors
  4. Part 4 – Segment encryption and authentication specifies encryption and authentication of DASH segments
  5. Part 5 – Server and Network Assisted DASH specifies asynchronous network-to-client and network-to-network communication of quality-related assisting information
  6. Part 6 – DASH with Server Push and WebSockets specifies the carriage of MPEG-DASH media presentations over full-duplex HTTP-compatible protocols, particularly HTTP/2 and WebSockets
  7. Part 7 – Delivery of CMAF content with DASH specifies how content specified by the Common Media Application Format can be carried by DASH
  8. Part 8 – Session based DASH operation will specify a method for the MPD to manage DASH sessions, allowing the server to instruct the client about operations continuously applied during the session.
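
Part 1 also lets an MPD describe segment addresses compactly with a `SegmentTemplate`. Its `media` attribute contains identifiers such as `$RepresentationID$`, `$Number$`, `$Bandwidth$` and `$Time$`, optionally with a `%0<width>d` format tag, and the client substitutes them to build the URL of each HTTP GET. The sketch below shows that substitution in Python. The function name and the example template are hypothetical, and the sketch ignores details such as resolving the result against a `BaseURL`.

```python
import re
from typing import Optional

# $$ is an escaped dollar sign; the other identifiers may carry a %0<width>d tag.
_IDENTIFIER = re.compile(r"\$(RepresentationID|Number|Bandwidth|Time)(?:%0(\d+)d)?\$|\$\$")


def expand_template(template: str, representation_id: str,
                    number: Optional[int] = None,
                    bandwidth: Optional[int] = None,
                    time: Optional[int] = None) -> str:
    values = {"RepresentationID": representation_id, "Number": number,
              "Bandwidth": bandwidth, "Time": time}

    def substitute(match: re.Match) -> str:
        if match.group(0) == "$$":
            return "$"
        name, width = match.group(1), match.group(2)
        value = values[name]
        if value is None:
            raise ValueError(f"no value supplied for ${name}$")
        if width:
            # Zero-padded integer, e.g. $Number%05d$ with 42 gives 00042.
            return f"{int(value):0{int(width)}d}"
        return str(value)

    return _IDENTIFIER.sub(substitute, template)


url = expand_template("video/$RepresentationID$/seg-$Number%05d$.m4s", "v720", number=42)
print(url)
```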


Coded representation of immersive media (MPEG-I) represents the current MPEG effort to develop a suite of standards to support immersive media products, services and applications.

Currently MPEG-I has 11 parts but more parts are likely to be added.

  1. Part 1 – Immersive Media Architectures outlines possible architectures for immersive media services.
  2. Part 2 – Omnidirectional Media Format specifies an application format that enables consumption of omnidirectional video (aka Video 360). Version 2 is under development.
  3. Part 3 – Immersive Video Coding will specify the emerging Versatile Video Coding standard
  4. Part 4 – Immersive Audio Coding will specify metadata to enable enhanced immersive audio experiences compared to what is possible today with MPEG-H 3D Audio
  5. Part 5 – Video-based Point Cloud Compression will specify a standard to compress dense static and dynamic point clouds
  6. Part 6 – Immersive Media Metrics will specify different parameters useful for immersive media services and their measurability
  7. Part 7 – Immersive Media Metadata will specify systems, video and audio metadata for immersive experiences. One example is the current 3DoF+ Video activity
  8. Part 8 – Network-Based Media Processing will specify APIs to access remote media processing services
  9. Part 9 – Geometry-based Point Cloud Compression will specify a standard to compress sparse static and dynamic point clouds
  10. Part 10 – Carriage of Point Cloud Data will specify how to accommodate compressed point clouds in the MP4 File Format
  11. Part 11 – Implementation Guidelines for Network-based Media Processing is the usual collection of guidelines


Coding-Independent Code-Points (MPEG-CICP) is a collection of code points that have been assembled in single media- and technology-specific documents because they are not specific to any one standard.

Part 1 – Systems, Part 2 – Video and Part 3 – Audio collect the respective code points, and Part 4 – Usage of video signal type code points contains guidelines for their use.
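
To give a feel for what these code points are, the Python sketch below decodes a few of the video code points of Part 2 (also published with common text as ITU-T H.273). These are the values that, for example, the video usability information of AVC and HEVC bitstreams refers to. The sketch also flags HDR content, the topic of the Part 4 guidelines. The tables contain only a handful of values, and treating only BT.2020 primaries as wide colour gamut is a simplification of this sketch.

```python
# A few code points from ISO/IEC 23091-2 (ITU-T H.273); the tables are incomplete.
COLOUR_PRIMARIES = {1: "BT.709", 9: "BT.2020"}
TRANSFER_CHARACTERISTICS = {
    1: "BT.709",
    14: "BT.2020 (10-bit)",
    16: "SMPTE ST 2084 (PQ)",
    18: "ARIB STD-B67 (HLG)",
}
MATRIX_COEFFICIENTS = {0: "Identity (RGB)", 1: "BT.709", 9: "BT.2020 non-constant luminance"}

HDR_TRANSFERS = {16, 18}  # PQ and HLG


def describe(colour_primaries: int, transfer: int, matrix: int) -> dict:
    def lookup(table: dict, code: int) -> str:
        return table.get(code, f"other ({code})")

    return {
        "primaries": lookup(COLOUR_PRIMARIES, colour_primaries),
        "transfer": lookup(TRANSFER_CHARACTERISTICS, transfer),
        "matrix": lookup(MATRIX_COEFFICIENTS, matrix),
        "hdr": transfer in HDR_TRANSFERS,
        "wide_gamut": colour_primaries == 9,  # simplification: BT.2020 only
    }


print(describe(9, 16, 9))  # a typical PQ HDR10-style signal
print(describe(1, 1, 1))   # a typical SDR HD signal
```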


Genomic Information Representation (MPEG-G) is a suite of specifications, developed jointly with TC 276 Biotechnology, that makes it possible to reduce the amount of information required to losslessly store and transmit DNA reads from high-speed sequencing machines.

Figure 8 depicts the encoding process.

An MPEG-G file can be created with the following sequence of operations:

  1. Put the reads in the input file (aligned or unaligned) in bins corresponding to segments of the reference genome
  2. Classify the reads in each bin into 6 classes: P (perfect match with the reference genome), M (reads with variants), etc.
  3. Convert the reads of each bin to a subset of 18 descriptors specific to the class: e.g., a class P descriptor is the start position of the read, etc.
  4. Put the descriptors in the columns of a matrix
  5. Compress each descriptor column (MPEG-G uses the very efficient CABAC compressor already present in several video coding standards)
  6. Put compressed descriptors of a class of a bin in an Access Unit (AU) for a maximum of 6 AUs per bin

Figure 8 – MPEG-G compression
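
The six steps can be mimicked in a few lines of Python. This is a toy sketch with heavy simplifications. Reads are assumed to be already aligned (each comes with its reference position), and only two of the six classes are modelled: P, and M restricted to substitutions. The descriptor names (`pos`, `rlen`, `mmpos`, `mmbase`) are hypothetical rather than the 18 descriptors of Part 2, and `zlib` stands in for CABAC.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

BIN_SIZE = 100  # hypothetical bin width, in reference-genome positions


def classify(read: str, pos: int, reference: str) -> Tuple[str, List[Tuple[int, str]]]:
    # P: perfect match; M: mismatches (substitutions only in this sketch).
    segment = reference[pos:pos + len(read)]
    mismatches = [(i, b) for i, (b, r) in enumerate(zip(read, segment)) if b != r]
    return ("M", mismatches) if mismatches else ("P", [])


def encode(reads: List[Tuple[str, int]], reference: str) -> Dict[Tuple[int, str], Dict[str, bytes]]:
    # Steps 1-4: bin and classify each read, then append its descriptors to
    # the descriptor columns of its (bin, class) pair.
    columns = defaultdict(lambda: defaultdict(list))
    for read, pos in reads:
        cls, mismatches = classify(read, pos, reference)
        cols = columns[(pos // BIN_SIZE, cls)]
        cols["pos"].append(str(pos))
        cols["rlen"].append(str(len(read)))
        if cls == "M":
            cols["mmpos"].append(",".join(str(i) for i, _ in mismatches))
            cols["mmbase"].append("".join(b for _, b in mismatches))
    # Steps 5-6: compress each descriptor column separately; one access unit per (bin, class).
    return {
        key: {name: zlib.compress("\n".join(values).encode()) for name, values in cols.items()}
        for key, cols in columns.items()
    }


reference = "ACGT" * 60
reads = [("ACGTACGT", 0), ("ACGAACGT", 4), ("GTAC", 150)]
access_units = encode(reads, reference)
print(sorted(access_units))
```

Compressing each column separately lets the compressor exploit the statistics of one kind of value at a time (positions, lengths, bases), which is the reason for step 4.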

MPEG-G currently includes 6 parts

  1. Part 1 – Transport and Storage of Genomic Information specifies the file and streaming formats
  2. Part 2 – Genomic Information Representation specifies the algorithm to compress DNA reads from high-speed sequencing machines
  3. Part 3 – Genomic information metadata and application programming interfaces (APIs) specifies the metadata and APIs to access an MPEG-G file
  4. Part 4 – Reference Software and Part 5 – Conformance are the usual components of a standard
  5. Part 6 – Genomic Annotation Representation will specify how to compress annotations.


Internet of Media Things (MPEG-IoMT) is a suite of specifications:

  1. API to discover Media Things,
  2. Data formats and API to enable communication between Media Things.

A Media Thing (MThing) is the media “version” of IoT’s Things.

The IoMT reference model is represented in Figure 9

Figure 9: IoT in MPEG is for media – IoMT

Currently MPEG-IoMT includes 4 parts

  1. Part 1 – IoMT Architecture will specify the architecture
  2. Part 2 – IoMT Discovery and Communication API specifies Discovery and Communication API
  3. Part 3 – IoMT Media Data Formats and API specifies Media Data Formats and API
  4. Part 4 – Reference Software and Conformance is the usual part of MPEG standards


General Video Coding (MPEG-5) is expected to contain video coding specifications. Currently two specifications are envisaged:

  1. Part 1 – Essential Video Coding is expected to be the specification of a video codec with two layers. The first layer will provide a significant improvement over AVC while performing significantly worse than HEVC, and the second layer will provide a significant improvement over HEVC while performing significantly worse than VVC.
  2. Part 2 – Low Complexity Video Coding Enhancements is expected to be the specification of a data stream structure defined by two component streams, a base stream decodable by a hardware decoder, and an enhancement stream suitable for software processing implementation with sustainable power consumption. The enhancement stream will provide new features such as compression capability extension to existing codecs, lower encoding and decoding complexity, for on demand and live streaming applications. The LCEVC decoder is depicted in Figure 18.

Figure 18: Low Complexity Enhancement Video Coding
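
The base-plus-enhancement idea can be illustrated with a toy reconstruction in Python. A base layer is decoded at half resolution (in a real device, by whatever codec the hardware provides) and upscaled, and residuals carried in the enhancement stream are added on top. Nearest-neighbour upscaling and plain additive residuals on 8-bit samples are assumptions of this sketch. The actual LCEVC upsampling filters and residual coding tools are not reproduced here.

```python
from typing import List

Frame = List[List[int]]


def upsample_2x(frame: Frame) -> Frame:
    # Nearest-neighbour upscaling: each base sample becomes a 2x2 block.
    out: Frame = []
    for row in frame:
        wide = [sample for sample in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reconstruct(base: Frame, residuals: Frame) -> Frame:
    # Add enhancement-layer residuals to the upscaled base and clip to 8 bits.
    upscaled = upsample_2x(base)
    return [
        [min(255, max(0, p + r)) for p, r in zip(up_row, res_row)]
        for up_row, res_row in zip(upscaled, residuals)
    ]


base = [[10, 20]]                             # 1x2 base-layer picture
residuals = [[1, -1, 0, 2], [0, 0, -3, 1]]    # 2x4 enhancement residuals
print(reconstruct(base, residuals))
```

In this model, a device without the enhancement decoder could still display the upscaled base picture. The enhancement stream only refines it.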

That’s all?

Well, yes, in terms of standards that have been developed, are being developed or being extended, or for which MPEG thinks that a standard should be developed. Well, no, because MPEG is a forge of ideas and new proposals may come at every meeting.

Currently MPEG is investigating the following topics

  1. In-advance signalling of MPEG container content is motivated by scenarios where the full content of a file is not available to a player, but the player needs to decide whether or not to retrieve the file. The player therefore needs sufficient information to determine whether it can play the entire content, only a part of it, or none of it.
  2. Data Compression continues the search for non-typical media areas that can benefit from MPEG’s compression expertise. Currently MPEG is investigating data compression for machine tools.
  3. MPEG-21 Based Smart Contracts investigates the benefits of converting MPEG-21 contract technologies, which can be human readable, to smart contracts for execution on blockchains.


The MPEG work plan (March 2019)


In “Life inside MPEG” I introduced the MPEG work plan. The clock in MPEG moves fast and that work plan is now obsolete. Here is a new, reformatted version of the MPEG work plan as of March 2019.

The MPEG work plan at a glance

Figure 1 shows the main standards that MPEG has developed or is developing in the 2017-2023 period. The figure is organised in 3 main sections:

  • Media Coding (e.g. AAC and AVC)
  • Systems and Tools (e.g. MPEG-2 TS and File Format)
  • Beyond Media (currently Genome Compression).

Figure 1 – The MPEG work plan (March 2019)

Disclaimer: dates in the figure and in the following are all planned.

Navigating the areas of the MPEG work plan

The 1st column in Figure 2 gives the currently active MPEG standardisation areas. The first row gives the currently active MPEG standards. The non-empty white cells give the number of “deliverables” (Standards, Amendments and Technical Reports) currently identified in the work plan.

Figure 2 – Standards (S), Amendments (A) and Technical Reports (T) in the MPEG work plan (as of March 2019)

Video coding

In the Video coding area MPEG is currently developing specifications for 4 standards (MPEG-H, -I, -5 and -CICP) and is conducting explorations in advanced technologies for immersive visual experiences.


MPEG-H Part 2 – High Efficiency Video Coding 4th edition specifies a profile of HEVC that encodes a single (i.e. monochrome) colour plane and is restricted to a maximum of 10 bits per sample, as done in past HEVC range extensions profiles, as well as additional Supplemental Enhancement Information (SEI) messages, e.g. fisheye video, SEI manifest and SEI prefix messages.


MPEG-I Part 3 – Versatile Video Coding (VVC), currently being developed jointly with VCEG, is the new video compression standard after HEVC. VVC is expected to reach FDIS stage in July 2020 for the core compression engine. Other parts, such as high-level syntax and SEI messages, will follow later.


MPEG-CICP Part 4 – Usage of video signal type code points 2nd edition will document additional combinations of commonly used code points and baseband signalling.


MPEG-5 is still awaiting approval, but MPEG has already obtained from its Calls for Proposals (CfP) all the technologies necessary to develop standards with the intended functionalities and performance.

  1. Part 1 – Essential Video Coding will specify a video codec with two layers: layer 1 significantly improves over AVC but performs significantly worse than HEVC, and layer 2 significantly improves over HEVC but performs significantly worse than VVC.
  2. Part 2 – Low Complexity Video Coding Enhancements will specify a data stream structure defined by two component streams: stream 1 is decodable by a hardware decoder, stream 2 can be decoded in software with sustainable power consumption. Stream 2 provides new features such as compression capability extension to existing codecs, lower encoding and decoding complexity, for on demand and live streaming applications.


As part of the explorations, MPEG experts are collaborating in the development of support tools, the acquisition of test sequences and the understanding of technologies required for 6DoF and light fields.

  1. Compression of 6DoF visual will enable a user to move more freely than in 3DoF+, eventually allowing any translation and rotation in space.
  2. Compression of dense representation of light fields is stimulated by new devices that capture light fields with both spatial and angular light information. As the data is large and different from traditional images, effective compression schemes are required.

Audio coding

In the Audio coding area MPEG is working on 2 standards (MPEG-D and -I).


In MPEG-D Part 5 – Uncompressed Audio in MP4 File Format, MPEG extends MP4 to enable carriage of uncompressed audio (e.g. PCM). At the moment MP4 only carries compressed audio.


MPEG-I Part 4 – Immersive Audio. As MPEG-H 3D Audio already supports a 3DoF user experience, MPEG-I builds upon it to provide a 6DoF immersive audio experience. A Call for Proposals will be issued in October 2019. Submissions are expected in October 2021 and FDIS stage is expected to be reached in April 2022. Even though this standard will not be about compression but about metadata, as for 3DoF+ Visual, we have kept this activity under Audio Coding.

3D Graphics Coding

In the 3D Graphics Coding area MPEG is developing two parts of MPEG-I.

  • Part 5 – Video-based Point Cloud Compression (V-PCC) for which FDIS stage is planned to be reached in October 2019.
  • Part 9 – Geometry-based Point Cloud Compression (G-PCC) for which FDIS stage is planned to be reached in January 2020.

The two PCC standards employ different technologies and target different application areas: generally speaking, entertainment and automotive/unmanned aerial vehicles, respectively.

Font Coding

In the Font coding area MPEG is working on a new edition of MPEG-4 part 22.

Part 22 – Open Font Format 4th edition specifies support for complex layouts and additional support for new layout features. FDIS stage will be reached in April 2020.

Genome Coding

In the Genome coding area MPEG has reached FDIS stage for the 3 foundational parts of the MPEG-G standard:

  • Part 1 – Transport and Storage of Genomic Information
  • Part 2 – Genomic Information Representation
  • Part 3 – Genomic information metadata and application programming interfaces (APIs).

In October 2019 MPEG will complete Part 4 – Reference Software and Part 5 – Conformance. In July 2019 MPEG will issue a Call for Proposals for Part 6 – Genomic Annotation Representation.

Neural Network Coding

Compression of this type of data is motivated by the increasing use of neural networks in many applications that require the deployment of a particular trained network instance to a potentially large number of devices, which may have limited processing power and memory.

MPEG has restricted the general field to neural networks trained with media data, e.g. for object identification and content description. It is therefore developing the standard in MPEG-7, which already contains two standards – CDVS and CDVA – that offer similar functionalities achieved with different technologies (the standard should therefore be classified under Media description).


In MPEG-7 Part 17 – Compression of neural networks for multimedia content description and analysis, MPEG is developing a standard that enables compression of artificial neural networks trained with audio and video data. FDIS is expected in January 2021.
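
One of the simplest ways to shrink a trained network is to quantise its weights, for example from 32-bit floating point to 8-bit integers. This alone cuts storage by a factor of 4 before any entropy coding. The Python sketch below shows plain uniform scalar quantisation of a list of weights. It is a generic illustration of the kind of technique involved, not the toolset of MPEG-7 Part 17, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantise(weights: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    # Map the range [min, max] uniformly onto the integers 0 .. 2**bits - 1.
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    indices = [round((w - lo) / step) for w in weights]
    return indices, lo, step


def dequantise(indices: List[int], lo: float, step: float) -> List[float]:
    return [lo + q * step for q in indices]


weights = [-0.5, -0.12, 0.0, 0.25, 0.5]
indices, lo, step = quantise(weights)
restored = dequantise(indices, lo, step)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(indices, max_error, step / 2)
```

Only the integer indices plus the two floats `lo` and `step` need to be stored, and the reconstruction error per weight is bounded by half a quantisation step.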

Media Description

Media description is the goal of the MPEG-7 standard which contains technologies for describing media, e.g. for the purpose of searching media.

In the Media Description area MPEG completed Part 15 – Compact descriptors for video analysis (CDVA) in October 2018 and is now working on 3DoF+ visual.


MPEG-I Part 7 – Immersive Media Metadata will specify a set of metadata that enables a decoder to provide a more realistic user experience in OMAF v2. The FDIS is planned for July 2021.

System support

In the System support area MPEG is working on MPEG-4, -H and -I.


MPEG-4 Part 34 – Registration Authorities aligns the existing MPEG-4 Registration Authorities with current ISO practice.


In MPEG-H, MPEG is working on:

Part 10 – MPEG Media Transport FEC Codes. This is being enhanced with the Window-based FEC code. FDAM is expected to be reached in January 2020.


MPEG-I Part 6 – Immersive Media Metrics specifies the metrics and measurement framework in support of immersive media experiences. FDIS stage is planned to be reached in July 2020.


In the Transport area MPEG is working on MPEG-2, -4, -B, -H, -DASH, -I and Explorations.


MPEG-2 Part 1 – Systems continues to be a lively area of work 25 years after MPEG-2 Systems reached FDIS. After producing Edition 7, MPEG is working on two amendments to carry two different types of content:

  • Carriage of JPEG XS in MPEG-2 TS
  • Carriage of associated CMAF boxes for audio-visual elementary streams in MPEG-2 TS


MPEG-4 Part 12 – ISO Base Media File Format continues to be a lively area of work 20 years after the MP4 File Format reached FDIS. MPEG is working on two amendments:

  • Corrected audio handling, expected to reach FDAM in July 2019
  • Compact movie fragment is expected to reach FDAM stage in January 2020
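
For readers less familiar with the format these amendments extend: an ISOBMFF file is a sequence of boxes. Each box starts with a 32-bit big-endian size and a four-character type; a size of 1 means that a 64-bit size follows the type, and a size of 0 means that the box extends to the end of the file. The Python sketch below walks the top-level boxes of a buffer using only that header layout. It does not descend into container boxes or parse payloads, and the sample data is synthetic.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_offset, payload_size) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, size - header
        offset += size


sample = (
    struct.pack(">I4s4sI4s", 20, b"ftyp", b"isom", 0, b"isom")  # ftyp box
    + struct.pack(">I4s", 8, b"free")                            # empty free box
    + struct.pack(">I4s", 0, b"mdat") + b"abc"                   # mdat to end of file
)
print(list(iter_boxes(sample)))
```

Because every box declares its own size, a reader can skip box types it does not understand. This is one reason the format can keep being extended by amendments such as those listed above.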


In MPEG-B MPEG is working on two new standards

  • Part 14 – Partial File Format provides a standard mechanism to store HTTP entities and the partial file in broadcast applications for later cache population. The standard is planned to reach FDIS stage in July 2020.
  • Part 15 – Carriage of Web Resources in ISOBMFF will make it possible to enrich audio/video content, as well as audio-only content, with synchronised, animated, interactive web data, including overlays. The standard is planned to reach FDIS stage in January 2020.


In MPEG-DASH MPEG is working on

  • Part 1 – Media presentation description and segment formats will see a new edition in July 2019 and will be enhanced with an Amendment on Client event and timed metadata processing. FDAM is planned to be reached in January 2020.
  • Part 3 – MPEG-DASH Implementation Guidelines 2nd edition will become TR in July 2019
  • Part 5 – Server and network assisted DASH (SAND) will be enriched by an Amendment on Improvements on SAND messages. FDAM is to be reached in July 2019.
  • Part 7 – Delivery of CMAF content with DASH is a Technical Report with guidelines on the use of the most popular delivery schemes for CMAF content using DASH. TR is planned to be reached in March 2019.
  • Part 8 – Session based DASH operation will reach FDIS in July 2020.


MPEG-I Part 2 – Omnidirectional Media Format (OMAF), released in October 2017, is the first standard format for the delivery of omnidirectional content. With the OMAF 2nd edition (Interactivity support for OMAF), planned to reach FDIS in July 2020, MPEG is extending OMAF with 3DoF+ functionalities.

Application Formats

MPEG-A ISO/IEC 23000 Multimedia Application Formats is a suite of standards for combinations of MPEG and other standards (the latter are used only if no suitable MPEG standard exists for the purpose). MPEG is working on

Part 19 – Common Media Application Format 2nd edition with support of new formats

Application Programming Interfaces

The Application Programming Interfaces area comprises standards that make possible effective use of some MPEG standards.


MPEG-I Part 8 – Network-based Media Processing (NBMP) specifies a framework that will allow users to describe media processing operations to be performed by the network. The standard is expected to reach FDIS stage in January 2020.
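
In NBMP the operations are described as a workflow of media processing tasks that the network sets up and connects. To picture what such a description implies, the Python sketch below represents a hypothetical workflow (ingest, two transcodes, DASH packaging) as a dependency graph. It then uses `graphlib` from the standard library (Python 3.9+) to derive the stages in which tasks could be started, where tasks in the same stage could run in parallel. The task names and the dictionary format are assumptions of this sketch. NBMP's actual description formats and APIs are not reproduced here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks whose output it consumes.
workflow: Dict[str, Set[str]] = {
    "ingest": set(),
    "transcode_720p": {"ingest"},
    "transcode_1080p": {"ingest"},
    "package_dash": {"transcode_720p", "transcode_1080p"},
}


def stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    # Group tasks into stages; every task in a stage has all its inputs ready.
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


print(stages(workflow))
```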

Media Systems

Media Systems includes standards or Technical Reports targeting architectures and frameworks.


MPEG-IoMT Part 1 – IoMT Architecture is expected to reach FDIS stage in October 2019. The architecture used in this standard is compatible with the IoT architecture developed by JTC 1/SC 41.

Reference Implementation

MPEG is working on the development of standards for reference software of MPEG-4, -7, -A, -B, -V, -H, -DASH, -G and -IoMT.


MPEG is working on the development of standards for conformance of MPEG-4, -7, -A, -B, -V, -H, -DASH, -G and -IoMT.

The MPEG standards

MPEG uses acronyms for its standards, and industry knows the standards by those acronyms. Here you will find the full list of MPEG standards, ordered by their 5-digit ISO numbers.

MPEG-1 ISO/IEC 11172 Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s

MPEG-2 ISO/IEC 13818 Generic coding of moving pictures and associated audio information

MPEG-4 ISO/IEC 14496 Coding of audio-visual objects

MPEG-7 ISO/IEC 15938 Multimedia content description interface

MPEG-21 ISO/IEC 21000 Multimedia Framework

MPEG-A ISO/IEC 23000 Multimedia Application Formats

MPEG-B ISO/IEC 23001 MPEG systems technologies

MPEG-C ISO/IEC 23002 MPEG video technologies

MPEG-D ISO/IEC 23003 MPEG audio technologies

MPEG-E ISO/IEC 23004 Multimedia Middleware

MPEG-V ISO/IEC 23005 Media context and control

MPEG-M ISO/IEC 23006 Multimedia service platform technologies

MPEG-U ISO/IEC 23007 Rich media user interfaces

MPEG-H ISO/IEC 23008 High efficiency coding and media delivery in heterogeneous environments

MPEG-DASH ISO/IEC 23009 Dynamic adaptive streaming over HTTP (DASH)

MPEG-I ISO/IEC 23090 Coded representation of immersive media

MPEG-CICP ISO/IEC 23091 Coding-Independent Code-Points

MPEG-G ISO/IEC 23092 Genomic Information Representation

MPEG-IoMT ISO/IEC 23093 Internet of Media Things

MPEG-5 ISO/IEC 23094 General Video Coding


Looking inside an MPEG meeting


In “There is more to say about MPEG standards” I presented the entire spectrum of MPEG standards. No one should deny that it is an impressive set of disparate technologies, integrated to cover fields connected by the common thread of Data Compression: Coding of Video, Audio, 3D Graphics, Fonts, Digital Items, Sensors and Actuators Data, Genome, and Neural Networks; Media Description and Composition; Systems support; Intellectual Property Management and Protection (IPMP); Transport; Application Formats; API; and Media Systems.

How on earth can all these technologies be specified and integrated in MPEG standards to respond to industry needs?

This article will try to answer this question. It will do so by starting, as many novels do, from the end (of an MPEG meeting).

Let’s start from the end (of an MPEG meeting)

When an MPEG meeting closes, the plenary approves the results of the week, marking the end of formal collaborative work within the meeting. Back in 1990 MPEG developed a mechanism – called “ad hoc group” (AhG) – that allows a form of collaboration to continue. This mechanism allows MPEG experts to continue working together, albeit with limitations:

  1. In the scope, i.e. an AhG may only work on the areas identified by the mandates (in Latin ad hoc means “for a specific purpose”). Of course experts are free to work individually on anything and in any way that pleases them;
  2. In the purpose, i.e. an AhG may only prepare recommendations – in the scope of its mandates – to be submitted to MPEG. This is done at the beginning of the following meeting, after which the AhG is disbanded;
  3. In the method of work, i.e. an AhG operates under the leadership of one or more Chairs. Clearly, though, the success of an AhG depends very much on the attitude and activity of its members.

On average, some 25 AhGs are established at each meeting. There is no one-to-one correspondence between MPEG activities and AhGs. Indeed, AhGs are great opportunities to explore new and possibly cross-subgroup ideas.

Examples of AhG titles are

  1. Scene Description for MPEG-I
  2. System technologies for Point Cloud Coding (PCC)
  3. Network Based Media Processing (NBMP)
  4. Compression of Neural Networks (NNR).

What happens between MPEG meetings

An AhG uses different means to carry out collaborative work: reflectors, teleconferences and physical meetings. The last can only be held if they were scheduled in the AhG establishment form. Unscheduled physical meetings may only be held if there is unanimous agreement of those who subscribed to the AhG.

Most AhGs hold scheduled meetings on the weekend that precedes the next MPEG meeting. These are very useful to coordinate the results of the work done and to prepare the report that all AhGs must make to the MPEG plenary on the following Monday.

AhG meetings, including those in the weekend preceding the MPEG meeting, are not formally part of an MPEG meeting.

An MPEG meeting at a glance

Chairs meeting

MPEG chairs meet three times during an MPEG week:

  1. On Sunday evening to review the progress of AhG work, coordinate activities impacting more than one Subgroup and plan activities to be carried out during the week including the need for joint meetings;
  2. On Tuesday evening to assess the result of the first two days of work, review the work plan and time lines based on the expected outcomes and identify the need for new joint meetings;
  3. On Thursday evening to wrap up the expected results and review the preliminary results of the week.

Plenary meetings

During an MPEG week MPEG holds 3 plenaries:

  1. On Monday morning, to make everybody aware of the results of the work carried out since the last meeting and to plan the work of the week. AhG reports are a main part of it: they are presented and, when necessary, discussed;
  2. On Wednesday morning to make everybody aware of the work done in all subgroups in the first two days and to plan work for the next two days;
  3. On Friday afternoon to approve the results of the work of Subgroups, including liaison letters, to establish new AhGs etc.

Subgroup, Breakout Group and Joint meetings

Subgroups start their meetings on Monday afternoon. They review their own activities and kick off work in their areas. Each Subgroup assigns activities to breakout groups (BoG), which meet on their own schedules to achieve the goals assigned. Each Subgroup may hold other brief meetings to keep everybody in the Subgroup in sync with the general progress of the work.

For instance, the current activities of the Systems Subgroup are: File format, DASH, OMAF, OMAF and DASH, OMAF and MIAF, MPEG Media Transport, Network Based Media Processing and PCC Systems.

The MPEG structure is designed to facilitate interactions between different Subgroups, and between BoGs from different Subgroups, on matters that affect several of them because they sit at the interface of MPEG subsystems. For example, the table below lists the joint meetings that the Systems Subgroup held with other Subgroups at the January 2019 meeting.

Table 1 – Joint meetings of the Systems Subgroup with other Subgroups

Systems meeting with | Topics
Reqs, Video, VCEG | SEI messages in VVC
Audio, 3DG | Scene Description
3DG | Systems for Point Cloud Compression
3DG | API for multiple decoders
Audio | Uncompressed Audio
Reqs, JVET, VCEG | Immersive Decoding Interface

NB: VCEG is the Video Coding Experts Group of ITU-T Study Group 16. It is not an MPEG Subgroup.

On Friday morning all Subgroups approve their own results. These are automatically integrated in the general document to be approved by the MPEG Plenary on Friday afternoon.

Advisors meeting

On Monday evening, an informal group of experts from different countries examines issues of general (non-technical) interest. In particular, it issues calls for meeting hosts, reviews hosting proposals and recommends meeting hosts to the plenary.

A bird’s eye view of an MPEG meeting

Figure 1 depicts the workflow described in the paragraphs above, from the end of meeting N-1 to the end of meeting N.

Figure 1 – A snapshot of MPEG work from the end of a meeting to the end of the next meeting

What is “done” at an MPEG meeting?

Around 500 of the world’s best experts attend an MPEG meeting. That is an incredible amount of brain power. In the following I will try to explain how this brain power is directed.

An example – the video area

Let’s take as an example the work done in the Video Coding area at the March 2019 meeting.

The table below has 3 columns:

  1. The standards on which work is done (Video has worked on MPEG-H, MPEG-I, MPEG-CICP, MPEG-5 and Explorations)
  2. The names of the activities and
  3. The types of documents resulting from the activities (see the following legend for an explanation of the acronyms).

Table 2 – Documents produced in the video coding area

Std | Activity | Document type
MPEG-H | High Efficiency Video Coding (HEVC) | TM, CE, CTC
MPEG-I | Versatile Video Coding (VVC) | WD, TM, CE, CTC
MPEG-I | 3 Degrees of Freedom + (3DoF+) coding | CfP, WD, TM, CE, CTC
MPEG-CICP | Usage of video signal type code points (Ed. 1) | TR
MPEG-CICP | Usage of video signal type code points (Ed. 2) | WD
MPEG-5 | Essential Video Coding | WD, TM, CE, CTC
MPEG-5 | Low Complexity Enhancement Video Coding | CfP, WD, TM, CE, CTC
Explorations | 6 Degrees of Freedom (6DoF) coding | EE, Tools
Explorations | Coding of dense representation of light fields | EE, CTC


TM: Test Model, software implementing the standard (encoder & decoder)

WD: Working Draft

CE: Core Experiment, i.e. the definition of an experiment that should improve performance

CTC: Common Test Conditions, to be used by all CE participants

CfP: Call for Proposals (at this meeting no new CfP was produced, only reports and analyses of submissions in response to CfPs)

TR: Technical Report (ISO document)

EE: Exploration Experiment, an experiment to explore an issue that is not yet mature enough to be a CE

Tools: other supporting material, e.g. software developed for common use in CEs/EEs

What is produced by an MPEG meeting

Figure 2 gives the number of activities for each type of activity defined in the legend (and others that were not part of the work in the video area). For instance, out of a total of 97 activities:

  1. 29 relate to processing of standards through the canonical stages of Committee Draft (CD), Draft International Standard (DIS) and Final Draft International Standard (FDIS), and the equivalent stages for Amendments, Technical Reports and Corrigenda. In other words, at every meeting MPEG is working on ~10 “deliverables” (i.e. standards, amendments, technical reports or corrigenda) in the approval stages;
  2. 22 relate to working drafts, i.e. “new” activities that have not entered the approval stages;
  3. 8 relate to Technologies under Consideration, i.e. new technologies that are being considered to enhance existing standards;
  4. 8 relate to requirements, typically for new standards;
  5. 6 relate to Core Experiments;
  6. Etc.


Figure 2 – Activities at an MPEG meeting

Figure 2 does not provide a quantitative measure of “how many” documents were produced for each activity or “how big” they were. As an example, Point Cloud Compression has 20 Core Experiments and 8 Exploration Experiments under way, while MPEG-5 EVC has only one large CE.

At the March 2019 meeting the average number of output documents per activity was 2.2 (212 output documents divided by 97 activities).


MPEG holds quarterly meetings with an attendance of ~500 experts. If we assume that the average salary of an MPEG expert is 500 $/working day and that every expert stays 6 days (to account for attendance at AhG meetings), the industry investment in attending MPEG meetings is 1.5 M$/meeting or 6 M$/year. Of course, the total investment is more than that and probably in excess of 1B$ a year.
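The estimate above is simple arithmetic; the sketch below only makes its assumptions explicit so they can be varied. The figures (500 experts, 500 $/working day, 6 days per expert, 4 meetings a year) are the ones stated in the text; the constant and function names are illustrative.

```python
# Back-of-the-envelope estimate of the salary cost of attending MPEG meetings,
# using the assumptions stated in the text (names are illustrative).
EXPERTS_PER_MEETING = 500   # approximate attendance
DAILY_COST_USD = 500        # assumed average salary per working day
DAYS_PER_EXPERT = 6         # meeting week plus the preceding AhG meetings
MEETINGS_PER_YEAR = 4       # MPEG meets quarterly


def cost_per_meeting(experts: int, daily_cost: int, days: int) -> int:
    """Salary cost of one meeting: experts x cost per day x days attended."""
    return experts * daily_cost * days


per_meeting = cost_per_meeting(EXPERTS_PER_MEETING, DAILY_COST_USD, DAYS_PER_EXPERT)
per_year = per_meeting * MEETINGS_PER_YEAR

print(f"{per_meeting / 1e6:.1f} M$/meeting, {per_year / 1e6:.1f} M$/year")
```

Changing any one assumption (e.g. the daily cost) rescales both results linearly.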

With the meeting organisation described above MPEG tries to get the most out of the industry investment in MPEG standards.

The article MPEG: what it did, is doing, will do recounts my statistically not insignificant experience of asking taxi drivers across different cities of the world if they know MPEG. I do not have a similar amount of data to report for ISO, but I am pretty sure that if I asked a taxi driver if they know ISO, the yes rate would be considerably lower than for MPEG.

This is not to the merit of MPEG or the demerit of ISO as organisations. MPEG – Moving Picture Experts Group – is lucky to deal with things that let people make content that other people can see and hear in ever new ways. ISO – International Organisation for Standardisation – is an organisation with the mission to develop international standards for anything that is neither telecommunication – the purview of the International Telecommunication Union (ITU) – nor electrotechnical – the purview of the International Electrotechnical Commission (IEC).

The ISO organisation

The above may seem rather abstract, so let’s see what the difference means in practice. ISO is a huge organisation structured in Technical Committees (TC). Actually, the structure is more complex than that (see Figure 1), but for the purpose of what I want to say, this is enough.

The first 3 – still active – TCs in ISO are: TC 1 Screw threads, TC 2 Fasteners and TC 4 Rolling bearings. The standards produced by these TCs are industrially very important, but the topics hardly make people’s hearts beat faster. The last 3 TCs in order of establishment are TC 322 Sustainable finance, TC 323 Circular economy and TC 324 Sharing economy. The standards produced by these TCs are important for the financial industries, but probably little known even in financial circles. Between these two extremes we have a large number of TCs, e.g., TC 35 Paints and varnishes, TC 186 Cutlery and table and decorative metal hollow-ware, TC 249 Traditional Chinese medicine, TC 282 Water reuse, TC 297 Waste collection and transportation management, etc.

ISO TCs work on areas of human endeavour that are extremely important to industrial and social life. Many of these activities, however, do not say much to the man in the street.

Where is MPEG in this picture? To answer this question I need to dig deeper into the ISO organisation. Most TCs do not have a monolithic structure. They are organised in working groups (WG). TCs retain key functions such as strategy and management, and WGs are tasked to develop standards. In quite a few cases the area of responsibility is so broad that a horizontal organisation would not be functional. In this case a TC may decide to establish Subcommittees (SC). They are like mini TCs, with WGs developing standards under them.

Figure 1 – ISO governance structure

In 1987 ISO/TC 97 Data Processing merged with IEC/TC 83 Information technology equipment. The resulting (joint) technical committee was and is called ISO/IEC JTC 1 Information Technology. One of its SCs, SC 2 Character sets and Information Coding, included WG 8 Coding of Audio and Picture Information. WG 8 established the Moving Picture Experts Group (MPEG) in January 1988. In 1991, when SC 2/WG 8 seceded from SC 2 and became SC 29, MPEG became WG 11 Coding of audio, picture, multimedia and hypermedia information (but everybody calls it MPEG).

MPEG changed the world of media

Those who have survived the description of the ISO organigram will now have the opportunity to understand how this group of experts, in the depths of the ISO (and IEC) organisation, changed the world of media and impacted the lives of billions of people, probably all of those on the face of the Earth, if we exclude some hermits in the few surviving tropical forests, the many deserts or the frozen lands.

The main reason for the success of MPEG is that for 30 years it had carte blanche to implement its ideas. Some of them were clear at the outset, others took shape from a process of learning on the job.

Let’s revisit MPEG’s ideas of standardisation to understand what it did and why.

Idea #1 – Single standards for all countries and industries

The first idea relates to the scope of MPEG standards. In the analogue world, the absence or scarce availability of broadband communication, deliberate policies, or the natural separation between industries that traditionally had little in common favoured the definition of country-based or industry-based standards. The first steps toward digital video undertaken by countries and industries trod similar paths: different countries and industries tried their own way independently.

MPEG jumped onto the scene at a time when the different trials had not yet had time to solidify, and the epochal analogue-to-digital transition gave MPEG a unique opportunity to effect its disruptive action.

MPEG knew that it was technically possible to develop generic standards that could be used in all countries of the world and in all industries that needed compressed digital media. MPEG saw that all actors affected – manufacturers, service providers and end users – would gain if such a bold action was taken. When MPEG began to tread its adventurous path, MPEG did not know whether it was procedurally possible to achieve that goal. But it gambled and gave it a try. It used the Requirements subgroup to develop generic requirements, acted on the major countries and trade/standards associations of the main industries and magically got their agreement.

The network of liaisons and, sometimes, joint activities is the asset that allowed MPEG to implement idea #1 and helped achieve many of the subsequent goals.

Idea #2 – Standards for the market, not the other way around

Standards are ethereal entities, but their impact is very concrete. This was true and well understood in the world of analogue media. At that time a company that had developed a successful product would try to get a “standard” stamp on it, share the technology with its competitors and enjoy the economic benefits of their “standard” technology.

With its second idea MPEG reshuffled the existing order of steps. Instead of waiting for the market to decide which technology would win – an outcome that very often had little to do with the value of the technology – MPEG offered its standard development process where the collaboratively defined “best” is developed and assessed by MPEG experts who decide which individual technology wins. Then the “standard” technology package developed by MPEG is taken over by the market.

MPEG standards are consistently the best standards at a given time. Those who have technologies selected to be part of MPEG standards reap the benefits and most likely will continue investing in new technologies for future standards.

Idea #3 – Standards anticipate the future

The third idea is a consequence of the first two. MPEG-1 was driven by the expected possibilities of the audio and video compression technologies of the time. It was a bet on silicon making it possible to execute the complex operations implied by the standard so that industry could build products of which there was no evidence but only educated guesses: interactive video on CD and digital audio broadcasting. Ironically, neither really took off, but other products that relied on the MPEG-1 technologies – Video CD and MP3 – were (the former) and still are (the latter) extremely successful.

MPEG standards anticipate market needs. They are regularly bets that a certain standard technology will be adopted. In More standards – more successes – more failures you can see how some MPEG standards are extremely successful and others less so.

Idea #4 – Industry-friendly standards

The fourth idea was simple and disruptive. Since the first instances of television in the 1920s, industry and governments have created tens of television formats, mostly around the basic NTSC, PAL and SECAM families. Even in the late 1960s, when the Picturephone was developed, AT&T invented a new 267-line format, with no obvious connection with any of the existing video formats.

MPEG never wanted to define its own format. With its fourth idea, propped up by the nature of digital technologies, it just decided that it would support any format. Here is how it did it:

  1. One standard with no options (this should be obvious, because it is what a standard should be about)
  2. Standards apply only to decoders; encoders are implicitly defined and have ample margins of implementation freedom
  3. Profiles (hierarchical, if possible) to accommodate special industry needs within the same standard
  4. Decoders are defined by their ability to process data, quantised in levels (based on bitrate, resolution etc.)
  5. How different formats are handled is outside of MPEG standards.
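Points 2 and 4 can be made concrete with a small sketch. It assumes a hypothetical set of three levels, each capping bitrate and picture size; the level names and limits are invented for illustration and are not those of any MPEG profile or level. Conformance is judged only on what a bitstream demands of the decoder, so any encoder that stays within the limits is acceptable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    name: str
    max_bitrate_kbps: int   # hypothetical cap on the coded bitrate
    max_luma_samples: int   # hypothetical cap on picture size (width x height)


# Illustrative values only, ordered from least to most demanding.
# They are NOT the limits of any actual MPEG level definition.
LEVELS = [
    Level("low", 4_000, 352 * 288),
    Level("main", 20_000, 1920 * 1080),
    Level("high", 80_000, 3840 * 2160),
]


def lowest_level(bitrate_kbps: int, width: int, height: int) -> Optional[str]:
    """Return the lowest level whose limits cover the bitstream, or None."""
    samples = width * height
    for level in LEVELS:
        if bitrate_kbps <= level.max_bitrate_kbps and samples <= level.max_luma_samples:
            return level.name
    return None


def decoder_can_decode(decoder_level: str, bitrate_kbps: int, width: int, height: int) -> bool:
    """A decoder of a given level must decode every bitstream at or below that level."""
    needed = lowest_level(bitrate_kbps, width, height)
    if needed is None:
        return False
    names = [level.name for level in LEVELS]
    return names.index(needed) <= names.index(decoder_level)
```

For example, `decoder_can_decode("main", 3_000, 352, 288)` is true, while a 30 Mbit/s 3840x2160 bitstream needs a "high" decoder. How the picture format itself is handled stays outside the check, as in point 5.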

Idea #5 – Audio and video come together

The fifth idea was kind of obvious but no less disruptive. Because of the way audio and video industries had developed – audio for a century and video for half a century – people working on the corresponding technologies tended to operate in “watertight compartments”, be they in academia, research or companies. That attitude had some justification in the analogue world because the relevant technologies were indeed different and there was not so much added value in keeping the technologies together, considering the big effort needed to keep the experts together.

However, the digital world with its commonality of technologies, no longer justified keeping the two domains separate. That is why MPEG, just 6 months after its first meeting, kicked off the Audio subgroup after successfully assembling in a few months the best experts.

This injection of new technology with the experts that carried it was not effortless. When transformed into digital, audio and video signals are bits and bits and bits, but the sources are different and influence how they are compressed. Audio experts shared some (at a high level) compression technologies – Subband and Discrete Cosine Transform – but video is (was) a 2D signal changing in time often with “objects” in it, while audio is (was) a 1D signal. More importantly, audio experts were driven by other concerns such as the way the human hearing process handles the data coming out of the frequency analysis carried out by the human cochlea.

The audio work was never “dependent” on the video work. MPEG audio standards can have a stand-alone use (i.e. they do not assume that there is a video associated with it), but there is no MPEG video standard that is without an MPEG Audio standard. So it was necessary to keep the two together and it is even more important to do so now that video and audio are both 3D signals changing in time.

Idea #6 – Don’t forget the glue that keeps audio and video together

The sixth idea can be described by the formula

Audio and Video ≠ (Audio + Video)

This may look cryptic but it states the obvious. Having audio and video together does not necessarily mean that audio and video will play together in the right way if they are stored on a disk or transmitted over a channel.

The fact that MPEG established a Digital Storage Media subgroup and a Systems subgroup 18 months after its foundation signals that MPEG has always been keenly aware that a bitstream composed of MPEG audio and video bitstreams needs to be transported in order to be played back as intended by the bitstream creator. In MPEG-1 it was a bitstream in a controlled environment, in MPEG-2 a bitstream in a noisy environment, from MPEG-4 on it was on IP, and in MPEG-DASH it had to deal with the unpredictability of the Internet Protocol in the real world.

During its existence the issue of multiplexing and transport formats has shaped MPEG standards. Without a Systems subgroup, efficiently compressed audio and video bitstreams would have remained floating in space without a standard means to plug them into real systems.
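The job of the systems layer can be illustrated with a toy multiplexer. It is a minimal sketch, assuming each elementary stream arrives as a list of access units already in time order and that a presentation time stamp in milliseconds is enough to align them. The `AccessUnit` type and its fields are hypothetical; real MPEG systems layers carry much more signalling, e.g. clock references and buffering information.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class AccessUnit(NamedTuple):
    pts_ms: int     # presentation time stamp in milliseconds (hypothetical unit)
    stream: str     # "audio" or "video"
    payload: bytes  # compressed data; the multiplexer never looks inside


def multiplex(*streams: Iterable[AccessUnit]) -> Iterator[AccessUnit]:
    """Interleave time-ordered elementary streams into one timestamp-ordered stream."""
    return heapq.merge(*streams, key=lambda au: au.pts_ms)


def demultiplex(muxed: Iterable[AccessUnit], stream: str) -> List[AccessUnit]:
    """Receiver side: extract one elementary stream, keeping its timestamps."""
    return [au for au in muxed if au.stream == stream]


# 25 frame/s video (one picture every 40 ms) and audio frames of 24 ms
# (1152 samples at 48 kHz, as in MPEG-1 Layer III).
video = [AccessUnit(40 * i, "video", b"") for i in range(3)]
audio = [AccessUnit(24 * i, "audio", b"") for i in range(4)]
muxed = list(multiplex(video, audio))
print([(au.stream, au.pts_ms) for au in muxed])
```

Because the timestamps travel with the data, the receiver can separate the streams again and still present audio and video in sync, which is exactly what simply concatenating the two bitstreams would not guarantee.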

Idea #7 – Integrated standards as toolkits

Most MPEG standards are composed of the 3 key elements – audio, video and systems – that make an audio-visual system and some, such as MPEG-4 and MPEG-I, even include 3D Graphics information. These standards are integrated in the sense that, if you need a complete solution, you can get what you need from the package offered by MPEG.

The world is more complicated than that. Some users want to cherry-pick technologies. In the case of MPEG-I, most likely MPEG will not standardise a Scene Description technology but will just indicate how externally defined technologies can be plugged into the system.

With its seventh idea MPEG is ready to satisfy the needs of all customers. It defines the means to signal how an external technology can be plugged into a set of other native MPEG technologies. With one caveat: the customer has to take care of integrating the external technology. That MPEG will not do.

Idea #8 – Technology is always on the move

To describe the eighth idea, I will seek help from the Greek philosopher Heraclitus (or whoever was the person who said it): τὰ πάντα ῥεῖ καὶ οὐδὲν μένει (everything flows and nothing stays). Digital technologies move fast and actually accelerate. By applying ideas #3, #4, #5, #6 and #7, MPEG standards accelerated the orderly transition from analogue to digital media. By applying ideas #1 and #2, MPEG standards prompted technology convergence, with the merging of industry segments and the appearance of new players.

The eighth idea reminds MPEG that the technology landscape is constantly changing and this awareness must inform its standards. Until HEVC – one can even say, including the upcoming Versatile Video Coding (VVC) – video meant coding a 2D rectangular area (in MPEG-4, a flat area of any shape). The birth of immersive visual experiences is not without pain, but they are becoming possible and MPEG must be ready with solutions that take this basic assumption into account. This means that, in the technology scenario that is shaping up, the MPEG role of “anticipatory standards” is ever more important and ever more challenging to achieve.

Idea #9 – The nature and borders of compression

The ninth idea goes down to the very nature of compression. What is the meaning of compression? Is it “fewer bits is always good” or can it also be “as few meaningful bits as possible is also good”? The former is certainly desirable but, as the nature of information consumption changes and compression digs deeper into the nature of information, compressed representations that offer easier access to the information embedded in the data become more valuable.

What is the scope of application of MPEG compression? When MPEG started the MPEG-1 standards work, the gap that separated the telecom and CE industries (the first two industries in attendance at that time) was as wide as the gap between the media industry and, say, the genomic industry today. Both are digital now and the dialogue gets easier.

With patience and determination MPEG has succeeded in creating a common language and mindset in the media industries. This is an important foundation of MPEG standards. The same amalgamation will continue between MPEG and other industries.

Now the results

Figure 2 intends to attach some concreteness to the nine ideas illustrated above by showing some of the most successful MPEG standards issued from 31 years of MPEG activity.


Figure 2 – Some successful MPEG standards


An entity at the lowest layer of the ISO hierarchy has masterminded the transition of media from the analogue to the digital world. Its standards underpin the evolution of digital media, foster the creation of new industries and offer unrelenting growth to old and new industries worth in excess of 1 trillion USD per year.

Many thanks to the parent body SC 29 for managing the balloting of MPEG standards.

Data compression in MPEG

That video is a high-profile topic for people interested in MPEG is obvious – MP stands for Moving Pictures – and is shown by the most visited article in this blog, Forty years of video coding and counting. Audio is also a high-profile topic. This should not be a surprise, given that the official MPEG title is “Coding of Moving Pictures and Audio”, and it is confirmed by the fact that Thirty years of audio coding and counting has received almost as many visits as the previous one.

What is less known, but potentially very important, is that MPEG has already developed a few standards for compression of a wide range of other data types. Point Cloud is the data type whose profile is rising by the day, but there are many more types, as represented by the table below.

Figure 1 – Data types and relevant MPEG standards



Video

The articles Forty years of video coding and counting and More video with more features provide a detailed history of video compression in MPEG from two different perspectives. Here I will briefly list the video-coding related standards produced or being produced by MPEG mentioned in the table.

  • MPEG-1 and MPEG-2 both produced widely used video coding standards.
  • MPEG-4 has been much more prolific.
    • It started with Part 2 Visual
    • It continued with Part 9 Reference Hardware Description, which provides a reference hardware description of the standard expressed in VHDL (VHSIC Hardware Description Language), a hardware description language used in electronic design automation.
    • Part 10 is the still high-riding Advanced Video Coding standard.
    • Parts 29, 31 and 33 are the result of three attempts at developing Option 1 video compression standards (in a simple but imprecise way, standards that do not require payment of royalties).
  • MPEG-5 is currently expected to be a standard with 2 parts:
    • Part 1 Essential Video Coding will have a base layer/profile which is expected to be Option 1 and a second layer/profile with a performance ~25% better than HEVC. Licensing terms are expected to be published by patent holders within 2 years.
    • Part 2 Low Complexity Enhancement Video Coding (LCEVC) will be a two-layer video coding standard. The lower layer is not tied to any specific technology and can be any video codec; the higher layer is used to extend the capability of an existing video codec.
  • MPEG-7 is about Multimedia Content Description. There are different tools to describe visual information:
    • Part 3 Visual is a form of compression as it provides tools to describe Color, Texture, Shape, Motion, Localisation, Face Identity, Image signature and Video signature.
    • Part 13 Compact Descriptors for Visual Search can be used to compute compressed visual descriptors of an image. An application is to get further information about an image captured e.g. with a mobile phone.
    • Part 15 Compact Descriptors for Video Analysis can be used to manage and organise large-scale databases of video content, e.g. to find content containing a specific object instance or location.
  • MPEG-C is a collection of video technology standards that do not fit in other standards. Part 4 – Media Tool Library is a collection of video coding tools (called Functional Units) that can be assembled using the technology standardised in MPEG-B Part 4 Codec Configuration Representation.
  • MPEG-H part 2 High Efficiency Video Coding is the latest MPEG video coding standard with an improved compression of 60% compared to AVC.
  • MPEG-I is the new standard, mostly under development, for immersive technologies
    • Part 3 Versatile Video Coding is the ongoing project to develop a video compression standard with an expected 50% more compression than HEVC.
    • MPEG-I part 7 Immersive Media Metadata is the current project to develop a standard for compressed Omnidirectional Video that allows limited translational movements of the head.
    • Exploration in 6 Degrees of Freedom (6DoF) and Lightfield are ongoing.
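The compression gains quoted above combine multiplicatively, not additively. The sketch below reads “X% more compression” as “X% lower bitrate at the same quality”, which is an assumption (other readings, such as a compression ratio 1.6 times higher, give different figures), and starts from a hypothetical 10 Mbit/s AVC stream.

```python
def bitrate_after_saving(bitrate_mbps: float, saving: float) -> float:
    """Bitrate at equal quality when a codec saves `saving` (0..1) of the bits."""
    return bitrate_mbps * (1.0 - saving)


avc = 10.0                              # hypothetical AVC bitrate, Mbit/s
hevc = bitrate_after_saving(avc, 0.60)  # HEVC: 60% better than AVC (as stated above)
vvc = bitrate_after_saving(hevc, 0.50)  # VVC: expected 50% better than HEVC
print(f"AVC {avc:.1f} Mbit/s, HEVC {hevc:.1f} Mbit/s, VVC {vvc:.1f} Mbit/s")
print(f"VVC vs AVC saving: {1 - vvc / avc:.0%}")
```

Under this reading the two generations together save 80% of the AVC bitrate, not 60% + 50%.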


Audio

The article Thirty years of audio coding and counting provides a detailed history of audio compression in MPEG. Here I will briefly list the audio-coding related standards produced or being produced by MPEG mentioned in the table.

  • MPEG-1 part 3 Audio produced, among others, the foundational digital audio standard better known as MP3.
  • MPEG-2
    • Part 3 Audio extended the stereo user experience of MPEG-1 to Multichannel.
    • Part 7 Advanced Audio Coding is the foundational standard on which MPEG-4 AAC is based.
  • MPEG-4 part 3 Advanced Audio Coding (AAC) currently supports some 10 billion devices and software applications, growing by half a billion units every year.
  • MPEG-D is a collection of different audio technologies:
    • Part 1 MPEG Surround provides an efficient bridge between stereo and multi-channel presentations in low-bitrate applications as it can transmit 5.1 channel audio within the same 48 kbit/s transmission budget.
    • Part 2 Spatial Audio Object Coding (SAOC) allows very efficient coding of a multi-channel signal that is a mix of objects (e.g. individual musical instruments).
    • Part 3 Unified Speech and Audio Coding (USAC) combines the tools for speech coding and audio coding into one algorithm with a performance that is equal or better than AAC at all bit rates. USAC can code multichannel audio signals, and can also optimally encode speech content.
    • Part 4 Dynamic Range Control is a post-processor for any type of MPEG audio coding technology. It can modify the dynamic range of the decoded signal as it is being played.

2D/3D Meshes

Polygon meshes can be used to represent the approximate shape of a 2D image or a 3D object. 3D mesh models are used in various multimedia applications such as computer games, animation and simulation. MPEG-4 provides various compression technologies:

  • Part 2 Visual provides a standard for 2D and 3D Mesh Compression (3DMC) of generic, but static, 3D objects represented by first-order (i.e., polygonal) approximations of their surfaces. 3DMC has the following characteristics:
    • Compression: Near-lossless to lossy compression of 3D models
    • Incremental rendering: No need to wait for the entire file to download to start rendering
    • Error resilience: 3DMC has a built-in error-resilience capability
    • Progressive transmission: Depending on the viewing distance, a reduced accuracy may be sufficient
  • Part 16 Animation Framework eXtension (AFX) provides a set of compression tools for Shape, Appearance and Animation.

Face/Body Animation

Imagine you have a face model that you want to animate remotely. How do you represent the information that animates the model in a bit-thrifty way? MPEG-4 Part 2 Visual has an answer to this question with its Facial Animation Parameters (FAP). FAPs are defined at two levels.

  • High level
    • Viseme (visual equivalent of phoneme)
    • Expression (joy, anger, fear, disgust, sadness, surprise)
  • Low level: 66 FAPs associated with the displacement or rotation of the facial feature points.

In the figure, feature points affected by FAPs are indicated by a black dot. Other feature points are indicated by a small circle.

Figure 2 – Facial Animation Parameters

It is possible to animate a default face model in the receiver with a stream of FAPs, or a custom face can be initialised by downloading Face Definition Parameters (FDP) with specific background images, facial textures and head geometry.

MPEG-4 Part 2 uses a similar approach for Body Animation.
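Why a FAP stream can be bit-thrifty is easier to see with a data structure in hand. The sketch below is a hypothetical container for one FAP frame, with a uniform quantiser and a sparse list that keeps only the FAPs that actually move; it assumes that in a given frame only some of the 66 low-level values are non-zero. The class, field names, step size and zero-based FAP positions are illustrative and do not reproduce the MPEG-4 Part 2 bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

NUM_LOW_LEVEL_FAPS = 66  # number of low-level FAPs given in the text
EXPRESSIONS = ("joy", "anger", "fear", "disgust", "sadness", "surprise")


@dataclass
class FapFrame:
    """One time instant of face animation (hypothetical, not the bitstream syntax)."""
    viseme: Optional[int] = None        # high level: hypothetical viseme index
    expression: Optional[str] = None    # high level: one of EXPRESSIONS
    low_level: List[float] = field(default_factory=lambda: [0.0] * NUM_LOW_LEVEL_FAPS)

    def __post_init__(self) -> None:
        if self.expression is not None and self.expression not in EXPRESSIONS:
            raise ValueError(f"unknown expression: {self.expression}")
        if len(self.low_level) != NUM_LOW_LEVEL_FAPS:
            raise ValueError("expected 66 low-level FAP values")


def quantise(values: List[float], step: float) -> List[int]:
    """Uniform quantisation of FAP amplitudes to small integers."""
    return [round(v / step) for v in values]


def sparse(indices: List[int]) -> List[Tuple[int, int]]:
    """Keep only the FAPs that move: (position in the list, quantised value)."""
    return [(i, q) for i, q in enumerate(indices) if q != 0]


frame = FapFrame(expression="joy")
frame.low_level[3] = 0.42    # hypothetical displacement of one feature point
frame.low_level[10] = -0.21  # hypothetical displacement of another
print(sparse(quantise(frame.low_level, 0.1)))
```

Instead of 66 floating-point numbers, this frame reduces to one expression label and two small integer pairs.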

Scene Graphs

So far MPEG has never developed a Scene Description technology. In 1996, when the development of the MPEG-4 standard required it, it took the Virtual Reality Modelling Language (VRML) and extended it to support MPEG-specific functionalities. Of course compression could not be absent from the list. So the Binary Format for Scenes (BiFS), specified in MPEG-4 Part 11 Scene description and application engine was born to allow for efficient representation of dynamic and interactive presentations, comprising 2D & 3D graphics, images, text and audiovisual material. The representation of such a presentation includes the description of the spatial and temporal organisation of the different scene components as well as user-interaction and animations.

In MPEG-I scene description is again playing an important role. However, this time MPEG does not even intend to pick a scene description technology. Instead, it will define interfaces to scene description parameters.


Fonts

Many thousands of fonts are available today for use as components of multimedia content. Content often uses custom-designed fonts that may not be available on a remote terminal. To ensure faithful appearance and layout of content, the font data have to be embedded with the text objects as part of the multimedia presentation.

MPEG-4 part 18 Font Compression and Streaming defines and provides two main technologies:

  • OpenType and TrueType font formats
  • Font data transport mechanism – the extensible font stream format, signaling and identification


Multimedia is a combination of multiple media in some form. Probably the closest multimedia “thing” in MPEG is the standard called Multimedia Application Formats (MPEG-A). However, MPEG-A is an integrated package of media for specific applications and does not define any specific media format. It only specifies how MPEG (and sometimes other) formats can be combined.

MPEG-7 part 5 Multimedia Description Schemes (MDS) specifies the description tools that are neither visual nor audio, i.e. generic and multimedia tools. MDS comprises a large number of MPEG-7 description tools and, starting from the basic audio and visual structures, enables the creation of the structure of a description, the description of collections and user preferences, and the hooks for adding the audio and visual description tools. This is depicted in Figure 3.

Figure 3 – The different functional groups of MDS description tools

Neural Networks

Requirements for neural network compression have been exposed in Moving intelligence around. After 18 months of intense preparation, with development of requirements, identification of test material, definition of test methodology and drafting of a Call for Proposals (CfP), at its March 2019 (126th) meeting MPEG analysed nine technologies submitted by industry leaders. The proposed technologies compress neural network parameters to reduce their size for transmission, while not reducing, or only moderately reducing, their performance in specific multimedia applications. The new standard will be MPEG-7 Part 17, titled Neural Network Compression for Multimedia Description and Analysis.
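The proposals themselves are not described here, but the basic trade of parameter precision for size can be illustrated with plain uniform scalar quantisation. This is a generic technique, not the method selected by MPEG; the sketch assumes 32-bit float weights quantised to 8-bit indices with a single offset and step size for the whole tensor.

```python
from array import array
from typing import List, Tuple


def quantise(weights: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map each weight to an integer index on a uniform grid spanning [min, max]."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return [round((w - lo) / step) for w in weights], lo, step


def dequantise(indices: List[int], lo: float, step: float) -> List[float]:
    """Reconstruct approximate weights from indices, offset and step."""
    return [lo + i * step for i in indices]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
indices, lo, step = quantise(weights)
restored = dequantise(indices, lo, step)

original_bytes = len(array("f", weights).tobytes())  # float32 storage
quantised_bytes = len(bytes(indices))                # one byte per index (valid for bits <= 8)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is reconstructed to within half a quantisation step; practical schemes add entropy coding and more refined quantisers on top of this.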


MPEG-B part 1 Binary MPEG Format for XML (BiM) is the current endpoint of an activity that started some 20 years ago when MPEG-7 Descriptors defined by XML schemas were compressed in a standard fashion by MPEG-7 Part 1 Systems. Subsequently MPEG-21 needed XML compression and the technology was extended in Part 15 Binary Format.

In order to reach high compression efficiency, BiM relies on schema knowledge shared by encoder and decoder. It also provides fragmentation mechanisms for transmission and processing flexibility, and defines means to compile and transmit schema knowledge information to enable decompression of XML documents without a priori schema knowledge at the receiving end.
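A toy illustration of why shared schema knowledge helps: if encoder and decoder both know the element names allowed by the schema, the encoder can send a small index instead of each tag string. This is not BiM's actual bitstream format; `SCHEMA_ELEMENTS` is a hypothetical stand-in for compiled schema knowledge, and child counts and text lengths are limited to 255 for simplicity.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical "compiled schema knowledge" shared by encoder and decoder.
SCHEMA_ELEMENTS = ["Mpeg7", "Description", "Title", "Creator"]


def encode(elem: ET.Element, out: List[int]) -> List[int]:
    """Depth-first: tag index, child count, text length, then UTF-8 text bytes."""
    text = (elem.text or "").strip().encode("utf-8")
    out += [SCHEMA_ELEMENTS.index(elem.tag), len(elem), len(text)]
    out.extend(text)
    for child in elem:
        encode(child, out)
    return out


def decode(data: bytes, pos: int = 0) -> Tuple[ET.Element, int]:
    """Rebuild the element tree using the same schema vocabulary."""
    tag, n_children, text_len = data[pos], data[pos + 1], data[pos + 2]
    elem = ET.Element(SCHEMA_ELEMENTS[tag])
    text = data[pos + 3:pos + 3 + text_len].decode("utf-8")
    if text:
        elem.text = text
    pos += 3 + text_len
    for _ in range(n_children):
        child, pos = decode(data, pos)
        elem.append(child)
    return elem, pos


xml = "<Mpeg7><Description><Title>News</Title><Creator>MPEG</Creator></Description></Mpeg7>"
encoded = bytes(encode(ET.fromstring(xml), []))
decoded, _ = decode(encoded)
```

Without the shared vocabulary, the decoder could not map the indices back to names, which is why BiM also defines a way to transmit schema knowledge.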


The article Genome is digital, and can be compressed presents the technology used in MPEG-G Genomic Information Representation. Many established compression technologies developed for other MPEG media have found good use in genome compression. MPEG is currently busy developing the MPEG-G reference software and is investigating other genomic areas where compression is needed. More concretely, MPEG plans to issue a Call for Proposals for Compression of Genome Annotation at its July 2019 (128th) meeting.

Point Clouds

3D point clouds can be captured with multiple cameras and depth sensors. The number of points can range from a few thousand to a few billion, and points can have attributes such as colour, material properties etc.

MPEG is developing two different standards, and the choice between them depends on whether the point cloud is dense (MPEG-I Part 5 Video-based Point Cloud Compression) or less so (MPEG-I Part 9 Graphic-based PCC). The algorithms in both standards are lossy, scalable and progressive, and support random access to subsets of the point cloud.

MPEG plans to release Video-based PCC as FDIS in October 2019 and Graphic-based PCC as FDIS in April 2020.


MPEG felt the need to address the compression of data from sensors and data to actuators when it considered the exchange of information taking place between the physical world where the user is located and any sort of virtual world generated by MPEG media.

So MPEG undertook the task to provide standard interactivity technologies that allow a user to

  • Map their real-world sensor and actuator context to a virtual-world sensor and actuator context, and vice-versa, and
  • Achieve communication between virtual worlds.

Figure 3 describes the context of the MPEG-V Media context and control standard.

Figure 3 – Communication between real and virtual worlds

The MPEG-V standard defines several data types and their compression:

  • Part 2 – Control information specifies the interoperability of control devices (actuators and sensors) in real and virtual worlds
  • Part 3 – Sensory information specifies the XML Schema-based Sensory Effect Description Language to describe actuator commands such as light, wind, fog, vibration, etc. that trigger human senses
  • Part 4 – Virtual world object characteristics defines a base type of attributes and characteristics of the virtual world objects shared by avatars and generic virtual objects
  • Part 5 – Data formats for interaction devices specifies syntax and semantics of data formats for interaction devices – Actuator Commands and Sensed Information – required to achieve interoperability in controlling interaction devices (actuators) and in sensing information from interaction devices (sensors) in real and virtual worlds
  • Part 6 – Common types and tools specifies syntax and semantics of data types and tools used across MPEG-V parts.

MPEG-IoMT Internet of Media Things, developed by MPEG, maps the general IoT context to MPEG media. MPEG-IoMT Part 3 – IoMT Media Data Formats and API also addresses the compression of data from media-based sensors and actuators.

What is next in data compression?

In Compression standards for the data industries I reported the proposal made by the Italian ISO member body to establish a Technical Committee on Data Compression Technologies. The proposal was rejected on the grounds that Data Compression is part of Information Technology.

It was a big mistake, because it has stopped the coordinated development of standards that would have fostered the move of different industries to the digital world. The article identified a few of them, such as Automotive, Industry Automation, Geographic information and more.

MPEG has done some exploratory work and found that quite a few of its existing standards could be extended to serve new application areas. One example is the conversion of MPEG-21 Contracts to Smart Contracts. Another area of potential interest is data generated by machine tools in industry automation.


MPEG audio and video compression standards are the staples of the media industry. MPEG continues to develop those standards while investigating compression of other data types, in order to be ready with standards when the market matures. Point clouds and DNA reads from high-speed sequencing machines are just two examples of how, by anticipating market needs, MPEG prepares to serve the industry with timely compression standards.

Posts in this thread

More video with more features

In Forty years of video coding and counting I presented a short but intense history of ITU and MPEG video compression standards. In this article I will focus on how MPEG video compression standards have acquired more functionalities over the years and how the next generation of standards will add even more.

The table below gives an overview of all MPEG video compression standards – past, present and planned. Those in italic have not reached Final Draft International Standard (FDIS) level.

Figure 1 – Video coding standards and functionalities

In 1988 MPEG started its first video coding project for interactive video applications on compact disc (MPEG-1). Input video was assumed to be progressive (25/29.97 Hz, but other frame rates were also supported) and spatial resolution was Source Input Format (SIF), i.e. 240 or 288 lines x 352 pixels. The syntax supported spatial resolutions up to 16 Kpixels. Progressive scanning is a feature that all MPEG video coding standards have supported since MPEG-1. The (obvious) exception is point clouds, because there are no “frames”.

In 1990 MPEG started its second video coding project targeting digital television (MPEG-2). Therefore the input was assumed to be interlaced (frame rate of 50/59.94 Hz, but other frame rates were also supported) and spatial resolution was standard/high definition, and up. The resolution space was quantised by means of levels, the second dimension after profiles. MPEG-4 Visual and AVC are the last two standards with specific interlace tools. An attempt was made to introduce interlace tools in HEVC, but the technologies presented did not show appreciable improvements compared with progressive tools. HEVC does have some indicators (SEI/VUI) to tell the decoder that the video is interlaced.

MPEG-2 was the first standard to tackle scalability (High Profile), multiview (Multiview Profile) and higher chroma resolution (4:2:2 Profile). Several subsequent video coding standards (MPEG-4 Visual, AVC and HEVC) also support these features. VVC is expected to do the same, though probably not in version 1.

MPEG-4 Visual supports coding of video objects and error resilience. The first feature has remained specific to MPEG-4 Visual. Most video codecs allow for some error resilience (e.g. starting from slices in MPEG-1). However, MPEG-4 Visual – mobile communication being one relevant use case – was the first to specifically consider error resilience as a tool.

MPEG-2 first tried to develop 10-bit support and the empty part 8 is what is left of that attempt.

Wide Colour Gamut (WCG), High Dynamic Range (HDR) and 3 Degrees of Freedom (3DoF) are all supported by AVC. These functionalities were first introduced in HEVC, later added to AVC, and are planned to be supported in VVC as well. WCG makes it possible to display a wider gamut of colours, HDR to display pictures with brighter regions and with more visible detail in dark areas, and 3DoF (also called Video 360) to represent pictures projected on a sphere.

AVC supports more than 8 quantisation bits per sample, up to 14 bits. HEVC even supports 16 bits. VVC, EVC and LCEVC are expected to support more than 8 quantisation bits as well.

WebVC was the first MPEG attempt at defining a video coding standard that would not require a licence that involves payment of fees (Option 1 in ISO language, legal language more complex than this). Strictly speaking, WebVC is not a new standard because MPEG has simply extracted what was the Constrained Baseline Profile in AVC (originally, AVC tried to define an Option 1 profile but did not achieve the goal and did not define the profile) with the hope that WebVC could achieve Option 1 status. The attempt failed because some companies confirmed their Option 2 patent declarations (i.e. a licence is required to use the standard) already made against the AVC standard. The brackets in the figure convey this fact.

Video Coding for Browsers (VCB) is the result of a proposal made by a company in response to an MPEG Call for Proposals for Option 1 video coding technology. Another company made an Option 3 patent declaration (i.e. unwillingness to license the technology). As the declaration did not contain any detail that could allow MPEG to remove the allegedly infringing technologies, ISO did not publish VCB as a standard. The square brackets in the figure convey this fact.

Internet Video Coding (IVC) is the third video coding standard intended to be Option 1. Three Option 2 patent declarations were received, and MPEG has declared its availability to remove patented technology from the standard if specific technology claims are made. The brackets convey this fact.

Finally, Essential Video Coding (EVC), part 1 of MPEG-5 (however, the project has not been formally approved by ISO yet), is expected to be a two-layer video coding standard. The EVC Call for Proposals requested that the technologies provided in response to the Call for the first (lower) layer of the standard be Option 1. Technologies for the second (higher) layer are Option 2. The curly brackets in the figure convey this fact.

Screen Content Coding (SCC) achieves better compression of non-natural (synthetic) material such as characters and graphics. It is supported by HEVC and is planned to be supported in VVC and possibly EVC.

Low Complexity Enhancement Video Coding (LCEVC) is another two-layer video coding standard. Unlike EVC, however, the LCEVC lower layer is not tied to any specific technology and can be any video codec. The goal of the second layer is to extend the capability of an existing video codec. A typical usage scenario is to give a large number of already deployed standard-definition set-top boxes, which cannot be recalled, the ability to decode high-definition pictures. The LCEVC decoder is depicted in Figure 2.
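The two-layer idea can be sketched as follows, assuming a base picture decoded at half resolution by any existing codec, a simple nearest-neighbour 2x upsampler and a residual picture carried by the enhancement layer. The actual LCEVC upsampling filters and residual coding tools are more elaborate and are not modelled here; the function names are hypothetical.

```python
from typing import List

Picture = List[List[int]]


def upscale_2x(base: Picture) -> Picture:
    """Nearest-neighbour upsampling: each base sample becomes a 2x2 block."""
    out: Picture = []
    for row in base:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reconstruct(base: Picture, residual: Picture, max_val: int = 255) -> Picture:
    """Add the enhancement-layer residual to the upsampled base and clip to the sample range."""
    up = upscale_2x(base)
    return [[min(max(u + r, 0), max_val) for u, r in zip(up_row, res_row)]
            for up_row, res_row in zip(up, residual)]


base = [[10, 20]]                              # output of any base-layer decoder
residual = [[1, -1, 0, 300], [0, 0, -30, 0]]   # decoded enhancement data
picture = reconstruct(base, residual)
```

In this arrangement the base decoder is used as-is, which mirrors the point that the LCEVC lower layer can be any codec.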

Figure 2 – Low Complexity Enhancement Video Coding

Today technologies are available to capture 3D point clouds, typically with multiple cameras and depth sensors producing up to billions of points for realistically reconstructed scenes. Point clouds can have attributes such as colors, material properties and/or other attributes and are useful for real-time communications, GIS, CAD and cultural heritage applications. MPEG-I part 5 will specify lossy compression of 3D point clouds employing efficient geometry and attributes compression, scalable/progressive coding, and coding of point clouds sequences captured over time with support of random access to subsets of the point cloud.

Other technologies capture point clouds, potentially with a low density of points, to allow users to freely navigate in multi-sensory 3D media experiences. Such representations require a large amount of data, which is not feasible to transmit on today’s networks. MPEG is developing a second, graphics-based PCC standard, as opposed to the previous, video-based one, for efficient compression of sparse point clouds.

3DoF+ is a term used by MPEG to indicate a usage scenario where the user can make translational movements of the head. In a 3DoF scenario, if the user moves the head too much, an annoying parallax error is felt. In March 2019 MPEG received responses to its Call for Proposals requesting appropriate metadata (see the red blocks in Figure 3) to help the Post-processor present the best image based on the viewer’s position, if available, or synthesise a missing one, if not.

Figure 3 – 3DoF+ use scenario

6DoF indicates a use scenario where the user can freely move in a space and enjoy a 3D virtual experience that matches the one in the real world. Light field refers to new devices that can capture a spatially sampled version of a light field that has both spatial and angular light information in one shot. The captured data are not only larger than traditional camera images but also different in nature. MPEG is investigating new and compatible compression methods for potential new services.

In 30 years compressed digital video has made a lot of progress, e.g., bigger and brighter pictures with less bitrate and other features. The end point is nowhere in sight.

Thanks to Gary Sullivan and Jens-Rainer Ohm for useful comments.


Matching technology supply with demand


There have always been people in need of technology and, most of the time, people ready to provide something in response to the demand. In book XVIII of the Iliad, Thetis, Achilles’s mother, asks Hephaestus, the god of blacksmiths, craftsmen, artisans, sculptors and more, to provide a new armour for her son, who had lost his to Hector. Hephaestus duly complied. Still in the fictional domain, but in more recent years, Agent 007 visits Q Branch to get the latest gadgets for his next spy mission, which are inevitably put to good use in the mission.

Wars have always been times when the need for technologies stretches the ability to supply them. In our supposedly peaceful age there is a lot of technology around, but it is often difficult for companies needing a particular technology to find a solution matching their needs and budget.

Supply and demand in standardisation

Standardisation is an interesting case of an entity, typically non-commercial and non-governmental, needing technologies to make a standard. Standards organisations, too, need technologies to accomplish their mission. How can they access the needed technologies?

A few decades back, if industry needed a standard for, say, a video cassette recorder, the process was definitely supply-driven: a company who had developed a successful product (call it Sony or JVC) submitted a proposal (call it Betamax or VHS) to a standards committee (call it IEC SC 60B). Too bad if the process produced two standards.

In the second half of the 1980’s, ITU SG XV (ITU numbering of that time) started developing the H.261 recommendation. Experts developed the standard piece by piece in the committee (so-called Okubo group) by acquiring the necessary technologies in a process where the roles of demand and supply were rather blurred. Only participants were entitled to provide their technologies to fulfill the needs of the standard.

A handful of years later, MPEG further innovated technology procurement in a standardisation environment. To get the technologies needed to make a standard, it used a demand-driven tool – MPEG’s Call for Proposals (CfP). Since then, technologies provided by respondents and assessed for relevance by the group are used to 1) create the initial reference model (RM0) and 2) initiate a first round of Core Experiments (CE). CEs result from the agreement among participating experts that there is room for improving the performance of the standard under development by opening a particular area to optimisation. CEs are continued until available room for optimisation is exhausted. While anybody is entitled to respond to a CfP and contribute technology to RM0, only experts participating in the standardisation project can provide technology for CEs. This, however, is not really a limitation because the process is open to anybody wishing to join a recognised standards organisation who is a member of ISO.

The MPEG process of standards development has allowed the industry to maintain a sustained development and expansion for many years. In fairness, this is not entirely MPEG’s merit. Patent pools have played a synergistic role to MPEG’s by providing (industry) users with the means to practice MPEG standards and IP holders the means to be remunerated for the use of their IP.

The situation today

The HEVC case has shown that the cooperation of different parties to achieve the common goal of enabling the use of a standard can no longer be taken for granted (see, e.g., A crisis, the causes and a solution). There are several reasons for this: the increasing number of individual technologies needed to make a high-performance MPEG standard, the increasing number of IP holders, the increasing number of Non-Practising Entities (NPE) as providers of technology and the increasing number of patent pools, each acting as an independent licensor of a portion of the patents.

I have already made several proposals intended to help MPEG out of this stalemate (see, e.g., Business model based ISO/IEC standards). Here I would like to present an additional idea that extends the MPEG process of standards development (see How does MPEG actually work?).

A new process proposal

A possible implementation of the proposal applying to an entity (a company, an industry forum or a standards organisation) wishing to develop a specification or a standard, could run like this (some details could be fine tuned or changed on a case-by-case basis):

  1. An entity (a company, an industry forum or a standards organisation) wishes to develop a specification or a standard
  2. The entity issues a Call for Proposals (CfP) including requirements, requesting proponents to accept the process (as defined in this numbered list) and to commit to RAND licensing of their technologies for the specification or standard
  3. The entity assesses the proposals received
  4. The entity sets aside a certain amount of tokens for the entire standard (e.g. 1,000)
  5. The entity builds a “minimal” Reference Model zero (RM0) using technologies contained in the proposals in a conservative way so as to create ample space for a healthy Core Experiment (CE) process
  6. The entity
    1. Assigns a percentage of tokens to RM0
    2. Establishes the amount of tokens that will be given to the proponent who achieves the highest performance in a CE (e.g. 1 token for each 0.1% improvement)
  7. The entity identifies and publishes CEs at the pace required by the specification or standard
  8. For each CE the entity makes a call containing
    1. A description of the CE
    2. A minimum performance target (say, at least 1% improvement)
    3. A deadline for submission of 1) results and 2) code that proves that the target has been achieved
    4. The maximum level of associated complexity
  9. CE proponents should do due diligence and
    1. Make proposals that contain only their own technologies, or
    2. Ask any third party to join in the response (and accept the conditions of the CfP)
  10. If the tokens are all used and there is still room for optimisation, new tokens are created and token holders have their tokens scaled down so that the total number of tokens is still 1,000
  11. If room for RM optimisation is exhausted but there are still tokens unassigned, token holders have their tokens scaled up so that the total number of tokens is 1,000.
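The token arithmetic in steps 4, 6, 10 and 11 can be made concrete with a small sketch. It uses the example values given in the list (1,000 tokens, a 1% minimum target, 1 token per 0.1% improvement); the function names, the holder labels and the use of fractional token shares after rescaling are assumptions for illustration.

```python
from typing import Dict

TOTAL_TOKENS = 1000        # step 4 example value
MIN_TARGET_PCT = 1.0       # step 8.2 example value
TOKENS_PER_TENTH_PCT = 1   # step 6.2 example value


def ce_award(improvement_pct: float) -> int:
    """Tokens earned by the best CE proposal (0 if the minimum target is missed)."""
    if improvement_pct < MIN_TARGET_PCT:
        return 0
    return round(improvement_pct * 10) * TOKENS_PER_TENTH_PCT


def rescale(holdings: Dict[str, float], total: int = TOTAL_TOKENS) -> Dict[str, float]:
    """Steps 10 and 11: scale every holding so the grand total is exactly `total`."""
    current = sum(holdings.values())
    return {holder: tokens * total / current for holder, tokens in holdings.items()}


# Step 10: RM0 took 800 tokens and two CEs earned 200 + 250, exceeding the budget.
over_budget = {"A": 800, "B": ce_award(20.0), "C": ce_award(25.0)}
scaled_down = rescale(over_budget)

# Step 11: optimisation ended with only 750 tokens assigned.
scaled_up = rescale({"A": 600, "B": 150})
```

Rescaling preserves each holder's relative share, so the order in which CEs happen to be run does not change who gets what proportion.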

Depending on the nature of the entity (company, industry forum or standards organisation), another entity, which can be the same entity that managed the process or a patent pool:

  1. Identifies who are IP holders in RM0 and CEs
  2. Removes technology in case the status of IP has not been clarified
  3. After completing steps 1 and 2, pays royalties to IP holders based on the number of tokens they have acquired in the process (i.e. RM0 and CEs)
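Step 3 amounts to a proportional split. The sketch below assumes a single royalty amount to be distributed and treats tokens tied to technology removed in step 2 as dropping out, with the remaining holders sharing pro rata; both are assumptions, since the text leaves such details to be fine-tuned case by case.

```python
from typing import Dict, Set


def distribute(royalties: float, tokens: Dict[str, float], cleared: Set[str]) -> Dict[str, float]:
    """Split royalties among IP holders whose status is clarified, pro rata to their tokens."""
    eligible = {holder: t for holder, t in tokens.items() if holder in cleared}
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {holder: royalties * t / total for holder, t in eligible.items()}


# Holder C's technology had an unclear IP status and was removed (step 2).
payout = distribute(1000.0, {"A": 500, "B": 300, "C": 200}, cleared={"A", "B"})
```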

Merits and limits of the proposal

The proposal achieves the goal to

  1. Associate patent holders to RM0 and CE areas as opposed to associate patent holders just to the standard
  2. Enable the turning off of technologies of a CE area if their IP status is unclear, and turning them on again once the status is clarified

In case the entity developing the specification is a standards organisation, more than one patent pool can develop a licence using the results of the process.


This idea was developed in collaboration with Malvika Rao, the founder of Incentives Research and holder of a PhD from Harvard University, and Don Marti, an open source expert and an advisor at Incentives Research.

Converting the basic concept described above into a workable market design requires further work. There may be opportunities to game the system, and the design must consider issues such as how to attract and retain participation. In addition the design must be tested (e.g., via simulation or usability study) to understand its performance.

Please send comments to Leonardo.


What would MPEG be without Systems?

The most visited articles on this blog, Forty years of video coding and counting and Thirty years of audio coding and counting, prove that MPEG is known for its audio and video coding standards. But I will never tire of saying that MPEG would not be what it has become if Systems aspects had not been part of most of its standards. This is what I intend to talk about in this article.

It is hard to acknowledge, but MPEG was not the first to deal with the problem of putting together digital audio and video for delivery purposes. In the second half of the 1980’s ITU-T dealt with the problem of handling audio-visual services using the basic ISDN access at 2B+D (2×64 kbit/s) or the primary ISDN access, the first digital streams made possible by ITU-T Recommendations.

Figure 1 depicts the solution specified in ITU Recommendation H.221. Let’s assume that we have 2 B channels at 64 kbit/s (Basic Access ISDN). H.221 creates on each B channel a Frame Structure of 80 bytes, i.e. 640 bits repeating itself 100 times per second. Each bit position in an octet can be considered as an 8 kbit/s sub-channel. The 8th bit in each octet represents the 8th sub-channel, called the Service Channel.

Within the Service Channel bits 1-8 are used by the Frame Alignment Signal (FAS) and bits 9-16 are used by the Bit Alignment Signal (BAS). Audio is always carried by the first B channel, e.g. by the first 2 subchannels, and Video and Data by the other subchannels (less the bitrate allocated to FAS and BAS).
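The arithmetic behind the frame structure, and how a receiver could pull one 8 kbit/s sub-channel out of an 80-octet frame, is sketched below. It assumes that bit 1 of each octet is the most significant bit; that mapping and the function names are assumptions for illustration, not a transcription of H.221.

```python
from typing import Dict, List

FRAME_OCTETS = 80
FRAMES_PER_SECOND = 100


def subchannel_rate() -> int:
    """Each bit position contributes one bit per octet per frame, in bit/s."""
    return FRAME_OCTETS * FRAMES_PER_SECOND


def subchannel(frame: bytes, n: int) -> List[int]:
    """Bits of sub-channel n (1..8) in one frame, assuming bit 1 = most significant bit."""
    if len(frame) != FRAME_OCTETS or not 1 <= n <= 8:
        raise ValueError("expected an 80-octet frame and n in 1..8")
    return [(octet >> (8 - n)) & 1 for octet in frame]


def service_channel(frame: bytes) -> Dict[str, List[int]]:
    """Split the 8th sub-channel into FAS (bits 1-8), BAS (bits 9-16) and the rest."""
    sc = subchannel(frame, 8)
    return {"FAS": sc[:8], "BAS": sc[8:16], "remaining": sc[16:]}


frame = bytes([0b10000001] * FRAME_OCTETS)
fields = service_channel(frame)
```

Eight such sub-channels at 8,000 bit/s each add up to the 64 kbit/s of a B channel.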

Figure 1 – ITU Recommendation H.221

MPEG-1 Systems

The solution depicted in Figure 1 bears the mark of the transmission part of the telecom industry, which had never been very friendly to packet communication. That is why MPEG in the late 1980’s had an opportunity to bring some fresh air into this space. Starting from a blank sheet of paper (at that time MPEG still used paper 😊) MPEG designed a flexible packet-based multiplexer to convey compressed audio and video, and clock information, in a single stream in such a way as to enable audio‑video synchronisation (Figure 2).

Figure 2 – MPEG-1 Systems

The MPEG Systems revolution took time to take effect. Indeed, the European Eureka 147 project used MPEG-1 Audio Layer 2 but designed a frame-based multiplexer for the Digital Audio Broadcasting service.

MPEG-2 Systems

In the early 1990’s MPEG started working on another blank sheet of paper. MPEG had the experience of MPEG-1 Systems design, but the requirements were significantly different. In MPEG-1, audio and video (possibly many of them in the same stream) had a common time base, but the main users of MPEG-2 wanted a system that could deliver a plurality of TV programs, possibly coming from different sources (i.e. with different time bases) and with possibly a lot of metadata related to the programs, not to mention key business enablers like conditional access information. Moreover, unlike MPEG-1, where it was safe to assume that the bits issuing from a Compact Disc would travel without errors to a demultiplexer, in MPEG-2 it was mandatory to assume that the transmission channel was anything but error-free.

MPEG-2 Transport Stream (TS) provides efficient mechanisms to multiplex multiple audio-visual data streams into one delivery stream. Audio-visual data streams are packetised into small fixed-size packets and interleaved to form a single stream. Information about the multiplexing structure is interleaved with the data packets so that the receiving entity can efficiently identify a specific stream. Sequence numbers help identify missing packets at the receiving end, and timing information is assigned after multiplexing with the assumption that the multiplexed stream will be delivered and played in sequential order.
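A minimal sketch of the receiving end: parse the 4-byte header of each 188-byte TS packet (sync byte 0x47, 13-bit PID, 4-bit continuity counter) and flag packets whose counter does not follow the previous one on the same PID. For simplicity it assumes every packet carries payload and ignores the adaptation field and the duplicate-packet case that the standard allows; `make_packet` is a hypothetical helper for building test packets.

```python
from typing import Dict, List, Tuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_header(pkt: bytes) -> Dict[str, int]:
    """Extract the TS header fields used for demultiplexing and loss detection."""
    if len(pkt) != TS_PACKET_SIZE or pkt[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte TS packet")
    return {
        "payload_unit_start": (pkt[1] >> 6) & 0x1,
        "pid": ((pkt[1] & 0x1F) << 8) | pkt[2],
        "continuity_counter": pkt[3] & 0x0F,
    }


def find_gaps(packets: List[bytes]) -> List[Tuple[int, int]]:
    """Return (packet index, PID) wherever the 4-bit counter skips a value."""
    last: Dict[int, int] = {}
    gaps = []
    for i, pkt in enumerate(packets):
        h = parse_header(pkt)
        pid, cc = h["pid"], h["continuity_counter"]
        if pid in last and cc != (last[pid] + 1) % 16:
            gaps.append((i, pid))
        last[pid] = cc
    return gaps


def make_packet(pid: int, cc: int, pusi: bool = False) -> bytes:
    """Build a payload-only packet (adaptation_field_control = 01) with a zero payload."""
    header = bytes([SYNC_BYTE, (0x40 if pusi else 0) | (pid >> 8) & 0x1F,
                    pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - 4)


# Two interleaved streams; PID 0x100 wraps from 15 to 0, then loses the packet with counter 1.
stream = [make_packet(0x100, 14), make_packet(0x101, 0), make_packet(0x100, 15),
          make_packet(0x100, 0), make_packet(0x100, 2)]
```

The counter wrap-around from 15 to 0 is accepted, while the jump from 0 to 2 reveals a missing packet.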

MPEG-2 Systems is actually two specifications in one (Figure 3). The Transport Stream (TS) is a fixed-length packet-based transmission system designed to work for digital television distribution on error-prone physical channels, while the Program Stream (PS) is a packet-based multiplexer with many points in common with MPEG-1 Systems. While TS and PS share significant information, moving from one to the other may not be immediate.

Figure 3 – MPEG-2 Systems

MPEG-4 Systems

MPEG-4 gave MPEG the opportunity to experience an epochal transition in data delivery. When MPEG-2 Systems was designed, Asynchronous Transfer Mode (ATM) was high on the agenda of the telecom industry and was considered the vehicle to transport MPEG-2 TS streams on telecommunication networks. Indeed, the Digital Audio-Visual Council (DAVIC) designed its specifications on that assumption. At that time, however, IP was still unknown to the telecom (at least to its transmission part), broadcast and consumer electronics worlds.

The MPEG-4 Systems work was a completely different story from MPEG-2 Systems. An MPEG-4 Mux (M4Mux) was developed along the lines of MPEG-1 and MPEG-2 Systems, but MPEG had to face an unknown world where many transports were emerging as possible candidates. MPEG was obviously unable to make choices (today, 25 years later, the choice is clear) and developed the notion of Delivery Multimedia Integration Framework (DMIF), where all communications and data transfers between the data source and the terminal were abstracted through a logical API called the DAI (DMIF Application Interface), independent of the transport type (broadcast, network, storage).
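The value of the DAI idea is that the application is written once against an abstract interface, whatever the transport. The Python sketch below illustrates only that design choice: `DeliveryLayer`, `read_chunk` and the two implementations are hypothetical names and do not reproduce the actual DAI primitives defined by DMIF.

```python
import io
from abc import ABC, abstractmethod
from typing import Iterable


class DeliveryLayer(ABC):
    """Hypothetical DAI-like interface: the application never sees the transport."""

    @abstractmethod
    def read_chunk(self, size: int) -> bytes:
        """Return up to `size` bytes, or b"" when the stream has ended."""


class StorageDelivery(DeliveryLayer):
    """Data read from local storage (here any binary file-like object)."""

    def __init__(self, stream: io.BufferedIOBase):
        self._stream = stream

    def read_chunk(self, size: int) -> bytes:
        return self._stream.read(size)


class BroadcastDelivery(DeliveryLayer):
    """Simulated broadcast: data arrive as pre-packetised chunks pushed by the network."""

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks = list(chunks)

    def read_chunk(self, size: int) -> bytes:
        # `size` is ignored: a broadcast delivers whatever packet arrives next
        return self._chunks.pop(0) if self._chunks else b""


def consume(source: DeliveryLayer, size: int = 4) -> bytes:
    """Application code: identical whatever DeliveryLayer it is given."""
    data = b""
    while chunk := source.read_chunk(size):
        data += chunk
    return data


from_file = consume(StorageDelivery(io.BytesIO(b"abcdefgh")))
from_air = consume(BroadcastDelivery([b"ab", b"cde"]))
```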

MPEG-4 Systems, however, was about more than interfacing with transport and multiplexing. The MPEG-4 model was a 3D space populated with dynamic audio, video and 3D Graphics objects. Binary Format for Scenes (BIFS) was the technology designed to provide the needed functionality.

Figure 4 shows the 4 MPEG-4 layers: Transport, Synchronisation, Compression and Composition.

Figure 4 – MPEG-4 Systems

MPEG-4 File Format

For almost 10 years, until 1997, MPEG was a group that made intense use of IT tools (in the form of computer programs that simulated the encoding and decoding operations of the standards it was developing) but was not an “IT group”. The proof? Until that time it had not developed a single file format. Today MPEG can claim this attribute (IT group) too, along with the many others it has.

In those years MP3 files were already being created and exchanged by the millions, but the files did not provide any structure. The MP4 File Format, officially called ISO Base Media File Format (ISO BMFF), filled that gap as it can be used for editing, HTTP streaming and broadcasting.

Let’s take a high-level look to understand the sea that separates MP3 files from the MP4 FF. MP4 FF contains a track for each media type (audio, video etc.), with additional information: a four-character code ‘name’ identifying the media type, together with all the parameters needed by the decoder of that media type. “Track selection data” helps a decoder identify what aspect of a track can be used and determine which alternatives are available.

Data are stored in a basic structure called box, with attributes of length, type (4 printable characters) and possibly version and flags. No data can be found outside of a box. Figure 5 shows a possible organisation of an MP4 file.
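Because every box starts with the same length and type header, a reader can walk an MP4 file without understanding every box. The sketch below follows the ISO BMFF box header rules (32-bit big-endian size, four-character type, size 1 meaning a 64-bit size follows, size 0 meaning the box runs to the end of the enclosing data). The set of container boxes it descends into is a small hand-picked subset, the version and flags of full boxes are left unparsed, and `box` is a hypothetical helper for building test data.

```python
import struct
from typing import Iterator, List, Tuple

# Small, hand-picked subset of boxes that only contain other boxes.
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl"}


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each box at this nesting level."""
    pos, end = 0, len(data)
    while pos + 8 <= end:
        size, raw_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type field
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        # Full boxes would start their payload with version (1 byte) and flags (3 bytes).
        yield raw_type.decode("latin-1"), data[pos + header:pos + size]
        pos += size


def walk(data: bytes, depth: int = 0) -> List[str]:
    """List box types, indented by nesting depth."""
    lines = []
    for box_type, payload in iter_boxes(data):
        lines.append("  " * depth + box_type)
        if box_type in CONTAINERS:
            lines.extend(walk(payload, depth + 1))
    return lines


def box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


mp4 = box("ftyp", b"isom" + bytes(4)) + box("moov", box("trak", box("mdia"))) + box("mdat", bytes(10))
```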


Figure 5 – Boxes in an MP4 File

MP4 FF can store:

  1. Structural and media data information for timed presentations of media data (e.g. audio, video, subtitles);
  2. Un-timed data (e.g. meta-data);
  3. Elementary stream encryption and encryption parameter (CENC);
  4. Media for adaptive streaming (e.g. DASH);
  5. High Efficiency Image Format (HEIF);
  6. Omnidirectional Media Format (OMAF);
  7. Files partially received over lossy links for further processing such as playback or repair (Partial File Format);
  8. Web resources (e.g. HTML, JavaScript, CSS, …).

Save for the first two features, all others were added in the years following 2001 when MP4 FF was approved. The last two are still under development.

MPEG-7 Systems

With MPEG-7, MPEG made its first big departure from media compression and turned its attention to media description, including ways to compress that information. In addition to descriptors for visual, audio and multimedia information, MPEG-7 includes a Systems layer used by an application (say, navigation of a multimedia information repository) to access coded information coming from a delivery layer in the form of coded descriptors (in XML or in BiM, MPEG’s XML compression technology). Figure 6 illustrates the operation of an MPEG-7 Systems decoder.

Figure 6 – MPEG-7 Systems

An MPEG-7 Systems decoder operates in two phases:

  1. Initialisation, when DecoderInit initialises the decoder by conveying the description format information (textual or binary), a list of URIs that identify schemas, parameters to configure the Fragment Update Decoder, and an initial description. The list of URIs is passed to a schema resolver that associates the URIs with schemas to be passed to the Fragment Update Decoder.
  2. Main operation, when the Description Stream (composed of Access Units containing fragment updates) is fed to the decoder, which processes:
    1. Fragment Update Command, specifying the update type (i.e., add, replace or delete content or a node, or reset the current description tree);
    2. Fragment Update Context, which identifies the data type in a given schema document and points to the location in the current description tree where the fragment update command applies; and
    3. Fragment Update Payload, conveying the coded description fragment to be added to or replaced in the description.
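The main-operation loop can be pictured with a small Python sketch that keeps a description tree and applies the four update commands to it. It simplifies under several assumptions:

- the description is textual XML (no BiM);
- the context is an ElementTree path rather than the addressing actually used by MPEG-7 Systems;
- no schema is consulted;
- "reset" restores the initial description received at initialisation.

All class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class FragmentUpdate:
    command: str            # "add", "replace", "delete" or "reset"
    context: str            # ElementTree path to the target node, e.g. "./Video/Title"
    payload: Optional[str]  # textual XML fragment; None for "delete" and "reset"


def find_parent(root: ET.Element, target: ET.Element) -> ET.Element:
    """ElementTree nodes do not know their parent, so search for it."""
    for node in root.iter():
        if any(child is target for child in node):
            return node
    raise ValueError("the target node has no parent")


class DescriptionDecoder:
    """Keeps the current description tree and applies fragment updates."""

    def __init__(self, initial_description: str):
        self.initial = initial_description          # as conveyed at initialisation
        self.root = ET.fromstring(initial_description)

    def apply(self, update: FragmentUpdate) -> None:
        if update.command == "reset":
            # Assumption: "reset" goes back to the initial description.
            self.root = ET.fromstring(self.initial)
            return
        target = self.root if update.context == "." else self.root.find(update.context)
        if target is None:
            raise KeyError(f"context {update.context!r} not found")
        if update.command == "add":
            target.append(ET.fromstring(update.payload))
        elif update.command == "replace":
            new_node = ET.fromstring(update.payload)
            if target is self.root:
                self.root = new_node
            else:
                parent = find_parent(self.root, target)
                index = next(i for i, c in enumerate(parent) if c is target)
                parent[index] = new_node
        elif update.command == "delete":
            find_parent(self.root, target).remove(target)
        else:
            raise ValueError(f"unknown command {update.command!r}")


if __name__ == "__main__":
    decoder = DescriptionDecoder("<Description><Video><Title>Old</Title></Video></Description>")
    decoder.apply(FragmentUpdate("replace", "./Video/Title", "<Title>New</Title>"))
    decoder.apply(FragmentUpdate("add", "./Video", "<Genre>News</Genre>"))
    print(ET.tostring(decoder.root, encoding="unicode"))
```

The point of the design is that only small fragments travel in the stream, while the full description is rebuilt at the receiver.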


MPEG Multimedia Middleware (M3W), also called MPEG-E, is an 8-part standard defining the protocol stack of consumer-oriented multimedia devices, as depicted in Figure 7.

Figure 7 – MPEG Multimedia Middleware (M3W)

The M3W model includes 3 layers:

  1. Applications, not part of the specification but enabled by the M3W Middleware API;
  2. Middleware consisting of
    1. M3W middleware exposing the M3W Middleware API;
    2. Multimedia platform supporting the M3W Middleware by exposing the M3W Multimedia API;
    3. Support platform providing the means to manage the lifetime of, and interaction with, realisation entities by exposing the M3W Support API (it also enables management of support properties, e.g. resource management, fault management and integrity management);
  3. Computing platform, whose APIs are outside the scope of M3W.


Multimedia service platform technologies (MPEG-M) specifies two main components of a multimedia device, called peer in MPEG-M.

As shown in Figure 8, the first component is the API: a High-Level API for applications and a Low-Level API for network, energy and security.

Figure 8 – High Level and Low Level API

The second component is a middleware called MXM that relies on MPEG multimedia technologies (Figure 9).

Figure 9 – The MXM architecture

The middleware is composed of two types of engine. Technology Engines are used to call functionalities defined by MPEG standards, such as creating or interpreting a licence attached to a content item. Protocol Engines are used to communicate with other peers, e.g. when a peer does not have a particular Technology Engine that another peer has. For instance, a peer can use a Protocol Engine to call a licence server to get a licence to attach to a multimedia content item. The MPEG-M middleware can create chains of Technology Engines (Orchestration) or of Protocol Engines (Aggregation).
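Chaining engines is essentially function composition: the output of one engine becomes the input of the next. The Python sketch below illustrates only that idea. The Engine class, the chain function and the two toy engines are hypothetical and do not reproduce the actual MPEG-M (MXM) APIs. A real Protocol Engine would also talk to a remote peer instead of running locally.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, List

ContentItem = Dict[str, object]  # hypothetical stand-in for a content item


@dataclass
class Engine:
    """A named processing step; stands for a Technology or Protocol Engine."""
    name: str
    step: Callable[[ContentItem], ContentItem]


def chain(engines: List[Engine]) -> Callable[[ContentItem], ContentItem]:
    """Feed the output of each engine into the next one, in order."""
    def run(item: ContentItem) -> ContentItem:
        return reduce(lambda current, engine: engine.step(current), engines, item)
    return run


def create_licence(item: ContentItem) -> ContentItem:
    # Hypothetical technology engine: attach a licence granting "play".
    return {**item, "licence": {"right": "play", "resource": item["id"]}}


def package(item: ContentItem) -> ContentItem:
    # Hypothetical technology engine: record the container to be used.
    return {**item, "container": "mp4"}


if __name__ == "__main__":
    orchestration = chain([Engine("CreateLicence", create_licence), Engine("Package", package)])
    print(orchestration({"id": "song-42"}))
```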


MPEG Media Transport (MMT) is part 1 of High efficiency coding and media delivery in heterogeneous environments (MPEG-H). It is the solution for the new world of broadcasting where delivery of content can take place over different channels each with different characteristics, e.g. one-way (traditional broadcasting) and two-way (the ever more pervasive broadband network). MMT assumes that the Internet Protocol is common to all channels.

Figure 10 depicts the MMT protocol stack.

Figure 10 – The MMT protocol stack

Figure 11 focuses on the MMT Payload, i.e. on the content structure.

Figure 11 – Structure of MMT Payload

The MMT Payload has an onion-like structure:

  1. Media Fragment Unit (MFU), the atomic unit which can be independently decoded;
  2. Media Processing Unit (MPU), the atomic unit for storage and consumption of MMT content (structured according to ISO BMFF), containing one or more MFUs;
  3. MMT Asset, the logical unit for elementary streams of multimedia components, e.g. audio, video and data, containing one or more MPU files;
  4. MMT Package, a logical unit of multimedia content such as a broadcasting program, containing one or more MMT Assets and also containing:
    1. Composition Information (CI), describing the spatio-temporal relationships among MMT Assets;
    2. Delivery Information, describing the network characteristics.
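The onion structure maps naturally onto nested containers. The Python dataclasses below are a sketch of that nesting, not of the MMT syntax. Field names such as sequence_number and asset_id are illustrative, and Composition Information and Delivery Information are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MFU:
    """Media Fragment Unit: the smallest independently decodable unit."""
    data: bytes


@dataclass
class MPU:
    """Media Processing Unit: stored and consumed as an ISOBMFF file."""
    sequence_number: int
    mfus: List[MFU] = field(default_factory=list)


@dataclass
class Asset:
    """An elementary stream of one multimedia component (audio, video, data)."""
    asset_id: str
    media_type: str
    mpus: List[MPU] = field(default_factory=list)


@dataclass
class Package:
    """A logical unit of content, e.g. a broadcast programme."""
    assets: List[Asset]
    composition_info: str  # placeholder for CI: spatio-temporal layout of Assets
    delivery_info: str     # placeholder for the network characteristics

    def payload_size(self) -> int:
        """Total bytes carried in all MFUs, walking the onion inwards."""
        return sum(len(mfu.data)
                   for asset in self.assets
                   for mpu in asset.mpus
                   for mfu in mpu.mfus)


if __name__ == "__main__":
    video = Asset("video-1", "video", [MPU(0, [MFU(bytes(100)), MFU(bytes(50))])])
    audio = Asset("audio-1", "audio", [MPU(0, [MFU(bytes(20))])])
    programme = Package([video, audio], composition_info="layout", delivery_info="broadband")
    print(programme.payload_size())  # 170
```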


Dynamic adaptive streaming over HTTP (DASH) is another MPEG Systems standard that was motivated by the popularity of HTTP streaming and the existence of different protocols used in different streaming platforms, e.g. different manifest and segment formats. By developing the DASH standard for HTTP streaming of multimedia content, MPEG has enabled a standard-based client to stream content from any standard-based server, thereby enabling interoperability between servers and clients of different make.

Figure 12 – DASH model

As depicted in Figure 12, the multimedia content is stored on an HTTP server in two components: 1) the Media Presentation Description (MPD), a manifest that describes the available content, its various alternatives, their URL addresses and other characteristics; and 2) Segments, which contain the actual multimedia bitstreams in the form of chunks, in single or multiple files.

A typical operation of the system follows these steps:

  1. The DASH client obtains the MPD;
  2. Parses the MPD;
  3. Gets information on several parameters, e.g. program timing, media content availability, media types, resolutions, min/max bandwidths, existence of alternatives of multimedia components, accessibility features and the required protection mechanism, the location of each media component on the network and other content characteristics;
  4. Selects the appropriate encoded alternative and starts streaming the content by fetching the segments using HTTP GET requests;
  5. Fetches the subsequent segments after appropriate buffering to allow for network throughput variations;
  6. Monitors the network bandwidth fluctuations;
  7. Decides how to adapt to the available bandwidth depending on its measurements by fetching segments of different alternatives (with lower or higher bitrate) to maintain an adequate buffer.

DASH only defines the MPD and segment formats. The delivery of the MPD, the media encoding formats contained in the segments, and the client behaviour for fetching, adaptation heuristics and content playing are outside the scope of MPEG-DASH.
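Because adaptation is left to the client, implementations are free to choose their own strategy. The Python sketch below illustrates one simple throughput-based rule. It reads the available Representations and their bandwidth attributes from an MPD, then picks the highest bitrate that fits within a fraction of the measured throughput. When the buffer runs low it falls back to the lowest bitrate. The MPD is deliberately incomplete, since a real one carries many more attributes and elements. The safety factor and buffer threshold are hypothetical tuning parameters, not values from the standard.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# A deliberately trimmed-down MPD: one Period, one video AdaptationSet.
MPD_TEXT = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="low" bandwidth="500000"/>
      <Representation id="mid" bandwidth="1500000"/>
      <Representation id="high" bandwidth="4000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def representations(mpd_xml: str) -> List[Tuple[str, int]]:
    """Return (id, bandwidth in bit/s) pairs, sorted by increasing bandwidth."""
    root = ET.fromstring(mpd_xml)
    reps = [(rep.get("id"), int(rep.get("bandwidth")))
            for rep in root.iterfind(".//mpd:Representation", NS)]
    return sorted(reps, key=lambda rep: rep[1])


def choose(reps: List[Tuple[str, int]], throughput_bps: float, buffer_s: float,
           safety: float = 0.8, low_buffer_s: float = 5.0) -> str:
    """Pick the highest bitrate below safety * throughput; go lowest if the buffer is low."""
    if buffer_s < low_buffer_s:
        return reps[0][0]
    budget = safety * throughput_bps
    best = reps[0][0]
    for rep_id, bandwidth in reps:
        if bandwidth <= budget:
            best = rep_id
    return best


if __name__ == "__main__":
    reps = representations(MPD_TEXT)
    print(choose(reps, throughput_bps=2_500_000, buffer_s=12))  # mid (budget 2 Mbit/s)
    print(choose(reps, throughput_bps=6_000_000, buffer_s=12))  # high
    print(choose(reps, throughput_bps=6_000_000, buffer_s=2))   # low (buffer nearly empty)
```

Real players combine throughput and buffer measurements in more elaborate ways. The standard deliberately leaves room for that competition.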


The reader should not think that this is an exhaustive presentation of MPEG’s Systems work. I hope the description will reveal the amount of work that MPEG has invested in Systems aspects, sometimes per se, and sometimes to provide adequate support to users of its media coding standards. This article also describes some of the most successful MPEG standards. At the top certainly towers MPEG-2 Systems, of which 9 editions have been produced to keep up with continuous user demands for new functionalities.

Not to mention that MPEG-2 Systems has received an Emmy Award 😉.

Posts in this thread



MPEG: what it did, is doing, will do


If I exchange a few words with a taxi driver in a city somewhere in the world, one of the questions I am usually asked is: “where are you from?” As I do not like straight answers, I usually ask back: “where do you think I am from?” It usually takes time before the driver gets the information he asked for. Then the next question is: “what is your job?” Again, instead of giving a straight answer, I ask: “do you know MPEG?” Well, believe it or not, 9 out of 10 times the answer is “yes”, often supplemented by an explanation decently connected with what MPEG is.

Wow! Do we need a more convincing proof that MPEG has conquered the minds of the people of the world?

The interesting side of the story, though, is that, even if the name MPEG is known by billions of people, it is not a trademark. Officially, the word MPEG does not even exist. When talking to ISO you should say “ISO/IEC JTC 1/SC 29/WG 11” (next time, ask your taxi driver if they know this letter soup). The last insult is that the domain is owned by somebody who just keeps it without using it.

Should all this be of concern? Maybe for some, but not for me. What I have just talked about is just one aspect of what MPEG has always been. Do you think that MPEG was the result of high-level committees made of luminaries advising governments to take action on the future of media? You are going to be disappointed. MPEG was born haphazardly (read here, if you want to know how). Its strength is that it has been driven by the idea that the epochal transition from analogue to digital should not become another PAL-SECAM-NTSC or VHS-Betamax trap.

In 30 years MPEG has grown 20-fold, changed the way companies do business with media, made music liquid, multiplied the size of TV screens, brought media where there were stamp-size displays, made internet the primary delivery for media, created new experiences, shown that its technologies can successfully be applied beyond media…

There is no sign that its original driving force is abating, unless… Read until the end if you want to know more.

What did MPEG do?


MPEG was the first standards group that brought digital media to the masses. In the 2nd half of the 1990’s the MPEG-1 and MPEG-2 standards were converted to products and services as the list below will show (not that the use of MPEG-1 and MPEG-2 is confined to the 1990’s).

  • Digital Audio Broadcasting: in 1995, just 3 years after MPEG-1 was approved, DAB services began to appear in Europe with DAB receivers becoming available some time later.
  • Portable music: in 1997, 5 years after MPEG-1 was approved, Saehan Information Systems launched MPMan, probably the first portable digital audio player for the mass market that used MP3. This was followed by a long list of competing players until the mobile handset largely took over that function.
  • Video CD: in the second half of the 1990’s VCD spread, especially in South East Asia, until the MPEG-2 based DVD, with its superior quality, slowly replaced it. VCD uses all 3 parts of MPEG-1 (layer 2 for audio).
  • Digital Satellite broadcasting: in June 1994 DirecTV launched its satellite TV broadcasting service for the US market, even before MPEG released the MPEG-2 standard in November of that year! It used MPEG-2, and its lead was followed by many other regions, which gradually converted their analogue broadcast services to digital.
  • Digital Cable distribution: in 1992 John Malone launched the “500-channel” vision for future cable services and MPEG gave the cable industry the means to make that vision real.
  • Digital Terrestrial broadcasting:
    • In 1996 the USA Federal Communications Commission adopted the ATSC A/53 standard. It took some time, however, before wide coverage of the country, and of other countries following the ATSC standards, was achieved.
    • In 1998 the UK introduced Digital Terrestrial Television (DTT).
    • In 2003 Japan started DTT services using MPEG-2 AAC for audio in addition to MPEG-2 Video and TS.
    • DTT is not deployed in all countries yet, and there is regularly news of a country switching to digital, the MPEG way of course.
  • Digital Versatile Disc (DVD): toward the end of the 1990’s the first DVD players were put on the market. They used MPEG-2 Program Stream (part 1 of MPEG-2), MPEG-2 Video and a host of audio formats, some from MPEG.


In the 1990s the Consumer Electronics industry provided devices to the broadcasting and telecom industries, as well as devices for package media. The shift to digital services called for the IT industry to join as provider of big servers for broadcasting and interactive services (even though in the 1990’s the latter did not take off). The separate case of portable audio players provided by startups did not fit the established categories.

MPEG-4 played the fundamental role of bringing the IT industry into the MPEG fold as a primary player in the media space.

  • Internet-based audio services: the great original insight of Steve Jobs and other industry leaders transformed Advanced Audio Coding (AAC) from a promising technology into a standard that dominates mobile devices and internet services.
  • Internet video: MPEG-4 Visual, with the MP4 nickname, did not repeat the success of MP3 for video. Still, as DivX (a company name), it was the first example of digital media on the internet. Its hopes of becoming the streaming video format for the internet were dashed by the licensing terms of MPEG-4 Visual. This was the first example of the ill influence of technology rights on an MPEG standard.
  • Video for all: MPEG-4 Advanced Video Coding (AVC) became a truly universal standard, adopted in all areas and countries: broadcasting, internet distribution, package media (Blu-ray) and more.
  • Media files: the MP4 File Format is the general structure for time-based media files and has become another ubiquitous standard at the basis of modern digital media.
  • Advanced text and graphics: the Open Font Format (OFF), based on the OpenType specification, revised and extended by MPEG, is universally used.



  • Format for encrypted, adaptable multimedia presentation: is provided by the Common Media Application Format (CMAF), a format optimised for large scale delivery of protected media with a variety of adaptive streaming, broadcast, download, and storage delivery methods including DASH and MMT.
  • Interoperable image format: the Multi-Image Application Format (MIAF) enables precise interoperability points for creating, reading, parsing, and decoding images embedded in HEIF.


  • Generic binary format for XML: is provided by Binary format for XML (BiM), a standard used by products and services designed to work according to ARIB and DVB specifications.
  • Common encryption for files and streams: is provided by Common Encryption (CENC), defined in two MPEG-B standards: Part 7 for MP4 Files and Part 9 for MPEG-2 Transport Stream. CENC is widely used for the delivery of video to billions of devices capable of accessing internet-delivered stored files, MPEG-2 Transport Stream and live adaptive streaming.


  • IP-based television: MPEG Media Transport (MMT) is the “transport layer” of IP-based television. MMT assumes that delivery is achieved by an IP network with in-network intelligent caches close to the receiving entities. Caches adaptively packetise and push the content to receiving entities. MMT has been adopted by the ATSC 3.0 standard and is currently being deployed in countries adopting ATSC standards. It is also used in low-delay streaming applications.
  • More video compression, siempre!: has been provided by High Efficiency Video Coding (HEVC), the AVC successor, yielding compression improvements of up to 60% compared to AVC. Natively, HEVC supports High Dynamic Range (HDR) and Wide Colour Gamut (WCG). However, its use is plagued by a confused licensing landscape, as described e.g. in A crisis, the causes and a solution.
  • Not the ultimate audio experience, but close: MPEG-H 3D Audio is a comprehensive audio compression standard capable of providing very satisfactory immersive audio experiences in broadcast and interactive applications. It is part of the ATSC 3.0 standard.
  • Comprehensive image file format: High Efficiency Image File Format (HEIF) is a file format for individual HEVC-encoded images and sequences of images. It is a container capable of storing HEVC intra-images and constrained HEVC inter-images, together with other data such as audio in a way that is compatible with the MP4 File Format. HEIF is widely used and supported by major OSs and image editing software.


  • Streaming on the unreliable internet: Dynamic Adaptive Streaming over HTTP (DASH) is the widely used standard that enables a media client connected to a media server via the internet to obtain, instant by instant, the version (among those available on the server) that best suits the momentary network conditions.

What is MPEG doing now?

In the preceding chapter I singled out only MPEG standards that have been (and often still continue to be) extremely successful.

I am unable to single out those that will be successful in the future 😊, so the reasonable thing to do is to show the entire MPEG work plan.

At the risk of making the wrong bet 😊, let me introduce some of the most high-profile standards under development, subdivided into three categories: Media Coding, Systems and Tools, and Beyond Media. But you had better become acquainted with all ongoing activities. In MPEG sometimes the last become the first.

Media Coding

  • Versatile Video Coding (VVC): is the flagship video compression activity that will deliver another round of improved video compression. It is expected to be the platform on which MPEG will build new technologies for immersive visual experiences (see below).
  • Enhanced Video Coding (EVC): is the shorter-term project with less ambitious goals. EVC is designed to satisfy the urgent needs of those who want a standard with a less complex IP landscape.
  • Immersive visual technologies: investigations on technologies applicable to visual information captured by different camera arrangements are under way, as described in The MPEG drive to immersive visual experiences.
  • Point Cloud Compression (PCC): refers to two standards capable of compressing 3D point clouds captured with multiple cameras and depth sensors. The algorithms in both standards are lossy, scalable, progressive and support random access to point cloud subsets. See The MPEG drive to immersive visual experiences for more details.
  • Immersive audio: MPEG-H 3D Audio supports a 3 Degrees of Freedom or 3DoF (yaw, pitch, roll) experience at the movie “sweet spot”. More complete user experiences, however, are needed, i.e. 6 DoF (adding x, y, z). These can be achieved with additional metadata and rendering technology.

Systems and Tools

  • Omnidirectional media format: Omnidirectional Media Application Format (OMAF) v1 is a format supporting the interoperable exchange of omnidirectional (VR 360) content for a user who can only Yaw, Pitch and Roll their head. OMAF v2 will support some head translation movements. See The MPEG drive to immersive visual experiences for more details.
  • Storage of PCC data in MP4 FF: MPEG is developing systems support to enable storage and transport of compressed point clouds with DASH, MMT etc.
  • Scene Description Interface: MPEG is investigating the interface to the scene description (not the technology) to enable rich immersive experiences.
  • Service interface for immersive media: Network-based Media Processing will enable a user to obtain potentially very sophisticated processing functionality from a network service via standard APIs.
  • IoT when Things are Media Things: Internet of Media Things (IoMT) will enable the creation of networks of intelligent Media Things (i.e. sensors and actuators).

Beyond Media

  • Standards for biotechnology applications: MPEG is finalising all 5 parts of the MPEG-G standard and establishing new liaisons to investigate new opportunities.
  • Coping with neural networks everywhere: shortly (25 March 2019) MPEG will receive responses to its Call for Proposals for Neural Network Compression as described in Moving intelligence around.

What will MPEG do in the future?

At the risk of being considered boastful, I would think that MPEG deserves attention from some of the business schools that study socio-economic phenomena. Why? Because many have talked about media convergence, but they have forgotten that MPEG, with its standards, has actually triggered that convergence. MPEG people know the ecosystem at work in MPEG, and I for one see how unique it is.

This has not happened. Let’s say that it is better to be neglected than to receive unwanted attention.

I would also think about a body that started from a Subcommittee on character sets and has become the reference standards group for the media industry. That industry (devices, content, services and applications) is worth hundreds of billions of USD and has potent influence on a nearby industry such as telecommunication. Such a body should have prompted standards organisations to study its work method and possibly apply it to other domains.

This has not happened. Let’s say, again, that it is better to be neglected than to receive unwanted attention.

So can we expect MPEG to continue its mission, and apply its technologies and know how to continue delivering compression standards for immersive experiences and new compression standards for other domains?

Maybe this time MPEG will attract attention. But don’t count on it.




The MPEG drive to immersive visual experiences


In How does MPEG actually work? I described the MPEG process: once an idea is launched, context and objectives of the idea are identified; use cases submitted and analysed; requirements derived from use cases; and technologies proposed, validated for their effectiveness for eventual incorporation into the standard.

Some people complain that MPEG standards contain too many technologies supporting “non-mainstream” use cases. Such complaints are understandable but misplaced. MPEG standards are designed to satisfy the needs of different industries and what is a must for some, may well not be needed by others.

To avoid burdening a significant group of users of the standard with technologies they consider irrelevant, MPEG adopted the “profile approach” from the very beginning. This allows MPEG to retain a technology for those who need it without encumbering those who do not.

It is true that there are a few examples where some technologies in an otherwise successful standard go unused. Was adding such technologies a mistake? In hindsight, yes. However, at the time a standard is developed the future is anybody’s guess. MPEG does not want to find out later that one of its standards lacks a functionality deemed necessary in some use cases, when technology could have supported it at the time the standard was developed.

For sure there is a cost in adding the technology to the standard – and this is borne by the companies proposing the technology – but there is no burden to those who do not need it because they can use another profile.

Examples of such “non-mainstream” technologies are provided by those supporting stereo vision. Since as early as MPEG-2 Video, multiview and/or 3D profile(s) have been present in most MPEG video coding standards. Therefore, this article will review the attempts made by MPEG at developing new and better technologies to support what are called today immersive experiences.

The early days

MPEG-1 did not have big ambitions (but the outcome was not modest at all ;-). MPEG-2 was ambitious because it included scalability – a technology that reached maturity only some 10 years later – and multiview. As depicted in Figure 1, multiview was possible because if you have two close cameras pointing to the same scene, you can exploit intraframe, interframe and interview redundancy.

Figure 1 – Redundancy in multiview

Both MPEG-2 scalability and multiview saw little take up.

Both MPEG-4 Visual and AVC had multiview profiles. AVC had 3D profiles next to multiview profiles. Multiview Video Coding (MVC) of AVC was adopted by the Blu-ray Disc Association, but the rest of the industry took another turn as depicted in Figure 2.

Figure 2 – Frame packing in AVC and HEVC

If the left and right frames of two video streams are packed in one frame, regular AVC compression can be applied to the packed frame. At the decoder, the frames are de-packed after decompression and the two video streams are obtained.

This is a practical but less than optimal solution. Unless the frame size of the codec is doubled, you compromise either the horizontal or the vertical resolution, depending on the frame-packing method used. Because of this, a host of other more sophisticated, but eventually unsuccessful, frame packing methods have been introduced into the AVC and HEVC standards. The relevant information is carried by Supplemental Enhancement Information (SEI) messages, because the specific frame packing method used is not normative.
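The resolution trade-off is easy to see in a small Python sketch of side-by-side packing, one of the arrangements that can be signalled in SEI messages. Each view keeps every other column, so the packed frame has the size of a single view. The function names are hypothetical. The unpacking uses plain column repetition where a real player would use a better interpolation filter, and the sketch assumes an even frame width.

```python
from typing import List, Tuple

Frame = List[List[int]]  # a frame as rows of sample values


def pack_side_by_side(left: Frame, right: Frame) -> Frame:
    """Drop every other column of each view and put the halves side by side.

    The packed frame has the size of a single view, so an unmodified 2D
    encoder can compress it, at the cost of half the horizontal resolution.
    """
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def unpack_side_by_side(packed: Frame) -> Tuple[Frame, Frame]:
    """Split the decoded frame and restore the width by repeating columns.

    Column repetition is a crude stand-in for a real interpolation filter;
    the discarded columns cannot be recovered exactly.
    """
    half = len(packed[0]) // 2

    def widen(row: List[int]) -> List[int]:
        return [value for value in row for _ in range(2)]

    left = [widen(row[:half]) for row in packed]
    right = [widen(row[half:]) for row in packed]
    return left, right


if __name__ == "__main__":
    left = [[0, 1, 2, 3], [4, 5, 6, 7]]
    right = [[10, 11, 12, 13], [14, 15, 16, 17]]
    packed = pack_side_by_side(left, right)
    print(packed)                        # [[0, 2, 10, 12], [4, 6, 14, 16]]
    print(unpack_side_by_side(packed))
```

Top-and-bottom packing works the same way on rows instead of columns, which is why one of the two resolutions is always sacrificed.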

The HEVC standard, too, supports 3D vision with tools that efficiently compress depth maps, and exploit the redundancy between video pictures and associated depth maps. Unfortunately use of HEVC for 3D video has also been limited.


The MPEG-I project – ISO/IEC 23090 Coded representation of immersive media – was launched at a time when the word “immersive” was prominent in many news headings. Figure 3 gives three examples of immersivity where technology challenges increase moving from left to right.

Figure 3 – 3DoF (left), 3DoF+ (centre) and 6DoF (right)

In 3 Degrees of Freedom (3DoF) the user is static, but their head can Yaw, Pitch and Roll. In 3DoF+ the user has the added capability of some head movements in the three directions. In 6 Degrees of Freedom (6DoF) the user can freely walk in a 3D space.
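The difference between the three cases can be summarised by what a viewer pose contains. All three include the three rotation angles. They differ in translation, which is zero in 3DoF, small in 3DoF+ and unrestricted in 6DoF. The Python sketch below classifies a pose accordingly. The 0.5 m bound for "some head movements" is a hypothetical value, not one taken from an MPEG document.

```python
import math
from dataclasses import dataclass

# Hypothetical bound on head translation that still counts as 3DoF+.
HEAD_MOVEMENT_LIMIT_M = 0.5


@dataclass
class Pose:
    yaw: float = 0.0    # rotation angles in degrees
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0      # translation in metres from the reference position
    y: float = 0.0
    z: float = 0.0


def required_experience(pose: Pose) -> str:
    """Return the least capable experience that can render this pose."""
    displacement = math.sqrt(pose.x ** 2 + pose.y ** 2 + pose.z ** 2)
    if displacement == 0.0:
        return "3DoF"       # only Yaw, Pitch and Roll
    if displacement <= HEAD_MOVEMENT_LIMIT_M:
        return "3DoF+"      # limited head movement around the seat
    return "6DoF"           # free walking in the 3D space


if __name__ == "__main__":
    for pose in (Pose(yaw=30.0), Pose(yaw=30.0, x=0.1), Pose(x=2.0, z=0.3)):
        print(pose, "->", required_experience(pose))
```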

Currently there are several activities in MPEG that aim at developing standards that support some form of immersivity. While they had different starting points, they are likely to converge to one or, at least, a cluster of points (hopefully not to a cloud😊).


Omnidirectional Media Application Format (OMAF) is not a way to compress immersive video but a storage and delivery format. Its main features are:

  1. Support of several projection formats in addition to the equi-rectangular one
  2. Signalling of metadata for rendering of 360ᵒ monoscopic and stereoscopic audio-visual data
  3. Use of MPEG-H video (HEVC) and audio (3D Audio)
  4. Several ways to arrange video pixels to improve compression efficiency
  5. Use of the MP4 File Format to store data
  6. Delivery of OMAF content with MPEG-DASH and MMT.
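Since the equirectangular projection is the baseline, it is worth seeing how simple it is: azimuth maps linearly to the horizontal picture coordinate and elevation to the vertical one. The Python sketch below assumes that azimuth increases to the left and elevation increases upwards, with the centre of the picture looking straight ahead. Treat the signs as an assumption and check ISO/IEC 23090-2 for the normative equations. The function names are hypothetical.

```python
from typing import Tuple


def sphere_to_erp(azimuth_deg: float, elevation_deg: float,
                  width: int, height: int) -> Tuple[float, float]:
    """Map a viewing direction to continuous ERP picture coordinates (u, v).

    Assumed convention: azimuth in [-180, 180] grows to the left, elevation in
    [-90, 90] grows upwards, and (0, 0) lands in the centre of the picture.
    """
    if not (-180.0 <= azimuth_deg <= 180.0 and -90.0 <= elevation_deg <= 90.0):
        raise ValueError("direction out of range")
    u = (0.5 - azimuth_deg / 360.0) * width
    v = (0.5 - elevation_deg / 180.0) * height
    return u, v


def erp_to_sphere(u: float, v: float, width: int, height: int) -> Tuple[float, float]:
    """Inverse mapping, e.g. to find which direction a picture position shows."""
    return (0.5 - u / width) * 360.0, (0.5 - v / height) * 180.0


if __name__ == "__main__":
    W, H = 3840, 1920
    print(sphere_to_erp(0, 0, W, H))    # (1920.0, 960.0): straight ahead
    print(sphere_to_erp(90, 0, W, H))   # (960.0, 960.0): 90 degrees to the left
    print(erp_to_sphere(1920.0, 0.0, W, H))
```

The same linear mapping also explains why other projection formats were added. Equirectangular pictures oversample the poles, so many pixels are spent on small areas of the sphere.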

MPEG released OMAF in 2018, and it is now published as an ISO standard (ISO/IEC 23090-2).


If the current version of OMAF is applied to a 3DoF+ scenario, the user may feel parallax errors that are more annoying the larger the movement of the head.

To address this problem, at the January 2019 meeting MPEG issued a call for proposals requesting appropriate metadata (see the red blocks in Figure 4). The metadata would help the Post-processor present the best image based on the viewer’s position if available, or synthesise a missing one if not.

Figure 4 – 3DoF+ use scenario

The 3DoF+ standard will be added to OMAF which will be published as 2nd edition. Both standards are planned to be completed in October 2020.


Versatile Video Coding (VVC) is the latest in the line of MPEG video compression standards supporting 3D vision. Currently VVC does not specifically include full-immersion technologies, as it only supports omnidirectional video as in HEVC. However, VVC could not only replace HEVC in Figure 4 but also be the target of other immersive technologies, as will be explained later.

Point Cloud Compression

3D point clouds can be captured with multiple cameras and depth sensors. The number of points can range from a few thousand up to a few billion, each with attributes such as colour, material properties etc. MPEG is developing two different standards, and the choice between them depends on whether the points are dense (Video-based PCC) or less so (Geometry-based PCC). The algorithms in both standards are lossy, scalable, progressive and support random access to subsets of the point cloud. See here for an example of a Point Cloud test sequence being used by MPEG for developing the V-PCC standard.
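Neither V-PCC nor G-PCC is reproduced here, but two ideas often associated with geometry-based coding of point clouds can be sketched in a few lines of Python. The first is quantising coordinates onto a voxel grid, which is the lossy step. The second is describing the occupied voxels with an octree that has one 8-bit occupancy code per occupied node. The function names and the child ordering are my own choices. A real codec would also entropy-code the occupancy bytes and compress attributes such as colour.

```python
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], depth: int, extent: float) -> Set[Voxel]:
    """Quantise coordinates in [0, extent) onto a 2**depth grid per axis.

    This is where information is lost: nearby points collapse into one voxel.
    """
    n = 1 << depth
    scale = n / extent
    return {tuple(min(n - 1, max(0, int(c * scale))) for c in p) for p in points}


def occupancy_codes(voxels: Set[Voxel], depth: int) -> List[int]:
    """Describe the voxels level by level with one 8-bit code per occupied node.

    Bit k of a node's code is set when child k is occupied, where
    k = 4*bx + 2*by + bz and (bx, by, bz) are the child's offsets.
    """
    codes: List[int] = []
    for level in range(depth):
        shift = depth - level - 1            # coordinate bit that selects the child
        children: Dict[Voxel, int] = {}
        for x, y, z in voxels:
            parent = (x >> (shift + 1), y >> (shift + 1), z >> (shift + 1))
            k = ((x >> shift) & 1) << 2 | ((y >> shift) & 1) << 1 | ((z >> shift) & 1)
            children[parent] = children.get(parent, 0) | (1 << k)
        codes.extend(children[node] for node in sorted(children))
    return codes


def decode(codes: List[int], depth: int) -> Set[Voxel]:
    """Rebuild the voxel set; the voxel geometry is recovered exactly."""
    nodes: List[Voxel] = [(0, 0, 0)]
    stream = iter(codes)
    for _ in range(depth):
        next_nodes: List[Voxel] = []
        for x, y, z in nodes:
            code = next(stream)
            for k in range(8):
                if (code >> k) & 1:
                    next_nodes.append((2 * x + ((k >> 2) & 1),
                                       2 * y + ((k >> 1) & 1),
                                       2 * z + (k & 1)))
        nodes = sorted(next_nodes)
    return set(nodes)


if __name__ == "__main__":
    cloud = [(0.2, 0.2, 0.2), (0.3, 0.1, 0.2), (3.9, 0.1, 0.1)]
    voxels = voxelize(cloud, depth=2, extent=4.0)   # the first two points merge
    codes = occupancy_codes(voxels, depth=2)
    print(sorted(voxels), codes)                   # [(0, 0, 0), (3, 0, 0)] [17, 1, 16]
    assert decode(codes, depth=2) == voxels
```

Increasing depth refines the grid. This is also one way to picture how such a representation can be made progressive: stopping after fewer levels gives a coarser cloud.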

MPEG plans to release Video-based Point Cloud Compression as FDIS in October 2019 and Geometry-based Point Cloud Compression as FDIS in April 2020.

Next to PCC compression MPEG is working on Carriage of Point Cloud Data with the goal to specify how PCC data can be stored in ISOBMFF and transported with DASH, MMT etc.

Other immersive technologies


MPEG is carrying out explorations on technologies that enable 6 degrees of freedom (6DoF). The reference diagram for that work is what looks like a minor extension of the 3DoF+ reference model (see Figure 5), but may have huge technology implications.

Figure 5 – 6DoF use scenario

To enable a viewer to freely move in a space and enjoy a 3D virtual experience that matches the one in the real world, we still need some metadata as in 3DoF+ but also additional video compression technologies that could be plugged into the VVC standard.

Light field

The MPEG Video activity is all about standardising efficient technologies that compress digital representations of sampled electromagnetic fields in the visible range captured by digital cameras. Roughly speaking we have 4 types of camera:

  1. Conventional cameras with a 2D array of sensors receiving the projection of a 3D scene
  2. An array of cameras, possibly supplemented by depth maps
  3. Point cloud cameras
  4. Plenoptic cameras whose sensors capture the intensity of light from a number of directions that the light rays travel to reach the sensor.

Technologically speaking, #4 is an area that has not been shy in promises and is delivering on some of them. However, economic sustainability for companies engaged in developing products for the entertainment market has been a challenge.

MPEG is currently engaged in Exploration Experiments (EE) to check:

  1. The coding performance of Multiview Video Data (#2) for 3DoF+ and 6DoF, and Lenslet Video Data (#4) for Light Field
  2. The relative coding performance of Multiview coding and Lenslet coding, both for Lenslet Video Data (#4).

However, MPEG is not engaged in checking the relative coding performance of #2 data and #4 data because there are no #2 and #4 test data for the same scene.


In good(?) old times MPEG could develop video coding standards – from MPEG-1 to VVC – by relying on established input video formats. This somehow continues to be true for Point Clouds as well. On the other hand, Light Field is a different matter because the capture technologies are still evolving and the actual format in which the data are provided has an impact on the actual processing that MPEG applies to reduce the bitrate.

MPEG has bravely picked up the gauntlet and its machine is grinding data to provide answers that will eventually lead to one or more visual compression standards to enable rewarding immersive user experiences.

MPEG is planning a “Workshop on standard coding technologies for immersive visual experiences” in Gothenburg (Sweden) on 10 July 2019. The workshop, open to the industry, will be an opportunity for MPEG to meet its client industries, report on its results and discuss industries’ needs for immersive visual experiences standards.
