Standards and collaboration

The hurdles of standardisation today

Making standards is not like other tasks. In most cases the work is technical in nature, because it is about agreeing on and documenting how certain things must be done in order to claim conformance to the standard. Standards can be developed unilaterally by someone powerful enough to tell other people how they should do things. More often, however, standards are developed collaboratively by people who share an interest in a standard, i.e. in giving those who are willing to do certain things in the same way an agreed reference.

Many years ago, making a standard required only that those who developed it talk to the people in their environment. Before MPEG, all television distribution industries were silos, sharing at most some technologies here and there. This is shown in Figure 1.

Figure 1 – The video industry – Before MPEG

By specifying a common “digital baseband” layer, MPEG standards prompted industry convergence, as shown in Figure 2.

Figure 2 – The video industry – After MPEG

Today, and especially in the domain of digital media, it is common not to have the luxury of defining a standard in isolation. Systems get more and more complex, and their individual elements – which may be implementations of standards – have to interact with other elements – which are again possibly implementations of other standards.

Some of these standards are produced by the same standards organisation, while others are produced by different organisations. How is it possible to make sure that the "standard" elements used to build the system fit together nicely if no one oversees the overall process?

The answer is that, indeed, it is not possible. If it happens, it is because of luck, or because there were enough people of good will who cared to attend the different groups and ensure coordination.

In some cases, all standards used to make the systems are produced by groups belonging to the same standards organisation. Some of these organisations, however, think that they can solve the problem of interoperability of standards by defining precise borders (“scopes”) within which a group of experts is allowed to develop standards.

This approach probably worked decently well in the past represented by Figure 1. However, it is destined to become less and less practical to implement, and its results less and less satisfactory and reliable.

Many standards today must be conceived more around their ability to integrate or interface with technologies from different sources than around the traditional "territory" delimited by the "scope" or "terms of reference" of the group that created them. This trend will only continue in the future. A new approach to standardisation must be developed and put to work.

A “systems-based” approach to standardisation

That the scope-based approach to standardisation is no longer serving its original purpose does not mean that it should be abandoned. It should just be given a different purpose. So far, the "scope" has been more like the ring of walls that protected medieval towns against invasions. Instead, the scope should become an area of competence where "gates" can be "opened" so that "alliances" with other groups can be established.

MPEG has put this attitude into practice for many years, and the success of its standards is largely based on it.

Here follows a list of cases.

Collaboration with ISO/TC 276 for the creation of a standard for DNA read compression

In the first half of the 2010s, MPEG identified "compression of DNA reads" generated by high-speed sequencing machines as an area where its coding expertise could be put to good use. MPEG investigated the field and identified a first set of requirements. As DNA can certainly not be assimilated to "moving pictures and audio" (the area MPEG is competent for), MPEG experts met with TC 276 Biotechnology to present their findings and propose a collaboration.

This move was positively received because TC 276 was indeed in need of such a standard but did not have the expertise to develop it. Therefore, MPEG and TC 276 engaged in a joint effort to refine the requirements of the project.

Then TC 276 entrusted the development of the standard (called MPEG-G) to MPEG on condition of regular reports to TC 276. Ballots on the standard at different phases of development were managed by MPEG, and the results were reported to TC 276.

Today the joint MPEG-TC 276 "venture" has produced three standards (File Format, Compression, and API and Metadata), is finalising two more (Reference Software and Conformance) and has issued a Joint Call for Proposals for a sixth standard on "Genomic Annotation Representation".

This is an excellent example of MPEG “entrepreneurship”. Some experts saw the opportunity to develop a DNA read compression standard using the MPEG “toolkit”. They “opened a gate” to communicate with the Biotechnology world and were lucky to find that Biotechnology was equally happy to “open a gate” on their side.

Collaboration with a non-ISO standards group in need of standards MPEG can develop

The MPEG-4 project, started in 1993 (!), was the first concerted effort by MPEG to provide standards usable by the IT and mobile industries. The 3rd Generation Partnership Project (3GPP), so named because it started in December 1998, at the time of 3G (we are now at 5G and looking forward to 6G), is a very successful international endeavour providing standards for the entire protocol stack needed by the mobile industry (which largely includes the IT industry).

Quite a few MPEG experts attend 3GPP meetings. They are best placed to understand 3GPP’s early standardisation needs. Here I will mention two successful cases.

3GPP needed a file format for multimedia content, and MPEG had developed the ISO Base Media File Format (ISOBMFF, aka the MP4 File Format). MPEG liaised with 3GPP through its common members, understood the requirements and developed a specification that is essentially a restriction of ISOBMFF (ETSI TS 126 244).
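For context, an ISOBMFF file is organised as a sequence of "boxes", each starting with a 32-bit big-endian size and a four-character type code. The sketch below illustrates this container structure only; it is not part of any MPEG or 3GPP specification text, and the function name is my own:

```python
import struct

def read_top_level_boxes(path):
    """Return (type, size) for each top-level ISOBMFF box in a file.

    Each box starts with a 32-bit big-endian size and a 4-character
    type code; size == 1 means a 64-bit size follows the type, and
    size == 0 means the box extends to the end of the file.
    """
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            header_len = 8
            if size == 1:  # 64-bit "largesize" follows the type
                size = struct.unpack(">Q", f.read(8))[0]
                header_len = 16
            elif size == 0:  # box runs to the end of the file
                size = header_len + len(f.read())
            boxes.append((box_type.decode("ascii"), size))
            f.seek(size - header_len, 1)  # skip the box payload
    return boxes
```

Running such a reader on a typical MP4 file yields top-level box types such as ftyp, moov and mdat; the 3GPP restriction mentioned above operates at the level of which boxes and brands are allowed, not on this basic syntax.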

Toward the end of the 2000s, 3GPP initiated studies on adaptive streaming of time-dependent media. MPEG experts attending 3GPP saw the opportunity and convinced 3GPP to entrust the development of the standard to MPEG. MPEG developed requirements that were checked for consistency with 3GPP needs at 3GPP meetings by the common MPEG-3GPP experts. MPEG then developed the 3GPP-DASH standard, and the experts attending both MPEG and 3GPP relayed the necessary information to 3GPP and checked that the choices made by MPEG were agreeable to it. The 3GPP-DASH specification is ETSI TS 126 247.

In the case of DASH, an industry forum (DASH-IF) was formed to handle the needs of industry members who cannot afford to join MPEG. Experts attending both MPEG and DASH-IF relay information in both directions. The information brought to MPEG has given and is still giving rise to amendments to the DASH standard supporting more functionalities.

DASH is again an excellent example of MPEG entrepreneurship. MPEG "opened gates" to DASH that are still very busy and connect to many more external "gates", e.g. Digital Video Broadcasting (DVB) and Hybrid Broadcast Broadband TV (HbbTV).

Collaboration with an ISO/IEC committee needing MPEG standards to enhance use of its standards

MPEG “opened gates” to JPEG to respond to its needs for “Systems” support to its standards.

The original JPEG image compression standard was widely used in the early days of digital video because inexpensive VLSI chips implementing the relatively simple JPEG codec could be used to store and transmit sequences of individual images (video frames). However, there was no specification for this "Motion JPEG".

In the early 2000s, JPEG 2000 appeared as the next-generation image compression standard, and JPEG needed a file format to store and transmit sequences of individually JPEG 2000-coded images. MPEG gladly adapted the ISOBMFF to carry sequences of JPEG 2000 and original JPEG images. The file format has enabled wider use of JPEG 2000, e.g. by the movie industry.

A related case is provided by the JPEG need to enable transport of two image compression formats – JPEG 2000 and JPEG XS – on the successful MPEG transport standard, the MPEG-2 Transport Stream. In both cases MPEG received requests with a first set of requirements, analysed them, added other requirements and sent them back to JPEG. An occasional face-to-face meeting was needed to close the requirements and to provide suggestions for minor extensions to the JPEG standard.

MPEG developed and balloted the amendment to carry JPEG 2000 and JPEG XS on MPEG-2 Transport Stream.

Collaboration to develop a specific instance of an ISO/IEC committee’s general standard

MPEG has two instances of this form of collaboration: Internet of Media Things (IoMT) and Network Based Media Processing (NBMP). The former is about APIs for discovery of and interaction between “Media Things” (e.g. cameras, microphones, displays and loudspeakers) communicating according to the Internet of Things (IoT) paradigm. The latter is a set of APIs allowing a device (e.g. a handset) to get some processing on media done by a networked service.

In JTC 1, MPEG stands out because its standards offer interoperability between implementations, as opposed to most other standards, which are about frameworks and architectures. This does not mean that MPEG does not need architectures. It needs them, but it makes no sense for MPEG to develop its own: it is much better if its architectures are specific instances of general architectures. This is true of IoMT and NBMP.

SC 41 was in the process of developing a general architecture for Internet of Things (IoT). MPEG developed a draft architecture and had it validated by SC 41.

SC 42 has developed a general architecture for Big Media Data. MPEG is developing Network Based Media Processing (NBMP), which can be seen as an instance of the general Big Media Data architecture. Work on aligning the architectures of the two developments is progressing.

MPEG collaborates on a standard that is also of interest to another ISO/IEC committee

This is the case of the Mixed and Augmented Reality Reference Model that MPEG jointly developed with SC 24. This happened because SC 24 needed a framework standard for Mixed and Augmented Reality from the architectural viewpoint, and MPEG had similar interests, but from the bit-level interoperability viewpoint. SC 24 and MPEG agreed on the requirements for the standard and established a joint group (in this case, a Joint Ad hoc Group) with terms of reference, two chairs (one from SC 24 and one from MPEG) and a timeline. Ballots were handled by the SC 24 secretariat, and the Joint Ad hoc Group resolved the comments from both sets of National Bodies.

Collaboration to enable MPEG to develop an extension of one of its standards that falls under another ISO/IEC committee’s scope

This case is exemplified by a scenario under which MPEG and JPEG have collaborated towards a new image coding standard that is based on an MPEG moving picture coding standard.

This happened because conventional video coding standards need to support “clean” switching between different channels in broadcast applications, and random access for other use cases. This allows a decoder to reconstruct certain pictures in a video sequence (intra pictures), independently from other pictures in that sequence.

MPEG wished to develop the High Efficiency Image File Format (HEIF) by defining a special case of the ISOBMFF restricted to the HEVC intra-picture mode. In a face-to-face meeting this goal was agreed, and HEIF is now a successful file format supporting many modalities of interest to users.


The scope of work of an ISO/IEC committee is certainly useful as a reference. However, the current trend toward more convergence and more complex systems relying on multiple standards requires the more flexible "gate" approach exemplified above. A committee may "open gates" toward another committee, and the two committees may agree on developing specific projects. This approach does not work in a "defence of territory" mode, where collaborations are seen as limiting a committee's freedom, but by seeing collaborations with other committees and groups as opportunities to develop standards with a larger field of use, where the constituencies of both committees share the benefits.

The examples mentioned in this article are actual cases that show how the extent of MPEG's scope and the modalities of collaboration it has used have been made possible by the "gate" approach to developing collaborative standards.


The talents, MPEG and the master


In the parable of the Talents, the Gospel tells the story of a master who entrusts five talents (a large amount of money at that time) to one servant and two talents to another before leaving for a long journey. The first servant works hard and doubles his talents, while the second plays safe and buries his. When the master returns, he rewards the first servant and punishes the second.

Thirty-one years ago, MPEG was given the field of standards for coding of moving pictures and audio to exploit. Now the master comes. To help him make the right judgement about the use of the talents he gave, I will briefly review the milestones reached in these years. Of course, I am not going to revisit all the MPEG standardisation areas developed in the last 31 years: there are several posts in this blog (see the list at the bottom of the page) and the book A vision made real – Past, present and future of MPEG. I will just take some snapshots of the major achievements.

Making some media digital

Before MPEG-1 there had been attempts at making media digital, but MPEG-1 was the first standard that made media really digital in consumer products: Video CD brought movies onto CD, Digital Audio Broadcasting (DAB) created the first digital radio, and MP3, well, MP3 simply created a new music experience, triggering a development that continues to this day. This was possible thanks to the vision that a global audio-video-systems standard would take over the world. It did.

Making television digital

MPEG-1 did not make all media digital; television was the major exception. This was an intricate world where politics, commercial interests, protection of culture and more had defied all attempts made by established standards organisations. MPEG applied its recipe and produced an effective MPEG-2 specification, which added DSM-CC to support TV distribution on cable. Sharp vision, excellent technology and unstinting promotion efforts delivered the result.

Making media ICT friendly

MP3 encoding and decoding on a PC was achieved in the early days of the standard, but an announcement by Intel that MPEG-2 video could be decoded in real time on its x86 chips made headlines. The real marriage between media and ICT – defined as IT + mobile – was the planned result of MPEG-4: two video standards in sequence (Visual and AVC), the ultimate audio format (AAC in all its variations), the file format (ISO Base Media File Format, ISOBMFF), fonts (Open Font Format) and a lot of other standard technologies still largely in use today in spite of the fast-evolving technology scenario.

Media not just for humans

MPEG-7, conceived in the mid-1990s, was a project ahead of its time. It was triggered by the vision that 500 TV channels would become available thanks to the savings that MPEG-2 brought to cable with the technology of the time. The idea was to enable the description of content – audio, video and multimedia – in the same bit-thrifty way as MPEG had done for MPEG-1/-2 and was doing for MPEG-4. Descriptions would then be distributed to machines to enable them to respond to human queries. The Audio-Visual Description Profile (AVDP) is an example of how MPEG-7 is used in the content production world, but more is expected in the upcoming Video Coding for Machines work.

E-commerce of media

Around the turn of the millennium, there was an intense debate on how media could be handled in the new context enabled by MPEG standards, triggered by the advent of Peer-to-Peer protocols that allowed new forms of distribution somehow at odds with practices and laws. With MPEG-21, MPEG developed a comprehensive framework and a suite of standards to enable e-commerce of media respecting the rights and interests of the parties involved. These include the specification of the Digital Item (DI), identification of DIs and their components, protection of DIs, machine-readable languages to express rights and contracts, adaptation of DIs and more. Industry has taken pieces of MPEG-21, but not the entire framework yet.

Standards for media combinations

At the beginning of the new millennium MPEG had collected enough standards that the following question was asked: how can we combine, in a standard way, a set of content items each represented by MPEG standards or, when MPEG standards are not available, by other standards? This was the start of MPEG-A, a suite of Multimedia Application Format (MAF) standards. Examples are the Surveillance AF, Interactive Music AF (IMAF), Augmented Reality AF (ARAF), Common Media AF (CMAF) and Multi-Image AF (MIAF). CMAF is actually affecting millions of streaming devices today.

Systems-Video-Audio à la carte

With the main elements of the MPEG-4 standard in place, MPEG needed systems, video and audio standards without being able to define a unified standard. This was the birth of three standard suites: MPEG-B (Systems), MPEG-C (Video) and MPEG-D (Audio). Among the most relevant standards we mention the Common Encryption format (CENC) for ISOBMFF and MPEG-2 TS, Reconfigurable Video Coding (RVC) and Unified Speech and Audio Coding (USAC). The last is the only standard capable of encoding audio and speech with quality superior to both the best audio codec and the best speech codec.

Interacting with media

Media can be defined as virtual representations of audio and video information that match, hopefully faithfully, something that exists in the real world, or representations of synthetically generated audio and video information, or a mix of the two. MPEG started to tackle this issue in the middle of the first decade of the 2000s, at the time Second Life offered an attractive paradigm for interaction with synthetically generated audio and video information. MPEG developed MPEG-V, a framework and a suite of standards for the information flowing from sensors and to actuators, and for the characteristics of virtual-world objects.

Getting media in any way

Broadcasting was the first system for mass distribution of media (audio and video). Originally it was strictly one-way; cable added return information, and then the telecommunication networks provided the technical means to achieve full two-way distribution. With its MPEG-2 standard, MPEG provided the full stack from transport up. This was universally adopted by broadcasting, but the Internet Protocol (IP) was the transport selected for telecom distribution. With MPEG-H, MPEG provided a unified solution where content meant for one-way distribution can be seamlessly distributed in a two-way fashion. With this Systems-Video-Audio suite of standards MPEG has achieved unification of media distribution.

Facing an unpredictable internet

Probably most readers have never heard of the Asynchronous Transfer Mode (ATM), designed to transport fixed-size packets along a fixed route established between two points before data is transferred. ATM's AAL1 could have guaranteed bandwidth, but had to give way to the leaner and cheaper IP. The successful digitisation we live in is paid for with unpredictability: you start with good bandwidth between you and the source, but a moment later the available bandwidth is cut in half. That is a disaster for those who want to provide reliable services. MPEG-DASH is the standard that allows a consumer device to request (mostly video) information at the bitrate matching the bandwidth made available by the network at a given instant.
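The adaptation logic itself is left to the client. A deliberately simplified, throughput-based selection rule (a hypothetical sketch of my own, far cruder than the heuristics real DASH players use) could look like:

```python
def select_representation(bitrates_bps, measured_throughput_bps, safety_factor=0.8):
    """Pick the highest-bitrate representation that fits within a
    safety margin of the measured network throughput.

    bitrates_bps: bitrates of the available representations (as they
    would be advertised in a DASH MPD), in bits per second.
    """
    budget = measured_throughput_bps * safety_factor
    affordable = [b for b in sorted(bitrates_bps) if b <= budget]
    # Fall back to the lowest representation if none fits the budget.
    return affordable[-1] if affordable else min(bitrates_bps)
```

A client re-runs a selection of this kind before fetching each media segment, so when the available bandwidth halves, the next request simply targets a lower-bitrate representation instead of stalling playback.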

The immersive media dream

MPEG has dreamt for a quarter of a century of immersing users in media 😊. In the second half of the 1990s MPEG developed the MPEG-2 Multiview Profile, the first attempt at providing the two eyes of a viewer with the kind of different information each eye receives from the light reflected by an object. The latest attempts were the Multiview and 3D extensions of HEVC. Technology is maturing, but the context is far from stable, as companies providing solutions come and go. MPEG is developing standards in this slippery space based on six key points:

  1. Architecture for immersive services;
  2. Omnidirectional MediA Format (OMAF) for omnidirectional media applications (e.g. 360° video) and a basis for integration of other technologies;
  3. Immersive video starting from 3DoF+;
  4. Immersive audio (6DoF);
  5. Point Clouds, providing an easy way to manipulate 3D visual objects;
  6. Network based Media Processing (NBMP) to allow a user to get the network to do some processing of their media.
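To make the point-cloud notion of point 5 concrete: a point cloud is essentially a set of 3D points with per-point attributes. The toy representation below is illustrative only (it is not the coded format defined by the MPEG standards), but it shows why such objects are easy to manipulate:

```python
# A toy point cloud: a list of points, each a tuple of an
# (x, y, z) position plus an (r, g, b) colour attribute.
cloud = [
    (0.0, 0.0, 0.0, 255, 0, 0),
    (0.1, 0.0, 0.0, 0, 255, 0),
]

def translate(points, dx, dy, dz):
    """Rigidly move every point of the cloud. Manipulation is a simple
    per-point operation: there is no mesh topology to keep consistent."""
    return [(x + dx, y + dy, z + dz, r, g, b)
            for (x, y, z, r, g, b) in points]
```

The standardisation challenge addressed by the MPEG point-cloud work is not this representation but its compression, since real clouds may contain billions of such points.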

Media devices are Things

The Internet of Things (IoT) paradigm is well known, but how can the general IoT paradigm be applied to media? MPEG-IoMT (Internet of Media Things) is an MPEG standard suite providing interfaces, protocols and associated media-related information representations that enable advanced services and applications based on human-to-device and device-to-device interaction. IoMT will be the platform on which new standards such as Video Coding for Machines will be hosted.

More supple compression

MPEG video coding standards have been hugely successful. However, in certain domains, such as internet streaming, adoption encounters non-technical difficulties. Essential Video Coding (EVC) is the standard that will yield excellent performance with the prospect of easier licensing.

Compression for all

MPEG has developed an impressive number of technologies whose focus is on compression and transport of data. Some are strictly media-related. Others, however, have more general applicability. That this is true and can be implemented is demonstrated by MPEG-G, a standard that allows efficient transport of DNA reads obtained from high-speed sequencing machines. MPEG-G compression is lossless and will allow savings on storage and transmission costs, and easier access to DNA information for clinical analyses.

The master returns

The master's journey was really long (31 years), but he has finally returned. Will he say to MPEG: "Well done, good and trustworthy servant; you have been trustworthy in a few things, I will put you in charge of many things; enter into the joy of your master", or will he say: "throw this lazy servant into the outer darkness, where there will be weeping and gnashing of teeth"?


Standards and business models


Some might think that the title is an oxymoron. Indeed standards, certainly international ones, are published by not-for-profit organisations. How could they have a business model?

The answer is that around a standard there are quite a few entities, some of which are far from being not-for-profit.

Therefore, this article intends to analyse how business models can influence standards.

The actors of standardisation

Let’s first have a look at the actors of standardisation.

  1. The first actor is the organisation issuing standards. It may be an international organisation such as ISO, IEC or ETSI, a trade association or an industry forum, but the organisation itself is not designed to make money. A typical arrangement is a membership fee that allows an individual or a company employee to participate. Another is to make users of the standard pay to obtain the specification.
  2. The second actor is the staff of the standards developing organisation. Depending on the type of organisation, their role may be marginal or highly influential.
  3. The third actor is the company that is a member of the organisation issuing standards.
  4. The fourth actor is the expert, typically personnel sent by the company to contribute to the development of the standards.

From the interaction of these actors the standard is created. The standard then creates an ecosystem, and companies become members of that ecosystem.

Why do companies participate in standard development?

Here is an initial list of motivations prompting companies to send their personnel to a standards committee.

  1. A company is interested in shaping the landscape of how a new technology will be used by the companies or industries concerned. This is the case of Artificial Intelligence (AI), a technology that has recently matured and whose use has different, sometimes unexpected, implications. JTC 1/SC 42 has recently been formed to define AI architectures, frameworks, models etc. This kind of participation is not exclusive to companies: universities find it useful to join this "exploratory" work because it may help them identify new research topics.
  2. A company is interested in developing a new product or launching a new service that requires a new standard technology.
  3. A company may be obliged by national regulations to participate in the development of a standard.
  4. A company or, more and more often, a university owns technology it believes is useful or even required to draft a standard that a committee plans to develop. A relevant case is again MPEG, where the number of Non-Practicing Entities (NPEs) is on the rise.
  5. A university or, not infrequently, a company wants to keep abreast of what is going on in a technology field, or to become aware as early as possible of the emergence of new standards that will affect its domain. MPEG is a typical case because it is a group open to new ideas and is attended by all relevant players.

Not all standards are born equal

The word “equal” in the title does not imply that there is a hierarchy of standards where some are more important than others. It simply means that the same name “standard” can be attached to quite different things.

The compact disc (CD) can be taken as the emblem of a traditional standard. Jointly developed by Philips and Sony, the CD quickly defeated the competing product by RCA and became the universal digital music distribution medium. The technical specification of the CD was originally defined in the Red Book and later became the IEC 60908 standard.

MPEG introduced a new process that replaced the traditional sequence of product development, marketplace success and eventual ratification by a recognised standards organisation. This is how the process can be summarised:

  1. Identify the need of a new standard
  2. Develop requirements
  3. Issue call for proposals (CfP)
  4. Integrate technologies obtained from the CfP
  5. Draft the standard

In the early MPEG days, most participants were companies interested in developing new products or launching new services. They actively contributed to the standards because they needed them, but also because they had relevant technologies developed in their laboratories.

Later, the range of contributors to standard development grew larger. The fact that in the mid-1990s a patent from Columbia University, clearly an NPE, was declared essential to MPEG-2 Video made headlines and prompted many to follow suit. The trend so initiated continues to this day.

After MPEG-2, the next step was a revival of the old model represented by the CD. MPEG-4 became just one "product", while other companies developed other "products", some of which gained recognition as "standards" from a professional organisation. The creation of such standards implied the conversion of an internal company specification into the standard format of the professional organisation. The use of those "standards" was "free" in the sense that no fees were charged for their use. However, other strings, less immediately comprehensible to laymen, were typically attached.

MPEG (formally WG 11) is about "coding of moving pictures and audio", while a parallel group called JPEG (formally WG 1) is about "coding of digital representations of images". The two groups operate on different "business models". Today the ubiquitous JPEG standard for image compression is royalty free because the 20-year term of any patents has long since passed. However, even before the 20-year limit was crossed, the JPEG standard could be used freely and at no charge. The same happened to the less famous but still quite important JPEG 2000 standard used for movie distribution, and to the less used JPEG XR standard.

More recently a consortium was formed to develop a royalty-free video compression specification. In rough, imperfect but sufficiently descriptive words, members of that consortium can freely use the specification in their products and services.

The business model of a standard is a serious matter

From the above we see that working on a standard has the basic motivation of creating a technology that enables a certain function in an ecosystem. The ecosystem can be anything from the ensemble of users of a company's product or service, to a country, an industry or the world at large. Beyond this common motivation, however, a company contributing to the development of a standard can have widely different motivations, which I simplify as follows:

  1. The common technology is encumbered because, by rewarding inventions, the ecosystem has embedded the means to constantly innovate the technologies enabling its new products and services. This is the basis of the MPEG business model, which has sustained 30 years of development in the digital media industry. It has advantages and disadvantages:
    1. The advantage of this model is that, once a licence for the standard has been defined, no one can hold the community hostage.
    2. The disadvantage is that getting agreement to the licence may prove difficult, thus disabling or hampering the business of the entire community.
  2. The common technology is "free" because the members of the ecosystem have assessed that they have no interest in the technology per se, but only in the technology as an enabler of other functions on which their business is built. This is the case of Linux/Android and most web technologies. Here, too, there are advantages and disadvantages:
    1. The advantage of this model is that anybody can access the technology by accepting the “free” licence.
    2. The disadvantage is that a member of the ecosystem can be excluded for whatever reason and have its business ruined.

Parallel worlds

It is clear now that “standard” is a name that can be assigned to things that have the promotion of the creation of an ecosystem in common but may be very different otherwise. The way the members of the ecosystem operate is completely different depending on whether the standard is encumbered or free.

Let's look at the concrete cases of MPEG and JPEG. In the late 1980s they started as two groups of roughly the same size (30 people). Thirty years later MPEG has become a 600-member group and JPEG a 60-member group. In spite of handling similar technologies, less than 1% of MPEG members attend JPEG meetings. Why?

The answer is that MPEG decided (more correctly, was forced by the very complex IP environment of video and audio coding) to adopt the encumbered standard model, while JPEG could afford to adopt the free standard model. In the last 30 years companies have invested heavily in MPEG standards because they have seen a return from that investment, and a host of new companies were created and are operating thanks to the rewards coming from their inventions. JPEG developed less because fewer companies saw a return from the free standard business model.

Few common members exist between MPEG and JPEG because the MPEG and JPEG business models are antithetical.


I would like to apply the elements above to some current discussions where some people argue that, since JPEG and some MPEG experts have similar expertise, we should put them together to create “synergy”.

The simple answer to this argument is that it would be foolish to do that. JPEG people produce free standards because those who have a business in mind want to make money from something else that is enabled by the free standard. If JPEG people are mixed with MPEG people who want encumbered standards, the business of JPEG people is gone.

People had better play the game they know, not improvise competence in things they don’t know. It is more or less the same story as in Einige Gespenster gehen um in der Welt – die Gespenster der Zauberlehrlinge.

The right solution is MPEGfuture.


On the convergence of Video and 3D Graphics



For a few years now, MPEG has explored the issue of how to efficiently represent (i.e. compress) data from a range of technologies offering users dynamic immersive visual experiences. Here the word “dynamic” captures the fact that the user can have an experience where objects move in the scene as opposed to being static.

Being static and dynamic may not appear to be a conceptually important difference. In practice, however, products that handle static scenes may be orders of magnitude less complex than those handling dynamic scenes. This is true both at the capture-encoding side and at the decoding-display side. This consideration implies that industry may need standards for static objects much earlier than for dynamic objects.

Industry has guided MPEG to develop two standards that are based on two conceptually similar approaches targeted at different applications and involving different technologies:

  1. Point clouds generated by multiple cameras and depth sensors in a variety of setups. These may contain up to billions of points with colours, material properties and other attributes to offer reproduced scenes characterised by high realism, free interaction and navigation.
  2. Multi-view videos generated by multiple cameras that capture a 3D scene from a pre-set number of viewpoints. This arrangement can also provide limited navigation capabilities.

The compression algorithms employed for the two sources of information have similarities and differences as well. The purpose of this article is to briefly describe the algorithms involved in a general point cloud and in the particular case that MPEG calls 3DoF+ (central case in Figure 1), and to investigate to what extent the algorithms are similar or different and whether they can share technologies today and in the future.

Figure 1 – 3DoF (left), 3DoF+ (centre) and 6DoF (right)

Computer-generated scenes and video are worlds apart

A video is composed of a sequence of matrices of coloured pixels. A computer-generated 3D scene and its objects are not represented like a video but by geometry and appearance attributes (colour, reflectance, material…). In other words, a computer-generated scene is based on a model.

Thirty-one years ago, MPEG started working on video coding and 7 years later did the same for computer-generated objects. The (ambitious) title of MPEG-4 “Coding of audio-visual objects” signalled MPEG’s intention to handle the two media types jointly.

The Video and 3D Graphics competence centres (read Developing standards while preparing for the future to know more about how work in MPEG is carried out by competence centres and units) largely worked independently until quite recently, when the need to compress real-world 3D objects in 3D scenes became important to industry.

The Video and 3D Graphics competence centres attacked the problem using their own specific backgrounds: 3D Graphics used point clouds because they are a 3D graphics representation (they have geometry), while Video used the videos obtained from a number of cameras (because they only have colours).

Video came up with a solution that is video based (obviously, because there was no geometry to encode) and 3D Graphics came up with two solutions, one which encodes the 3D geometry directly (G-PCC) and another which projects the Point Cloud objects on fixed planes (V-PCC). In V-PCC, it is possible to apply traditional video coding because geometry is implicit.

Point cloud compression

MPEG is currently working on two PCC standards: G-PCC, a purely geometry-based approach without much to share with conventional video coding, and V-PCC, which is heavily based on video coding. Why do we need two different algorithms? Because G-PCC does a better job in “new” domains (say, automotive) while V-PCC leverages video codecs already installed on handsets. The fact that V-PCC is due to become FDIS in January 2020 makes it extremely attractive to an industry where novelty in products is a matter of life or death.

V-PCC seeks to map a point of the 3D cloud to a pixel of a 2D grid (an image). To be efficient, this mapping should be as stationary as possible (only minor changes between two consecutive frames) and should not introduce visible geometry distortions. Then the video encoder can take advantage of the temporal and spatial correlations of the point cloud geometry and attributes by maximising temporal coherence and minimising distance/angle distortions.

The 3D to 2D mapping must guarantee that all the input points are captured by the geometry and attribute images so that they can be reconstructed without loss. Projecting the point cloud onto the faces of a cube or a sphere bounding the object does not guarantee lossless reconstruction, because auto-occlusions (points projected to the same 2D pixel are not all captured) may generate significant distortions.

To avoid these negative effects, V-PCC decomposes the input point cloud into “patches”, which can be independently mapped to a 2D grid through a simple orthogonal projection. Mapped patches do not suffer from auto-occlusions and do not require re-sampling of the point cloud geometry. The decomposition should produce patches with smooth boundaries while minimising the number of patches and the mapping distortions. This is an NP-hard optimisation problem that V-PCC solves by applying the heuristic segmentation approach of Figure 2.

Figure 2: from point cloud to patches

An example of how an encoder operates is provided by the following steps (note: the encoder process is not standardised):

  1. At every point the normal on the point cloud “surface” is estimated;
  2. An initial clustering of the point cloud is obtained by associating each point to one of the six planes forming the unit cube (each point is associated with the plane that has the closest normal). Projections on diagonal planes are also allowed;
  3. The initial clustering is iteratively refined by updating the cluster index associated with each point based on its normal and the cluster indexes of its nearest neighbours;
  4. Patches are extracted by applying a connected component extraction procedure;
  5. The 3D patches so obtained are projected and packed into the same 2D frame;
  6. The only attribute per point that is mandatory to encode is the colour (see right-hand side of Figure 3); other attributes, such as reflectance or material properties, can optionally be encoded;
  7. The distances (depths) of the points from the corresponding projection plane are used to generate a grey-scale image which is encoded using a traditional video codec. When the object is complex and several points project to the same 2D pixel, two depth layers are used, encoding a near plane and a far plane (the left-hand side of Figure 3 shows the case of a single depth layer).

Figure 3: Patch projection
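
The initial clustering of step 2 above can be sketched in a few lines. The following Python fragment is a hypothetical illustration, not the normative V-PCC algorithm (which is not standardised anyway): it assigns each point’s estimated surface normal to the unit-cube face whose normal is closest, i.e. the face with the maximum dot product; projections on diagonal planes are ignored here.

```python
import numpy as np

# The six axis-aligned projection plane normals of the unit cube.
PLANE_NORMALS = np.array([
    [ 1, 0, 0], [-1, 0, 0],
    [ 0, 1, 0], [ 0, -1, 0],
    [ 0, 0, 1], [ 0, 0, -1],
], dtype=float)

def initial_clustering(normals: np.ndarray) -> np.ndarray:
    """For each per-point normal in an (N, 3) array, return the index
    of the cube face with the closest normal (maximum dot product)."""
    # scores[i, j] = cosine similarity of point i's normal to plane j
    scores = normals @ PLANE_NORMALS.T
    return np.argmax(scores, axis=1)

# A normal pointing mostly along +z is assigned to face index 4 (+z).
n = np.array([[0.1, 0.0, 0.99]])
n = n / np.linalg.norm(n)
face = initial_clustering(n)[0]  # 4
```

The iterative refinement of step 3 would then re-assign each point by weighing this score against the cluster indices of its nearest neighbours.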

3DoF+ compression

3DoF+ is a simpler case of the general visual immersion case to be specified by part 12, Immersive Video, of MPEG-I. In order to provide sufficient visual quality for 3DoF+, a large number of source views needs to be used, e.g. 10 to 25 views for a 30 cm radius viewing space. Each source view can be captured as omnidirectional or perspectively projected video with texture and depth.

If such a large number of source views were independently coded with legacy 2D video coding standards, such as HEVC, an impractically high bitrate would be generated, and a costly, large number of decoders would be required to view the scene.
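
A back-of-envelope illustration makes the point (all bitrate figures below are assumptions chosen for the example, not numbers from any specification):

```python
# Hypothetical per-view HEVC bitrates (assumed, for illustration only).
views = 25
texture_mbps = 15   # assumed bitrate of one coded texture view
depth_mbps = 5      # assumed bitrate of one coded depth map

total_mbps = views * (texture_mbps + depth_mbps)
decoders = 2 * views  # one texture + one depth decoder per view

print(total_mbps, decoders)  # 500 Mbit/s and 50 decoder instances
```

Even with generous assumptions, hundreds of Mbit/s and tens of simultaneous decoder instances are far beyond what consumer devices can handle.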

The Depth Image Based Rendering (DIBR) inter-view prediction tools of 3D-HEVC may help to reduce the bitrate, but the 3D-HEVC codec is not widely deployed. Additionally, the parallel camera setting assumption of 3D-HEVC may affect the coding efficiency of inter-view prediction with arbitrary camera settings.

MPEG-I Immersive Video targets the support of 3DoF+ applications, with a significantly reduced coding pixel rate and limited bitrate using a limited number of legacy 2D video codecs applied to suitably pre- and post-processed videos.

The encoder is described by Figure 4.

Figure 4: Process flow of 3DoF+ encoder

An example of how an encoder operates is described below (note that the encoding process is not standardised):

  1. A number of views (possibly just one) are selected from the source views;
  2. The selected source views are called basic views and the non-selected views additional views;
  3. All additional views are pruned by synthesising the basic views to the additional views to erase non-occluded areas;
  4. Pixels left in the pruned additional views are grouped into patches;
  5. Patches in a certain time interval may be aggregated to increase temporal stability of the shape and location of patches;
  6. Aggregated patches are packed into one or multiple atlases (Figure 5).

Figure 5: Atlas Construction process

  7. The selected basic view(s) and all atlases with patches are fed into a legacy encoder (an example of such input is provided by Figure 6).

Figure 6: An example of texture and depth atlas with patches
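
The pruning of step 3 above can be sketched as follows. This is a hypothetical simplification (the function name and the depth-tolerance test are illustrative): a pixel of an additional view is discarded when a view synthesised from the basic view(s) already reproduces it within a depth tolerance, and kept otherwise, so that only occluded or unfilled areas survive to be cut into patches.

```python
import numpy as np

def prune_view(additional_depth, synthesized_depth, tol=0.05):
    """Return a boolean mask of the pixels to KEEP in an additional view.

    additional_depth:  depth map of the additional view, shape (H, W)
    synthesized_depth: depth of the same view synthesised from the basic
                       view(s); NaN where synthesis could not fill a pixel
    A pixel is kept when synthesis missed it (NaN) or predicted it badly.
    """
    missing = np.isnan(synthesized_depth)
    # Comparisons involving NaN are False, so 'missing' must be OR-ed in.
    mismatch = np.abs(additional_depth - synthesized_depth) > tol
    return missing | mismatch
```

The surviving pixels would then be grouped into connected patches, aggregated over time, and packed into atlases as in steps 4 to 6.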

The atlas parameter list of Figure 4 contains, for all patches in the atlas: the starting position in the atlas, the source view ID, the location in the source view, and the patch size. The camera parameter list comprises the camera parameters of all indicated source views.
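
The two parameter lists could be modelled as follows (a sketch only; the field names are illustrative and do not reproduce the normative syntax of the MPEG-I metadata):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PatchParams:
    atlas_x: int          # starting position of the patch in the atlas
    atlas_y: int
    source_view_id: int   # the source view the patch was cut from
    view_x: int           # location of the patch in that source view
    view_y: int
    width: int            # patch size
    height: int

@dataclass
class CameraParams:
    view_id: int
    position: Tuple[float, float, float]     # camera position (x, y, z)
    orientation: Tuple[float, float, float]  # e.g. yaw, pitch, roll
    projection: str       # "omnidirectional" or "perspective"

# The metadata bitstream then carries one list of each.
atlas_parameter_list: List[PatchParams] = []
camera_parameter_list: List[CameraParams] = []
```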

At the decoder (Figure 7) the following operations are performed:

  1. The atlas parameter and camera parameter lists are parsed from the metadata bitstream;
  2. The legacy decoder reconstructs the atlases from the video bitstream;
  3. An occupancy map with patch IDs is generated according to the atlas parameter list and the decoded depth atlas;
  4. When users watch the 3DoF+ content, the viewports corresponding to the position and orientation of their head are rendered using patches in the decoded texture and depth atlases, and corresponding patch and camera parameters.

Figure 7: Process flow of 3DoF+ decoder
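
Step 3 of the decoder can be illustrated with a simplified occupancy map builder (hypothetical code: real occupancy derivation also consults the decoded depth atlas, which is omitted here). Each atlas pixel covered by a patch rectangle is labelled with that patch’s ID; pixels outside any patch stay 0 (unoccupied).

```python
import numpy as np

def build_occupancy_map(atlas_h, atlas_w, patches):
    """Build a per-pixel patch-ID map for one atlas.

    patches: list of (patch_id, x, y, w, h) rectangles in the atlas,
             with patch_id > 0; later patches overwrite earlier ones.
    """
    occ = np.zeros((atlas_h, atlas_w), dtype=np.int32)
    for pid, x, y, w, h in patches:
        occ[y:y + h, x:x + w] = pid
    return occ
```

The renderer of step 4 would then look up, for every occupied pixel, the patch and camera parameters needed to re-project it into the requested viewport.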

Figure 8 shows how the quality of synthesised viewports decreases with decreasing number of views. With 24 views the image looks perfect, with 8 views there are barely visible artefacts on the tube on the floor, but with only two views artefacts become noticeable. The goal of 3DoF+ is to achieve the quality of the leftmost image when using the bitrate and pixel rate for the rightmost case.

Figure 8: Quality of synthesized video as a function of the number of views

Commonalities and differences of PCC and 3DoF+

V-PCC and 3DoF+ can use the same 2D video codec, e.g. HEVC. For 3DoF+, the input to the encoder and the output from the decoder are sequences of texture and depth atlases containing patches, which are somewhat similar to V-PCC’s sequences of geometry and attribute video data, which also contain patches.

Both 3DoF+ and V-PCC have metadata describing positions and parameters of the patches in the atlas or video. However, 3DoF+ must describe the view ID each patch belongs to and its camera parameters to support flexible camera settings, while V-PCC just needs to indicate which of the 6 fixed cube faces each patch is projected to. V-PCC does not need camera parameter metadata.

3DoF+ uses a renderer to generate a synthesised viewport at any desired position and towards any direction, while V-PCC re-projects pixels of the decoded video into 3D space to regenerate the point cloud.

Further, the V-PCC goal is to reconstruct the 3D model, in order to obtain the 3D coordinates for each point. For 3DoF+, the goal is to obtain some additional views by interpolation but not necessarily any possible view. While both methods use patches/atlases and encode them as video + depth, the encoders and decoders are very different because the input formats (and, implicitly, the output) are completely different.

The last difference is how the two groups developed their solutions. It is already known that G-PCC has much more flexibility in representing the geometry than V-PCC. It is also expected that compression gains will be bigger for G-PCC than for V-PCC. However, the overriding advantage of V-PCC is that it can use existing and widely deployed video codecs. Industry would not accept dumping V-PCC to rely exclusively on G-PCC.

How can we achieve further convergence?

You may ask: I understand the differences between PCC and 3DoF+, but why was convergence not identified at the start? The answer depends on the nature of MPEG.

MPEG could have done that if it were a research centre. At its own will it could put researchers together on common projects and give them the appropriate time. Eventually, this hypothetical MPEG could have merged and united the two cultures (within its organisation, not the two communities at large), identified the common parts and, step by step, defined all the lower layers of the solution.

But MPEG is not a research centre; it is a standards organisation whose members are companies’ employees “leased” to MPEG to develop the standards their companies need. Therefore, the primary MPEG task is to develop the standards its “customers” need. As explained in Developing standards while preparing for the future, MPEG has a flexible organisation that allows it to accomplish its primary duty to develop the standards that industry needs while at the same time exploring the next steps.

Now that we have identified that there are commonalities, does MPEG need to change its organisation? By all means no. Look at the MPEG organisation of Figure 9.

Figure 9 – The flat MPEG organisation

The PCC work is developed by a 3DG unit (soon to become two because of the widely different V-PCC and G-PCC) and the 3DoF+ standard is developed by a Video unit. These units are at the same level and can easily talk to one another now because they have concrete matters to discuss, even more than they did before. This will continue for the next challenges of 6DoF where the user can freely move in a virtual 3D space corresponding to a real 3D space.

The traditional Video and 3D Graphics tools can also continue to be in the MPEG tool repository and continue to be supplemented by new technologies that make them more and more friendly to each other.

This is the power of the flat and flexible MPEG organisation as opposed to the hierarchical and rigid organisation advocated by some. A rigid hierarchical organisation where standards are developed in a top-down fashion is unable to cope with the conflicting requirements that MPEG continuously faces.


MPEG is synonymous with technology convergence and the case illustrated in this article is just the most recent. It indicates that more such cases will appear in the future as more sophisticated point cloud compression is introduced and technologies supporting the full navigation of 6DoF become available.

This can happen without the need to change the MPEG organisational structure, because the MPEG organisation has been designed to allow units to interact in the same easy way whether they are in the same competence centre or in different ones.

Many thanks to Lu Yu (Zhejiang University) and Marius Preda (Institut Polytechnique de Paris) who are the real authors of this article.




Developing standards while preparing for the future


In Einige Gespenster gehen um in der Welt – die Gespenster der Zauberlehrlinge I described the case of an apprentice who operates in an environment that practices some mysterious arts (which in the past could have been called sorcery). Soon the apprentice wakes up to the importance of what he is learning and doing, and thinks he has learned enough to go his own way.

The reality is that the apprentice has not yet learnt the art, not because he has not studied and practised it diligently and for enough time. He has not learnt it because no one can say “I know the art”. One can say to have practised the art for a certain time, to have gained that much experience or to have had this and that success (better not to talk of failures or, better, to talk of successes after failures). The best one can say is that the successes were the result of time-tested teamwork.

Nothing fits this description better than the organisation of human work, and this article will deal with how MPEG has developed a non-apprentice-based organisation.

Work organisation across the millennia

Thousands of years ago (and even rather recently) there was slave labour. By “owning” the humans, people assumed they could impose any task on them (until a Spartacus came along, I mean).

More advanced than slavery is hired labour, because humans are not owned: you pay them and they do work for you. You can do what you want within a contract, but only up to a point (much lower than with slave labour). If you cross a threshold you have to deal with unions, or simply with workers leaving for another job and employer.

Fortunately, there have been many innovations over and beyond this archaic form of relationship. One case is when you have a work force hired to do intellectual work, such as research on new technologies. Managing the work force is more complicated, but there is an unbeatable tool: the promise to share the revenues of any invention that the intellectual worker makes.

Here, too, there are many variations to bind researcher and employer, to the advantage of both.

The MPEG context is quite different

Apart from not being a company, MPEG has a radically different organisation from any of those described above. In MPEG there are “workers”, but MPEG has no control over them because MPEG is not paying any salary. Someone else does. Still, MPEG has to improve the relationship between itself, the “virtual employer”, and its “workers”, if it is not to produce dull standards.

Here are some of its tools: projecting the development of a standard as a shared intellectual adventure, pursuing the goal with a combination of collective and personal advantages, promoting a sense of belonging to a great team, flashing the possibility to acquire personal fame because “we are making history here”, and more.

For the adventure to be possible, however, MPEG has to entice two types of “worker”. One is the researcher who knows things and the other the employer who pays the salary. Both have to buy into the adventure.

This is not the end of the story, because MPEG must also convince the users of the standard that it will make sense for their industrial plans. By providing requirements, the users of the standard establish a client-supplier relationship with MPEG.

Thirty years ago, matters were much simpler because the guy who paid the salary was regularly the same guy who used the standard. Today things are more complicated, because the guy who pays the salary of the “worker” may very well not have any relationship with the guy who uses the standard, because his role may stop at providing the technologies that are used in the standards.

Organising the work in MPEG

So far so good. This is the evolution of an established business that MPEG brought about. This evolution, however, was accompanied by substantial changes in the object of work. In MPEG-1 and MPEG-2, audio, video, 3D graphics and systems were rather well delimited areas (not really, but compared with today, certainly so). Starting with MPEG-4, however, the different pieces of MPEG business got increasingly entangled.

If MPEG had been a company, it could have launched a series of restructurings, a favourite activity of many companies who think that a restructuring shows how flexible their organisation is. They can think and say that because they are not aware of the human costs of such reorganisations.

I said that MPEG is not a company and MPEG “workers” are not really workers but researchers rented out by their employers, or self-styled entrepreneurs, or students working on a great new idea etc. In any case, MPEG “workers” are intellectually highly prominent individuals.

When it started its work on video coding for interactive applications on digital media, MPEG did not have particularly innovative organisational ideas. Little by little it extended the scope of its work to other areas that were required to provide complete audio-visual solutions.

MPEG ended up building a peculiar competence centre-based organisation by reacting to the changing conditions at each step of its evolution. The organisation has gradually morphed (see here for the full history). Today the competence centres are: Requirements, Systems, Video, Video collaborations with ITU-T, Audio, 3D Graphics and Tests.

The innovative parts of the MPEG organisation are the units formed inside each of these competence centres. They can have one of two main goals: to address specific items that are required to develop one or more standards, or to investigate issues that may not be directly related to a standard under development. This mix of two developments, technologies for the standards and know-how for future standards, is something of which MPEG is proud.

Units may be temporary or long-lived. All units are formed around a natural leader, often as a result of an ad hoc group whose creation may have been triggered by a proposal.

A graphical description of the MPEG organisation is provided by Figure 1.

Figure 1 – The flat MPEG organisation

The units working for standards under development produce outputs which are integrated by the relevant competence centre plenaries and implemented by the editors with the assistance of the experts who have developed the component technologies. The activity of the units of the joint groups with ITU-T is limited to the development of specific standards. Therefore, they do not have units working on explorations unless these are directly aimed at providing answers to open issues in their standards under development.

This is MPEG’s modus operandi, which some (outside MPEG) think is the MPEG process described in How does MPEG actually work?. Nothing is farther from the truth. MPEG’s modus operandi is the informal but effective organisation that permeates MPEG’s work and allows interactions to happen when they are needed, by those who need them, at the time they need them. It is a system that allows MPEG to get the most out of individual initiative, combining the need to satisfy industry needs now and the need to create the right conditions for future standards tomorrow.

Proposing, as mentioned in Einige Gespenster gehen um in der Welt – die Gespenster der Zauberlehrlinge, to create a group that merges the Video and 3D Graphics competence centres based on a hierarchical structure with substructures is the prehistory of work organisation – fortunately not stretching back to slave labour. This is something that today would not even be considered in the organisation of a normal company, to say nothing of the organisation of a peculiar entity such as MPEG.

Units are highly mobile. They interact with other groups because an issue is identified by the competence centre chairs or by the competence centre plenaries, or on the initiative of the unit itself. Interaction can also be between groups or between units in different groups.

The number of units at any given time is rather large, exceeding 20 or even 30. Therefore the IT support system described in Digging deeper in the MPEG work, reproduced below, helps MPEG members keep up with the dynamics of all these interactions by providing information on what is being discussed, where and when.

Figure 2 – How MPEG people know what happens where


A good example of how MPEG’s modus operandi can pursue its primary goal of producing standards, while at the same time keeping abreast of what comes next, is the common layer shared by 3DoF+ and 3DG. This is something that MPEG thought conceptually existed and could have been designed in a top-down fashion. We did not do it because MPEG is not a research organisation that pursues the goal of furthering the understanding of science. MPEG is a not-for-profit organisation developing standards while at the same time preparing the know-how for the next challenges. By not imposing a vision of the future, but doing the work today and investigating the next steps, we get ready to respond to future requests from the industry.

What will be the next steps of 3DoF+ and 3DG convergence is another story for another article.
