MPEG standards, MPEG software and Open Source Software

Introduction

The MPEG trajectory is not the trajectory of an Information Technology (IT) group. Today software plays a key role in MPEG standard development. However, MPEG is not an IT group. For MPEG, software is a tool to achieve the goal of producing excellent standards. But software remains a tool. Of course, because MPEG assembles so many industries with so many different agendas, there are MPEG members for which software is more than a tool.

In this article I will explore the relationship of MPEG with software and, in particular, Open Source Software.

Early days

In my early professional days I had the opportunity to be part of an old-type ICT standardisation effort, the European COST 211 project. A video codec specification (actually more than that, because it contained Systems aspects as well) was later submitted to the CCITT (today's ITU-T) and became a Recommendation with the acronym and title H.120 – Codecs For Videoconferencing Using Primary Digital Group Transmission.

The specification was developed on the basis of contributions received, discussed, possibly amended and then added to the specification. There was no immediate “verification” of the effectiveness of contributions adopted because that relied on hardware implementation of the specification and hardware is a different beast. Four countries (DE, FR, IT and UK) implemented the specification that was eventually confirmed by field trials using 2 Mbit/s satellite links where the 4 implementations were shown to interoperate.

MPEG-1 and MPEG-2

That happened in the years around 1980. Ten years later, MPEG started the development of the MPEG-1 Video and then the MPEG-2 Video standards using a different method.

MPEG assembled the first MPEG-1 Video Simulation Model (SM) at MPEG10 (1990/03). The SM was comparable with the evolving H.120 specification because it was a traditional textual description. At MPEG12 (1990/08), MPEG started complementing the text of the standard with pseudo C-code, because people accustomed to writing computer programs found it more natural to describe the operations performed by a codec in this language than in words.
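
To give a flavour of what this looks like, here is a made-up fragment in the style of the pseudo C-code used in MPEG video standards (illustrative only, not text from any actual specification), written as compilable C:

    /* Illustrative only: inverse quantisation of a block of 64
       transform coefficients, in the style of MPEG pseudo C-code. */
    #define Sign(x) ((x) > 0 ? 1 : ((x) < 0 ? -1 : 0))

    void inverse_quantise(int F[64], const int QF[64], const int W[64],
                          int quantiser_scale)
    {
        for (int i = 0; i < 64; i++) {
            if (QF[i] == 0)
                F[i] = 0;  /* a zero coefficient stays zero */
            else
                F[i] = (2 * QF[i] + Sign(QF[i])) * W[i] * quantiser_scale / 32;
        }
    }

Such a fragment is unambiguous where an equivalent verbal description ("each coefficient is scaled by the quantiser step size, weighted by the corresponding entry of the quantisation matrix...") can easily be misread.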

In MPEG-1 and MPEG-2 times active participants developed and maintained their own simulation software. Some time later, however, it was decided to develop reference software, i.e. a software implementation of the MPEG-1 standard.

Seen with the eyes of a software developer, the process of standard development in MPEG-1 and MPEG-2 times was rather awkward, because the – temporally overlapping – sequence of steps was:

  1. Produce a textual description of the standard
  2. Translate the text to the individual software implementing the Simulation Model
  3. Optimise the software
  4. Translate the software back to text/pseudo C-code.

Reference Software and Conformance Testing

People of the early MPEG days used software – and quite intensely – because that was a tool that cut the time it would take to develop the specification by orders of magnitude while offering the opportunity to obtain a standard with better performance.

Another important development was the notion of conformance testing. Separation of the specification (the law) from determination of conformance (the tribunal) was a major MPEG innovation. The reference software could be used to test an encoder implementation for conformance by feeding the bitstream produced by the implemented encoder to the reference decoder. Specially produced conformance-testing bitstreams could be used to test a decoder for conformance.

Conformance testing and its “tool”, the reference software, are an essential add-on to the standard because they give users the freedom to make their own implementations and enable the creation of ecosystems of interoperable implementations.
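
As a concrete illustration, a decoder conformance check can be as simple as decoding a conformance bitstream with both the decoder under test and the reference decoder and comparing the outputs. This is a minimal sketch, assuming a byte-exact matching criterion (some standards instead allow small, bounded deviations, e.g. in IDCT accuracy):

    /* Minimal sketch of a decoder conformance check: compare the raw
       (e.g. YUV) output of a decoder under test with the reference
       decoder's output on the same conformance bitstream. */
    #include <stdio.h>

    int outputs_match(const char *ref_yuv, const char *test_yuv)
    {
        FILE *a = fopen(ref_yuv, "rb");
        FILE *b = fopen(test_yuv, "rb");
        int ca, cb, match = 1;
        if (!a || !b) { if (a) fclose(a); if (b) fclose(b); return 0; }
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) { match = 0; break; }  /* first mismatch ends the test */
        } while (ca != EOF);                     /* equal-length files end together */
        fclose(a);
        fclose(b);
        return match;
    }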

Open Source Software (OSS)

OSS is a very large and impactful world for which software is not the tool to achieve a goal but the goal itself. Those adopting the GNU's Not Unix (GNU) General Public License (GPL) grant some basic rights to users of their software, which they call “Free”. The terms can be (roughly) summarised as the rights to

  1. Distribute copies of the software
  2. Receive the source code or get it if they want it
  3. Change the software or use pieces of it in new programs

in exchange for a commitment by the user to

  1. Pass on to recipients all the rights acquired
  2. Make sure that recipients are able to receive or get the source code
  3. Make recipients aware that the software has been modified.

Two additional issues should be borne in mind:

  1. There is no warranty for GNU-licensed software
  2. Any patent required to operate the software must be licensed to everybody.

MPEG-4

Development of the MPEG-4 Visual standard took another important turn, one that marked the convergence of the ways the telecom, broadcasting and consumer electronics industries on one side, and the information technology industry on the other, developed.

Unaware of the formalisation of OSS rules that was already taking place in the general IT world, MPEG made the decision to develop the MPEG-4 reference software collaboratively because

  • Better reference software would be obtained
  • The scope of MPEG-4 was so large that probably no company could afford to develop the complete software implementation of the standard
  • A software implementation made available to the industry would accelerate adoption of the standard
  • A standard with two different forms of expression would have higher quality, because the removal of an ambiguity from one form of expression would help clarify possible ambiguities in the other.

MPEG-4 Visual had only one person in charge of the Test Model. All new proposals were assessed and, if agreed, converted to Core Experiments. If at least two participants from two different institutions brought similarly convincing improvement results, the proposal would be accepted and added to the TM.

Thus the MPEG-4 software was no longer just the tool to develop the standard; it became a tool to make products based on the standard, though not necessarily the only one. A reversal of priorities was required: the standard in textual form was still needed, but many users considered the standard expressed in a programming language as the real reference. This applied not just to those making software implementations, but often also to those making more traditional hardware-based products and VLSI designs.

Therefore it was decided that the software version of the standard should have the same normative status as the textual part. This decision has been maintained in all subsequent MPEG standards.

Licensing software

While the previous approach where every participant had their own implementation of the TM did not raise the issue of “who owns the software?”, the new approach did. MPEG resolved that with the following rules labelled as “copyright disclaimer”:

  • Whoever makes a proposal that is accepted must provide a software implementation and assign the copyright of the code to ISO
  • ISO grants a license of the copyright of the code for products conforming to the standard
  • Proponents are not required to release patents that are needed to exercise the code and users should not expect that the copyright release includes a patent licence.

More recently, MPEG has started using a modified version of the Berkeley Software Distribution (BSD) licence, originally used to distribute a Unix-like operating system. This licence, initially called the “MXM licence” after the MPEG-M part 2 standard “MPEG Extensible Middleware (MXM)”, simply says that the code may be used as prescribed by the BSD licence, with the usual disclaimer that patents are not released. This licence is particularly interesting for software companies that do not want liabilities when using software not developed internally.

MPEG and OSS are close, but not quite so

Let me summarise the main drivers of MPEG's development of standards and of the software that implements them:

  1. MPEG develops the best standards satisfying identified requirements. Best standards must use technologies resulting from large R&D investments typically made by for-profit entities.
  2. MPEG uses a competitive and transparent process to acquire (Call for Proposals) and refine (Core Experiments) technologies. Today that process largely uses collaboratively developed software, with methodologies that resemble those of the OSS community.
  3. Typically, an MPEG standard is available in two forms: The standard expressed in natural language (possibly with some pseudo C-code inside to improve clarity) and the standard expressed in computer code.
  4. Each form of the standard has the same normative value. If discrepancies are found, the group will decide which is the correct form and amend the one not considered correct.
  5. Reference Software and Conformance Testing are attached to MPEG standards. The former is used to test encoder implementations for conformance and the latter to test decoder implementations for conformance.
  6. The Reference Software of some MPEG standards is of extremely high quality and can be used in products. However, the natural language and computer code forms of the standard have the same status.
  7. MPEG standards are typically Option 2, i.e., essential patents may exist but those patents can be licensed on FRAND terms.

Who is better?

The question is not meaningful if we do not specify the context in which the standard or software is used.

I claim that only a standard that responds to the 7 drivers of the previous section can

  1. Embed state-of-the-art technology
  2. Be implemented by a variety of independent entities in different industries
  3. Allow for incremental evolutions in response to user requirements
  4. Stimulate the appearance of constantly improving implementations
  5. Enable precise assignment of responsibilities (e.g. for legal purposes).

Conclusions

I am sure that some will think differently about the subject of the previous section and I am certainly willing to engage in a discussion.

I believe that Open Source Software is a great movement that has brought a lot to humankind. However, I do not think that it is adequate to create an environment that responds to the 7 drivers above.

The MPEG Metamorphoses

Introduction

In past publications, I have often talked about how many times MPEG has changed its skin during its 3-decade-long life. In this article I would like to add substance to this claim by giving a rather complete, albeit succinct, account. You can find a more detailed story at Riding the Media Bits.

The early years

MPEG-1

MPEG started with the idea of creating a video coding standard for interactive video on compact disc (CD). The idea of opening another route to video coding standards had become an obsession to me because I had been working for many years in video coding research without seeing any trace of consumer-level devices for what was touted at that time as the killer application: video telephony. I thought that if the manufacturing prowess of the Consumer Electronics (CE) industry could be exploited, that industry could supply telco customers with those devices, so that telcos would be pushed into upgrading their networks to digital in order to withstand the expected high videophone traffic.

The net bitstream from CD – 1.4 Mbit/s – is close to the 1.544 Mbit/s of the primary digital multiplex in USA and Japan. Therefore it was natural to set a target bitrate of 1.5 Mbit/s as a token of the CE and telco convergence (at video terminal level).
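
To see where the 1.4 Mbit/s figure comes from (assuming the audio CD payload rate, i.e. stereo 16-bit PCM sampled at 44.1 kHz):

    2 channels × 44,100 samples/s × 16 bits/sample = 1,411,200 bit/s ≈ 1.4 Mbit/s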

At MPEG1 (1988/05) 29 experts attended. The work plan was agreed to be MPEG-1 at 1-1.5 Mbit/s, MPEG-2 at 1.5-10 Mbit/s and MPEG-3 at 10-60 Mbit/s (the numbering of standards came later).

For six months all activities happened in single sessions. However, 3 areas were singled out for specific activities: quality assessment (Test), complexity issues in implementing video codecs in silicon (VLSI) and characteristics of digital storage media (DSM). The last activity was needed because CD was a type of medium quite dissimilar from the telecom networks and broadcast channels with which video coding experts were familiar.

In the following months I dedicated my efforts to quelling another obsession of mine: humans do not generally value video without audio. The experience of the ISDN videophone, where for organisational reasons video was compressed by 3 orders of magnitude into 64 kbit/s while audio was kept uncompressed in another 64 kbit/s stream, pushed me into creating an MPEG subgroup dedicated to Audio coding. Audio, however, was not the speech used in videotelephony (for which there were plenty of experts in ITU-T), but the audio (music) typically recorded on CDs. Therefore action was required lest MPEG end up like videoconferencing, with a state-of-the-art video compression standard but no audio (music), or with a quality not satisfactory for the target “entertainment-level” service.

The Audio subgroup was established at MPEG4 (1988/10) under the chairmanship of Hans Musmann, just 7 months after MPEG1, while the Video subgroup was established at MPEG7 (1989/07), under the chairmanship of Didier Le Gall, about a year after MPEG1.

The other concern of mine was that integrating the audio component in a system that had not been designed for it could lead to technical oversights that could only be belatedly corrected with abominable hacks. Hence the idea of a “Systems” activity, initially similar to the H.221 function of the ISDN videophone (a traditional frame- and multiframe-based multiplexer), but with better performance, because I expected it to be more technically forward looking.

At MPEG8 (1989/11) all informal activities were formalised into subgroups: Test (Tsuneyoshi Hidaka), DSM (Takuyo Kogure), Systems (Al Simon) and VLSI (Colin Smith).

MPEG-2

Discussions on what would eventually become the MPEG-2 standard started at MPEG11 (1990/07). The scope of the still ongoing MPEG-1 project was nothing compared to the ambitions of the MPEG-2 project. The goal of MPEG-2 was to provide a standard that would enable the cable, terrestrial TV, satellite television, telco and package media industries – worth in total hundreds of billions of USD – to go digital in compressed form.

Therefore, at MPEG12 (1990/09) the Requirements group was established under the chairmanship of Sakae Okubo, the rapporteur of the ITU-T Specialists Group on Coding for Visual Telephony. This signalled the fact that MPEG-2 Video (and Systems) were joint projects. The mandate of the Requirements Group was to distil the requirements coming from the different industries into one coordinated set of requirements.

The Audio and Video subgroups had their minds split in two, with one half engaged in finishing their MPEG-1 standards and the other half in initiating the work on the next MPEG-2 standard. This was just the first time MPEG subgroups had to split their minds.

In those early years subgroup chairs changed rather frequently. At MPEG9 (1990/02) Colin (VLSI) was replaced by Geoff Morrison and the name of the group was changed to Implementation Study Group (ISG) to signal the fact that not only hardware implementation was considered, but software implementation as well. At MPEG12 (1990/09) Al (Systems) was replaced by Sandy MacInnis and Hans (Audio) was replaced by Peter Noll.

MPEG29 (1994/11) approved the Systems, Video and Audio parts of the MPEG-2 standard and some of the subgroup chairs saw their mission as accomplished. The first move had come at MPEG28 (1994/07), when Sandy (Systems) was replaced by Jan van der Meer to finalise the issues left over from MPEG-2.

The MPEG subgroups did a great job in finishing several pending MPEG-2 activities such as MPEG-2 Video Multiview and 4:2:2 profiles, MPEG-2 AAC, DSM-CC and more.

A new skin of coding

In the early 1990s, MPEG-1 was not finished and MPEG-2 had barely started, but talks about a new video coding standard for very low bitrates (e.g. 10 kbit/s) were already under way. The name eventually assigned to the project was MPEG-4, because the MPEG-3 standard envisaged at MPEG1 had been merged with MPEG-2 by extending the upper bound of the MPEG-2 bitrate range beyond 10 Mbit/s.

MPEG-4, whose title eventually settled to Coding of Audio-Visual Objects, was a completely different standard from the preceding two in that it aimed at integrating the world of audio and video, so far under the purview of broadcasting, CE and telecommunication, with the world of 3D Graphics, definitely within the purview of the Information Technology (IT) industry.

At MPEG20 (1992/11) a new subgroup called Applications and Operational Environments (AOE) was established under the chairmanship of Cliff Reader. This group took charge of developing the requirements for the new MPEG-4 project and spawned three groups inside it: “MPEG-4 requirements”, “Synthetic and Natural Hybrid Coding (SNHC)” and “MPEG-4 Systems”.

The transition from the “old MPEG” (MPEG-1 and MPEG-2) to the “new MPEG” (MPEG-4) was quite laborious, with many organisational and personnel changes. At MPEG30 Didier (Video) was replaced by Thomas Sikora and Peter (Audio) was replaced by Peter Schreiner. At MPEG32 Geoff (ISG) was replaced by Paul Fellows and Tsuneyoshi (Test) was replaced by Laura Contin.

MPEG-4 Visual was successfully concluded thanks to the great efforts of Thomas (Video) and Laura (Test) and the very wide participation by experts. The foundations of the extremely successful AAC standards were laid down by Peter (Audio) and the Audio subgroup experts.

At MPEG34 (1996/03) Cliff Reader left MPEG and at MPEG35 (1996/07) a major reorganisation took place:

  1. The “AOE requirements” activity was mapped to the Requirements subgroup under the chairmanship of Rob Koenen, after a hiatus of 3 meetings since Sakae (Requirements) had left.
  2. The “AOE systems” activity was mapped to the Systems subgroup under the chairmanship of Olivier Avaro.
  3. The “AOE SNHC” activity became a new SNHC subgroup under the chairmanship of Peter Doenges. Peter was replaced by Euee Jang at MPEG49 (1999/10).

At MPEG40 (1997/07) a DSM activity became a new subgroup with the name Delivery Multimedia Integration Framework (DMIF) under the chairmanship of Vahe Balabanian. DMIF addressed the problem of virtualising the distribution medium (broadcast, network and storage) from the Systems level by defining appropriate interfaces (APIs). At MPEG47 (1999/03) Guido Franceschini took over with a 2-meeting tenure, after which the DMIF subgroup was closed (1999/07).

At MPEG41 Peter (Audio) was replaced by Schuyler Quackenbush, who has since been running the Audio group for 23 years and is the longest-serving MPEG chair.

At MPEG46 (1998/12) Paul (ISG) was replaced by Marco Mattavelli. Under Marco’s tenure, such standards as MPEG-4 Reference hardware description, an extension to VHDL of the notion of Reference Software, and Reconfigurable Media Coding were developed.

The MPEG-4 standard is unique in MPEG history. MPEG-1 and -2 were great standards because they brought together established large industries with completely different agendas, but MPEG-4 is the standard that bonded the initial MPEG industries with the IT industry. The standard had big challenges, and Chairs and experts dedicated enormous resources to the project to face them: video objects, audio objects, synthetic audio and video, VRML extensions, file format and more. MPEG-4 is a lively standard even today, almost 30 years after we first started working on it, and has the largest number of parts.

Liaisons

At MPEG33 (1996/01) the Liaison subgroup was created under the chairmanship of Barry Haskell to handle the growing network of organisations MPEG was liaising with (~50). At MPEG56 Barry, a veteran of the video coding old guard, left MPEG and at MPEG57 (2001/07) Jan Bormans took over, continuing until MPEG71 (2005/01) when Kate Grant took over. The Liaison subgroup was closed at MPEG84 (2008/04). Today liaisons are coordinated at the Chairs meeting, drafted by the relevant subgroup and reviewed by the plenary.

An early skin change

In 1996 MPEG started addressing MPEG-7, a media-related standard with a completely different nature from the preceding three: it was about media descriptions and their efficient compression. At MPEG48 (1999/07) it became clear that a new subgroup, called Multimedia Description Schemes (MDS), was needed to carry out part of the work.

Philippe Salembier was put in charge of the MDS subgroup, which was initially in charge of all MPEG-7 matters that did not involve Systems, Video and Audio. At MPEG56 (2001/03) John Smith took over the position, which he held until MPEG70 (2004/10) when Ian Burnett took over until the MDS group was closed at MPEG87 (2009/02).

The media description skin has had several revivals since then. One is the Part 13 – Compact Descriptors for Visual Search (CDVS) standard of the first half of the 2010s. Another is the Part 15 – Compact Descriptors for Video Analysis (CDVA) standard developed in the middle-to-second half of the 2010s. Finally, Part 17 – Compression of neural networks for multimedia content description and analysis is preparing a basic compression technology for neural network-based media description.

Another video coding

At MPEG46 (1998/12) Laura (Test) was replaced by Vittorio Baroncini. At MPEG54 (2000/10) Thomas (Video) left MPEG and at MPEG56 (2001/03) Jens-Rainer Ohm was appointed as Video chair.

Vittorio brought the expertise to carry out the subjective tests required by the collaboration with ITU-T SG 16, restarted to develop the Advanced Video Coding (AVC) standard. At MPEG58 (2001/12) Jens was appointed as co-chair of a joint subgroup with ITU-T called the Joint Video Team (JVT). The other co-chair was Gary Sullivan, rapporteur of the ITU-T SG 16 Video Coding Experts Group (VCEG). The JVT continued its work until well after the AVC standard was released at MPEG64 (2003/03). Since then Gary has attended the chairs meetings as a token of the collaboration between the two groups.

Still media-related, but a different “coding”

At MPEG49 (1999/10) the many inputs received from the market prompted me to propose that MPEG develop a new standard with the following vision: “Every human is potentially an element of a network involving billions of content providers, value adders, packagers, service providers, resellers, consumers …”.

The standard was eventually called MPEG-21 Multimedia Framework. MPEG-21 can be described as the “suite of standards that enable media ecommerce”.

The MDS subgroup was largely in charge of this project which continued during the first decade of the 2000s with occasional revivals afterwards. Today MPEG-21 standards are handled by the Systems subgroup.

Under the same heading of “different coding” it is important to mention Open Font Format (OFF), a standard built on the request made by Adobe, Apple and Microsoft to maintain the OpenType specification. The word “maintenance” in MPEG has a different meaning, because OFF has had many extensions, developed “outside” MPEG in an open ad hoc group with strong industry participation and ratified by MPEG.

A standard of standards

In the early 2000s MPEG could look back at its first decade and a half of operation with satisfaction: its standards covered video, audio and 3D Graphics coding, systems aspects, transport (MPEG-2 TS and MPEG-4 File Format) and more. While refinements of its already impressive assets were under way, MPEG wondered whether there were other areas it could cover. The answer was: the coding of “combinations of MPEG coded media”. That was the beginning of a long series of 20 standards (MPEG-A), originally developed by the groups in charge of the individual media, e.g. Part 2 – MPEG music player application format was developed by the Audio subgroup and Part 3 – MPEG photo player application format was developed by the Video subgroup. Today all MPEG-A standards, e.g. the very successful Part 19 – Common Media Application Format, are developed by the Systems subgroup.

The mid 2000s

Around the mid 2000s MPEG felt that there was still a need for more Systems, Video and Audio standards, but did not have the usual Systems, Video and Audio “triad” umbrella it had had until then with MPEG-1, -2, -4 and -7. So it decided to create containers for those standards and called them MPEG-B (Systems), MPEG-C (Video) and MPEG-D (Audio).

MPEG also ventured into new areas:

  1. Specification of a media device software stack (MPEG-E)
  2. Communication with and between virtual worlds (MPEG-V)
  3. Multimedia service platform technologies (MPEG-M)
  4. Rich media user interfaces (MPEG-U)

Rob (Requirements) continued until MPEG58 (2001/12). He was replaced by Fernando Pereira until MPEG64 (2003/04) when Rob returned, holding his position until MPEG71 (2005/01) when Fernando took over again until MPEG82 (2007/10) when he left MPEG.

The Requirements subgroup is the “control board” of MPEG in the sense that Requirements gives proposals of standards the shape that will be implemented by the operational groups after the Call for Proposals. Therefore the Rob-Fernando duo has been in the control room of MPEG for some 40% of MPEG's life.

Vittorio (Test) continued until MPEG68 (2004/03) when he was replaced by T. Oelbaum, who held the position until MPEG81 (2007/07).

Olivier (Systems) kept his position until MPEG86 (2008/07) when he left MPEG to pursue his entrepreneurial ambitions. Olivier has been in charge of the infrastructure that keeps MPEG standards together for 13 years and is the third longest-serving MPEG chair.

Euee (SNHC) kept his position until MPEG59 (2002/03). He was replaced by Mikaël Bourges-Sévenier, who continued until MPEG70 (2004/10). Mikaël was then replaced by Mahnjin Han, who continued until MPEG78 (2006/10). The SNHC subgroup has been producing valuable standards. However, they have had a hard time penetrating an industry that is content with less performing but freely-available standards.

The return of the triad

The end of the years 2000s signaled a major change in MPEG. When Fernando (Requirements) left MPEG at MPEG82 (2007/10), the task of developing requirements was first assigned to the individual groups. The experiment lasted 4 meetings but it demonstrated that it was not the right solution. Therefore, Jörn Ostermann was appointed as Requirements chair at MPEG87 (2009/02). That was just in time for the handling of the requirements of the new Audio-Video-Systems triad-based MPEG-H standard.

MPEG-H included the MPEG Media Transport (MMT) part, the video coding standard that eventually became High Efficiency Video Coding (HEVC) and 3D Audio. MPEG-H was adopted by the ATSC as a tool to implement new forms of broadcasting services where traditional broadcasting and the internet not only coexist but cooperate.

The Requirements, and then the Systems, subgroups were also quickly overloaded by another project, called DASH, aiming at “taming” the internet, turning it from an unreliable transport into one that the end-user device can adapt to.

The two Systems projects – MMT and DASH – were managed by Youngkwon Lim who took over from Olivier at MPEG86 (2008/10).

At MPEG87 (2009/02) the MDS subgroup was closed. At the same meeting, Vittorio resumed his role as chair of the Test subgroup, just in time for the new round of subjective tests for the HEVC Call for Evidence and Call for Proposals.

The Joint Collaborative Team on Video Coding between ITU-T and MPEG (JCT-VC) was established at MPEG92 (2010/04), co-chaired by Gary and Jens as in the AVC project. At its peak, the VC group was very large and processed in excess of 1,000 documents per meeting. When the group was still busy developing the main (2D video coding) part of HEVC, 3D video coding became important and a new subgroup called JCT-3V (joint with ITU-T) was established at MPEG100. The 3V subgroup closed its activities at MPEG115 (2016/05), while the VC subgroup is still active, mostly in maintenance mode.

The recent years

In the first half of the 2010s MPEG developed the Augmented Reality Application Format and, in a joint ad hoc group with SC 24/WG 9, the Mixed and Augmented Reality (MAR) Reference Model.

In 2016 MPEG kicked off the work on MPEG-I – Coded representation of immersive media. Part 3 of this is Versatile Video Coding (VVC), the latest video coding standard developed by the new Joint Video Experts Team (JVET) between ITU-T and MPEG established at MPEG114 (2016/02). It is expected to become FDIS at MPEG131 (2020/06).

The JVET co-chairs are again Jens and Gary. In the anticipation – regularly materialised – that JVET would again be overloaded with contributions, Jens was replaced as Video chair by Lu Yu at MPEG121 (2018/01).

The Video subgroup is currently engaged in two 2D video coding standards of rather different nature – Essential Video Coding (EVC) and Low Complexity Enhancement Video Coding (LCEVC) – and is working on the MPEG Immersive Video (MIV) project, due to become FDIS at MPEG134 (2021/03).

MIV is connected with another exciting area that in this article we left with the name of SNHC under the chairmanship of Mahnjin. At MPEG79 (2007/01) Marius Preda took over SNHC from Mahnjin to continue the traditional SNHC activities. At MPEG89 (2009/06) SNHC was renamed 3D Graphics (3DG).

In the mid 2010s the 3DG subgroup started several explorations, in particular Point Cloud Compression (PCC) and Internet of Media Things (IoMT). The former has split into two standards, Video-based (V-PCC) and Geometry-based (G-PCC). The latter has reached FDIS recently.

Another promising activity started at MPEG109 (2014/03) and has now become the Genomic Information Representation (MPEG-G) standard. This standard signals the intention to bring the benefits of compression to industries other than media that process other data types.

Conclusions

This article was a long overview of 32 years of MPEG life. The intention was not to talk about MPEG standards, but about how the MPEG organisation morphed to suit the needs of standardisation.

Of course, structure without people is nothing. It was obviously not possible to mention the thousands of experts who made MPEG standards, but I thought that it was my duty to record the names of the subgroup chairs who drove their development. You can see a complete table of all meetings and MPEG Chairs here.

In recent years the MPEG structure has remained stable, but there is always room for improvement. However, change must be driven by needs, not by ideology.

One possible improvement is to make the Genomic data coding activity a formal subgroup as a first step in anticipation of more standards to code other non-media data. The other is to inject more market awareness into the phase that defines the existence first and then the characteristics of MPEG standards.

But this is definitely another story.


National interests, international standards and MPEG

Having spent a considerable amount of my time in standardisation, I have developed my own definition of standard: “the documented agreement reached by a group of individuals who recognise the advantage of all doing certain things in an agreed way”. Indeed, I believe that, if we exclude some areas such as safety, in matters of standards the authority principle should not hold. Forcing free people to do things against their interest is an impossible endeavour. If doing certain things in a certain way is not convenient, people will shun a standard even if it bears the most august credentials.

Medieval Europe was a place where my definition of standard reached an atomic level. However, with the birth of national centralised states and, later, the industrial revolution, national standards came to the fore. Oddly enough, national standards institutions such as the British Standards Institution (BSI), originally called the Engineering Standards Committee and probably the first of its kind, were established just before World War I, when the first instance of modern globalisation took shape.

Over the years, national standards became a powerful instrument to further a country's industrial and commercial interests. As late as 1989 MPEG had trouble displaying 625/50 video coding simulation results at a USA venue, because import of 625/50 TV sets into the country was forbidden at that time (and no one had an interest in making such sets). This “protection of national interests” is the cause of the 33 pages of ITU-R Report 624 – Characteristics of television systems of 1990, available here, containing tables and descriptions of the different analogue television systems used at the time by the 193 countries of the United Nations.

The same spirit of “protecting national interests” informed the CCITT SGXV WG4 Specialists Group on Coding for Visual Telephony (which everybody at that time called the Okubo group) when it defined the Common Intermediate Format (CIF) in Recommendation H.261 to make it possible for a 525/60 camera to communicate with a 625/50 monitor (and for a 625/50 camera with a 525/60 monitor).

That solution was a “compromise” video format (actually not a real video format because it was used only inside the video codec) with one quarter of the 625/50 spatial resolution and one half the 525/60 temporal resolution. This was a typical political solution of the time (and one that 525/60 people later regretted because the spatial interpolation required by CIF was more onerous than the temporal interpolation in 625/50). Everybody (but me, who opposed the solution) felt happy because everybody had to “share the burden” when communicating across regions with different video formats.

International standardisation is split into three organisations – IEC, ISO and ITU. IEC and ISO share the principle that standards for a technical area are developed by a Technical Committee (or a Subcommittee) managed by an international Secretariat funded and manned by a national standards organisation (a so-called National Body). Things in ITU are slightly different because ITU itself provides the secretariat, whose personnel is provided by national administrations.

In the traditional context of standards being established by a national standards committee to protect the national interest, an international standards committee was seen as the place where national interests, as represented by their national standards bodies, had to be protected. Therefore, holding the secretariat of a committee was seen as a major achievement for the country that ran the secretariat. As an emblem of the achievement, the country had the right to nominate (in practice, appoint) the chairperson of the committee (in some committees this is rigorously enforced; in others, things are taken more lightly).

That was then, but actually it is still so even now in many standardisation contexts. The case of CIF mentioned above shows that, in the area of video coding standards, then the prerogative of the ITU-T “for Visual Telephony”, the influence of national interests was still strong. MPEG, however, changed the sides of the equation. One of the first things it did when it developed MPEG-1 Video was to define test sequences in both 525/60 and 625/50 and then issue a Call for Proposals where respondents could submit coded sequences in one or the other format at their choice. MPEG did not use CIF but SIF, where the format was either a quarter of the spatial resolution and one half of the temporal resolution of 525/60 (i.e. 240 lines x 352 pixels) or a quarter of the spatial resolution and one half of the temporal resolution of 625/50 (i.e. 288 lines x 352 pixels).

By systematically defusing political issues and converting them to technical issues, MPEG succeeded in the impossible task of defining compressed media formats with an international scope. However, by kicking political issues out of the meeting rooms, MPEG changed the nature and role of the chairmen and secretariat of its parent subcommittee SC 29. The first yearly SC 29 plenary meetings lasted 3 days, but later the duration was reduced to 1 day, and in some cases all matters were handled in half a day.

One of the most contentious areas of standardisation (remember the epic battles on the HDTV production format of 1986 and before) was completely tamed and reduced to technical battles where experts assess the quality of the solution proposed and not how it is dressed in political clothing. This does not mean that the battles are not epic, but for sure they are rational.

I do not remember having heard complaints on the part of the industry regarding the de-politicised state of affairs in media coding standardisation. Therefore it is time to ask whether we should not be dispensed from the pompous ritual of countries expressing national interests through national bodies in secretariats and chairs of international standards committees, when in fact the interests at stake are global industrial interests, poorly mapped onto a network of countries actually deprived of national interests.


Media, linked media and applications

Introduction

In a technology space moving at an accelerated pace like the one MPEG has the task to develop standards for, it is difficult to have a clear plan for the future (MPEG has a 5-year plan, though).

Still, when MPEG was developing the Media Linking Application Format (MLAF), it “discovered” that it had developed, or was developing, several related standards: MPEG-7, Compact descriptors for visual search (CDVS), Compact descriptors for video analysis (CDVA) and Media Orchestration.

These standards (together with others in the early phases of conception or development, e.g. Neural Network Compression and Video Coding for Machines) help create the Multimedia Linking Environment, i.e. an environment where it is possible to create a link between a given spatio-temporal region of a media object and spatio-temporal regions of other media objects.

This article explains the benefits brought by the MLAF “multimedia linking” standard, also for very concrete applications.

Multimedia Linking Environment

Until a quarter of a century ago, virtually the only device that could establish relationships between different media items was the brain. A very poor substitute was a note in a book, recording a possible relationship between the place in the book where the note was written and content in the same or different books.

The possibility to link a place in a web page to another place in another web page, or to a media object, was the great innovation brought by the web. However, a quarter of a century, a billion web sites and quadrillions of linked web pages later, we must recognise that the notion of linking is a pervasive one and not necessarily connected with the web.

MPEG has dedicated significant resources to the problem described by the sentence “I have a media object and I want to know which other related media objects exist in a multimedia database”, represented in the MPEG-7 model depicted in the figure below.

However, MPEG-7 is an instance of the more general problem of linking a given spatio-temporal region of a media object to spatio-temporal regions in other media objects.

These are some examples:

  1. A synthetic object is created out of a number of pictures of an object. There is a relationship between the pictures and the synthetic object;
  2. There is a virtual replica of a physical place. There is a relationship between the physical place and the virtual replica;
  3. A user is experiencing a virtual place in a virtual reality application that is derived from another virtual place. There is a relationship between the two virtual places;
  4. A user creates a media object by mashing up a set of media items coming from different sources. There is a relationship between the media items and the mashed up media object.

MPEG has produced MPEG-A part 16 (Media Linking Application Format – MLAF), which specifies a data format called bridget that can be used to link any kinds of media. MPEG has also developed a number of standards that play an auxiliary role in the “media linking” context outlined by the examples above.

  1. MPEG-7 parts 1 (Systems), 3 (Visual), 4 (Audio) and 5 (Multimedia) provide the systems elements, and the visual (image and video), audio and multimedia descriptions.
  2. MPEG-7 parts 13 (Compact descriptors for visual search) and 15 (Compact descriptors for video analysis) provide new generation image and video descriptors
  3. MPEG-B part 13 (Media Orchestration) provides the means to mash up media items and other data to create personal user experiences.

The MLAF standard

A bridget is a link between a “source” content and a “destination” content. It contains information on

  1. The source and the destination content
  2. The link between the two
  3. How the information in the bridget is presented to the users who consume the source content.

The last information is the most relevant to the users because it is the one that enables them to decide whether the destination content is of interest to them.

The structure of the MLAF representation (points 1 and 2) is based on the MPEG-21 Digital Item Container implemented as a specialised MPEG-21 Annotation. The spatio-temporal scope is represented by the expressive power of two MPEG-7 tools and the general descriptive capability of the MPEG-21 Digital Item. These allow a bridget author to specify a wide range of possible associations and to be as precise and granular as needed.

The native format to present bridget information is based on MPEG-4 Scene description and application engine. Nevertheless, a bridget can be directly linked to any external presentation resource (e.g., an HTML page, an SVG graphic or others).
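
Conceptually, the information a bridget carries can be pictured as follows. This is a minimal sketch with hypothetical names, not the normative MLAF schema (which is XML, built on the MPEG-21 and MPEG-7 tools described above):

    /* Conceptual sketch of a bridget; all names are hypothetical. */
    typedef struct {
        char   media_uri[256];   /* media object the anchor lives in */
        double t_start, t_end;   /* temporal scope, in seconds */
        int    x, y, w, h;       /* optional spatial scope, in pixels */
    } Anchor;

    typedef struct {
        Anchor source;                /* where the bridget is attached */
        Anchor destination;           /* the content the bridget points to */
        char   relation[64];          /* nature of the link between the two */
        char   presentation_uri[256]; /* how the bridget is presented, e.g. an HTML page */
    } Bridget;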

Bridgets for companion screen content

An interesting application of the MLAF standard is depicted in the figure below, which describes the entire bridget workflow.

  1. A TV program, scheduled to be broadcast at a future time, is uploaded to the broadcast server [1] and to the bridget Authoring Tool (BAT) [2].
  2. BAT computes and stores the program's audio fingerprints in the Audio Fingerprint Server (AFS) [3].
  3. The bridget editor uses BAT to create bridgets [4].
  4. When the editor is done, all bridgets of the program and the referenced media objects are uploaded to the Publishing Server [5].
  5. At the scheduled time, the TV program is broadcast [6].
  6. The end user's app computes the audio fingerprint and sends it to the Audio Fingerprint Server [7].
  7. AFS sends to the user's app the ID and time of the program the user is watching [8].
  8. When the app alerts the user that a bridget is available, the viewer may decide to
    1. Turn their eyes away from the TV set to their handset
    2. Play the content in the bridget [9]
    3. Share the bridget to a social network [10].

This is the workflow for a recorded TV program. A similar scenario can be implemented for live programs. In this case bridgets must be prepared in advance so that the publisher can select and broadcast a specific bridget when needed.
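
To make steps 6 to 8 concrete, here is a minimal sketch of the end-user app side of the workflow. Every name is a hypothetical placeholder, stubbed so the sketch compiles; it is not an actual MLAF or TVBridge API:

    /* Sketch of the end-user app side of the bridget workflow (steps 6-8). */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct { char program_id[64]; double time_s; } ProgramPosition;

    /* Stubbed helpers: a real app would capture audio from the microphone
       and query the Audio Fingerprint Server over the network. */
    static void capture_audio(short *pcm, int n) { memset(pcm, 0, n * sizeof *pcm); }
    static unsigned fingerprint(const short *pcm, int n) { (void)pcm; (void)n; return 42u; }
    static bool afs_lookup(unsigned fp, ProgramPosition *pos) {          /* step 7 */
        (void)fp;
        strcpy(pos->program_id, "program-123");
        pos->time_s = 60.0;
        return true;
    }
    static bool bridget_available(const ProgramPosition *pos) { return pos->time_s >= 60.0; }

    int main(void)
    {
        short pcm[8000];
        ProgramPosition pos;
        capture_audio(pcm, 8000);                 /* step 6: listen to the TV */
        if (afs_lookup(fingerprint(pcm, 8000), &pos) && bridget_available(&pos))
            printf("bridget available in %s at %.0f s\n",   /* step 8: alert */
                   pos.program_id, pos.time_s);
        return 0;
    }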

Standards are powerful tools that facilitate the introduction of new services, such as companion screen content. In this example, the bridget standard can stimulate the creation of independent authoring tools and end-user applications.

Creating bridgets

The bridget creation workflow depends on the types of media object the bridget represents.

Let’s assume that the bridget contains different media types such as an image, a textual description, an independently selectable sound track (e.g. an ad) and a video. Let’s also assume that the layout of the bridget has been produced beforehand.

This is the sequence of steps performed by the bridget editor:

  1. Select a time segment on the TV program timeline and a suitable layout
  2. Enter the appropriate text
  3. Provide a reference image (possibly taken from the video itself)
  4. Find a suitable image by using an automatic image search tool (e.g. based on the CDVS standard)
  5. Provide a reference video clip (possibly taken from the video itself)
  6. Find a suitable video clip, possibly taken from the video itself, by using an automatic video search tool (e.g. based on the CDVA standard)
  7. Add an audio file.

The resulting bridget will appear to the end user like this.

When all bridgets are created, the editor saves the bridgets and the media to the publishing server.

It is clear that the “success” of a bridget (in terms of number of users who open it) depends to a large extent on how the bridget is presented.

Why bridgets

Bridget was the title of a research project funded by the 7th Framework Programme of the European Commission. The MLAF standard (ISO/IEC 23000-16) was developed at the instigation, and with the participation, of members of the Bridget project.

At this page you will find more information on how the TVBridge application can be used to create, publish and consume bridgets for recorded and live TV programs.


Standards and quality

Introduction

Quality pervades our life: we talk of quality of life and we choose things on the basis of declared or perceived quality.

A standard is a product and, as such, may also be judged, although not exclusively, in terms of its quality. MPEG standards are no exception, and the quality of MPEG standards is a feature MPEG has considered of paramount importance since its early days.

Cosmesis is related to quality, but is a different beast. You can apply cosmesis at the end of a process, but that will not give quality to a product issued from that process. Quality must be an integral part of the process or not at all.

In this article I will describe how MPEG has embedded quality in all phases of its standard development process and how it has measured quality in some illustrative cases.

Quality in the MPEG process

The business of MPEG is to produce standards that process information in such a way that users do not notice, or notice as little as possible, the effect of that processing when the standard is implemented in a product or service.

When MPEG considers the development of a new standard, it defines the objective of the standard (say, compression of video in a particular range of resolutions), the range of bitrates and the functionality. Typically, MPEG makes sure that it can deliver the standard with the agreed functionality by issuing a Call for Evidence (CfE). Industry members are requested to provide evidence that their technology is capable of achieving part or all of the identified requirements.

Quality is now an important, if not essential, parameter for making a go/no-go decision. When MPEG assesses the CfE submissions, it may happen that established quality assessment procedures are found inadequate. That was the case of the call for evidence on High-Performance Video Coding (HVC) of 2009. The high number of submissions received required the design of a new test procedure: the Expert Viewing Protocol (EVP). Later on, the EVP test method became ITU-R Recommendation BT.2095. While a test executed according to any other ITU recommendation of that time would have required more than three weeks, the EVP allowed the complete testing of all the submissions in three days.

If MPEG has become confident of the feasibility of the new standard from the results of the CfE, a Call for Proposals (CfP) is issued with attached requirements. These can be considered as the terms of the contract that MPEG stipulates with its client industries.

Testing of CfP submissions allows MPEG to develop a Test Model and initiate Core Experiments (CEs). These aim at optimising one part of the entire scheme.

In most cases the assessment of CE results involves quality evaluation. In the case of CfP responses, subjective testing is necessary because there are typically large differences between the different coding technologies proposed. However, in the assessment of CE results, where smaller effects are involved, objective metrics are typically, but not exclusively, used because formal subjective testing is not feasible for logistic or cost reasons.
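
A commonly used objective metric in video coding work is PSNR. As a minimal illustration (not the common test conditions agreed for any particular CE), PSNR between a reference and a processed 8-bit frame can be computed as follows:

    /* PSNR between two 8-bit frames of n samples; illustrative only. */
    #include <math.h>
    #include <stddef.h>

    double psnr_8bit(const unsigned char *ref, const unsigned char *test, size_t n)
    {
        double mse = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = (double)ref[i] - (double)test[i];
            mse += d * d;
        }
        mse /= (double)n;                            /* mean squared error */
        if (mse == 0.0)
            return INFINITY;                         /* identical frames */
        return 10.0 * log10(255.0 * 255.0 / mse);    /* peak value is 255 for 8 bits */
    }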

When the development of the standard is completed MPEG engages in the process called Verification Tests which will produce a publicly available report. This can be considered as the proof on the part of the supplier (MPEG) that the terms of the contract with its customer have been satisfied.

Samples of MPEG quality assessment

MPEG-1 Video CfP

The first MPEG CfP quality tests were carried out at the JVC Research Center in Kurihama (JP) in November 1989. 15 proposals of video coding algorithms operating at a maximum bitrate of 1.5 Mbit/s were tested and used to create the first Test Model at the following Eindhoven meeting in February 1990 (see the Press Release).

MPEG-2 Advanced Audio Coding (AAC)

In February 1998 the Verification Test allowed MPEG to conclude that “when auditioning using loudspeakers, AAC coding according to the ISO/IEC 13818-7 standard gives a level of stereo performance superior to that given by MPEG-1 Layer II and Layer III coders” (see the Verification Test Report). This showed that the goal of high audio quality at 64 kbit/s per channel for MPEG-2 AAC had been achieved.

Of course that was “just” MPEG-2 AAC with no substantial encoder optimisation. More than 20 years of MPEG-4 AAC progress has brought the bitrate per channel further down.

MPEG-4 Advanced Video Coding (AVC) 3D Video Coding CfP

The CfP for new 3D (stereo & auto-stereo) technologies was issued in 2012 and received a total of 24 complete submissions. Each submission produced 24 files representing the different viewing angles for each test case. Two sets of two and three viewing angles were blindly selected and used to synthesise the stereo and auto-stereo test files.

The test was carried out on standard 3D displays with glasses and on auto-stereoscopic displays. A total of 13 test laboratories took part in the test, running a total of 224 test sessions and hiring around 5000 non-expert viewers. Each test case was run by two laboratories, making it a fully redundant test.

MPEG-H High Efficiency Video Coding (HEVC) CfP

The HEVC CfP covered 5 different classes of content, covering resolutions from WQVGA (416×240) up to 2560×1600. For the first time MPEG introduced two sets of constraints (low delay and random access) for different classes of target applications.

The HEVC CfP was a milestone because it required the biggest testing effort ever performed by any laboratory or group of laboratories until then. The CfP generated a total of 29 submissions and 4205 coded video files, plus the set of anchor coded files. Three testing laboratories took part in the tests, which lasted four months and involved around 1000 naïve (non-expert) subjects allocated to a total of 134 test sessions.

A common test set of about 10% of the total testing effort was included to monitor the consistency of results from the different laboratories. With this procedure it was possible to detect a set of low-quality test results from one laboratory.
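
One simple way to implement such a monitor is to compare each laboratory's scores on the common clips with the cross-laboratory mean. This is a sketch under an assumed data layout and threshold, not the procedure actually used in the HEVC tests:

    /* Flag laboratories whose mean opinion scores (MOS) on the common
       test set deviate from the cross-lab mean by more than a threshold.
       NLABS, NCLIPS and the threshold are hypothetical. */
    #include <stdio.h>
    #include <math.h>

    #define NLABS  3
    #define NCLIPS 12

    void flag_outlier_labs(const double mos[NLABS][NCLIPS], double threshold)
    {
        for (int lab = 0; lab < NLABS; lab++) {
            double dev = 0.0;
            for (int c = 0; c < NCLIPS; c++) {
                double mean = 0.0;                 /* cross-lab mean for clip c */
                for (int l = 0; l < NLABS; l++)
                    mean += mos[l][c];
                mean /= NLABS;
                dev += fabs(mos[lab][c] - mean);
            }
            dev /= NCLIPS;                         /* mean absolute deviation */
            if (dev > threshold)
                printf("lab %d deviates by %.2f (> %.2f)\n", lab, dev, threshold);
        }
    }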

Point Cloud Compression (PCC) CfP

The CfP was issued to assess how a proposed PCC technology could provide 2D representations of content synthesised using PCC techniques, resulting in video suitable for evaluation by means of established subjective assessment protocols.

Some video clips for each of the received submissions were produced after an accurate selection of the rendering conditions. The video clips were generated using a video rendering tool, which was used to generate, under the same conditions, two different video clips for each of the received submissions: a rotating view of a static synthesised image and a rotating view of a moving synthesised sequence. The rotations were selected in a blind way and the resulting video clips were subjectively assessed to rank the submissions.

Conclusions

Quality is what end users of media standards value as the most important feature. To respond to this requirement, MPEG has designed a standards development process that is permeated by quality considerations.

MPEG has no resources of its own. Therefore, sometimes it has to rely on the voluntary participation of many competent laboratories to carry out subjective tests.

The domain of media is very dynamic and, very often, MPEG cannot rely on established methods – both subjective and objective – to assess the quality of compressed new media types. Therefore, MPEG is constantly innovating the methodologies it uses to assess media quality.
