MPEG and the future of visual information coding standards

Video in MPEG has a long history

MPEG started with the idea of compressing the 216 Mbit/s of standard definition video, and the associated 1.41 Mbit/s of stereo audio, for interactive video applications on Compact Disc (CD). That innovative medium of the early 1980s was capable of providing a sustained bitrate of 1.41 Mbit/s for about 1 hour, a bitrate that was expected to accommodate both the video and the audio information. At about the same time, some telco research laboratories were working on an oddly named technology called Asymmetric Digital Subscriber Line (ADSL) – in other words, a modem for high-speed (at that time) transmission of ~1.5 Mbit/s over the “last mile”, but only from the telephone exchange to the subscriber’s network termination. In the other direction, only a few tens of kbit/s were supported.
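As an aside, the 216 Mbit/s figure is simply the ITU-R BT.601 sampling structure (4:2:2, 8 bits per sample) multiplied out. The sketch below reproduces the arithmetic; the ~1.15 Mbit/s video budget is my assumption about how the 1.41 Mbit/s CD channel was split between video and audio, not a figure from this article.

```python
# Uncompressed SD video bitrate per ITU-R BT.601 (4:2:2, 8 bits per sample)
LUMA_RATE = 13.5e6      # luma samples per second
CHROMA_RATE = 6.75e6    # samples per second for each of the two chroma components
BITS_PER_SAMPLE = 8

video_bitrate = (LUMA_RATE + 2 * CHROMA_RATE) * BITS_PER_SAMPLE
print(f"Uncompressed SD video: {video_bitrate / 1e6:.0f} Mbit/s")  # 216 Mbit/s

# Assumed split of the 1.41 Mbit/s CD channel: ~1.15 Mbit/s left for video
cd_video_budget = 1.15e6
print(f"Required compression: about {video_bitrate / cd_video_budget:.0f}:1")
```

The point of the arithmetic is how audacious the goal was: squeezing studio-quality digital video into a CD channel meant a compression ratio of roughly two hundred to one.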

Therefore, if we exclude a handful of forward-looking broadcasters, the MPEG-1 project was really a Consumer Electronics – Telco project.

Setting aside the eventual success of the MPEG-1 standard – Video CD (VCD) used MPEG-1 and 1 billion players were produced in total, hence with a different goal than the original interactive video – MPEG-1 was a remarkable success for being the first to identify an enticing business case for video (and audio) compression, and systems, on top of which tens of successful MPEG standards were built over the years.

This article has many links

In Forty years of video coding and counting I have recounted the full story of video coding and in The MPEG drive to immersive visual experiences I have focused on the efforts MPEG has made, since its early years, to provide standards for an extended 3D visual experience. In Quality, more quality and more more quality I have described how MPEG uses and innovates subjective quality assessment methodologies to develop and eventually certify the level of performance of a visual coding standard. In On the convergence of Video and 3D Graphics I have described the efforts MPEG is making to develop a unified framework that encompasses a set of video sources producing pixels and a set of sensors producing points. In More video with more features I have described how MPEG has been able to support more video features in addition to basic compression.

Now to the question

Seeing all this, the obvious question a reader might ask is: if MPEG has done so much in the area of visual information coding standards, does MPEG still have much to do in that space? Even a reader with only a superficial understanding of the force that drives MPEG probably knows the answer, but I am not going to give it right now. I will first argue what I see as the future of MPEG in this area.

I need to make a disclaimer first. The title of this article is “The future of visual information coding standards”, but I should restrict the scope to “dynamic (i.e. time-dependent) visual information coding”. Indeed, the coding of still pictures is a different field of endeavour serving the needs of a different industry with a different business model. It should not be a surprise that the two JPEG standards – the original JPEG and JPEG 2000 – both have a baseline mode (the only one that is actually used) which is Option 1 (ISO/IEC/ITU language for “royalty free”). It should also be no surprise that, while a standard for holographic still image coding is conceivable, holography is not even mentioned in this article.

There was always a need for new video codecs

Forty years of video coding and counting explains the incredible decades-long ride to develop video coding standards, all based on the same basic ideas enriched at each generation, that will enable the industry to achieve a bitrate reduction of 1,000:1 from video PCM samples to compressed bitstream with the availability of the latest VVC video compression standard, hopefully in the second half of 2020 (the uncertainty is caused by the current Covid-19 pandemic, which is taking its toll on MPEG as well).
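To put the 1,000:1 figure in perspective, a back-of-the-envelope calculation helps. The parameters below (a 4K/60p 4:2:0 10-bit source and a ~7.5 Mbit/s VVC operating point) are my illustrative assumptions, not figures from the standard:

```python
# Illustrative uncompressed (PCM) bitrate of a 4K/60p 4:2:0 10-bit source
width, height, fps = 3840, 2160, 60
samples_per_pixel = 1.5   # 4:2:0: one luma sample plus half a chroma pair per pixel
bits_per_sample = 10

pcm_bitrate = width * height * fps * samples_per_pixel * bits_per_sample
print(f"PCM source: {pcm_bitrate / 1e9:.2f} Gbit/s")

# An assumed VVC operating point for good 4K quality
compressed_bitrate = 7.5e6
print(f"Compression ratio: ~{pcm_bitrate / compressed_bitrate:.0f}:1")
```

Under these assumptions the source runs at about 7.5 Gbit/s and the compressed stream at 7.5 Mbit/s, i.e. a ratio of roughly 1,000:1, consistent with the claim above.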

The need for new and better compression standards – when technology makes them possible and the improvement over the latest existing standard justifies them – has been driven by the push toward video with higher resolution, colour depth, bits per pixel, dynamic range, viewing angle etc., and by the lagging availability of a correspondingly higher bitrate to the end user.

The push toward “higher everything” will continue, but will the bitrate made available to the end user continue to lag?

The safe answer is: it depends. It is a matter of fact that bandwidth is not an asset uniformly available in the world. In the so-called advanced economies the introduction of fibre to the home or to the curb continues apace. The global 5G services market size is estimated to reach 45.7 B$ by 2020 and to register a CAGR of 32.1% from 2021 to 2025, reaching ~184 B$. Note that 2025 is the time when MPEG should think seriously about a new video coding standard. The impact of the current pandemic could further accelerate 5G deployment.

More video and which codecs

The first question is whether there will be room for a new video coding standard. My answer is yes, for at least two reasons. The first is socio-economic: the share of the world population served by a limited amount of bandwidth will remain large, while the desire to enjoy the same level of experience as the rest of the world will remain high. The second is technical: currently, efficient 3D video compression largely depends on efficient 2D video compression.

The second question is trickier. Will this new (post-VVC) 2D video compression standard still be another extension of the motion-compensated prediction scheme? I am sure that the answer could be yes. The prowess of the MPEG community is such that another 50% improvement could well be provided. I am not sure that will happen, though. Machine learning applied to video coding is showing that significant improvements over state-of-the-art video compression can be obtained by replacing components of existing schemes with Neural Networks (NN), or even by defining entirely new NN-based architectures.

The latter approach has several aspects that make it desirable. The first is that a NN is trained for a certain purpose, but you can always train it better, possibly at the cost of making it heavier. Neural Network Compression (NNC), another standard MPEG is working on, could further extend the range of incrementally improving the performance of a video coding standard, without changing the standard, by making components of the standard downloadable. Another desirable aspect is that media devices will become more and more addicted to using Artificial Intelligence (AI)-inspired technologies. Therefore a NN-based video codec could simply be more attractive to a device implementor, because the basic processing architectures are shared amongst a larger number of data types.

New types of video codec

There is another direction that needs to be considered in this context, and that is the large and growing quantity of data that is being and will be produced by connected vehicles, video surveillance, smart cities etc. In most cases today, and more so in the future, it is out of the question to have humans on the other side of the transmission channel watching what is being transmitted. More likely there will be machines monitoring what happens. Therefore, the traditional video coding scenario that aims to achieve the best video/image under certain bitrate constraints with humans as consumption targets is inefficient and unrealistic, in terms of latency and scale, when the consumption target is a machine.

Video Coding for Machines (VCM) is the title of an MPEG investigation that seeks to determine the requirements for this novel, but not entirely new, video coding standard. Indeed, the technologies standardised by MPEG-7 – efficiently compressed image and video descriptors and description schemes – belong to the same category as VCM. It must be said, however, that 20 years have not passed in vain. It is expected that all descriptors will be the output of one or more NNs.

One important requirement is that, while millions of streams may be monitored by machines, some streams may need to be monitored by humans as well, possibly after a machine has raised an alert. Therefore VCM is linked to the potential new video coding standard I have talked about above. The question is whether VCM should be called HMVC (Human-Machine Video Coding), or whether there should be a VCM (where the human part remains below threshold in terms of priority) and a YAVC (Yet Another Video Coding, where the user is meant to be a human).

Immersive video codecs

The MPEG drive to immersive visual experiences shows that MPEG has always been fascinated by immersive video. The fascination is not fading away, as shown by the fact that MPEG plans to release the Video-based Point Cloud Compression standard in four months and the MPEG Immersive Video standard in a year.

These standards, however, are not the end points, but the starting points in the drive to more rewarding user experiences. Today we cannot say how and when MPEG standards will be able to provide full navigation in a virtual space. However, that remains the goal for MPEG. Reaching that goal will also depend on the appearance of new capture and display technologies.

Conclusions

The MPEG war machine will be capable of delivering the standards that will keep the industry busy developing the products and the services that will enhance the user experience. But we should not forget an important element: the need to adapt MPEG’s business model to the new age.

MPEG needs to adapt, not change its business model. If MPEG has been able to sustain the growth of the media industry, it is because it has provided opportunities to remunerate the good Intellectual Property that is part of its standards.

There are other business models appearing. The MPEG business model has shown its worth for the last 30 years. It can do the same for another 30 years if MPEG is able to develop a strategy to face and overcome the competition to its standards.

Posts in this thread

Can MPEG survive? And if so, how?

Introduction

Human beings and the organisations that result from human endeavours are complex living organisms that live and expand because they have internal forces driving their development. Both types of organism, however, operate in environments populated by other organisms which variously affect them.

Today 7.5 billion human organisms are under the threat of the Covid-19 viral pandemic, which is taking a terrible and growing toll of deaths and economic hardship. Organisms created by humans, too, are subject to the influence of other organisms, some operating synergistically and some antagonistically. MPEG is no exception.

This article will explore the state of health of the MPEG organism and its chances of survival.

What is driving the MPEG organism

In general, organisms created by human endeavours are different from one another. Their nature depends on who created them, for what purpose, in which environment etc. MPEG, I must say, is probably more different than most.

Industry-independent standards

MPEG was created to give the use of video compression in products and services a chance, as video communication had proved not to be an attractive business proposition. It started (MPEG-1) as a mix of Telecommunications and Consumer Electronics, two industries that did not have, 30 years ago, much to share business-wise. This original mix explains why MPEG standards rigorously have a normative conformance testing specification without any associated authorised testing laboratory.

With MPEG-2, broadcasting contributed to shaping the MPEG organism and, with MPEG-4, IT joined and shaped MPEG. The miracle recipe is that MPEG standards have been able to satisfy the needs of industries that used to be – and now, because of MPEG, are less so – oceans apart. The result has been that different industries could design products that were interoperable with those of other industries. They could access the compression technology without discrimination, independently of their belonging to a particular industry or country.

Business model

MPEG expanded thanks to its business model, based on the best compressed media quality at a given time at the cost of introducing a potentially large number of patents owned by different entities. The MPEG model is radically different from the JPEG model. The two successful JPEG standards – the original JPEG and JPEG 2000 – are both based on an Option 1 baseline (what people outside ISO would call royalty free) with additional encumbered profiles.

From the bottom

These two elements are not sufficient to explain the wild success of MPEG standards. The third element is the lowly MPEG status. MPEG is a working group (WG), an entity not allowed to make “decisions”, but only to propose recommendations to be approved by the entities sitting above it. This is the formality, but the practice is different. The MPEG working group has competence on (the compression of) all media types (actually, even more than just “media”), is populated by industries and companies with a stake in media, and by academia and research. It identifies standards opportunities using technology as the yardstick, but technology as handled by a large number of extremely competent experts from the industries and companies that matter.

If MPEG had been an entity above the WG level, it would have been driven by political considerations, which would have led to standards projects aligned with the politics of the committee. This would have produced fewer standards, fewer successes, less technology and more national/industry interests.

No central decision

The fourth element is that in MPEG there is no centralised decision point. It is an environment where interests aggregate around proposals that come from members. When those interests reach a certain threshold, they have the opportunity to become MPEG projects.

MPEG has the unique recipe of defining technology-driven standards projects first, adding corporate and national interests later, if they pass through the sieve of the technical requirements analysis.

Antagonistic organisms

MPEG is an organism living in an environment populated by other organisms. Many application-oriented standards bodies and industry fora need compression standards for media (and other data) but understand that compression is an extremely specialised field in which it would be foolish to compete. These are the synergistic organisms. However, there are also organisms that play an antagonistic role. We should not say that they are evil. Covid-19 is no more, no less evil than a computer virus displaying “I am erasing your hard disk. Nothing personal” while it destroys years of effort embedded in your hard disk.

Grabbing pieces

MPEG produces standards for the vast field for which it has competence. However, there are minor organisms that think that, if they could grab some pieces of the MPEG field, they would be able to emulate the success of MPEG standards. Of course they do not understand that the success of a standard does not just depend on what the standard is about, but on the ecosystem that produces it and the paths leading to those who will use it.

Getting control

MPEG has become large and exerts a vast influence but, as I argued in Who “owns” MPEG? and Who “decides” in MPEG?, no one really has control of, a stake in or an influence on MPEG. This is unusual, because bodies like MPEG are typically under the influence of some company, industry or country.

MPEG is an appetising morsel for any organism out there.

Removing competition

A standard is seldom just a technical specification. It is typically the technical arm that supports a business model. The MPEG business model is rather neutral: an assembly of technology “sold” at a price without strings attached. There are plenty of other business models. Possibly at the other extreme, the GNU General Public License stipulates certain obligations on the use of an assembly of technology expressed in a computer language.

It is only natural that organisms with different business models may wish to wipe out a competitor like MPEG.

Immune reaction

The DNA of the MPEG organism has driven the expansion of MPEG beyond the traditional digital media field. MPEG has already proven that it is capable of applying its methodology to other fields, such as the compression of genomic data. MPEG is not new to the introduction of foreign bodies into itself. As I said above, MPEG started as an excrescence of the Telecommunications industry combined with the Consumer Electronics industry, later combined with the Broadcasting and IT industries.

So far the reaction of the MPEG immune system has been suppressed, but it is not clear whether that suppression will continue to be effective if major injections of foreign bodies continue.

Autoimmune reaction

The cells that compose the MPEG organism are reacting differently to the success of the MPEG organism. The MPEG business model – one could call it the “MPEG DNA” – produces results, but some cells inside MPEG operate against the success of that business model.

MPEG has developed enzymes to cut and paste its DNA to inject new business model code, but the jury is still out on that operation.

Adapt or die

At the risk of playing down the important successes of the past, one could say that, so far, MPEG has largely worked in known land. If we discard some “minor details”, the requirements of the VVC standard – which we all hope will be released as FDIS on the 3rd of July or another date close to it – are not so different from the MPEG-1 Video requirements. Sure, we have higher definition, higher frame rates, higher chrominance resolution, higher dynamic range, 360⁰ and more. However, MPEG has consistently handled rectangular video. Something similar can be said of audio. 3D Graphics is offering new domains such as Point Cloud Compression. The MPEG-2 Transport Stream addressed different problems than the MP4 File Format, but both are transport issues.

Immersive media is a different beast. Technically this is not new land, because MPEG has tried working on immersive visual media before – without much success, however. Waves of new technologies come and go with new promises.

Viruses and national policies

Even before the coming of the Covid-19 virus, there was a clear trend toward a comeback of the country- and industry-driven standards compartments that MPEG had successfully fought by imposing its own model based on global – across countries and industries – standards.

The reaction of countries to the global Covid-19 threat shows that countries are unable to pool common resources to face common threats. Every country is going its own way.

The MPEG organism may well die of a dearth of food (i.e. proposals for new standards), because all the food will go to the local organisms.

How can MPEG survive?

I see four possible futures for MPEG

  1. Retain MPEG. The MPEG model is tested and vital, if it is allowed to continue.
  2. Clone MPEG. Take over the MPEG model and reproduce it in another environment.
  3. Mimic MPEG. Develop another model working as well as MPEG.
  4. Balkanise MPEG. The MPEG work space is divided into national, industry and business model subspaces.

Conclusions

There should be no doubt that my preference goes to “Retain MPEG”. However, this may be difficult to realise because so many hostile organisms are acting against MPEG.

My second best is to “Clone MPEG” and let it operate in a different environment. I expect this to require a huge effort. However, things may be facilitated by the choice of the environment and by the collaboration of those who believe in the plan.

I can only outline the option to “Mimic MPEG”, i.e. to develop and implement a model alternative to MPEG that works like a dream. Some people like it hot, and some others like to think of their dreams as reality.

“Balkanise MPEG” is the natural consequence of the “Mimic MPEG” dream gone sour.


Quality, more quality and more more quality

Quality measurement is an essential ingredient of the MPEG business model that targets the development of the best performing standards that satisfy given requirements.

MPEG was certainly not the first to discover the importance of media quality assessment. Decades ago, when it was still called Comité Consultatif International des Radiocommunications (CCIR), ITU-R developed Recommendation 500 – “Methodologies for the subjective assessment of the quality of television images”. This Recommendation guided the work of television labs for decades. It was not possible, however, to satisfy all MPEG needs with BT.500, the modern name of CCIR Recommendation 500, for three main reasons: MPEG needed methods to assess the impact of coding on video quality, MPEG dealt with a much wider range of moving pictures than television, and MPEG ended up dealing with more than just 2D rectangular moving pictures.

Video quality assessment in MPEG began in November 1989 at the research laboratories of JVC in Kurihama, when all aspects of the responses to the MPEG-1 Call for Proposals (CfP), including quality, were considered. Two years later MPEG met again in Kurihama to consider the responses to the MPEG-2 CfP. At that time the assessment of video quality was done using the so-called Double Stimulus Impairment Scale (DSIS) method with a 5-grade impairment scale. In both tests massive use of digital D1 tapes was made to deliver undistorted digital video to the test facility. The Test subgroup, led by its chair Tsuneyoshi Hidaka, managed all the logistics of D1 tapes coming from the four corners of the world.

The MPEG Test chair could convince the JVC management to offer free use of the testing facilities for MPEG-1. However, he could not achieve the same for MPEG-2. Therefore MPEG-2 respondents were asked to pay for the tests. Since then, participation in most if not all subjective test campaigns has been subject to the payment of a fee to cover the use of facilities and/or the human subjects who were requested to view the video sequences under test. The MPEG-1 and MPEG-2 tests were carried out in the wake of Recommendation BT.500.

The MPEG-4 tests, carried out in 1995, fundamentally changed the scope, because the CfP addressed multimedia content, i.e. progressively scanned moving images, typically at lower resolution than TV, which were supposed to be transmitted over noisy channels (videophone over fixed subscriber lines or the nascent mobile networks). The statistical processing of subjective data applied to the MPEG-4 CfP was innovated by the use of ANOVA (analysis of variance), because until then tests had only used the simple mean value and the Grand Mean, i.e. the mean value computed considering the scores assigned to several video sequences.
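The basic statistics mentioned here are simple to reproduce. The sketch below (the raw viewer scores are invented for illustration) computes the per-sequence mean score with a 95% confidence interval under a normal approximation, and the Grand Mean across all sequences:

```python
from math import sqrt
from statistics import mean, stdev

# Invented raw scores (5-grade impairment scale) assigned by a panel of viewers
scores = {
    "seq_A": [4, 5, 4, 4, 3, 5, 4, 4],
    "seq_B": [2, 3, 2, 3, 2, 2, 3, 2],
}

def mean_and_ci(values, z=1.96):
    """Mean score and 95% confidence interval (normal approximation)."""
    m = mean(values)
    ci = z * stdev(values) / sqrt(len(values))
    return m, ci

for name, values in scores.items():
    m, ci = mean_and_ci(values)
    print(f"{name}: mean = {m:.2f} ± {ci:.2f}")

# Grand Mean: the mean computed over all scores of all sequences
grand_mean = mean(s for values in scores.values() for s in values)
print(f"Grand Mean = {grand_mean:.2f}")
```

ANOVA goes one step further than these descriptive statistics: it tests whether the variation between the means of different technologies is larger than the variation among viewers, which is what made precise rankings possible.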

The use of Statistically Significant Differences (SSD) allowed a precise ranking of the technologies under test. Traditional test methods (DSIS and SS) were used together with the new Single Stimulus Continuous Quality Evaluation (SSCQE) test method to evaluate “long” video sequences of 3 minutes and to measure how well a video compression technology could recover from transmission errors. The tests were carried out using the D1 digital professional video recorder and Professional Studio Quality “grade 1” CRT displays.

The Digital Cinema test, carried out in 2001 at the Entertainment Technology Centre (ETC) of the University of Southern California, was designed to evaluate cinematic content in a real theatrical environment, i.e. on a 20 m base perforated screen, projected by a cinema projector fed with digital content. The subjective evaluations were done with three new test methods: the Expert Viewing Test (EVT), a two-step procedure where the results of a DSIS test were refined by means of careful observation by a selected number of “golden eye” observers; the Double Stimulus Perceived Difference Scale (DSPDS), a double stimulus impairment detection test method using a 5-grade impairment scale; and the Double Stimulus Split-Screen Perceived Difference Scale (S3PDS), a test method based on a split-screen approach where both halves of the screen were observed in sequence.

The tests for the Call for New Tools to Further Improve Coding Efficiency were done using traditional test methods and the same methodology and devices as the MPEG-4 Call for Proposals. The tests demonstrated the existence of a new technology in video compression and allowed the collaboration between ISO and ITU-T in the area of digital video coding to resume. This was the first test to use the 11-grade impairment scale, which became a reference for DSIS and SS test experiments and provided a major improvement in result accuracy.

A new test method – the VSMV-M Procedure – was designed in 2004 to assess the submissions received for the Core Experiment on Scalable Video Coding. The Procedure was made of two phases: a “controlled assessment” phase, carried out according to the DSIS and SS test methods, and a “deep analysis” phase, designed by MPEG, where a panel of experts confirmed the ranking obtained with the formal subjective assessment. These tests were the first to be entirely based on digital video servers and a DLP projector. Therefore, 15 years after they were first used in the MPEG-1 tests, D1 tapes were finally put to rest.

The SVC Verification Tests, carried out in 2007, represented another important step in the evolution of the MPEG testing methodology. Two new test methods were designed: the Single Stimulus Multi-Media (SSMM) and the Double Stimulus Unknown Reference (DSUR). The SSMM method minimised the contextual effect typical of the Single Stimulus (SS) method, while the DSUR method, derived from the Double Stimulus Impairment Scale (DSIS) Variant II, introduced some of the advantages of the Double Stimulus Continuous Quality Scale (DSCQS) method into the DSIS method, avoiding the tricky and difficult data processing of DSCQS.

The Joint Call for Proposals on Video Compression Technology (HEVC) covered 5 different classes of content, with resolutions ranging from WQVGA (416×240) to 2560×1600, in two configurations (low delay and random access) for different classes of target applications. It was a very large test effort: done on a total of 29 submissions, it lasted 4 months and involved 3 laboratories, which assessed more than 5000 video files and hired more than 2000 non-expert viewers. The ranking of submissions was done considering the Mean Opinion Score (MOS) and Confidence Interval (CI) values. A procedure was introduced to check that the results provided by different test laboratories were consistent. The results of the three laboratories included a common test set that made it possible to measure the impact of a laboratory on the results of a test experiment.
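One simple way to illustrate such a cross-laboratory consistency check (the article does not specify the exact procedure used, so this is only a sketch with invented MOS values) is to correlate the scores two laboratories assigned to the common test set – a high correlation suggests the laboratories rank the material consistently:

```python
from math import sqrt

# Invented MOS values two laboratories assigned to the same common sequences
lab1 = [4.1, 3.2, 2.5, 4.8, 3.9]
lab2 = [4.0, 3.4, 2.3, 4.7, 4.1]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(lab1, lab2)
print(f"Inter-laboratory correlation: r = {r:.3f}")
```

A value of r close to 1 indicates that a sequence judged good in one laboratory is judged good in the other, so the laboratories can be treated as interchangeable for ranking purposes.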

A total of 24 complete submissions were received in response to the Joint Call for Proposals on 3D Video Coding (stereo and auto-stereo) issued in 2012. For each test case each submission produced 24 files representing the different viewing angles. Two sets of two and three viewing angles were blindly selected to synthesise the stereo and auto-stereo test files. The test was done on standard 3D displays (with glasses) and auto-stereoscopic displays. A total of 13 test laboratories took part in the test, running a total of 224 test sessions and hiring around 5000 non-expert viewers. The test applied a full redundancy scheme, where each test case was run by two laboratories to increase the reliability and the accuracy of the results. The ranking of the submissions was done considering the MOS and CI values. This test represented a further improvement in the control of the performance of each test laboratory. The test could ensure full result recovery in the case of failure of up to 6 out of the 13 testing laboratories.

The Joint CfP for Coding of Screen Content was issued to extend the HEVC standard in order to improve the coding performance on typical computer screen content. When it became clear that the set of test conditions defined in the CfP was not suitable for obtaining valid results, the test method was modified from the original “side by side” scheme to a sequential presentation scheme. The complexity of the test material led to the design of an extremely accurate and long training of the non-expert viewers. Four laboratories participated in the formal subjective assessment test, assessing and ranking the seven responses to the CfP. More than 30 test sessions were run (including the “dry run” phase), hiring around 250 non-expert viewers.

The CfP on Point Cloud Coding was issued to assess coding technologies for 3D point clouds. MPEG had no experience (but then, no one had) in assessing the visual quality of point clouds. MPEG projected the 3D point clouds onto 2D spaces and evaluated the resulting 2D video according to formal subjective assessment protocols. The video clips were produced using a rendering tool that generated two different video clips for each of the received submissions, under the same creation conditions: rotating views of 1) a fixed synthesised image and 2) a moving synthesised video clip. The rotations were blindly selected.

The CfP for Video Compression with Capability beyond HEVC included three test categories, for which different test methods had to be designed. The Standard Dynamic Range category was a compression efficiency evaluation process where the classic DSIS test method was applied with good results. The High Dynamic Range category required two separate sessions, according to the peak luminance of the video content taken into account, i.e. below (or equal to) 1K nits and above 1K nits (namely 4K nits); in both cases the DSIS test method was used. The quality of the 360° category was assessed on a “viewport” extracted from the whole 360° scene at HD resolution.

When the test was completed, the design of the 36 “SDR”, 14 “HDR” and 8 “360°” test sessions was verified. For each test session the distribution of the raw quality scores assigned during the session was analysed, to verify that the level of visual quality was evenly distributed across the many test sessions.

This was a long but still incomplete review of 30 years of subjective visual quality assessment in MPEG. This ride across 3 decades should demonstrate that MPEG draws from established knowledge to create new methods that are functional to obtaining the results MPEG is seeking. It should also show the level of effort involved in actually assigning tasks, coordinating the work and producing integrated results that provide the responses. Most important is the level of human participation involved: 2000 people (non-experts) for the HEVC tests!

Acknowledgements

Many thanks to the MPEG Test chair Vittorio Baroncini for providing the initial text of this article. Many parts of the activities described here were conducted by him as Test chair.


Developing MPEG standards in the viral pandemic age

Introduction

For 30 years industry has been accustomed to relying on MPEG as the source of the standards it needs. In those 30 years MPEG has held a record 129 meetings, roughly spaced 3 months apart.

What happens if MPEG130 is not held? Can industry afford it?

In this article I will try and answer this not so hypothetical question.

An MPEG meeting (physical)

In Looking inside an MPEG meeting I have illustrated the “MPEG cycle” workflow using the figure below.

At the plenary session of the previous (N-1)th meeting, MPEG approves the results achieved and creates some 25 Ad hoc Groups (AhGs). Taking one example from MPEG129, each AhG has a title (Compression of Neural Networks for Multimedia Content Description and Analysis), chairs (Werner Bailer, Sungmoon Chun and Wei Wang) and a set of mandates:

  1. Collect more diverse types of models and test data for further use cases, working towards a CfP for incremental network representation
  2. Perform the CEs and analyse the results
  3. Improve the working draft and test model
  4. Continue analyzing the state of the art in NN compression and exchange formats
  5. Continue interaction with SC42, FG ML5G, NNEF, ONNX and the AI/ML community

Work is carried out during the typical ~3 months between the end of the (N-1)th and the next Nth meeting using e-mail reflectors, conference calls or, less frequently, physical meetings. Documents are shared by AhG members using the MPEG Document Management System (MDMS).

When the date of the next meeting approaches, AhGs wrap up their conclusions and many of them hold physical meetings on the weekend prior to the “MPEG week”.

On the Monday morning of the MPEG week, AhGs report their results to the MPEG plenary. In the afternoon, subgroups (Requirements, Systems, Video, Joint groups with ITU, Audio, 3DG and Test) hold short plenaries after which Break-out Groups (BoGs), often a continuation of AhGs, carry out their work interspersed with joint meetings of subgroups and BoGs.

Two more plenaries are held: on Wednesday morning to make everybody aware of what has happened in groups a member might not have had the opportunity to attend and on Friday afternoon to ratify or, if necessary, reconsider, decisions made by the subgroups.

The Convenor and the Chairs meet at night to assess progress and coordinate work between subgroups and BoGs. A typical function is the identification of joint meetings.

ICT at the service of MPEG

Some 500 people are involved in an MPEG week. At times, some 10-15 meeting sessions are held in parallel.

Most of this is possible because of the ICT facilities MPEG prides itself on. Developed by Christian Tulvan, they run on servers made available by Institut Mines Télécom.

Currently the MPEG Meeting Support System (MMSS) includes a calendar where subgroup chairs record all subgroup and BoG sessions adding a description of the topics to be discussed. The figure below gives a snapshot of the MMSS calendar. This of course has several views to serve different needs.

In Digging deeper in the MPEG work, I described MDMS and MMSS. Originally deployed in 1995, MDMS has been one of the greatest contributors to MPEG’s skyrocketing rise in performance. In addition to providing the calendar, MMSS also enables the collation of all results produced by the galaxy of MPEG organisational units depicted below.

The third ICT support is the MPEG Workplan Management System (MWMS). This provides different views of the relevant information on MPEG standards that is needed to execute the workplan.

MPEG online?

Now imagine, and you probably don’t have to stretch your imagination too much, that physical meetings are banned but industry requests are so pressing that a meeting must be held, no matter what, because product and service plans depend so much on MPEG standards.

MPEG is responding to this call of duty and is attempting the impossible by converting its 130th (physical) meeting of 500 experts into a fully online meeting, retaining as much as possible the modus operandi depicted in the figures above.

In the following I will highlight how MPEG is facing what is probably its biggest organisational challenge ever.

The first issue to be considered is that, no matter how skilfully MPEG handles its first online meeting, productivity is going to be lower than a physical meeting could yield. This is because by and large the majority of the time of a physical MPEG meeting is dedicated to intense technical discussions in smaller (and sometimes not so small) groups. At an online meeting, such discussions will at best be a pale replica of those at a physical meeting: experts will be pressed by the number and complexity of the issues, the arguments being made, the little time available and the need to reach a conclusion, all with a clumsier handling of interventions.

MPEG is facing this challenge by asking AhGs to come to the online meeting with much more solid conclusions than usual so that the results that will be brought to the online meeting will be more mature and will require less debate to be adopted. This has generated a surge in conference calls by the groups who are more motivated by the need to achieve firm results at the next meeting.

Another way to face the challenge is by being realistic about what is achievable at an online meeting. Issues that are very complex and possibly less urgent will be handled with a lower priority or not considered at all, provided of course the membership agrees. Therefore the management will set the meeting goals, balancing urgency, maturity and achievability of results. Of course experts, individually or via AhGs, will have an opportunity to make themselves heard.

Yet another way to face the challenge is by preparing, in advance of the MPEG week, a very detailed assignment of time slots to issues for the entire week. So far this was done only partially because MPEG allowed experts as much time as possible to prepare and upload their contributions for others to study and be ready to discuss at the meeting. This has always forced the chairs to prepare their schedule at the last minute or even during the week as the meeting unfolds. This time MPEG asks its experts to submit their contributions one full week in advance, with an extended abstract, to facilitate the task of the chairs who have to understand tens and sometimes hundreds of contributions and properly assign them to homogeneous sessions.

The schedules will balance the need to achieve as many results as possible (i.e. parallel sessions) with giving as many members as possible the opportunity to attend (i.e. sequential sessions).

The indefatigable Christian Tulvan, the mind and the arm behind MDMS and MMSS, is currently working to extend MMSS to enable the chairs to add the list of documents to be considered and to create online session reports shared with and possibly co-edited by session participants.

So far MPEG has mostly been lenient towards late contributions (accepted if there is consensus to review the contribution). This time late contributions will simply not be considered.

No matter how good the forecast is, the schedule is expected to change as the week progresses. If a change during the meeting is needed, it will be announced at least 24 hours in advance.

The next big challenge is the fact that MPEG is a truly global organisation. We do not have Hawaiian experts in attendance, but we do have experts from Australia (East Coast) to the USA (West Coast). That makes a total of 19 time zones. Therefore MPEG130 online will be conducted in 3 time slots starting at 05:00, 13:00 and 21:00 GMT. Sessions within each slot will last less than 2 hours, each followed by a break.
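As a rough illustration of what those three GMT slots mean for participants, the sketch below converts them to local wall-clock times for a few assumed locations. The date and the chosen IANA zone names are illustrative only, not an official meeting schedule.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The three daily slots of the online meeting, in GMT (from the text).
slots_gmt = ["05:00", "13:00", "21:00"]

# Illustrative participant locations; any IANA zone names would do.
zones = ["Australia/Sydney", "Europe/Paris", "America/Los_Angeles"]

def local_times(slot: str, day: str = "2020-04-20") -> dict:
    """Convert a GMT slot on a given day to local wall-clock times."""
    t = datetime.fromisoformat(f"{day} {slot}").replace(tzinfo=timezone.utc)
    return {z: t.astimezone(ZoneInfo(z)).strftime("%H:%M") for z in zones}

for slot in slots_gmt:
    print(slot, "GMT ->", local_times(slot))
```

Running this shows why three staggered slots are needed: no single slot falls within office hours for all 19 time zones at once.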

Conclusions

Last but not least: MPEG is confident that the current emergency will be called off soon. The situation we are facing, however, is new and we simply don’t know when it will be over, or whether it will be a one-off event or just the first of future pandemics.

With MPEG130 online, MPEG not only wants to respond to current industry needs, but also to fine-tune its processes in an online context so as to be always ready to serve industry and the people industry serves, no matter what the external circumstances are.

I don’t underestimate the challenge MPEG is facing with MPEG130 online, but I know I can rely on a dedicated leadership and membership.


The impact of MPEG on the media industry

MPEG was established as an experts group on 1988/01/22 in Copenhagen, a little more than 32 years ago. At that time, content media were already very important: voice communication; vinyl, compact cassettes and compact discs for audio; radio, mostly on terrestrial Hertzian channels; and television on 4 physical media: terrestrial Hertzian channels, satellite, cable and package media.

The way individual media evolved was a result of the technology adopted to represent content and the way content was distributed. Industries shared some elements of the technologies, but each industry introduced many differences. The situation was further exacerbated by different choices made by different countries and regions, sometimes justified by the fact that some countries introduced a technology earlier (like the 405 lines of UK TV before WW II and the 525 lines of US TV some years later). In some other cases there was no justification at all.

The figure below represents the actors of 1988:

  1. Two forms of wireless radio and television (terrestrial and satellite)
  2. Wired radio and television (cable)
  3. Physical distribution (package media)
  4. Theatrical movies distribution
  5. Content industry variously interconnected with the distribution industries.

The figure also includes two industries which, at that time, did not have an actual business in content distribution. Telecommunications was actively vying for a future role (although at that time some telcos were running cable television services as a business separate from telephony, both as public services). The second industry was information technology. Few at that time expected that the internet protocol, an outcome of the information technology industry designed to enable computers to communicate, would become the common means to transport media. Eventually, however, that is what happened.

The figure should actually be more articulated, since it does not include manufacturers. At that time, consumer electronics served users of the broadcasting service, but broadcasting had its own manufacturing industry for the infrastructure. The consumer electronics industry was itself the package media industry. Telcos had a manufacturing industry of their own for the infrastructure and a separate manufacturing industry for terminal devices, with some consumer electronics or office equipment companies providing facsimile terminals.

Even though it did not happen overnight, MPEG came, saw and unified. Today all the industries in the figure maintain a form of individual existence but they are much more integrated, as represented by the figure below.

Industry convergence has become a much-abused term. However, it is true that standard and efficient digital media have enabled the industries to achieve enormous savings in moving to digital, and in expanding from it, by allowing reuse of common components, possibly from hitherto remote industries. A notable example is MPEG Media Transport (MMT), which provides the means to seamlessly move from one-way to two-way media distribution because IP is the underlying common protocol.

The net result of convergence can be described in two points:

  1. Industry: MPEG-enabled products (devices) & services are worth 1.5 T$ p.a., i.e. ~1.8% of the Gross World Product
  2. Consumers: Billions of consumers enjoy media anytime and anywhere.

It would be silly to claim that MPEG is the only one entitled to claim merit for this result. There are many other standards bodies/committees who share in it. The figure below shows some of them. It should be clear, however, that all started from MPEG, while other bodies took over from where MPEG left the technology.

Two words about the semantics of the figure. A black line without arrows signifies that MPEG is in liaison with the body. A black line with one arrow means that MPEG is providing or has provided standards to that body. A black line with two arrows means that the interchange is/has been two way. Finally a red line means that MPEG has actually developed standards with that body. The numbers refer to the number of jointly developed standards. The number after the + indicates the number of standards MPEG is currently developing jointly with that body.

Is there a reason why MPEG has succeeded? Probably more than one, but I would primarily like to mention this: MPEG has created standards for interoperability where industry used to develop standards as barriers. Was MPEG unique in its driving thoughts? No, it just applied the physiocratic principle “laissez faire, laissez passer” (let them do, let them pass), without any ideological connotation. Was MPEG unique in how it did it? Yes, because it was the first to apply the principle to media standards. Was MPEG unique in its result? Yes: it created a largely homogeneous industry out of what used to be scattered and compartmentalised industries.

It is easy to look at the successes of the past. It is a nice exercise when you have reached the end of the path, but this is not the case for MPEG. Indeed MPEG has a big challenge: after it has done the impossible, people expect it to do even better in the future. And MPEG had better not fail 🙁

The figure below depicts some of the challenges MPEG faces in the next few years.

A short explanation of the 8 areas of the figure:

  1. Maintenance of ~180 standards is what MPEG needs to do primarily. Industry has adopted MPEG standards by the tens, but that is not the end point, that is the start. Industry continuously expresses needs that come from the application of MPEG standards it has adopted. These requests must be attended to.
  2. Immersive media is one of the biggest challenges faced by MPEG. We all wish to have immersive experiences: being physically here while feeling as if we were at a different place, sharing the experiences of those who are actually in that place. The challenges are immense. Addressing them requires a level of integration with the industry never seen before.
  3. Media for old and new users conveys two notions. The first is that “old” media are not going to die anytime soon. We will need conventional audio, good old 2D rectangular video and, even though it is hard to call them “old media”, point clouds. These media are for human users, but we see the appearance of a new type of user – machines – that are going to make use of audio and visual information transmitted from remote locations. This goal includes the current Video Coding for Machines (VCM) exploration.
  4. Internet of Media Things is a standard that MPEG has already developed with the acronym IoMT. At this moment, however, it is more at the level of a basic infrastructure on which it will be possible to build support for such ambitious scenarios as Video Coding for Machines, where media information is captured and processed by a network of machines assembled or built to achieve a predetermined goal.
  5. Neural Network Compression (NNR) is another component of the same scenario. The current assumption is that in the future much, if not all, of the “traditional” processing, e.g. for feature extraction, will be accomplished using neural networks, and that components of “intelligence” will be distributed to devices, e.g. handheld devices but also IoMTs, to enable them to do a better or a new job. NNR is in its infancy in MPEG and much more can be expected from it.
  6. Genomic Data Compression has been shown to be viable by the MPEG-G standard. The notion of a single representation for a given type of data is a given in MPEG and has been the foundation of its success. That notion is alien to the genomic world, where different data formats are applied at different portions of genomic workflows, but its application will have effects as beneficial as those it had on the media industry.
  7. Other Data Compression is a vast field that includes all cases where data, possibly already in digital form, are currently handled in an inefficient way. Data compression is not important only because it reduces storage and transmission time/bandwidth requirements, but because it provides data in a structured form suitable for further processing. Exploring and handling these opportunities is a long-term effort and will certainly provide rewarding opportunities.
  8. Finally, we should realise that, although MPEG hosts the best compression and transport experts from top academic institutions and companies, we do not know the needs of all economic players. We should be constantly on alert, ready to detect the weak signals of today that will become mainstream tomorrow.

For as many years to come as it is possible to forecast today, industry and consumers will need MPEG standards.


MPEG standards, MPEG software and Open Source Software

Introduction

The MPEG trajectory is not the trajectory of an Information Technology (IT) group. Today software plays a key role in MPEG standard development. However, MPEG is not an IT group: for MPEG, software is a tool to achieve the goal of producing excellent standards, but it remains a tool. Clearly, because MPEG assembles so many industries with so many different agendas, there are MPEG members for whom software is more than a tool.

In this article I will explore the relationship of MPEG with software and, in particular, Open Source Software.

Early days

In my early professional days I had the opportunity to be part of an old-type ICT standardisation, the European COST 211 project. A video codec specification (actually more than that, because it contained Systems aspects as well) was later submitted to and became a Recommendation of CCITT (Today ITU-T) with the acronym and title H.120 – Codecs For Videoconferencing Using Primary Digital Group Transmission.

The specification was developed on the basis of contributions received, discussed, possibly amended and then added to the specification. There was no immediate “verification” of the effectiveness of contributions adopted because that relied on hardware implementation of the specification and hardware is a different beast. Four countries (DE, FR, IT and UK) implemented the specification that was eventually confirmed by field trials using 2 Mbit/s satellite links where the 4 implementations were shown to interoperate.

MPEG-1 and MPEG-2

That happened in the years around 1980. Ten years later, MPEG started the development of the MPEG-1 Video and then the MPEG-2 Video standards using a different method.

MPEG assembled the first MPEG-1 Video Simulation Model (SM) at MPEG10 (1990/03). The SM was comparable with the evolving H.120 specification because it was a traditional textual description. At MPEG12 (1990/08), MPEG started complementing the text of the standard with pseudo C-code, because people accustomed to writing computer programs found it more natural to describe the operations performed by a codec in this language than in words.

In MPEG-1 and MPEG-2 times active participants developed and maintained their own simulation software. Some time later, however, it was decided to develop reference software, i.e. a software implementation of the MPEG-1 standard.

Seen with the eyes of a software developer, the process of standard development in MPEG-1 and MPEG-2 times was rather awkward, because the – temporally overlapping – sequence of steps was:

  1. Produce a textual description of the standard
  2. Translate the text to the individual software implementing the Simulation Model
  3. Optimise the software
  4. Translate the software back to text/pseudo C-code.

Reference Software and Conformance Testing

People of the early MPEG days used software – and quite intensely – because that was a tool that cut the time it would take to develop the specification by orders of magnitude while offering the opportunity to obtain a standard with better performance.

Another important development was the notion of conformance testing. Separation of the specification (the law) from the determination of conformance (the tribunal) was a major MPEG innovation. The reference software could be used to test an encoder implementation for conformance by feeding the bitstream produced by the implemented encoder to the reference decoder. Specially produced conformance testing bitstreams could be used to test a decoder for conformance.

Conformance testing and its tool, the reference software, are an essential add-on to a standard because they give users the freedom to make their own implementations and enable the creation of ecosystems of interoperable implementations.
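The two conformance paths just described can be sketched as follows. The functions below are toy stand-ins (an identity “codec”), not real MPEG reference software; only the structure of the two tests is meaningful.

```python
# Toy stand-ins for the two conformance-testing paths; the "codec"
# here is the identity function, not real MPEG reference software.

def reference_decode(bitstream: bytes) -> bytes:
    """Stand-in for the normative reference decoder."""
    return bitstream

def decoder_under_test(bitstream: bytes) -> bytes:
    """Stand-in for a decoder implementation being tested."""
    return bitstream

def encoder_conforms(encoder_output: bytes) -> bool:
    """Path 1: an encoder conforms if the reference decoder can
    decode the bitstream the encoder produces without error."""
    try:
        reference_decode(encoder_output)
        return True
    except ValueError:
        return False

def decoder_conforms(streams: list[bytes]) -> bool:
    """Path 2: a decoder conforms if, for every conformance bitstream,
    its output matches the reference decoder's output."""
    return all(decoder_under_test(b) == reference_decode(b) for b in streams)

print(encoder_conforms(b"\x00\x01"))
print(decoder_conforms([b"\x00", b"\x01"]))
```

Note the asymmetry: the encoder test only needs the reference decoder, while the decoder test needs a curated set of conformance bitstreams exercising the corners of the standard.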

Open Source Software (OSS)

OSS is a very large and impactful world for which software is not the tool to achieve a goal but the goal itself. Those adopting the GNU’s Not Unix (GNU) General Public License (GPL) grant users of their software, which they call “Free”, some basic rights. The terms can be (roughly) summarised as

  1. Distribute copies of the software
  2. Receive the software or get it
  3. Change the software or use pieces of it in new programs

in exchange for a commitment of the user to

  1. Give another recipient all the rights acquired
  2. Make sure that recipients are able to receive or get the software
  3. Make recipients aware that the software is a modification.

Two additional issues should be borne in mind:

  1. There is no warranty for GNU license software
  2. Any patent required to operate the software must be licensed to everybody.

MPEG-4

Development of the MPEG-4 Visual standard took another important turn, one that marked the convergence of the ways in which telecom, broadcasting and consumer electronics on one side, and information technology on the other, developed.

Unaware of the formalisation of OSS rules that was already taking place in the wider IT world, MPEG made the decision to develop the MPEG-4 reference software collaboratively because

  • Better reference software would be obtained
  • The scope of MPEG-4 was so large that probably no company could afford to develop the complete software implementation of the standard
  • A software implementation made available to the industry would accelerate adoption of the standard
  • A standard with two different forms of expression would have improved quality because the removal of an ambiguity from one form of expression would help clarify possible ambiguities in the other.

MPEG-4 Visual had only one person in charge of the Test Model. All new proposals were assessed and, if agreed, converted to Core Experiments. If at least two participants from two different institutions brought similarly convincing improvement results, the proposal would be accepted and added to the TM.

The MPEG-4 software was therefore no longer just the tool to develop the standard; it became a tool, though not necessarily the only one, to make products based on the standard. This required a reversal of priorities: the standard in textual form was still needed, but many users considered the standard expressed in a programming language as the real reference. This applied not just to those making software implementations, but often to those making more traditional hardware-based products and VLSI designs as well.

Therefore it was decided that the software version of the standard should have the same normative status as the textual part. This decision has been maintained in all subsequent MPEG standards.

Licensing software

While the previous approach where every participant had their own implementation of the TM did not raise the issue of “who owns the software?”, the new approach did. MPEG resolved that with the following rules labelled as “copyright disclaimer”:

  • Whoever makes a proposal that is accepted must provide a software implementation and assign the copyright of the code to ISO
  • ISO grants a license of the copyright of the code for products conforming to the standard
  • Proponents are not required to release patents that are needed to exercise the code and users should not expect that the copyright release includes a patent licence.

More recently, MPEG has started using a modified version of the Berkeley Software Distribution (BSD), a licence originally used to distribute a Unix-like operating system. This licence, originally called “MXM licence” from the name of MPEG-M part 2 standard “MPEG Extensible Middleware (MXM)” simply says that code may be used as prescribed by the BSD licence with the usual disclaimer that patents are not released. This new licence is particularly interesting for software companies that do not want to have liabilities when using software not developed internally.

MPEG and OSS are close, but not quite so

Let me summarise the main elements of MPEG drivers to develop standards and software that implement them:

  1. MPEG develops the best standards satisfying identified requirements. Best standards must use technologies resulting from large R&D investments typically made by for-profit entities.
  2. MPEG uses a competitive and transparent process to acquire (Call for Proposals) and refine (Core Experiment) technologies. Today that process largely uses collaboratively developed software, with methodologies that resemble those of the OSS community
  3. Typically, an MPEG standard is available in two forms: The standard expressed in natural language (possibly with some pseudo C-code inside to improve clarity) and the standard expressed in computer code.
  4. Each form of the standard has the same normative value. If discrepancies are found, the group will decide which is the correct form and amend the one not considered correct.
  5. Reference Software and Conformance Testing are attached to MPEG standards. The former is used to test encoder implementations for conformance and the latter to test decoder implementations for conformance.
  6. The Reference Software of some MPEG standards is of extremely high quality and can be used in products. However, the natural language and computer code forms of the standard have the same status.
  7. MPEG standards are typically Option 2, i.e., essential patents may exist but those patents can be used at FRAND terms.

Who is better?

The question is not meaningful if we do not specify the context in which the standard or software is used.

I claim that only a standard that responds to the 7 drivers of the previous section can

  1. Embed state-of-the-art technology
  2. Be implemented by a variety of independent entities in different industries
  3. Allow for incremental evolutions in response to user requirements
  4. Stimulate the appearance of constantly improving implementations
  5. Enable precise assignment of responsibilities (e.g. for legal purposes).

Conclusions

I am sure that some will think differently on the subject of the previous section and I will certainly be willing to engage in a discussion.

I believe that Open Source Software is a great movement that has brought a lot to humankind. However, I do not think it is adequate to create an environment that responds to the 7 drivers above.


The MPEG Metamorphoses

Introduction

In past publications, I have often talked about how many times MPEG has changed its skin during its 3-decade long life. In this article I would like to add substance to this claim by giving a rather complete, albeit succinct, account. You can find a more detailed story at Riding the Media Bits.

The early years

MPEG-1

MPEG started with the idea of creating a video coding standard for interactive video on compact disc (CD). The idea of opening another route to video coding standards had become an obsession to me because I had been working for many years in video coding research without seeing any trace of consumer-level devices for what was touted as the killer application of the time: video telephony. I thought that if the manufacturing prowess of the Consumer Electronics (CE) industry could be exploited, that industry could supply telco customers with those devices, so that telcos would be pushed into upgrading their networks to digital in order to withstand the expected high videophone traffic.

The net bitrate from CD – 1.4 Mbit/s – is close to the 1.544 Mbit/s of the primary digital multiplex in the USA and Japan. Therefore it was natural to set a target bitrate of 1.5 Mbit/s as a token of the CE and telco convergence (at video terminal level).

MPEG1 (1988/05) was attended by 29 experts. The work plan was agreed to be MPEG-1 at 1-1.5 Mbit/s, MPEG-2 at 1.5-10 Mbit/s and MPEG-3 at 10-60 Mbit/s (the numbering of the standards came later).

For six months all activities happened in single sessions. However, 3 areas were singled out for specific activities: quality assessment (Test), complexity issues in implementing video codecs in silicon (VLSI) and characteristics of digital storage media (DSM). The last activity was needed because CD was a type of medium quite dissimilar from the telecom networks and broadcast channels with which video coding experts were familiar.

In the following months I dedicated my efforts to quelling another obsession of mine: humans do not generally value video without audio. The experience of the ISDN videophone, where, for organisational reasons, video was compressed by 3 orders of magnitude into 64 kbit/s while audio was kept uncompressed in another 64 kbit/s stream, pushed me into creating an MPEG subgroup dedicated to Audio coding. Audio, however, was not the speech used in videotelephony (for which there were plenty of experts in ITU-T), but the audio (music) typically recorded on CDs. Therefore action was required lest MPEG end up like videoconferencing, with a state-of-the-art video compression standard but no audio (music), or with a quality unsatisfactory for the target “entertainment-level” service.

The Audio subgroup was established at MPEG4 (1988/10) under the chairmanship of Hans Mussmann, just 7 months after MPEG1, while the Video subgroup was established at MPEG7 (1989/07), under the chairmanship of Didier Le Gall, about a year after MPEG1.

The other concern of mine was that integrating the audio component in a system that had not been designed for that could lead to some technical oversights that could be only belatedly corrected with some abominable hacks. Hence the idea of a “Systems” activity, initially similar to the H.221 function of the ISDN videophone (a traditional frame and multiframe-based multiplexer), but with a better performance because I expected it to be more technically forward looking.

At MPEG8 (1989/11) all informal activities were formalised into subgroups: Test (Tsuneyoshi Hidaka), DSM (Takuyo Kogure), Systems (Al Simon) and VLSI (Colin Smith).

MPEG-2

Discussions on what would eventually become the MPEG-2 standard started at MPEG11 (1990/07). The scope of the still ongoing MPEG-1 project was nothing compared to the ambitions of the MPEG-2 project. The goal of MPEG-2 was to provide a standard that would enable the cable, terrestrial TV, satellite television, telco and package media industries – worth in total hundreds of billions of USD – to go digital in compressed form.

Therefore, at MPEG12 (1990/09) the Requirements group was established under the chairmanship of Sakae Okubo, the rapporteur of the ITU-T Specialists Group on Coding for Visual Telephony. This signalled the fact that MPEG-2 Video (and Systems) were joint projects. The mandate of the Requirements Group was to distil the requirements coming from the different industries into one coordinated set of requirements.

The Audio and Video subgroups had their minds split in two, with one half engaged in finishing the MPEG-1 standard and the other half in initiating the work on the next MPEG-2 standard. This was just the first time MPEG subgroups had to split their minds.

In those early years subgroup chairs changed rather frequently. At MPEG9 (1990/02) Colin (VLSI) was replaced by Geoff Morrison and the name of the group was changed to Implementation Study Group (ISG) to signal the fact that not only hardware implementation was considered, but software implementation as well. At MPEG12 (1990/03) Al (Systems) was replaced by Sandy MacInnis and Hans (Audio) was replaced by Peter Noll.

MPEG29 (1994/11) approved the Systems, Video and Audio parts of the MPEG-2 standard and some of the subgroup chairs saw their mission as accomplished. The first move had come at MPEG28 (1994/07), when Sandy (Systems) was replaced by Jan van der Meer to finalise the issues left over from MPEG-2.

The MPEG subgroups did a great job in finishing several pending MPEG-2 activities such as MPEG-2 Video Multiview and 4:2:2 profiles, MPEG-2 AAC, DSM-CC and more.

A new skin of coding

In the early 1990s, MPEG-1 was not finished and MPEG-2 had barely started, but talks about a new video coding standard for very low bitrates (e.g. 10 kbit/s) were already under way. The name eventually assigned to the project was MPEG-4, because the MPEG-3 standard envisaged at MPEG1 had been merged with MPEG-2 by bringing the upper bound of the bitrate range to 10 Mbit/s.

MPEG-4, whose title eventually settled to Coding of Audio-Visual Objects, was a completely different standard from the preceding two in that it aimed at integrating the world of audio and video, so far under the purview of broadcasting, CE and telecommunication, with the world of 3D Graphics, definitely within the purview of the Information Technology (IT) industry.

At MPEG20 (1992/11) a new subgroup called Applications and Operational Environments (AOE) was established under the chairmanship of Cliff Reader. This group took charge of developing the requirements for the new MPEG-4 project and spawned three groups inside it: “MPEG-4 Requirements”, “Synthetic and Natural Hybrid Coding” (SNHC) and “MPEG-4 Systems”.

The transition from the “old MPEG” (MPEG-1 and MPEG-2) to the “new MPEG” (MPEG-4) was quite laborious, with many organisational and personnel changes. At MPEG30 Didier (Video) was replaced by Thomas Sikora and Peter (Audio) was replaced by Peter Schreiner. At MPEG32 Geoff (ISG) was replaced by Paul Fellows and Tsuneyoshi (Test) was replaced by Laura Contin.

MPEG-4 Visual was successfully concluded thanks to the great efforts of Thomas (Video) and Laura (Test) and the very wide participation by experts. The foundations of the extremely successful AAC standards were laid down by Peter (Audio) and the Audio subgroup experts.

At MPEG34 (1996/03) Cliff (AOE) left MPEG and at MPEG35 (1996/07) a major reorganisation took place:

  1. The “AOE Requirements” activity was mapped to the Requirements subgroup under the chairmanship of Rob Koenen, after a hiatus of 3 meetings following the departure of Sakae (Requirements).
  2. The “AOE systems” activity was mapped to the Systems subgroup under the chairmanship of Olivier Avaro.
  3. The “AOE SNHC” activity became a new SNHC subgroup under the chairmanship of Peter Doenges. Peter was replaced by Euee Jang at MPEG49 (1999/10).

At MPEG40 (1997/07) a DSM activity became a new subgroup with the name Delivery Multimedia Integration Framework (DMIF) under the chairmanship of Vahe Balabanian. DMIF addressed the problem of virtualising the distribution medium (broadcast, network and storage) from the Systems level by defining appropriate interfaces (APIs). At MPEG47 (1999/03) Guido Franceschini took over with a 2-meeting tenure, after which the DMIF subgroup was closed (1999/07).

At MPEG41 Peter (Audio) was replaced by Schuyler Quackenbush, who has since been running the Audio group for 23 years and is the longest-serving MPEG chair.

At MPEG46 (1998/12) Paul (ISG) was replaced by Marco Mattavelli. Under Marco’s tenure, such standards as MPEG-4 Reference hardware description, an extension to VHDL of the notion of Reference Software, and Reconfigurable Media Coding were developed.

The MPEG-4 standard is unique in MPEG history. MPEG-1 and -2 were great standards because they brought together established large industries with completely different agendas, but MPEG-4 is the standard that bonded the initial MPEG industries with the IT industry. The standard had big challenges, and Chairs and experts dedicated enormous resources to the project to face them: video objects, audio objects, synthetic audio and video, VRML extensions, file format and more. MPEG-4 is a lively standard even today, almost 30 years after we first started working on it, and has the largest number of parts.

Liaisons

At MPEG33 (1996/01) the Liaison subgroup was created under the chairmanship of Barry Haskell to handle the growing network of organisations MPEG was liaising with (~50). At MPEG56 Barry, a veteran of the video coding old guard, left MPEG and at MPEG57 (2001/07) Jan Bormans took over, continuing until MPEG71 (2005/01) when Kate Grant took over. The Liaison subgroup was closed at MPEG84 (2008/04). Today liaisons are coordinated at the Chairs meeting, drafted by the relevant subgroup and reviewed by the plenary.

An early skin change

In 1996 MPEG started addressing MPEG-7, a media-related standard but with a completely different nature from the preceding three: it was about media descriptions and their efficient compression. At MPEG48 (1999/07) it became clear that a new subgroup, called Multimedia Description Schemes (MDS), was needed to carry out part of the work.

Philippe Salembier was put in charge of the MDS subgroup, which was initially responsible for all MPEG-7 matters that did not involve Systems, Video and Audio. At MPEG56 (2001/03) John Smith took over the position, which he held until MPEG70 (2004/10) when Ian Burnett took over until the MDS group was closed at MPEG87 (2009/02).

The media description skin has had several revivals since then. One is the Part 13 – Compact Descriptors for Visual Search (CDVS) standard of the first half of the 2010s. Another is the Part 15 – Compact Descriptors for Video Analysis (CDVA) standard developed in the middle-to-second half of the 2010s. Finally, Part 17 – Compression of neural networks for multimedia content description and analysis is preparing a basic compression technology for neural network-based media description.

Another video coding

At MPEG46 (1998/12) Laura (Test) was replaced by Vittorio Baroncini. At MPEG54 (2000/10) Thomas (Video) left MPEG and at MPEG56 (2001/03) Jens-Rainer Ohm was appointed as Video chair.

Vittorio brought the expertise to carry out the subjective tests required by the collaboration with ITU-T SG 16, restarted to develop the Advanced Video Coding (AVC) standard. At MPEG58 (2001/12) Jens was appointed as co-chair of a joint subgroup with ITU-T called Joint Video Team (JVT). The other co-chair was Gary Sullivan, rapporteur of the ITU-T SG 16 Video Coding Experts Group (VCEG). The JVT continued its work until well after the AVC standard was released at MPEG64 (2003/03). Since then Gary has attended the chairs meetings as a token of the collaboration between the two groups.

Still media-related, but a different “coding”

At MPEG49 (1999/10) the many inputs received from the market prompted me to propose that MPEG develop a new standard with the following vision: “Every human is potentially an element of a network involving billions of content providers, value adders, packagers, service providers, resellers, consumers …”.

The standard was eventually called MPEG-21 Multimedia Framework. MPEG-21 can be described as the “suite of standards that enable media ecommerce”.

The MDS subgroup was largely in charge of this project which continued during the first decade of the 2000s with occasional revivals afterwards. Today MPEG-21 standards are handled by the Systems subgroup.

Under the same heading of “different coding” it is important to mention Open Font Format (OFF), a standard built on the request made by Adobe, Apple and Microsoft to maintain the OpenType specification. The word “maintenance” in MPEG has a different meaning, because OFF has had many extensions, developed “outside” MPEG in an open ad hoc group with strong industry participation and ratified by MPEG.

A standard of standards

In the early 2000s MPEG could look back at its first decade and a half of operation with satisfaction: its standards covered video, audio and 3D Graphics coding, systems aspects, transport (MPEG-2 TS and MPEG-4 File Format) and more. While refinements of its already impressive assets were under way, MPEG wondered whether there were other areas it could cover. The answer was: the coding of “combinations of MPEG coded media”. That was the beginning of a long series of 20 standards, originally developed by the groups in charge of the individual media, e.g. Part 2 – MPEG music player application format was developed by the Audio subgroup and Part 3 – MPEG photo player application format by the Video subgroup. Today all MPEG-A standards, e.g. the very successful Part 19 – Common Media Application Format, are developed by the Systems subgroup.

The mid 2000s

Around the mid 2000s MPEG felt that there was still a need for more Systems, Video and Audio standards, but did not have the usual Systems, Video and Audio “triad” umbrella it had had until then with MPEG-1, -2, -4 and -7. So it decided to create containers for those standards and called them MPEG-B (Systems), MPEG-C (Video) and MPEG-D (Audio).

MPEG also ventured into new areas:

  1. Specification of a media device software stack (MPEG-E)
  2. Communication with and between virtual worlds (MPEG-V)
  3. Multimedia service platform technologies (MPEG-M)
  4. Rich media user interfaces (MPEG-U)

Rob (Requirements) continued until MPEG58 (2001/12). He was replaced by Fernando Pereira until MPEG64 (2003/03), when Rob returned, holding his position until MPEG71 (2005/01) when Fernando took over again until MPEG82 (2007/10), when he left MPEG.

The Requirements subgroup is the “control board” of MPEG, in the sense that it gives proposed standards the shape that the operational groups will implement after the Call for Proposals. Therefore the Rob-Fernando duo has been in the control room of MPEG for some 40% of MPEG’s life.

Vittorio (Test) continued until MPEG68 (2004/03) when he was replaced by T. Oelbaum, who held the position until MPEG81 (2007/07).

Olivier (Systems) kept his position until MPEG86 (2008/10), when he left MPEG to pursue his entrepreneurial ambitions. Olivier was in charge of the infrastructure that keeps MPEG standards together for 13 years and is the third longest-serving MPEG chair.

Euee (SNHC) kept his position until MPEG59 (2002/03). He was replaced by Mikaël Bourges-Sévenier who continued until MPEG70 (2004/10). Mikaël was then replaced by Mahnjin Han who continued until MPEG78 (2006/10). The SNHC subgroup has produced valuable standards. However, these have had a hard time penetrating an industry that is content with less performing but freely available standards.

The return of the triad

The end of the 2000s signalled a major change in MPEG. When Fernando (Requirements) left MPEG at MPEG82 (2007/10), the task of developing requirements was first assigned to the individual groups. The experiment lasted 4 meetings but demonstrated that this was not the right solution. Therefore, Jörn Ostermann was appointed as Requirements chair at MPEG87 (2009/02). That was just in time for the handling of the requirements of the new Audio-Video-Systems triad-based MPEG-H standard.

MPEG-H included the MPEG Media Transport (MMT) part, the video coding standard that eventually became High Efficiency Video Coding (HEVC) and 3D Audio. MPEG-H was adopted by the ATSC as a tool to implement new forms of broadcasting services where traditional broadcasting and the internet not only coexist but cooperate.

The Requirements, and then the Systems, subgroups were also quickly overloaded by the other project called DASH, which aimed at “taming” the internet, turning it from an unreliable transport into one that the end-user device can adapt to.

The two Systems projects – MMT and DASH – were managed by Youngkwon Lim who took over from Olivier at MPEG86 (2008/10).

At MPEG87 (2009/02) the MDS subgroup was closed. At the same meeting, Vittorio resumed his role as chair of the Test subgroup, just in time for the new round of subjective tests for the HEVC Call for Evidence and Call for Proposals.

The Joint Collaborative Team on Video Coding between ITU-T and MPEG (JCT-VC) was established at MPEG92 (2010/04), co-chaired by Gary and Jens as in the AVC project. At its peak, the JCT-VC was very large and processed in excess of 1,000 documents per meeting. When the group was still busy developing the main (2D video coding) part of HEVC, 3D video coding became important and a new subgroup called JCT-3V (joint with ITU-T) was established at MPEG100. The JCT-3V closed its activities at MPEG115 (2016/05), while the JCT-VC is still active, mostly in maintenance mode.

The recent years

In the first half of the 2010s MPEG developed the Augmented Reality Application Format and the Mixed and Augmented Reality (MAR) Reference Model in a joint ad hoc group with SC 24/WG 9.

In 2016 MPEG kicked off the work on MPEG-I – Coded representation of immersive media. Part 3 of this standard is Versatile Video Coding (VVC), the latest video coding standard, developed by the new Joint Video Experts Team (JVET) between ITU-T and MPEG established at MPEG114 (2016/02). It is expected to become FDIS at MPEG131 (2020/06).

The JVET co-chairs are again Jens and Gary. In the anticipation – which regularly materialised – that JVET would again be overloaded by contributions, Jens was replaced as Video chair by Lu Yu at MPEG121 (2018/01).

The Video subgroup is currently engaged in two 2D video coding standards of rather different nature – Essential Video Coding (EVC) and Low Complexity Enhancement Video Coding (LCEVC) – and is working on the MPEG Immersive Video (MIV) project, due to become FDIS at MPEG134 (2021/03).

MIV is connected with another exciting area, which this article left with the name of SNHC under the chairmanship of Mahnjin. At MPEG79 (2007/01) Marius Preda took over SNHC from Mahnjin to continue the traditional SNHC activities. At MPEG89 (2009/06) SNHC was renamed 3D Graphics (3DG).

In the mid 2010s the 3DG subgroup started several explorations, in particular Point Cloud Compression (PCC) and Internet of Media Things (IoMT). The former has split into two standards: Video-based (V-PCC) and Geometry-based (G-PCC). The latter has recently reached FDIS.

Another promising activity started at MPEG109 (2014/03) and has now become the Genomic Information Representation (MPEG-G) standard. This standard signals the intention to bring the benefits of compression to industries other than media that process other data types.

Conclusions

This article was a long overview of 32 years of MPEG life. The intention was not to talk about MPEG standards, but about how the MPEG organisation morphed to suit the needs of standardisation.

Of course, structure without people is nothing. It was obviously not possible to mention the thousands of experts who made MPEG standards, but I thought that it was my duty to record the names of the subgroup chairs who drove their development. You can see a complete table of all meetings and MPEG Chairs here.

In recent years the MPEG structure has remained stable, but there is always room for improvement. However, this must be driven by needs, not by ideology.

One possible improvement is to make the Genomic data coding activity a formal subgroup as a first step in anticipation of more standards to code other non-media data. The other is to inject more market awareness into the phase that defines the existence first and then the characteristics of MPEG standards.

But this is definitely another story.


National interests, international standards and MPEG

Having spent a considerable amount of my time in standardisation, I have developed my own definition of standard: “the documented agreement reached by a group of individuals who recognise the advantage of all doing certain things in an agreed way”. Indeed, I believe that, if we exclude some areas such as safety, in matters of standards the authority principle should not hold. Forcing free people to do things against their interest is an impossible endeavour. If doing certain things in a certain way is not convenient, people will shun a standard even if it bears the most august credentials.

Medieval Europe was a place where my definition of standard reached an atomic level. However, with the birth of national centralised states and, later, the industrial revolution, national standards came to the fore. Oddly enough, national standards institutions such as the British Standards Institution (BSI), originally called the Engineering Standards Committee and probably the first of its kind, were established just before World War I, when the first instance of modern globalisation took shape.

Over the years, national standards became a powerful instrument to further a country’s industrial and commercial interests. As late as 1989 MPEG had trouble displaying 625/50 video coding simulation results at a USA venue because import of 625/50 TV sets in the country was forbidden at that time (and no one had an interest in making such sets). This “protection of national interests” is the cause of the 33 pages of the ITU-R Report 624 – Characteristics of television systems of 1990 available here containing tables and descriptions of the different analogue television systems used at the time by the 193 countries of the United Nations.

The same spirit of “protecting national interests” informed the CCITT SGXV WG4 Specialists Group on Coding for Visual Telephony (that everybody at that time called the Okubo group) when it defined the Common Intermediate Format (CIF) in Recommendation H.261, to make it possible for a 525/60 camera to communicate with a 625/50 monitor (and for a 625/50 camera with a 525/60 monitor).

That solution was a “compromise” video format (actually not a real video format because it was used only inside the video codec) with one quarter of the 625/50 spatial resolution and one half the 525/60 temporal resolution. This was a typical political solution of the time (and one that 525/60 people later regretted because the spatial interpolation required by CIF was more onerous than the temporal interpolation in 625/50). Everybody (but me, who opposed the solution) felt happy because everybody had to “share the burden” when communicating across regions with different video formats.

International standardisation is split in three – IEC, ISO and ITU – but IEC and ISO share the principle that standards for a technical area are developed by a Technical Committee (or a Subcommittee) managed by an international secretariat funded and staffed by a national standards organisation (a so-called National Body). Things in ITU are slightly different because ITU itself provides the secretariat, whose personnel are provided by national administrations.

In the traditional context of standards being established by a national standards committee to protect the national interest, an international standards committee was seen as the place where national interests, as represented by their national standards bodies, had to be protected. Therefore, holding the secretariat of a committee was seen as a major achievement for the country that ran the secretariat. As an emblem of the achievement, the country had the right to nominate (in practice, appoint) the chairperson of the committee (in some committees this is rigorously enforced. In some others, things are taken more lightly).

That was then, but actually it is still so even now in many standardisation contexts. The case of CIF mentioned above shows that, in the area of video coding standards, at that time the prerogative of the ITU-T “for Visual Telephony”, the influence of national interests was still strong. MPEG, however, changed the sides of the equation. One of the first things that it did when it developed MPEG-1 Video was to define test sequences in both 525/60 and 625/50 and then issue a Call for Proposals where respondents could submit coded sequences in one or the other format at their choice. MPEG did not use CIF but SIF, where the format was either a quarter of the spatial resolution and one half of the temporal resolution of 525/60 (i.e. 240 lines x 352 pixels) or a quarter of the spatial resolution and one half of the temporal resolution of 625/50 (i.e. 288 lines x 352 pixels).
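As an illustration, the SIF arithmetic can be written out as a small sketch, assuming the ITU-R 601 active resolutions of 720x480 (525/60) and 720x576 (625/50); the reduction of the half-width 360 to 352 keeps it a multiple of 16 for macroblock alignment.

```python
# Sketch of the SIF derivation: half the active lines, half the field rate,
# half the horizontal resolution (360), cropped to 352 (a multiple of 16).
def sif(active_lines: int, field_rate: int) -> tuple:
    """Return (width, height, frame rate) of the SIF of a 601 source."""
    width = (720 // 2) // 16 * 16   # 360 -> 352, macroblock-aligned
    return width, active_lines // 2, field_rate / 2

print(sif(480, 60))  # 525/60 source -> (352, 240, 30.0)
print(sif(576, 50))  # 625/50 source -> (352, 288, 25.0)
```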

By systematically defusing political issues and converting them to technical issues, MPEG succeeded in the impossible task of defining compressed media formats with an international scope. However, by kicking political issues out of the meeting rooms, MPEG changed the nature and role of the chairmen and secretariat of the parent subcommittee SC 29. The first yearly SC 29 plenary meetings lasted 3 days, but later the duration was reduced to 1 day and in some cases all matters were handled in half a day.

One of the most contentious areas of standardisation (remember the epic battles on the HDTV production format of 1986 and before) was completely tamed and reduced to technical battles where experts assess the quality of the solution proposed and not how it is dressed in political clothing. This does not mean that the battles are not epic, but for sure they are rational.

I do not remember having heard complaints on the part of the industry regarding the de-politicised state of affairs in media coding standardisation. Therefore it is time to ask whether we should not be dispensed from the pompous ritual of countries expressing national interests through the national bodies holding secretariats and chairs of international standards committees, when in fact there are global industrial interests poorly mapped onto a network of countries actually deprived of national interests.


Media, linked media and applications

Introduction

In a technology space moving at an accelerated pace like the one MPEG has the task to develop standards for, it is difficult to have a clear plan for the future (MPEG has a 5-year plan, though).

Still, when MPEG was developing the Multimedia Linking Application Format (MLAF), it “discovered” that it had developed, or was developing, several related standards – MPEG-7, Compact Descriptors for Visual Search (CDVS), Compact Descriptors for Video Analysis (CDVA) and Media Orchestration.

The collection of these standards (and of others in the early phases of conception or development, e.g. Neural Network Compression and Video Coding for Machines) helps create the Multimedia Linking Environment, i.e. an environment where it is possible to create a link between a given spatio-temporal region of a media object and spatio-temporal regions in other media objects.

This article explains the benefits brought by the MLAF “multimedia linking” standard, including for very concrete applications.

Multimedia Linking Environment

Until a quarter of a century ago, virtually the only device that could establish relationships between different media items was the brain. A very poor substitute was a note on a book recording a possible relationship between the place in the book where the note was written and content in the same or different books.

The possibility to link a place in a web page to another place in another web page, or to a media object, was the great innovation brought by the web. However, a quarter of a century later, with a billion web sites and quadrillions of linked web pages, we must recognise that the notion of linking is a pervasive one and not necessarily connected with the web.

MPEG has dedicated significant resources to the problem described by the sentence “I have a media object and I want to know which other related media objects exist in a multimedia database”, represented in the MPEG-7 model depicted in the figure below.

However, MPEG-7 is an instance of the more general problem of linking a given spatio-temporal region of a media object to spatio-temporal regions in other media objects.

These are some examples:

  1. A synthetic object is created out of a number of pictures of an object. There is a relationship between the pictures and the synthetic object;
  2. There is a virtual replica of a physical place. There is a relationship between the physical place and the virtual replica;
  3. A user is experiencing a virtual place in a virtual reality application. There is a relationship between the two virtual places;
  4. A user creates a media object by mashing up a set of media items coming from different sources. There is a relationship between the media items and the mashed up media object.

MPEG has produced MPEG-A part 16 (Media Linking Application Format – MLAF), which specifies a data format called bridget that can be used to link any kind of media. MPEG has also developed a number of standards that play an auxiliary role in the “media linking” context outlined by the examples above.

  1. MPEG-7 parts 1 (Systems), 3 (Visual), 4 (Audio) and 5 (Multimedia) provide the systems elements, and the visual (image and video), audio and multimedia descriptions.
  2. MPEG-7 parts 13 (Compact descriptors for visual search) and 15 (Compact descriptors for video analysis) provide new generation image and video descriptors.
  3. MPEG-B part 13 (Media Orchestration) provides the means to mash up media items and other data to create personal user experiences.

The MLAF standard

A bridget is a link between a “source” content and a “destination” content. It contains information on

  1. The source and the destination content
  2. The link between the two
  3. The information to be presented to the users who consume the source content.

The last information is the most relevant to the users because it is the one that enables them to decide whether the destination content is of interest to them.

The structure of the MLAF representation (points 1 and 2) is based on the MPEG-21 Digital Item Container implemented as a specialised MPEG-21 Annotation. The spatio-temporal scope is represented by the expressive power of two MPEG-7 tools and the general descriptive capability of the MPEG-21 Digital Item. They allow a bridget author to specify a wide range of possible associations and to be as precise and granular as needed.

The native format to present bridget information is based on MPEG-4 Scene description and application engine. Nevertheless, a bridget can be directly linked to any external presentation resource (e.g., an HTML page, an SVG graphics or others).
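To make the three-part structure described above more concrete, here is a minimal sketch in Python. The field names are hypothetical, chosen only to mirror the description in the text; they are not the actual MLAF schema elements.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the three-part bridget structure described above.
# Field names are hypothetical, not the actual MLAF schema elements.

@dataclass
class MediaAnchor:
    uri: str                       # the media object
    start_s: float = 0.0           # temporal scope of the anchor, in seconds
    end_s: Optional[float] = None  # None = until the end of the media

@dataclass
class Bridget:
    source: MediaAnchor        # 1. the content where the bridget appears
    destination: MediaAnchor   # 1. the content it links to
    relationship: str          # 2. the nature of the link between the two
    presentation_uri: str      # 3. how the bridget is shown to the user

b = Bridget(
    source=MediaAnchor("tv://channel1/program42", 120.0, 150.0),
    destination=MediaAnchor("https://example.com/making-of.mp4"),
    relationship="related-content",
    presentation_uri="https://example.com/bridget-page.html",
)
print(b.source.start_s, b.presentation_uri)
```

The separation between link data and presentation data mirrors the text: the first two parts describe the association, while the third is what the user actually sees and reacts to.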

Bridgets for companion screen content

An interesting application of the MLAF standard is illustrated in the figure below, which describes the entire bridget workflow.

  1. A TV program, scheduled to be broadcast at a future time, is uploaded to the broadcast server [1] and to the bridget Authoring Tool (BAT) [2].
  2. BAT computes and stores the program’s audio fingerprints to the Audio Fingerprint Server (AFS) [3].
  3. The bridget editor uses BAT to create bridgets [4].
  4. When the editor is done, all bridgets of the program and the referenced media objects are uploaded to the Publishing Server [5].
  5. At the scheduled time, the TV program is broadcast [6].
  6. The end user’s app computes the audio fingerprint and sends it to the Audio Fingerprint Server [7].
  7. AFS sends the user’s app the ID and time position of the program the user is watching [8].
  8. When the app alerts the user that a bridget is available, the viewer may decide to
    1. Turn her eyes away from the TV set to her handset
    2. Play the content in the bridget [9]
    3. Share the bridget to a social network [10].
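The client-side portion of the workflow above (steps 6-8) can be sketched as follows. The AFS lookup is stubbed with a dictionary and every name is illustrative, not part of any MPEG-defined API.

```python
from typing import Optional

# Hypothetical sketch of steps 6-8 of the bridget workflow. The Audio
# Fingerprint Server is stubbed with a dict; all names are illustrative.

def afs_lookup(fingerprint: str, index: dict) -> Optional[tuple]:
    """Step 7: map an audio fingerprint to (program_id, seconds_into_program)."""
    return index.get(fingerprint)

def active_bridgets(bridgets: list, program_id: str, position: float) -> list:
    """Step 8: bridgets of this program whose time span covers the position."""
    return [b for b in bridgets
            if b["program"] == program_id and b["start"] <= position < b["end"]]

# Stub data standing in for the servers of the figure.
index = {"fp-123": ("program-42", 130.0)}
bridgets = [
    {"program": "program-42", "start": 120.0, "end": 150.0, "title": "Buy the soundtrack"},
    {"program": "program-42", "start": 300.0, "end": 330.0, "title": "Visit the location"},
]

program, pos = afs_lookup("fp-123", index)          # steps 6-7
print([b["title"] for b in active_bridgets(bridgets, program, pos)])  # step 8
# prints ['Buy the soundtrack']
```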

This is the workflow of a recorded TV program. A similar scenario can be implemented for live programs. In this case bridgets must be prepared in advance so that the publisher can select and broadcast a specific bridget when needed.

Standards are powerful tools that facilitate the introduction of new services, such as companion screen content. In this example, the bridget standard can stimulate the creation of independent authoring tools and end-user applications.

Creating bridgets

The bridget creation workflow depends on the types of media object the bridget represents.

Let’s assume that the bridget contains different media types such as an image, a textual description, an independently selectable sound track (e.g. an ad) and a video. Let’s also assume that the layout of the bridget has been produced beforehand.

This is the sequence of steps performed by the bridget editor:

  1. Select a time segment on the TV program timeline and a suitable layout
  2. Enter the appropriate text
  3. Provide a reference image (possibly taken from the video itself)
  4. Find a suitable image by using an automatic image search tool (e.g. based on the CDVS standard)
  5. Provide a reference video clip (possibly taken from the video itself)
  6. Find a suitable video clip, possibly taken from the video itself, by using an automatic video search tool (e.g. based on the CDVA standard)
  7. Add an audio file.

The resulting bridget will appear to the end user like this.

When all bridgets are created, the editor saves the bridgets and the media to the publishing server.

It is clear that the “success” of a bridget (in terms of number of users who open it) depends to a large extent on how the bridget is presented.

Why bridgets

Bridget was the title of a research project funded by the 7th Framework Programme of the European Commission. The MLAF standard (ISO/IEC 23000-16) was developed at the instigation and with the participation of members of the Bridget project.

At this page you will find more information on how the TVBridge application can be used to create, publish and consume bridgets for recorded and live TV programs.


Standards and quality

Introduction

Quality pervades our life: we talk of quality of life and we choose things on the basis of declared or perceived quality.

A standard is a product and, as such, may also be judged, although not exclusively, in terms of its quality. MPEG standards are no exception, and the quality of MPEG standards is a feature MPEG has considered of paramount importance since its early days.

Cosmesis is related to quality, but is a different beast. You can apply cosmesis at the end of a process, but that will not give quality to a product issued from that process. Quality must be an integral part of the process, or it will not be there at all.

In this article I will describe how MPEG has embedded quality in all phases of its standard development process and how it has measured quality in some illustrative cases.

Quality in the MPEG process

The business of MPEG is to produce standards that process information in such a way that users do not notice, or notice as little as possible, the effects of that processing when implemented in a product or service.

When MPEG considers the development of a new standard, it defines the objective of the standard (say, compression of video for a particular range of resolutions), the range of bitrates and the functionality. Typically, MPEG makes sure that it can deliver the standard with the agreed functionality by issuing a Call for Evidence (CfE). Industry members are requested to provide evidence that their technology is capable of achieving part or all of the identified requirements.

Quality is now an important, if not essential, parameter for making a go/no-go decision. When MPEG assesses the CfE submissions, it may happen that established quality assessment procedures are found inadequate. That was the case of the Call for Evidence on High-Performance Video Coding (HVC) of 2009. The high number of submissions received required the design of a new test procedure: the Expert Viewing Protocol (EVP). Later on the EVP test method became ITU Recommendation ITU-R BT.2095. While the execution of any other ITU recommendation of that time would have required more than three weeks, the EVP allowed the complete testing of all the submissions in three days.

If MPEG has become confident of the feasibility of the new standard from the results of the CfE, a Call for Proposals (CfP) is issued with attached requirements. These can be considered as the terms of the contract that MPEG stipulates with its client industries.

Testing of CfP submissions allows MPEG to develop a Test Model and initiate Core Experiments (CE). These aim at optimising individual parts of the entire coding scheme.

In most cases the results of CEs require quality evaluation. In the case of CfP responses, subjective testing is necessary because there are typically large differences between the proposed coding technologies. In the assessment of CE results, however, where smaller effects are involved, objective metrics are typically, but not exclusively, used because formal subjective testing is not feasible for logistic or cost reasons.
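To give a concrete idea of such an objective metric, the sketch below computes PSNR (Peak Signal-to-Noise Ratio), the metric most commonly used when comparing a coded video frame with its reference. The pixel data here is a toy example, not material from any actual test.

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak Signal-to-Noise Ratio (dB) between two equal-length pixel sequences."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: infinite PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy example: a flat grey "frame" versus a slightly noisy version of it.
ref = [128] * 16
dist = [128, 129, 127, 128, 130, 128, 126, 128,
        128, 129, 128, 127, 128, 128, 129, 128]
print(f"PSNR: {psnr(ref, dist):.2f} dB")
```

In practice, a higher PSNR (in dB) indicates a smaller average distortion; small coding-tool changes in a CE can shift PSNR by fractions of a dB, differences that a metric can resolve but a viewing test often cannot.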

When the development of the standard is completed, MPEG engages in a process called Verification Tests, which produces a publicly available report. This can be considered the proof, on the part of the supplier (MPEG), that the terms of the contract with its customer have been satisfied.

Samples of MPEG quality assessment

MPEG-1 Video CfP

The first MPEG CfP quality tests were carried out at the JVC Research Center in Kurihama (JP) in November 1989. Fifteen proposals for video coding algorithms operating at a maximum bitrate of 1.5 Mbit/s were tested and used to create the first Test Model at the following Eindhoven meeting in February 1990 (see the Press Release).

MPEG-2 Advanced Audio Coding (AAC)

In February 1998 the Verification Test allowed MPEG to conclude that “when auditioning using loudspeakers, AAC coding according to the ISO/IEC 13818-7 standard gives a level of stereo performance superior to that given by MPEG-1 Layer II and Layer III coders” (see the Verification Test Report). This showed that the goal of high audio quality at 64 kbit/s per channel for MPEG-2 AAC had been achieved.

Of course that was “just” MPEG-2 AAC with no substantial encoder optimisation. More than 20 years of MPEG-4 AAC progress have further brought down the bitrate per channel.

MPEG-4 Advanced Video Coding (AVC) 3D Video Coding CfP

The CfP for new 3D (stereo and auto-stereo) video coding technologies was issued in 2012 and received a total of 24 complete submissions. Each submission produced 24 files representing the different viewing angles for each test case. Two sets of two and three viewing angles were blindly selected and used to synthesise the stereo and auto-stereo test files.

The test was carried out on standard 3D displays with glasses and on auto-stereoscopic displays. A total of 13 test laboratories took part, running a total of 224 test sessions and involving around 5,000 non-expert viewers. Each test case was run by two laboratories, making it a fully redundant test.

MPEG-High Efficiency Video Coding (HEVC) CfP

The HEVC CfP covered five different classes of content, with resolutions ranging from WQVGA (416×240) up to 2560×1600. For the first time MPEG introduced two sets of constraints (low delay and random access) for different classes of target applications.

The HEVC CfP was a milestone because it required the biggest testing effort ever performed by any laboratory or group of laboratories until then. The CfP generated a total of 29 submissions and 4,205 coded video files, plus the set of anchor coded files. Three testing laboratories took part in the tests, which lasted four months and involved around 1,000 naïve (non-expert) subjects allocated to a total of 134 test sessions.

A common test set, accounting for about 10% of the total testing effort, was included to monitor the consistency of results across the different laboratories. With this procedure it was possible to detect a set of low-quality test results from one laboratory.
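One simple way to perform such a cross-laboratory check, sketched below, is to correlate the mean opinion scores (MOS) that two laboratories assign to the shared clips: a low correlation flags results that deserve scrutiny. The scores here are invented for illustration and do not come from the actual HEVC tests.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical MOS values (scale 1-5) from two laboratories on the same
# common clips; close agreement yields a correlation near 1.0.
lab_a = [4.1, 3.6, 2.8, 4.5, 3.2, 2.1, 4.8, 3.9]
lab_b = [4.0, 3.7, 2.9, 4.4, 3.1, 2.3, 4.7, 3.8]
print(f"inter-lab correlation: r = {pearson(lab_a, lab_b):.3f}")
```

A laboratory whose scores on the common set correlate poorly with the others' would be investigated, and its results possibly discarded, exactly as happened in the HEVC tests.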

Point Cloud Compression (PCC) CfP

The CfP was issued to assess how a proposed PCC technology could provide 2D representations of content synthesised using PCC techniques, resulting in videos suitable for evaluation by means of established subjective assessment protocols.

Video clips were produced after a careful selection of the rendering conditions. A video rendering tool was used to generate, under the same conditions, two different video clips for each of the received submissions: a rotating view of a fixed synthesised image and a rotating view of a moving synthesised video clip. The rotations were selected blindly and the resulting video clips were subjectively assessed to rank the submissions.

Conclusions

Quality is what end users of media standards value as the most important feature. To respond to this requirement, MPEG has designed a standards development process that is permeated by quality considerations.

MPEG has no resources of its own. Therefore, it often has to rely on the voluntary participation of many competent laboratories to carry out subjective tests.

The domain of media is very dynamic and, very often, MPEG cannot rely on established methods – both subjective and objective – to assess the quality of new compressed media types. Therefore, MPEG constantly innovates the methodologies it uses to assess media quality.

Posts in this thread