Quality, more quality and more more quality

Quality measurement is an essential ingredient of the MPEG business model that targets the development of the best performing standards that satisfy given requirements.

MPEG was certainly not the first to discover the importance of media quality assessment. Decades ago, when it was still called Comité Consultatif International des Radiocommunications (CCIR), ITU-R developed Recommendation 500, “Methodologies for the subjective assessment of the quality of television images”. This recommendation guided the work of television labs for decades. It was not possible, however, to satisfy all MPEG needs with BT.500, the modern name of CCIR Recommendation 500, for three main reasons: MPEG needed methods to assess the impact of coding on video quality, MPEG dealt with a much wider range of moving pictures than television, and MPEG ended up dealing with more than just 2D rectangular moving pictures.

Video quality assessment in MPEG began in November 1989 at the research laboratories of JVC in Kurihama, when all aspects of the responses to the MPEG-1 Call for Proposals (CfP), including quality, were considered. Two years later MPEG met again in Kurihama to consider the responses to the MPEG-2 CfP. At that time the assessment of video quality was done using the so-called Double Stimulus Impairment Scale (DSIS) method with a 5-grade impairment scale. In both tests massive use of digital D1 tapes was made to deliver undistorted digital video to the test facility. The Test subgroup, led by its chair Tsuneyoshi Hidaka, managed all the logistics of D1 tapes coming from the 4 corners of the world.

The MPEG Test chair was able to convince the JVC management to offer free use of the testing facilities for MPEG-1. However, he could not achieve the same for MPEG-2. Therefore MPEG-2 respondents were asked to pay for the tests. Since then, participation in most if not all subjective test campaigns has been subject to the payment of a fee to cover the use of facilities and/or the human subjects who were asked to view the video sequences under test. The MPEG-1 and MPEG-2 tests were carried out in the wake of Recommendation BT.500.

The MPEG-4 tests, carried out in 1995, fundamentally changed the scope because the CfP addressed multimedia content, i.e. progressively scanned moving images, typically at lower resolution than TV, which were supposed to be transmitted over noisy channels (videophone over fixed subscriber line or the nascent mobile networks). The statistical processing of subjective data applied to the MPEG-4 CfP introduced the use of ANOVA (analysis of variance). Until then, tests had only used the simple mean value and the Grand Mean, i.e. the mean value computed over the scores assigned to several video sequences.
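As an illustration of what ANOVA adds over plain means, the sketch below computes the Grand Mean and a one-way ANOVA F statistic for a few groups of scores. It is only a minimal sketch: the function name `one_way_anova_f` and the scores are hypothetical, and the source does not say which ANOVA design the MPEG-4 tests actually used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (grand_mean, F) for a list of score lists, one list per codec.

    F is the ratio of between-group variance to within-group variance.
    A large F suggests the group means differ by more than chance alone.
    """
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups (e.g. codecs under test)
    n = len(all_scores)    # total number of scores
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return grand_mean, f_stat


# Hypothetical scores for three codecs (not real MPEG data)
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
grand_mean, f_stat = one_way_anova_f(scores)
print(f"Grand Mean = {grand_mean:.2f}, F = {f_stat:.2f}")
```

The F value would then be compared with a critical value of the F distribution for (k - 1, n - k) degrees of freedom to decide whether the differences are statistically significant.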

The use of Statistically Significant Difference (SSD) allowed a precise ranking of the technologies under test. Traditional test methods (DSIS and SS) were used together with the new Single Stimulus Continuous Quality Evaluation (SSCQE) test method to evaluate “long” video sequences of 3 minutes and measure how well a video compression technology could recover from transmission errors. The tests were carried out using the D1 digital professional video recorder and Professional Studio Quality “grade 1” CRT displays.

The Digital Cinema test, carried out in 2001 at the Entertainment Technology Centre (ETC) of the University of Southern California, was designed to evaluate cinematic content in a real theatrical environment, i.e. on a 20 m base perforated screen, projected by a cinema projector fed with digital content. The subjective evaluations were done with three new test methods: the Expert Viewing Test (EVT), a two-step procedure in which the results of a DSIS test were refined through careful observation by a selected number of “golden eye” observers; the Double Stimulus Perceived Difference Scale (DSPDS), a double stimulus impairment detection test method using a 5-grade impairment scale; and the Double Stimulus Split-Screen Perceived Difference Scale (S3PDS), a test method based on a split-screen approach where both halves of the screen were observed in sequence.

The tests for the Call for New Tools to Further Improve Coding Efficiency were done using traditional test methods and the same methodology and devices as the MPEG-4 Call for Proposals. The tests demonstrated the existence of a new technology in video compression and allowed the collaboration between ISO and ITU-T in the area of digital video coding to resume. This was the first test to use the 11-grade impairment scale, which became a reference for DSIS and SS test experiments and provided a major improvement in the accuracy of results.

A new test method, the VSMV-M Procedure, was designed in 2004 to assess the submissions received for the Core Experiment on Scalable Video Coding. The Procedure was made of two phases: a “controlled assessment” phase and a “deep analysis” phase. The first phase was run according to the DSIS and SS test methods. In the second phase, designed by MPEG, a panel of experts confirmed the ranking obtained from the formal subjective assessment. These tests were the first to be entirely based on digital video servers and DLP projectors. Therefore, 15 years after they were first used in the MPEG-1 tests, D1 tapes were finally put to rest.

The SVC Verification Tests, carried out in 2007, represented another important step in the evolution of the MPEG testing methodology. Two new test methods were designed: the Single Stimulus Multi-Media (SSMM) and the Double Stimulus Unknown Reference (DSUR). The SSMM method minimised the contextual effect typical of the Single Stimulus (SS) method. The DSUR method, derived from the Double Stimulus Impairment Scale (DSIS) Variant II, introduced some of the advantages of the Double Stimulus Continuous Quality Scale (DSCQS) method into the DSIS method while avoiding the tricky and difficult data processing of DSCQS.

The Joint Call for Proposals on Video Compression Technology (HEVC) covered 5 different classes of content, with resolutions ranging from WQVGA (416×240) to 2560×1600, in two configurations (low delay and random access) for different classes of target applications. It was a very large test effort: it covered a total of 29 submissions, lasted 4 months and involved 3 laboratories, which assessed more than 5000 video files and hired more than 2000 non-expert viewers. The ranking of submissions was done considering the Mean Opinion Score (MOS) and Confidence Interval (CI) values. A procedure was introduced to check that the results provided by different test laboratories were consistent. The results of the three laboratories included a common test set that made it possible to measure the impact of a laboratory on the results of a test experiment.
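To make the MOS and CI ranking concrete, here is a minimal sketch of how the two values can be computed from raw viewer scores. The 95% interval uses the normal approximation (1.96 times the standard error); that choice, the function names and the scores are assumptions for illustration, not the exact procedure used in the HEVC tests.

```python
from math import sqrt
from statistics import mean, stdev


def mos_with_ci(scores, z=1.96):
    """Return (MOS, CI half-width) for one test condition.

    z=1.96 gives an approximate 95% confidence interval (an assumption;
    the source does not state the confidence level that was used).
    """
    mos = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return mos, half_width


def clearly_different(scores_a, scores_b):
    """True when the two confidence intervals do not overlap."""
    mos_a, ci_a = mos_with_ci(scores_a)
    mos_b, ci_b = mos_with_ci(scores_b)
    return abs(mos_a - mos_b) > ci_a + ci_b


# Hypothetical scores on an 11-grade scale (0 to 10), not real test data
codec_a = [7, 8, 9, 8, 8]
codec_b = [4, 5, 4, 5, 5]
for name, scores in (("A", codec_a), ("B", codec_b)):
    mos, ci = mos_with_ci(scores)
    print(f"codec {name}: MOS = {mos:.2f} +/- {ci:.2f}")
print("clearly different:", clearly_different(codec_a, codec_b))
```

Two submissions whose intervals overlap cannot be confidently ranked against each other, which is why the CI is reported next to every MOS.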

A total of 24 complete submissions were received in response to the Joint Call for Proposals on 3D Video Coding (stereo and auto-stereo) issued in 2012. For each test case each submission produced 24 files representing the different viewing angles. Two sets of two and three viewing angles were blindly selected to synthesise the stereo and auto-stereo test files. The test was done on standard 3D displays (with glasses) and auto-stereoscopic displays. A total of 13 test laboratories took part in the test, running a total of 224 test sessions and hiring around 5000 non-expert viewers. The test applied a full redundancy scheme, where each test case was run by two laboratories to increase the reliability and the accuracy of the results. The ranking of the submissions was done considering the MOS and CI values. This test represented a further improvement in the control of the performance of each test laboratory. The test could ensure full result recovery in the case of failure of up to 6 out of 13 testing laboratories.

The Joint CfP for Coding of Screen Content was issued to extend the HEVC standard in order to improve the coding performance of typical computer screen content. When it became clear that the set of test conditions defined in the CfP was not suitable to obtain valuable results, the test method was changed from the original “side by side” scheme to a sequential presentation scheme. The complexity of the test material led to the design of an extremely accurate and long training of the non-expert viewers. Four laboratories participated in the formal subjective assessment test, assessing and ranking the seven responses to the CfP. More than 30 test sessions were run (including the “dry-run” phase), hiring around 250 non-expert viewers.

The CfP on Point Cloud Coding was issued to assess coding technologies for 3D point clouds. MPEG had no experience (but actually no one had) in assessing the visual quality of point clouds. MPEG projected the 3D point clouds to 2D spaces and evaluated the resulting 2D video according to formal subjective assessment protocols. The video clips were produced using a rendering tool that generated two different video clips for each of the received submissions, under the same creation conditions. Both were rotating views: 1) of a fixed synthesised image and 2) of a moving synthesised video clip. The rotations were blindly selected.

The CfP for Video Compression with Capability beyond HEVC included three test categories, for which different test methods had to be designed. The Standard Dynamic Range category was a compression efficiency evaluation process where the classic DSIS test method was applied with good results. The High Dynamic Range category required two separate sessions, according to the peak luminance of the video content taken into account, i.e. below (or equal to) 1K nits and above 1K nits (namely 4K nits); in both cases the DSIS test method was used. The quality of the 360° category was assessed in a “viewport” extracted from the whole 360° screen with an HD resolution.

When the test was completed, the design of the 36 “SDR”, 14 “HDR” and 8 “360°” test sessions was verified. For each test session the distribution of the raw quality scores was analysed to verify that the level of visual quality was equally distributed across the many test sessions.

This was a long but still incomplete review of 30 years of subjective visual quality assessment in MPEG. This ride across 3 decades should demonstrate that MPEG draws from established knowledge to create new methods that are functional to obtaining the results MPEG is seeking. It should also show the level of effort involved in actually assigning tasks, coordinating the work and producing integrated results that provide the responses. Most important is the level of human participation involved: 2000 people (non-experts) for the HEVC tests!


Many thanks to the MPEG Test chair Vittorio Baroncini for providing the initial text of this article. Many parts of the activities described here were conducted by him as Test chair.


Developing MPEG standards in the viral pandemic age


For 30 years industry has been accustomed to relying on MPEG as the source of the standards it needs. In 30 years MPEG has held a record 129 meetings, spaced roughly 3 months apart.

What happens if MPEG130 is not held? Can industry afford it?

In this article I will try to answer this not so hypothetical question.

An MPEG meeting (physical)

In Looking inside an MPEG meeting I have illustrated the “MPEG cycle” workflow using the figure below

At the plenary session of the previous N-1th meeting, MPEG approves the results achieved and creates some 25 Ad hoc Groups (AhGs). Taking one example from MPEG129, each AhG has a title (Compression of Neural Networks for Multimedia Content Description and Analysis), chairs (Werner Bailer, Sungmoon Chun and Wei Wang) and a set of mandates:

  1. Collect more diverse types of models and test data for further use cases, working towards a CfP for incremental network representation
  2. Perform the CEs and analyse the results
  3. Improve the working draft and test model
  4. Continue analyzing the state of the art in NN compression and exchange formats
  5. Continue interaction with SC42, FG ML5G, NNEF, ONNX and the AI/ML community

Work is carried out during the typical ~3 months between the end of the N-1th and the next Nth meeting using e-mail reflectors, conference calls or, less frequently, physical meetings. Documents are shared by AhG members using the MPEG Document Management System (MDMS).

When the date of the next meeting approaches, AhGs wrap up their conclusions and many of them hold physical meetings on the week-end prior to the “MPEG week”.

On the Monday morning of the MPEG week, AhGs report their results to the MPEG plenary. In the afternoon, subgroups (Requirements, Systems, Video, Joint groups with ITU, Audio, 3DG and Test) hold short plenaries after which Break-out Groups (BoGs), often a continuation of AhGs, carry out their work interspersed with joint meetings of subgroups and BoGs.

Two more plenaries are held: on Wednesday morning to make everybody aware of what has happened in groups a member might not have had the opportunity to attend and on Friday afternoon to ratify or, if necessary, reconsider, decisions made by the subgroups.

The Convenor and the Chairs meet at night to assess progress and coordinate work between subgroups and BoGs. A typical function is the identification of joint meetings.

ICT at the service of MPEG

Some 500 people are involved in an MPEG week. At times some 10-15 meeting sessions are held in parallel.

Most of this is possible because of the ICT facilities MPEG prides itself on. Developed by Christian Tulvan, they run on servers made available by Institut Mines Télécom.

Currently the MPEG Meeting Support System (MMSS) includes a calendar where subgroup chairs record all subgroup and BoG sessions adding a description of the topics to be discussed. The figure below gives a snapshot of the MMSS calendar. This of course has several views to serve different needs.

In Digging deeper in the MPEG work, I described MDMS and MMSS. Originally deployed in 1995, MDMS has been one of the greatest contributors to MPEG’s skyrocketing rise in performance. In addition to providing the calendar, MMSS also enables the collation of all results produced by the galaxy of MPEG organisational units depicted below.

The third ICT support is the MPEG Workplan Management System (MWMS). This provides different views of the relevant information on MPEG standards that is needed to execute the workplan.

MPEG online?

Now imagine, and you probably don’t have to stretch your imagination too much, that physical meetings of people are banned but industry requests are so pressing that a meeting must be held, no matter what, because product and service plans depend so much on MPEG standards.

MPEG is responding to this call of duty and is attempting the impossible by converting its 130th (physical) meeting of 500 experts to a full online meeting, retaining as much as possible the modus operandi depicted in the figures above.

In the following I will highlight how MPEG is facing what is probably its biggest organisational challenge ever.

The first issue to be considered is that, no matter how skilfully MPEG handles its first online meeting, productivity is going to be lower than a physical meeting could yield. This is because by and large the majority of the time of a physical MPEG meeting is dedicated to intense technical discussions in smaller (and sometimes not so small) groups. At an online meeting, such discussions will at best be a pale replica of those at a physical meeting: experts are pressed by the number and the complexity of the issues, the arguments they make, the little time available and the need to get to a conclusion, and on top of that the handling of interventions is clumsier.

MPEG is facing this challenge by asking AhGs to come to the online meeting with much more solid conclusions than usual so that the results that will be brought to the online meeting will be more mature and will require less debate to be adopted. This has generated a surge in conference calls by the groups who are more motivated by the need to achieve firm results at the next meeting.

Another way to face the challenge is by being realistic about what is achievable at an online meeting. Issues that are very complex and possibly less urgent will be handled with a lower priority or not considered at all, of course if the membership agrees. Therefore the management will set the meeting goals, balancing urgency, maturity and achievability of results. Of course experts, individually or via AhGs, will have an opportunity to make themselves heard.

Yet another way to face the challenge is by preparing a very detailed assignment of time slots to issues during the entire week in advance of the MPEG week. So far this was done only partially because MPEG allowed as much time as possible to experts to prepare and upload their contributions for others to study and to be ready to discuss at the meeting. This has always forced the chairs to prepare their schedule at the last minute or even during the week as the meeting unfolds. This time MPEG asks its experts to submit their contributions one full week before with an extended abstract to facilitate the task of the chairs who have to understand tens and sometimes hundreds of contributions and properly assign them to homogeneous sessions.

The schedules will balance the need to achieve as many results as possible (i.e. parallel sessions) with giving the opportunity to as many members as possible to attend (i.e. sequential sessions).

The indefatigable Christian Tulvan, the mind and the arm behind MDMS and MMSS, is currently working to extend MMSS to enable the chairs to add the list of documents to be considered and to create online session reports shared with and possibly co-edited by session participants.

So far MPEG has been lenient most of the time towards late contributions (accepted if there is consensus to review the contribution). This time late contributions will simply not be considered.

No matter how good the forecast is, it is expected that the schedule will change as the week progresses. If a change during the meeting is needed, it will be announced at least 24 hours in advance.

The next big challenge is the fact that MPEG is a truly global organisation. We do not have Hawaiian experts in attendance, but we do have experts from Australia (East Coast) to the USA (West Coast). That makes a total of 19 time zones. Therefore MPEG130 online will be conducted in 3 time slots starting at 05:00, 13:00 and 21:00 (times are GMT). The sessions within each slot will last less than 2 hours and will be followed by a break.
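A quick way to see how the three slots fall for experts at the two ends of the range is to convert the GMT start times to local hours. The sketch below does this with fixed UTC offsets; the three locations and their offsets (UTC+10, UTC+1, UTC-8, ignoring daylight saving time) are assumptions chosen for illustration only.

```python
# Slot start hours in GMT, as given in the text
SLOT_STARTS_GMT = [5, 13, 21]

# Illustrative fixed offsets in hours (assumed, daylight saving ignored)
ASSUMED_UTC_OFFSETS = {
    "Australia, East Coast (assumed UTC+10)": 10,
    "Central Europe (assumed UTC+1)": 1,
    "USA, West Coast (assumed UTC-8)": -8,
}


def local_start_hours(offset_hours):
    """Convert each GMT slot start to a local hour on a 24-hour clock."""
    return [(hour + offset_hours) % 24 for hour in SLOT_STARTS_GMT]


for place, offset in ASSUMED_UTC_OFFSETS.items():
    hours = ", ".join(f"{h:02d}:00" for h in local_start_hours(offset))
    print(f"{place}: {hours}")
```

Because the slots are 8 hours apart, every location gets the same pattern shifted around the clock: under these assumed offsets each region has one slot that starts late in the evening or at night.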


Last but not least, MPEG is confident that the current emergency will be called off soon. The situation we are facing, however, is new and we simply don’t know when it will be over, nor whether it will be over for good or is just the first of future pandemics.

With MPEG130 online, MPEG not only wants to respond to the current industry needs, but also to fine tune its processes in an online context so as to be always ready to serve the industry and the people industry serves, whatever the external circumstances.

I don’t underestimate the challenge MPEG is facing with MPEG130 online, but I know I can rely on a dedicated leadership and membership.


The impact of MPEG on the media industry

MPEG was established as an experts group on 1988/01/22 in Copenhagen, a little more than 32 years ago. At that time, content media were already very important: voice communication; vinyl, compact cassettes and compact discs for audio; radio, mostly on terrestrial Hertzian channels; and television on 4 physical media: terrestrial Hertzian channels, satellite, cable and package media.

The way individual media evolved was a result of the technology adopted to represent content media and the way content media were distributed. Industries shared some elements of the technologies but each industry introduced many differences. The situation was further exacerbated by the different choices made by different countries and regions, sometimes justified by the fact that some countries introduced a technology earlier (like the 405 lines of UK TV before WW II and the 525 lines of US TV some years later). In some other cases there was no justification at all.

The figure below represents the actors of 1988:

  1. Two forms of wireless radio and television (terrestrial and satellite)
  2. Wired radio and television (cable)
  3. Physical distribution (package media)
  4. Theatrical movies distribution
  5. Content industry variously interconnected with the distribution industries.

The figure also includes two industries that, at that time, did not have an actual business in content distribution. Telecommunications was actively vying for a future role (although at that time some telcos were running cable television services as a business separate from telephony, both as public services). The second industry was information technology. Few at that time expected that the internet protocol, an outcome of the information technology industry because it was designed to enable computers to communicate, would become the common means to transport media. Eventually, however, that is what happened.

The figure should be more articulated. Indeed it does not include manufacturers. At that time consumer electronics served users of the broadcasting service, but broadcasting had its own manufacturing industry for the infrastructure. Consumer electronics was by itself the package media industry. Telcos had a manufacturing industry of their own for the infrastructure and a separate manufacturing industry for terminal devices, with some consumer electronics or office equipment companies providing facsimile terminals.

Even though it did not happen overnight, MPEG came, saw and unified. Today all the industries in the figure maintain a form of individual existence but they are much more integrated, as represented by the figure below.

Industry convergence has become a much abused term. However, it is true that standard and efficient digital media have enabled the industries to achieve enormous savings in moving to digital, and expanding from it, by allowing reuse of common components, possibly from hitherto remote industries. A notable example is MPEG Media Transport (MMT), which provides the means to seamlessly move from one-way to two-way media distribution because IP is the underlying common protocol.

The net result of convergence can be summarised in two points:

  1. Industry: MPEG-enabled products (devices) & services are worth 1.5 T$ p.a., i.e. ~1.8% of Gross World Product
  2. Consumers: Billions of consumers enjoy media anytime and anywhere.

It would be silly to claim that MPEG alone deserves the merit for this result. There are many other standards bodies/committees who share in it. The figure below shows some of them. It should be clear, however, that it all started from MPEG, while other bodies took over from where MPEG left the technology.

Two words about the semantics of the figure. A black line without arrows signifies that MPEG is in liaison with the body. A black line with one arrow means that MPEG is providing or has provided standards to that body. A black line with two arrows means that the interchange is/has been two way. Finally a red line means that MPEG has actually developed standards with that body. The numbers refer to the number of jointly developed standards. The number after the + indicates the number of standards MPEG is currently developing jointly with that body.

Is there a reason why MPEG has succeeded? Probably more than one, but I would like to mention one above all: MPEG has created standards for interoperability where industry used to develop standards as barriers. Was MPEG unique in its driving thoughts? No, it just applied the physiocratic principle “laissez faire, laissez passer” (let them do, let them pass), without any ideological connotation. Was MPEG unique in how it did it? Yes, because it was the first to apply the principle to media standards. Was MPEG unique in its result? Yes. It created a largely homogeneous industry out of what used to be scattered and compartmentalised industries.

It is easy to look at the success of the past. It is a nice exercise to do when you have reached the end of the path, but this is not the case of MPEG. Indeed MPEG has a big challenge: after it has done the impossible, people expect it to do even better in the future. And MPEG had better not fail 🙁

The figure below depicts some of the challenges MPEG faces in the next few years.

A short explanation of the 8 areas of the figure:

  1. Maintenance of ~180 standards is what MPEG needs to do primarily. Industry has adopted MPEG standards by the tens, but that is not the end point, that is the start. Industry continuously expresses needs that come from the application of MPEG standards it has adopted. These requests must be attended to.
  2. Immersive media is one of the biggest challenges faced by MPEG. We all wish to have immersive experiences, like being physically here but feeling as if we were at a different place, subject to the experiences felt by those who are in that place. The challenges are immense. Addressing them requires a level of integration with the industry never seen before.
  3. Media for old and new users conveys two notions. The first is that “old” media are not going to die anytime soon. We will need conventional audio, good old 2D rectangular video and, even though it is hard to call them “old media”, point clouds. These media are for human users, but we see the appearance of a new type of user, machines, that are going to make use of audio and visual information that has been transmitted from a remote location. This goal includes the current Video Coding for Machines (VCM) exploration.
  4. Internet of Media Things is a standard that MPEG has already developed with the acronym IoMT. At this moment, however, this is more at the level of a basic infrastructure on which it will be possible to build support for such ambitious scenarios as Video Coding for Machines, where media information is captured and processed by a network of machines assembled or built to achieve a predetermined goal.
  5. Neural Network Compression (NNR) is another component of the same scenario. The current assumption is that in the future a lot, if not all, of the “traditional” processing, e.g. for feature extraction, will be accomplished using neural networks and that components of “intelligence” will be distributed to devices, e.g. handheld devices but also IoMTs, to enable them to do a better or a new job. NNR is in its infancy in MPEG and much more can be expected from it.
  6. Genomic Data Compression has been shown to be viable by the MPEG-G standard. The notion of a single representation of a given type of data is a given in MPEG and has been the foundation of its success. That notion is alien to the genomic world where different data formats are applied at different portions of genomic workflows, but its application will have beneficial effects as much as it had to the media industry.
  7. Other Data Compression is a vast field that includes all cases where data, possibly already in digital form, are currently handled in an inefficient way. Data compression is important not only because it reduces storage and transmission time/bandwidth requirements, but also because it provides data in a structured form that is suitable for further processing. Exploring and handling these opportunities is a long-term effort and will certainly provide rewarding opportunities.
  8. Finally, we should realise that, although MPEG holds the best compression and transport experts from the top academic and economic enterprises, we do not know the needs of all economic players. We should be constantly on alert, ready to detect the weak signal of today that will become mainstream tomorrow.

For as many years to come as it is possible to forecast today, industry and consumers will need MPEG standards.
