Introduction
When I described the MPEG workflow in How does MPEG actually work? I highlighted the role of quality assessment across the entire MPEG standard life cycle: at the time of issuing a Call for Evidence (CfE) or a Call for Proposals (CfP), carrying out Core Experiments (CE) or executing Verification Tests.
We should consider, however, that in 30 years the coverage of the word “media” has changed substantially.
Originally (1989-90) the media types tested were Standard Definition (SD) 2D rectangular video and stereo audio. Today the video data types also include High Definition (HD), Ultra High Definition (UHD), Stereoscopic Video, High Dynamic Range (HDR) and Wide Colour Gamut (WCG), and Omnidirectional (Video 360).
The video information can be 2D, but also multiview and 3D: stereoscopic, 3 degrees of freedom + (3DoF+), 6 degrees of freedom (6DoF) and various forms of light field. Audio has evolved to different forms of Multichannel Audio and 6DoF. Recently Point Clouds were added to the media types for which MPEG has applied subjective quality measurements to develop compression standards.
In this article I would like to go inside the work that comes together with subjectively assessing the quality of media compression.
Preparing for the tests
Even before MPEG decides to issue a CfE or CfP for compression of some type of media content, viewing or listening to content may take place to appreciate the value of a proposal. When a Call is issued MPEG has already reached a pretty clear understanding of the use cases and requirements (at the time of a CfE) or the final version of them (at the time of a CfP).
The first step is the availability of appropriate test sequences. Sequences may already be in the MPEG Content Repository, may be spontaneously offered by members, or may be obtained from industry representatives by issuing a Call for Content.
Selection of test sequences is a critical step because we need sequences that are suitable for the media type and representative of the use cases, and in a number that allows us to carry out meaningful and realistic tests.
By the time of the CfE or CfP, MPEG has also decided which standard the responses to the CfE or CfP should be tested against. For example, in the case of HEVC, the comparison was with AVC and, in the case of VVC, the comparison was with HEVC. In the case of Internet Video Coding (IVC) the comparison was with AVC. When such a reference standard does not exist (this was the case for, e.g., all layers of MPEG-1 Audio and Point Cloud Compression), the codec built during the exploratory phase, which groups together state of the art tools, is used.
Once the test sequences have been selected, the experts in charge of the reference software are asked to run the reference software encoder and produce the “anchors”, i.e. the video sequences encoded according to the “old” standard. The anchors are made available on an FTP site so that anybody intending to respond to the CfE or CfP can download them.
The setup used to generate the anchors is documented in “configuration files” for each class of submission. In the case of video, the classes are defined by format (SD/HD/UHD, HDR, 360º) and by design conditions (low delay or random access). Obviously, in order to have comparable data, all proponents must use the same configuration files when they encode the test sequences using their technology.
As logistic considerations play a key role in the preparation of quality tests, would-be proponents must submit formal statements of intention to respond to the CfE or CfP to the Requirements group chair (currently Jörn Ostermann) and Test group chair (currently Vittorio Baroncini), and the chair of the relevant technology group, 2 months before the submission deadline.
At the meeting before the responses to the CfP/CfE are due, an ad hoc group (AHG) is established with the task of promoting awareness of the Call among the industry, carrying out the tests, drafting a report and submitting conclusions on the quality tests to the following MPEG meeting.
Carrying out the tests
The actual tests are entirely carried out by the AHG under the leadership of AHG chairs (typically the Test chair and a representative of the relevant technology group).
Proponents send their files containing encoded data to the Test chair, on hard disk drives or via an FTP site, by the deadline specified in the CfE/CfP.
When all drives are received, the Test chair performs the following tasks:
- Acquire special hardware and displays for the tests (if needed)
- Verify that the submitted files are all on disk and readable
- Assign submitted files to independent test labs (sometimes as many as 10 test labs are concurrently involved in a test run)
- Make copies and distribute the relevant files to the test labs
- Specify the tests or provide the scripts for the test labs to carry out.
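The assignment step in the list above can be illustrated with a minimal sketch. The article does not say how files are actually distributed among labs, so the round-robin policy, the function name `assign_to_labs`, the `labs_per_submission` parameter and the sample identifiers below are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle


def assign_to_labs(submissions, labs, labs_per_submission=2):
    """Assign each submission to several distinct labs in round-robin order.

    Hypothetical policy: the source only says that files are assigned to
    independent test labs, not how the assignment is made.
    """
    if not labs:
        raise ValueError("at least one lab is required")
    if labs_per_submission > len(labs):
        raise ValueError("cannot assign a submission to more labs than exist")

    assignment = defaultdict(list)
    lab_cycle = cycle(labs)
    for submission in submissions:
        # Consecutive picks from the cycle are distinct labs because
        # labs_per_submission never exceeds the number of labs.
        for _ in range(labs_per_submission):
            assignment[next(lab_cycle)].append(submission)
    return dict(assignment)


if __name__ == "__main__":
    plan = assign_to_labs(["P01", "P02", "P03"], ["LabA", "LabB", "LabC"])
    for lab, files in plan.items():
        print(lab, files)
```

Having more than one lab test each submission, as assumed here, is one way to cross-check results between labs; a real test plan would also balance workload by sequence length and equipment availability.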
The test labs carry out a first run of the tests and provide their results for the Test chair to verify. If necessary, the Test chair requests another test run or even visits the test labs to make sure that the tests will run properly.
When this “tuning” phase has been successfully executed, all test labs run the entire set of tests assigned to them using test subjects. Tens of “non-expert” subjects may be involved for several days.
Here is a sample of what it means to attend subjective quality tests
Test report
Test results undergo a critical review according to the following steps:
- The Test chair collects all results from all test labs, performs a statistical analysis of the data, and prepares and submits a final report to the AHG
- The report is discussed in the AHG and may be revised depending on the discussions
- The AHG draws and submits its conclusions to MPEG along with the report
- Report and conclusions are added to all the material submitted by proponents
- The Requirements group and the technology group in charge of the media type evaluate the material and rank the proposals. Because some of the data is sensitive, the material is not made public.
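The article does not specify which statistical analysis the Test chair performs. A common summary in subjective testing is the Mean Opinion Score (MOS) with a 95% confidence interval, and the sketch below computes it under that assumption. The function names, the normal approximation (z = 1.96) and the sample scores are illustrative, not taken from an MPEG test report.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (mean opinion score, confidence-interval half width).

    Assumes a normal approximation; z=1.96 gives a 95% interval.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a spread")
    mean = statistics.mean(scores)
    # Sample standard deviation (n - 1 in the denominator).
    half_width = z * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width


def intervals_overlap(a, b):
    """True if two (mean, half_width) intervals overlap."""
    return abs(a[0] - b[0]) <= a[1] + b[1]


if __name__ == "__main__":
    # Hypothetical scores from five subjects on a 1-5 scale.
    anchor = mos_with_ci([3, 3, 2, 3, 4])
    proposal = mos_with_ci([4, 5, 3, 4, 4])
    print("anchor   MOS %.2f +/- %.2f" % anchor)
    print("proposal MOS %.2f +/- %.2f" % proposal)
    print("overlap:", intervals_overlap(anchor, proposal))
```

When the intervals of two codecs overlap, the difference between them is usually treated as not statistically significant at that bitrate, which is why tens of subjects are needed to make the intervals narrow enough to rank proposals.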
This signals the end of the competition phase and the beginning of the collaboration phase.
Other media types
The process above has been described with rectangular 2D or 360º video specifically in mind. Most of the process applies to other media types, with some specific actions to be taken for each of them, e.g.
- 3DTV: for the purpose of 3D HEVC tests, polarised glasses as in 3D movies and autostereoscopic displays were used;
- 3DoF+: a common synthesiser will be used in the upcoming 3DoF+ tests to synthesise views that are not available at the decoder;
- Audio: in general subjects need to be carefully trained for the specific tests;
- Point Clouds: videos generated by a common presentation engine, in which a script animates the point clouds (rotating the object and showing it from different viewpoints), were tested for quality as if they were natural videos (NB: there was previously no established method to assess the quality of point clouds. It was demonstrated that the subjective tests converged to the same results as the objective measurements).
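One simple way to check whether subjective and objective results agree, as reported for point clouds above, is to correlate the two sets of scores. The article does not say how the convergence was demonstrated, so the Pearson correlation used below, the function name and the sample figures are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-condition values: subjective MOS and an objective
    # metric in dB for the same four encoded point clouds.
    mos = [1.8, 2.9, 3.7, 4.4]
    objective_db = [58.0, 62.5, 66.0, 70.1]
    print("correlation: %.3f" % pearson(mos, objective_db))
```

A coefficient close to 1 would indicate that the objective metric ranks the conditions much as the viewers did; a rank correlation such as Spearman's is a common alternative when the relationship is monotonic but not linear.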
Verification tests
Verification tests are executed with a similar process. Test sequences are selected and compressed by experts running reference software for the “old” and the “new” standard. Subjective tests are carried out as done in CfE/CfP subjective tests. Test results are made public to provide the industry with guidance on the performance of the new standard. See as examples the Verification Test Report for HDR/WCG Video Coding Using HEVC Main 10 Profile and the MPEG-H 3D Audio Verification Test Report.
Conclusions
Quality tests play an enabling role in all phases of development of a media compression standard. For 30 years MPEG has succeeded in mobilising – on a voluntary basis – the necessary organisational and human resources to perform this critical task.
I hope that, with this post, I have opened a window on an aspect of MPEG life that is instrumental in offering industry the best technology so that users can have the best media experience.