Forty years of video coding and counting

Introduction

For about 150 years, the telephone service has provided billions of people with a socially important means of communication. For at least a century the telecom industry has wanted to offer a more complete user experience (as we would call it today) by adding a visual component to speech.

Probably the first large-scale attempt at offering such an audio-visual service was AT&T’s PicturePhone in the mid-1960s. The service was eventually discontinued, but the idea of expanding the telephone service with video caught the attention of telephone companies. Many expected that digital videophone or videoconference services on the emerging digital networks would achieve the success that PicturePhone never had, and telephone companies funded video coding research in many of their laboratories.

This article tells the story of how this original investment, later joined by other industries, gave rise to the ever-improving digital video experience that ever greater numbers of people enjoy today.

First Video Coding Standard

The first international standard that used video coding techniques, ITU-T Recommendation H.120, originated from the European research project COST 211. H.120 was intended for videoconference services, especially on satellite channels. It was approved in 1984, and only a limited number of units were ever built.

Second Video Coding Standard

The second international standard that used video coding techniques, ITU-T Recommendation H.261, was intended for audio-visual services and was approved in 1988. It signalled the maturity of video coding standardisation, which left the old, inefficient algorithms behind and entered the age of DCT and motion compensation.

For several reasons H.261 was implemented by a limited number of manufacturing companies for a limited number of customers.

Third Video Coding Standard

Television broadcasting has always been, and despite its challenges continues to be today, a socially important communication tool. Unlike audio-visual services, which were mostly a strategic target of the telecom industry, television broadcasting in the 1980s was a thriving industry, served by the Consumer Electronics (CE) industry that supplied devices to hundreds of millions of consumers.

ISO MPEG-1, the third international standard that used video coding techniques, was intended for interactive video applications on CD-ROM and was approved by MPEG in November 1992. Besides this declared goal, the intention was to popularise video coding technologies by relying on the manufacturing prowess of the CE industry. MPEG-1 was the first example of a video coding standard developed by two industries that until then had had very little in common: telecom and CE (terminals for the telecom market were developed by a specialised industry with little contact with the CE industry).

Fourth Video Coding Standard

Even though in the late 1990s MPEG-1 Video, nicknamed “Video CD”, eventually reached 1 billion units sold, especially in the Far East, the big game started with the fourth international standard that used video coding techniques, ISO MPEG-2, whose original target was “digital television”. So many industries were interested in it that MPEG became crowded. Telecom had always sought a role in television. CE was obviously interested in having existing analogue TV sets replaced by shiny digital TV sets, or at least supplemented by a set top box. Satellite broadcasters and cable operators were very keen on the idea of hundreds of TV programs in their bouquets. Terrestrial broadcasters had different strategies in different regions but eventually joined, as did the package media sector of the CE industry, with its close ties to the movie industry. This explains why the official title of MPEG-2 is “Generic coding of moving pictures and associated audio information”: it signals that MPEG-2 could be used by all the industries that, at that time, had an interest in digital video, a unique feat in the industry.

Fifth and Sixth Video Coding Standards

Remarkably, MPEG-2 Video (and Systems) was jointly developed by MPEG and ITU-T. The world, however, follows the dictum of the Romance of Three Kingdoms (三國演義): 話說天下大勢，分久必合，合久必分。 Adapted to the context, this can be translated as “in the world, things divided for a long time shall unite, and things united for a long time shall divide”. So, in the following phase, the MPEG and ITU paths divided. ITU-T developed its own H.263 Recommendation “Video coding for low bit rate communication” and MPEG developed its own MPEG-4 Visual standard (Part 2 of MPEG-4 “Coding of audio-visual objects”). The only point of contact between the two standards is a very short code that simply tells the decoder whether a bitstream is H.263 or MPEG-4 Visual. The two standards share many coding tools, but not at the bitstream level.

H.263 focused on low bitrate video communication, while MPEG-4 Visual continued to pursue the vision of extending video coding to more industries, this time Information Technology and Mobile. MPEG-4 Visual was released in 2 versions, in 1999 and 2000, while H.263 went through a series of updates documented in a series of Annexes to the H.263 Recommendation. H.263 enjoyed some success thanks to the common belief that it was “royalty free”, while MPEG-4 Visual suffered a devastating blow from a patent pool that decided to include “content fees” in its licensing terms.

Seventh Video Coding Standard

The year 2001 marked the return to the first half of the Romance of Three Kingdoms’ dictum, 分久必合 (things divided for a long time shall unite), even though it had not been very 久 (long) since they had divided, certainly not on the scale intended by the Romance of Three Kingdoms. MPEG and ITU-T (through its Video Coding Experts Group, VCEG) joined forces again in 2001 and produced the seventh international standard in 2003. Both MPEG and ITU call the standard Advanced Video Coding, but MPEG labels it AVC and ITU-T labels it H.264. Reasonable licensing terms (of course always considered unreasonable by licensees) ensured AVC’s long-lasting success in the marketplace, a success that continues to this day (for another 4 years and 3 months, I mean).

Eighth Video Coding Standard

The eighth international standard that used video coding techniques stands by itself because it does not contain “new” video coding technologies. Instead, it enables a receiver to build a decoder matching the bitstream, using standardised tools represented in a standard form available at the decoder. The technique is called Reconfigurable Video Coding (RVC) or, more generally, Reconfigurable Media Coding (RMC), because MPEG has applied the same technology to 3D Graphics Coding. It is enabled by two standards: ISO/IEC 23001-4 Codec configuration representation and ISO/IEC 23002-4 Video tool library (VTL). The former defines the methods and general principles for describing codec configurations. The latter describes the MPEG VTL and specifies the Functional Units required to build a complete decoder for the following standards: MPEG-4 Simple Profile, AVC Constrained Baseline Profile and Progressive High Profile, MPEG-4 SC3DMC, and HEVC Main Profile.

Ninth Video Coding Standard

In 2010 MPEG and VCEG extended their collaboration to a new project: High Efficiency Video Coding (HEVC). A few months after the HEVC FDIS had been released, the HEVC Verification Tests showed that the standard had achieved a 60% bitrate reduction compared with AVC, 10 percentage points more than originally planned. Since then, HEVC has been enriched with a number of features that previous standards did not support at the time of its development, such as High Dynamic Range (HDR), Wide Colour Gamut (WCG), Screen Content coding and omnidirectional (360) video. Unfortunately, technical success has not translated into full market success, because HEVC adoption is still hampered, 6 years after its approval by MPEG, by an unclear licensing situation. The posts “IP counting or revenue counting?”, “Business model based ISO/IEC standards”, “Can MPEG overcome its Video ‘crisis’?” and “A crisis, the causes and a solution” analyse the reasons for the currently stalled situation and propose possible remedies.

Tenth Video Coding Standard

ISO, IEC and ITU share a common policy vis-à-vis patents in their standards. Using a few imprecise but clear words (where a patent attorney would use many precise but unclear words), the policy is: it is good if a standard has no patents or if the patent holders allow use of their patents for free (Option 1); it is tolerable if a standard has patents but the patent holders allow use of their patents on fair and reasonable terms and non-discriminatory conditions (Option 2); it is not permitted to have a standard with patents whose holders do not allow use of their patents (Option 3).

Until AVC, the target of MPEG standards had always been “best performance no matter what IPR is involved” (provided, of course, that the IPR holders allow its use). However, as the use of AVC extended to many domains, it became clear that there was so much “old” IP (i.e. more than 20 years old, so that the corresponding patents had expired) that it was technically possible to make a standard whose IP components were all Option 1.

In 2013 MPEG released the FDIS of WebVC. Strictly speaking this was not a new standard, because MPEG had simply extracted the Constrained Baseline Profile of AVC and made it a separate standard with the intention of making it Option 1. The attempt failed because some companies confirmed the Option 2 patent declarations they had already made against the AVC standard.

Eleventh Video Coding Standard

WebVC has not been the only effort made by MPEG to develop an Option 1 video coding standard (i.e. a standard for which only Option 1 patent declarations have been made). A second effort, called Internet Video Coding (IVC), was concluded in 2017 with the release of the IVC FDIS. Verification Tests showed that the performance of IVC exceeded that of the best profile of AVC, by then a 14-year-old standard. However, three companies made Option 2 patent declarations that contained no details, so MPEG could not remove the technologies in IVC that the companies claimed infringed their patents.

Twelfth Video Coding Standard

MPEG achieved a different result with its third attempt at developing an Option 1 video coding standard. A proposal made by a company in response to an MPEG Call for Proposals was reviewed by MPEG and reached FDIS under the name Video Coding for Browsers (VCB). However, a company made an Option 3 patent declaration that, like those made against IVC, contained no details that would enable MPEG to remove the allegedly infringing technologies. Eventually ISO did not publish VCB.

Today ISO and IEC no longer allow companies to make Option 3 patent declarations without details (something ITU had never allowed). Because the VCB approval process has been completed, the study of VCB cannot be resumed unless MPEG restarts the process. VCB is therefore likely to remain unpublished and thus not an ISO standard.

Thirteenth Video Coding Standard

For the third time MPEG and ITU are collaborating on the development of a new video coding standard, with the target of a 50% bitrate reduction compared with HEVC. The development of Versatile Video Coding (VVC), as the new standard is called, is still under way and involves close to 300 experts attending VVC sessions. MPEG expects VVC to reach FDIS in October 2020.

Fourteenth Video Coding Standard

Thirteen is a large number for video coding standards, but it should be measured against the number of years covered: close to 40. In this long period we have gone from 3 initial standards that were mostly application- or industry-specific (H.120, H.261 and MPEG-1) to a series of generic (i.e. industry-neutral) standards (MPEG-2, MPEG-4 Visual, MPEG-4 AVC and HEVC) and then to a group of standards that sought to achieve Option 1 status (WebVC, IVC and VCB). Proprietary video coding formats that have found significant use in the market show that MPEG cannot stay forever in its ivory tower of “best video coding standards no matter what”. MPEG has to face the reality of an increasingly diversified market where, unlike in the golden age of a single coding standard, one size no longer fits all.

At its 125th meeting MPEG reviewed the responses to its Call for Proposals on a new video coding standard, which sought proposals with a simplified coding structure and an accelerated development time of 12 months from working draft to FDIS. The new standard will be called MPEG-5 Essential Video Coding (EVC) and is expected to reach FDIS in January 2020.

The new video coding project will have a base layer/profile that is expected to be Option 1 and a second layer/profile that already performs about 25% better than HEVC. Patent holders are expected to publish licensing terms within 2 years.

VCEG has decided not to work with MPEG on this coding standard. Are we back to the 合久必分 (things united for a long time shall divide) situation? This is only half true, because the MPEG-VCEG collaboration on VVC is continuing. In any case, VVC is expected to deliver a 50% bitrate reduction compared with HEVC.

Fifteenth Video Coding Standard

If proof were needed that “one size fits all” no longer holds in video coding, just look at the Call for Proposals for a “Low Complexity Video Coding Enhancements” standard issued by MPEG. This Call is not for a “new video codec”, but for a technology capable of extending the capabilities of an existing video codec. A typical usage scenario is adding, say, high-definition capability to set top boxes that are deployed by the millions and cannot be recalled. Proposals are due at the March 2019 meeting and FDIS is expected in April 2020.

Sixteenth Video Coding Standard

Point clouds are not really traditional “video” content as we know it, namely sequences of “frames” at a frequency high enough to fool the eye into believing that the motion is natural. In point clouds, motion is conveyed by dynamic point clouds that represent the surfaces of objects moving in the scene. For the eye, however, the end result is the same: moving pictures displayed on a 2D surface, whose objects can be manipulated by the viewer (this, however, requires a system layer that MPEG is already developing).

MPEG is working on two different technologies: the first one uses HEVC to compress projections of portions of a point cloud (and is therefore well-suited for entertainment applications because it can rely on an existing HEVC decoder) and the second one uses computer graphics technologies (and is currently more suited to automotive applications). The former will achieve FDIS in January 2020 and the latter in April 2020.

Seventeenth and Eighteenth Video Coding Standards

Unfortunately, the crystal ball gets blurred as we move into the future. Therefore MPEG is investigating several technologies capable of providing solutions for alternative immersive experiences. After providing HEVC and OMAF for 3DoF experiences (where the user can only rotate the head, i.e. roll, pitch and yaw), MPEG is working on OMAF v2 for 3DoF+ experiences (where the user can also translate the head to a limited extent). A Call for Proposals has been issued, responses are due in March 2019 and FDIS is expected in July 2020. Investigations are being carried out on 6DoF (where the user can fully translate the head) and on light fields.

Conclusions

The last 40 years have seen digital video turn from a dream into a reality that involves billions of users every day. This long ride is represented in the figure, which also ventures into the next steps of the ride.

MPEG keeps working to make sure that manufacturers and content/services providers have access to more and better standard visual technologies for an increasingly diversified market of increasingly demanding users.
