Can MPEG overcome its Video “crisis”?

In my earlier post I described the “crisis” and how it was created, and hinted at possible ways to solve it and to avoid other future crises in other areas. As I remain skeptical that the crisis will be overcome, I want to remove any doubt about who should be blamed for the failure: certainly not ISO and not MPEG.

About ISO (and IEC)

The International Organisation for Standardisation (ISO) is an international non-governmental organisation made up of the national standards bodies of 162 countries (as of today). This is a summary of the ISO organisational structure:

  1. The General Assembly is the ultimate authority of ISO and meets once a year.
  2. The Council is the core ISO governance body made up of 20 member bodies and other officers. It reports to the General Assembly and meets three times a year.
  3. The Technical Management Board (TMB) manages the ISO technical work and is responsible for the Technical Committees (TC). The TMB reports to Council.
  4. Technical Committees are in charge of developing standards. Nominally there are 314 TCs, but some are inactive or have been disbanded. Of particular relevance is the Joint ISO/IEC Technical Committee 1 (JTC 1) on Information Technology, established in 1987 by combining the relevant activities of ISO TC 97 Information processing systems and of the International Electrotechnical Commission (IEC). JTC 1 is the largest TC in ISO and by itself manages ~1/3 of all ISO standardisation activities. JTC 1 is organised in Subcommittees, the latest of which is SC 42 Artificial Intelligence. Some SCs have been disbanded.

The size and importance of ISO require rules that are contained in the ISO/IEC Directives that all entities in ISO are bound to follow. These are periodically reviewed.

About MPEG

The Moving Picture Experts Group was created in 1988 as an Experts Group of Working Group 8 of JTC 1/SC 2, then called Character Sets and Information Coding. MPEG operated in parallel to the Joint ISO/ITU-T Photographic Experts Group (JPEG). In 1991 JTC 1/SC 29 Coding of audio, picture, multimedia and hypermedia information was created and MPEG became its WG 11 “Coding of Moving Pictures and Audio”.

MPEG standards are highly sophisticated and typical attendance at MPEG’s quarterly meetings is 400-500 experts. MPEG is therefore organised in Subgroups: Requirements (what MPEG standards should do), Systems (media system-level standards), Video (video coding standards), JCT-VC (the HEVC standard), JVET (the future video coding standard), 3DG (3D graphics coding standards), Tests (testing the quality of MPEG standards) and Communication (promotion of MPEG standards). JCT-VC and JVET are joint with ITU-T SG 16 Q6.

This is a unique organisation within ISO, but it exists because standards for media systems require strong interaction among their components. Because of this MPEG holds several joint meetings where the relevant subgroups discuss and agree on matters of common interest. The ability to develop complex digital media standards is one of the reasons for the success of MPEG standards in the market. Breaking up MPEG would deal a fatal blow to the validity of MPEG standards.

There is a fundamental operational difference between ISO Working Groups on one side, and Subcommittees and Technical Committees on the other. SC and TC decisions are based on national votes (where e.g. Luxembourg and the United States have one vote each), while WG technical decisions are made by consensus. Far from being a constraint, consensus-based working ensures that MPEG standards are technically sound.

About MPEG standards

As stated in my earlier post MPEG has been developing standards having the best performance as a goal, irrespective of the IPR involved. This approach has produced the best technical – and usable – video coding standards until AVC. No longer so with HEVC. It is not that there are many more patent holders in HEVC than in AVC, but the patent pool creation mechanism seems no longer able to deliver results.

In my earlier post I provided some ideas on how the MPEG standard development process can be adapted to deliver standards in a form that facilitates the development of licences. None of these ideas can even remotely disadvantage proponents of good video coding technology. The point is that decisions in the MPEG working group are made by consensus. So it is entirely in the hands of MPEG members (and of ITU-T members, in this case) to agree on an effective way of streamlining the MPEG video coding standard development process.

Unfortunately this is only the tip of the iceberg. The fact that almost the same companies have been unable to agree on an HEVC licence when 12 years before they had been able to agree on an AVC licence shows that the patent holder environment is degrading. The result is that the nice “ISO consensus” practice may no longer ensure that Option 2 standards can be developed and the MPEG experience proves that Option 1 standards cannot be developed.

Of course the work of developing technical standards must be done using the “ISO consensus” practice, but MPEG must be able to reach the higher layers of the ISO hierarchy without shields. Even if there were no other good reasons, this should be granted because MPEG is the largest working group in ISO, larger than most if not all JTC 1 Subcommittees and than most ISO Technical Committees, and produces more standards than most ISO entities.

Disclaimer

The reader should not think that I am raising the “MPEG TC” issue because of an ill-conceived desire for “promotion”. Over the last 30 years I have been approached by several National Body representatives who asked me to raise the MPEG status in ISO. I always declined because at the time I thought that the “MPEG WG” status was just OK.

Not acting on those proposals was my mistake.

A crisis, the causes and a solution

Why this post?

Because there are rumours spreading about a presumed “MPEG-Video collapse” and Brownian motion-like initiatives are trying to remedy it – in some cases driven by the very people who contributed to creating the “crisis”.

Who is the author of this post?

Leonardo Chiariglione, the founder and chairman of MPEG, writing here in a personal capacity.

Why is MPEG important?

In its 30 years of operation MPEG has created digital media standards that have enabled the birth and continue to promote the growth of digital media products, services and applications. Here are a few, out of close to 180 standards: MP3 for digital music (1992), MPEG-2 for digital television (1994), MPEG-4 Visual for video on the internet (1998), the MP4 file format for mobile handsets (2001), AVC for reduced-bitrate video (2003), DASH for internet streaming (2013), MMT for IP broadcasting (2013) and more. In other words, MPEG standards have had and keep on having an impact on the lives of billions of people.

How could MPEG achieve this?

Thanks to its “business model”, which can be simply described as: produce standards having the best performance as a goal, irrespective of the IPR involved. Because MPEG standards are the best in the market and have international standard status, manufacturers and service providers get a global market of digital media products, services and applications, and end users can seamlessly communicate with billions of people and access millions of services. Patent holders who allow use of their patents get hefty royalties with which they can develop new technologies for the next generation of MPEG standards. A virtuous cycle everybody benefits from.

Why is there a “crisis”?

Good stories have an end, so the MPEG business model could not last forever. Over the years proprietary and “royalty free” products have emerged but have not been able to dent the success of MPEG standards. More importantly, IP holders – often companies not interested in exploiting MPEG standards, so-called Non-Practicing Entities (NPEs) – have become more and more aggressive in extracting value from their IP.

I saw the danger coming and designed a strategy to counter it. This would create two tracks in MPEG: one track producing royalty-free standards (Option 1, in ISO language) and the other producing the traditional Fair, Reasonable and Non-Discriminatory (FRAND) standards (Option 2, in ISO language). Option 1 standards, obviously lower-performing than Option 2 ones, would counter the “proprietary” threat and provide an incentive to produce even more effective Option 2 standards, while keeping at bay excessive claims by patent holders thanks to the incoming “competition” from Option 1 standards.

The Internet Video Coding (IVC) standard was a successful implementation of the idea – kind of. Indeed, years after AVC had been approved, IVC was found to perform better than AVC. Unfortunately 3 companies made blanket Option 2 statements (of the kind “I may have patents and I am willing to license them at FRAND terms”), a possibility that ISO allows. MPEG had no means to remove the claimed infringing technologies, if any, and IVC is practically dead.

In 2013 MPEG approved the HEVC standard, which provides the same quality as AVC at half the bitrate. The licensing situation is depicted in the picture below (courtesy of Jonathan Samuelsson of Divideon): of the ~45 patent holders, ~1/3 have published their licences and ~2/3 have joined one of the 3 patent pools, one of which has not published its licence.

I saw the threat coming and one year ago I tried to bring the matter to the attention of the higher layers in ISO. My attempts were thwarted by a handful of NPEs.

The Alliance for Open Media (AOM) has occupied the void created by MPEG’s outdated (but still widely used) video compression standard (AVC), the absence of a competitive Option 1 standard (IVC) and an unusable modern standard (HEVC). AOM’s AV1 codec, due to be released soon, is claimed to perform better than HEVC and is said to be offered royalty free.

At long last everybody realises that the old MPEG business model is now broken, that all the investments (collectively hundreds of millions of USD) made by the industry in the new video codec will go up in smoke, and that AOM’s royalty-free model will spread to other business segments as well.

Can something be done?

The situation can be described as tragic. This does not mean that there is nothing left to do. I personally doubt that something will be done, though, seeing how blindfolded the industry is. As I like to say, God blinds those He wants to lose.

The first action is to introduce what I call “fractional options”. As I said, ISO envisages two forms of licensing: Option 1, i.e. royalty free and Option 2, i.e. FRAND, which is taken to mean “with undetermined licence”. We could introduce fractional options in the sense that proposers could indicate that their technologies be assigned to specifically identified profiles with an “industry licence” (defined outside MPEG) that does not contain monetary values. For instance, one such licence could be “no charge” (i.e. Option 1), another could be targeted to the OTT market etc.

The second action, not meant to be alternative to the first, is to streamline the MPEG standard development process. Within this a first goal is to develop coding tools with “clear ownership”, unlike today’s tools which are often the result of contributions with possibly very different weights. A second goal is not to define profiles in MPEG. A third goal could be to embed in the standard the capability to switch coding tools on and off.

The work of patent pools would be greatly simplified because they could define profiles with technologies that are “available” because they would know who owns which tools. Users could switch on tools once they become usable, e.g. because the relevant owner has joined a patent pool.

These are just examples of how the MPEG standard development process can be adapted to better match the needs of entities developing licences and without becoming part – God (ISO) forbid – of a licence definition process.

Is this enough?

Even if industry decides to get its act together and patch MPEG’s business model, it is easy to anticipate that the next threat is just around the corner. But MPEG cannot have a future if it is going to pass from crisis to crisis, each of which has an inevitable “cost”.

MPEG’s problem – so far a blessing – is that it is a working group, the lowest organisational structure in ISO. MPEG’s governance is weak: if there is a need, as happened recently, to bring problems to the attention of the decision-making layers in ISO, it is necessary to cross several layers of other committees with completely different priorities and concerns, each time asking for their approval and each time getting a diluted message. Only by becoming a Technical Committee can MPEG, the forerunner of problems that other committees will experience in the years to come, stay competitive in the market.

End of the world as we know it?

The reader should not think that I am personally concerned by all this, if not intellectually. I have been running MPEG for the last 30 years serving the industry – and billions of users – and I have been blessed with professional satisfactions that few have had, enjoying the collaboration of thousands of experts, each driven by their own motivations but united in their desire to make the best standards. If MPEG ends now it will be a pity, but if this is the decision of the stakeholders – the industry in MPEG – so be it.

My concerns are at a different level and have to do with the way industry at large will be able to access innovation. AOM will certainly give much needed stability to the video codec market, but this will come at the cost of reduced if not entirely halted technical progress. There will simply be no incentive for companies to develop new video compression technologies – at very significant cost, because of the sophistication of the field – knowing that their assets will be accepted and used by AOM in its video codecs with thanks, and nothing more.

Companies will slash their video compression technology investments, thousands of jobs will go and millions of USD of funding to universities will be cut. A successful “access technology at no cost” model will spread to other fields.

So don’t expect that in the future you will see the progress in video compression technology that we have seen in the past 30 years.

| Standard | MPEG-1 | MPEG-2 | MPEG-4 Visual | MPEG-4 AVC | MPEG-H HEVC |
|---|---|---|---|---|---|
| Bitrate vs previous | – | 25% less | 25% less | 30% less | 60% less |
| Year of approval | 1992 | 1994 | 1998 | 2003 | 2013 |

The future of Europe: taming national states

I want my standing to be clear: at the age of 16 I was distributing European propaganda leaflets to my classmates. The decades since then did not pass in vain, but my position has not substantially changed.

I would also like to tell you how I tackled my kids’ education: I spoke English to them, but in the family I spoke the local Piedmontese dialect. Why? Because I thought – and think – that we belong to a greater whole whose de facto koiné is English, but we are collections of smaller groups with their own identities, the most characterising element of which is language.

After the ravages of two world wars the Founding Fathers of the European Union conceived the project of creating a greater whole, and their successors continued more or less in that spirit, but with decreasing momentum. In hindsight that was the best project they could conceive, but today the European project looks weary.

One problem is that in most European countries the covenant that binds the people has been under attack for many years.

  • In South Tyrol (Alto Adige in Italian), a part of the German-speaking population wanted to return to Austria
  • In Spain part of the Basques fought for independence from the Spanish state
  • In Northern Ireland the Catholics fought for independence
  • In Belgium Walloons and Flemish could no longer live together and created a federation
  • In the former Czechoslovakia the Czechs and Slovaks could no longer live together and created two separate states
  • In Italy the Northern League advocated independence to avoid paying taxes to a state that was accused of being unable to redistribute them properly
  • In the former Yugoslavia Orthodox, Catholics and Muslims – Slavs and Albanians – not only could not live together, but each wanted a piece of the others’ territories
  • In Scotland a sizeable number of Scots wish to separate from the United Kingdom
  • In Catalunya a majority of Catalans confirm that they want a state separate from Spain.

Less prominently, regionalism if not separatism is mounting in other states as well.

I do not believe there are cases where separatists feel they are being “occupied” or “oppressed” by an alien people – at least, no longer – but we live with a deadly recipe with such ingredients as “the capital city exploits us”, “we want to stay by ourselves because we are different”, “we are in the same country just because our ancestors lost the battle of…”, “they always exploited us” and so on.

The situation is worrying because the mixed economic situation exacerbates the spirits of the claimants and, to make an already unstable situation more difficult, millions of migrants wait at the gates of Europe. Still the states and Europe take the attitude that “legality is to be preserved”, i.e. “forget about changing anything”.

I do think that this attitude is not sustainable and that there is a way to handle these problems that is inspired by my handling of the education of the kids. We have a “container” – Europe – that includes both states and regions. Does it really make a difference if Catalunya is part of Spain or is directly part of Europe, if South Tyrol is part of Italy or directly part of Europe, if there is no Belgium and there are in its stead Wallonia, Flanders and the Brussels Federal District?

Of course it does, because fiscality, not the wrongs of history, is what drives the claims of separatists, hard or mild. This matter is to be resolved by politicians, who should stop repeating the mantra that “legality is to be preserved” and take the bull by the horns.

Compression – the technology for the digital age

Table of contents

  1. Introduction
  2. A little bit of history
  3. Video and audio generate a lot of bits
  4. More challenges for the future
  5. Even ancient digital media need compression
  6. There are no limits to the use of compression
  7. A bright future, or maybe not
  8. Acknowledgements

Introduction

People say that ours is the digital age. Indeed, most of the information around us is in digital form and we can expect that what is not digital now will soon be converted to that form.

But is being digital what matters? In this paper I will show that being digital does not make information more accessible or easier to handle. Actually being digital may very well mean being less able to do things. Media information becomes accessible (even liquid) and processable only if it is compressed.

Compression is the enabler of the evolving media rich communication society that we value.

A little bit of history

In the early 1960s the telco industry felt ready to enter the digital world. They decided to digitise the speech signal by sampling it at 8 kHz with nonlinear 8-bit quantisation. Digital telephony exists in the network, and it is no surprise that not many people know about it, because this digital speech was hardly ever compressed.

In the early 1980s Philips and Sony developed the compact disc (CD). Stereo audio was digitised by sampling the audio waveforms at 44.1 kHz with 16 bits linear and stored on a laser disc. This was a revolution in the audio industry because consumers could have an audio quality that did not deteriorate with time (until, I mean, the CD stopped playing altogether). Did the user experience change? Definitely. For the better? Some people, even today, disagree.

In 1980 ITU-T defined the Group 3 facsimile standard. In the following decades hundreds of millions of Group 3 digital facsimile devices were installed worldwide. Why? Because it cut the transmission time of an A4 sheet from 6 min (Group 1), or 3 min (Group 2), to about 1 min. If digital fax had not used compression, transmission with a 9.6 kbit/s modem (a typical value of that time) would have taken more than Group 1 analogue facsimile required.

A digital uncompressed photo of, say, 1 Mpixel would take half an hour on a 9.6 kbit/s modem (and it was probably never used in this form), but with a compression of 60x it would take half a minute. An uncompressed 3 min CD track on the same modem would take in excess of 7 h, but compressed at 96 kbit/s would take about 30 min. That was the revolution brought about by the MP3 audio compression that changed music for ever.
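For readers who want to check these figures, here is a minimal sketch of the arithmetic (mine, not part of the original post), assuming a 9.6 kbit/s modem and roughly 16 bits per pixel for the uncompressed photo (my assumption, since the post does not give the photo’s bit depth):

```python
# Back-of-the-envelope transmission times over a 9.6 kbit/s modem.
MODEM_BPS = 9_600

def minutes(bits):
    return bits / MODEM_BPS / 60

photo_bits = 1_000_000 * 16                 # 1 Mpixel photo, assumed ~16 bits/pixel
print(minutes(photo_bits))                  # ~28 min uncompressed ("half an hour")
print(minutes(photo_bits / 60) * 60)        # ~28 s with 60x compression ("half a minute")

cd_bits = 180 * 44_100 * 16 * 2             # 3-minute stereo CD track
print(minutes(cd_bits) / 60)                # ~7.35 h uncompressed
print(minutes(180 * 96_000))                # 30 min when compressed at 96 kbit/s
```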

Digital television was specified by ITU-R in 1980 with the luminance and the two colour difference signals sampled at 13.5 and 6.75 MHz, respectively, at 8 bits per sample. The result was an exceedingly high bitrate of 216 Mbit/s. As such it never left the studio except on bulky magnetic tapes. However, when MPEG developed the MPEG-2 standard, capable of yielding studio-quality video compressed at a bitrate of 6 Mbit/s, and it became possible to pack 4 TV programs (or more) where there had been just one analogue program, TV was no longer the same.
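As a quick check of these numbers (a sketch of mine, not from the original text):

```python
# ITU-R digital video: luminance at 13.5 MHz plus two colour difference signals
# at 6.75 MHz, all at 8 bits/sample.
uncompressed = (13.5e6 + 2 * 6.75e6) * 8
print(uncompressed / 1e6)    # 216 Mbit/s
print(uncompressed / 6e6)    # ~36x reduction to reach MPEG-2's 6 Mbit/s
```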

In 1987 several European countries adopted the GSM specification. This time speech was digitised and compressed at 13 kbit/s, and GSM spread across the whole world.

The bottom line is that being digital is good, but being compressed digital is so much better in practice.

Video and audio generate a lot of bits

Digital video, in the uncompressed formats used so far, generates a lot of bits, and new formats will continue to do so. This is described in Table 1, which gives the parameters of some of the most important video formats, assuming 8 bits/sample. High Definition, with an uncompressed bitrate of ~829 Mbit/s, is now well established and Ultra High Definition, with an uncompressed bitrate of 6.6 Gbit/s, is fast advancing. So-called 8k, with a bitrate of 26.5 Gbit/s, seems to be the preferred choice for Virtual Reality; higher resolutions are still distant, but may be with us before we are even aware of them.

Table 1: Parameters of video formats

| Format | # Lines | # Pixels/line | Frame frequency | Bitrate (Gbit/s) |
|---|---|---|---|---|
| “VHS” | 288 | 360 | 25 | 0.031 |
| Standard definition | 576 | 720 | 25 | 0.166 |
| High definition | 1080 | 1920 | 25 | 0.829 |
| Ultra high definition | 2160 | 3840 | 50 | 6.636 |
| 8k (e.g. for VR) | 4320 | 7680 | 50 | 26.542 |
| 16k (e.g. for VR) | 8640 | 15360 | 100 | 212.336 |
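The bitrate column of Table 1 can be reproduced with a few lines of code (a sketch of mine; matching the published figures requires assuming 4:2:0 chroma subsampling for the “VHS” row and 4:2:2 for the others, which the article does not state explicitly):

```python
# Uncompressed bitrate = lines x pixels/line x frame rate x chroma factor x 8 bits.
formats = [
    ("VHS",                    288,   360,  25, 1.5),   # assumed 4:2:0
    ("Standard definition",    576,   720,  25, 2.0),   # assumed 4:2:2
    ("High definition",       1080,  1920,  25, 2.0),
    ("Ultra high definition", 2160,  3840,  50, 2.0),
    ("8k",                    4320,  7680,  50, 2.0),
    ("16k",                   8640, 15360, 100, 2.0),
]
for name, lines, pixels, fps, chroma in formats:
    gbps = lines * pixels * fps * chroma * 8 / 1e9
    print(f"{name:22s} {gbps:8.3f} Gbit/s")
```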

Compression has been and will continue being the enabler of the practical use of all these video formats. The first format in Table 1 was the target of MPEG-1 compression which reduced the uncompressed 31 Mbit/s to 1.2 Mbit/s at a quality comparable to video distribution of that time (VHS). Today it may be possible to send 31 Mbit/s through the network, but no one would do it for a VHS type of video. They would possibly do it for a compressed UHD Video.

In its 30 years of existence MPEG has constantly pushed forward the limits of video compression. This is shown in the third column of Table 2 where it is possible to see the progress of compression over the years: going down in the column, every cell gives the additional compression provided by a new “generation” of video compression compared to the previous one.

Table 2: Improvement in video compression (MPEG standards)

| Std | Name | Base | Scalable | Stereo | Depth | Selectable viewpoint | Date |
|---|---|---|---|---|---|---|---|
| 1 | Video | VHS | | | | | 92/11 |
| 2 | Video | SD | -10% | -15% | | | 94/11 |
| 4 | Visual | -25% | -10% | -15% | | | 98/10 |
| 4 | AVC | -30% | -25% | -25% | -20% | 5/10% | 03/03 |
| H | HEVC | -60% | -25% | -25% | -20% | 5/10% | 13/01 |
| I | Immersive Video | ? | ? | ? | ? | ? | 20/10 |

The fourth column in Table 2 gives the additional saving (compared to the third column) offered by scalable bitstreams compared to simulcasting two bitstreams of the same original video at different bitrates (scalable bitstreams contain, in the same bitstream, two or more versions of the same scene at different rates). The fifth column gives the additional saving offered by stereo coding tools compared to independent coding of two cameras pointing at the same scene. The sixth column gives the additional saving (compared to the fifth column) obtained by using depth information, and the seventh column gives the additional cost (compared to the sixth column) of giving the viewer the possibility to select the viewpoint. The last column gives the date MPEG approved the standard; the last row refers to the next video coding standard under development, for which 2020/10 is the expected time of approval.
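To see how these per-generation savings compound, here is a small sketch (mine) that applies the “Base” column factors of Table 2 to a reference MPEG-2 bitrate:

```python
# Each new generation needs (1 - saving) of the previous generation's bitrate.
savings = [("MPEG-4 Visual", 0.25), ("MPEG-4 AVC", 0.30), ("MPEG-H HEVC", 0.60)]
bitrate = 1.0                       # MPEG-2 Video taken as the reference
for std, s in savings:
    bitrate *= 1 - s
    print(f"{std}: {bitrate:.2f} of the MPEG-2 bitrate")
# HEVC ends up needing roughly a fifth of the MPEG-2 bitrate for the same quality.
```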

Table 3 gives the reduction in the number of bits required to represent a pixel in a compressed bitstream across 3 video compression standards applied to different resolutions (TV, HDTV, UHDTV) at typical bitrates (4, 8 and 16 Mbit/s).

Table 3: Reduction in bit/pixel as new standards appear

| Std | Name | Application | Mbit/s | Bit/pixel |
|---|---|---|---|---|
| 2 | Video | TV | 4 | 0.40 |
| 4 | AVC | HDTV | 8 | 0.15 |
| H | HEVC | UHDTV | 16 | 0.04 |
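The bit/pixel column of Table 3 is simply the compressed bitrate divided by the number of pixels per second, as this small sketch (mine) shows:

```python
cases = [
    ("MPEG-2 Video, TV",     4e6,  576,  720, 25),
    ("MPEG-4 AVC, HDTV",     8e6, 1080, 1920, 25),
    ("MPEG-H HEVC, UHDTV",  16e6, 2160, 3840, 50),
]
for name, bps, lines, pixels, fps in cases:
    # prints ~0.39 (rounded to 0.40 in the table), 0.15 and 0.04 bit/pixel
    print(f"{name}: {bps / (lines * pixels * fps):.2f} bit/pixel")
```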

MPEG has pushed the limits of audio compression as well. The first columns of Table 4 give the sound format (full channels.effects channels) and the sampling frequency. The next columns give the MPEG standard, the target bitrate of the standard and the performance (as subjective audio quality) achieved by the standard at the target bitrate(s).

Table 4: Audio systems

| Format | Sampling freq. (kHz) | MPEG Std | kbit/s | Performance |
|---|---|---|---|---|
| Speech | 8 | – | 64 | Toll Quality |
| CD | 44.1 | – | 1,411 | CD Quality |
| Stereo | 48 | 1-2-4 | 128 → 32 | Excellent to Good |
| 5.1 surround | 48 | 2-4 | 384 → 128 | Excellent to Good |
| 11.1 immersive | 48 | H | 384 | Excellent |
| 22.2 immersive | 48 | H | 1,500 | Broadcast Quality |
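The two uncompressed rows of Table 4 follow directly from the sampling parameters (a sketch of mine):

```python
print(8_000 * 8 / 1_000)          # 64 kbit/s: telephone speech, 8 kHz, 8-bit samples
print(44_100 * 16 * 2 / 1_000)    # ~1,411 kbit/s: CD stereo, 44.1 kHz, 16-bit samples
print(44_100 * 16 * 2 / 128_000)  # ~11x: what 128 kbit/s MPEG stereo saves over CD
```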

More challenges for the future

Many think that the future of entertainment is more digital data generated by sampling the electromagnetic and acoustic fields that we call light and sound, respectively. Therefore MPEG is investigating the digital representation of the former by acting in 3 directions.

The first direction of investigation uses “sparse sensing elements” capable of capturing both the colour of a point and the distance (depth) of the point from the sensor. Figure 1 shows 5 pairs of physical cameras shooting a scene and a red line of points at which sophisticated algorithms can synthesise the output of virtual cameras.

Figure 1:  Real and virtual cameras

Figure 2 shows the output of a virtual camera moving along the red line of Figure 1.

Figure 2:  Free navigation example – fencing
(courtesy of Poznan University of Technology, Chair of Multimedia Telecommunications and Microelectronics)

The second direction of investigation uses “dense sensing elements” and comes in two variants. In the first variant each sensing element captures light coming from the direction perpendicular to the sensor plane; in the second, illustrated by Figure 3, each “sensor” is a collection of sensors capturing light from different directions (a plenoptic camera).

Figure 3: Plenoptic camera

In this second variant the investigation tries to reduce to manageable levels the amount of information generated by sensors that capture the light field.

Figure 4 shows the expected evolution of

  1. The pixel refresh rate of light field displays (right axis [1]) and the bitrate (left axis) required for transmission when HEVC compression is applied (blue line)
  2. The available broadband and high-speed local networks bitrates (red/orange lines and left axis [2], [3]).

Figure 4: Sensor capabilities, broadband internet and compression
(courtesy of Gauthier Lafruit and Marek Domanski)

The curves remain substantially parallel and separated by a factor of 10, i.e. one unit in log10 (yellow arrow). More compression is needed and MPEG is indeed working to provide it with its MPEG-I project. Actually, today’s bandwidth of 300 Mbit/s (green dot A) is barely enough to transmit video at 8 Gpixel/s (10^9.9 on the right axis) for high-quality stereoscopic 8k VR displays at 120 frames/s compressed with HEVC.
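A quick cross-check of the claim (my sketch, using the figures quoted above): 300 Mbit/s spread over 8 Gpixel/s leaves roughly the bit/pixel budget at which HEVC operates in Table 3.

```python
import math

pixel_rate = 8e9       # pixel/s for stereoscopic 8k VR displays at 120 frames/s
bandwidth  = 300e6     # bit/s, today's broadband (green dot A)
print(math.log10(pixel_rate))     # ~9.9, the value read on the right axis of Figure 4
print(bandwidth / pixel_rate)     # ~0.0375 bit/pixel, close to HEVC's ~0.04 bit/pixel
```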

In 2020 we expect light field displays that project hundreds of high-resolution images in various directions for glasses-free VR to reach point B on the blue line. Again HEVC compression is not sufficient to transmit all the data over consumer-level broadband networks. However, this will be possible over local networks, since the blue line stays below the top orange line in 2020.

The third direction of investigation uses “point clouds”. These are unordered sets of points in a 3D space, used in several industries, and typically captured using various setups of multiple cameras, depth sensors, LiDAR scanners, etc., or synthetically generated. Point clouds have recently emerged as representations of the real world enabling immersive forms of interaction, navigation, and communication.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. Point cloud data not only includes spatial location (X,Y,Z) but also colour (R,G,B or YCbCr), reflectance, normals, and other data depending on the needs of the volumetric application domain. MPEG is developing a standard capable of compressing a point cloud to levels that are compatible with today’s networks reaching consumers. This emerging standard compresses 3D data by leveraging decades of 2D video coding technology development and combining 2D and 3D compression technologies. This approach allows industry to take advantage of existing hardware and software infrastructures for rapid deployment of new devices for immersive experiences.

Targeted applications for point cloud compression include immersive real-time communication, six Degrees of Freedom (6 DoF) where the user can walk in a virtual space, augmented/mixed reality, dynamic mapping for autonomous driving, and cultural heritage applications.

The first video (Figure 5) shows a dynamic point cloud representing a lady that a viewer can see from any viewpoint.

 

Figure 5: Video representing a dynamic point cloud (courtesy of i8)

The second video by Replay Technology shows how a basketball player, represented by a point cloud as in Figure 5, could be introduced in a 360° video and how viewers could create their own viewpoint or view path of the player.

The third video (Figure 6) represents a non-entertainment application where cars equipped with Lidars could capture the environment they navigate as point clouds for autonomous driving or recording purposes.

 

Figure 6: Point cloud for automotive (courtesy of Mitsubishi)

Dense sensing elements are also applicable to audio capture and coding. Wave Field Synthesis is a technique in which a grid or array of sensors (microphones) is spaced no farther apart than one-half the wavelength of the highest frequency in the sound signal (the wavelength at 20 kHz is about 1.7 cm). Such an array can be used to capture the performance of a symphony orchestra, with the array placed between the orchestra and the audience in the concert hall. When the captured signal is played out to an identically placed set of loudspeakers, the acoustic wave field at every seat in the concert hall can be correctly reproduced (subject to the effects of finite array size). Hence, with this technique, every seat in the “reproduction” hall offers the exact experience of the live performance. Figure 7 shows a Wave Field Synthesis laboratory installation.

Figure 7: Wave Field Synthesis equipment (courtesy of Fraunhofer IDMT)

Even ancient digital media need compression

So far we have been talking of media which are intrinsically analogue but can be converted to a digital representation. The roots of nature, however, are digital and so are some of its products, such as the genome, which can be considered as a special type of “program” designed to “run” on a special type of “computer” – the cell. The program is “written” with an alphabet of 4 symbols (A, T, G and C) and physically carried as four types of nucleobases called adenine, thymine, guanine and cytosine on a double helix created by the bonding of adenine and thymine, and cytosine and guanine (see Figure 8).

Figure 8: The double helix of a genome

Each of the cells (~37 trillion for a human weighing 70 kg) carries a hopefully intact copy of the “program”, pieces of which it runs to synthesise the proteins for its specific needs (troubles arise when the original “program” lacks the instructions to create some vital protein or when copies of the “program” have changed some vital instructions).

The genome is the oldest – and naturally made – instance of digital media. The length of the human “program” is significant: ~ 3.2 billion base pairs, equivalent to ~800 MBytes.
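The 800 MBytes figure follows from the 4-symbol alphabet (a sketch of mine): two bits suffice for each base.

```python
base_pairs = 3.2e9
print(base_pairs * 2 / 8 / 1e6)   # ~800 MBytes for one human genome at 2 bits/base
```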

While a seemingly simple entity like a cell can transcript and execute the program contained in the genome or parts of it, humans have a hard time even reading it. Devices called “sequencing machines” do the job, but rather lousily, because they are capable of reading only random fragments of the DNA and of outputting the values of these fragments in a random order.

Figure 9 depicts a simplified description of the process, which also shows how these “reads” must be aligned, using a computer program, against a “reference” DNA sequence to eventually produce a re-assembled DNA sequence.

Figure 9: Reading, aligning and reconstructing a genome

To add complexity to this already complex task, sequencing machines also generate a number (the quality score) that corresponds to the quality of each base call. Therefore, sequencing experiments are typically configured to try to provide many reads (e.g. ~200) for each base pair. This means that reading a human genome might well generate ~1.5 TBytes.
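Here is a rough sketch (mine) of where the ~1.5 TBytes comes from, assuming the common ASCII representation of one byte per base call plus one byte per quality score; read names and other overhead account for the rest.

```python
base_pairs, coverage, bytes_per_call = 3.2e9, 200, 2   # base + quality score
print(base_pairs * coverage * bytes_per_call / 1e12)   # ~1.3 TBytes before overhead
```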

Transporting, processing and managing this amount of data is very costly. Some ASCII formats exist for these genomic data: FASTQ, a container of non-aligned reads, and Sequence Alignment Mapping (SAM), a container of aligned/mapped reads. Zip is applied to reduce the size of both FASTQ and SAM, generating zipped FASTQ blocks and BAM (the binary version of SAM) with a compression factor of ~5. However, compression performance is poor, data access is awkward and maintenance of the formats is a problem.

MPEG was made aware of this situation and, in collaboration with ISO TC 276 Biotechnology, is developing the MPEG-G standard for genomic information representation (including compression) that it plans to approve in January 2019 as FDIS. The target of the standard is not compression of the genome per se, but compression of the many reads related to a DNA sample as depicted in Figure 9. Of course the standard provides lossless compression.

Unlike the BAM format where data with different statistical properties are put together and then zip-compressed, MPEG uses a different and more effective approach:

  1. Genomic data are represented with statistically homogeneous descriptors
  2. Metadata associated to classified reads are represented with specific descriptors
  3. Individual descriptors sub-sets are compressed achieving a compression in the range of 100x
  4. Descriptors sub-sets are stored in Access Units for selective access via standard API
  5. Compressed data are packetised thus facilitating the development of streaming applications
  6. Enhanced data protection mechanisms are supported.

Therefore MPEG-G not only saves storage space and transmission time (the 1.5 TBytes mentioned above could be compressed down to 15 GBytes) but also makes genomic information processing more efficient, with an estimated improvement of data access times of around 100x. Additionally MPEG-G promises, as with other MPEG standards, to provide more efficient technologies in the future using a controlled process.

It may be time to update the current MPEG logo from what it has been so far to a new logo.

There are no limits to the use of compression

There is another type of “program”, called a “neural network”, that is acquiring more and more importance. Just as a DNA sample contains simple instructions to create a complex organism, so a neural network is composed of very simple elements assembled to solve complex problems. The technology is more than 50 years old but it has recently received new impetus and even appears – as Artificial Intelligence – in ads directed to the mass market.

Like a human neuron, the neural network element of Figure 10 collects inputs from other elements, processes them and, if the result is above a threshold, generates an activation signal.

Figure 10: An element of a neural network
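In code, the element of Figure 10 is little more than a weighted sum followed by a threshold. The sketch below is mine and assumes a simple step activation; real networks use smoother activation functions.

```python
def neuron(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of the inputs exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

print(neuron([0.2, 0.9, 0.4], [0.5, 1.0, -0.3], threshold=0.8))   # -> 1
```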

As in the human brain it is also possible to feed back the output signal to the source (see Figure 11).

Figure 11: Recurrency in neural networks

Today, neural networks for image understanding consist of complex configurations and several million weights (see Figure 10) and it is already possible to execute specific neural-network based tasks on recent mobile devices. Why should we then not be able to download to a mobile device a neural network that best solves a particular problem that can range from image understanding to automatic translation to gaming?

The answer is yes but, if the number of neural network-based applications increases, if more users want to download them to their devices and if applications grow in size to solve ever more complex problems, compression will be needed and will play the role of enabler of a new age of mobile computing.

MPEG is currently investigating if and how it can use its compression expertise to develop technologies that efficiently represent the neural networks of several million neurons used in some of its standards under development. Unlike MPEG-G, compression of neural networks can be lossy, if the user is willing to trade algorithm performance for compression rate.
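One common way to make that trade-off is to quantise the weights, as in the sketch below; this is an illustration of mine, not necessarily the technology MPEG will standardise.

```python
import numpy as np

def quantise(weights, n_bits=8):
    """Uniformly quantise float32 weights to n_bits, returning codes and the scale."""
    scale = (2 ** (n_bits - 1) - 1) / np.max(np.abs(weights))
    return np.round(weights * scale).astype(np.int8), scale

weights = np.random.randn(1_000_000).astype(np.float32)   # stand-in for a trained network
codes, scale = quantise(weights)
print(weights.nbytes / codes.nbytes)              # 4x smaller before any entropy coding
print(np.max(np.abs(weights - codes / scale)))    # worst-case reconstruction error
```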

A bright future, or maybe not

Implicitly or explicitly, all big social phenomena like MPEG are based on a “social contract”. Who are the parties to the “MPEG social contract”, and what is it about?

When MPEG started, close to 30 years ago, it was clear that there was no hope of developing decently performing audio and video compression standards while steering clear of existing IPR, after so many companies and universities had invested for decades in compression research. So, instead of engaging in the common exercise of dodging patents, MPEG decided to develop its standards having as a goal the best performing standards, irrespective of the IPR involved. The deal offered to all parties was a global market of digital media products, services and applications, offering interoperability to billions of people and generating hefty royalties for patent holders.

MPEG did keep its part of the deal. Today MPEG-based digital media enable major global industries, allow billions of people to create and enjoy digital media interoperably and provide royalties worth billions of USD (NB: ISO/IEC and MPEG have no role in royalties or in the agreements leading to them).

Unfortunately some parties have decided to break the MPEG social contract. The HEVC standard, the latest MPEG standard targeting unrestricted video compression, was approved in January 2013. Close to 5 years later, there is no easy way to get a licence to practice that standard.

What I am going to say is not meant to have and should not be interpreted in a legal sense. Nevertheless I am terribly serious about it. Whatever the rights granted to patent holders by the laws, isn’t depriving billions of people, thousands of companies and hundreds of patent holders of the benefits of a standard like HEVC and, presumably, other future MPEG standards, a crime against humankind?

Acknowledgements

I would like to thank the MPEG subgroup chairs, activity chairs and the entire membership (thousands of people over the years) for making MPEG what it is recognised for – the source of digital media standards that have changed the world for the better.

Hopefully they will have a chance to continue doing so in the future as well.

References

[1] “Future trends of Holographic 3D display,” http://www-oid.ip.titech.ac.jp/holo/holography_when.html
[2] “Nielsen’s law of internet bandwidth,” https://www.nngroup.com/articles/law-of-bandwidth/
[3] “Bandwidth growth: nearly what one would expect from Moore’s law,” https://ipcarrier.blogspot.be/2014/02/bandwidth-growth-nearly-what-one-would.html

 

On my Charles F. Jenkins Lifetime Achievement Award

In a press release today the Academy of Television Arts and Sciences announces that I have been selected to receive the Charles F. Jenkins Lifetime Achievement Award, “a special engineering honor to an individual whose contributions over time have significantly affected the state of television technology and engineering”.

I should be happy to see the recognition of 30 years of work dedicated to making real the vision of humans finally free to communicate without barriers and sharing more and more rewarding digital media experiences. Still, I need to make a few remarks.

The first remark is that my endeavours were driven by the hand of God and that tens, hundreds and thousands of people have made MPEG what it is recognised for: the originator of standards that have changed the lives of billions of people for the better.

The second remark concerns the word “lifetime” in the name of the award. This sort of implies that my professional lifetime has been observed in its entirety. I hereby communicate that I do not intend to retire anytime soon.

The last and most important remark concerns two necessary conditions for the success of MPEG standards. MPEG demonstrably achieves the first – technical excellence – but those in charge of the second – commercial exploitability – perform less and less. Indeed MPEG approved the HEVC standard in January 2013 (56 months ago!), but prospective users must negotiate with 3 different patent pools and a host of individual patent holders to get a licence. There are standards that will never see the light or, if they do, will not be used, because the standards organisations have been unable to update their processes since the time they dealt with standards for nuts and bolts.

I did my best to reverse this trend by raising awareness on these problems. Vested interests have stopped me, depriving billions of people and various industries of the benefits of new MPEG standards.

I am happy to receive this Charles F. Jenkins Lifetime Achievement Award – for what it means for the past – but with a sour taste for the future, the only thing that matters.

Standards for the present and the future

It is hard to talk sensibly to the general public about standards. It is a pity because standards are important as they ensure e.g. that nuts match with bolts, paper sheets feed into printers, music files play on handsets and a lot more.

One reason is that standards are one of the most ethereal things on Earth, as they concern interfaces between systems. Another reason is that the many industries created by human endeavour have developed their own customs: what is a must in one industry can be anathema in another. Yet another is the fact that standards are often the offspring of innovation, generating flows of money that can be anything from a trickle to a swollen river.

Standards used to have a direct impact on industry, but only rarely on end users, and at most on a small portion of them. Recently, however, Information and Communication Technologies have become so pervasive, affecting companies by the thousands and people by the billions, that the standards underpinning the industry have assumed unprecedented impact and visibility.

One of the most egregious cases is the ISO/IEC standard called High Efficiency Video Coding (HEVC). Work on this standard started in January 2010 and ended with the first release exactly 3 years later. As of July 2017 there are 3 patent pools (one representing 35 patent holders) and a number of companies (not represented by any patent pool) all claiming to have Intellectual Property (IP) on the standard.

It is no surprise that most people do not even know about HEVC because it is seldom – if ever – used in audio-visual services, and this 4 and a half years after industry could have started implementing the standard – 18 months longer than it took to develop the standard itself. And some people say that standardisation takes too long!

This situation creates three clear losers:

  1. Companies that have contributed their technologies to the standard do not get the benefits of their investment;
  2. Companies that would be ready to use – in products, services and applications – HEVC because it performs better (by 60%) than Advanced Video Coding (AVC) currently in use are practically prevented from using it;
  3. End users are deprived of their right to get better or new services, or simply to get services where it was not possible to have them before.

If there is market failure when the allocation of goods and services is not efficient – because one can imagine a different situation where many individuals are better off without making others worse off – then we are in front of a market failure.

Or maybe not. According to recent news Apple has announced that they will support HEVC in High Sierra (macOS) and iOS 11. One expects that a company as important as Apple does not make such an announcement if they do not have their back well covered.

But is this big news? It depends on how you look at it.

  • Actually not so big, because major handset manufacturers are reportedly already installing HEVC chips in their handsets. So the Apple news is the software equivalent of a déjà vu and we are in front of a market failure.
  • If the news is as big as some people claim it is, then we are forced to conclude that only a company worth 800 B$ can get the licence required to practice the HEVC standard. So we are in front of market success.

Maybe not, or maybe yes. If we are in front of market success, we have sacrificed a major principle of international standardisation enshrined in the ISO/IEC Directives: standards must be accessible to everybody on a nondiscriminatory basis on reasonable terms and conditions. Everybody of the size of Apple Inc., I mean.

The problem with these well-intentioned rules is that they were developed at a time when the patents relevant to a standard were typically held by one company. Even with tens of MPEG-2 and AVC patent holders, things were still under control because there was one patent pool and a limited number of patent holders outside it. However, in HEVC we are dealing with close to 100 patent holders grouped in 3 patent pools and a significant number of patent holders outside them. HEVC is not the exception but the rule for this and future standards.

The principle just quoted does not imply that it is always necessary to pay in order to access a standard. If an amount has to be paid, it should be the same for all. If access is free, it should be free for all.

The devil, they say, is in the details. Per ISO/IEC Directives a patent holder is not obliged to disclose which patents are relevant to a technology proposed for a standard. This is not ideal but acceptable if the patent holder intends to license the technologies contributed for a fee, because precise identification of relevant patents will be part of the development of licensing terms with a now well-honed process.

If, however, access to the standard is intended to be free of charge, such “blanket” declarations should not be acceptable because the committee developing a standard has no means to remove the technology. Declarations may come from companies that have more patents than employees and there is no process to develop licensing terms.

It should also not be acceptable that patent holders make patent declarations in which they state that they own relevant patents but do not intend to licence them. Again the committee developing the standard has no means to remove the infringing technology.

These problems have been identified and brought to an appropriate level in ISO/IEC. Is anything going to happen? Don’t count on it. At the meeting where the problems were presented, delegates from a handful of countries disputed the process that brought the matter to the attention of the committee, but no discussion could take place on the substance of the matter.

Something is rotten in the state of Denmark, and some are determined to keep it rotten.

Personal devices and persons

US President Obama is reported as saying (NYT 2016/03/12): “If, technologically, it is possible to make an impenetrable device or system, where the encryption is so strong that there is no key, there is no door at all, then how do we apprehend the child pornographer? How do we disrupt a terrorist plot?”

I answer with another question: “How do we make a child pornographer or a terrorist talk if he does not want to?”

Egg and chicken – tax and expenses

After decades of funding the fanciest and most unproductive aspects of the welfare state by raising taxes, politicians have discovered that too much is too much. So cutting taxes has become the mantra of right- and left-wing politicians alike.

There is one problem, though. Citizens have become unresponsive (i.e. they no longer believe promises of “tax cuts”).

One suggestion to politicians in need of recovering citizens’ confidence: instead of saying “I will cut this tax”, say “I will cut this expense”.

Getting out of the mess

The Christian religion explains the mess of the world we live in with the Original Sin that has destroyed the good nature that would otherwise be in us.

The Original Sin cannot be undone but Baptism and adherence to the Religion’s precepts promise to make us reborn people.

How can this help sort out the European mess of these days? Here, too, we have an original sin: greedy Greek politicians bent on taking on new debts from greedier bankers bent on dispensing the banks’ assets as if they were candies for children.

Alas, that original sin cannot be undone and there is no baptism redeeming people. There are two precepts, though, that can help people’s rebirth.

  1. If a bank manager recklessly lends money to a country and the loan goes sour, the manager pays. There is no room for excuses like “the country showed bogus accounts”, because 1/10 of the diligence bank managers apply when a small enterprise requests a loan would be enough to detect the holes in the accounts of a country.
  2. If a bank risks failing because its managers have recklessly lent money to greedy politicians, the bank fails. No socialising of losses. Notionally the state can decide to rescue the bank, but the bank’s shareholders get the reward they deserve for their lack of vigilance (aka connivance): nothing.