The MPAI machine has started


When the MPAI web site opened it was a challenging time for a new initiative: the 19th of July was in the middle of the summer holidays for some and the beginning for others. That did not matter so much, however, because the idea of combining a focus on AI for data coding with the proposal to rejuvenate the decades-old FRAND declaration process proved to be too attractive an amalgam.

A group of dedicated people working in unceasing rhythm produced the statutes of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) and started developing use cases that could potentially become MPAI standards.

MPAI is not (yet) formally established (it will be incorporated on one of the last 3 days of September), but MPAI is already following the workflow that is part of the statutes. The workflow envisages a phase where use cases are proposed according to a detailed template, and merged. Merging of use cases is necessary because the MPAI process is supposed to be fully bottom-up. Different members propose different use cases and, where it makes sense, use cases are merged to represent a more comprehensive set of needs.

The “use case” phase is then followed by the “functional requirements” phase where requirements are developed from use cases. When functional requirements are fully defined, the MPAI General Assembly decides with a 2/3 majority whether the requirements justify the creation of a standard.
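The approval step can be sketched as a simple vote count. This is only an illustration: the function name is hypothetical, and counting the 2/3 threshold against votes cast (with exactly 2/3 being sufficient) is an assumption, not something taken from the MPAI statutes.

```python
def meets_two_thirds_majority(votes_for: int, votes_cast: int) -> bool:
    """Return True when votes_for is at least 2/3 of votes_cast.

    Assumption: the threshold is counted against votes cast; the statutes
    may define the base differently (e.g. all members present).
    """
    if votes_cast <= 0 or not 0 <= votes_for <= votes_cast:
        raise ValueError("invalid vote counts")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 3 * votes_for >= 2 * votes_cast


print(meets_two_thirds_majority(20, 30))  # exactly 2/3 of the votes
print(meets_two_thirds_majority(19, 30))  # just below 2/3
```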

The approval of the functional requirements does not mean that the project can start, because users of the potential standard should also have some understanding of the “commercial requirements”. This is the purpose of the framework licence, i.e. the business model, without numerical values, that IPR holders will apply to monetise their IP.

On the 9th of September MPAI agreed that the integrated collection of use cases justifies the development of functional requirements.

This article intends to provide a summary of the use case called “Context-based Audio Enhancement” (MPAI-CAE).

What is MPAI-CAE

The overall user experience quality is highly dependent on the context in which audio is used, e.g.

  1. Entertainment audio can be consumed in the home, in the car, on public transport, on-the-go (e.g. while doing sports, running, biking) etc.
  2. Voice communications can take place in the office, in the car, at home, on-the-go etc.
  3. Audio and video conferencing can be done in the office, in the car, at home, on-the-go etc.
  4. (Serious) gaming can be done in the office, at home, on-the-go etc.
  5. Audio (post-)production is typically done in the studio
  6. Audio restoration is typically done in the studio

By using AI to act on the content in light of context information, it is possible to improve the user experience substantially.

There are already solutions that adapt the conditions in which the user experiences content or service for some of the contexts mentioned above. However, they tend to be vertical in nature, making it difficult to re-use possibly valuable AI-based components of the solutions for different applications. This hinders the broad adoption of AI technologies.

MPAI-CAE aims to create a horizontal market of re-usable and possibly context-dependent components that expose standard interfaces. With MPAI-CAE, the market would become more receptive to innovation and hence more competitive, benefiting industry and consumers alike.

Some examples of audio enhancement

  1. Enhanced audio experience in a conference call

Often, the user experience of a video/audio conference can be marginal. Too much background noise or undesired sounds can prevent participants from understanding what others are saying. By using AI-based adaptive noise-cancellation and sound enhancement, MPAI-CAE can virtually eliminate those kinds of noise without using complex microphone systems to capture environment characteristics.

  2. Pleasant and safe music listening while biking

While biking in the middle of city traffic, AI can process the signals from the environment captured by the microphones available in many earphones and earbuds (for active noise cancellation), adapt the sound rendition to the acoustic environment, provide an enhanced audio experience (e.g. performing dynamic signal equalisation), improve battery life and selectively recognise and allow relevant environment sounds (e.g. the horn of a car). The user enjoys a satisfactory listening experience without losing contact with the acoustic surroundings.

  3. Emotion enhanced synthesised voice

Speech synthesis is constantly improving and finding several applications that are part of our daily life (e.g. intelligent assistants). In addition to improving the ‘natural sounding’ of the voice, MPAI-CAE can implement expressive models of primary emotions such as fear, happiness, sadness, and anger.

  4. Speech/audio restoration

Audio restoration is often a time-consuming process that requires skilled audio engineers with specific experience in music and recording techniques to go over old audio tapes manually. MPAI-CAE can automatically remove anomalies from recordings through broadband denoising, declicking and decrackling, as well as removing buzzes and hums and performing spectrographic ‘retouching’ for removal of discrete unwanted sounds.

What is there to standardise?

Three areas of standardisation have been identified:

  1. Context type interfaces: a first set of input and output signals, with corresponding syntax and semantics, for audio usage contexts considered of sufficient interest (e.g. audioconferencing and audio consumption on-the-go). They have the following features
    1. Input and output signals are context specific, but with a significant degree of commonality across contexts
    2. The operation of the framework is implementation-dependent, leaving implementors free to produce the set of output signals that best fit the usage context
  2. Processing component interfaces: with the following features
    1. Interfaces of a set of updatable and extensible processing modules (both traditional and AI-based)
    2. Possibility to create processing pipelines and the associated control (including the needed side information) required to manage them
    3. The processing pipeline may be a combination of local and in-cloud processing
  3. Delivery protocol interfaces
    1. Interfaces of the processed audio signal to a variety of delivery protocols
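The idea of processing modules that share an interface and can be chained into a pipeline can be sketched as follows. Everything in this sketch is hypothetical: the `Module` signature, the `context` dictionary used as side information and the two toy modules are assumptions made for illustration, not interfaces defined by MPAI-CAE.

```python
from typing import Callable, List, Sequence

# Hypothetical common interface: a module maps a block of audio samples
# plus context side information to a processed block of samples.
Samples = List[float]
Module = Callable[[Samples, dict], Samples]


def gain(samples: Samples, context: dict) -> Samples:
    """Traditional module: scale samples by a context-supplied gain."""
    g = context.get("gain", 1.0)
    return [s * g for s in samples]


def noise_gate(samples: Samples, context: dict) -> Samples:
    """Toy stand-in for an AI-based denoiser: zero samples below a threshold."""
    floor = context.get("noise_floor", 0.0)
    return [s if abs(s) >= floor else 0.0 for s in samples]


def run_pipeline(modules: Sequence[Module], samples: Samples, context: dict) -> Samples:
    """Apply the modules in order; each sees the same side information."""
    for module in modules:
        samples = module(samples, context)
    return samples


context = {"gain": 2.0, "noise_floor": 0.1}
print(run_pipeline([noise_gate, gain], [0.05, -0.5, 0.25], context))
```

Because every module has the same signature, a module from one vendor could in principle be swapped for another without touching the rest of the pipeline, which is the kind of re-use the horizontal market is meant to enable.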

Who will benefit from MPAI-CAE

MPAI-CAE will bring benefits to the following groups:

  1. Technology providers need not develop full applications to put their technologies to good use. They can concentrate on improving the AI technologies that enhance the user experience. Further, their technologies can find a much broader use in application domains beyond those they are accustomed to dealing with.
  2. Equipment manufacturers and application vendors can tap into the set of technologies made available according to the MPAI-CAE standard from different competing sources, integrate them and satisfy their specific needs.
  3. Service providers can deliver complex optimisations and thus a superior user experience with minimal time to market, as the MPAI-CAE framework enables easy combination of 3rd party components from both a technical and a licensing perspective. Their services can deliver a high quality, consistent user audio experience with minimal dependency on the source by selecting the optimal delivery method.
  4. End users enjoy a competitive market that provides constantly improved user experiences and controlled cost of AI-based audio endpoints.

Impact of MPAI-CAE

MPAI-CAE will free users from the dependency on the context in which they operate; make the content experience more personal; make the collective service experience less dependent on events affecting the individual participant; and raise the level of past content to today’s expectations.

MPAI-CAE should create a competitive market of AI-based components exposing standard interfaces, of processing units available to manufacturers and of a variety of end user devices, and should trigger the implicit need felt by users to have the best experience whatever the context.


At a technology and business watershed

For 30+ years, digital media have been the powerful driver that has fostered research, industry and commerce. This happened because the engine that has sustained the development – MPEG – has been able to expand its coverage and provide new standards to a growing group of client industries. Academia and research, all facets of industry, and billions of users have benefited from this bonanza.

A good game never lasts long. Well, this one did not last so little, as we are talking of a quarter of a century. The reality is that today the engine has run out of steam – technology-wise and business-wise. The capital sentence of cutting MPEG in pieces and leaving the pieces without a head sanctions this as a fact.

Thirty years of practical data compression show the importance of the business that is built on data compression standards. Old technology has had its day. To renew it, we need fresh new technologies, but also a fresh new approach to the matter.

A new engine is coming to the rescue. There is a vast group of technologies – going under the general name of Artificial Intelligence – that provide alternative and more promising approaches than using statistical correlation. They go deeper, seeking to understand the physical phenomena that we are trying to represent.

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is the vehicle ready to implement the plan. It is a win-win proposal because Digital Media gets better-performing technologies and Artificial Intelligence extends the range where its technologies are applied – not just digital media, but also other data types whose use can be more effective if converted to a more efficient representation.

The MPAI Statutes define data coding as the transformation of data from one representation into another representation that is more convenient for a particular purpose. Reducing the amount of data, a.k.a. compression, is one purpose that has proved to be very important to billions of people, but there are many other purposes. Having AI as the underlying technology layer will ensure that AI technologies for data coding will have wider applications, practical deployment will be accelerated and interoperability improved.

This is the grand plan, but we should not forget that the devil is in the details. MPEG has shown that technically excellent standards are no guarantee that their access will be easy and their use possible. Therefore, MPAI abandons the old FRAND approach because it does not guarantee that a licence for a supposed FRAND standard will be available. It embraces instead the Framework Licence approach where IPR holders agree to a business model, and possibly a cap to the total cost of a licence, _before_ the work on the standard starts.

MPAI attacks the main issue of the digital world – data representation, i.e. coding – and leverages AI to get the best results achievable in the current time frame. However, it has learnt the lesson: industry is no longer willing to wait for the terms until after the standard is done. Users want to know more before the work starts.


The two main MPAI purposes

So far I have discussed the shortcomings of the organisation called ISO: feudalism, chaos, hypocrisy, obtuseness and incompetence. Fortunately, we are not stuck with ISO. The good side of our age is that, if you are unhappy with a situation (here I am talking of standards), you have the chance to do better.

On the 19th of July 2020, I did exactly that when my post “New standards making for a new age” launched the idea of MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence.

In what sense and to what extent does MPAI take advantage of this freedom and for what purposes? This article will try and answer this question.

One reason for creating MPAI is to respond to the needs of the MPEG constituency, ill-served by ISO’s self-imposed “FRAND” constraints and by its inability to react to the changes of the industry induced by MPEG standards and the effects wrought by industry changes on MPEG itself. MPAI intends to reverse the trend that has made it progressively harder, especially for some industries, to use MPEG standards. MPAI does not believe that the alternative of offering “royalty free” standards to industries is sustainable even in the short term, as recent news may be read to confirm.

MPAI takes an antipodal attitude to MPEG with respect to the nature of the requirements that drive the work on a standard. In MPEG, functional requirements, made widely known to industry, used to drive the development of a standard. Users were left to “discover” the commercial terms when the standard was done, possibly 4 years after the start of the work, but actually much later than that because of the time it usually took to develop licence(s), and in some cases never.

MPAI would love to make both functional and commercial requirements available to users. However, providing a full set of commercial requirements may not be supported by antitrust regulations. Therefore, MPAI comes as close as possible to that by making known to users the business model, which MPAI calls the Framework Licence (FWL), that IPR holders will eventually apply in their licence(s). The FWL does not contain the monetary values and other data that would be frowned upon by antitrust authorities.

These are the main features of the MPAI FWL

  1. As a minimum, the FWL will state that the total cost of the licence(s) will be in line with the total cost of the licences for similar data coding/decoding technologies, considering the market value of the specific technology. While this is the minimum, the FWL may go as far as to provide a cap on the total licence cost.
  2. The FWL will also state that access to the standard shall be granted in a non-discriminatory fashion.
  3. The FWL may envisage that IPR holders make available their patents without requiring a licence, if all IPR holders agree to do so. Of course, if certain events specified in the FWL happen, IPR holders may decide to withdraw their offer. Therefore, the FWL specifies the terms of the licence, but not the values, that IPR holders will make available in case such events happen.
  4. Documents submitted by MPAI members that relate to a standard shall contain a declaration that the proponent will make available the terms of the Licence related to their patents according to the FWL, alone or jointly with other IPR holders after the standard is approved and not after commercial implementations of the standard become available on the market.
    1. Each member will declare it will take a Licence for the patents held by other members, if used, within one year from the publication by patent holders of their licence terms. Non-members remain obligated to acquire licences to use MPAI standards as mandated by the legislation of the territories in which they use MPAI standards.
  5. Each MPAI member shall inform the Secretariat of the result of its best effort and transparent identification of IP that it believes is infringed by a standard that is being or has already been developed by MPAI.
  6. Finally, when the MPAI standard is approved, IPR holders express their preference on the entity that should administer the patent pool of the standard.

So far, we have talked about how MPAI intends to work, but that is not the only driver. MPAI intends to work differently also on the content of the standards.

After decades of hardly visible work by researchers, Artificial Intelligence (AI) is arousing the attention of the public at large. Various AI technologies have been and are being investigated with the goal to provide more efficient and more intelligent compression. MPAI retains the proposal made by the Italian Standards Organisation UNI in 2018 to consider coding as a single field of which instances are: images, moving pictures, audio, 3D Graphics and other data such as those generated in manufacturing, automotive, health and other fields, and generic data.

Even though MPAI has not been formally incorporated, experts are busy collecting use cases where AI-enabled coding can provide new solutions that enhance industry performance while benefitting end users.


Leaving FRAND for good


Fair, reasonable and non-discriminatory (FRAND) is the combination of adjectives commonly used to indicate how patent holders intend to license their technologies in standards produced by many standardisation bodies and industry fora.

Decades ago, these adjectives were easily applicable to a standard. When JVC submitted their VHS cassette recording system to IEC for standardisation and IEC produced IEC 60774 – Helical-scan video tape cassette system using 12,65 mm (0,5 in) magnetic tape on type VHS, JVC submitted a FRAND declaration. The companies manufacturing VHS recorders knew exactly whom to call to get a licence. It was a small world with a limited number of players who had known each other for decades.

The same FRAND declaration was used in the MPEG-2 case. But by that time (1995) the world had already changed. As MPEG adopted the policy “any technology that proves its worth and is accompanied by a FRAND declaration”, not only did the number of licensors skyrocket to about 30, but the licensors also included Consumer Electronics companies, telcos, telco manufacturers, IT companies, a couple of Non-Performing Entities (NPE) and a university.

MPEG LA created an MPEG-2 patent pool that gathered most patent holders and contributed to making MPEG-2 the first big success in a converged world.

Today the situation is again different from 25 years ago. As I have already said a few times, the HEVC standard has almost twice as many patent holders as MPEG-2, three patent pools and several patent holders who did not join any patent pool.

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is a non-profit organisation to be established soon in Geneva. MPAI has identified the standardisation process based on accepting FRAND declarations as unable to cope with the tectonic changes that have affected the industry during the 30 years since work on the successful MPEG-2 standard started.

MPAI acknowledges the value of the unconstrained collaboration adopted by MPEG, which should be retained. However, leaving to the market the task of sorting out all commercial aspects related to a standard may lead to situations of confusion like those of HEVC and MPEG-H 3D Audio.

Let’s consider the main steps of the MPEG process

  1. Develop the functional requirements of a standard
  2. Accept the organisation’s commercial requirements in the form of a FRAND declaration
  3. Publish a Call for Proposals asking for technologies that satisfy the functional and commercial requirements
  4. Develop a standard that satisfies the functional and commercial requirements, i.e. using technologies for which proponents have declared they will license their technologies on FRAND terms.

The MPEG process assumed that the market would find a way to remunerate patent holders. This did happen, until patent pools stopped working, as the HEVC and MPEG-H 3D Audio cases demonstrate. The MPEG-style commercial requirements are no match for the needs of the industry today.

MPAI believes that it should be possible to move the development of some commercial aspects that do not affect the constraints of the competition law to a phase that precedes the technical work. This does not mean that commercial aspects will be mixed with the development of standards.

In other words, MPAI adopts the following modifications of the steps of the MPEG standardisation process (the MPAI modifications are in steps 2 and 4)

  1. Develop the functional requirements of a standard
  2. Develop commercial requirements, in the form of a “Framework Licence” (FWL)
  3. Publish a Call for Proposals asking for technologies that satisfy the functional and commercial requirements
  4. Develop a standard that satisfies the functional and commercial requirements, i.e. using technologies for which proponents have declared they will license their technologies in line with the FWL.

The FWL is the business model to remunerate IPRs in the standard without values: no $, no %, no dates etc. It is obviously standard specific.

Thus, the MPAI policy becomes “any technology that proves its worth and is accompanied by a Framework Licence declaration”.

The MPAI process is a major improvement to the MPEG process because the big problem of simultaneously defining the business model and the price of the patented technologies is split in two: the business model before work on the standard starts (doable because the functional requirements are known), and the monetary values after the work on the standard is finished (doable because every patent holder can make their assessment of the worth of their technology).

There are advantages for users of the standard as well because they know the business model of the standard before work on the standard even starts. Actually, the FWL can even set a cap to the total cost of the standard.

In a multi-polar world where there are multiple sources of coding standards, users can only be attracted if they know what they are committing to before work on the standard starts. It is no longer possible to promise technical wonders and, years later, make customers discover a business disaster.

MPAI offers a process that accelerates the practical use of standardised technologies benefitting industry and end users alike.


Better information from data

What is data

Data can be defined as the digital representation of an entity. The entity can have different attributes: physical, virtual, logical or other.

A river may be represented by its length, its average width, its maximum, minimum and average flow, the coordinates of its bed from the source to its mouth and so on. Typically, different data of an entity are captured depending on the intended use of the data. If the river data are to be used for agricultural purposes, the depth of the river, the amount of flow during the seasons, the width, the nature of the soil etc. are likely to be important.

Video and audio intended for consumption by humans are data types characterised by a large amount of data, typically samples of the visual and audio information: tens/hundreds/thousands/millions of Mbit/s for video, and tens/hundreds/thousands of kbit/s for audio. If we exclude niche cases, this amount of data is unsuited to storage and transmission.
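To give a feel for these orders of magnitude, the uncompressed data rate can be computed directly from the sampling parameters. The formats below (1920×1080 video at 25 frames/s with 8-bit 4:2:0 sampling, and 48 kHz 16-bit stereo audio) are chosen purely as illustrative assumptions, and the function names are hypothetical.

```python
def video_bitrate_mbps(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed video rate in Mbit/s.

    samples_per_pixel=1.5 corresponds to 4:2:0 sampling: one luma sample
    per pixel plus two chroma samples shared by every four pixels.
    """
    return width * height * samples_per_pixel * bits_per_sample * fps / 1e6


def audio_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM audio rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1e3


print(video_bitrate_mbps(1920, 1080, 25))  # about 622 Mbit/s
print(audio_bitrate_kbps(48_000, 16, 2))   # 1536 kbit/s
```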

High-speed sequencing machines produce snapshots of randomly taken segments with unknown coordinates of a DNA sample. As the “reading” process is noisy, the value of each nucleotide is assigned a “quality value” typically expressed by an integer. As a nucleotide must be read several tens of times to achieve a sufficient degree of confidence, the size of whole genome sequencing files may reach Terabytes. This digital representation of a DNA sample made of unordered and unaligned reads is costly to store and transmit, and is also ill-suited to extracting the information a medical doctor needs to make a diagnosis.

Data and information

Data become information when their representation makes them suitable to a particular use. Tens of thousands of researcher-years were invested in studying the problem and finding practical ways to represent audio and visual data conveniently.

For several decades, facsimile compression was the engine that drove efficiency in the office by offering high quality prints in 1/6 of the time taken by early analogue facsimile machines.

Reducing the audio data rate by a factor of 10-20 preserving the original quality, as offered by MP3 and AAC, changed the world of music forever. Reducing the video data rate by a factor of 1,000, as achieved by the latest MPEG-I VVC standard, multiplies the way humans can have visual experiences. Surveillance applications developed alternative ways to represent audio and video that allowed, for instance, event detection, object counting etc.

The MPEG-G standard, developed to convert DNA reads into information that needs fewer bytes to be represented, also gives easier access to information that is of immediate interest to a medical doctor.

In these examples, the transformation of data from “dull” into “smart” collections of numbers has largely been achieved by using the statistical properties of the data or their transformations.

Although quite dated, the method used to compress facsimile information is emblematic. The Group 3 facsimile algorithm defines two statistical variables: the lengths of “white” and “black” runs, i.e. the number of white (black) points following and including the first white (black) point after a black (white) point, until a black (white) point is encountered. A machine that had been trained to read billions of pages could develop an understanding of how business documents are typically structured and would probably be able to use fewer bits (and probably bits more meaningful to an application) to represent a page of a business document.
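The notion of run lengths can be illustrated with a few lines of code. This is a minimal sketch that only extracts the runs from one scan line, here represented (as an assumption) by a string of 'W' and 'B' characters; it does not reproduce the code tables that Group 3 then uses to assign variable-length codes to the run lengths.

```python
from itertools import groupby


def run_lengths(scan_line: str):
    """Split a scan line of 'W'/'B' points into (colour, run length) pairs."""
    return [(colour, len(list(run))) for colour, run in groupby(scan_line)]


# A hypothetical 19-point scan line: mostly white with two black runs.
line = "W" * 6 + "B" * 3 + "W" * 8 + "B" + "W"
print(run_lengths(line))
# [('W', 6), ('B', 3), ('W', 8), ('B', 1), ('W', 1)]
```

Because typical documents contain long white runs and short black runs, the two run-length variables have very uneven statistics, which is what a variable-length code exploits.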

CABAC is another, more sophisticated example of data compression using statistical methods. CA in CABAC stands for “Context-Adaptive”, i.e. the code set is adapted to the local statistics, B stands for “Binary”, because all variables are converted to and handled in binary form, and AC stands for “Arithmetic Coding”. CABAC can be applied to variables which do not have uniform statistical characteristics. A machine that had been trained by observing how the variable changes depending on other parameters should probably be able to use fewer and more meaningful bits to represent the data.
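The benefit of adapting to local statistics can be sketched without a full arithmetic coder. The code below is not CABAC: it uses a single adaptive probability estimate (no context selection) and only adds up the ideal code length, -log2(p), that an arithmetic coder would approach. The function name and the count-based estimator are assumptions made for illustration.

```python
import math


def adaptive_cost_bits(bits):
    """Ideal code length, in bits, when P(bit) is estimated from past bits."""
    counts = [1, 1]  # smoothed counts of zeros and ones seen so far
    total = 0.0
    for b in bits:
        p = counts[b] / (counts[0] + counts[1])
        total += -math.log2(p)  # cost of coding b under the current estimate
        counts[b] += 1          # adapt the model after coding the symbol
    return total


# A skewed binary source: 90 zeros and 10 ones.
skewed = [0] * 90 + [1] * 10
cost = adaptive_cost_bits(skewed)
print(f"{cost:.1f} bits adaptive vs {len(skewed)} bits at 1 bit/symbol")
```

For a skewed source like this one, the adaptive estimate needs substantially fewer bits than a fixed one-bit-per-symbol representation; a real context-adaptive coder goes further by keeping a separate estimate for each context.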

The end of data coding as we know it?

Many approaches to data coding were developed with a view to exploiting some statistical properties in some data transformations. The two examples given above show that Machine Learning can probably be used to provide more efficient solutions to data coding than traditional methods could provide.

It should be clear that there is no reason to stay confined to the variables exploited in the “statistical” phase of data coding. There are probably better ways to use machines’ learning capabilities on data processed by different methods.

The MPAI workplan is not set yet, but one proposal is to investigate, on the one hand, how far artificial intelligence can be applied to the “old” variables and, on the other, how far artificial intelligence applied to fresh approaches can transform data into more usable information. MPAI standards will follow based on the results.

Smart application of Artificial Intelligence promises to do a better job in converting data into information than statistical approaches have done so far.


An analysis of the MPAI framework licence


The main features of MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – are the focus on efficient representation of moving pictures, audio and data in general using Artificial Intelligence technologies and the will to find a point where the interest of holders of IPR in technologies essential to a standard and the interest of users of the standard are equally upheld.

In this article I will analyse the principal tool devised by MPAI to achieve the latter goal, the framework licence.

The MPEG process

The process used by MPEG to develop its standards used to be simple and effective. As there were just too many IPRs in video coding back in the late 1980s, MPEG did not even consider the possibility of developing a royalty free standard. Instead it assessed any technology proposed and added it to the standard if the technology provided measurable benefits. MPEG did so because it expected that the interest of industry users and IPR holders would converge to a point satisfactory to both. This did indeed happen by reviving the institution of patent pools.

Therefore, the MPEG process can be summarised by this sequence of steps

Define requirements – Call for technologies – Receive patent declarations – Develop the standard – (Develop licence)

The brackets surrounding “Develop licence” indicate that MPEG has no business in that step. On the other hand, the entire process relied on the expectation that patent holders could be remunerated.

Lately many, including myself, have pointed out that the last step of the process has stalled. The fact that all patent holders individually declare themselves willing to license their patents on FRAND terms does not automatically translate into the only thing users need – a licence to use the standard.

A tour of existing licences

Conceptually a licence can be separated in two components. The first describes the business model that the patent holders apply to obtain their remuneration. The second determines the levels of remuneration.

Let’s take two relevant examples: the HEVC summary licences published by the two patent pools, MPEG LA and HEVC Advance, on their web sites, where I have hidden the values in dollars, percentages and dates and replaced them with variables. In the following you will find my summary. My wording is deliberately incomplete because my intention is to convey the essence of the licences, and probably imperfect as I am not a lawyer. If you think I made a serious mistake or an important omission please send an email to Leonardo.

MPEG LA


  • Licence has worldwide coverage and includes right to make, use and sell
  • Royalty paid for products includes right to use encoders/dec­oders for content
  • Products are sold to end users by a licensee
  • Vendors of products that contain an encoder/decoder may pay royalties on behalf of their customers
  • Royalties of R$/unit start from a date and apply if licensee sells more than N units/year or the royalties paid are below a cap of C$
  • The royalty program is divided in terms, the first of which ends on a date
  • The percent increase from one term to another is less than x%

HEVC Advance

  • Licence applies to
    • Encoder/decoders in consumer products with royalty rates that depend on the type of device
    • Commercial content distribution (optical discs, video packs etc.)
  • Commercial products used to create or distribute content and streaming are not licensed
  • Licence covers all of licensor(s)’ essential claims of the standard practiced by a licensee
  • Royalty rates
    • Rates depend on territory in which consumer product/content is first sold (y% less in less developed countries)
    • Rates include separate and non-additive discounts of z% for being in-compliance and standard rates if licence is not in-compliance
    • Base rates for baseline profiles and extended rates for advanced profiles
    • Optional features (e.g. SEI messages) have a separate royalty structure
    • Rates and caps will not increase more than z% for any renewal term
    • Multiple cap categories (different devices and content) and single enterprise cap
    • All caps apply to total royalties due on worldwide sales for a single enterprise
    • Standard rates are not capped
    • Annual Credit of E$ applies to all enterprises that owe royalties and are in-compliance, provided in four equal quarterly instalments of 0.25 E$ each
  • Licences
    • Licences are for n-year non-terminable increments, under the same n-year term structure
    • The initial n year term ends yyyy/01/01 and the first n-year renewal term ends yyyy+n/01/01

What is a framework licence?

A framework licence is the business model part of a licence. The MPEG LA and HEVC Advance licences in the form summarised above can be taken as examples of framework licences.

Therefore, a framework licence does not, and indeed shall not (for antitrust reasons), contain any values of dollars, percentages or dates.

How does MPAI intend to use framework licences?

MPAI moves the definition of the business model part of a licence (which used to be done after an MPEG standard was developed) to the point in the process between the definition of requirements and the call for technologies. In other words, the MPAI process becomes

Define requirements – Define framework licence – Call for technologies – Receive patent declarations – Develop the standard – (Develop licence)

As was true in MPEG, MPAI does not have any business in the last step in brackets.

Let’s have a closer look at how a framework licence is developed and used. First of all, active MPAI members, i.e. those who will participate in the technical development, are identified. Active members develop the licence and adopt it by a qualified majority.

Members who make a technical contribution to the standard must make a two-fold declaration that they will

  1. make available the terms of the licence related to their essential patents according to the framework licence, alone or jointly with other IPR holders (i.e. in a patent pool), after the approval of the standard by MPAI and in no event after the commercial implementation of the standard.
  2. take a licence for the essential patents held by other MPAI members, if used, within the term specified in the framework licence from the publication by IPR holders of their licence terms. Evaluation of essentiality shall be made by an independent chartered patent attorney who never worked for the owner of such essential patent or by a chartered patent attorney selected by a patent pool.

What problem a framework licence solves

The framework licence approach is not a complete solution to the problem of providing a timely licence to data representation standards; it is a tool that facilitates reaching that goal.

When MPAI decides to develop a standard, it must know what purpose the standard serves, in other words it must have precise requirements. These are used to call for technologies but can also be used by IPR holders to define in a timely fashion how they intend to monetise their IP, in other words to define their business model.

Of course, the values of royalties, caps, dates etc. are important and IPR holders in a patent pool will need significant amounts of discussions to achieve a common view. However, unlike the HEVC case above, the potentially very significant business model differences no longer influence the discussions.

Users of the standard can know in advance how the standard can be used. The two HEVC cases presented above show that the licences can have very different business models and that some users may be discouraged from using – and therefore not wait for – the standard, if they know the business model. Indeed, a user is not only interested in the functional requirements but also in the commercial requirements. The framework licence tells the usage conditions, not the cost.

However, some legal experts think that the framework licence could include a minimum and maximum value of the licence without violating regulatory constraints. Again, this would not tell a user the actual cost, but a bracket.

Further readings

More information on the framework licence can be found on the MPAI web site where the complete MPAI workflow is described, or from the MPAI Statutes.



MPAI – do we need it?


Sunday last week I launched the idea of MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – an organisation with the twofold goal of 1) developing Technical Specifications of coded representation of moving pictures, audio and data, especially using artificial intelligence and 2) bridging the gap between technical specifications and their practical use, especially using “framework licences”.

The response has been overwhelming, but some have asked me: “Why do we need MPAI?”. This is indeed a basic question and, in this article, I intend to provide my answer.

The first reason

Just as VC1, VP8/VP9 and AV1 were developed because MPEG and/or its ecosystem were not providing the solutions that the market demanded, MPAI responds to the need of industry to have usable standards that allow industry and consumers to benefit from technological progress.

The second reason

The body producing standards of such industrial and social importance should be credible. MPEG is no more, and its unknown SC 29 replacement operates in ISO, an environment discredited by its lack of governance. The very fact that a determined and connected group of ISO entities could hijack a successful group such as MPEG, with its track record of serving the industry, is proof that, at the macro level, major decisions in ISO are made because some powers that be decide that certain things should go in a direction convenient to them. Then, at the micro level, common-sense decisions like preserving MPEG plenaries, where the conclusions of different groups are integrated in a single whole, are blocked because “they are not in the directives” (as if hijacking MPEG was in the directives).

The third reason

The standards produced by the body should be usable. I have already written that, about 15 years ago, at what was probably the pinnacle of MPEG success, I was already anticipating the evolution of the industry that we are witnessing today. However, my efforts to innovate the way MPEG developed standards were thwarted. I tried to bring the situation to the public attention (see for instance …). All in vain. The result has been that the two main components of MPEG-H, the latest integrated MPEG project – part 2 video (HEVC) and part 3 audio (3D Audio) – have miserably failed. The hope to see a decent licence for Part 3 video (VVC) of the next integrated MPEG project – MPEG-I – is in the mists of an unknown future and may well tread the same path.

It could well happen that, in a burst of pride, VVC patent holders will want to show that they can get their act together and deliver a VVC licence, but who guarantees that, at the next standard, the same HEVC/3D Audio pantomime will not be on stage? Can the industry – and billions of consumers – continue to be the hostage of a handful of string pullers acting in the dark?

The fourth reason

We need a North Star guiding the industry in the years to come. Thirty-two years ago, the start of MPEG was a watershed. Digital technologies promised to provide more attractive moving pictures and audio, more conveniently and with more features, to the many different and independent industries that were used to handling a host of incompatible audio-visual services. Having been present then and being present now, I can testify that MPEG has delivered much more than promised. By following the MPEG North Star, industry has got a unified technology platform on which different industries can build and extend their business.

MPAI is the new watershed. I don’t know whether it is bigger or smaller than the one 32 years ago, probably bigger. Artificial Intelligence technologies demonstrate that it is possible to do better and more than traditional digital technologies. But there is a difference. In the last 32 years digital audio and video have offered wonders, yet they kept the two information streams isolated from the rest of the information reaching the user. With artificial intelligence, audio and video have the potential to seamlessly integrate with the many other information types handled by a device on a unified technology platform. How? Leave it to digital media and artificial intelligence experts, who have started to become an integrated community, to open their respective domains to other technologies.

Forget the past

It would be nice – and many, I for one, would be thankful for it – if someone undertook to solve the open problems in the use of past digital media standards. I am afraid this is an intricate problem without a unified point from which one can attempt to find a solution.

But is that a worthwhile effort? One way or another, industry has interoperable audio-visual technologies for its current needs, some even say more than it needs.

What remains of the group that did the miracle in the past 32 years is paralysed and the organisation in which it used to operate is problem-ridden and discredited. I pity the hundreds of valuable experts who are forced to face unneeded troubles.

Look to the future

Let’s look to the future, because we can still give it the shape we want. This is what the MPAI statutes suggest when they define the MPAI purpose as developing technical specifications of coded representation of moving pictures, audio and data, especially using artificial intelligence.

The task for MPAI is to call on the large community of researchers from industry and academia to reach the goal of developing standards that provide a quantum leap in user experience by doing better and offering more than has been done so far, and by achieving a deeper integration of the information sources reaching the user.

I know that the technologies in our hands have the potential to reach the goal, but only a new organisation that has the spirit, the enthusiasm and the effectiveness of the old one to deliver on the new promises can actually reach the goal.

That is the ideal reason to create MPAI. A more prosaic but vital reason to do it is that standards should also be usable.



New standards making for a new age

Problem statement: Making standards, especially communication standards, is one of the noblest activities that humans can perform for other humans. The MPEG group used to do that for media and other data. However, ISO, the body that hosted MPEG, suffers from several deficiencies, two of which are: fuzzy governance and ineffective handling of Intellectual Property Rights (IPR), the engine that ensures technical innovation-based progress. The prospects of reforming ISO are low: installing good governance requires capable leadership and solving the IPR problem is an unrewarding endeavour. Actually, the beneficiary of such endeavours is by and large only MPEG, whose standards collect ~57.5% of all patent declarations received by ISO.

Moving Picture, Audio and Data Coding by Artificial Intelligence – MPAI is a not-for-profit organisation that addresses the two deficiencies – governance and IPR handling – by building on and innovating MPEG’s experience and achievements and by targeting the involvement of a large community of industry, research and academic experts. MPAI’s governance is clear and robust, and its specifications are developed using a process that is technically sound and designed to facilitate practical use of IPR in MPAI specifications.

Mission: to promote the efficient use of Data by

  1. Developing Technical Specifications (TS) of
    1. Data coding, especially using new technologies such as Artificial Intelligence, and
    2. Technologies that facilitate integration of Data Compression components in Information and Communication Technology systems, and
  2. Bridging the gap between TSs and their practical use through the development of IPR Guidelines, such as Framework Licences and other instruments.

Data include, but are not restricted to, media, health, manufacturing, automotive and generic data.

Governance: The General Assembly (GA) elects the Board of Directors, establishes Development Committees (DC) tasked to develop specifications and approves their TSs. Each Member appoints an adequate number of representatives in DCs. Principal Members appoint one representative in the IPR Support Advisory Committee (IPR SAC), tasked to develop Framework Licences (FWL).

Process: before a new project starts (i.e. before a Call for Technologies is issued)

  1. The IPR SAC develops an FWL that lists the elements of the future licence of the TS without any indication of cost. Examples of such possible elements could be: “royalty free profile” with a given performance level, possible “initial grace period” depending on market development, possible “content fees”, possible one or more annual “caps”, a possible given ratio of user devices generating human perceivable signals vs other user devices etc.
  2. The FWL does not contain actual values such as royalty levels, dates, percentage values etc.
  3. The FWL is approved by a qualified majority of Principal Members participating in the project.
  4. Each Member participating in the project declares to be willing to make available a licence for its IP according to the FWL by a certain date, and to take a licence by a certain date if it will use the part of the TS that is covered by IP.
  5. Each Member shall inform the Secretariat of the result of its best effort to identify IP that it believes is infringed by a TS that is being or has already been developed by a DC.
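The rule that an FWL lists licence elements but carries no actual values lends itself to a simple mechanical check. The sketch below is purely illustrative and not part of any MPAI procedure: the pattern and the function name are hypothetical, and the regular expression only catches a few obvious forms (currency amounts, numeric percentages, numeric dates). Symbolic placeholders such as “x%” or “C$” pass, since they carry no value.

```python
import re

# Hypothetical pattern for "actual values"; deliberately narrow.
CONCRETE_VALUE = re.compile(
    r"""
      [$€£]\s*\d                          # currency symbol followed by a digit
    | \d[\d.,]*\s*%                       # numeric percentage, e.g. 20% or 2.5 %
    | \b\d{4}[-/]\d{1,2}[-/]\d{1,2}\b     # numeric date, e.g. 2025/01/01
    """,
    re.VERBOSE,
)


def clauses_with_values(clauses):
    """Return the clauses that appear to contain a concrete value."""
    return [clause for clause in clauses if CONCRETE_VALUE.search(clause)]


draft = [
    "Possible initial grace period depending on market development",
    "Royalty of $0.20 per unit",
    "Rates will not increase more than 20% per term",
    "Initial term ends 2025/01/01",
    "Possible one or more annual caps",
    "The percent increase from one term to another is less than x%",
]

offending = clauses_with_values(draft)
```

Here `offending` holds the three clauses with a dollar amount, a numeric percentage and a date; the remaining clauses describe elements of the business model only, which is what an FWL is meant to contain.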

Method of work: the GA develops, maintains and constantly updates a work plan on the basis of Members’ inputs and responses to Calls for Interest. The GA assigns the development of a TS to a DC. The DC typically issues Calls for Evidence and/or Calls for Technology. Anybody may answer Calls for Interest, Evidence or Technology. A non-Member whose contribution submitted in response to a Call for Technology is accepted is requested to join MPAI. DCs develop TSs by consensus. If consensus is not reached on an issue, the chair may decide to bring the matter to the attention of the GA, which decides by qualified majority vote. The DC shall document which contributions, or parts of contributions, are adopted in the TS. See here for a detailed work flow.

Membership: companies and organisations, including universities, may become Principal or Associated Members at their choice. Applicants can become and then remain Members by paying yearly membership fees. Only Principal Members are allowed to vote. Associated Members may join DCs and contribute to the development of TSs.

Key documents: The text above is a summary description of MPAI. The Statutes, which include detailed Procedures of work, should be consulted for precise information. See a summary here. The Statutes are being reviewed and will be made public shortly.

A novel approach: MPAI offers a novel approach to standardisation with the following features:

  1. MPAI intends to be a broad multidisciplinary and multi-stakeholder community.
  2. Low access threshold to participate in the development of Technical Specifications: most meetings are held by teleconference supported by advanced ICT-based collaboration facilities.
  3. Facilitated participation of experts who have stayed away from formal standardisation because of cost and other concerns.
  4. Framework Licences, developed through a rigorous process, expedite the use of Technical Specifications covered by IP.
  5. Timely delivery of application-driven and technology-intensive specifications.
  6. Bottom-up governance in specification development.
  7. No external constraints on members when they decide about activities.

The MPAI web site is at As the TLD suggests, MPAI is a community. Therefore, comments from the community, in particular on Statutes and Operation, are welcome. Please send your comments to Leonardo, while the MPAI Secretariat is being established.



The MPEG to Industry Hall of fame

At the suggestion of Steve Morgan, THE [RE]DESIGN GROUP, I initiate a new “MPEG to Industry” Hall of fame complementing the MPEG Hall of fame where I highlighted those who helped make MPEG what it eventually became, besides standards development.

Suggestions are open. If you want to make a nomination please send an email to Leonardo adding the name of the nominee and a brief text explaining the contribution of the nominee to convert one or more MPEG standards into products.

[2020/07/15, Steve Morgan]

The honorable Mr. Jerry Pierce was responsible for helping Matsushita (aka Panasonic) establish the Digital Video Compression Center (DVCC), which was Hollywood’s first and foremost DVD Mastering facility. During its 7 years of operation, DVCC released over 87% of Hollywood’s “A” titles on DVD.



This is ISO – An incompetent organisation

ISO is too important to leave it in the hands of people who are catapulted to Geneva from who knows where, by who knows what alchemy, to serve who knows what purposes.

Of course, the way an organisation elects to hire people is their business. However, the mission of that organisation must be fulfilled. The mission “to develop high quality voluntary International Standards which facilitate international exchange of goods and services, support sustainable and equitable economic growth, promote innovation and protect health, safety and the environment” cannot stop at putting in place a process prescribing how to move a document due to become a standard from one stage to another. I mean, that could have been the end point 73 years ago when ISO was established.

I do not know what is required for economic growth or protection of health, safety and the environment, but is innovation promoted just by managing the process of standards approval? In my opinion standard is synonymous with innovation, or else we are talking of rubber stamping.

Of course, innovation has probably as many faces as there are industries, probably more. Therefore the point here is not about ISO looking for and hiring superhumans competent in everything and able to discover the enabling factors of innovation, but about hiring people who listen to the weak or strong signals coming from the field of standardisation.

In this article I will talk about how MPEG handled reference software for its standards, an issue that goes to the core of what is a media compression standard.

In 1990 Arian Koster proposed to develop a common reference software for MPEG-1. The Internet may have developed on the principle of “Rough Consensus and Running Code”, but the world of video compression was born on what I would call “Rough Consensus and Running Hardware”, where each active participant developed their own implementation of a commonly (roughly) agreed specification. Comparing results was not easy. In the COST 211 project, 2 Mbit/s satellite links were used to interconnect different hardware implementations. In MPEG-1 (but that was true of MPEG-2 as well) every active participant developed their own code and brought the results of their simulations of Core Experiments. By proposing to create common code bases, Arian opened the doors of a new world to MPEG.

Arian’s proposal notwithstanding, there was not a lot of common code for MPEG-1 and MPEG-2, but in the mid 1990s his ideas were fully implemented for the MPEG-4 reference software – audio, video and systems. That was more than 10 years after Richard Stallman had launched the GNU Project. In a completely different setting, but with comparable motivation, MPEG made the decision to develop the reference software collaboratively because better software would be obtained, the scope of MPEG-4 was so large that no company could probably develop it all and a software implementation made available to the industry would accelerate adoption of the standard.

In those years, Mike Smith, the head of ISO’s Information Technology Task Force (ITTF), was of great help. He stated that ISO was only interested in “selling text”, not software, and allowed MPEG to develop what was called the MPEG-4 “copyright disclaimer” that contained the following elements

  1. The role of developer and contributors
  2. The status of the software as an implementation of the MPEG standard
  3. Free licence to use/modify the module in conforming products
  4. Warning to users that use of the software may infringe patents
  5. No liability for developers, contributors, companies and ISO/IEC for use/modification of the software
  6. Original developer’s right to use, assign or donate the code to a third party.

For sure the MPEG-4 copyright disclaimer was rather disconnected from the GNU General Public License (GPL), but it did serve MPEG’s purposes well. All MPEG reference software was made available on the ISO web site for free download. It is a known fact that many use MPEG reference software as the uncontroversial and unambiguous way to express the intention of the (textual) MPEG standard.

The copyright disclaimer was used for about 15 years, until large software companies in MPEG expressed their discomfort with it. At that time many companies were already using modified reference software in their products. This was allowed by the copyright disclaimer, but handling software with different licences in their products was not desirable. MPEG then opted for a modified BSD licence. “Modification” meant adding a preamble to the BSD licence: “The copyright in this software is being made available under the BSD License, included below. This software may be subject to other third party and contributor rights, including patent rights, and no such rights are granted under this license”. MPEG called this the MXM licence because it was developed for and first applied to the MPEG Extensible Middleware standard.

One day, probably some 25 years after Arian’s proposal, the ISO attorney Holger Gehring realised that there was one issue, actually more than one, with the reference software. As MPEG used to meet in Geneva every 9 months because of the video collaboration with ITU-T, the MPEG chairs had several sessions with him and his collaborators until late at night for a couple of MPEG meetings. We discussed that, for MPEG people, the textual version of a standard was important, but the software version, at least for some types of standard, was more important and that the two had equal status, in the sense that the textual version expressed normative clauses in a human language, while the software version expressed the same normative clauses in a machine language.

After reaching an agreement on this principle, the discussion moved to the licence of the software. I recalled the agreement with Mike Smith and that all MPEG reference software was posted on the ISO web site for free download and advocated the use of the MXM licence. This was agreed and I undertook to convince all MPEG subgroups to adopt it (some were still using the copyright disclaimer). With some difficulty I got the support of all subgroups to the agreement.

I communicated the agreement to the ISO attorney but got no acknowledgement. Later I learned that he had “left” ISO.

We kept our part of the agreement and released all reference software with the MXM licence. A little before I left ISO I learned that several standards with reference software in them had been withheld because the reference software issue had surfaced again, that they were discussing it and that they would inform us of the result…

ISO has to decide if it wants to “promote innovation” or if it wants to be feudal. People in the field have acquired a lot of competence, validated by some of the biggest software companies, that would allow ISO competently to address the issue of software copyright in standards.
