
The MPAI machine has started

Introduction

When the MPAI web site opened on the 19th of July, the timing was challenging for a new initiative: the middle of the summer holidays for some and the beginning of them for others. That did not matter much, however, because the idea of combining a focus on AI for data coding with a proposal to rejuvenate the decades-old FRAND declaration process proved too attractive an amalgam.

A group of dedicated people, working at an unceasing pace, produced the statutes of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) and started developing use cases that could potentially become MPAI standards.

MPAI is not yet formally established (it will be incorporated on one of the last three days of September), but it is already following the workflow that is part of its statutes. The workflow envisages a phase in which use cases are proposed according to a detailed template and then merged. Merging of use cases is necessary because the MPAI process is intended to be fully bottom-up: different members propose different use cases and, where it makes sense, these are merged to represent a more comprehensive set of needs.

The “use case” phase is followed by the “functional requirements” phase, in which requirements are developed from the use cases. When the functional requirements are fully defined, the MPAI General Assembly decides, with a 2/3 majority, whether they justify the creation of a standard.

The approval of the functional requirements does not mean that the project can start, because users of the potential standard should have some understanding of the “commercial requirements”. This is the purpose of the framework licence: the business model that IPR holders will apply to monetise their IP, expressed without numerical values.

On the 9th of September, MPAI agreed that the integrated collection of use cases justified the development of functional requirements.

This article provides a summary of the use case called “Context-based Audio Enhancement” (MPAI-CAE).

What is MPAI-CAE?

The overall quality of the user experience is highly dependent on the context in which audio is used, e.g.:

  1. Entertainment audio can be consumed in the home, in the car, on public transport, on-the-go (e.g. while doing sports, running, biking) etc.
  2. Voice communications can take place in the office, in the car, at home, on-the-go etc.
  3. Audio and video conferencing can be done in the office, in the car, at home, on-the-go etc.
  4. (Serious) gaming can be done in the office, at home, on-the-go etc.
  5. Audio (post-)production is typically done in the studio
  6. Audio restoration is typically done in the studio

By using context information and AI to act on the content, it is possible to substantially improve the user experience.

There are already solutions that adapt the conditions in which the user experiences content or services for some of the contexts mentioned above. However, they tend to be vertical in nature, making it difficult to re-use the possibly valuable AI-based components of those solutions for different applications. This hinders the broad adoption of AI technologies.

MPAI-CAE aims to create a horizontal market of re-usable and possibly context-dependent components that expose standard interfaces. With MPAI-CAE, the market would become more receptive to innovation and hence more competitive, benefiting industry and consumers alike.

Some examples of audio enhancement

  1. Enhanced audio experience in a conference call

Often, the user experience of a video/audio conference can be marginal: too much background noise or too many undesired sounds can prevent participants from understanding what other participants are saying. By using AI-based adaptive noise-cancellation and sound enhancement, MPAI-CAE can virtually eliminate those kinds of noise without using complex microphone systems to capture the characteristics of the environment.
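As a rough illustration of the kind of processing involved, the Python sketch below implements classical spectral-gating noise suppression, assuming a mono float signal whose first half second contains background noise only. The function name, parameters and thresholds are illustrative assumptions; actual MPAI-CAE components would rely on trained models rather than this classical technique.

```python
# A minimal sketch of spectral-gating noise suppression, assuming a mono
# float signal whose first half second contains background noise only.
# All names and parameter values here are illustrative, not MPAI-CAE APIs.
import numpy as np

def spectral_gate(signal, sample_rate, noise_seconds=0.5,
                  frame_len=1024, hop=512, over_subtract=1.5):
    """Attenuate stationary background noise via spectral subtraction."""
    window = np.hanning(frame_len)

    # Estimate the average noise magnitude spectrum from the leading frames.
    noise_end = int(noise_seconds * sample_rate)
    noise_frames = [np.abs(np.fft.rfft(signal[s:s + frame_len] * window))
                    for s in range(0, max(noise_end - frame_len, 1), hop)]
    noise_mag = np.mean(noise_frames, axis=0)

    # Subtract the noise estimate frame by frame with windowed overlap-add.
    out = np.zeros(len(signal))
    for s in range(0, len(signal) - frame_len, hop):
        spectrum = np.fft.rfft(signal[s:s + frame_len] * window)
        mag = np.abs(spectrum)
        # Keep a small spectral floor so speech is not gated to silence.
        gain = np.maximum(mag - over_subtract * noise_mag, 0.05 * mag) / (mag + 1e-12)
        out[s:s + frame_len] += np.fft.irfft(spectrum * gain) * window
    # Compensate the constant gain of squared Hann windows at 50% overlap.
    return out / 1.5
```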

  2. Pleasant and safe music listening while biking

While biking in the middle of city traffic, AI can process the signals from the environment captured by the microphones available in many earphones and earbuds (for active noise cancellation), adapt the sound rendition to the acoustic environment, provide an enhanced audio experience (e.g. by performing dynamic signal equalisation), improve battery life, and selectively recognise and let through relevant environment sounds (e.g. the horn of a car). The user enjoys a satisfactory listening experience without losing contact with the acoustic surroundings.
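A toy version of the selective pass-through idea is sketched below, assuming mono NumPy frames from the earbud microphone and the music player. The band limits, threshold and ducking gains are invented for illustration; a deployed system would use a trained sound-event classifier rather than a fixed band-energy test.

```python
# An illustrative sketch of "selective hearing" while biking: music playback
# is ducked when a horn-like tone dominates the microphone frame. The band
# limits, threshold and gains are made-up values, not MPAI-CAE parameters.
import numpy as np

def horn_alert(mic_frame, sample_rate, band=(350.0, 550.0), threshold=4.0):
    """Return True when energy in the horn band dominates the frame."""
    spectrum = np.abs(np.fft.rfft(mic_frame * np.hanning(len(mic_frame))))
    freqs = np.fft.rfftfreq(len(mic_frame), d=1.0 / sample_rate)
    in_band = spectrum[(freqs >= band[0]) & (freqs <= band[1])].mean()
    overall = spectrum.mean() + 1e-12
    return (in_band / overall) > threshold

def mix_frame(music_frame, mic_frame, sample_rate):
    """Duck the music and pass the microphone through when a horn is heard."""
    if horn_alert(mic_frame, sample_rate):
        return 0.2 * music_frame + 0.8 * mic_frame
    return music_frame
```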

  3. Emotion enhanced synthesised voice

Speech synthesis is constantly improving and finding many applications that are part of our daily life (e.g. intelligent assistants). In addition to making the voice sound more natural, MPAI-CAE can implement expressive models of primary emotions such as fear, happiness, sadness and anger.
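To make the idea concrete, here is a deliberately crude sketch of how an expressive model might map primary emotions to coarse prosody controls. All names (EMOTION_PROSODY, apply_emotion) and parameter values are assumptions for illustration only, not part of any MPAI-CAE specification.

```python
# A hedged sketch: map primary emotions to prosody controls and apply them
# to a synthesised mono waveform. Values are illustrative assumptions.
import numpy as np

# Pitch scale, speaking-rate scale and energy (loudness) scale per emotion.
EMOTION_PROSODY = {
    "fear":      {"pitch": 1.15, "rate": 1.20, "energy": 0.9},
    "happiness": {"pitch": 1.10, "rate": 1.05, "energy": 1.1},
    "sadness":   {"pitch": 0.90, "rate": 0.85, "energy": 0.8},
    "anger":     {"pitch": 1.05, "rate": 1.10, "energy": 1.3},
}

def apply_emotion(waveform, emotion):
    """Crudely apply rate and energy changes to a mono waveform.

    Rate is changed by naive resampling (which also shifts pitch); a real
    system would use a vocoder to control pitch and rate independently.
    """
    p = EMOTION_PROSODY[emotion]
    idx = np.arange(0, len(waveform) - 1, p["rate"])
    stretched = np.interp(idx, np.arange(len(waveform)), waveform)
    return p["energy"] * stretched
```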

  4. Speech/audio restoration

Audio restoration is often a time-consuming process that requires skilled audio engineers, with specific experience in music and recording techniques, to manually go over old audio tapes. MPAI-CAE can automatically remove anomalies from recordings through broadband denoising, declicking and decrackling, as well as by removing buzzes and hums and performing spectrographic ‘retouching’ to remove discrete unwanted sounds.
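For a flavour of what automatic declicking involves, the sketch below flags samples that deviate sharply from a local median and repairs them by interpolation. The kernel size and threshold are illustrative assumptions; professional restoration tools use far more sophisticated detection and resynthesis.

```python
# A minimal declicking sketch: detect samples that deviate sharply from a
# local median and replace them by interpolating from clean neighbours.
import numpy as np
from scipy.signal import medfilt

def declick(signal, kernel=9, threshold=4.0):
    """Remove impulsive clicks from a mono float signal."""
    smooth = medfilt(signal, kernel_size=kernel)
    residual = signal - smooth
    # Flag samples whose residual exceeds a multiple of its robust scale.
    scale = np.median(np.abs(residual)) + 1e-12
    clicks = np.abs(residual) > threshold * scale
    # Repair flagged samples by interpolating between unflagged ones.
    good = ~clicks
    repaired = signal.copy()
    repaired[clicks] = np.interp(np.flatnonzero(clicks),
                                 np.flatnonzero(good), signal[good])
    return repaired
```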

What is there to standardise?

Three areas of standardisation have been identified:

  1. Context type interfaces: a first set of input and output signals, with corresponding syntax and semantics, for audio usage contexts considered of sufficient interest (e.g. audioconferencing and audio consumption on-the-go). They have the following features:
    1. Input and output signals are context-specific, but with a significant degree of commonality across contexts
    2. The operation of the framework is implementation-dependent, offering implementors a way to produce the set of output signals that best fits the usage context
  2. Processing component interfaces, with the following features (a speculative sketch follows this list):
    1. Interfaces of a set of updatable and extensible processing modules (both traditional and AI-based)
    2. Possibility to create processing pipelines and the associated control (including the needed side information) required to manage them
    3. The processing pipeline may be a combination of local and in-cloud processing
  3. Delivery protocol interfaces
    1. Interfaces of the processed audio signal to a variety of delivery protocols
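Since no syntax or semantics have been defined yet, the Python sketch below is purely speculative: it shows what standard component interfaces and pipeline composition might look like. All class and method names (AudioProcessor, Pipeline, process) are invented for illustration.

```python
# A speculative sketch of standard processing-component interfaces and
# pipeline composition. All names are invented; MPAI-CAE has not defined
# any interfaces at the time of writing.
from abc import ABC, abstractmethod
from typing import Dict, List
import numpy as np

class AudioProcessor(ABC):
    """A processing module (traditional or AI-based) exposing a standard interface."""

    @abstractmethod
    def process(self, audio: np.ndarray, context: Dict) -> np.ndarray:
        """Transform an audio buffer, guided by context side information."""

class Denoiser(AudioProcessor):
    def process(self, audio, context):
        # Placeholder: a real module would run a trained denoising model.
        return audio

class Equaliser(AudioProcessor):
    def process(self, audio, context):
        # Placeholder: the gain could depend on the usage context.
        return audio * context.get("gain", 1.0)

class Pipeline:
    """Chains modules; stages could equally run locally or in the cloud."""

    def __init__(self, stages: List[AudioProcessor]):
        self.stages = stages

    def process(self, audio: np.ndarray, context: Dict) -> np.ndarray:
        for stage in self.stages:
            audio = stage.process(audio, context)
        return audio

# Usage: compose interchangeable components for a given usage context.
pipeline = Pipeline([Denoiser(), Equaliser()])
enhanced = pipeline.process(np.zeros(1024), {"usage": "on-the-go", "gain": 1.2})
```

Because every stage exposes the same interface, components from different competing sources could be swapped or re-ordered without changes to the rest of the pipeline, which is precisely the horizontal-market property MPAI-CAE aims for.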

Who will benefit from MPAI-CAE?

MPAI-CAE will bring benefits to several groups:

  1. Technology providers need not develop full applications to put their technologies to good use. They can concentrate on improving the AI technologies that enhance the user experience. Further, their technologies can find much broader use in application domains beyond those they are accustomed to dealing with.
  2. Equipment manufacturers and application vendors can tap into the set of technologies made available, according to the MPAI-CAE standard, from different competing sources, integrate them and satisfy their specific needs.
  3. Service providers can deliver complex optimisations, and thus a superior user experience, with minimal time to market, as the MPAI-CAE framework enables easy combination of third-party components from both a technical and a licensing perspective. Their services can deliver a high-quality, consistent user audio experience with minimal dependency on the source by selecting the optimal delivery method.
  4. End users enjoy a competitive market that provides constantly improved user experiences and controlled costs of AI-based audio endpoints.

Impact of MPAI-CAE

MPAI-CAE will free users from dependency on the context in which they operate; make the content experience more personal; make the collective service experience less dependent on events affecting individual participants; and raise the level of past content to today’s expectations.

MPAI-CAE should create a competitive market of AI-based components exposing standard interfaces, make processing units available to manufacturers of a variety of end-user devices, and trigger the implicit need felt by users to have the best experience whatever the context.
