Introduction
When the MPAI web site opened, it was a challenging time for a new initiative: the 19th of July was in the middle of the summer holidays for some and the beginning for others. That did not matter much, however, because the combination of a focus on AI for data coding with the proposal to rejuvenate the decades-old FRAND declaration process proved too attractive to ignore.
A group of dedicated people working in unceasing rhythm produced the statutes of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) and started developing use cases that could potentially become MPAI standards.
MPAI is not (yet) formally established (it will be incorporated on one of the last 3 days of September), but MPAI is already following the workflow that is part of the statutes. The workflow envisages that there is a phase where use cases are proposed according to a detailed template, and merged. Merging of use cases is necessary because the MPAI process is supposed to be fully bottom up. Different members propose different use cases and, where it makes sense, use cases are merged to represent a more comprehensive set of needs.
The “use case” phase is then followed by the “functional requirements” phase where requirements are developed from use cases. When functional requirements are fully defined, the MPAI General Assembly decides with a 2/3 majority whether the requirements justify the creation of a standard.
The approval of the functional requirements does not mean that the project can start, because users of the potential standard should have some understanding of the “commercial requirements”. This is the purpose of the framework licence, i.e. the business model, without numerical values, that IPR holders will apply to monetise their IP.
On the 9th of September MPAI agreed that the integrated collection of use cases justifies the development of functional requirements.
This article intends to provide a summary of the use case called “Context-based Audio Enhancement” (MPAI-CAE).
What is MPAI-CAE
The overall user experience quality is highly dependent on the context in which audio is used, e.g.
- Entertainment audio can be consumed in the home, in the car, on public transport, on-the-go (e.g. while doing sports, running, biking) etc.
- Voice communications can take place in the office, in the car, at home, on-the-go etc.
- Audio and video conferencing can be done in the office, in the car, at home, on-the-go etc.
- (Serious) gaming can be done in the office, at home, on-the-go etc.
- Audio (post-)production is typically done in the studio
- Audio restoration is typically done in the studio
By using AI to act on the content based on context information, it is possible to substantially improve the user experience.
There are already solutions that adapt the conditions in which the user experiences content or service for some of the contexts mentioned above. However, they tend to be vertical in nature, making it difficult to re-use possibly valuable AI-based components of the solutions for different applications. This hinders the broad adoption of AI technologies.
MPAI-CAE aims to create a horizontal market of re-usable and possibly context-dependent components that expose standard interfaces. With MPAI-CAE, the market would become more receptive to innovation and hence more competitive, benefiting industry and consumers alike.
Some examples of audio enhancement
- Enhanced audio experience in a conference call
The user experience of a video/audio conference is often marginal. Too much background noise or undesired sounds can prevent participants from understanding what others are saying. By using AI-based adaptive noise-cancellation and sound enhancement, MPAI-CAE can virtually eliminate those kinds of noise without using complex microphone systems to capture environment characteristics.
- Pleasant and safe music listening while biking
While biking in the middle of city traffic, AI can process the signals from the environment captured by the microphones available in many earphones and earbuds (for active noise cancellation), adapt the sound rendition to the acoustic environment, provide an enhanced audio experience (e.g. performing dynamic signal equalisation), improve battery life and selectively recognise and allow relevant environment sounds (e.g. the horn of a car). The user enjoys a satisfactory listening experience without losing contact with the acoustic surroundings.
- Emotion enhanced synthesised voice
Speech synthesis is constantly improving and finding several applications that are part of our daily life (e.g. intelligent assistants). In addition to improving the ‘natural sounding’ of the voice, MPAI-CAE can implement expressive models of primary emotions such as fear, happiness, sadness, and anger.
- Speech/audio restoration
Audio restoration is often a time-consuming process that requires skilled audio engineers with specific experience in music and recording techniques to manually go over old audio tapes. MPAI-CAE can automatically remove anomalies from recordings through broadband denoising, declicking and decrackling, as well as removing buzzes and hums and performing spectrographic ‘retouching’ for removal of discrete unwanted sounds.
What is there to standardise?
Three areas of standardisation have been identified:
- Context type interfaces: a first set of input and output signals, with corresponding syntax and semantics, for audio usage contexts considered of sufficient interest (e.g. audioconferencing and audio consumption on-the-go). They have the following features:
  - Input and output signals are context specific, but with a significant degree of commonality across contexts
  - The operation of the framework is implementation-dependent offering implementors the way to produce the set of output signals that best fit the usage context
- Processing component interfaces, with the following features:
  - Interfaces of a set of updatable and extensible processing modules (both traditional and AI-based)
  - Possibility to create processing pipelines and the associated control (including the needed side information) required to manage them
  - The processing pipeline may be a combination of local and in-cloud processing
- Delivery protocol interfaces:
  - Interfaces of the processed audio signal to a variety of delivery protocols
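To make the idea of processing component interfaces more concrete, the following minimal Python sketch shows how modules sharing one interface could be chained into a pipeline driven by context side information. MPAI-CAE has not defined any interface at this stage, so every name here (`Context`, `ProcessingModule`, `NoiseGate`, `ContextGain`, `Pipeline`) is a hypothetical illustration, and the "AI-based" module is only a stand-in that picks a gain from a lookup table.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Context:
    """Hypothetical side information describing the usage context."""
    usage: str                # e.g. "conference" or "on-the-go"
    noise_floor: float = 0.0  # assumed estimate of background-noise amplitude


class ProcessingModule(Protocol):
    """Hypothetical common interface that every module exposes."""

    def process(self, samples: Sequence[float], ctx: Context) -> list[float]:
        ...


class NoiseGate:
    """Traditional module: silence samples at or below the noise floor."""

    def process(self, samples: Sequence[float], ctx: Context) -> list[float]:
        return [s if abs(s) > ctx.noise_floor else 0.0 for s in samples]


class ContextGain:
    """Stand-in for an AI-based module: the gain depends on the context."""

    GAINS = {"conference": 1.0, "on-the-go": 2.0}  # illustrative values only

    def process(self, samples: Sequence[float], ctx: Context) -> list[float]:
        gain = self.GAINS.get(ctx.usage, 1.0)
        return [s * gain for s in samples]


@dataclass
class Pipeline:
    """Runs modules in order; it exposes the same interface as a module."""
    modules: list[ProcessingModule] = field(default_factory=list)

    def process(self, samples: Sequence[float], ctx: Context) -> list[float]:
        out = list(samples)
        for module in self.modules:
            out = module.process(out, ctx)
        return out


pipeline = Pipeline([NoiseGate(), ContextGain()])
ctx = Context(usage="on-the-go", noise_floor=0.1)
enhanced = pipeline.process([0.05, -0.5, 0.25, -0.02], ctx)
print(enhanced)
```

Because the pipeline itself satisfies the module interface, pipelines can be nested, and a module from one supplier can be swapped for another without touching the rest of the chain. That is the kind of re-use a horizontal market of components would rely on.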
Who will benefit from MPAI-CAE
MPAI-CAE will benefit the following groups:
- Technology providers need not develop full applications to put their technologies to good use. They can concentrate on improving the AI technologies that enhance the user experience. Further, their technologies can find a much broader use in application domains beyond those they are accustomed to dealing with.
- Equipment manufacturers and application vendors can tap into the set of technologies made available according to the MPAI-CAE standard from different competing sources, integrate them and satisfy their specific needs
- Service providers can deliver complex optimisations and thus superior user experience with minimal time to market as the MPAI-CAE framework enables easy combination of third-party components from both a technical and licensing perspective. Their services can deliver a high quality, consistent user audio experience with minimal dependency on the source by selecting the optimal delivery method
- End users enjoy a competitive market that provides constantly improved user experiences and controlled cost of AI-based audio endpoints.
Impact of MPAI-CAE
MPAI-CAE will free users from the dependency on the context in which they operate; make the content experience more personal; make the collective service experience less dependent on events affecting the individual participant; and raise the level of past content to today’s expectations.
MPAI-CAE should create a competitive market of AI-based components exposing standard interfaces, of processing units available to manufacturers and of a variety of end-user devices, and trigger the implicit need felt by users to have the best experience whatever the context.