A unified approach to AI-based data coding standardisation

1        Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international, non-affiliated, not-for-profit organisation registered in Geneva. MPAI’s mission is to promote the efficient use of Data by:

  1. developing standards for coding any type of data, especially using new technologies to enable the use of Artificial Intelligence
  2. integrating data coding components in Information and Communication Technology systems
  3. bridging the gap between standards and their practical use by means of Framework Licenses.

Artificial Intelligence (AI) technologies are used in more and more applications, yielding one of the fastest-growing markets in the data analysis and service sector. To exploit this historic opportunity fully, however, stakeholders must overcome two hurdles: the current framework-based development model, which makes application redeployment difficult, and monolithic, opaque AI applications that generate mistrust in users.

The MPAI Manifesto summarises the key points:

  1. MPAI considers the AI Module (AIM) and its interfaces as the AI building block. The syntax and semantics of the interfaces determine what an AIM should perform, not how. AIMs can be implemented in hardware or software, with AI, Machine Learning or legacy Data Processing.
  2. MPAI’s AI Framework (MPAI-AIF) enables creation, execution, composition and update of AIM-based workflows. It is the cornerstone of MPAI standardisation because it enables building high-complexity AI solutions by interconnecting multi-vendor AIMs that are trained for specific tasks, operate in the standard AI Framework and exchange data in standard formats.
Figure 1 – An example of AI module
Figure 2 – An example of AI Framework
  3. MPAI relies on AI technologies for its standards. However, it is well aware that, in this transitional age, many of the technologies that AI promises to replace are still used to provide products and services. MPAI’s AI Framework is designed to enable coexistence, interoperation and mutual replacement of AI Modules of different nature: AI, Machine Learning (ML) and legacy Data Processing (DP). The following sections describe technology-neutral solutions.
  4. The MPAI process.
    Anybody can participate in
    – 0. collection of interest,
    – 1. use case definition, and
    – 2. functional requirement identification.
    All members participate in
    – 3. definition of commercial requirements,
    – 4. drafting of call for technologies, and
    – 5. developing the standard.
    Principal members
    – 6. approve the standard.

2        AI Framework

The reference model of the AI Framework (MPAI-AIF), shown in Figure 3, extends the conceptual model of Figure 2. MPAI issued a Call for Technologies [2] for MPAI-AIF on 16 December 2020. Responses were received on 14 February 2021. MPAI is reviewing the responses, which it will use to develop the standard, planned for July 2021.

Figure 3 – The MPAI-AIF Architecture

The scope of the MPAI-AIF standard is described by its requirements [1]:

  1. Support single AIM life cycle
    1. initialise | instantiate-configure-remove | start-suspend-stop AIMs
    2. dump/retrieve internal state | enforce resource limits of AIMs
  2. Support multiple AIMs life cycles
    1. initialise | instantiate-remove-configure AIMs
    2. manual-automatic-dynamic-adaptive interface configuration of AIMs
  3. Support AIM machine learning
    1. train-retrain-update AIMs | configure/reconfigure ML computational models
    2. dynamic update of ML model
    3. supervised/unsupervised/reinforcement-based learning paradigms
  4. Workflows
    1. hierarchical execution of workflows | computational graphs, DAG as a minimum
    2. AIM topologies synchronised according to time base and full ML life cycles
    3. 1 & 2 way signal initialisation & control, communication & security policies between AIMs
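
As a rough illustration of requirements 1 and 4 above (single-AIM life cycle and workflows executed as a DAG), the following Python sketch models an AIM as an object with a small set of life-cycle states and runs a workflow in topological order. It is not the MPAI-AIF API: the class names `AIM` and `Workflow`, their methods, the state names and the toy processing functions are all assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Sequence


class AIM:
    """Hypothetical AI Module: a named processing step with a simple life cycle."""

    def __init__(self, name: str, func: Callable[..., object]) -> None:
        self.name = name
        self.func = func  # the "how" stays hidden behind the interface
        self.config: Dict[str, object] = {}
        self.state = "instantiated"

    def configure(self, **params: object) -> None:
        self.config.update(params)
        self.state = "configured"

    def start(self) -> None:
        self.state = "running"

    def suspend(self) -> None:
        self.state = "suspended"

    def stop(self) -> None:
        self.state = "stopped"

    def process(self, *inputs: object) -> object:
        if self.state != "running":
            raise RuntimeError(f"AIM {self.name!r} is {self.state}, not running")
        return self.func(*inputs, **self.config)


class Workflow:
    """Hypothetical workflow: AIMs connected as a directed acyclic graph."""

    def __init__(self) -> None:
        self.aims: Dict[str, AIM] = {}
        self.inputs: Dict[str, List[str]] = {}  # AIM name -> upstream AIM names

    def add(self, aim: AIM, upstream: Sequence[str] = ()) -> None:
        self.aims[aim.name] = aim
        self.inputs[aim.name] = list(upstream)

    def run(self, source: object) -> Dict[str, object]:
        results: Dict[str, object] = {}
        # static_order() yields every AIM after all of its upstream AIMs
        # and raises graphlib.CycleError if the graph is not a DAG.
        for name in TopologicalSorter(self.inputs).static_order():
            upstream = self.inputs[name]
            args = [results[u] for u in upstream] if upstream else [source]
            results[name] = self.aims[name].process(*args)
        return results


# Toy three-step chain; the functions are placeholders, not real AI modules.
wf = Workflow()
wf.add(AIM("denoise", lambda x: [v for v in x if v >= 0]))
wf.add(AIM("scale", lambda x, gain=1: [v * gain for v in x]), upstream=["denoise"])
wf.add(AIM("total", sum), upstream=["scale"])
wf.aims["scale"].configure(gain=2)
for aim in wf.aims.values():
    aim.start()
results = wf.run([1, -2, 3])
print(results["total"])  # 8
```

Because each step is reached only through `process`, any AIM in the chain could be swapped for another implementation with the same inputs and outputs, which is the property the requirements aim for. A suspended or stopped AIM makes `run` fail with a `RuntimeError` in this sketch; a real framework would need a richer policy.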

3        Context-based Audio Enhancement

Context-based Audio Enhancement (MPAI-CAE) includes 4 use cases. In Emotion-enhanced speech (Figure 4), emotionless synthesised or natural speech is enhanced with a specified emotion of a specified intensity. In Audio recording preservation (Figure 5), sound from an old audio tape is enhanced and a preservation master file is produced using a video camera pointed at the magnetic head.

Figure 4 – Emotion-enhanced speech
Figure 5 – Audio Recording Preservation

In Enhanced audioconference experience (Figure 6), speech captured in an unsuitable environment (e.g., at home) is cleaned of unwanted sounds. In Audio-on-the-go (Figure 7), the audio experienced by a user in an environment preserves the external sounds that are considered relevant.

Figure 6 – Enhanced Audioconference Experience
Figure 7 – Audio-on-the-go

MPAI issued a Call for Technologies for MPAI-CAE [4] on 20 January 2021. Responses are due on 12 April 2021. Some of the requested technologies, documented in [3], are:

  • Emotion-enhanced speech use case
    • Emotion
    • Speech features
    • Emotion descriptors
  • Audio Recording Preservation use case
    • Tape image features
  • Enhanced audioconference experience use case
    • Microphone geometry information
    • Output device acoustic model KB query format
  • Audio-on-the-go use case
    • Sound features
    • Sound Categorisation
    • User hearing profile KB query format

4        Multimodal conversation

Currently, Multimodal conversation (MPAI-MMC) includes 3 use cases where a human holds an audio-visual conversation with a machine emulating human-to-human conversation in completeness and intensity. In Conversation with emotion (Figure 8), the human holds a dialogue using speech, video and possibly text. The machine responds with a synthesised voice and an animated face.
Figure 8 – Conversation with emotion

In Multimedia question answering (Figure 9), a human requests information about an object while displaying it. The machine responds with synthesised speech. In Personalized Automatic Speech Translation (Figure 10), a sentence uttered by a human is translated by a machine using a synthesised voice that preserves the speech features of the human.

Figure 9 – Multimodal Question Answering
Figure 10 – Personalized Automatic Speech Translation

MPAI issued a Call for Technologies for MPAI-MMC [6] on 20 January 2021. Responses are due on 12 April 2021.

Some of the technologies requested in [5] are

  • Conversation with emotion use case
    • Emotion
    • Text features
    • Speech features
    • Video features
    • Meaning
    • Sentence features
    • Text with emotion
    • Concept with emotion
  • Multimodal question answering use case
    • Image features
    • Emotion
    • Intention KB query format
    • Online dictionary query format
  • Personalised automatic speech translation use case
    • Speech features

5        Other standards in the pipeline

Compression and Understanding of Industrial Data (MPAI-CUI) is depicted in Figure 11. The standard enables AI-based filtering and extraction of key information from the flow of data produced by companies and by outside sources (e.g., data on vertical risks such as seismic and cyber risks) in order to assess the risks faced by a company. MPAI has completed the MPAI-CUI Functional Requirements [7], and MPAI Active Members are now developing the Framework Licence to facilitate the actual licences, which are to be developed outside of MPAI.

Server-based Predictive Multiplayer Gaming (MPAI-SPG) is depicted in Figure 12. The standard aims to minimise the audio-visual and gameplay discontinuities caused by high latency or packet losses during an online real-time game. In case information from a client is missing, the data collected from the clients involved in a particular game are fed to an AI-based system that predicts the moves of the client whose data are missing. MPAI is finalising the MPAI-SPG Functional Requirements [9].
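
The server-side logic described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `predict_next` and `server_tick` and the 2D position state are hypothetical, and the predictor is plain constant-velocity extrapolation from the missing client's own history, used here as a stand-in for the AI-based system, which according to the description would draw on data from all clients in the game.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float]


def predict_next(history: List[Position]) -> Position:
    """Stand-in predictor: constant-velocity extrapolation (not an AI model)."""
    if not history:
        raise ValueError("cannot predict without any history")
    if len(history) == 1:
        return history[-1]  # no velocity estimate yet: repeat the last state
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def server_tick(
    histories: Dict[str, List[Position]],
    updates: Dict[str, Optional[Position]],
) -> Dict[str, Position]:
    """Advance the game state by one tick, filling in missing client updates."""
    state: Dict[str, Position] = {}
    for client, history in histories.items():
        received = updates.get(client)  # None models a late or lost packet
        position = received if received is not None else predict_next(history)
        history.append(position)
        state[client] = position
    return state


# Client "a" lost its packet this tick; client "b" reported normally.
histories = {"a": [(0.0, 0.0), (1.0, 0.5)], "b": [(5.0, 5.0)]}
state = server_tick(histories, {"a": None, "b": (5.0, 6.0)})
print(state)  # {'a': (2.0, 1.0), 'b': (5.0, 6.0)}
```

The point of the sketch is the control flow: the game continues with a predicted state instead of stalling when a client's data are missing. Replacing `predict_next` with a learned model would leave `server_tick` unchanged.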

Figure 11 – AI-based Performance Prediction
Figure 12 – Server-based predictive Multiplayer Gaming


AI tools promise to reduce the bits/pixel further than has been achieved so far. AI-Enhanced Video Coding (MPAI-EVC) builds on the Essential Video Coding (EVC) standard. The EVC Baseline Profile, made of technologies that are more than 20 years old, has a performance comparable to High Efficiency Video Coding (HEVC). The EVC Main Profile typically reduces the bitrate by ca. 36% compared with HEVC for 4k resolution High Dynamic Range (HDR) sequences. MPAI uses EVC as the starting point and is replacing the blue blocks in Figure 13, which represent 10 existing EVC tools, with AI-based tools.

Figure 13 – MPAI-EVC Evidence Project

MPAI intends to obtain confirmation of the performance of AI tools used to replace DP-tools and will then issue a Call for Technologies. MPAI has developed stable MPAI-EVC Functional Requirements [10].

Integrative Genomic/Sensor Analysis (MPAI-GSA) is depicted in Figure 14. The standard uses AI to understand and compress the result of high-throughput experiments combining genomic/proteomic and other data, e.g., from video, motion, location, weather, medical sensors. So far, MPAI-GSA has been found applicable to 4 Use Areas, i.e., collections of homogeneous Use Cases.

In Integrative analysis of ‘omics datasets (Figure 14), molecular techniques (e.g., RNA-sequencing, ChIP-sequencing, HiC) and metadata are correlated and/or combined. An example is correlating gene regulation with an individual’s variants to determine actionable variants and a personalised therapy.

In Genomics and phenotypic/spatial data (Figure 15), molecular techniques (e.g., barcoded RNA-sequencing, metabolomics) and phenotype as determined by 2D/3D images or metadata (e.g., tissue composition, cell lineages) are correlated and/or combined.

Figure 14 – Integrative analysis of ‘omics datasets
Figure 15 – Genomics and Phenotypic/spatial data

In Genomics and behaviour (Figure 16), animal behaviour (e.g., of lab mice), neural patterns (e.g., from functional MRI-based experiments), genetic profile and reaction to drug administration are correlated and/or combined.

In Smart Farming (Figure 17), molecular techniques (e.g., RNA-sequencing, variants), monitoring images (e.g., growth rate, satellite imagery) and position tracking (e.g., GPS) are correlated and/or combined.

Figure 16 – Genomics and Behaviour
Figure 17 – Smart Farming

MPAI is finalising the MPAI-GSA Functional Requirements [8].

6        Conclusions

  1. MPAI standards address the problems identified in the introduction, bringing benefits to various actors:
  • Technology providers will be able to offer their conforming AIMs to an open market
  • Application developers will find on the open market the AIMs their applications need
  • Consumers will be offered a wider choice of better AI applications by a competitive market
  • Innovation will be fuelled by the demand for novel and better-performing AIMs
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications.
  2. AI-based data coding is future-proof as it allows MPAI to take advantage of emerging and future research in representation learning, transfer learning, edge AI, and reproducibility of performance.
  3. MPAI Framework Licences, where Intellectual Property Right (IPR) holders set out IPR guidelines in advance, alleviate the IPR-related problems which have accompanied high-tech standardisation based on vague and contention-prone Fair, Reasonable and Non-Discriminatory (FRAND) declarations.
  4. MPAI is aware of AI’s revolutionary impact on the future of human society and pledges to address ethical questions potentially raised by its technical work. A first significant step offered by AIMs and AIF is to make the inner workings of complex AI systems understandable.

7        References

  1. MPAI-AIF Use Cases and Functional Requirements, N74; https://mpai.community/standards/mpai-aif/#Requirements
  2. MPAI-AIF Call for Technologies, N100; https://mpai.community/standards/mpai-aif/#Technologies
  3. MPAI-CAE Use Cases and Functional Requirements, N151; https://mpai.community/standards/mpai-cae/#UCFR
  4. MPAI-CAE Call for Technologies, N152; https://mpai.community/standards/mpai-cae/#Technologies
  5. MPAI-MMC Use Cases and Functional Requirements, N153; https://mpai.community/standards/mpai-mmc/#Requirements
  6. MPAI-MMC Call for Technologies, N154; https://mpai.community/standards/mpai-mmc/#Technologies
  7. MPAI-CUI Use Cases and Functional Requirements, N155; https://mpai.community/standards/mpai-cui/#Requirements
  8. Draft MPAI-GSA Use Cases and Functional Requirements, N156; https://mpai.community/standards/mpai-gsa/#Requirements
  9. Draft MPAI-SPG Use Cases and Functional Requirements, N157; https://mpai.community/standards/mpai-spg/#UCFR
  10. MPAI-EVC Use Cases and Functional Requirements, N92; https://mpai.community/standards/mpai-evc/#Requirements