At its 40th General Assembly (MPAI-40), MPAI approved one draft standard, one new standard, and three extensions of existing standards. For an organisation that already has nine standards to its name, this may not look like big news. There are two reasons, though, to consider this a remarkable moment in MPAI's short but intense life.
The first reason is that the draft standard posted for Community Comments – Human and Machine Communication (MPAI-HMC) – does not specify new technologies but leverages technologies from existing MPAI standards: Context-based Audio Enhancement (MPAI-CAE), Multimodal Conversation (MPAI-MMC), the newly approved Object and Scene Description (MPAI-OSD), and Portable Avatar Format (MPAI-PAF).
If not new technologies, what does MPAI-HMC specify then? To answer this question, let's consider Figure 1.
Figure 1 – The MPAI-HMC communications model
The human labelled #1 is part of a scene with audio and visual attributes and communicates with the Machine by transmitting speech information and the entire audio-visual scene, including him- or herself. The Machine receives that information, processes it, and emits internally generated audio-visual scenes in which it utters vocal, and displays visual, manifestations of its internal state, generated to interact more naturally with the human. The human may also communicate with the Machine while other humans are in the scene; the Machine can discern the individual humans and identify (i.e., assign a name to) audio and visual objects. However, only one human at a time can communicate with the Machine.
The Machine need not capture the human in a real space: his or her digital representation can be rendered in a Virtual Space as a Digitised Human. There, the human may not be alone but share the space with other Digitised Humans or with Virtual Humans, i.e., audio-visual representations of processes such as Machines. For this reason, we will use the word Entity to indicate either a human or their avatar, or a Machine rendered as an avatar.
The Machine can also act as an interpreter between Entities in different Contexts: those labelled #1 or #2 on one side and #3 or #4 on the other. By Context we mean the information surrounding an Entity that provides additional insight into the information the Entity communicates. An example of Context is language and, more generally, culture.
Communication between #1 and #3 represents the case of a human in one Context communicating with a Machine, e.g., an information service, in another Context. Here the Machine communicates with the human by sensing and actuating audio-visual information, while the communication between the Machine and #3 may use a different protocol. The payload of that communication is the "Portable Avatar", a Data Type specified by the MPAI-PAF standard that represents an Avatar and its Context.
In other words, communication between the human in #1 and the Machine is based on raw audio-visual information, while communication between the Machine and Entity #3 is carried out using Portable Avatars.
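To make the idea concrete, here is a minimal, hypothetical sketch in Python of what a Portable Avatar payload might look like. The field names (`entity_id`, `avatar_model`, `speech`) and the `translate_context` helper are illustrative assumptions for this article, not the normative MPAI-PAF data type.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Hypothetical Context: information surrounding an Entity."""
    language: str = ""   # e.g., a BCP 47 tag such as "en-GB"
    culture: str = ""    # free-form cultural hint

@dataclass
class PortableAvatar:
    """Illustrative sketch of a Portable Avatar payload (not the MPAI-PAF spec)."""
    entity_id: str                 # which Entity this payload represents
    avatar_model: bytes = b""      # avatar geometry/appearance data
    speech: bytes = b""            # encoded speech of the Entity
    context: Context = field(default_factory=Context)

def translate_context(pa: PortableAvatar, target_language: str) -> PortableAvatar:
    """Return a copy whose Context is adapted to the target language.
    A real interpreter Machine would also convert the speech itself."""
    return PortableAvatar(
        entity_id=pa.entity_id,
        avatar_model=pa.avatar_model,
        speech=pa.speech,
        context=Context(language=target_language, culture=pa.context.culture),
    )

# The interpreter Machine receives a payload from Entity #1 in one Context
# and forwards it to Entity #3 in another:
pa = PortableAvatar(entity_id="human-1", context=Context(language="it-IT"))
out = translate_context(pa, "en-GB")
```

The point of the sketch is that the payload bundles the avatar, its speech, and its Context into one transportable unit, so the receiving Entity needs no access to the sender's real space.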
Read a collection of usage scenarios.
The name of the standard is Human and Machine Communication (MPAI-HMC). It is published as a draft with a request for Community Comments, the last step before publication. Comments are due by 2024/02/19T23:59 UTC to firstname.lastname@example.org.
To explain the second reason why the 40th General Assembly is a remarkable moment, we have to recall that most MPAI application standards are based on the notion of an AI Workflow (AIW) composed of interconnected AI Modules (AIMs) executed in the AI Framework (AIF) specified by the MPAI-AIF standard. Four of the five documents are now published in a new format in which the Use Cases, AI Modules, and Data Types chapters refer to a common body of AIMs and Data Types.
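As a rough illustration of this composition idea, an AIW can be pictured as a chain of AIMs whose outputs feed the next module's inputs, with the AIF orchestrating execution. The class names and interfaces below are assumptions made for this sketch, not the normative MPAI-AIF APIs.

```python
from typing import Callable, Dict, List

class AIModule:
    """Illustrative AIM: a named, reusable processing component."""
    def __init__(self, name: str, fn: Callable[[Dict], Dict]):
        self.name = name
        self.fn = fn

    def execute(self, data: Dict) -> Dict:
        return self.fn(data)

class AIWorkflow:
    """Illustrative AIW: a chain of AIMs run in sequence.
    In MPAI-AIF the AI Framework would orchestrate this execution."""
    def __init__(self, modules: List[AIModule]):
        self.modules = modules

    def run(self, data: Dict) -> Dict:
        for m in self.modules:
            data = m.execute(data)
        return data

# Toy pipeline built from two "common" AIMs (their behaviour is faked here):
recognise = AIModule("SpeechRecognition", lambda d: {**d, "text": "hello"})
translate = AIModule("Translation", lambda d: {**d, "text": d["text"].upper()})
aiw = AIWorkflow([recognise, translate])
result = aiw.run({"speech": b"..."})
```

Because each AIM exposes the same interface, different standards can assemble different workflows from the same shared pool of modules, which is exactly what the common body of AIMs and Data Types enables.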
Component-based software engineering aims to build software out of modular components. MPAI is implementing this notion in the world of standards.
See the links below and enjoy: