
Figure 2 – Emotion extraction using legacy technologies
In the system of Figure 2, each of the three initial blocks extracts features from the input data and uses the resulting feature vector to query an appropriate Knowledge Base, which responds with one or more candidate emotions.
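To make the idea concrete, here is a minimal sketch, with purely hypothetical names, of the interface one of these blocks could expose: a feature vector goes in, a list of candidate emotions comes back. None of this is specified by MPAI; it only illustrates the data flow of Figure 2.

```python
from dataclasses import dataclass

@dataclass
class CandidateEmotion:
    label: str      # e.g. "anger", "joy", "surprise" (illustrative labels)
    score: float    # confidence, assumed normalised to [0, 1]

class EmotionKnowledgeBase:
    """Hypothetical Knowledge Base mapping feature vectors to emotions."""
    def query(self, features: list[float]) -> list[CandidateEmotion]:
        # A real KB would match the feature vector against stored patterns;
        # the matching logic is omitted here.
        ...

def analyse_speech(audio_features: list[float],
                   kb: EmotionKnowledgeBase) -> list[CandidateEmotion]:
    # The block extracts features from the input data (not shown) and uses
    # the resulting feature vector to query the appropriate Knowledge Base.
    return kb.query(audio_features)
```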
In fact, Video analysis and Language understanding do more than provide emotion information. As Figure 3 shows, the two blocks additionally provide Meaning, i.e., information extracted from the text and video such as question, statement, exclamation, expression of doubt, request, invitation, etc.
Figure 3 – Emotion and meaning enter Dialogue processing
Meaning and emotion are fed into the Dialogue processing component. Note that in a legacy implementation Dialogue processing, too, needs access to a Dialogue Knowledge Base. From now on, however, we will assume a fully AI-based implementation.
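As an illustration only, the kinds of Meaning listed above and the input that Dialogue processing receives could be modelled along these lines; all names are assumptions, not MPAI definitions.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Meaning(Enum):
    # The categories named in the text above
    QUESTION = auto()
    STATEMENT = auto()
    EXCLAMATION = auto()
    DOUBT = auto()
    REQUEST = auto()
    INVITATION = auto()

@dataclass
class DialogueInput:
    text: str          # recognised text from Speech recognition
    meaning: Meaning   # from Language understanding / Video analysis
    emotion: str       # emotion information from the analysis blocks

def dialogue_processing(inp: DialogueInput) -> str:
    """Hypothetical fully AI-based Dialogue processing: produces a reply
    conditioned on both the meaning and the emotion of the input."""
    ...
```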
Dialogue processing produces two streams of data, as depicted in Figure 4:
Figure 4 – End-to-end multimodal conversation with emotion
The last element needed to move from theory to practice is an environment where you can place the blocks (which MPAI calls AI Modules, or AIMs), establish all the connections, activate all the timings, and execute the chain. One may even want to train or retrain the individual neural networks.
The technology that makes this possible is the MPAI AI Framework (MPAI-AIF), for which a Call for Technologies was published on 2020/12/16, with responses due on 2021/02/15. The full scheme of Multimodal conversation with emotion in the MPAI AI Framework is shown in Figure 5.
Figure 5 – Multimodal conversation with emotion in the MPAI AI Framework
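To give a feel for what such an environment does, here is a toy sketch of placing AIMs and establishing connections between them. Every name here is illustrative; the actual MPAI-AIF interfaces will be defined on the basis of the responses to the Call for Technologies.

```python
class Framework:
    """Toy stand-in for the environment MPAI-AIF is meant to provide:
    a place to put AIMs, wire their connections and execute the chain."""

    def __init__(self):
        self.aims = {}     # name -> AIM implementation
        self.links = []    # (source port, destination port) pairs

    def place(self, name, aim):
        # Place an AIM in the framework under a given name.
        self.aims[name] = aim

    def connect(self, src_port, dst_port):
        # Establish a connection, e.g. "SpeechAnalysis.emotion"
        # feeding "DialogueProcessing.emotion".
        self.links.append((src_port, dst_port))

    def run(self, inputs):
        # A real framework would schedule each AIM, move data along the
        # links and handle all the timings; omitted here.
        ...

fw = Framework()
fw.place("SpeechAnalysis", object())        # stand-ins for real AIMs
fw.place("DialogueProcessing", object())
fw.connect("SpeechAnalysis.emotion", "DialogueProcessing.emotion")
```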
The six components making up MPAI-AIF are:
What does MPAI intend to standardise in Conversation with emotion? In a fully AI-based implementation:
In the case of a legacy implementation, in addition to the above we need four query formats:
As you can see, MPAI standardisation is minimal, in tune with the basic rule of good standardisation: specify the minimum that is necessary for interoperability. In the MPAI case, the minimum is what is required to assemble a working system using AIMs from independent sources.
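As a purely illustrative example of this minimality, the data crossing an AIM interface could be as small as the payload below. The field names are assumptions; the actual formats will be specified by the standard.

```python
import json

# Hypothetical interchange payload: only what two independently sourced
# AIMs must agree on -- the format of the data crossing the interface,
# not how either AIM computes it.
emotion_payload = {
    "emotion": "joy",   # label from an agreed emotion vocabulary
    "degree": 0.8,      # intensity, assumed normalised to [0, 1]
}

print(json.dumps(emotion_payload))
```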
What is MPAI going to do with Conversation with Emotion and the other use cases in the MPAI-MMC standard? The next MPAI General Assembly (MPAI-5), to be held on 2021/02/17, will likely approve the MPAI-MMC Call for Technologies.
Stay tuned to the MPAI-MMC web page, and also to the companion MPAI-CAE (Context-based Audio Enhancement) web page, because MPAI-5 is likely to approve the MPAI-CAE Call for Technologies as well.