Object and Scene Description (MPAI-OSD) is a project for a standard specifying technologies for the description of objects and their localisation in space. Such technologies are used across several use cases of several MPAI standards.
Figure 1 gives two examples of the types of output assumed for Audio and Visual Scene Descriptors.
Figure 1 – Audio and Visual Scene Description
Figure 2 provides one solution to the problem of assigning identifiers to the Objects extracted from an audio-visual scene, especially for the purpose of identifying those that are audio-visual, such as a human and their speech.
Figure 2 – Audio-Visual Alignment
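To make the identifier-assignment problem concrete, here is a minimal sketch of one possible approach: pairing audio and visual Objects by spatial proximity and giving each pair a shared identifier. It does not reproduce the solution shown in Figure 2. The `SceneObject` type, the `align` function, the `max_distance` threshold and the `AV`/`A`/`V` identifier prefixes are all hypothetical names introduced for illustration; none of them is defined by MPAI-OSD.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Tuple


@dataclass
class SceneObject:
    # Hypothetical structure, not an MPAI-OSD data format.
    local_id: str                          # identifier given by the audio or visual descriptor
    position: Tuple[float, float, float]   # assumed (x, y, z) position in metres


def align(
    audio: List[SceneObject],
    visual: List[SceneObject],
    max_distance: float = 0.5,             # assumed matching threshold in metres
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Greedily pair audio and visual objects that lie close together.

    Returns (shared_id, audio_local_id, visual_local_id) tuples; unmatched
    objects keep a single-modality identifier and None for the other side.
    """
    # All candidate pairs, closest first.
    candidates = sorted(
        ((dist(a.position, v.position), a, v) for a in audio for v in visual),
        key=lambda c: c[0],
    )
    used_audio, used_visual = set(), set()
    result = []
    for distance, a, v in candidates:
        if distance > max_distance:
            break  # remaining pairs are even farther apart
        if a.local_id in used_audio or v.local_id in used_visual:
            continue  # each object is matched at most once
        used_audio.add(a.local_id)
        used_visual.add(v.local_id)
        result.append((f"AV{len(result) + 1}", a.local_id, v.local_id))

    # Objects without a counterpart stay audio-only or visual-only.
    audio_only = [a for a in audio if a.local_id not in used_audio]
    visual_only = [v for v in visual if v.local_id not in used_visual]
    result += [(f"A{i}", a.local_id, None) for i, a in enumerate(audio_only, 1)]
    result += [(f"V{i}", None, v.local_id) for i, v in enumerate(visual_only, 1)]
    return result


if __name__ == "__main__":
    speech = [SceneObject("a1", (0.0, 0.0, 1.0)), SceneObject("a2", (5.0, 0.0, 0.0))]
    bodies = [SceneObject("v1", (0.1, 0.0, 1.0)), SceneObject("v2", (-3.0, 0.0, 2.0))]
    for row in align(speech, bodies):
        print(row)
```

In this sketch a speaking human would appear as one audio Object (the speech) and one visual Object (the body) at nearly the same position, so the two would receive a common `AV` identifier. A real system would likely need more than position alone, for example when several people stand close together.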
Another example is provided by Figure 3.
Figure 3 – Visual Spatial Object Identification
Figure 4 is an example of the Conversation with Personal Status use case that makes use of all the (Composite) AI Modules described above.
Figure 4 – Reference Model of Conversation with Personal Status (MPAI-CPS)
MPAI is seeking proposals of data formats and reference models for the identified application areas. The deadline for submitting a response is September 20 at 23:59 UTC. Those intending to submit a response should become fully familiar with the following documents:
Call for Technologies | html, pdf |
Use Cases and Functional Requirements | html, pdf |
Framework Licence | html, pdf |
Template for responses | html, docx |
See also the video recordings (YouTube, WimTV) and the slides of the presentation made on 07 September.