
A look inside the MPAI-XRV project

  • Post category:MPAI

XR Venues is an MPAI project (MPAI-XRV) addressing use cases enabled by Extended Reality (XR) technologies – the combination of Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR) – and enhanced by Artificial Intelligence (AI) technologies. The word “venue” is used as a synonym for “real and virtual environments”.

The XRV group has identified some 10 use cases and made a detailed analysis of three of them: eSports Tournament, Live theatrical stage performance, and Experiential retail/shopping.

How did XRV become an MPAI project? MPAI responds to industry needs with a rigorous process of 8 phases, from Interest Collection up to Technical Specification. The initial part of the process:

  1. Starts with the submission of a proposal triggering the Interest Collection stage where the interest of people other than the proposers is sought.
  2. Continues with the Use Cases stage where applications of the proposal are studied.
  3. Concludes with the Functional Requirements stage where the AI Workflows implementing the developed use cases and their composing AI Modules are identified with their functions and data formats.

Let’s see how things are developing in the XR Venues project (MPAI-XRV), now at the Functional Requirements stage, by describing the eSports Tournament use case. This consists of two teams of 3 to 6 players arranged on either side of a real-world (RW) stage, each using a computer to compete within a real-time Massively Multiplayer Online game space.

Figure 1 – An eSports Tournament

The game space occurs in a virtual world (VW) populated by:

  1. Players represented by avatars, each driven by role (e.g., magicians, warriors, soldiers), properties (e.g., costumes, physical form, physical features), and actions (e.g., casting spells, shooting, flying, jumping).
  2. Avatars representing other players, autonomous characters (e.g., dragons, monsters, various creatures), and environmental structures (e.g., terrain, mountains, bodies of water).

The game action is captured by multiple VW cameras, projected onto an RW immersive screen surrounding spectators, and live streamed to remote spectators as a 2D video with all related sounds of the VW game space.

A shoutcaster calls the action as the game proceeds. The RW venue (XR Theatre) includes one or more immersive screens where the image of RW players, player stats, or other information or imagery may also be displayed; these may also be live streamed. The RW venue is augmented with lighting and special effects, music, and costumed performers.

Live stream viewers interact with one another and with commentators through live chats, Q&A sessions, etc., while RW spectators interact through shouting, waving, and interactive devices (e.g., LED wands, smartphones). RW spectators’ features are extracted and interpreted from data captured by cameras, microphones, or wireless data interfaces.

Actions are generated from RW or remote audience behaviour and VW action data (e.g., spell casting, characters dying, bombs exploding).

At the end of the tournament, an award ceremony featuring the winning players on the RW stage is held with great fanfare.

eSports Tournament is a representative example of the XRV project where human participants are exposed to real and virtual environments that interact with one another. Figure 2 depicts the general model representing how data from a real or virtual environment are captured, processed, and interpreted to generate actions transformed into experiences that are delivered to another real or virtual environment.

Figure 2 – Environment A to Environment B Interactions

Irrespective of whether Environment A is real or virtual, Environment Capture captures signals and/or data from the environment, Feature Extraction extracts descriptors from the data, and Feature Interpretation yields interpretations by analysing those descriptors. Action Generation generates actions by analysing the interpretations, Experience Generation translates actions into an experience, and Environment Rendering delivers the signals and/or data corresponding to the experience into Environment B, whether real or virtual. Of course, the same sequence of steps can occur in the right-to-left direction, starting from Environment B.
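The six stages above can be sketched as a simple left-to-right processing chain. This is only an illustrative sketch: every function name and data shape below is hypothetical, since MPAI-XRV defines the actual AI Modules and data formats at the Functional Requirements stage.

```python
# Illustrative sketch of the Figure 2 pipeline (Environment A -> Environment B).
# All names and data shapes are hypothetical, not part of any MPAI specification.

def environment_capture(environment: dict) -> dict:
    """Capture signals and/or data from Environment A (real or virtual)."""
    return {"audio": environment.get("audio"), "video": environment.get("video")}

def feature_extraction(signals: dict) -> dict:
    """Extract descriptors from the captured signals."""
    return {"descriptors": [k for k, v in signals.items() if v is not None]}

def feature_interpretation(features: dict) -> dict:
    """Analyse descriptors to yield an interpretation."""
    return {"interpretation": "excited" if features["descriptors"] else "idle"}

def action_generation(interpretation: dict) -> dict:
    """Generate actions by analysing the interpretation."""
    return {"action": "trigger_light_effect", "cause": interpretation["interpretation"]}

def experience_generation(action: dict) -> dict:
    """Translate the action into an experience."""
    return {"experience": f"{action['action']} rendered as light and sound"}

def environment_rendering(experience: dict) -> dict:
    """Deliver the experience into Environment B (real or virtual)."""
    return {"delivered_to_B": experience["experience"]}

def pipeline(env_a: dict) -> dict:
    """Run Environment A data through all six stages in order."""
    data = env_a
    for stage in (environment_capture, feature_extraction, feature_interpretation,
                  action_generation, experience_generation, environment_rendering):
        data = stage(data)
    return data
```

The same chain run in the opposite order of environments models the right-to-left direction mentioned above.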

A thorough analysis of the eSports Tournament use case has led the XRV group to develop the reference model depicted in Figure 3.

Figure 3 – Reference Model of eSports Tournament

The AI Modules on the left-hand side and in the middle of the reference model perform the Feature Extraction and Feature Interpretation functions identified in Figure 2. The data they generate are:

  1. Player Status is the ensemble of information internal to a player, expressed by Emotion, Cognitive State, and Attitude, estimated from the Audio, Video, Controller, and App data of the individual players.
  2. Participants Status is the ensemble of information, expressed by Emotion, Cognitive State, and Attitude of participants, estimated from the collective behaviour of Real World and online spectators in response to the actions of a team, a player, or the game, through audio, video, interactive controllers, and smartphone apps. Both data types are similar to the Personal Status developed in the context of Multimodal Conversation Version 2.
  3. Game State is estimated from Player location and Player Action (both in the VW), Game Score and Clock.
  4. Game Action Status is estimated from Game State, Player History, Team History, and Tournament Level.
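The four data streams can be pictured as simple data containers. The field names below are illustrative guesses at what each stream carries, based only on the descriptions above; the actual MPAI-XRV data formats are what the Functional Requirements stage defines.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical containers for the four data streams; field names are
# illustrative only, not the MPAI-XRV data formats.

@dataclass
class PersonalStatus:
    """Shared shape of Player Status and Participants Status (cf. the
    Personal Status of Multimodal Conversation Version 2)."""
    emotion: str = "neutral"
    cognitive_state: str = "attentive"
    attitude: str = "engaged"

@dataclass
class PlayerStatus(PersonalStatus):
    player_id: str = ""          # estimated per individual player

@dataclass
class ParticipantsStatus(PersonalStatus):
    source: str = "RW+online"    # collective RW and online spectator behaviour

@dataclass
class GameState:
    player_locations: List[Tuple[float, float, float]] = field(default_factory=list)  # in the VW
    player_actions: List[str] = field(default_factory=list)                           # in the VW
    game_score: dict = field(default_factory=dict)
    clock_s: float = 0.0

@dataclass
class GameActionStatus:
    game_state: GameState = field(default_factory=GameState)
    player_history: List[str] = field(default_factory=list)
    team_history: List[str] = field(default_factory=list)
    tournament_level: str = "qualifier"
```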

The four data streams are variously combined by the three AI Modules on the right-hand side to generate effects in the RW and VW, and to orientate the cameras in the VW. These correspond to the Action Generation, Experience Generation, and Environment Rendering of Figure 2.

The definition of interfaces between the AI Modules of Figure 3 will enable the independent development of those AI Modules with standard interfaces. An XR Theatre will be able to host a pre-existing game and produce an eSports Tournament supporting RW and VW audience interactivity. To the extent that the game possesses the required interfaces, the XR Theatre can also drive actions within the VW.
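The benefit of standard interfaces can be sketched as follows: each AI Module declares fixed input and output ports, so any independently developed implementation with the same ports is interchangeable. Everything here is a hypothetical illustration; the actual module and workflow interfaces are defined by MPAI's specifications.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of an AI Module with a standard interface; the port
# names and module class are illustrative, not taken from MPAI-XRV.

class AIModule(ABC):
    """An AI Module with declared input/output ports and data formats."""

    input_ports: tuple = ()
    output_ports: tuple = ()

    @abstractmethod
    def process(self, inputs: dict) -> dict:
        """Map data on the input ports to data on the output ports."""

class VWEffectsGenerator(AIModule):
    """Stand-in for a right-hand-side module that combines data streams
    to generate effects in the Virtual World."""

    input_ports = ("PlayerStatus", "ParticipantsStatus", "GameActionStatus")
    output_ports = ("VWEffects",)

    def process(self, inputs: dict) -> dict:
        # Toy logic: escalate the VW effect when the audience is excited.
        excited = inputs.get("ParticipantsStatus") == "excited"
        return {"VWEffects": "fireworks" if excited else "ambient"}
```

Because the ports are fixed, a different vendor's `VWEffectsGenerator` could be dropped into the same workflow without changing the modules upstream or downstream of it.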

eSports has grown substantially in the last decade. Arena-sized eSports Tournaments of increasing complexity are now routine. An XRV Venue dedicated to eSports and enabled by AI can greatly enhance the participants’ experience with powerful multi-sensory, interactive, and highly immersive media, while lowering the complexity of the system and the required human resources. Standardised AI Modules for an eSports XRV Venue enhance interoperability across different games and simplify experience design.