Microsoft's Magma: A Unified AI Model for Digital and Physical Interaction

Edited by: Veronika Radoslavskaya

Microsoft Research has unveiled Magma, a foundation model that combines visual and language processing to control both software interfaces and robotic systems. Unlike earlier AI systems that rely on separate models for perception and control, Magma integrates both capabilities into a single base model. Microsoft positions Magma as a step towards agentic AI: a model that can autonomously plan and execute complex tasks. Magma builds on transformer-based LLM technology and adds spatial intelligence through training on images, videos, robotics data, and UI interactions. As a result, it can act as a multimodal agent that navigates user interfaces and manipulates physical objects in pursuit of user-defined goals.
