The Grand Challenge on Multimodal Superintelligence
Text, Audio, Vision, and 3D
multimodal-ai.com
Call for Participation
Lambda Research invites researchers, engineers, and practitioners to participate in the Grand Challenge on Multimodal Superintelligence, an open initiative to design the blueprints for next-generation open-source multimodal AI systems. Each participating team may receive up to $20,000 in Lambda.ai compute credits to accelerate model development. The challenge provides both technical resources and a collaborative platform for advancing the science and engineering of multimodal intelligence. Visit multimodal-ai.com for more information.
Scope and Objectives
The Grand Challenge spans text, audio, vision, and 3D data, with a central focus on developing any-to-any multimodal models. Participants are expected to build systems capable of accepting arbitrary subsets of modalities as input and producing arbitrary subsets as output.
Key goals include:
- Exploring architectures that enable seamless integration across diverse modalities.
- Demonstrating proof-of-concept innovations in flexible “any-to-any” generation.
- Advancing open-source frameworks that reduce data preprocessing burdens through provided custom data-loader utilities, allowing participants to concentrate on modeling innovations.
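To make the any-to-any requirement concrete, the short Python sketch below enumerates the input/output combinations that four modalities imply. It is only an illustration, not part of the challenge tooling: the names MODALITIES, non_empty_subsets, and any_to_any_tasks are hypothetical, and it assumes that both the input and the output subset are non-empty, which the call itself does not state.

```python
from itertools import combinations

# The four modalities named in the call; the string labels are illustrative.
MODALITIES = ("text", "audio", "vision", "3d")


def non_empty_subsets(items):
    """Return every non-empty subset of items as a tuple, smallest first."""
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


def any_to_any_tasks(modalities=MODALITIES):
    """Pair every non-empty input subset with every non-empty output subset."""
    subsets = non_empty_subsets(modalities)
    return [(inputs, outputs) for inputs in subsets for outputs in subsets]


tasks = any_to_any_tasks()
print(len(non_empty_subsets(MODALITIES)))  # 15 subsets (2**4 - 1)
print(len(tasks))                          # 225 input/output pairings
```

Under these assumptions a fully general system would face 225 distinct input/output configurations, which gives a sense of why a team might choose to specialize in a smaller set of modalities.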
Participation Tracks
Participants may join under one of three categories:
- Sponsored Participants – Teams awarded compute credits (up to $20,000 per team) based on the strength of their proposal.
- Alpha Participants – Sponsored teams who additionally contribute to the alpha version of our streaming server by porting datasets into our universal data format. These participants receive extra credits for their contributions.
- Independent Participants – Teams opting to participate without compute sponsorship.
Specialization
While the vision is “any-to-any” multimodal capability, teams may specialize in one or two modalities. Such specialization must be explicitly justified in order to qualify for compute sponsorship.
Timeline
- Challenge Begins: September 2, 2025
- Private Test Example Set with Labels Released: October 5, 2025
- Private Test Service Open: October 15, 2025
- Last Call for Private Test Submissions: December 10, 2025
- Winners Announcement: December 10, 2025
(All deadlines are 11:59 PM, anywhere on Earth.)
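For teams converting these deadlines to their own time zone, the sketch below shows one way to do the arithmetic in Python. It assumes the common convention that "anywhere on Earth" (AoE) means UTC-12; the helper name aoe_deadline_in_utc is hypothetical and not part of any challenge tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "anywhere on Earth" is treated as the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year, month, day):
    """Return 11:59 PM AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


final = aoe_deadline_in_utc(2025, 12, 10)
print(final.isoformat())  # 2025-12-11T11:59:00+00:00
```

Under this assumption, the December 10, 2025 deadline falls at 11:59 AM UTC on December 11, 2025.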
Evaluation and Criteria
The primary evaluation criterion is the originality and potential of the idea. Participants must provide proof-of-concept results by December 10, 2025. Fully developed foundation models are not required at this stage; rather, emphasis will be placed on creativity, feasibility, and prospects for scaling.
Outstanding teams from the first stage may receive extended support from Lambda to scale their systems into open-source foundation models.
Vision
This Grand Challenge is not merely a competition but a collaborative movement to build AI that sees, hears, reads, speaks, and reasons. Together, we aim to lay the foundation for the next generation of open-source multimodal superintelligence.
How to Participate
Proposals and applications for sponsorship should be submitted via the challenge platform (multimodal-ai.com). Registered teams will receive full participation guidelines, dataset access, and instructions for submitting their work.