

$1 Million To Understand What Happens Inside LLMs
To advance the field, we're announcing a $1M prize for work in interpretability, with a focus on code generation: currently the most prevalent use case for LLMs, and one we think is particularly well-suited to interpretability research.
Why Interpretability Matters
Using AI models today is like alchemy: we can do seemingly magical things, but don't understand how or why they work.
You don't need chemistry to do incredible things. But chemistry gives you control. It's the difference between accidentally discovering phosphorus by boiling urine and systematically saving a billion lives by improving agriculture. As the march to AGI takes us down unknown paths of AI development and deployment, we need fundamental and generalizable ways of controlling models.
Consider what it would mean to have chemistry, not just alchemy, for code generation. Today's models are:
- Unreliable, requiring constant developer intervention for long-horizon tasks
- Unsafe, making undesirable changes when stuck or facing adversarial situations
- Prone to reward gaming, writing code that superficially passes tests rather than solving the underlying problem
- Slow, especially when they go down wrong paths before finding the right one
- Inefficient, requiring huge amounts of tokens, cost, and time
- Opaque, where small changes to the agentic harness cause massive, unpredictable performance swings
All of these problems can be addressed by understanding why they happen and implementing principled fixes. That's not what companies do today. Instead, they boil data into post-training or prompts and hope the model behaves better.
Interpretability is already making a dent. We've done work in collaboration with researchers from Stanford, Oxford, and Anthropic that identifies and fixes Chain-of-Thought Hijacking using interpretability-based interventions. We've also generated substantial revenue deploying interpretability-powered solutions to enterprises.
Code generation is also a field uniquely suited to yielding interpretability insights. Code can be modeled formally, making it easier to map the concepts models use onto those used by humans and to check properties like safety and correctness. The problems in interpretability are also analogous to those in code generation: in both, we are trying to extract human-understandable algorithms from the models we train.
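To make the "code can be modeled formally" point concrete, here is a minimal sketch, not taken from this RFP, of the kind of property checking that formal structure makes cheap: model-generated code can be parsed into an AST and screened against human-readable safety and correctness properties before it runs. The property list and helper name below are illustrative assumptions, not part of any proposed system.

```python
# Minimal sketch: parse model-generated code into a formal structure (an AST)
# and check it against simple, human-readable properties.
import ast

UNSAFE_CALLS = {"eval", "exec"}  # illustrative safety property chosen for this sketch

def check_generated_code(source: str) -> list[str]:
    """Return a list of human-readable property violations found in `source`."""
    problems = []
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"does not parse: {err.msg} (line {err.lineno})"]

    for node in ast.walk(tree):
        # Safety: flag calls to functions we consider unsafe.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in UNSAFE_CALLS:
                problems.append(f"calls unsafe function `{node.func.id}` on line {node.lineno}")
        # Correctness smell: bare `except:` blocks can mask failures and fake test passes.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            problems.append(f"bare `except:` on line {node.lineno} can hide errors")
    return problems

# Example: a snippet a model might produce to "pass the tests".
snippet = """
def run(cmd):
    try:
        return eval(cmd)
    except:
        return None
"""
for problem in check_generated_code(snippet):
    print(problem)
```

The same formal structure is what makes it tractable to ask whether a model's internal representations track human concepts like "this call is unsafe" or "this branch never executes", rather than only inspecting its outputs.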
Chemistry emerged from practitioners tackling practical problems while applying enough rigor to learn generalizable lessons—and each success drove both deeper insight and greater investment. We think interpretability can follow the same path through code generation.
The Prize
We will award $1M to researchers whose work best advances interpretability. The prize will be distributed as:
- Grants to fund promising research directions (apply here)
- Awards for completed work we judge to have made significant progress
While we're particularly excited about applications to code generation, we welcome any work that makes fundamental progress on interpretability. Strong foundational work will find downstream applications.
In Part 2, we go into technical details and lay out the four core problems we think are most important for the field—and the approaches we find most promising.
Researchers
Read Part 2 for our detailed research agenda and apply here:
Others interested in following progress
Sign up for updates here:
Thanks to Ana Marasović, Hadas Orgad, Jeff Phillips, John Hewitt, Leo Gao, Mor Geva, Neel Nanda, Stephen Casper, and Yonatan Belinkov for reviewing this RFP and providing feedback.