On March 23, Jensen Huang told Lex Fridman: “I think it’s now. I think we’ve achieved AGI.”
Two days later, on March 25, François Chollet launched ARC-AGI-3 — a benchmark specifically designed to test whether AI systems can genuinely learn and reason, rather than retrieve and pattern-match. The best AI system in the world scored 0.37% of human performance. Humans score 100%.
Both things are true simultaneously. That tells you something important — not about AI, but about the word “AGI.”
- Huang's AGI declaration: March 23, 2026, on the Lex Fridman Podcast
- ARC-AGI-3 launched: March 25, 2026; tests genuine learning, no instructions given
- Best model score (Gemini 3.1 Pro Preview): 0.37% of human performance
- GPT-5, Claude, all other frontier models: below 1%
- Human score: 100%
- Prize for human-level AI performance: $2 million, unclaimed
- Metaculus median forecast for AGI: March 2028, range 2027–2045
What Huang actually said
The quote in full context matters. Fridman posed a specific hypothetical: could an AI start and grow a tech company to $1 billion in value? Huang’s response was that no 5-to-20 year wait was needed — it’s already here. His illustrative scenario: “It is not out of the question that a Claude was able to create a web service, some interesting little app that all of a sudden, a few billion people used for 50 cents, and then it went out of business again shortly after.”
That is a definition of AGI. It is one person's definition, and it conveniently describes capabilities that Huang's chips already enable. It is not the definition in any academic paper, any regulatory framework, or any research benchmark. It is also not the definition in the contract between Microsoft and OpenAI, which matters more than the current coverage acknowledges.
The contract that defines AGI with a number
Microsoft and OpenAI’s foundational agreement — originally 2019, updated through 2025 — contains an explicit financial definition of AGI: a system that can generate at least $100 billion in profits. This clause was designed by OpenAI to protect itself: once OpenAI declares AGI under this definition, Microsoft loses its exclusive cloud access rights to OpenAI’s technology.
The 2025 update added an independent expert verification panel requirement, meaning OpenAI cannot self-declare. Both parties' IP rights now extend through 2032, including whatever gets built post-AGI.
Under this definition, Huang’s “50-cent app that went out of business” is not AGI. Under this definition, nothing that currently exists is AGI. The contract has a financial test that no system has come close to meeting.
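For contrast with Huang's scenario, the contractual clause reduces to a predicate you could actually evaluate. The sketch below is illustrative only: the names are invented, the full contract text is not public, and the arithmetic in the final comment just applies the article's own numbers.

```python
# Illustrative sketch of the contractual AGI test as a checkable predicate.
# Names are invented; the full Microsoft-OpenAI contract text is not public.

AGI_PROFIT_THRESHOLD_USD = 100_000_000_000  # the reported $100B profit clause

def is_agi_under_contract(cumulative_profits_usd: float,
                          panel_verified: bool) -> bool:
    """Both conditions must hold: the profit threshold (the original clause)
    and sign-off by the independent expert panel (the 2025 update)."""
    return cumulative_profits_usd >= AGI_PROFIT_THRESHOLD_USD and panel_verified

# Huang's hypothetical: a few billion users paying 50 cents is on the order
# of $1-2B in revenue, before costs; roughly two orders of magnitude short.
print(is_agi_under_contract(2_000_000_000, panel_verified=False))  # False
```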
The benchmark that makes the claim falsifiable
ARC-AGI-3, launched two days after Huang’s statement, is the most direct empirical response to AGI claims available. Created by François Chollet, it tests something specific: can an AI system encounter a task it has never seen, with no instructions, learn the rules from scratch, and transfer that learning to new situations?
This is what human intelligence does routinely. It is what every frontier AI system demonstrably cannot do.
The test design: subjects see a grid-based puzzle with no instructions and must infer the rules through interactive exploration. Each level introduces novel patterns, so transfer learning is required by design. Humans navigate this at 100%. Every frontier model, GPT-5, Gemini, and Claude included, scores below 1%. Chollet is offering $2 million to anyone whose AI reaches human-level performance. The prize is unclaimed.
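Concretely, the evaluation is an interaction loop: the agent sees raw grids, acts, and gets only the resulting grids back. The sketch below illustrates that loop under assumed interfaces; every class and method name here is hypothetical, not the actual ARC-AGI-3 API.

```python
# A minimal sketch of a no-instructions interactive evaluation in the spirit
# of ARC-AGI-3. All interfaces here are hypothetical, not the benchmark's API.
from typing import Protocol

Grid = list[list[int]]  # raw color-coded grid, the only thing the agent sees

class Environment(Protocol):
    def reset(self) -> Grid: ...                  # initial grid; no task description
    def step(self, action: int) -> tuple[Grid, bool, bool]: ...
                                                  # (next grid, level solved?, episode over?)

class Agent(Protocol):
    def act(self, grid: Grid) -> int: ...
    def observe(self, grid: Grid, action: int, next_grid: Grid) -> None: ...

def evaluate(agent: Agent, env: Environment, max_steps: int = 1000) -> int:
    """One episode: the agent must infer the rules purely from the stream of
    (grid, action, next grid) transitions; no instructions, no examples."""
    grid, levels_solved = env.reset(), 0
    for _ in range(max_steps):
        action = agent.act(grid)
        next_grid, solved, done = env.step(action)
        agent.observe(grid, action, next_grid)  # the only learning signal available
        levels_solved += int(solved)
        if done:
            break
        grid = next_grid
    return levels_solved
```

The point of the design is that nothing about the rules is retrievable from training data: the agent either learns inside the loop or fails.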
ARC-AGI-3 is not the only gap. Current LLMs demonstrably fail at intuitive physics (determining whether a short video is physically plausible — models perform near chance), persistent memory and learning from experience, and what Yann LeCun calls world modeling — predicting the consequences of actions in a dynamic environment. LeCun’s argument: “Agentic systems cannot exist without predicting consequences of actions, and LLMs cannot do this.” He has publicly called Huang’s claims overblown at multiple forums including Davos this year.
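LeCun's objection has a concrete shape: a world model is a function from the current state and a candidate action to a predicted next state, and an agent plans by rolling that function forward. The sketch below illustrates the interface only; every name in it is invented, and it is not LeCun's architecture (his own JEPA line predicts in a learned representation space, not raw observations).

```python
# Illustration of the world-model interface LeCun argues LLMs lack.
# Names and the toy environment are invented, not any specific architecture.
from typing import Callable, Sequence

State = tuple[float, ...]   # stand-in for an environment state
Action = int

def plan(world_model: Callable[[State, Action], State],
         score: Callable[[State], float],
         start: State,
         actions: Sequence[Action],
         horizon: int = 5) -> list[Action]:
    """Greedy rollout planning: at each step, pick the action whose *predicted*
    consequence scores best. Without the predictive model this loop cannot run;
    that is the substance of the 'agents need world models' argument."""
    state, chosen = start, []
    for _ in range(horizon):
        best = max(actions, key=lambda a: score(world_model(state, a)))
        state = world_model(state, best)
        chosen.append(best)
    return chosen

# Toy usage: a 1-D position, actions move left (0) or right (1), goal is x = 5.
model = lambda s, a: (s[0] + (1.0 if a == 1 else -1.0),)
goal = lambda s: -abs(s[0] - 5.0)
print(plan(model, goal, start=(0.0,), actions=[0, 1]))  # [1, 1, 1, 1, 1]
```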
The goalposts keep moving
Elon Musk predicted AGI by 2025. That passed without AGI. He then predicted AGI by 2026. He is now claiming Tesla will “make AGI.” Gary Marcus has documented this pattern across multiple actors over eight years: bold prediction, missed deadline, quiet goalpost movement. The predictions are not falsifiable in practice because the term “AGI” changes definition to fit whatever the next claim requires.
This is not incidental — it is the mechanism. An unfalsifiable prediction generates media coverage at announcement, generates no accountability when missed, and can be refreshed indefinitely. The cycle continues as long as the word remains undefined.
The AAAI surveyed its members on this directly: 76% of AI researchers believe scaling up current approaches to achieve AGI is “unlikely” or “very unlikely.” The median of 1,700 independent forecasters on Metaculus puts AGI at March 2028, with an 80% confidence interval stretching to 2045.
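For readers unfamiliar with how such aggregates are produced: the median and the 80% interval are just percentiles of the pooled individual forecasts. A minimal sketch; the individual dates below are fabricated for illustration, and only the published aggregates come from Metaculus.

```python
# How a median forecast and an 80% interval fall out of pooled predictions.
# The individual dates are fabricated; only the published aggregates
# (median around March 2028, interval reaching 2045) come from Metaculus.
import statistics

forecast_years = [2027.3, 2027.6, 2028.2, 2028.9, 2030.5, 2035.0, 2041.2, 2045.0]

median = statistics.median(forecast_years)
deciles = statistics.quantiles(forecast_years, n=10)  # 10th..90th percentile cuts
lo80, hi80 = deciles[0], deciles[-1]                  # central 80% interval

print(f"median {median}, 80% interval [{lo80:.1f}, {hi80:.1f}]")
```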
Why this matters beyond the semantics
Inflated AGI claims have three real-world effects:
First, they drive capital allocation: investors, governments, and companies make decisions based on a capability that doesn't exist and may not arrive on any specific timeline.
Second, they create regulatory confusion. Policy frameworks being built around “AGI risk” use the term without defining it, which means any sufficiently capable system can be retroactively declared AGI, or exempted from the label, depending on who has the incentive to claim which.
Third, they crowd out coverage of the actual state of AI: systems that are genuinely transformative in narrow domains, genuinely limited in general reasoning, and genuinely uncertain in their trajectory. That’s a more complicated story. It’s also the true one.
Jensen Huang declared AGI achieved. François Chollet launched a benchmark two days later where the best AI in the world scored 0.37% of human performance on a task requiring genuine learning. Both things are true because “AGI” has no agreed definition, which is precisely why the claim is made.
The meaningful questions are not whether AGI has arrived but whether current systems can do specific things: learn from novel environments, build world models, generalize across domains without prior exposure. The benchmarks say no. The $2 million prize for human-level ARC performance remains unclaimed. The median forecast puts AGI two years out, with wide uncertainty. That is the actual state of play.