On March 23, Jensen Huang told Lex Fridman: “I think it’s now. I think we’ve achieved AGI.”
Two days later, on March 25, François Chollet launched ARC-AGI-3 — a benchmark specifically designed to test whether AI systems can genuinely learn and reason, rather than retrieve and pattern-match. The best AI system in the world scored 0.37% of human performance. Humans score 100%.
Both things are true simultaneously. That tells you something important — not about AI, but about the word “AGI.”
- Huang's AGI declaration: March 23, 2026, on the Lex Fridman Podcast
- ARC-AGI-3 launched: March 25, 2026; tests genuine learning, no instructions given
- Best model score (Gemini 3.1 Pro Preview): 0.37% of human performance
- GPT-5, Claude, all other frontier models: below 1%
- Human score: 100%
- Prize for human-level AI performance: $2 million, unclaimed
- Metaculus median forecast for AGI: March 2028, range 2027–2045
What Huang actually said
The quote in full context matters. Fridman posed a specific hypothetical: could an AI start and grow a tech company to $1 billion in value? Huang’s response was that no 5-to-20 year wait was needed — it’s already here. His illustrative scenario: “It is not out of the question that a Claude was able to create a web service, some interesting little app that all of a sudden, a few billion people used for 50 cents, and then it went out of business again shortly after.”
That is a definition of AGI. It is one person's definition, and it conveniently describes capabilities that Huang's chips already enable. It is not the definition in any academic paper, any regulatory framework, or any research benchmark. It is also not the definition in the contract between Microsoft and OpenAI, which matters more than the current coverage acknowledges.
The contract that defines AGI with a number
Microsoft and OpenAI’s foundational agreement — originally 2019, updated through 2025 — contains an explicit financial definition of AGI: a system that can generate at least $100 billion in profits. This clause was designed by OpenAI to protect itself: once OpenAI declares AGI under this definition, Microsoft loses its exclusive cloud access rights to OpenAI’s technology.
The 2025 update added an independent expert verification panel requirement, meaning OpenAI cannot self-declare. Both parties' IP rights now extend through 2032, including whatever gets built post-AGI.
Under this definition, Huang’s “50-cent app that went out of business” is not AGI. Under this definition, nothing that currently exists is AGI. The contract has a financial test that no system has come close to meeting.
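For contrast with Huang's scenario, the contractual clause reduces to a predicate you could actually evaluate. The sketch below is illustrative only: the names are invented, the full contract text is not public, and the arithmetic in the final comment just applies the article's own numbers.

```python
# Illustrative sketch of the contractual AGI test as a checkable predicate.
# Names are invented; the full Microsoft-OpenAI contract text is not public.

AGI_PROFIT_THRESHOLD_USD = 100_000_000_000  # the reported $100B profit clause

def is_agi_under_contract(cumulative_profits_usd: float,
                          panel_verified: bool) -> bool:
    """Both conditions must hold: the profit threshold (the original clause)
    and sign-off by the independent expert panel (the 2025 update)."""
    return cumulative_profits_usd >= AGI_PROFIT_THRESHOLD_USD and panel_verified

# Huang's hypothetical: a few billion users paying 50 cents is on the order
# of $1-2B in revenue, before costs; roughly two orders of magnitude short.
print(is_agi_under_contract(2_000_000_000, panel_verified=False))  # False
```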
The benchmark that makes the claim falsifiable
ARC-AGI-3, launched two days after Huang’s statement, is the most direct empirical response to AGI claims available. Created by François Chollet, it tests something specific: can an AI system encounter a task it has never seen, with no instructions, learn the rules from scratch, and transfer that learning to new situations?
This is what human intelligence does routinely. It is what every frontier AI system demonstrably cannot do.
The test design: subjects see a grid-based puzzle with no instructions and must infer the rules through interactive exploration. Each level introduces novel patterns, so transfer learning is required by design. Humans navigate this at 100%. Every frontier model, GPT-5, Gemini, and Claude included, scores below 1%. Chollet is offering $2 million to anyone whose AI reaches human-level performance. The prize is unclaimed.
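Concretely, the evaluation is an interaction loop: the agent sees raw grids, acts, and gets only the resulting grids back. The sketch below illustrates that loop under assumed interfaces; every class and method name here is hypothetical, not the actual ARC-AGI-3 API.

```python
# A minimal sketch of a no-instructions interactive evaluation in the spirit
# of ARC-AGI-3. All interfaces here are hypothetical, not the benchmark's API.
from typing import Protocol

Grid = list[list[int]]  # raw color-coded grid, the only thing the agent sees

class Environment(Protocol):
    def reset(self) -> Grid: ...                  # initial grid; no task description
    def step(self, action: int) -> tuple[Grid, bool, bool]: ...
                                                  # (next grid, level solved?, episode over?)

class Agent(Protocol):
    def act(self, grid: Grid) -> int: ...
    def observe(self, grid: Grid, action: int, next_grid: Grid) -> None: ...

def evaluate(agent: Agent, env: Environment, max_steps: int = 1000) -> int:
    """One episode: the agent must infer the rules purely from the stream of
    (grid, action, next grid) transitions; no instructions, no examples."""
    grid, levels_solved = env.reset(), 0
    for _ in range(max_steps):
        action = agent.act(grid)
        next_grid, solved, done = env.step(action)
        agent.observe(grid, action, next_grid)  # the only learning signal available
        levels_solved += int(solved)
        if done:
            break
        grid = next_grid
    return levels_solved
```

The point of the design is that nothing about the rules is retrievable from training data: the agent either learns inside the loop or fails.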
ARC-AGI-3 is not the only gap. Current LLMs demonstrably fail at intuitive physics (determining whether a short video is physically plausible — models perform near chance), persistent memory and learning from experience, and what Yann LeCun calls world modeling — predicting the consequences of actions in a dynamic environment. LeCun’s argument: “Agentic systems cannot exist without predicting consequences of actions, and LLMs cannot do this.” He has publicly called Huang’s claims overblown at multiple forums including Davos this year.
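LeCun's objection has a concrete shape: a world model is a function from the current state and a candidate action to a predicted next state, and an agent plans by rolling that function forward. The sketch below illustrates the interface only; every name in it is invented, and it is not LeCun's architecture (his own JEPA line predicts in a learned representation space, not raw observations).

```python
# Illustration of the world-model interface LeCun argues LLMs lack.
# Names and the toy environment are invented, not any specific architecture.
from typing import Callable, Sequence

State = tuple[float, ...]   # stand-in for an environment state
Action = int

def plan(world_model: Callable[[State, Action], State],
         score: Callable[[State], float],
         start: State,
         actions: Sequence[Action],
         horizon: int = 5) -> list[Action]:
    """Greedy rollout planning: at each step, pick the action whose *predicted*
    consequence scores best. Without the predictive model this loop cannot run;
    that is the substance of the 'agents need world models' argument."""
    state, chosen = start, []
    for _ in range(horizon):
        best = max(actions, key=lambda a: score(world_model(state, a)))
        state = world_model(state, best)
        chosen.append(best)
    return chosen

# Toy usage: a 1-D position, actions move left (0) or right (1), goal is x = 5.
model = lambda s, a: (s[0] + (1.0 if a == 1 else -1.0),)
goal = lambda s: -abs(s[0] - 5.0)
print(plan(model, goal, start=(0.0,), actions=[0, 1]))  # [1, 1, 1, 1, 1]
```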
The goalposts keep moving
Elon Musk predicted AGI by 2025. That passed without AGI. He then predicted AGI by 2026. He is now claiming Tesla will “make AGI.” Gary Marcus has documented this pattern across multiple actors over eight years: bold prediction, missed deadline, quiet goalpost movement. The predictions are not falsifiable in practice because the term “AGI” changes definition to fit whatever the next claim requires.
This is not incidental — it is the mechanism. An unfalsifiable prediction generates media coverage at announcement, generates no accountability when missed, and can be refreshed indefinitely. The cycle continues as long as the word remains undefined.
The AAAI surveyed its members on this directly: 76% of AI researchers believe scaling up current approaches to achieve AGI is “unlikely” or “very unlikely.” The median of 1,700 independent forecasters on Metaculus puts AGI at March 2028, with an 80% confidence interval stretching to 2045.
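For readers unfamiliar with how such aggregates are produced: the median and the 80% interval are just percentiles of the pooled individual forecasts. A minimal sketch; the individual dates below are fabricated for illustration, and only the published aggregates come from Metaculus.

```python
# How a median forecast and an 80% interval fall out of pooled predictions.
# The individual dates are fabricated; only the published aggregates
# (median around March 2028, interval reaching 2045) come from Metaculus.
import statistics

forecast_years = [2027.3, 2027.6, 2028.2, 2028.9, 2030.5, 2035.0, 2041.2, 2045.0]

median = statistics.median(forecast_years)
deciles = statistics.quantiles(forecast_years, n=10)  # 10th..90th percentile cuts
lo80, hi80 = deciles[0], deciles[-1]                  # central 80% interval

print(f"median {median}, 80% interval [{lo80:.1f}, {hi80:.1f}]")
```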
Why this matters beyond the semantics
Inflated AGI claims have three real-world effects:
First, they drive capital allocation: investors, governments, and companies make decisions based on a capability that doesn't exist and may not arrive on any specific timeline.
Second, they create regulatory confusion. Policy frameworks being built around “AGI risk” use the term without defining it, which means any sufficiently capable system can be retroactively declared AGI, or exempted from the label, depending on who has the incentive to claim which.
Third, they crowd out coverage of the actual state of AI: systems that are genuinely transformative in narrow domains, genuinely limited in general reasoning, and genuinely uncertain in their trajectory. That’s a more complicated story. It’s also the true one.
Jensen Huang declared AGI achieved. François Chollet launched a benchmark two days later where the best AI in the world scored 0.37% of human performance on a task requiring genuine learning. Both things are true because “AGI” has no agreed definition, which is precisely why the claim is made.
The meaningful questions are not whether AGI has arrived but whether current systems can do specific things: learn from novel environments, build world models, generalize across domains without prior exposure. The benchmarks say no. The $2 million prize for human-level ARC performance remains unclaimed. The median forecast puts AGI two years out, with wide uncertainty. That is the actual state of play.