Why Large Language Models Still Struggle With Research-Level Mathematics

Can Artificial Intelligence Really Do Mathematics?
As artificial intelligence continues to advance at breathtaking speed, a growing number of students, researchers, and professionals are asking an uncomfortable question: If machines can solve problems better than humans, does human creativity still matter?
For mathematics, a field often viewed as the purest form of human reasoning, that question cuts especially deep.
A recent initiative led by some of the world’s most respected mathematicians suggests that, at least for now, the “magic” of mathematics remains firmly human.
When AI Meets Real Mathematics
Large language models have become remarkably good at solving textbook-style problems, summarizing known proofs, and generating convincing explanations. But according to mathematicians like Martin Hairer, winner of the Fields Medal, this apparent competence masks a deeper weakness.
“These systems are good at recombining known arguments,” Hairer explains. “But I haven’t seen any plausible example of an AI coming up with a genuinely new mathematical idea.”
To test this limitation rigorously, Hairer and several colleagues launched a project called First Proof — an experiment designed to measure how well AI systems perform on genuine, unpublished research problems.
What Is the “First Proof” Experiment?
Unlike traditional AI benchmarks, which often rely on curated or artificial problems, First Proof uses real questions drawn directly from ongoing mathematical research.
Each participating mathematician contributed a problem they were actively working on — problems that:
- Had no existing solutions online
- Required original reasoning
- Reflected authentic research challenges
The goal was not to trick AI systems, but to evaluate their ability to reason beyond their training data.
How Did AI Perform?
The results were sobering.
When the problems were posed to state-of-the-art models, including systems from OpenAI and Google, the researchers observed consistent patterns:
- Answers that looked confident but skipped critical reasoning steps
- Excessive detail on trivial steps and vague explanations at key moments
- Logical loops where the model repeatedly revised its own “final” answer
- Responses that solved a related problem — but not the one asked
One researcher compared reading the output to reviewing work from a struggling undergraduate: confident tone, incomplete logic, and a hopeful “therefore” at the end.
Why This Matters More Than It Seems
The concern is not that AI will replace mathematicians tomorrow — but that overhyping AI’s abilities could cause real harm today.
The researchers warn that exaggerated claims about AI “solving mathematics” may:
- Discourage young students from entering the field
- Reduce funding for fundamental research
- Create misplaced trust in systems that still require expert oversight
While AI is already a powerful tool for mathematicians, it is far from being an independent researcher.
The Human Core of Mathematical Discovery
Mathematics research is not just about solving equations. It involves:
- Identifying meaningful questions
- Designing frameworks to approach them
- Solving problems and validating the results rigorously
The First Proof project focuses only on the third step — the most measurable part — and AI still struggles there.
The first two steps, where creativity, intuition, and taste matter most, remain deeply human.
AI as a Tool, Not a Colleague
Despite frequent comparisons between AI and human collaborators, many mathematicians remain unconvinced.
Unlike human colleagues, AI systems lack independent perspectives, intellectual disagreement, and genuine curiosity. They reflect the viewpoints encoded in their training data, and those viewpoints can persist indefinitely.
Some researchers worry this could slow scientific progress by reinforcing existing ideas rather than challenging them.
The Verdict: Mathematics Is Still Safe — For Now
The First Proof experiment doesn’t argue against AI in mathematics. Instead, it adds nuance to the conversation.
AI will continue to assist, accelerate, and support research. But when it comes to creating new ideas, asking the right questions, and advancing the field in unexpected directions, humans are still irreplaceable.
For students wondering whether mathematics is losing its magic, the answer seems clear:
The magic is still there — and it’s still human.