An analysis of thousands of academic papers from a prestigious artificial intelligence conference has uncovered fake citations, generated by AI models, in dozens of papers. The findings raise new questions about academic integrity and the reliability of large language models (LLMs) even when used by top experts in the field.
A scan conducted by AI detection startup GPTZero identified at least 100 fabricated citations across 51 different papers. These papers were all accepted by the Conference on Neural Information Processing Systems (NeurIPS), one of the most respected gatherings for AI researchers globally.
Key Takeaways
- AI detection firm GPTZero scanned 4,841 papers from the recent NeurIPS conference.
- The scan found 100 confirmed "hallucinated" or fake citations.
- These fake citations appeared in 51 different papers authored by leading AI experts.
- The findings highlight the challenges of ensuring accuracy when using AI for academic writing.
A Hidden Flaw in Premier Research
The discovery was made after GPTZero systematically analyzed every paper accepted by the NeurIPS conference, which was held last month in San Diego. The company's tools were designed to spot content likely generated by AI, including a phenomenon known as "hallucination," where an AI model confidently presents false information as fact.
In this case, the hallucinations manifested as citations for non-existent studies, articles, or books. While researchers often use LLMs to help with tedious tasks like formatting bibliographies, the presence of entirely fabricated sources points to a lack of verification in the final stages of writing.
Being accepted to NeurIPS is a significant accomplishment in the AI community, often seen as a mark of high-quality, rigorously vetted work. The conference prides itself on a robust peer-review process where multiple experts scrutinize each submission.
The Challenge of Peer Review
The fact that these fabricated citations slipped past reviewers highlights a growing strain on the academic publishing system. Faced with a surge in submissions, a problem described as a "submission tsunami," reviewers are under immense pressure.
A May 2025 paper titled “The AI Conference Peer Review Crisis” previously discussed the mounting difficulties that premier conferences like NeurIPS face in maintaining review quality amid a flood of submissions.
Catching a cleverly fabricated citation within a list of dozens of legitimate ones is a difficult task, especially under tight deadlines. GPTZero noted that its findings are not an indictment of the peer reviewers but rather an illustration of how AI-generated errors can subtly infiltrate even the most secure academic pipelines.
Putting the Numbers in Perspective
While the discovery of 100 fake citations is notable, it is important to consider the scale. The analysis covered 4,841 papers, each containing dozens of references, so the total number of citations reviewed likely ran into the tens of thousands. Against that total, the 100 fakes represent a very small fraction.
By the Numbers
- Papers Scanned: 4,841
- Papers with Fake Citations: 51 (approx. 1.1%)
- Confirmed Fake Citations: 100
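The figures above can be sanity-checked with a few lines of arithmetic. The per-paper rate comes directly from the reported numbers; the per-citation rate, however, depends on an average reference count that the scan did not report, so the 30-references-per-paper figure below is purely an illustrative assumption:

```python
# Figures reported in the GPTZero scan of NeurIPS papers.
papers_scanned = 4841
papers_with_fakes = 51
fake_citations = 100

# Share of papers containing at least one fabricated citation.
paper_rate = papers_with_fakes / papers_scanned
print(f"Papers affected: {paper_rate:.2%}")  # ~1.05%, i.e. "approx. 1.1%"

# Illustrative only: assuming ~30 references per paper (an assumption,
# not a figure from the scan), the share of all citations that were
# fabricated is far smaller still.
assumed_refs_per_paper = 30
citation_rate = fake_citations / (papers_scanned * assumed_refs_per_paper)
print(f"Approximate share of all citations: {citation_rate:.3%}")
```

Under that assumption the scan would have covered roughly 145,000 citations, which is consistent with the "tens of thousands" scale described above.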
Officials from NeurIPS stated that an incorrect reference does not automatically invalidate the core research presented in a paper. In a statement to Fortune, which first reported the story, the organization emphasized that the scientific contributions of the papers are not necessarily compromised by these citation errors.
However, the issue extends beyond simple typos or formatting mistakes. A fabricated citation represents a fundamental breakdown in the scholarly process, where claims are expected to be backed by verifiable sources.
Why Citations Are the Currency of Academia
In the academic world, citations are more than just a list of sources at the end of a paper. They form the bedrock of scientific discourse, allowing researchers to build upon, verify, and challenge the work of others. They are also a critical metric for a researcher's career.
"Citations are the currency of academia. They measure influence, track the evolution of ideas, and establish the credibility of new research. When they are fabricated, it erodes the foundation of trust that science is built on."
The number of times a researcher's work is cited by others is often used to gauge their influence and impact on their field. This metric can affect everything from job promotions and grant funding to professional reputation. When AI models invent citations, they not only introduce falsehoods but also dilute the value of this crucial system.
This creates a difficult situation where the very tools being studied at a conference like NeurIPS are also introducing a new layer of potential error into the research process itself.
An Ironic Lesson for the AI Community
The most striking aspect of this discovery is who was involved. These were not students or novice researchers, but the world's leading experts in artificial intelligence. If these individuals, with their deep understanding of LLMs and their professional reputations on the line, can inadvertently publish AI-generated falsehoods, it raises a critical question.
What does this mean for the average person, business, or organization relying on these same AI tools for important tasks?
The incident serves as a powerful reminder that AI is a tool requiring constant human oversight and verification. The researchers likely knew which papers they intended to cite, yet overlooked the LLM's errors in their final review, which points to an over-reliance on the technology's output.
Ultimately, the discovery of these phantom references at NeurIPS is less about a major academic scandal and more about a critical learning moment. It underscores the urgent need for robust verification protocols and a healthy skepticism toward AI-generated content, especially in fields where accuracy and truth are paramount.