The data & the fix

AI citation fabrication: the data, and what to do about it

AI writing tools invent references that look perfect and don't exist. In biomedical papers, fabricated citations rose about twelve-fold between 2023 and early 2026 — reaching roughly one in every 277 papers. Here's what the evidence shows, why medicine is the worst place for it to happen, and the one design choice that makes fabrication impossible rather than merely rarer.

Last updated June 30, 2026 · Updated as new studies land

1 in 277 biomedical papers in early 2026 cited a reference that doesn't exist, up from 1 in 2,828 in 2023.

≈12×: the rise in fabricated references since 2023, with the sharpest jump in mid-2024 as AI writing tools spread.

18–91% of citations from AI chatbots were fabricated across studies: ~18% for GPT-4, up to 91% for Bard in a systematic-review test.

Sources: The Lancet (Topaz et al., 2026) · Nature · Nature Scientific Reports (2023) · JMIR (2024)

How common are fabricated citations in research papers?

Common enough that the published record is now measurably affected. In May 2026, a team led by Maxim Topaz at Columbia University reported in The Lancet an audit of nearly 2.5 million biomedical papers indexed in PubMed between January 2023 and February 2026. Among 97.1 million verified references, they identified 4,046 fabricated citations spread across 2,810 papers.

The trend is what matters. The rate of papers containing a non-existent reference climbed from 1 in 2,828 in 2023 to 1 in 458 in 2025 to roughly 1 in 277 in the first seven weeks of 2026 — about a twelve-fold increase, as Nature and Retraction Watch reported. The authors place the sharpest rise in mid-2024, coinciding with the spread of generative AI writing tools. Review articles, which lean hardest on long reference lists, carry elevated risk.

Why does AI invent citations?

A language model generates a reference the same way it generates a sentence: by predicting the next most likely token. A plausible author string is followed by a title-shaped phrase, a journal that frequently co-occurs with the topic, and a year-shaped number. Every piece is statistically plausible; whether the assembled paper exists is simply not a question the model is equipped to ask. That is why fabricated citations are so convincing — the format is perfect precisely because format is all the model is modelling. (We unpack the mechanism in how AI fabricates a citation.)
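A toy sketch can make the "plausible pieces, nonexistent whole" point concrete. It is not how a language model works internally: it simply pools the fields of three placeholder records (invented here, none is a real paper) and assembles every combination, assuming each field is picked independently. The names `REAL_PAPERS`, `recombine`, and `count_existing` are hypothetical.

```python
from itertools import product

# Placeholder records, invented purely for illustration; none is a real paper.
REAL_PAPERS = [
    ("Author A", "Title A", "Journal A", 2019),
    ("Author B", "Title B", "Journal B", 2021),
    ("Author C", "Title C", "Journal C", 2023),
]


def recombine(papers):
    """Return every citation that can be assembled by picking each field independently."""
    authors, titles, journals, years = (sorted(set(col)) for col in zip(*papers))
    return list(product(authors, titles, journals, years))


def count_existing(papers):
    """Return (assembled citations that match a real record, total assembled)."""
    real = set(papers)
    assembled = recombine(papers)
    existing = sum(1 for citation in assembled if citation in real)
    return existing, len(assembled)


existing, total = count_existing(REAL_PAPERS)
print(f"{existing} of {total} well-formed citations correspond to a real record")
```

Every assembled tuple has a valid author, title, journal, and year, yet only the three original combinations exist. Format alone cannot tell the two groups apart.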

The rates bear this out. A widely cited 2023 analysis in Nature Scientific Reports found 55% of GPT-3.5's citations and 18% of GPT-4's were fabricated, with further errors in many of the rest. In a systematic-review test, hallucination rates reached 28.6% for GPT-4 and 91.4% for Bard. Newer models fabricate less than older ones — but "less" is not "never," and in a clinical manuscript the difference matters.

Why fabrication is worse in medicine

In most writing, a wrong reference is a nuisance. In a clinical manuscript it is a liability. A fabricated citation can mean a correction, a retraction, or, worse, a clinical takeaway built on evidence that was never there. Medical claims carry over directly into patient care, so the standard for a reference isn't "sounds right" but "opens, resolves, and supports the exact sentence it's attached to." The cruel irony is that the tools that draft fastest are the ones most likely to invent: speed and fabrication come from the same mechanism.

Why "grounded" tools still fabricate: verify-before vs verify-after

Many tools now say they're "grounded in real papers." The decisive question is when the grounding happens. If a model writes freely and the references are checked afterward, there is still a window in which it can invent, and after-the-fact checking catches some fabrications, not all. Verification-after reduces the rate; it does not close the gap. The only way to make fabrication impossible is to remove the opportunity: never let the model name a reference freely, only select from papers that were retrieved beforehand.

Verification after the fact makes fabrication rarer. Grounding before the fact makes it impossible.
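The arithmetic behind "rarer, not impossible" can be sketched in a few lines. This is a simplified model, assuming the checker catches a fixed share of fabricated references and never introduces new ones. The function name and the catch rates are hypothetical, and the 18% input reuses the GPT-4 figure quoted earlier purely as an example.

```python
def residual_fabrication_rate(fabrication_rate, catch_rate):
    """Share of references that are fabricated AND slip past an after-the-fact check."""
    if not (0.0 <= fabrication_rate <= 1.0 and 0.0 <= catch_rate <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    return fabrication_rate * (1.0 - catch_rate)


# Hypothetical catch rates, not measurements of any real tool.
for catch_rate in (0.5, 0.9, 0.99, 1.0):
    residual = residual_fabrication_rate(0.18, catch_rate)
    print(f"checker catches {catch_rate:.0%}: {residual:.2%} of references still fabricated")
```

Under this model the residual only reaches zero when the checker is perfect. Grounding before generation aims for the same zero by a different route: the fabricated reference is never produced in the first place.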

What to do about it

If you write with AI today, protect yourself in one of two ways.

Verify what you publish. Resolve every DOI; confirm each reference in PubMed or Crossref; check that the authors, journal, and year match a paper that actually exists; and read enough of the source to confirm it supports your claim. A reference you can't open is a reference you can't defend.
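Part of that checklist can be scripted. The sketch below looks a DOI up through the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and compares the returned metadata with what the manuscript claims. The helper names and the shape of the `claimed` dictionary are assumptions made for illustration, and the exact-match comparisons are deliberately strict; a practical version would likely need fuzzier matching.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def fetch_crossref(doi):
    """Return Crossref's metadata for a DOI, or None when Crossref has no such record."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.load(response)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # unknown DOI: treat as "does not resolve"
            return None
        raise


def mismatches(claimed, record):
    """List the fields where a manuscript reference disagrees with a Crossref record.

    `claimed` is a hypothetical dict with the keys
    title, first_author_family, journal, and year.
    """
    problems = []
    record_title = " ".join(record.get("title") or []).casefold()
    if claimed["title"].casefold() != record_title:
        problems.append("title")
    families = {a.get("family", "").casefold() for a in record.get("author") or []}
    if claimed["first_author_family"].casefold() not in families:
        problems.append("author")
    journals = {j.casefold() for j in record.get("container-title") or []}
    if claimed["journal"].casefold() not in journals:
        problems.append("journal")
    date_parts = record.get("issued", {}).get("date-parts") or [[None]]
    record_year = date_parts[0][0] if date_parts[0] else None
    if claimed["year"] != record_year:
        problems.append("year")
    return problems
```

A `None` from `fetch_crossref` means the DOI is not registered with Crossref; it may still be registered with another agency, so it is a prompt for a manual check rather than proof of fabrication. An empty list from `mismatches` only shows that the paper exists as described. Whether it supports your sentence still requires reading it.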

Or use a tool where fabrication can't occur. This is the approach Cento is built around — grounding before generation, in four steps:

  1. Retrieve first. Build a candidate set of real, retrieved papers from PubMed, OpenAlex, Semantic Scholar, and Europe PMC before any prose is written. That set is the only material the model may cite.
  2. Constrain the output. The model returns prose plus citation slots that reference candidates by ID — it has no way to name a paper that isn't in the retrieved set.
  3. Validate server-side. Every citation is checked against the candidate set before it reaches you. No supporting source means an honest [UNCITED] flag — never an invention.
  4. Watch for drift. As you revise, each claim is re-scored against its source, so a sentence that wanders from its evidence is flagged before a reviewer finds it.
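Steps 1 to 3 above can be sketched as follows. This is a minimal illustration under stated assumptions, not Cento's implementation: the `[cite:ID]` slot syntax, the `Candidate` class, and both function names are hypothetical, and the retrieval step is replaced by a hard-coded dictionary of placeholder records.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    paper_id: str
    title: str


# Hypothetical slot syntax: the model emits [cite:ID] instead of writing a reference.
CITE_SLOT = re.compile(r"\[cite:([A-Za-z0-9_-]+)\]")


def validate_draft(draft, candidates):
    """Keep slots whose ID is in the candidate set; replace any other slot with [UNCITED]."""
    def check(match):
        return match.group(0) if match.group(1) in candidates else "[UNCITED]"
    return CITE_SLOT.sub(check, draft)


def render(draft, candidates):
    """Expand validated slots using titles from the retrieved records, never from the model."""
    validated = validate_draft(draft, candidates)
    return CITE_SLOT.sub(lambda m: f"({candidates[m.group(1)].title})", validated)


# Step 1 stand-in: placeholder records, not real papers.
candidates = {
    "c1": Candidate("c1", "Placeholder title one"),
    "c2": Candidate("c2", "Placeholder title two"),
}
# Step 2 stand-in: model output that references candidates by ID only.
draft = "Claim one [cite:c1]. Claim two [cite:c9]."
print(render(draft, candidates))
```

The design point is that reference text is only ever copied from a retrieved record. An ID the model makes up has nothing to expand into, so it surfaces as `[UNCITED]` rather than as a plausible-looking citation.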

See in detail how Cento makes a fabricated citation impossible, or read the rest of the citation-integrity series. Common questions are answered in the FAQ.

Frequently asked

Are AI-generated citations reliable?

Often not. A May 2026 Lancet audit of nearly 2.5 million biomedical papers found fabricated references had risen about 12-fold since 2023, reaching roughly 1 in 277 papers in early 2026. Measured fabrication rates for AI chatbots range from ~18% (GPT-4) to over 90% (Bard, in one systematic-review test). Tools that generate first and verify later inherit this risk.

How does Cento make a fabricated citation impossible?

Cento grounds before the model writes a word: it retrieves real papers, constrains the model to cite only from that retrieved set, validates every citation against it, and re-scores claims as you revise. Fabrication isn't caught after the fact — it can't occur.

What is citation drift?

Citation drift is when a claim that was accurate in an early draft no longer matches its source after you've revised the sentence. Cento re-scores each claim against its bound source as you edit and flags drift while it's still yours to fix. More in what citation drift is.
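A rough sketch of what re-scoring a claim against its source could look like follows. The word-overlap score is a crude stand-in chosen for brevity, not Cento's scoring method; the function names, the 0.6 threshold, and the example sentences are all invented for illustration.

```python
import re


def support_score(claim, source_passage):
    """Fraction of the claim's distinct words that also appear in the source passage."""
    claim_words = set(re.findall(r"[a-z0-9]+", claim.lower()))
    source_words = set(re.findall(r"[a-z0-9]+", source_passage.lower()))
    if not claim_words:
        return 0.0
    return len(claim_words & source_words) / len(claim_words)


def has_drifted(claim, source_passage, threshold=0.6):
    """Flag a claim whose overlap with its bound source falls below the threshold."""
    return support_score(claim, source_passage) < threshold


# Invented example sentences, not taken from any real paper.
source = "the treatment reduced symptoms in a small pilot group"
early_draft = "the treatment reduced symptoms in a pilot group"
revised = "the treatment cures the disease in all patients"

print(has_drifted(early_draft, source))
print(has_drifted(revised, source))
```

Word overlap misses paraphrase and negation, so a real checker would need a stronger measure of support. The structure is what matters here: each claim stays bound to its source and is re-scored whenever the sentence changes.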

Write papers you can defend

Cento is the AI co-writer that can't fabricate a citation. Built for medical researchers, starting in ophthalmology.
