I’m a Professional Fact-Checker. AI Is Wrong More Often Than You Think

Recorded: May 26, 2026, 1:11 p.m.

Original

Summarized

Meghan Herbst
The Big Story
May 26, 2026 6:00 AM

AI Just Isn’t Right

Can AI do fact-checking? A WIRED fact-checker fact-checks.

Photo-illustrations: Jobanny Cabrera; Getty Images

Nearly half of Americans say they use AI to find information and generate ideas. It’s not hard to see why. As social media devolves into slop—and Google into a glorified landing page for Reddit threads and content farms—most of us are starved for something reliable. Plus, chatbots are so helpful, aren’t they? The first time I interacted with one, I asked if it knew it was a huge drain on resources. Half an hour later, I had a new recipe for vegan cream cheese.

I never tried the recipe. Instead, I found a human-created one that the LLM might have scraped. That’s the way these models work, of course. They repackage collective knowledge into something that feels tailored to you. This may be OK for dairy alternatives (unless you’re a vegan blogger). But on the order of the world, and truth—the focus of my role as a fact-checker at WIRED—the stakes are exponentially higher.

Over the past year or so, more and more people have looked at me with great pity. Surely a fact-checker at a magazine isn’t long for this AI-upgraded world. Call me foolish, but I’m not that worried. Very little of humanity’s collective knowledge, I’ve concluded, lives on the internet.
And according to my research, AI is even more wrong than people might think.

Tom Wolfe evidently thought of fact-checkers, according to the writer Colin Dickey, as a “cabal of women and middling editors all collaborating to henpeck and emasculate the prose of the Great Writer.” As definitions go, it’s not bad (though my boss and many colleagues are men). What can I say? It’s our job, unlike AI’s, to be annoying.

WIRED’s fact-checking department is old-school: meticulous line-by-line annotations, primary sources whenever possible, and a broader-scale ethical and legal review. We question basic assumptions, look for new or conflicting information, call and talk to people—make sure. It’s a quick-hit peer review, functioning as best it can at the same pace as the news itself.

As far as I can tell, AI hasn’t come for this process quite yet. What it has come for is “post hoc” fact-checking, the Snopes-style analysis of something’s factuality after the fact. In the UK, an initiative called Full Fact has built out its own AI tools to help thwart the spread of misinformation. These tools, used in more than 40 countries, process huge volumes of data, from social media posts to podcast transcripts, then pinpoint specific claims that humans can investigate further. “You definitely need a human being,” says Mark Frankel, Full Fact’s head of public affairs.

The reason for that is simple: AI still gets things wrong. As a fact-checker, I’d love to be able to tell you exactly how often. But it’s not so easy. Since 2018, nearly 17,000 papers have been posted to arXiv on LLMs, many focused specifically on the question of their reliability. Still, it’s worth trying to pin down a working figure.

In any article that comes across WIRED’s fact-checking desk, there’s usually a decent amount of “b-matter”: statistics, news events, quotes, anything that helps contextualize the topic.
Fact-checkers tend to Google this basic information, and that process, in the form of the search engine’s dreaded AI Overviews, constitutes my main interaction with AI. In my professional opinion, it’s unusable—wrong—about a third of the time.

This might be a generous assessment, though. A March 2025 study from the Tow Center for Digital Journalism found that more than 60 percent of responses from AI-powered search engines were inaccurate. A BBC study puts the wrongness of chatbots closer to 45 percent, the number I see cited more often. Because percentages are distancing, let me put this more plainly: AI could be wrong about half the time.

Does it matter which model? Elon Musk has said Grok is the smartest, but I haven’t seen much research that agrees. Claude led the pack in RealFactBench, a fact-checking-focused benchmark test developed by computer scientists in China and the UK last year. It scored 73 percent accuracy across all metrics. (To be fair, Grok was not assessed.) Another benchmark, SimpleQA, developed by OpenAI in October 2024, posed more than 4,000 single-answer questions to models from OpenAI and Anthropic. None of the models exceeded 50 percent accuracy. Google updated the benchmark earlier this year, winnowing the question set to 1,000. Gemini 2.5 Pro came out on top, with 55.6 percent accuracy.

Then there’s the models’ own assessments. When I asked ChatGPT how accurate the major LLMs are, it told me that most models had 90 to 96 percent accuracy on some professional-style tests. It then offered a link, confusingly, to a paper on a sleep medicine certification exam. On “general real-world questions,” it simply offered me the rate at which models like it have been shown to hallucinate: 1 to 2 percent, apparently, though when I tried to click through to that referenced source, it didn’t exist.

Some say the models are getting smarter, but this doesn’t necessarily mean fewer hallucinations.
In fact, it could mean more, a kind of overcompensation rooted ineradicably in their programmed need to please users. In a 2025 report on the future of AI by the Association for the Advancement of Artificial Intelligence, 60 percent of surveyed researchers doubted that the “factuality” problem would be solved anytime soon.

When would-be fact-checkers apply for a position, most are given a test. In my case, the test involved a story about an alleged robocalling kingpin, and I was tasked with writing a memo detailing how I’d go about checking the piece for accuracy. At the end, three quick-fire bonus questions aimed to suss out how I’d handle individual facts.

Recently, I dug out that old test and gave it to (the free versions of) ChatGPT, Claude, Gemini, and Grok.

Grok came out of the ether like I was interrupting its supper: “Yes, I know exactly what fact checking is.” OK. It talked a lot about bias and put “credible” and “truth” in very loud quotation marks. It was also obsessed with data, along with gathering and analyzing more data than would ever be practicable or possible for a working fact-checker. It did, somewhat to my surprise, point out that fact-checking was historically women’s work.

Claude and Gemini did pretty well. They understood the task, laid out a reasonable approach, even flagged potential legal issues. Gemini did give me this very cringe phrase: I would look for “Paper Trails” to back up the “People Trails.”

ChatGPT seemed overeager and insecure. It spoke in buzzwords and generalizations. The approach it laid out seemed very time-consuming (including building a fact-checking grid where each sentence was broken apart and diagrammed). It offered to show me how it would “mark it up,” exactly “like a professional fact checker.” It then generated a paragraph that didn’t exist in the story. We tried that for a while, and then it offered to check a real paragraph for me. I gave it a fairly googleable selection, but it didn’t actually check any facts.
None of the models did. They all gave me a plan of attack, told me exactly what they would do, and then stopped short of actually doing it.

“I don’t think it’s an option to sit AI out as some kind of fad or something that won’t dramatically impact how people find information,” says Angie Holan, head of the International Fact-Checking Network, a Poynter initiative that connects more than 170 fact-checking organizations across the world. Holan says she finds herself more comfortable with AI than some of her colleagues are. If a model leads you to authoritative sources that you are able to verify yourself, there you go, she says. Fact-checkers, journalists, librarians, archivists—all should be engaging with these models, learning how they’re put together: “That way you can understand the strengths and weaknesses of these tools,” she says.

I don’t disagree. In fact, the more time I spend with AI, the more capable I feel as a human fact-checker.

Once we get past the googleable b-matter, my job really gets fun. It’s why I still get a thrill when I find some bit of information that doesn’t exist on the internet—a particular sign at a border crossing, the rates of kelp growth in two different climates, whether or not there was a Burger King at a particular LA intersection in 1979. AI systems can’t stay on the phone with a widow for over an hour because asking difficult questions turned on a fountain of grief that needed care and human receptivity. It can’t suss out that there’s beef between two sources which may be blurring the edges of what counts as “factual.” It can’t tell that an email with the phrase “Thanks for your email!” may, perhaps, be passively hostile.

Most physical media in the world remains offline. In Lost in Time: Our Forgotten and Vanishing Knowledge, Jack Bialik points out that the technologies and knowledge bases we assumed were recent are actually in many cases millennia old (assembly lines, cataract surgery, even batteries).
“Perhaps even more sobering is the realization that our storage technologies are far more likely to succumb to deterioration and useful obsolescence than hieroglyphics or ancient Sanskrit carved in a pyramid or on a temple wall,” he writes.

Years ago, during a fact-checking assignment, I talked to the sci-fi writer and history professor Ada Palmer, who told me what she often tells her students: We know less than 1 percent of what happened 500 years ago, and two-thirds of what we know is wrong. Knowledge exists on a timeline too, and the work of generations is carrying on that knowledge without little bits slipping through and getting lost. Are we really OK entrusting our legacy to a bunch of distributed servers, operated by microchips with lifespans of 5 to 10 years?

One final thing that I’ve been ignoring, which is so very human of me, is that humans make mistakes too. As Holan reminded me, abstaining from chatbots isn’t some foolproof saving grace. At least, I’m 33 to 90 percent sure that’s what she said. At the end of our interview, when I looked down at my recorder, I found I’d forgotten to turn it on.

Meghan Herbst is a senior research editor and contributing writer at WIRED. She earned a master’s degree in journalism from UC Berkeley and reported local news for The Press Democrat and Sonoma Magazine. Before becoming a journalist, Meghan served as a weather forecaster in the US Air Force. She grew ...

The growing presence of artificial intelligence in the information landscape raises concerns about accuracy and prompts a discussion of the role and limits of fact-checking. The author, a fact-checker at WIRED, notes that nearly half of Americans say they use AI to find information and generate ideas, at a time when social media and search results have become less reliable. The author's own experience with chatbots suggests that these models repackage collective knowledge into something that feels tailored to the user, which may be acceptable for topics like dairy alternatives but carries far higher stakes where truth is concerned.

The author describes WIRED's fact-checking as old-school: meticulous line-by-line annotations, primary sources whenever possible, questioning basic assumptions, calling sources, and a broader ethical and legal review. It functions as a rapid peer review that keeps pace with the news. By the author's account, AI has not yet taken over this process; what it has taken on is post hoc fact-checking, the analysis of a claim's factuality after the fact. Full Fact, a UK initiative, has built AI tools, used in more than 40 countries, that process large volumes of data from social media posts and podcast transcripts and pinpoint specific claims for humans to investigate. Humans remain essential to that process because AI still gets things wrong.

The reliability of AI systems is hard to pin down. The author estimates that Google's AI Overviews are unusable about a third of the time, and cites a March 2025 study from the Tow Center for Digital Journalism that found more than 60 percent of responses from AI-powered search engines were inaccurate, as well as a BBC study that put chatbot error closer to 45 percent. Benchmark results vary: Claude led RealFactBench with 73 percent accuracy, no model exceeded 50 percent on OpenAI's original SimpleQA, and Gemini 2.5 Pro scored 55.6 percent on Google's updated version. When asked, ChatGPT claimed that most models reach 90 to 96 percent accuracy on some professional-style tests and cited a 1 to 2 percent hallucination rate from a source that did not exist. The author observes that the models' programmed need to please users might lead to more hallucinations rather than fewer as they get smarter. In a 2025 report by the Association for the Advancement of Artificial Intelligence, 60 percent of surveyed researchers doubted the "factuality" problem would be solved anytime soon.

The author gave a fact-checking job test to the free versions of ChatGPT, Claude, Gemini, and Grok, and their approaches differed. Grok focused intensely on data and bias and pointed out that fact-checking was historically women's work. Claude and Gemini laid out reasonable approaches and flagged potential legal issues. ChatGPT leaned on buzzwords and generalizations and generated a paragraph that did not exist in the story. None of the models actually checked any facts: each described a plan of attack and stopped short of carrying it out.

The author ultimately values the human capacity for nuanced, contextual judgment, especially for information that does not exist on the internet. AI cannot stay on the phone with a grieving source, detect friction between two sources, or sense passive hostility in an email, all skills the author treats as part of fact-checking. The author also questions entrusting humanity's knowledge to distributed servers running on short-lived hardware, noting that most physical media remains offline and that knowledge is easily lost between generations. The author agrees with Angie Holan of the International Fact-Checking Network that fact-checkers should engage with AI rather than sit it out, reports feeling more capable as a human fact-checker after spending time with it, and acknowledges that humans make mistakes too, so abstaining from chatbots is no foolproof safeguard.