Hugging Face Packages Weaponized With a Single File Tweak
Recorded: May 13, 2026, 9:09 p.m.
| Original | Summarized |
A tokenizer library file present in Hugging Face AI models can be manipulated to hijack the model's outputs and exfiltrate data.

Alexander Culafi, Senior News Writer, Dark Reading
May 12, 2026

Hugging Face, an open source store for AI models and components, is open to an attack via the "tokenizer" layer that AI models use to make their outputs human readable. A cyberattacker could use the threat vector for a man-in-the-middle (MitM) approach, in which a tampered .json file rewrites tool call arguments so that URL tokens are redirected through attacker infrastructure. This gives the threat actor "visibility into every URL the model accesses, API parameters, and any credentials embedded in those requests," HiddenLayer security researcher Divyanshu Divyanshu explained in a blog post released today.

HiddenLayer tested its attack on Hugging Face models run locally using the SafeTensors, ONNX, and GGUF formats. SafeTensors is a model format created by Hugging Face and is considered the de facto standard format for the platform; all three are supported by Hugging Face, and all three are popular for a variety of use cases. That said, the problem could affect any platform used for running open source models, such as llama.cpp and Ollama.

The attack only affects models run locally, as it relies on modifying local files. As such, models run through Hugging Face's Inference API, for example, are not impacted.

Hugging Face did not respond to a request for comment.

AI Tokenizer Flaw Lets Attackers Hijack Model Outputs

A tokenizer is a kind of translator between human language and computer language for AI models. A model's output starts as a sequence of integer IDs that is then decoded through the tokenizer before the output reaches the user. Hugging Face specifically uses a tokenizer library file named "tokenizer.json" as the mapping for this decoding process in many of its models. Each entry in this file pairs a string with an ID; the string can represent a word, subword fragment, or control token, and these libraries can include tens of thousands of entries.
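The decoding step described above can be illustrated with a toy example. This is a minimal sketch, not HiddenLayer's proof of concept: the five-entry vocabulary, the `decode` helper, and the `proxy.attacker.example` host are all hypothetical, and real tokenizers hold tens of thousands of entries and apply extra decoding rules. It only shows why changing the string attached to one ID changes what the user or a tool call receives, even though the model emits exactly the same IDs.

```python
# Toy vocabulary: token string -> integer ID (made-up values, not from a real model).
vocab = {"Fetch": 0, " ": 1, "https://": 2, "api.example.com": 3, "/v1/data": 4}


def decode(ids, vocab):
    """Map each ID back to its string and join the pieces into text."""
    id_to_token = {token_id: token for token, token_id in vocab.items()}
    return "".join(id_to_token[i] for i in ids)


# The model's raw output is the same in both cases below.
model_output = [0, 1, 2, 3, 4]

clean = decode(model_output, vocab)
print(clean)  # Fetch https://api.example.com/v1/data

# A single-entry edit: ID 2 now decodes to a string that routes the URL
# through a hypothetical attacker-controlled proxy.
tampered_vocab = dict(vocab)
del tampered_vocab["https://"]
tampered_vocab["https://proxy.attacker.example/?u=https://"] = 2

tampered = decode(model_output, tampered_vocab)
print(tampered)  # Fetch https://proxy.attacker.example/?u=https://api.example.com/v1/data
```

Both vocabularies have the same number of entries and the same shape, which is consistent with the researchers' point that a tampered file is structurally identical to a legitimate one.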
As HiddenLayer discovered, if an attacker gets ahold of this "tokenizer.json" file and makes even a single edit, they can take direct control over anything the model outputs and possibly gain a foothold on the user's device. A primary way an attacker might use this attack in the wild is by taking an open source model, editing the tokenizer file, and then uploading the poisoned model to a public repository, thus distributing it to every downstream user that pulls it. "A tampered tokenizer.json is structurally identical to a legitimate one, so it passes through the normal model distribution pipeline without any special delivery mechanism," Divyanshu wrote.

A particularly troubling aspect of the threat vector is that a model poisoned through its .json file would still most likely run correctly. As such, the blog highlights that if you deploy a model from a public repository, you are also deploying the tokenizer attached to it.

"Tokenizer.json ships as a plain text file alongside every model, but it determines what your deployed system actually does," Divyanshu wrote. "Treating it as configuration rather than as part of the trusted codebase is the gap this attack lives in."

Tokenizer Hijacking: Negating a Supply Chain Threat

While other platforms may be impacted, Hugging Face, as a top open source AI repository, will face much of the blast radius if attackers manage to take advantage of the supply chain risks here. For those that want to protect themselves, Kasimir Schulz, director of security research at HiddenLayer, tells Dark Reading that checksums and signatures work if a model has been proven safe, such as one released and signed by a corporation like Microsoft. "Right now there are no public, freely available automated scanners [for this specific issue]," he says.

The researcher recommends that organizations make sure to scan third-party models and to use signed models in production when possible. Model signing is a cryptographic process that applies a digital signature to AI and machine-learning models to verify they haven't been tampered with.

Hugging Face, like all open source software platforms, has dealt with a range of malicious activity. Back in 2024, JFrog found more than 100 malicious models in the repository capable of executing code, a reality that defenders continue to reckon with across myriad open source AI model platforms. Hugging Face has also had to contend with critical vulnerabilities of its own.

About the Author

Alexander Culafi, Senior News Writer, Dark Reading

Alex is an award-winning writer, journalist, and podcast host based in Boston. After cutting his teeth writing for independent gaming publications as a teenager, he graduated from Emerson College in 2016 with a Bachelor of Science in journalism. He has previously been published on VentureFizz, Search Security, Nintendo World Report, and elsewhere.
In his spare time, Alex hosts the weekly Nintendo podcast Talk Nintendo Podcast and works on personal writing projects, including two previously self-published science fiction novels. |
A vulnerability has been identified in Hugging Face packages: the tokenizer library file used by AI models can be manipulated to hijack the model's outputs and exfiltrate data. The tokenizer functions as a translator between human language and computer language, and many Hugging Face models rely on a specific file, typically named tokenizer.json, for the decoding process. This file pairs IDs with words, subword fragments, and control tokens, and can contain tens of thousands of entries. Researchers at HiddenLayer demonstrated that an attacker who modifies this tokenizer.json file, even with a single edit, can take direct control over whatever the model outputs and possibly gain a foothold on the user's device.

The threat vector is a supply chain attack. An attacker can take an open source model, edit its tokenizer file, and upload the poisoned model to a public repository. Because the tampered tokenizer.json file remains structurally identical to a legitimate one, it passes through the normal model distribution pipeline without requiring any special delivery mechanism. This highlights a critical gap: the tokenizer determines what the deployed system actually does, yet it is treated as configuration rather than as part of the trusted codebase. A model poisoned this way will most likely still run correctly, even though the deployed system is decoding its output through a maliciously altered mapping.

Tokenizer hijacking represents a significant supply chain risk, particularly given Hugging Face's status as a top repository for open source AI models. The attack affects only models run locally, because it relies on modifying local files; models run through Hugging Face's Inference API are not impacted.
To mitigate these risks, HiddenLayer recommends that organizations scan third-party models and use signed models in production where possible. Model signing applies a digital signature to AI and machine learning models to verify that they have not been tampered with. Checksums and signatures work when a model has already been proven safe, such as one released and signed by an established corporation like Microsoft. |
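The checksum check mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a tool from HiddenLayer or Hugging Face: the function names `sha256_of_file` and `tokenizer_matches_pin` are hypothetical, and the expected digest is assumed to come from a source you already trust, such as a publisher's signed release notes.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tokenizer_matches_pin(path, expected_sha256):
    """Return True only if the file's digest equals the pinned digest.

    expected_sha256 is assumed to have been obtained out of band from a
    trusted source; this function cannot establish that trust by itself.
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A deployment script could call `tokenizer_matches_pin("tokenizer.json", pinned_digest)` and refuse to load the model when it returns `False`. As the article notes, this only helps once a known-good copy exists: a matching checksum shows the file is unchanged since it was pinned, not that the pinned version was safe to begin with.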