Language Models Need Sleep
Recorded: May 26, 2026, 5 p.m.
| Original | Summarized |
[2605.26099] Language Models Need Sleep
arXiv:2605.26099 (cs), Computer Science > Computation and Language
[Submitted on 25 May 2026]
Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Submission history: From: Sangyun Lee [v1]
Language Models Need Sleep, by Sangyun Lee and 3 other authors |
Transformer-based large language models are increasingly used for long-horizon tasks, but their attention mechanism scales poorly with context length. To address this, the authors study a sleep-like consolidation mechanism in which the model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. This shifts the extra computation to the sleep phase while preserving the latency of wake-time prediction during inference. The method is evaluated on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, and on a realistic math reasoning task; the abstract reports that a regular transformer and SSM-attention hybrid models fail on these tasks. Increasing the sleep duration $N$ improves the performance of the proposed models, with the largest gains on examples that require deeper reasoning. The work by Sangyun Lee, Sean McLeish, Tom Goldstein, and Giulia Fanti suggests that a sleep phase can help language models on demanding tasks by consolidating context into weights rather than keeping it in the cache. |
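The wake/sleep cycle described in the summary can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the abstract does not specify the learned local rule, so a plain delta rule (an error-correcting outer-product update) stands in for it, a single matrix stands in for the fast weights of the SSM blocks, and the names `ToySleeper`, `observe`, `sleep`, `recall`, and `recall_error` are hypothetical. It is not the authors' implementation.

```python
from typing import List, Tuple

Vector = List[float]


def matvec(weights: List[Vector], x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


class ToySleeper:
    """Hypothetical toy model: a key-value cache plus one fast-weight matrix."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.dim = dim
        self.lr = lr  # step size of the stand-in delta rule (assumption)
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.kv_cache: List[Tuple[Vector, Vector]] = []

    def observe(self, key: Vector, value: Vector) -> None:
        """Wake phase: append the pair to the cache, no weight update."""
        self.kv_cache.append((key, value))

    def sleep(self, num_passes: int) -> None:
        """Sleep phase: N offline passes over the cache, then clear it."""
        for _ in range(num_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        # Delta rule: W += lr * (v - W k) k^T
                        self.fast_weights[i][j] += self.lr * err * key[j]
        self.kv_cache.clear()

    def recall(self, key: Vector) -> Vector:
        """After sleep: read a value back from the fast weights alone."""
        return matvec(self.fast_weights, key)


def recall_error(num_passes: int) -> float:
    """Squared recall error over three toy pairs after N sleep passes."""
    pairs = [
        ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
        ([0.6, 0.8, 0.0], [1.0, 0.0, 0.0]),
        ([0.0, 0.6, 0.8], [0.0, 0.0, 1.0]),
    ]
    model = ToySleeper(dim=3)
    for key, value in pairs:
        model.observe(key, value)
    model.sleep(num_passes)
    return sum(
        (a - b) ** 2
        for key, value in pairs
        for a, b in zip(model.recall(key), value)
    )


if __name__ == "__main__":
    for n in (0, 1, 4, 32):
        print(n, recall_error(n))
```

In this toy, the cache is empty after `sleep`, so recall relies only on the fast weights, and running more passes drives the recall error toward zero because the keys overlap and a single pass cannot store all pairs exactly. That mirrors, very loosely, the abstract's point that a longer sleep duration $N$ buys better performance without adding wake-time cost.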