ARC-AGI-3
Recorded: March 26, 2026, 4:02 a.m.
Original:

ARC-AGI-3: The first interactive reasoning benchmark designed to measure human-like intelligence in AI agents.

Links: Public Game Set · Docs + SDK · ARC Prize 2026 Track · Technical Paper

What is ARC-AGI-3?
ARC-AGI-3 is an interactive reasoning benchmark which challenges AI agents to explore novel environments, acquire goals on the fly, build adaptable world models, and learn continuously. A 100% score means AI agents can beat every game as efficiently as humans. Instead of solving static puzzles, agents must learn from experience inside each environment: perceiving what matters, selecting actions, and adapting their strategy without relying on natural-language instructions.

How it measures intelligence:
- 100% human-solvable environments
- Skill-acquisition efficiency over time
- Long-horizon planning with sparse feedback
- Experience-driven adaptation across multiple steps

As long as there is a gap between AI and human learning, we do not have AGI. ARC-AGI-3 makes that gap measurable by testing intelligence across time, not just final answers: capturing planning horizons, memory compression, and the ability to update beliefs as new evidence appears.

Design principles:
- Easy for humans to pick up quickly
- No pre-loaded knowledge or hidden prompts
- Clear goals + meaningful feedback
- Novelty that prevents brute-force memorization

Features:
ARC-AGI-3 includes replayable runs, a developer toolkit for agent integration, and a UI designed for transparent evaluation.
- Replays + Evaluation: Inspect agent behavior through preview replays; track decisions, actions, and reasoning in a structured timeline.
- Tools + UI: Integrate your agent using the ARC-AGI-3 toolkit, then use the interactive UI to test and iterate.
- Docs: Everything you need to build agents: environments, API usage, and integration guidance.

Put your agent to the test!

© 2026 ARC Prize, Inc.
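The interaction model described above (explore an unfamiliar environment, act, and learn from sparse feedback with no instructions) can be pictured with a toy stand-in. This is a hypothetical sketch, not the ARC-AGI-3 SDK; the `GridEnvironment` and `explore` names are invented for illustration, and the real API is documented in the linked Docs + SDK.

```python
# Toy stand-in for an interactive environment: the goal is hidden from the
# agent, and the only feedback is a sparse "done" signal (no instructions).
class GridEnvironment:
    def __init__(self, goal: int):
        self.goal = goal        # unknown to the agent
        self.position = 0
        self.steps = 0

    def step(self, action: int):
        """Apply an action (+1 or -1); return (observation, done)."""
        self.position += action
        self.steps += 1
        return self.position, self.position == self.goal

def explore(env: GridEnvironment, max_steps: int = 100):
    """Expanding sweep over positions 1, -1, 2, -2, ... until the sparse
    signal fires. Returns the number of actions used, or None on budget."""
    for radius in range(1, max_steps):
        for target in (radius, -radius):
            while env.position != target:
                pos, done = env.step(1 if target > env.position else -1)
                if done:
                    return env.steps
                if env.steps >= max_steps:
                    return None
    return None
```

The agent never reads `env.goal`; it discovers the goal purely through interaction, which is the kind of experience-driven behavior the benchmark is built to measure.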
Summarized:

ARC Prize’s ARC-AGI-3 represents a significant step toward evaluating and understanding Artificial General Intelligence (AGI) through a novel interactive reasoning benchmark. The core of the initiative, led by ARC Prize, Inc., is a dynamic testing environment designed to probe the cognitive processes believed to underlie human-level intelligence. ARC-AGI-3 presents AI agents with complex interactive environments that demand far more than traditional problem solving: rather than being given explicit instructions or pre-programmed solutions, agents must explore each environment, establish their own goals, and develop adaptable world models through continuous learning and experience.

Every environment is 100% human-solvable, and a 100% score means an agent can beat every game as efficiently as a human; the benchmark thus quantifies an agent’s intelligence by how efficiently and effectively it performs tasks relative to people. Evaluation covers the entire process of planning, adapting, and learning throughout the interaction, not just the final answer. The design explicitly targets several key aspects of intelligence: skill-acquisition efficiency over time, long-horizon planning with sparse feedback, and experience-driven adaptation across many iterative steps. These components are critical to gauging whether an AI system truly embodies the characteristics typically associated with AGI.

The design principles underpinning ARC-AGI-3 serve both human understanding and effective agent development: the benchmark is easy for human players to pick up, avoids pre-loaded knowledge or hidden prompts, and pairs clear, attainable goals with meaningful feedback.
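The page does not spell out the official scoring rule, but "as efficiently as humans" suggests comparing an agent's action count to a human baseline. The function below is a hypothetical illustration of such a metric, not the benchmark's actual formula; the name `efficiency_score` and the capping convention are assumptions.

```python
# Hypothetical per-game efficiency metric (illustrative only): the ratio of
# the human baseline's action count to the agent's, capped at 1.0 so an
# agent cannot exceed a perfect score; unfinished games (None) score 0.
def efficiency_score(agent_actions, human_actions):
    per_game = [
        0.0 if a is None else min(1.0, h / a)
        for a, h in zip(agent_actions, human_actions)
    ]
    return sum(per_game) / len(per_game)

# Matches humans on game 1, takes twice as many actions on game 2,
# and never finishes game 3: overall efficiency 0.5.
print(efficiency_score([10, 40, None], [10, 20, 15]))  # 0.5
```

A metric of this shape captures the page's framing: beating every game is necessary but not sufficient, since a score of 1.0 also requires human-level action efficiency.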
Furthermore, the environments incorporate novelty, discouraging brute-force memorization and encouraging genuine learning and adaptation.

Key features of ARC-AGI-3 include replayable runs, which let developers analyze agent behavior through structured timelines of decisions, actions, and reasoning; a developer toolkit that integrates agents with the ARC-AGI-3 environments for testing and refinement; and an interactive User Interface (UI) that offers a transparent platform for evaluating agent performance and iterating on designs. Together, these provide a robust, quantifiable assessment of an agent’s reasoning capabilities.

The benchmark’s value extends beyond simple game testing: it makes the gap between AI and human learning measurable. By evaluating performance along crucial dimensions (planning horizons, memory compression, and belief updating), ARC-AGI-3 aims to identify the specific shortcomings that separate current AI systems from human-like intelligence. Its focus on evaluation across time, rather than final answers alone, recognizes that true AGI requires not only solving a problem but also continuously learning and adjusting in response to novel information.

The available documentation and resources, covering environments, API usage, and integration guidance, are intended to support a broad range of developers and researchers. The toolkit and UI provide a structured environment for experimentation, while the replay functionality offers critical insight into agent decision-making. The ARC Prize website consolidates these resources, improving accessibility and promoting community engagement.
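A "structured timeline of decisions, actions, and reasoning" can be modeled as an append-only log of step records. The sketch below is a guess at the shape of such a record, assuming names like `ReplayStep` and `Replay`; the real ARC-AGI-3 replay schema may differ and is defined by its docs.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical replay record: one entry per agent step, serializable to
# JSON so a run can be inspected after the fact.
@dataclass
class ReplayStep:
    step: int
    action: str
    observation: str
    reasoning: str

@dataclass
class Replay:
    game_id: str
    steps: list = field(default_factory=list)

    def record(self, action: str, observation: str, reasoning: str) -> None:
        """Append the next step to the timeline, numbering it automatically."""
        self.steps.append(
            ReplayStep(len(self.steps) + 1, action, observation, reasoning)
        )

    def to_json(self) -> str:
        """Serialize the full timeline for later inspection."""
        return json.dumps(asdict(self), indent=2)

replay = Replay(game_id="demo-01")
replay.record("move_right", "wall ahead", "probing the boundary")
replay.record("move_up", "open corridor", "following free space")
```

Keeping the reasoning field alongside each action is what turns a raw action log into the kind of transparent, reviewable timeline the replay feature describes.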
The ongoing development and maintenance of ARC-AGI-3, together with its tools and documentation, represents a valuable undertaking in the pursuit of a demonstrable understanding of AGI.