MCP Is Dead
Recorded: May 30, 2026, 12:03 a.m.
| Original | Summarized |
MCP is dead
Chloe Kim, Backend Engineer @ Quandri
TL;DR: MCP eats context, has low reliability, and overlaps with existing CLI/API.
Reference: MCP is dead. Long live the CLI
After reading the above article, we ran the experiments on our actual stack. This document covers the original argument, additional research, and our measurements.
Update: Since these measurements were taken, Claude Code has rolled out Tool Search with Deferred Loading, which loads MCP tool schemas on-demand and reduces context usage by 85%+. The context bloat described in Problem 1 is largely addressed for users on current Claude Code versions. The performance, debugging, and architectural arguments below still apply.
What's Wrong with MCP
MCP (Model Context Protocol) connects LLMs to external tools (GitHub, Linear, Notion, Slack, etc.).
Since its launch in late 2024, it's been called "the USB-C of the AI ecosystem." But developers actually using it day-to-day are starting to think differently.
Problem 1: It Devours the Context Window
The context window is the LLM's desk. When you connect MCP servers, tool definitions alone take up a significant chunk of that desk.
Restaurant analogy:
- You sit down and 10 menus (MCP tool definitions) are spread across the table
- There's no room left for actual food (your work)
- Every time you order, the menus have to be pulled out again
We extracted and measured the actual tool definitions from the MCP servers connected in our environment. With all 4 servers connected, 10.5% of the context window is consumed by tool definitions alone.
Measurement: Tool Definition Sizes (Quandri Stack)
[Table, values not captured. MCP Server: Linear, Notion, Slack, Postgres, Total]
Context Window Usage (all servers combined)
[Table, values not captured. Model: Claude (200K), GPT-4o (128K)]
Linear alone accounts for over 12,800 tokens. 
That's 42 tool definitions always loaded, even if you only ever use get_issue and save_issue.
Biggest Tools by Size
[Table, values not captured. Tool: linear/save_issue, slack/search_public, linear/list_issues, notion/fetch, slack/send_message]
Problem 2: Low Operational Reliability
[Table, values not captured. Issue: Init failure, repeated re-auth; Slower AI responses; Mid-session tool death; Opaque permissions]
Performance is a known issue. The author of the original article benchmarked Jira MCP against its REST API directly and found MCP was 3x slower per call, and 9.4x slower on first call including initialization. This isn't Jira-specific, it's architectural: every MCP server adds a process layer between the LLM and the underlying API. The same overhead applies to the Linear, Notion, and Slack servers in our stack.
Problem 3: Overlaps with Existing CLI/API
[Table, values not captured. Aspect: Human-machine parity, Composability, Debugging, Training data, Install cost]
Token Comparison: MCP vs CLI for Linear Issue Lookup
How many tokens does it cost to look up the same Linear issue? MCP consumes ~65x more tokens than the CLI approach.
[ CLI approach: ~200 tokens ]
-> Prompt (curl command): ~50 tokens
[ MCP approach: ~12,957 tokens ]
[Table, values not captured. Aspect: Loading time, Context consumption, Scalability]
The key is embedding CLI usage instructions inside Skills. Combined with Alternative 1's CLI-first strategy, this is most efficient. For example, a Linear skill:
# Linear Issue Lookup Skill
[Table, values not captured. Scenario: Local dev / personal DB, Production DB / shared team]
But for most developer workflows, MCP is over-engineering.
These days, every SaaS landing page has "MCP supported" in the feature list. Whether the MCP server is stable or how much context it eats doesn't matter - the goal is checking the "we do MCP too" box. Same pattern as "AI-powered" and "blockchain-based" marketing from years past. 
When users actually connect, they get dozens of tool definitions loaded, initialization failures, and mid-session crashes.
How We Use Skills at Quandri
At Quandri we use all three approaches side by side, picking what fits each service:
- Bash + CLI for tools we already use day-to-day (gh, psql, aws). Zero context cost, full flexibility, debugs straight in the terminal.
- Skills for repeatable multi-step workflows like commit drafting and PR reviews. Loaded only when invoked.
- MCP for services without a strong CLI (Slack, Linear, Notion), and where team-wide auth or permission scoping matters (e.g., production database access).
We don't force one path. If a CLI already exists and authenticates locally, that's usually the lightest option. If a service has no CLI or we need uniform auth across the team, MCP earns its keep.
Conclusion
Teaching well matters more than connecting everything.
For us, replacing MCP servers with Skills that wrap existing CLIs freed up ~21K tokens of context, removed init failures from our daily workflow, and kept debugging in the terminal where it belongs.
Load only the tools you need, only when you need them, with CLI instructions baked in. MCP might evolve to solve these problems, but right now, Skills win.
Measurement methodology: Tool definition sizes were measured by extracting the JSON schema of each tool (name + description + parameters) from actually loaded MCP servers in our Claude Code environment. Token estimates use the ~4 chars/token heuristic. Full server estimates are extrapolated from sampled tool averages.
Chloe Kim
Chloe is a Backend Engineer at Quandri. She's interested in AI workflows and agent-native engineering, and how AI is changing the way software gets built.
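The measurement methodology above can be approximated in a few lines. The following is a minimal sketch, assuming the ~4 chars/token heuristic stated in the article; the function names and the `example_tool` schema are hypothetical and made up for illustration, not taken from any real MCP server. It also reruns the article's own arithmetic: ~21K tokens in a 200K window, and ~12,957 tokens versus ~200 tokens for the Linear lookup.

```python
import json

# Rough heuristic from the methodology note; this is not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool_schema: dict) -> int:
    """Estimate tokens for one tool definition from its serialized JSON length."""
    serialized = json.dumps(tool_schema, separators=(",", ":"))
    # Ceiling division so a short schema never rounds down to zero tokens.
    return -(-len(serialized) // CHARS_PER_TOKEN)


def context_share(total_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window occupied by tool definitions."""
    return total_tokens / window_tokens


# Hypothetical tool definition (name + description + parameters), for illustration only.
example_tool = {
    "name": "get_issue",
    "description": "Fetch a single issue by its identifier.",
    "parameters": {
        "type": "object",
        "properties": {"id": {"type": "string", "description": "Issue identifier"}},
        "required": ["id"],
    },
}

print(f"estimated tokens for one tool: {estimate_tokens(example_tool)}")
# Figures quoted in the article: ~21K tokens of definitions, a 200K window.
print(f"share of a 200K window: {context_share(21_000, 200_000):.1%}")
# Figures quoted in the article: ~12,957 tokens (MCP) vs ~200 tokens (CLI).
print(f"MCP vs CLI token ratio: {12_957 / 200:.0f}x")
```

Because the estimate is character-based, it will drift from a real tokenizer's count, especially for schemas heavy in punctuation or non-English text; it is best read as an order-of-magnitude check.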
|
The Model Context Protocol (MCP) connects Large Language Models (LLMs) to external tools such as GitHub, Linear, Notion, and Slack, and is often promoted as the "USB-C of the AI ecosystem." The article argues that MCP introduces inefficiencies and architectural overhead that are often unnecessary in developer workflows.
The first criticism is context consumption. The tool definitions required by MCP servers are loaded into the LLM's context whether or not the tools are used. Measurements on the Quandri stack found that, with all four servers connected, tool definitions alone consumed 10.5% of the context window, with Linear alone accounting for over 12,800 tokens.
The second criticism is operational reliability. Initialization failures requiring repeated re-authentication, slower AI responses due to external server round-trips on every tool call, mid-session tool crashes, and opaque permission handling all add instability to the workflow. A benchmark cited from the original article found Jira MCP to be 3x slower per call than the REST API, and 9.4x slower on the first call including initialization.
The third criticism is overlap with existing Command Line Interface (CLI) and Application Programming Interface (API) methods. MCP requires the LLM to re-learn information already available in standard developer knowledge bases like man pages or StackOverflow, and it lacks the composability and immediate debugging capabilities inherent in terminal environments. 
For instance, looking up the same Linear issue costs roughly 65 times more tokens through MCP (about 12,957) than through the direct CLI approach (about 200), which suggests that exposing verbose tool definitions is an inefficient use of tokens when direct API calls are available.
To address these limitations, the authors propose alternative strategies. The CLI-First Strategy prioritizes existing CLI and API workflows, allowing LLMs to leverage knowledge already learned from documentation. The Skills Pattern loads tool definitions only when they are invoked, rather than loading them all upfront, so tools are treated as skills requested when needed instead of a constant context burden. Combined with the CLI-First strategy, the Skills Pattern embeds CLI usage instructions directly into the skill definitions, enabling the LLM to execute commands efficiently while minimizing context load.
Regarding database interactions, the authors distinguish between local development and production environments. For local development, the combination of Skills and CLI is deemed lightweight and fast. For production environments where security and access control are paramount, MCP's ability to enforce query safety and credential protection at the server level provides guardrails that CLI or Skills alone cannot offer.
Ultimately, the authors conclude that teaching efficient workflows matters more than connecting every potential service via a uniform protocol. At Quandri they use a hybrid approach: Bash and CLI for standard tools, Skills for repeatable multi-step tasks, and MCP only when necessary, such as for services lacking a strong CLI or where team-wide authorization scoping is critical. 
The article reports that replacing MCP servers with Skills that wrap existing CLIs freed up roughly 21K tokens of context, removed initialization failures from the daily workflow, and kept debugging in the terminal, underscoring the principle that loading only what is needed, when needed, is the most efficient path for AI agent workflows. |