Turn any MCP server or OpenAPI specification into a CLI, at runtime, without codegen.
Save 96% to 99% of the tokens wasted on tool schemas every turn.
pip install mcp2cli
# Or run directly without installing
uvx mcp2cli --help
mcp2cli ships with an installable skill that teaches AI coding agents (Claude Code, Cursor, Codex) how to use it. Once installed, your agent can discover and call any MCP server or OpenAPI endpoint, and even generate new skills from those APIs.
npx skills add knowsuchagency/mcp2cli --skill mcp2cli
After installation, try messages like:
- `mcp2cli --mcp https://mcp.example.com/sse` — interact with an MCP server
- `mcp2cli create a skill for https://api.example.com/openapi.json` — generate a skill from an API
# Connect to an MCP server over HTTP
mcp2cli --mcp https://mcp.example.com/sse --list
# Call a tool
mcp2cli --mcp https://mcp.example.com/sse search --query "test"
# With auth header
mcp2cli --mcp https://mcp.example.com/sse --auth-header "x-api-key:sk-..." \
query --sql "SELECT 1"
# List tools from an MCP server
mcp2cli --mcp-stdio "npx @modelcontextprotocol/server-filesystem /tmp" --list
# Call a tool
mcp2cli --mcp-stdio "npx @modelcontextprotocol/server-filesystem /tmp" \
read-file --path /tmp/hello.txt
# Pass environment variables to the server process
mcp2cli --mcp-stdio "node server.js" --env API_KEY=sk-... --env DEBUG=1 \
search --query "test"
# List all commands from a remote spec
mcp2cli --spec https://petstore3.swagger.io/api/v3/openapi.json --list
# Call an endpoint
mcp2cli --spec ./openapi.json --base-url https://api.example.com list-pets --status available
# With auth
mcp2cli --spec ./spec.json --auth-header "Authorization:Bearer tok_..." create-item --name "Test"
# POST with JSON body from stdin
echo '{"name": "Fido", "tag": "dog"}' | mcp2cli --spec ./spec.json create-pet --stdin
# Local YAML spec
mcp2cli --spec ./api.yaml --base-url http://localhost:8000 --list
# Pretty-print JSON (also auto-enabled for TTY)
mcp2cli --spec ./spec.json --pretty list-pets
# Raw response body (no JSON parsing)
mcp2cli --spec ./spec.json --raw get-data
# Pipe-friendly (compact JSON when not a TTY)
mcp2cli --spec ./spec.json list-pets | jq '.[] | .name'
# TOON output — token-efficient encoding for LLM consumption
# Best for large uniform arrays (40-60% fewer tokens than JSON)
mcp2cli --mcp https://mcp.example.com/sse --toon list-tags
MCP tool lists and fetched specifications are cached in `~/.cache/mcp2cli/` with a default TTL of 1 hour.
# Force refresh
mcp2cli --spec https://api.example.com/spec.json --refresh --list
# Custom TTL (seconds)
mcp2cli --spec https://api.example.com/spec.json --cache-ttl 86400 --list
# Custom cache key
mcp2cli --spec https://api.example.com/spec.json --cache-key my-api --list
# Override cache directory
MCP2CLI_CACHE_DIR=/tmp/my-cache mcp2cli --spec ./spec.json --list
Local file specifications are never cached.
mcp2cli [global options] [command options]
Source (mutually exclusive, one required):
--spec URL|FILE OpenAPI spec (JSON or YAML, local or remote)
--mcp URL MCP server URL (HTTP/SSE)
--mcp-stdio CMD MCP server command (stdio transport)
Options:
--auth-header K:V HTTP header (repeatable)
--base-url URL Override base URL from spec
--env KEY=VALUE Env var for MCP stdio server (repeatable)
--cache-key KEY Custom cache key
--cache-ttl SECONDS Cache TTL (default: 3600)
--refresh Bypass cache
--list List available subcommands
--pretty Pretty-print JSON output
--raw Print raw response body
--toon Encode output as TOON (token-efficient for LLMs)
--version Show version
Subcommands and their flags are generated dynamically from the MCP server's tool definitions or the OpenAPI spec. Run any subcommand with `--help` for details.
If you’ve connected an LLM to more than a handful of tools, you’ve felt the pain. Every MCP server, every OpenAPI endpoint: their entire schemas are injected into the system prompt on every turn. A 50-endpoint API costs 3,579 context tokens before the conversation even starts, and that bill is paid again on every message, whether the model touches those tools or not.
This is not a theoretical concern. Kagan Yilmaz documented it well in his analysis of CLI versus MCP costs, showing that 6 MCP servers with 84 tools consume ~15,540 tokens at session start. His project CLIHub showed that converting MCP servers to CLIs and letting the LLM discover tools on demand reduces the cost by 92% to 98%.
The problem is so well known that Anthropic built Tool Search directly into their API: a lazy-loading pattern where tools are marked `defer_loading: true` and Claude discovers them through a search index (~500 tokens) instead of loading all the schemas up front. It typically reduces token usage by ~85%. But as Kagan noted, when Tool Search retrieves a tool, it still puts the entire JSON schema into context.
mcp2cli takes the CLI approach further.
CLIHub showed the way: give the LLM a CLI instead of raw tool schemas and let it `--list` and `--help` its way to what it needs. Anthropic’s Tool Search showed that even first-party vendors see the value in lazy loading. mcp2cli builds on both ideas, with a few key differences:
- No codegen, no recompilation. Point mcp2cli at a spec URL or MCP server and the CLI exists immediately. When the server adds new endpoints, they appear on the next invocation: no rebuild step, no generated code to commit.
- Provider-independent. Tool Search is a feature of the Anthropic API. mcp2cli works with any LLM (Claude, GPT, Gemini, local models) because it is just a CLI the model can invoke.
- Compact discovery. Tool Search defers loading, but still injects the full JSON schema when a tool is retrieved (~121 tokens/tool). mcp2cli's `--help` returns human-readable text that is typically cheaper than raw schema, and its `--list` summaries cost ~16 tokens/tool vs ~121 for native schemas.
- OpenAPI support. MCP is not the only schema-rich protocol. mcp2cli handles OpenAPI specifications (JSON or YAML, local or remote) with the same CLI interface, the same caching, and the same on-demand discovery. One tool for both worlds.
- Spec caching with TTL control. Fetched specifications and MCP tool lists are cached locally with a configurable TTL, so repeated invocations do not hit the network. `--refresh` bypasses the cache when you need fresh data.
We measure this. These are not estimates: they are actual token counts using the cl100k_base tokenizer against real schemas, verified by a suite of automated tests.
Let’s be honest about what mcp2cli adds to the context. It’s not zero; it’s just a lot less than injecting entire schemas.
| Component | Cost | When |
|---|---|---|
| System notice | 67 tokens | Every turn (fixed) |
| `--list` output | ~16 tokens/tool | Once per conversation |
| `--help` output | ~80-200 tokens/tool | Once per unique tool used |
| Tool call output | Same as native | Per call |
The `--list` cost grows linearly with the number of tools: 30 tools cost ~464 tokens, 120 tools ~1,850 tokens. That is still 7-8x cheaper than full schemas, and you pay it only once.
Compare that to native MCP injection: ~121 tokens per tool, every turn, whether the model uses those tools or not. For OpenAPI endpoints, it’s ~72 tokens per endpoint per turn.
Here is the total token cost in a realistic multi-turn conversation. The mcp2cli column includes all overhead: the system notice on every turn, one `--list` discovery, `--help` for each unique tool the LLM actually uses, and the tool call results.
MCP Servers:
| Scenario | Turns | Unique tools used | Native total | mcp2cli total | Saved |
|---|---|---|---|---|---|
| Task Manager (30 Tools) | 15 | 5 | 54,525 | 2,309 | 96% |
| Multiserver (80 tools) | 20 | 8 | 193,360 | 3,897 | 98% |
| Complete platform (120 tools) | 25 | 10 | 362,350 | 5,181 | 99% |
OpenAPI specifications:
| Scenario | Turns | Unique endpoints used | Native total | mcp2cli total | Saved |
|---|---|---|---|---|---|
| Pet store (5 endpoints) | 10 | 3 | 3,730 | 1,199 | 68% |
| Medium API (20 endpoints) | 15 | 5 | 21,720 | 1,905 | 91% |
| Large API (50 endpoints) | 20 | 8 | 71,940 | 2,810 | 96% |
| Enterprise API (200 endpoints) | 25 | 10 | 358,425 | 3,925 | 99% |
A 120-tool MCP platform over 25 turns: 357,169 tokens saved.
Here’s a 30-tool MCP server over 10 turns. The mcp2cli column includes the actual costs: `--list` discovery on turn 1, then `--help` plus tool output when each new tool is used for the first time.
Turn Native mcp2cli Savings
──────────────────────────────────────────────────────────
1 3,619 531 3,088 ← --list (464 tokens)
2 7,238 598 6,640
3 10,887 815 10,072 ← --help (120) + tool call
4 14,506 882 13,624
5 18,155 1,099 17,056 ← --help (120) + tool call
6 21,774 1,166 20,608
7 25,423 1,383 24,040 ← --help (120) + tool call
8 29,042 1,450 27,592
9 32,691 1,667 31,024 ← --help (120) + tool call
10 36,310 1,734 34,576
Total: 34,576 tokens saved (95.2%)
Native MCP approach: pay the full schema tax every turn:
System prompt: "You have these 30 tools: [3,619 tokens of JSON schemas]"
→ 3,619 tokens consumed per turn, whether used or not
→ 10 turns = 36,310 tokens
mcp2cli approach — pay only for what you use:
System prompt: "Use mcp2cli --mcp [--flags]" (67 tokens/turn)
→ mcp2cli --mcp --list (464 tokens, once)
→ mcp2cli --mcp create-task --help (120 tokens, once per tool)
→ mcp2cli --mcp create-task --title "Fix bug" (0 extra tokens)
→ 10 turns, 4 unique tools = 1,734 tokens
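The arithmetic behind the 10-turn table can be reproduced directly from the per-item costs this README quotes (the 67-token notice, the 464-token `--list`, the ~120-token `--help`, and the tool output itself, which is paid in both columns):

```python
# Per-item costs quoted in this README (measured averages, not universal constants)
NATIVE_PER_TURN = 3_619  # 30 tool schemas injected every turn
NOTICE = 67              # mcp2cli system notice, every turn
LIST_COST = 464          # one --list over 30 tools, turn 1 only
HELP_COST = 120          # --help for each newly used tool
TOOL_OUTPUT = 30         # tool call result, paid identically in both columns

help_turns = {3, 5, 7, 9}  # turns where a new tool is used for the first time

native = mcp2cli = 0
for turn in range(1, 11):
    native += NATIVE_PER_TURN
    mcp2cli += NOTICE
    if turn == 1:
        mcp2cli += LIST_COST
    if turn in help_turns:
        native += TOOL_OUTPUT  # tool results cost the same either way
        mcp2cli += HELP_COST + TOOL_OUTPUT

print(native, mcp2cli, native - mcp2cli)  # 36310 1734 34576
```

Running the loop reproduces the table's bottom line: 36,310 native tokens versus 1,734 for mcp2cli, a 34,576-token (95.2%) saving.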
The LLM discovers what it needs, when it needs it. Everything else stays out of context.
This is where it really hurts. Connect 3 MCP servers (a task manager, a file-system server, and a database server: 60 tools in total) and you’ll pay 7,238 tokens per turn. In a 20-turn conversation, that’s 145,060 tokens for tool schemas alone. mcp2cli reduces that to 3,288 tokens, a 97.7% reduction, even after accounting for `--list` discovery (928 tokens) and `--help` for 6 unique tools (720 tokens).
Yes, partially. The MCP specification defines dynamic tool discovery through `notifications/tools/list_changed`, but that is about reacting to server-side changes: the initial `tools/list` response still returns all schemas at once, and most clients inject them on every turn.
Anthropic’s Tool Search goes further: tools marked `defer_loading: true` stay out of context until Claude searches for them, cutting the initial token cost by ~85%. But it is Claude-API-only, and when a tool is retrieved, its entire JSON schema still enters context (~121 tokens/tool).
mcp2cli takes the CLI approach: `--list` returns compact summaries (~16 tokens/tool), `--help` returns human-readable text (typically cheaper than a raw JSON schema), and it works with any LLM provider. The trade-off is one extra shell invocation per discovery step.
- Load — fetch the OpenAPI specification or connect to the MCP server. Resolve `$ref`s. Cache for reuse.
- Extract — loop over the spec's paths or the server's tools and produce a consistent list of command definitions with typed parameters.
- Build — generate an argparse parser with subcommands, flags, types, choices, and help text.
- Execute — send the parsed arguments as an HTTP request (OpenAPI) or a tool call (MCP).
Both adapters produce the same internal CommandDef structure, so the CLI generator and result handling are shared.
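A minimal sketch of what that shared path might look like. The field and function names here are illustrative assumptions, not the project's actual code; it shows how a list of adapter-produced command definitions can drive argparse subcommand generation:

```python
import argparse
from dataclasses import dataclass, field

@dataclass
class ParamDef:
    name: str
    type: type          # e.g. str, int, float
    required: bool
    help: str = ""

@dataclass
class CommandDef:
    name: str           # kebab-case subcommand name
    description: str
    params: list[ParamDef] = field(default_factory=list)

def build_parser(commands: list[CommandDef]) -> argparse.ArgumentParser:
    """Build an argparse CLI from adapter-produced command definitions."""
    parser = argparse.ArgumentParser(prog="mcp2cli")
    sub = parser.add_subparsers(dest="command", required=True)
    for cmd in commands:
        p = sub.add_parser(cmd.name, help=cmd.description)
        for param in cmd.params:
            p.add_argument(f"--{param.name}", type=param.type,
                           required=param.required, help=param.help)
    return parser

# Example: one tool definition becomes one subcommand with typed flags
commands = [CommandDef("search", "Search items",
                       [ParamDef("query", str, True, "Search query")])]
args = build_parser(commands).parse_args(["search", "--query", "test"])
print(args.command, args.query)  # search test
```

Because both adapters emit the same structure, `--list` and `--help` come for free from argparse's own subcommand listing and per-command help text.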
# Install with test + MCP deps
uv sync --extra test
# Run tests (96 tests covering OpenAPI, MCP stdio, MCP HTTP, caching, and token savings)
uv run pytest tests/ -v
# Run just the token savings tests
uv run pytest tests/test_token_savings.py -v -s
This project was inspired by Kagan Yilmaz’s analysis of CLI vs. MCP token costs and his work on CLIHub. His observation that CLI-based tool access is far more token-efficient than native MCP injection was the spark for mcp2cli. While CLIHub generates static CLIs from MCP servers, mcp2cli takes a different approach: it reads schemas at runtime, so there is no code generation and no rebuild step when the server adds or changes tools. It also extends the pattern to OpenAPI specifications: any REST API with a spec file gets the same treatment.
Anthropic’s Advanced Tool Use guide describes Tool Search, a proprietary lazy-loading mechanism built into the Claude API. It solves the same core problem of not paying for tools you aren’t using, but at the API level instead of the CLI level. mcp2cli complements it by working with any LLM provider, producing more compact discovery results, and covering OpenAPI specifications alongside MCP servers.
MIT