Costs & Pricing

Two pricing models: fixed subscription for developers, pay-per-token for automation.

Pricing

Two Pricing Models

Fixed monthly subscriptions for interactive use, per-token billing for automation.

Standard Developer

$100/month

Claude Max 5x

  • Requirements analysis
  • Planning and research
  • Code reviews
  • Bug triage
  • Documentation

Power Developer

$200/month

Claude Max 20x

  • Everything in Standard
  • Full implementation pipelines
  • Multiple stories per day
  • Figma integration
  • Heavy code generation

Automation

Per-run (API tokens)

Pay as you go

  • Headless agents on ADO/Jira
  • PR review and response
  • DoR / DoD checks
  • Bug fix automation
  • QA and doc generation

API Pricing

API Model Pricing

Token costs vary by model. Automation uses API billing.

Model        Input     Output     Cache Read   Used For
Opus 4.6     $5/MTok   $25/MTok   $0.50/MTok   Planning, code review
Sonnet 4.6   $3/MTok   $15/MTok   $0.30/MTok   Execution, PR review
Haiku 4.5    $1/MTok   $5/MTok    $0.10/MTok   File lookup, doc search
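
As a rough illustration of how these rates turn into a bill, the sketch below prices a single run from its token counts. The rates are copied from the table. The function name `run_cost`, the dictionary keys (plain labels, not official API model identifiers), and the sample token counts are assumptions made for this example, and cache reads are treated as a separate count from uncached input.

```python
# Rates in USD per million tokens (MTok), copied from the pricing table.
# The keys are illustrative labels, not official API model identifiers.
PRICES = {
    "opus-4.6":   {"input": 5.00, "output": 25.00, "cache_read": 0.50},
    "sonnet-4.6": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "haiku-4.5":  {"input": 1.00, "output": 5.00,  "cache_read": 0.10},
}

def run_cost(model, input_tokens, output_tokens, cache_read_tokens=0):
    """Return the USD cost of one run; cache reads use the cache-read rate."""
    p = PRICES[model]
    return (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + cache_read_tokens * p["cache_read"]
    ) / 1_000_000

# Hypothetical run: 200K uncached input tokens and 5K output tokens per tier.
for model in PRICES:
    print(f"{model}: ${run_cost(model, 200_000, 5_000):.3f}")
```

Running the same workload through each tier makes the case for model tiering visible: the identical token counts cost five times as much on Opus 4.6 as on Haiku 4.5.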

Important

API vs Subscription: Know the Cost Difference

Direct API usage can cost up to 10x more than a monthly subscription for the same work.

API Usage Is Significantly More Expensive

Running AI coding tools through direct API billing (pay-per-token) is roughly 8-10x more expensive than using the same models through a monthly subscription plan (e.g., Claude Max, GitHub Copilot, Cursor Pro). This applies to Claude Code and likely other AI coding providers as well.

Example: Claude Code (Opus 4.6). A typical heavy development day might consume 5-10M input tokens and 200-500K output tokens. At the Opus 4.6 list prices ($5/MTok input, $25/MTok output) with no caching, that is about $30-63/day: $25-50 for input plus $5-12.50 for output. A $200/month Max 20x subscription covers the same usage for the equivalent of roughly $7-10/day.
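
The arithmetic behind this comparison can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses the Opus 4.6 list prices with no caching, and it assumes the per-day subscription figure comes from spreading the $200 fee over either 30 calendar days or 20 working days, which the text does not state explicitly. The function name `api_day_cost` is made up for the example.

```python
OPUS_INPUT_PER_MTOK = 5.00    # USD, Opus 4.6 input list price
OPUS_OUTPUT_PER_MTOK = 25.00  # USD, Opus 4.6 output list price

def api_day_cost(input_mtok, output_mtok):
    """USD cost of one day on the API at list prices, ignoring caching."""
    return input_mtok * OPUS_INPUT_PER_MTOK + output_mtok * OPUS_OUTPUT_PER_MTOK

light_day = api_day_cost(5, 0.2)    # 5M input tokens, 200K output tokens
heavy_day = api_day_cost(10, 0.5)   # 10M input tokens, 500K output tokens

# Assumption: the $200 plan is spread over 30 calendar or 20 working days.
sub_per_calendar_day = 200 / 30
sub_per_working_day = 200 / 20

print(f"API: ${light_day:.2f} to ${heavy_day:.2f} per day")
print(f"Subscription: ${sub_per_calendar_day:.2f} to ${sub_per_working_day:.2f} per day")
```

Input tokens dominate the API figure at both ends of the range, which is why the volume of context loaded per invocation matters more than the amount of code generated.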

Recommendation:

  • Interactive development: always use a subscription plan (Standard $100/mo or Power $200/mo)
  • Automation agents: API billing is unavoidable (headless, no subscription option), so use model tiering and cost controls aggressively
  • Don’t default to API keys for developer workflows when a subscription is available

Why

Why It Costs What It Does

Context Is Expensive

High-quality code generation requires many cycles. A typical story involves ≈35 subagent invocations, and each invocation loads the full context: MCP tool schemas, project rules, codebase analysis, and prior spec files. Input tokens account for 99%+ of total usage — the AI reads far more than it writes. This is by design: thorough understanding of the codebase produces better, more consistent code.
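
To see how repeated context loading pushes the input share that high, here is a small back-of-the-envelope sketch. Only the figure of roughly 35 invocations comes from the text; the per-invocation token counts are hypothetical values chosen for illustration, and `story_usage` is a made-up helper name.

```python
INVOCATIONS_PER_STORY = 35   # approximate figure from the text

# Hypothetical per-invocation sizes, chosen only for illustration.
CONTEXT_TOKENS = 120_000     # tool schemas, project rules, analysis, spec files
OUTPUT_TOKENS = 1_000        # generated code or notes

def story_usage(invocations, context_tokens, output_tokens):
    """Return (total input tokens, total output tokens, input share of all tokens)."""
    total_in = invocations * context_tokens
    total_out = invocations * output_tokens
    return total_in, total_out, total_in / (total_in + total_out)

total_in, total_out, share = story_usage(
    INVOCATIONS_PER_STORY, CONTEXT_TOKENS, OUTPUT_TOKENS
)
print(f"input: {total_in:,} tokens, output: {total_out:,} tokens, input share: {share:.1%}")
```

Because the context is reloaded on every invocation, total input scales with the number of invocations times the context size, so trimming either factor reduces cost roughly in proportion.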

Controls

Cost Controls

Multiple layers of protection prevent runaway spending.

Monthly Token Budget

Hard cap on monthly token spend with automatic shutdown when the limit is reached. No surprise bills.

Per-Agent Daily Limits

Each agent type has its own daily rate limit tuned to expected workload. Prevents any single agent from consuming the full budget.
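
One way the monthly cap and the per-agent daily limits could fit together is sketched below. This is not the product's implementation: the class `TokenBudget`, its method names, the agent names, and all limit values are hypothetical, and the budget is counted in tokens here, although a real system might track dollars instead.

```python
from dataclasses import dataclass, field

@dataclass
class TokenBudget:
    monthly_cap: int      # hard cap on tokens per month
    daily_limits: dict    # agent type -> tokens allowed per day
    month_used: int = 0
    day_used: dict = field(default_factory=dict)

    def try_spend(self, agent, tokens):
        """Record the spend and return True, or refuse it and return False."""
        used_today = self.day_used.get(agent, 0)
        if self.month_used + tokens > self.monthly_cap:
            return False  # monthly cap would be exceeded
        # Agent types without a configured limit get 0 and are always refused.
        if used_today + tokens > self.daily_limits.get(agent, 0):
            return False  # this agent type would exceed its daily limit
        self.month_used += tokens
        self.day_used[agent] = used_today + tokens
        return True

    def start_new_day(self):
        """Reset the per-agent daily counters; the monthly total carries over."""
        self.day_used.clear()

# Hypothetical limits for two agent types.
budget = TokenBudget(
    monthly_cap=1_000_000,
    daily_limits={"pr_review": 300_000, "bug_fix": 500_000},
)
print(budget.try_spend("pr_review", 250_000))  # within both limits
print(budget.try_spend("pr_review", 100_000))  # exceeds pr_review's daily limit
```

Checking both limits before recording anything means a refused request leaves the counters untouched, so one agent type hitting its daily limit never eats into the budget of the others.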

Three-State Degradation

As spending approaches the cap, agents step down through three states: Normal (full operation), then Suggest-only (recommendations, no execution), then Halted (all agents stopped).
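
The three states can be modeled as a simple function of month-to-date spend. The sketch below is illustrative only: the text does not say where Suggest-only begins, so the 80% threshold is an assumption, as are the names `Mode` and `mode_for`.

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "normal"              # full operation
    SUGGEST_ONLY = "suggest-only"  # recommendations, no execution
    HALTED = "halted"              # all agents stopped

# Hypothetical threshold: the point where suggest-only begins is an assumption.
SUGGEST_ONLY_AT = 0.80

def mode_for(spent, cap, suggest_only_at=SUGGEST_ONLY_AT):
    """Map month-to-date spend against the cap to an operating mode."""
    if spent >= cap:
        return Mode.HALTED
    if spent >= suggest_only_at * cap:
        return Mode.SUGGEST_ONLY
    return Mode.NORMAL

for spent in (50, 85, 100):
    print(spent, mode_for(spent, cap=100).value)
```

Deriving the mode from current spend on every check, rather than storing it, keeps the state consistent with the budget even if the cap is raised partway through the month.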

Optimization In Progress

Active work on prompt caching, context pruning, and smarter tool loading to reduce per-invocation costs without sacrificing quality.

KAI by Dragan Filipovic