Costs & Pricing

Two pricing models: a fixed subscription for developers and pay-per-token billing for automation.

Pricing

Two Pricing Models

Fixed monthly subscriptions for interactive use, per-token billing for automation.

Standard Developer

$100/month

Claude Max 5x

Requirements analysis
Planning and research
Code reviews
Bug triage
Documentation

Power Developer

$200/month

Claude Max 20x

Everything in Standard
Full implementation pipelines
Multiple stories per day
Figma integration
Heavy code generation

Automation

Per-run (API tokens)

Pay as you go

Headless agents on ADO/Jira
PR review and replies
DoR / DoD checks
Bug fix automation
QA and doc generation

API Pricing

API Model Pricing

Token costs vary by model. Automation uses API billing.

Model	Input	Output	Cache Read	Used For
Opus 4.6	$5/MTok	$25/MTok	$0.50/MTok	Planning, code review
Sonnet 4.6	$3/MTok	$15/MTok	$0.30/MTok	Execution, PR review
Haiku 4.5	$1/MTok	$5/MTok	$0.10/MTok	File lookup, doc search
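To make the table concrete, here is a minimal sketch of a per-run cost estimate. The prices are copied from the table; the model keys (`opus-4.6` and so on) are hypothetical labels rather than real API model identifiers, and the token counts in the example run are invented for illustration.

```python
# Prices in USD per million tokens (MTok), copied from the table above.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "opus-4.6":   {"input": 5.00, "output": 25.00, "cache_read": 0.50},
    "sonnet-4.6": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "haiku-4.5":  {"input": 1.00, "output": 5.00,  "cache_read": 0.10},
}


def estimate_cost(model, input_tokens, output_tokens, cache_read_tokens=0):
    """Return the estimated USD cost of one run.

    input_tokens counts only uncached input; cached input is passed
    separately as cache_read_tokens.
    """
    p = PRICES[model]
    total = (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + cache_read_tokens * p["cache_read"]
    )
    return total / 1_000_000


# Hypothetical run: 400K uncached input, 100K cached input, 20K output tokens.
for model in PRICES:
    print(model, round(estimate_cost(model, 400_000, 20_000, 100_000), 2))
```

Running the same token counts through each model shows why the "Used For" column matters: routing lookups to a cheaper model changes the bill far more than trimming output does.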

Important

API vs Subscription: Know the Cost Difference

Direct API usage can cost up to 10x more than a monthly subscription for the same work.

API Usage Is Significantly More Expensive

Running AI coding tools through direct API billing (pay-per-token) can cost several times more, up to roughly 10x, than using the same models through a monthly subscription plan (e.g., Claude Max, GitHub Copilot, Cursor Pro). The exact multiple depends on daily token volume. This applies to Claude Code and likely to other AI coding providers as well.

Example (Claude Code, Opus 4.6): A typical heavy development day might consume 5-10M input tokens and 200-500K output tokens. At the API list prices above ($5/MTok input, $25/MTok output), that is about $30-63/day. A $200/month Max 20x subscription covers the same usage for the equivalent of roughly $7-10/day.
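The arithmetic behind this comparison can be sketched as follows. The token ranges and Opus 4.6 list prices come from this page; spreading the $200 plan over 22 working days or 30 calendar days is an assumption made here to get a per-day figure.

```python
# List prices for Opus 4.6 in USD per million tokens, from the pricing table.
INPUT_PRICE = 5.00
OUTPUT_PRICE = 25.00


def api_day_cost(input_mtok, output_mtok):
    """USD cost of one day on API billing; token counts are in millions."""
    return input_mtok * INPUT_PRICE + output_mtok * OUTPUT_PRICE


# Token ranges from the example: 5-10M input, 200-500K output per day.
low = api_day_cost(5, 0.2)     # lighter end of a heavy day
high = api_day_cost(10, 0.5)   # heavier end of a heavy day

# Assumption: the $200 plan is spread over 22 working or 30 calendar days.
sub_per_working_day = 200 / 22
sub_per_calendar_day = 200 / 30

print(f"API: ${low:.2f} to ${high:.2f} per day")
print(f"Subscription: ${sub_per_calendar_day:.2f} to ${sub_per_working_day:.2f} per day")
print(f"Ratio: {low / sub_per_working_day:.1f}x to {high / sub_per_calendar_day:.1f}x")
```

The multiple is sensitive to both inputs: it grows with daily token volume and shrinks when the subscription is only used on some days of the month.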

Recommendation:

Interactive development — always use a subscription plan (Standard $100/mo or Power $200/mo)
Automation agents — API billing is unavoidable (headless, no subscription option), so use model tiering and cost controls aggressively
Don’t default to API keys for developer workflows when a subscription is available

Why

Why It Costs What It Does

Context Is Expensive

High-quality code generation requires many cycles. A typical story involves ≈35 subagent invocations, and each invocation loads the full context: MCP tool schemas, project rules, codebase analysis, and prior spec files. Input tokens account for 99%+ of total usage — the AI reads far more than it writes. This is by design: thorough understanding of the codebase produces better, more consistent code.
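A rough sketch of why input dominates: every invocation reloads its context, so input tokens scale with invocations times context size. Only the figure of about 35 invocations comes from the text; the per-invocation context and output sizes below are assumptions chosen for illustration, not measurements.

```python
# From the text: about 35 subagent invocations per story.
INVOCATIONS_PER_STORY = 35

# Assumptions for illustration only (not measured values):
CONTEXT_TOKENS_PER_INVOCATION = 150_000  # tool schemas, rules, analysis, specs
OUTPUT_TOKENS_PER_INVOCATION = 1_000     # generated code and notes


def story_usage(invocations, context_tokens, output_tokens):
    """Return (input_total, output_total, input_share) for one story."""
    input_total = invocations * context_tokens
    output_total = invocations * output_tokens
    input_share = input_total / (input_total + output_total)
    return input_total, output_total, input_share


inp, out, share = story_usage(
    INVOCATIONS_PER_STORY,
    CONTEXT_TOKENS_PER_INVOCATION,
    OUTPUT_TOKENS_PER_INVOCATION,
)
print(f"input={inp:,} output={out:,} input share={share:.1%}")
```

Under these assumptions the story reads a little over five million tokens and writes tens of thousands, which is why trimming the reloaded context matters more than shortening the output.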

Controls

Cost Controls

Multiple layers of protection prevent runaway spending.

Monthly Token Budget

Hard cap on monthly token spend with automatic shutdown when the limit is reached. No surprise bills.

Per-Agent Daily Limits

Each agent type has its own daily rate limit tuned to expected workload. Prevents any single agent from consuming the full budget.
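One way such a limit could be enforced is sketched below. The agent names, the limit values, and the `DailyLimiter` class are all hypothetical; the source does not describe the actual mechanism.

```python
from collections import defaultdict
from datetime import date

# Hypothetical agent types and daily token limits; real values would be
# tuned to each agent's expected workload.
DAILY_LIMITS = {
    "pr_review": 2_000_000,
    "bug_fix": 5_000_000,
    "doc_gen": 1_000_000,
}


class DailyLimiter:
    """Tracks tokens used per agent type and resets the counts each day."""

    def __init__(self, limits):
        self.limits = limits
        self.day = None
        self.used = defaultdict(int)

    def try_consume(self, agent, tokens, today=None):
        """Record the usage and return True if it fits the agent's limit.

        Returns False without recording anything when the request would
        push the agent past its daily limit.
        """
        today = today or date.today()
        if today != self.day:      # first call of a new day: reset counters
            self.day = today
            self.used.clear()
        if self.used[agent] + tokens > self.limits[agent]:
            return False           # only this agent is blocked
        self.used[agent] += tokens
        return True
```

Because each agent type has its own counter, a runaway documentation job can exhaust its own allowance without blocking PR review or bug fixing.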

Three-State Degradation

As spending approaches the cap, agents step down through three states: Normal (full operation), then Suggest-only (recommendations, no execution), then Halted (all agents stopped).
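The three states can be modeled as a simple function of spend against the monthly cap. This is a sketch under one stated assumption: the 80% threshold for entering suggest-only mode is invented here, since the source does not say where that transition happens.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"              # full operation
    SUGGEST_ONLY = "suggest-only"  # recommendations, no execution
    HALTED = "halted"              # all agents stopped


# Assumed threshold: the source does not say where suggest-only begins.
SUGGEST_ONLY_AT = 0.80  # fraction of the monthly cap


def mode_for(spent, monthly_cap):
    """Map month-to-date spend to an operating mode."""
    if spent >= monthly_cap:
        return Mode.HALTED
    if spent >= SUGGEST_ONLY_AT * monthly_cap:
        return Mode.SUGGEST_ONLY
    return Mode.NORMAL


# Example with a hypothetical $1,000 monthly cap.
for spent in (100, 850, 1000):
    print(spent, mode_for(spent, 1000).value)
```

Checking the hard cap first means the halted state always wins, so a misconfigured threshold cannot keep agents running past the budget.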

Optimization In Progress

Active work on prompt caching, context pruning, and smarter tool loading to reduce per-invocation costs without sacrificing quality.
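To see why prompt caching is a promising lever, the sketch below estimates the blended input price as more of the context is served from cache. It uses the Sonnet 4.6 input and cache-read rates from the pricing table and deliberately ignores any cost of writing to the cache, which that table does not list; the hit ratios are hypothetical.

```python
def blended_input_price(base_price, cache_read_price, cache_hit_ratio):
    """USD per MTok of input when a fraction of tokens is read from cache.

    Simplification: the cost of writing entries to the cache is ignored.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    uncached = (1 - cache_hit_ratio) * base_price
    cached = cache_hit_ratio * cache_read_price
    return uncached + cached


# Sonnet 4.6 rates from the pricing table: $3 input, $0.30 cache read.
for ratio in (0.0, 0.5, 0.9):
    print(ratio, round(blended_input_price(3.00, 0.30, ratio), 2))
```

Since input tokens make up nearly all usage, any reduction in the blended input price carries through almost directly to the total bill.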