Your AI Budget Line Is Wrong. Tokens Are the New Headcount.

TL;DR

Some engineers now spend six figures per month on AI tokens, more than their own salaries
The cost structure of software development is inverting: human time is becoming the scarce input, AI compute is becoming the volume input
Finance teams that budget AI tokens as a software line item are miscategorising a personnel cost

A senior engineer at a well-funded AI lab recently mentioned spending $200,000 per month on AI tokens. That's $2.4 million per year in compute costs for a single engineer. Their salary, generous by any standard, is a fraction of their token spend.

This isn't an anomaly. It's the leading edge of a structural shift in how software gets built.

When I modelled the unit economics for OpenChair and OpenTradie, the cost structure surprised me. My AI inference costs, running six LLMs in orchestration across 50+ features, were the largest line item after hosting. Not developer salaries (there's only me). Not design tools. Not marketing. AI tokens.

For a solo builder, this makes intuitive sense. I replaced team costs with AI costs. The surprise was the ratio. My token spend was roughly equivalent to what one full-time junior engineer would cost, and it produced the output of what I'd estimate as a three to four person team.

That ratio is going to reshape how organisations think about budgets.

How is AI inverting the traditional software budget?

Traditional software budgets allocate 70-80% to personnel and 20-30% to infrastructure (hosting, tools, licences). AI changes this mix in two ways.

First, AI reduces the personnel needed for a given output level. A team that shipped ten features per quarter with six engineers might ship the same ten features with three engineers plus AI tools. The personnel line shrinks.

Second, AI introduces a new cost category that scales with ambition, not headcount. The token spend for one engineer running five concurrent AI agents on complex coding tasks can exceed the engineer's salary. Multiply that across a team, and AI compute becomes a primary budget category, not a line item under "software tools."

The companies seeing this first are the ones where AI adoption is highest. Within Anthropic's own engineering team, productivity per engineer has increased 200% according to their internal metrics. That productivity requires tokens. Lots of them. The per-engineer token cost is a meaningful part of the cost of that engineer's output.

Finance teams that categorise AI tokens alongside SaaS subscriptions and cloud hosting are missing the structural point. Tokens aren't infrastructure. They're the new form of labour. They should be budgeted, measured, and optimised the way headcount costs are.

How to think about token ROI

The right question isn't "how much are we spending on tokens?" It's "what output are we getting per token dollar compared to per salary dollar?"

Here's a simplified model from my own experience:

Manual approach (pre-AI): One junior engineer, $100K/year fully loaded, ships approximately 3-4 features per quarter, handles bug fixes, and participates in standups and reviews. Effective cost per shipped feature: roughly $6,000-$8,000 ($25,000 per quarter divided by 3-4 features).

AI-augmented approach: One senior engineer, $180K/year, spends $3,000-$5,000/month on AI tokens, ships 15-20 features per quarter, handles bugs through AI-generated fixes, and reviews AI-produced code. Effective cost per shipped feature: roughly $3,000-$4,000.

The per-feature cost dropped by half even though the individual engineer costs more. The token spend that enables the improvement is the mechanism, not the waste.
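
A minimal sketch of the arithmetic behind that comparison. It divides quarterly salary plus quarterly token spend by features shipped and nothing more, so it makes no allowance for time spent on bug fixes, standups, or reviews. The function name is illustrative; the inputs are the figures quoted above.

```python
# Naive cost-per-feature arithmetic using the figures quoted in the text.
# The helper name is illustrative, not part of any real tool.

def cost_per_feature(annual_salary, monthly_tokens, features_per_quarter):
    """Quarterly salary plus quarterly token spend, divided by features shipped."""
    quarterly_cost = annual_salary / 4 + monthly_tokens * 3
    return quarterly_cost / features_per_quarter

# Manual approach: $100K/year, no token spend, 3-4 features per quarter.
manual_high = cost_per_feature(100_000, 0, 3)  # fewest features -> highest cost
manual_low = cost_per_feature(100_000, 0, 4)

# AI-augmented: $180K/year, $3,000-$5,000/month in tokens, 15-20 features per quarter.
augmented_high = cost_per_feature(180_000, 5_000, 15)
augmented_low = cost_per_feature(180_000, 3_000, 20)

print(f"Manual:       ${manual_low:,.0f} to ${manual_high:,.0f} per feature")
print(f"AI-augmented: ${augmented_low:,.0f} to ${augmented_high:,.0f} per feature")
```

Changing the feature counts or the token range shows how sensitive the per-feature figure is to each input.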

Scale this to a team level and the numbers are more dramatic. A 20-person engineering team with modest AI adoption (token spend of $500-$1,000 per engineer per month) sees efficiency gains that make the token cost trivially worth it. A 10-person team with aggressive AI adoption (token spend of $5,000-$10,000 per engineer per month) may match the output of the 20-person team at lower total cost.
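
The team comparison can be sketched the same way. The text gives no salary for either team, so this assumes every engineer costs $180K/year fully loaded (borrowed from the senior-engineer figure earlier); that number and the function name are assumptions, not data.

```python
# Team-level cost comparison.
# ASSUMPTION: every engineer costs $180K/year fully loaded. The text does not
# state a salary for either team; swap in your own figure.

ASSUMED_SALARY = 180_000

def annual_team_cost(engineers, monthly_tokens_per_engineer, salary=ASSUMED_SALARY):
    """Total yearly salaries plus total yearly token spend for a team."""
    return engineers * (salary + monthly_tokens_per_engineer * 12)

# 20 engineers at $500-$1,000/month each; 10 engineers at $5,000-$10,000/month each.
modest = [annual_team_cost(20, tokens) for tokens in (500, 1_000)]
aggressive = [annual_team_cost(10, tokens) for tokens in (5_000, 10_000)]

print(f"20 engineers, modest adoption:     ${modest[0]:,} to ${modest[1]:,}")
print(f"10 engineers, aggressive adoption: ${aggressive[0]:,} to ${aggressive[1]:,}")
```

Under that salary assumption the smaller team costs less even at the top of its token range. Whether it matches the larger team's output is the part this arithmetic cannot test.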

[Image: balance scale tipping, with a stack of token coins outweighing a junior developer salary]

The pricing implication for AI products

If you're building AI products, the token economics affect your pricing model too. I explored the broader AI pricing landscape in detail elsewhere, but the token-as-headcount framing adds a specific dimension.

When your product consumes AI tokens on behalf of your users, your cost of goods sold scales with usage, not with headcount. This is fundamentally different from traditional SaaS, where the marginal cost of serving one more user is near zero.

The cannibalisation paradox plays out here too. If your AI product makes your customer's team 3x more productive, the customer needs fewer seats. You earn less per-seat revenue from a more successful customer. Token-based pricing resolves this by charging for the AI work done rather than the humans doing it.

I built metered billing into OpenChair and OpenTradie from day one because the cost structure demanded it. Each AI feature consumes a different number of tokens. A simple appointment booking costs less than a comprehensive business analytics summary. Flat pricing would either overprice the simple features (losing customers) or underprice the complex ones (losing margin).
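
A rough sketch of why flat pricing fails when features consume very different token volumes. Every number and name here is hypothetical: the token counts, the blended token price, the markup, and the flat price are placeholders chosen for illustration, not OpenChair or OpenTradie figures.

```python
# Per-feature metered pricing versus a single flat price.
# All constants below are HYPOTHETICAL placeholders, not real product data.

from dataclasses import dataclass

PRICE_PER_MILLION_TOKENS = 5.00  # assumed blended provider cost, USD
MARKUP = 3.0                     # assumed multiple of cost charged to the user
FLAT_PRICE = 0.20                # assumed flat per-use price, USD

@dataclass(frozen=True)
class FeatureUsage:
    name: str
    tokens: int  # tokens consumed by one invocation (hypothetical)

def token_cost(usage):
    """Provider cost of a single invocation."""
    return usage.tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

def metered_price(usage):
    """Price that tracks the tokens actually consumed."""
    return token_cost(usage) * MARKUP

def flat_margin(usage):
    """Margin per invocation if every feature is sold at FLAT_PRICE."""
    return FLAT_PRICE - token_cost(usage)

booking = FeatureUsage("appointment_booking", 2_000)
analytics = FeatureUsage("business_analytics_summary", 60_000)

for usage in (booking, analytics):
    print(f"{usage.name}: metered {metered_price(usage):.2f} USD, "
          f"flat-price margin {flat_margin(usage):+.2f} USD")
```

With these placeholder values the flat price is many times the metered price for the booking and below cost for the analytics summary, which is the overprice/underprice squeeze described above.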

What this means for your budget

If you're a CTO, VP of Engineering, or product leader making budget decisions, here's the reframe:

Stop categorising AI tokens as a software expense. Move them to the productivity investment category alongside salaries, training, and equipment. This changes how they're evaluated (ROI per dollar, not cost-to-be-minimised) and how they're governed (per-person allocation, not company-wide cap).

Set per-engineer token budgets, not team-wide budgets. A team-wide budget creates a commons problem where no individual feels empowered to spend. A per-engineer budget (generous during exploration, optimised during production) gives each person the autonomy to experiment.

Measure output per total cost, not output per person. The metric that matters is total output divided by total investment (salaries plus tokens plus infrastructure). If adding $50K/year in token spend per engineer doubles their output, the ROI is obvious. But you'll only see it if you measure the right ratio.
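
To make the ratio concrete, here is a small sketch. It assumes a $180K/year engineer and measures output in arbitrary units; only the $50K/year token figure and the doubling come from the paragraph above.

```python
# Output per total dollar, before and after adding token spend.
# ASSUMPTIONS: a $180K/year salary and output in arbitrary units.

def output_per_dollar(output_units, salary, tokens=0, infrastructure=0):
    """Total output divided by total investment (salary + tokens + infrastructure)."""
    return output_units / (salary + tokens + infrastructure)

before = output_per_dollar(output_units=1.0, salary=180_000)
after = output_per_dollar(output_units=2.0, salary=180_000, tokens=50_000)

improvement = after / before
print(f"Output per person: 2.00x, output per total dollar: {improvement:.2f}x")
```

Output per person reports a 2x gain, while output per total dollar is lower because the denominator grew. Both are improvements, but only the second accounts for what the gain cost.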

Plan for token costs to decrease per unit of work. Model inference costs are dropping 10-30% per quarter as models get more efficient, caching improves, and competition increases. At the top of that range, today's $5,000/month engineer token budget might buy up to twice as much output six months from now. Budget for the trend, not the snapshot.
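
A short sketch of how a quarterly price decline compounds into purchasing power for a fixed budget. The 10-30% range is the figure quoted above; treating it as a steady compounding rate is an illustrative assumption, and the function name is made up for this example.

```python
# How far a fixed token budget stretches if unit costs fall every quarter.
# ASSUMPTION: the decline compounds at a constant rate each quarter.

def purchasing_power_multiplier(quarterly_decline, quarters):
    """Work a fixed budget buys after `quarters`, relative to today."""
    remaining_unit_cost = (1 - quarterly_decline) ** quarters
    return 1 / remaining_unit_cost

for decline in (0.10, 0.30):
    multiplier = purchasing_power_multiplier(decline, quarters=2)
    print(f"{decline:.0%} per quarter -> {multiplier:.2f}x output after six months")
```

Only the top of the range roughly doubles output in six months; at the low end the gain is closer to a quarter, so the spread is worth carrying into the budget model.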

Prepare for the "token salary" conversation. Within two years, top engineers will evaluate job offers based on their AI token allocation alongside salary, equity, and benefits. "Unlimited tokens on the best models" will be a competitive hiring advantage as real as "unlimited PTO" was five years ago. The companies that figure this out early will attract the E-shaped builders who generate the most leverage.

The budget line is wrong because the cost structure has changed. Tokens aren't overhead. They're output. Price them accordingly.

Frequently Asked Questions

Won't token costs decrease so fast that this won't matter?

Token costs per unit of work are decreasing, but total token spend per engineer is increasing because engineers are using AI for more tasks and running more concurrent agents. The cost per query drops, but the query volume rises faster. Total spend grows even as unit economics improve, similar to how cloud computing got cheaper per instance while total cloud spend exploded.

How do you prevent engineers from wasting tokens?

The same way you prevent engineers from wasting any other resource: clear expectations and outcome measurement. If an engineer's token spend is high and their output is high, the spend is investment. If token spend is high and output is flat, that's a coaching conversation about effective AI use, not a cost control conversation.

Should startups worry about token costs?

At startup scale, token costs are usually small relative to salaries. The risk isn't overspending on tokens. It's underspending: being too conservative with model selection or usage limits and missing the productivity gains that justify the spend. Optimise later. Ship first.