Can You Hear Me Now?
Why AI Token Pricing is Going the Way of 10¢ Text Messages
I was debugging an AI agent yesterday and caught myself hesitating: "Should I use Claude Opus for this, or will Sonnet be good enough? What about Haiku? How many tokens is this going to cost?"
And I had the weirdest flashback.
I was 16 years old, sitting in my bedroom with a Nokia brick phone, carefully composing a text message to a girl I liked. But I couldn't just send it. I had to calculate: is this text worth 10 cents?
My parents had a family plan with 200 shared text messages per month. Each additional message was 10¢. I'd burned through my allocation by the 15th. Every text after that came with economic anxiety.
Should I send "hey" (3 characters, 10¢) or wait and combine it with something else? Should I call instead (unlimited nights and weekends!)? Should I use AIM when I get home?
I was optimizing communication around carrier pricing models.
Sound familiar?
The Text Message Gold Rush (2003-2010)
Let's go back. In the early 2000s, text messages were pure profit for carriers. An SMS tops out at 160 characters (about 140 bytes), basically nothing. It piggybacks on existing network control channels. Delivering a text costs carriers functionally zero dollars.
But they charged 10 cents. Sometimes 20 cents.
Why? Because they could. The infrastructure was new. Competition was limited. Customers had no alternatives. Texting was valuable, and carriers controlled the rails.
The Economics of Artificial Scarcity
In 2008, a Canadian senator asked telecom executives why SMS prices kept rising when the actual cost was near zero. Their response? "Market pricing."
Translation: We charge what people will pay, not what it costs us to deliver.
Consumers developed bizarre behaviors around text pricing:
- Rationing messages ("I'll just tell them tomorrow in person")
- Abbreviating everything (genuinely necessary to fit 160 chars, but also to reduce message count)
- Asking friends to call instead of text ("call me, don't text, I'm almost out")
- Obsessively tracking monthly usage ("I've used 180 of my 200 messages")
We optimized our human communication around carrier profit margins.
If that sounds insane now, it's because the market eventually corrected.
How Text Messaging Became "Free"
Three things killed per-message pricing:
1. Infrastructure Maturity
Once carriers built out their networks, the marginal cost of a text message was effectively zero. They'd already paid for the towers, the spectrum, the backend systems. Adding more texts cost nothing.
Charging per message became transparently predatory once customers realized the game.
2. Competition Pressure
Remember the Verizon guy? "Can you hear me now? Good."
Carriers competed on coverage and network quality. But here's the dirty secret: the actual differences were tiny.
Verizon might claim 99.2% coverage to AT&T's 98.9% and Sprint's 98.6% (illustrative figures).
They spent billions on marketing to differentiate on tenths of a percentage point.
Eventually, competition shifted to price. And unlimited texting became the weapon.
3. Alternative Technologies
When smartphones arrived, people stopped texting through carrier SMS. They used:
- iMessage (free over Wi-Fi or data)
- WhatsApp (free, cross-platform)
- Facebook Messenger (free)
- Literally any app that used data instead of SMS
Carriers lost their monopoly on messaging. SMS pricing collapsed.
By 2012, unlimited texting became standard. By 2015, nobody even thought about text message costs anymore.
The thing that seemed like the fundamental constraint of mobile communication? It just... vanished.
Welcome to the AI Token Gold Rush (2023-Present)
Now let's talk about where we are with AI.
Developers right now are doing the exact same optimization dance I did with text messages in 2005:
- "Should I use GPT-4 or GPT-3.5?" (Is this prompt worth the cost difference?)
- "Can I compress this context to save tokens?" (Just like abbreviating texts to save message count)
- "Should I cache these prompts?" (Reduce redundant token usage)
- "Do I really need Opus for this, or will Haiku work?" (Token anxiety)
- "Let me batch these API calls to optimize cost." (Waiting to send multiple messages at once)
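That optimization dance often gets codified. Here's a minimal sketch of the kind of cost-based model router many teams build today; the model names and per-token prices are made-up assumptions, not any provider's real price sheet:

```python
# Hypothetical "token anxiety" router: pick the most capable model
# whose estimated cost fits the budget. Prices are illustrative.

PRICE_PER_1K_TOKENS = {  # USD per 1,000 input tokens (assumed)
    "small-model": 0.0005,
    "mid-model": 0.003,
    "large-model": 0.015,
}

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def pick_model(prompt: str, budget_usd: float) -> str:
    """Return the most capable model whose estimated cost fits the budget."""
    tokens = estimate_tokens(prompt)
    for model in ("large-model", "mid-model", "small-model"):
        cost = tokens / 1000 * PRICE_PER_1K_TOKENS[model]
        if cost <= budget_usd:
            return model
    return "small-model"  # fall back to the cheapest option

prompt = "Summarize this 2,000-word design doc..." * 50
print(pick_model(prompt, budget_usd=0.005))
```

It works, it saves money today, and under flat pricing every line of it becomes dead code.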
We are optimizing our AI usage around provider pricing models.
And just like text messages, it feels permanent. Developers build entire businesses around token efficiency. Companies hire "prompt engineers" to minimize costs. Startups pitch "we reduce your token usage by 40%!"
But here's my crystal ball prediction:
Token pricing will disappear within 5 years.
Not reduce. Not get cheaper. Disappear.
AI providers will move to flat subscription pricing, and "tokens" will become an implementation detail that users never think about—just like text message limits.
Why Token Pricing Will Collapse
The same three forces that killed text message pricing are already in motion:
1. Infrastructure Maturity (Happening Now)
Right now, running AI models is expensive. GPUs are costly. Inference is compute-heavy. Providers genuinely need to charge per token to cover costs.
But look at the trend:
- GPT-4 API pricing dropped 90% in 18 months
- GPT-3.5 became nearly free
- Claude models keep getting cheaper with each release
- Llama 3 and other open models run on consumer hardware
Inference costs are plummeting. Once infrastructure is built, the marginal cost of an AI response trends toward zero—just like SMS.
2. Competition Pressure (The Parameter Wars)
Remember how carriers bragged about 99.2% vs 98.9% coverage?
AI providers are doing the exact same thing with parameters:
- "Our model has 70 billion parameters!"
- "Ours has 175 billion!"
- "We just launched a 405 billion parameter model!"
But here's the truth: for 90% of use cases, it doesn't matter.
A 7 billion parameter model can write perfectly good code. A 13B model can summarize documents. A 70B model can do creative writing that most users can't distinguish from a 175B model.
Just like cell coverage: the difference exists, but it's marginal for everyday use.
Once customers realize this, competition will shift from "biggest model" to "best price."
3. Alternative Technologies (Open Source Models)
WhatsApp killed SMS because it was free and good enough.
Open source AI models are doing the same thing:
- Llama 3: Free, runs locally, shockingly capable
- Mistral: Open weights, commercial use allowed
- Qwen, DeepSeek, Phi: Increasingly good smaller models
If you can run a model on a $2,000 server and serve unlimited requests, why pay per token?
Just like carriers couldn't compete with iMessage, commercial AI providers will struggle to justify token pricing when open models are "free" (after infrastructure costs).
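The back-of-envelope math is easy to run yourself. A rough sketch with assumed numbers (a $2,000 server amortized over two years, $50/month in power, and a hypothetical API rate of $0.002 per 1,000 tokens):

```python
# Break-even point for self-hosting vs. per-token API pricing.
# Every figure here is an assumption for illustration only.

SERVER_COST_USD = 2000        # assumed one-time hardware cost
AMORTIZATION_MONTHS = 24      # assumed useful life of the server
POWER_USD_PER_MONTH = 50      # assumed electricity + hosting
API_USD_PER_1K_TOKENS = 0.002 # hypothetical per-token API rate

monthly_server_cost = SERVER_COST_USD / AMORTIZATION_MONTHS + POWER_USD_PER_MONTH

# Monthly token volume at which API spend equals the self-hosting bill
break_even_tokens = monthly_server_cost / API_USD_PER_1K_TOKENS * 1000

print(f"Self-hosting costs ~${monthly_server_cost:.2f}/month")
print(f"Break-even: ~{break_even_tokens / 1e6:.1f}M tokens/month")
```

Under these assumptions the crossover lands in the tens of millions of tokens per month, a volume plenty of production workloads already exceed. That's the competitive pressure open models put on per-token pricing.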
What the Transition Will Look Like
Here's my prediction for how this plays out:
2025-2026: The Token Anxiety Era (We Are Here)
Developers obsessively optimize token usage. Prompt compression tools thrive. Every AI call has an economic calculation attached.
2026-2027: The Bundle Wars Begin
Providers start offering "unlimited" tiers—with asterisks. Maybe unlimited up to 1M tokens/month, then metered. Or unlimited for smaller models, premium for big ones.
(Remember "unlimited data" that throttled after 5GB? Same playbook.)
2027-2028: True Unlimited Emerges
A challenger (maybe an open source startup, maybe a tech giant) offers genuinely unlimited AI access for a flat monthly fee. Competitors panic and match.
ChatGPT Pro? Claude Teams? They become like cell phone plans: $20-50/month, use as much as you want.
2028-2030: Tokens Become an Implementation Detail
Nobody talks about tokens anymore. Developers don't optimize prompts for cost. AI usage becomes invisible infrastructure—like bandwidth or electricity.
We'll look back and laugh: "Remember when we used to count tokens? Like we were rationing text messages in 2005?"
The Free Phone Era
Here's the part that really seals the analogy:
Once carriers moved to unlimited texting, they stopped competing on messaging entirely. Instead, they started giving away phones.
"Switch to AT&T, get a free iPhone 4!"
"Come to Verizon, we'll pay your cancellation fees plus give you $200!"
"T-Mobile: Bring your family, we'll give everyone free phones!"
The infrastructure was built. The marginal cost was near zero. Carriers made money on subscription retention, not per-message fees. So they optimized for customer acquisition and lock-in.
The AI Equivalent?
Once token pricing collapses, AI providers will shift competition to:
- Ecosystem lock-in: "Use our models, integrate with our tools, build on our platform"
- Premium features: Advanced models, faster response times, better support
- Enterprise services: Custom training, dedicated infrastructure, compliance
- Developer tools: "Free AI credits for startups!" (Just like free phones)
Tokens will be the loss leader to get you into the ecosystem.
The Parameter Wars Are Theater
Let's talk about that second analogy: the parameter count arms race.
When I read marketing copy about "our 405B parameter model outperforms competitors," I think about Verizon's coverage maps. Yes, technically accurate. But does it matter?
The Diminishing Returns Curve
In practice, model performance on most tasks plateaus well before you hit 100B+ parameters.
- Code generation: a 13B model is ~95% as good as a 175B model
- Summarization: a 7B model handles ~90% of use cases
- Creative writing: highly subjective; humans can't reliably distinguish
- Data extraction: even tiny models (1-3B) excel
The 405B model is technically better. But for everyday use? The difference is like Verizon's extra 0.3% coverage: real but irrelevant.
The Marketing Game
AI providers know this. But they can't say it. Because right now, parameter count is the easiest differentiator.
"We have more parameters" is simple. Compelling. Measurable.
But it's the same game carriers played: compete on a metric that sounds important but barely affects user experience.
What Actually Matters
For the vast majority of AI use cases, what matters is:
- Latency: Does it respond fast enough?
- Reliability: Does it work consistently?
- Cost: Can I afford to use it at scale?
- Context window: Can it handle my use case?
- Integration: Does it plug into my tools?
Parameter count? It's 10th on the list. But it's 1st in marketing.
Once the market matures, parameter count will go the way of cell network coverage maps: technically true, mostly ignored, occasionally relevant for edge cases.
What This Means For You Right Now
If token pricing is going to collapse, what should you do today?
If You're Building AI Products:
Don't build your moat around token efficiency.
If your competitive advantage is "we use 40% fewer tokens," you're building on sand. That advantage vanishes the moment providers move to flat pricing.
Build moats around:
- Unique data you can train on
- Workflow integration and UX
- Network effects (user-generated content, collaboration)
- Proprietary models or fine-tuning
- Vertical-specific domain expertise
If You're Using AI in Your Business:
Optimize for value, not tokens.
Yes, be cost-conscious. But don't let token anxiety prevent you from using AI effectively.
Ask yourself:
- "Does this AI call create $1 of value for 10¢ of cost?" → Use it
- "Am I under-utilizing AI because I'm worried about tokens?" → Stop that
- "Could I solve this better with a bigger model?" → Try it
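That first question can be made explicit. A minimal sketch, with the 10x value-to-cost threshold as an assumption you'd tune for your own business:

```python
# Value-over-cost check for an AI call. The 10x threshold is an
# assumption, not a rule: only flag calls whose expected value
# doesn't clear a comfortable multiple of their token cost.

def worth_the_tokens(expected_value_usd: float, token_cost_usd: float,
                     min_ratio: float = 10.0) -> bool:
    """Return True if the call creates enough value to justify its cost."""
    if token_cost_usd <= 0:
        return True  # free calls are always worth making
    return expected_value_usd / token_cost_usd >= min_ratio

# $1 of value for 10 cents of cost clears the 10x bar
print(worth_the_tokens(expected_value_usd=1.00, token_cost_usd=0.10))  # True
```

If most of your calls pass this check easily, token anxiety is costing you more than tokens are.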
Remember: a few years from now, you'll wish you'd used AI more, not less. Today's token costs will seem quaint.
If You're Investing in AI:
Bet on infrastructure plays and ecosystem platforms, not token optimization.
Token arbitrage businesses will have a 3-5 year window before pricing collapses. That's not a bad bet if you're timing it right—but it's not a durable bet.
Look for:
- Infrastructure (GPUs, inference optimization, edge deployment)
- Platform plays (ecosystems that survive pricing changes)
- Vertical AI (domain-specific models with proprietary data)
- Picks and shovels (tools that help everyone use AI, regardless of pricing)
The Uncomfortable Prediction
Here's what I think happens by 2030:
Most people will have AI access bundled into services they already pay for.
- Microsoft 365: Unlimited Copilot access included
- Google Workspace: Unlimited Gemini access included
- Apple iCloud+: Unlimited Apple Intelligence included
- Amazon Prime: Unlimited Alexa AI included
- Developer tools: GitHub, VS Code, Cursor—all with unlimited AI baked in
"Tokens" will be like "text message limits"—a relic of the early infrastructure days that nobody remembers except old-timers telling war stories.
Just like nobody under 25 knows what "10 cents per text" means, developers in 2030 won't understand why we obsessed over token counts.
The Bottom Line
Token pricing is artificial scarcity during an infrastructure build-out phase.
It's necessary right now because running AI is genuinely expensive. But it won't stay that way. Infrastructure costs drop. Competition increases. Alternatives emerge.
We've seen this movie before. Text messages went from 10¢ each to unlimited. Cell coverage wars gave way to price wars. Carriers started giving away phones.
AI will follow the exact same arc.
So the next time you're optimizing prompts to save tokens...
Remember: you're doing the digital equivalent of abbreviating "you" as "u" to save 10 cents on a text message.
It makes sense today. It'll be absurd tomorrow.
Can you hear me now?
Good. Because soon, nobody will be counting the cost.
Ready to Stop Counting Tokens?
We help businesses build AI strategies that don't depend on today's pricing models. Let's talk about what actually creates durable value in the AI era.
Let's Talk Strategy