Understanding Token Costs
Learn how prompt tokens, output tokens, and request shape turn into real LLM cost.
LLM pricing usually looks simple at first: input tokens cost one amount, output tokens cost another. In practice, though, real cost depends on much more than a single request.
To manage spend well, you need to understand what actually drives token usage.
What is a token?
A token is a small chunk of text that a model processes. Tokens are not the same as words:
- short words may be one token
- long words may become several tokens
- punctuation and formatting also count
That is why estimating cost by word count is often misleading.
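For quick back-of-envelope estimates, a common rule of thumb is roughly 4 characters of English text per token. This is only a heuristic and real tokenizers vary by model, but it illustrates why short and long words contribute differently:

```python
# Rough token estimate using the common ~4 characters-per-token heuristic.
# Only a model's actual tokenizer gives accurate counts; this sketch is
# just for quick estimation.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

print(estimate_tokens("Hi"))                    # short word: about 1 token
print(estimate_tokens("internationalization"))  # long word: several tokens
```

For anything billing-critical, count tokens with the tokenizer that matches your model rather than a heuristic.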
The two big buckets
Most chat models separate usage into:
- prompt tokens: everything you send in
- completion tokens: everything the model sends back
Some providers also expose:
- cached tokens
- reasoning tokens
- tool call overhead
The exact labels vary, but the core idea is the same: both input and output matter.
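Most chat APIs return a usage payload alongside each response. The field names below follow the common prompt/completion split, but the exact keys vary by provider, so treat this shape as illustrative:

```python
# A typical usage payload returned alongside a chat completion.
# Exact key names differ by provider; this shape is illustrative.
usage = {
    "prompt_tokens": 1250,
    "completion_tokens": 180,
    "total_tokens": 1430,
}

# In many real workloads, input dominates the token count.
input_share = usage["prompt_tokens"] / usage["total_tokens"]
print(f"{input_share:.0%} of this request's tokens were input")
```

Logging this payload per request is the cheapest first step toward cost visibility.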
Why prompt size grows faster than expected
Many teams underestimate prompt size because they think only about the user's latest message. In reality, the prompt may include:
- system instructions
- chat history
- retrieved documents
- tool schemas
- examples
- structured metadata
All of that can become expensive over time.
A simple cost formula
At a high level:
total_cost =
    (prompt_tokens / 1_000_000) * input_rate +
    (completion_tokens / 1_000_000) * output_rate

The exact unit might be per 1K or per 1M tokens depending on the provider, but the shape is the same.
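As a runnable sketch of the formula above, with rates expressed per 1M tokens (the dollar rates here are made-up placeholders, not real provider pricing):

```python
# Sketch of the cost formula, with rates in USD per 1M tokens.
# The example rates are placeholders, not actual provider pricing.
def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    return (prompt_tokens / 1_000_000) * input_rate \
         + (completion_tokens / 1_000_000) * output_rate

# 8,000 prompt tokens + 500 completion tokens at $3 / $15 per 1M:
cost = request_cost(8_000, 500, input_rate=3.0, output_rate=15.0)
print(f"${cost:.4f}")  # $0.0315
```

Note how the 8,000 prompt tokens contribute most of the cost even though the output rate is five times higher: volume on the input side often outweighs the rate gap.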
What usually drives spend
In production apps, cost often comes from patterns like:
- long prompts with too much retrieved context
- agents making repeated calls in a loop
- verbose model outputs
- large hidden system prompts
- unnecessary retries
That is why cost optimization is usually a system design problem, not only a pricing problem.
A practical example
Imagine a support assistant with:
- a system prompt
- the last 8 chat messages
- 5 retrieved documents
- one final answer
Even if the user asks a short question, the actual token bill comes from the whole request package, not the question alone.
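To make the "whole request package" concrete, here is a sketch with hypothetical token counts for each part of the support assistant's prompt (the numbers are illustrative, not measured):

```python
# Hypothetical token counts for each part of the support assistant's
# request. All numbers are illustrative.
request_parts = {
    "system_prompt": 600,
    "last_8_messages": 1200,
    "retrieved_docs": 5 * 400,   # 5 documents, ~400 tokens each
    "user_question": 15,
}
completion_tokens = 250          # the final answer

prompt_tokens = sum(request_parts.values())
print(prompt_tokens)             # 3815 prompt tokens for a 15-token question
```

Under these assumptions, the user's question is under half a percent of the prompt: the bill is driven almost entirely by the scaffolding around it.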
Cost is not just price per token
Two providers may have different list pricing, but your real application cost also depends on:
- latency
- how often you retry
- how many calls your workflow creates
- whether the model needs extra context to stay accurate
A cheaper model can still cost more in practice if it needs more calls or more prompt scaffolding.
Reducing token cost safely
Useful ways to reduce spend include:
- trimming chat history
- retrieving fewer but better chunks
- shortening system prompts
- limiting maximum output length
- routing easy tasks to cheaper models
The key word is safely. Blind cost cutting can easily reduce answer quality.
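The first item, trimming chat history, can be done with a token budget rather than a fixed message count, so long messages don't blow up the prompt. A minimal sketch, using a rough character-based token estimate as a stand-in for a real tokenizer:

```python
# Sketch: keep the most recent messages that fit a token budget.
# estimate_tokens is a rough stand-in for a real tokenizer count.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                        # oldest messages get dropped first
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = ["hello " * 50, "short reply", "follow-up question here"]
print(trim_history(history, budget=20))  # drops the long oldest message
```

Trimming newest-first like this preserves recent context, which usually matters most for answer quality; summarizing the dropped tail is a common refinement.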
What to measure
If you want real cost visibility, measure at least:
- prompt tokens
- completion tokens
- total tokens
- cost by model
- cost by feature
- cost over time
Without this, teams usually discover problems only after the monthly bill arrives.
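The aggregations above can be sketched from per-request usage records. The record fields and cost values here are illustrative:

```python
from collections import defaultdict

# Sketch: aggregate per-request cost by model and by feature.
# Record shapes and values are illustrative.
records = [
    {"model": "small", "feature": "search",  "prompt": 900,  "completion": 120, "cost": 0.004},
    {"model": "large", "feature": "support", "prompt": 4000, "completion": 300, "cost": 0.030},
    {"model": "small", "feature": "support", "prompt": 700,  "completion": 80,  "cost": 0.003},
]

by_model: dict[str, float] = defaultdict(float)
by_feature: dict[str, float] = defaultdict(float)
for r in records:
    by_model[r["model"]] += r["cost"]
    by_feature[r["feature"]] += r["cost"]

print(dict(by_model))    # spend per model
print(dict(by_feature))  # spend per feature
```

Adding a timestamp to each record and bucketing by day gives the cost-over-time view with the same handful of lines.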
Final takeaway
Token cost is not only a billing concept. It is an architectural signal. The shape of your prompts, loops, retrieval strategy, and model choices all show up in token usage. If you can see token flow clearly, cost optimization becomes much easier and much less reactive.
Trackly
Building agents already?
Trackly helps you monitor provider usage, token costs, and project-level spend without adding heavy overhead to your app.
Try Trackly