# openai-cost-calculator

Instant, accurate USD cost estimates for OpenAI & Azure OpenAI. Turn any API response (Chat Completions or Responses API, streaming or non-streaming) into precise, per-request costs — 8-decimal strings or `Decimal` values via a typed dataclass.
## Overview

`openai_cost_calculator` is a tiny, production-hardened helper that reads the usage counters returned by OpenAI/Azure OpenAI and outputs the exact USD cost for that call. It supports cached tokens, undated models, streaming generators, and both classic and new SDKs.

- Typed API for exact financial arithmetic (`Decimal`)
- Legacy API for drop-in string output (8 decimal places)
- Pricing loaded from a remote CSV with a 24h cache + local overrides
- Offline mode for pinned environments (no network calls)
## Installation

```bash
pip install openai-cost-calculator
```

The package name on PyPI uses dashes; the import name uses underscores.
## Quick start

### One-line (legacy string API)

```python
from openai import OpenAI
from openai_cost_calculator import estimate_cost

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi there!"}],
)

print(estimate_cost(resp))
# {'prompt_cost_uncached': '0.00000150',
#  'prompt_cost_cached':   '0.00000000',
#  'completion_cost':      '0.00000600',
#  'total_cost':           '0.00000750'}
```
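The figures above follow directly from per-1M-token pricing. As an illustrative sketch (assuming, hypothetically, 10 uncached prompt tokens at $0.15 per 1M and 10 completion tokens at $0.60 per 1M — in practice the token counts come from `response.usage` and the rates from the pricing CSV):

```python
from decimal import Decimal

PER_MILLION = Decimal("1000000")
EIGHT_DP = Decimal("0.00000001")  # quantize results to 8 decimal places

def token_cost(tokens: int, price_per_million: str) -> Decimal:
    """USD cost for `tokens` billed at `price_per_million` USD per 1M tokens."""
    return (Decimal(tokens) / PER_MILLION * Decimal(price_per_million)).quantize(EIGHT_DP)

prompt = token_cost(10, "0.15")      # -> Decimal('0.00000150')
completion = token_cost(10, "0.60")  # -> Decimal('0.00000600')
print(prompt + completion)           # -> 0.00000750
```

Working in `Decimal` end to end avoids the binary-float rounding drift that would accumulate when summing many sub-cent amounts.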
### Typed API (recommended)

```python
from openai import OpenAI
from openai_cost_calculator import estimate_cost_typed

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi there!"}],
)

cost = estimate_cost_typed(resp)
print(cost.total_cost)               # Decimal('0.00000750')
print(cost.as_dict(stringify=True))  # strings, 8 dp
```
### Responses API

```python
from openai_cost_calculator import estimate_cost, estimate_cost_typed

resp = client.responses.create(
    model="gpt-4.1-mini",
    input=[{"role": "user", "content": "Hi there!"}],
)

# Both work
print(estimate_cost(resp))        # dict[str, str]
print(estimate_cost_typed(resp))  # CostBreakdown
```
## CostBreakdown dataclass

Typed results are returned as a frozen dataclass with `Decimal` fields:

```python
CostBreakdown(
    prompt_cost_uncached: Decimal,
    prompt_cost_cached: Decimal,
    completion_cost: Decimal,
    total_cost: Decimal,
)
```

Use `.as_dict(stringify=True|False)` to convert to 8-dp strings (legacy) or raw `Decimal`s.
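For intuition, a minimal self-contained sketch of such a frozen dataclass with an `as_dict` helper (field names from above; the library's real class may differ in details):

```python
from dataclasses import dataclass, fields
from decimal import Decimal

@dataclass(frozen=True)
class CostBreakdown:
    prompt_cost_uncached: Decimal
    prompt_cost_cached: Decimal
    completion_cost: Decimal
    total_cost: Decimal

    def as_dict(self, stringify: bool = False) -> dict:
        """Fields as raw Decimals, or as 8-dp strings when stringify=True."""
        out = {f.name: getattr(self, f.name) for f in fields(self)}
        if stringify:
            out = {k: f"{v:.8f}" for k, v in out.items()}
        return out

cost = CostBreakdown(
    prompt_cost_uncached=Decimal("0.0000015"),
    prompt_cost_cached=Decimal("0"),
    completion_cost=Decimal("0.000006"),
    total_cost=Decimal("0.0000075"),
)
print(cost.as_dict(stringify=True)["total_cost"])  # 0.00000750
```

Because the dataclass is frozen, a cost breakdown can be safely shared across threads or used as a cache value without defensive copying.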
## Legacy API

`estimate_cost(response) → dict[str, str]` keeps your existing code working. Returns:

```python
{
    "prompt_cost_uncached": "…",
    "prompt_cost_cached": "…",
    "completion_cost": "…",
    "total_cost": "…",
}
```
## Pricing utilities

The library resolves model rates from a remote CSV and merges local overrides.

| Function | Description |
|---|---|
| `refresh_pricing()` | Force-reload the remote CSV (24-hour cache is bypassed). |
| `set_offline_mode(True)` | Disable all network fetches; only local overrides are used. |
| `add_pricing_entry(name, date, …)` | Add/override a single (model, `YYYY-MM-DD`) row. |
| `add_pricing_entries([...])` | Bulk add/override multiple rows. |
| `clear_local_pricing()` | Drop all in-process overrides (remote cache unaffected). |

Undated model strings resolve to the latest row with `date ≤ today`.
## Pricing sources & cache

- The remote CSV (GitHub) is fetched at most once every 24 hours per process.
- Local overrides always take precedence over remote rows on key collision.
- Offline mode disables remote fetches — ideal for air-gapped or pinned deployments.

```python
from openai_cost_calculator import set_offline_mode, refresh_pricing

set_offline_mode(False)  # allow using the remote CSV
refresh_pricing()        # refresh the 24h cache now
```
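The 24-hour cache behaves like a simple TTL check, sketched below (an illustration of the caching contract, not the library's internals; `fetch` stands in for the remote CSV download, and the clock is injectable so the sketch is testable):

```python
import time

CACHE_TTL = 24 * 60 * 60  # seconds

class PricingCache:
    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch        # callable returning the remote pricing rows
        self._clock = clock        # injectable clock, defaults to monotonic time
        self._rows = None
        self._fetched_at = None

    def get(self, force: bool = False):
        """Return cached rows, refetching only when stale or forced."""
        stale = (
            self._fetched_at is None
            or self._clock() - self._fetched_at >= CACHE_TTL
        )
        if force or stale:
            self._rows = self._fetch()
            self._fetched_at = self._clock()
        return self._rows
```

In these terms, `refresh_pricing()` corresponds to `get(force=True)`, and offline mode would skip the fetch branch entirely and serve only local overrides.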
## Streaming

Pass the generator directly. The helper walks the stream and uses the last chunk that carries `.usage`.

```python
from openai_cost_calculator import estimate_cost_typed

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
    stream_options={"include_usage": True},
)

cost = estimate_cost_typed(stream)
print(cost.total_cost)
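Conceptually, this amounts to consuming the stream and keeping the usage from the final chunk that carries one — a sketch with mock chunks (real chunks are SDK objects, but the access pattern is the same):

```python
from types import SimpleNamespace

def last_usage(stream):
    """Consume a stream; return usage from the last chunk that carries it."""
    usage = None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return usage

# With stream_options={"include_usage": True}, only the final chunk has usage:
chunks = [
    SimpleNamespace(usage=None),
    SimpleNamespace(usage=None),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=10, completion_tokens=10)),
]
print(last_usage(iter(chunks)).prompt_tokens)  # 10
```

Note that the stream is fully consumed in the process, so estimate the cost after you have finished reading the content deltas.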
## Error handling

All recoverable errors raise `CostEstimateError` with a clear message (e.g., missing pricing row, bad input).

```python
from openai_cost_calculator import estimate_cost_typed, CostEstimateError

try:
    cost = estimate_cost_typed(resp)
except CostEstimateError as e:
    print("Could not estimate cost:", e)
```
For Azure OpenAI, pricing is matched on the dated model name reported in the response (e.g. `gpt-4o-mini-2024-07-18`); deployment names are ignored.
## Cookbook — add pricing for any model

You can cost any provider's response as long as you teach the calculator the price point for that `(model_name, model_date)`. In offline mode the library won't reach the network; only your overrides are used.

```python
from litellm import completion
from openai_cost_calculator import estimate_cost_typed, set_offline_mode, add_pricing_entry

set_offline_mode(True)

# Teach the library a new price point:
add_pricing_entry(
    "ollama/qwen3:30b", "2025-08-01",
    input_price=0.20,         # USD per 1M input tokens
    output_price=0.60,        # USD per 1M output tokens
    cached_input_price=0.04,  # optional
)

response = completion(
    model="ollama/qwen3:30b",
    messages=[{"role": "user", "content": "respond in 20 words. who are you?"}],
    api_base="http://localhost:11434",
)

print(estimate_cost_typed(response))
```

Pick a model name and a date (`YYYY-MM-DD`) you control, and supply the per-1M-token prices. Add more entries over time to reflect price changes by date.
## Troubleshooting

### "Pricing not found" after a new model launch

- Check that the model/date row exists in the project's pricing CSV.
- If it exists, call `refresh_pricing()` to bypass the 24h cache.
- Otherwise, temporarily add a local row with `add_pricing_entry()`, then open an issue so the CSV can be updated.

### "cached_tokens = 0" even with caching

Request usage details: the classic API needs `include_usage_details=True`; streaming needs `stream_options={"include_usage": True}`.

### What if the model string has no date?

The library uses today's date and selects the latest CSV row with `date ≤ today`.
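That selection rule can be sketched as picking the newest dated row that is not in the future (illustrative only; the hypothetical `resolve_rate` helper and dict-of-rows layout are not the library's API):

```python
from datetime import date

def resolve_rate(rows, model, today=None):
    """Pick the pricing row for `model` with the latest date on or before `today`."""
    today_s = (today or date.today()).isoformat()
    # ISO dates compare lexicographically in chronological order
    candidates = [d for (name, d) in rows if name == model and d <= today_s]
    if not candidates:
        raise KeyError(f"no pricing row for {model} on or before {today_s}")
    return rows[(model, max(candidates))]

rows = {
    ("gpt-4o-mini", "2024-07-18"): {"input": "0.15", "output": "0.60"},
    ("gpt-4o-mini", "2099-01-01"): {"input": "9.99", "output": "9.99"},  # future row
}
print(resolve_rate(rows, "gpt-4o-mini", date(2025, 1, 1)))  # picks the 2024-07-18 row
```

This is also why dated overrides added via `add_pricing_entry` layer cleanly: a newer dated row simply wins once its date has passed.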
## License & contributions

MIT License © 2025 Orkun Kınay & Murat Barkın Kınay. PRs that enhance robustness (SDK changes, pricing formats) are welcome.