Why this library?
- Per-query accurate: calculates the exact cost of each query from the token counts returned by OpenAI or Azure; no model guessing, no aggregate billing approximations.
- Dual-API support: works with both `chat.completions.create()` and the new `responses.create()`.
- Zero boilerplate: one import and one call: `estimate_cost(resp)`.
- Strongly-typed API: the typed functions return a `CostBreakdown` dataclass with `Decimal` precision for accurate financial calculations.
- Backward compatible: the legacy string-based API remains unchanged.
- Pricing auto-refresh: daily CSV pull, plus a `refresh_pricing()` helper.
- Edge-case aware: cached tokens, undated models, streaming generators, Azure deployments, and more are all handled.
- Predictable output: the legacy API returns strings formatted to 8 decimal places; the typed API uses `Decimal` for precise arithmetic.
Installation
```bash
pip install openai-cost-calculator
```
(The package name on PyPI uses dashes; the import name uses underscores: `from openai_cost_calculator import …`.)
Quick start
Two entry points are available: the legacy `estimate_cost()` and the strongly-typed `estimate_cost_typed()`.
Legacy String API (Backward Compatible)
```python
from openai import OpenAI
from openai_cost_calculator import estimate_cost

client = OpenAI(api_key="sk-…")
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi there!"}],
)

print(estimate_cost(resp))
# {'prompt_cost_uncached': '0.00000150',
#  'prompt_cost_cached':   '0.00000000',
#  'completion_cost':      '0.00000600',
#  'total_cost':           '0.00000750'}
```
Strongly-Typed API (Recommended for New Code)
```python
from openai import OpenAI
from openai_cost_calculator import estimate_cost_typed

client = OpenAI(api_key="sk-…")
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi there!"}],
)

cost = estimate_cost_typed(resp)
print(f"Total cost: ${cost.total_cost}")  # Decimal('0.00000750')

# Access individual components with Decimal precision
print(f"Prompt (uncached): ${cost.prompt_cost_uncached}")
print(f"Prompt (cached): ${cost.prompt_cost_cached}")
print(f"Completion: ${cost.completion_cost}")

# Convert to a dict as needed
decimal_dict = cost.as_dict(stringify=False)  # Decimal values
string_dict = cost.as_dict(stringify=True)    # string values (legacy format)
```
Responses API
```python
# Works with both the legacy and the typed API
resp = client.responses.create(
    model="gpt-4.1-mini",
    input=[{"role": "user", "content": "Hi there!"}],
)

cost_dict = estimate_cost(resp)       # legacy: dict[str, str]
cost_obj = estimate_cost_typed(resp)  # typed: CostBreakdown dataclass
```
Public API
Strongly-Typed API (Recommended)
`estimate_cost_typed(response) → CostBreakdown`
Accepts `ChatCompletion`, streamed chunks, or `Response` objects; returns a dataclass with `Decimal` fields:
```python
CostBreakdown(
    prompt_cost_uncached: Decimal,
    prompt_cost_cached: Decimal,
    completion_cost: Decimal,
    total_cost: Decimal,
)
```
`calculate_cost_typed(usage, rates) → CostBreakdown`
Lower-level function for direct cost calculation from a usage dict and a rate dict.
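The exact shapes of the usage and rate dicts are defined by the library; as an illustration only (field names and rate units here are assumptions, not the library's internals), the arithmetic such a function performs boils down to:

```python
from decimal import Decimal

def calculate_cost_sketch(usage: dict, rates: dict) -> dict:
    """Illustrative sketch of per-request cost arithmetic, not the library's code.

    Assumes rates are USD per 1M tokens; all field names are hypothetical.
    """
    per_million = Decimal("1000000")
    cached = Decimal(usage.get("cached_tokens", 0))
    uncached = Decimal(usage["prompt_tokens"]) - cached
    completion = Decimal(usage["completion_tokens"])

    prompt_uncached = uncached * Decimal(rates["prompt"]) / per_million
    prompt_cached = cached * Decimal(rates["cached_prompt"]) / per_million
    completion_cost = completion * Decimal(rates["completion"]) / per_million

    return {
        "prompt_cost_uncached": prompt_uncached,
        "prompt_cost_cached": prompt_cached,
        "completion_cost": completion_cost,
        "total_cost": prompt_uncached + prompt_cached + completion_cost,
    }

# Rates in the style of gpt-4o-mini ($0.15 / $0.075 / $0.60 per 1M tokens)
costs = calculate_cost_sketch(
    {"prompt_tokens": 10, "completion_tokens": 10, "cached_tokens": 0},
    {"prompt": "0.15", "cached_prompt": "0.075", "completion": "0.60"},
)
```

Using `Decimal` throughout avoids the binary-float rounding drift that would accumulate when summing many tiny per-request costs.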
`CostBreakdown.as_dict(stringify=True)`
Converts the dataclass to a dict. With `stringify=True` (the default) it returns strings formatted to 8 decimal places; with `stringify=False` it returns raw `Decimal` objects.

Legacy String API (Backward Compatible)
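To make the two `as_dict` modes concrete, here is a hypothetical stand-in mirroring the `CostBreakdown` shape described above (a sketch, not the library's actual class):

```python
from dataclasses import dataclass, fields
from decimal import Decimal

@dataclass(frozen=True)
class CostBreakdownSketch:
    """Hypothetical stand-in with the same fields as CostBreakdown."""
    prompt_cost_uncached: Decimal
    prompt_cost_cached: Decimal
    completion_cost: Decimal
    total_cost: Decimal

    def as_dict(self, stringify: bool = True) -> dict:
        # stringify=True reproduces the legacy 8-decimal string format;
        # stringify=False hands back the raw Decimal objects
        return {
            f.name: (f"{getattr(self, f.name):.8f}" if stringify
                     else getattr(self, f.name))
            for f in fields(self)
        }

cb = CostBreakdownSketch(
    Decimal("0.0000015"), Decimal("0"), Decimal("0.000006"), Decimal("0.0000075")
)
string_dict = cb.as_dict()                # {'total_cost': '0.00000750', ...}
decimal_dict = cb.as_dict(stringify=False)  # raw Decimal values
```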
`estimate_cost(response) → dict[str, str]`
Accepts `ChatCompletion`, streamed chunks, or `Response` objects; returns a dict with:
```python
{
    "prompt_cost_uncached": "…",
    "prompt_cost_cached": "…",
    "completion_cost": "…",
    "total_cost": "…",
}
```
Common Functions
`refresh_pricing()`: force-reloads the remote CSV (handy right after the pricing sheet is updated).
`CostEstimateError`: one unified exception for bad input, missing pricing, and similar failures.
When to Use Which API
Use the typed API (`estimate_cost_typed`) when:
- Building new applications
- Performing financial calculations that require precision
- Integrating with accounting or billing systems
- Aggregating costs across multiple requests

Use the legacy API (`estimate_cost`) when:
- Maintaining existing codebases
- Producing string output for JSON serialization
- Working with systems that expect the original format
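The JSON point is worth seeing directly: the standard `json.dumps` rejects `Decimal` values by default, while the legacy string dict serializes as-is. A minimal illustration:

```python
import json
from decimal import Decimal

typed_output = {"total_cost": Decimal("0.0000075")}
legacy_output = {"total_cost": "0.00000750"}

legacy_json = json.dumps(legacy_output)  # works out of the box

try:
    json.dumps(typed_output)  # Decimal is not JSON-serializable by default
except TypeError:
    # common workaround: stringify first (cf. as_dict(stringify=True))
    typed_json = json.dumps({k: f"{v:.8f}" for k, v in typed_output.items()})
```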
Financial Calculation Example
```python
from decimal import Decimal
from openai_cost_calculator import estimate_cost_typed

# Precisely aggregate costs across multiple requests
total_cost = Decimal("0")
for response in responses:
    cost = estimate_cost_typed(response)
    total_cost += cost.total_cost

print(f"Total spend: ${total_cost}")  # exact Decimal arithmetic
```
Troubleshooting & FAQs
🎉 A brand-new model just launched and my code raises "pricing not found"
- Head to the pricing CSV on GitHub.
- If the new model/date is missing, open an issue or email the maintainer (orkunkinay@sabanciuniv.edu).
- If the new row is already there, call `refresh_pricing()` once; the 24-hour cache is then refreshed for every worker.
🔄 Streaming chunks
Just pass the generator returned by `client.chat.completions.create(..., stream=True, stream_options={"include_usage": True})` straight into `estimate_cost` or `estimate_cost_typed`. The helper silently walks the stream and uses the last chunk that contains `.usage`.
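The "last chunk that contains usage" behaviour can be sketched as follows (chunk shapes are simplified stand-ins; real chunks are SDK objects):

```python
from types import SimpleNamespace

def last_usage(stream):
    """Walk a chunk stream and keep the final non-None .usage (a sketch)."""
    usage = None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return usage

chunks = [
    SimpleNamespace(usage=None),  # content deltas carry no usage
    SimpleNamespace(usage=None),
    SimpleNamespace(usage={"prompt_tokens": 10,   # with include_usage, the
                           "completion_tokens": 10}),  # final chunk has totals
]
result = last_usage(iter(chunks))
```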
⚠️ "cached_tokens = 0" even though I know some were cached
Make sure you request `include_usage_details=True` (classic) or `stream_options={"include_usage": True}` (streaming). Without it, the API omits the cached-token breakdown.
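When the breakdown is present, cached prompt tokens are billed at a discounted rate and the rest at the full rate; the split is simple arithmetic. A sketch (the rates below are illustrative, USD per 1M tokens, not taken from the pricing CSV):

```python
from decimal import Decimal

def split_prompt_cost(prompt_tokens: int, cached_tokens: int,
                      rate_uncached: Decimal, rate_cached: Decimal):
    """Bill cached tokens at the discounted rate, the remainder at full rate."""
    per_million = Decimal("1000000")
    uncached = prompt_tokens - cached_tokens
    return (
        uncached * rate_uncached / per_million,      # prompt_cost_uncached
        cached_tokens * rate_cached / per_million,   # prompt_cost_cached
    )

# 1,000 prompt tokens, 600 of them served from the prompt cache
uncached_cost, cached_cost = split_prompt_cost(
    1000, 600, Decimal("0.15"), Decimal("0.075")
)
```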
🏷️ Azure OpenAI deployment IDs vs. model names
Azure responses still carry the original model string (`chunk.model`); the calculator ignores the deployment name, so you're covered.
⏱️ Performance concerns
The only network call is the pricing CSV fetch (at most once every 24 h). All cost maths are pure in-process Python with negligible overhead.
Contributing & License
PRs for additional edge cases, new pricing formats, or SDK changes are welcome!
MIT License © 2025 Orkun Kınay & Murat Barkın Kınay