## Why this library?
- Per-query accurate: calculates the exact cost for each user query individually, based on token counts returned by OpenAI or Azure — no model guessing, no aggregate billing approximations.
- Dual-API support: works with `chat.completions.create()` and the new `responses.create()`.
- Zero boilerplate: one import & one call: `estimate_cost(resp)`.
- Pricing auto-refresh: daily CSV pull, plus a `refresh_pricing()` helper.
- Edge-case aware: cached tokens, undated models, streaming generators, Azure deployments … all handled!
- Predictable output: every number is returned as a string formatted to 8 decimal places — ready for JSON serialisation or spreadsheets.
## Installation
```bash
pip install openai-cost-calculator
```
(The package name on PyPI uses dashes; the import name uses underscores: `from openai_cost_calculator import …`.)
## Quick start (Chat Completion API)
```python
from openai import OpenAI
from openai_cost_calculator import estimate_cost

client = OpenAI(api_key="sk-…")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi there!"}],
)

print(estimate_cost(resp))
# {'prompt_cost_uncached': '0.00000150',
#  'prompt_cost_cached'  : '0.00000000',
#  'completion_cost'     : '0.00000600',
#  'total_cost'          : '0.00000750'}
```
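Because every value is a fixed-point string, aggregation is easiest with `decimal.Decimal` rather than `float`. A minimal sketch (the `total_cost` helper below is illustrative, not part of the library):

```python
from decimal import Decimal

from openai_cost_calculator import estimate_cost

def total_cost(responses) -> Decimal:
    """Sum 'total_cost' over any iterable of response objects, without float drift."""
    return sum(
        (Decimal(estimate_cost(r)["total_cost"]) for r in responses),
        Decimal("0"),
    )
```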
## Quick start (Responses API)
```python
resp = client.responses.create(
    model="gpt-4.1-mini",
    input=[{"role": "user", "content": "Hi there!"}],
)
print(estimate_cost(resp))
```
## Public API
Everything lives under `openai_cost_calculator`:
### `estimate_cost(response) → dict[str, str]`

Accepts `ChatCompletion` objects, streamed chunks, or `Response` objects; returns a dict with:

```python
{
    "prompt_cost_uncached": "…",
    "prompt_cost_cached":   "…",
    "completion_cost":      "…",
    "total_cost":           "…",
}
```
### `refresh_pricing()`

Force-reloads the remote pricing CSV (handy right after the pricing sheet is updated).

### `CostEstimateError`

One unified exception for bad input, missing pricing, etc.
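Since every failure mode surfaces as this one exception, a single `try`/`except` is enough. A minimal sketch (the fallback behaviour is illustrative):

```python
from openai_cost_calculator import CostEstimateError, estimate_cost

try:
    costs = estimate_cost(resp)
except CostEstimateError as exc:
    # Raised for unsupported inputs or models missing from the pricing data.
    print(f"Could not price this call: {exc}")
    costs = None
```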
## Troubleshooting & FAQs

### 🎉 A brand-new model just launched – my code raises “pricing not found”
- Head to the pricing CSV on GitHub.
- If the new model/date is missing, open an issue or email the maintainer (orkunkinay@sabanciuniv.edu).
- If the new row is already there, call `refresh_pricing()` once; the 24-hour cache is then refreshed for every worker (see the sketch below).
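A minimal usage sketch, e.g. in a deploy hook that runs after the pricing sheet changes:

```python
from openai_cost_calculator import refresh_pricing

refresh_pricing()  # bypass the 24-hour cache and re-download the pricing CSV
```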
### 🔄 Streaming chunks

Just pass the generator returned by `client.chat.completions.create(..., stream=True, stream_options={"include_usage": True})` straight into `estimate_cost`. The helper silently walks the stream and uses the last chunk that contains `.usage`.
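For example (model and prompt are illustrative):

```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi there!"}],
    stream=True,
    stream_options={"include_usage": True},  # required: adds a final usage chunk
)

# estimate_cost walks (and consumes) the generator, pricing the last chunk with .usage
print(estimate_cost(stream))
```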
### ⚠️ “cached_tokens = 0” even though I know some were cached

Make sure you request `include_usage_details=True` (classic) or `stream_options={"include_usage": True}` (streaming). Without it, the API omits the cached-token breakdown.
### 🏷️ Azure OpenAI deployment IDs vs. model names

Azure responses still carry the original model string (`chunk.model`); the calculator ignores the deployment name, so you’re covered.
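A minimal sketch, assuming the standard `AzureOpenAI` client (the endpoint, API version, and deployment name below are placeholders):

```python
from openai import AzureOpenAI

from openai_cost_calculator import estimate_cost

client = AzureOpenAI(
    api_key="…",
    api_version="2024-06-01",
    azure_endpoint="https://my-resource.openai.azure.com",
)

resp = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # Azure deployment ID, not a model name
    messages=[{"role": "user", "content": "Hi there!"}],
)

# Pricing is resolved from the model string on the response, not the deployment ID.
print(estimate_cost(resp))
```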
### ⏱️ Performance concerns

The only network call is the pricing CSV download, made at most once every 24 hours. The cost arithmetic itself is pure, in-memory Python, so per-call overhead is negligible.
## Contributing & License

PRs for additional edge cases, new pricing formats, or SDK changes are welcome!

MIT License © 2025 Orkun Kınay & Murat Barkın Kınay