llm-spendguard¶
A pre-spend governor for LLM and remote-compute cost. It caps every call before the spend, prices from a verified table, reconciles against actual provider billing, and learns the cheapest config that still holds quality — then proves and enforces it.
Zero required dependencies. One-line install. It never breaks your job — over a cap it asks (interactive) or fails open with a logged warning (non-interactive), never a crash.
- 60-second quickstart — install, gate a call, see it work.
- Architecture — the gate chokepoint + the extensibility seams.
- Using with Claude & agents — make every assistant session gated.
- Learning advisor — recommend considering history, not parroting it.
Why¶
Cost overruns don't announce themselves. They slip in quietly: a hardcoded price that drifted from the real rate, a forgotten model swap, under-batching that re-bills a shared prompt on every request, a job cancelled "to save money" that still bills for the work already completed, an ungated script in some other venv leaking spend nobody is watching.
spendguard stops those before the money moves — and then tells you, with evidence, the cheaper way to do the same work without losing quality.
Your keys, your data
spendguard runs in your process with your API keys. It never proxies or resells tokens. State lives
locally under $SPENDGUARD_HOME (~/.spendguard by default). The optional team roll-up sends only
scrubbed aggregates (daily totals + generalizable learnings) — never prompts, outputs, or keys.
Quickstart¶
1. Install¶
2. Turn on the gate¶
Two lines. Every OpenAI / Anthropic call in this process is now estimated and capped before it spends:
import spendguard
spendguard.install() # patches the OpenAI + Anthropic clients in-process; safe to call once at startup
Or let Claude (or any agent) set it up conversationally, picking your caps, projects, and providers — and optionally connecting a team:
Verify the gate is actually live in this interpreter (it's per-interpreter — a different venv is not gated):
3. Spend normally — it just governs¶
Write your normal code. spendguard sits in front of every paid call:
from openai import OpenAI
client = OpenAI()
resp = client.chat.completions.create( # ← estimated, priced, and checked against your caps first
model="gpt-5.5",
messages=[{"role": "user", "content": "Summarize this ticket."}],
)
- Under cap → the call runs, the spend is recorded to the local ledger, and pricing comes from the canonical table (never a hardcoded constant).
- Over cap → interactive sessions get a prompt (proceed / skip); non-interactive runs log a warning and fail open so a batch job is never silently killed mid-flight.
Want a hard stop instead of fail-open for a critical script? Assert the gate is enforcing up front:
import spendguard
spendguard.require() # raises if the gate isn't actually enforcing here — a bypass can't run silently
4. See what you spent — and what leaked¶
spendguard report # daily / weekly / monthly, per provider, + a ledger-vs-reality leak alert
spendguard reconcile openai # local ledger vs ACTUAL provider billing → surfaces ungoverned spend
spendguard reconcile anthropic
reconcile is the honesty check: it pulls real usage from the provider and diffs it against what the gate saw.
A gap means spend escaped the gate (an ungated venv, a different machine) — exactly the leak you want to find.
5. Set caps that matter¶
Caps are split so you can govern each kind of spend independently — and a single total ceiling over everything:
spendguard config set caps.llm.daily 25 # LLM/embeddings: $25/day
spendguard config set caps.compute.monthly 400 # remote compute (e.g. vast.ai GPUs): $400/month
spendguard config set caps.total.daily 60 # everything combined: $60/day
Before any paid batch, do a separate zero-spend estimate, confirm, then submit:
6. (Optional) roll up a team¶
Used solo, everything above is fully local. To see an org's combined spend, leaks, and learnings at llmspendguard.com:
spendguard saas link # shows a code, you approve it in the browser → your verified email is the contributor
From then on, each contributor's scrubbed daily aggregates roll up under the org. Billing is by active contributors that month (free ≤ 2), and the team sees combined spend, governance coverage, and the shared learnings — the cheapest-config rules one teammate proved, now available to everyone.
What it gives you¶
- Correct prices, always — one canonical table, layered + cross-checked, never hardcoded; an
auditenforces it. - Estimate before spend — every paid path projects cost first; the gate hard-stops over caps (asks, if interactive).
- Cost-per-good-result — a cheap call that fails quality is 100% waste, so the metric is
$/good, and any model/format downgrade is quality-gated (proven byexperiment, not assumed). - The governor is caged — the advisor's own LLM use has a separate
caps.metabudget and is excluded from the corpus it analyzes, so it can't overspend or pollute its own learning. - Living, validated learnings — insights are conditional rules with a confidence + lifecycle, re-validated as data grows, and shareable (scrubbed) across a team.
- Self-contained & non-blocking — zero required deps, fail-open, state isolated under
$SPENDGUARD_HOME; observability is exported (OTel), not another dashboard to babysit.
Where to next¶
- Architecture — how the gate, pricing resolution, the learning loop, and the meta-cage fit together (with diagrams), plus honest known limitations.
- Using with Claude & agents — wire spendguard into every assistant session with a standing
CLAUDE.mdrule and slash-commands. - Learning advisor — the corpus → insights → temporal graph, the deterministic / caged-LLM split, and the meta-budget cage.
- Roadmap — what's shipped and what's next.
The full command reference lives in the project README.