Usage, made simple.
Your plan buys two things: a request-credit budget (how much you can do) and RPM credits (how fast). Here is the whole system: tiers, credits, the burst bucket, built-in web tools and a per-key vector DB, with the real, live numbers and a simulator you can drive.
Two meters. That's the whole model.
Every call you make uses two separate meters. One caps your total volume, the other caps your pace. Understand these two and you understand everything else on this page.
Request credits. Your rolling budget. Every action spends some: a chat call, a tool call, a vector op. Different actions cost different amounts (0.05 to 5 credits). Budgets refill continuously over a 5-hour and a weekly window (Free: daily).
spent on every request · refills as old requests age out
RPM credits. Your per-minute pace. A bucket refills one RPM credit every 60 ÷ RPM seconds; each request spends one. The bucket holds more than a minute's worth, so short bursts go through fine. Empty it and you get a 429.
one spent per request · the limit depends on model tier + plan
The difference that matters: run out of RPM credits and you just slow down (retry in a few seconds). Run out of request credits and you've used your budget for that window. It comes back as the window rolls forward.
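To make the "just slow down" case concrete, here is a minimal client-side sketch. It assumes a hypothetical `send` callable that returns the HTTP status code of one request, and it waits one refill interval (60 ÷ RPM seconds) after each 429. This page only says an empty RPM bucket returns a 429; what an exhausted request-credit budget returns is not stated here, so the sketch does not try to tell the two apart.

```python
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], int],          # hypothetical: performs one request, returns its status code
    rpm: int,                         # your RPM limit for this model tier and plan
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry on 429, waiting one RPM-credit refill interval between attempts."""
    refill_interval = 60.0 / rpm      # seconds until one credit comes back
    status = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        sleep(refill_interval)
        status = send()
    return status
```

Passing `sleep` in as a parameter keeps the function easy to test without real waiting.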
Every model lives in a tier.
All 113 models sort into four tiers, from small embedding models to the biggest flagship models. A model's tier sets two things: your RPM-credit limit for it and whether your plan can reach it at all.
- Freemium (3× burst): Embeddings, transcription & the free community models.
- Lite (2× burst): Fast, lightweight coding & chat models.
- Pro (2× burst): Strong general-purpose workhorses.
- Max (2× burst): The biggest, smartest flagship models.
| Plan reaches → | Freemium | Lite | Pro | Max |
|---|---|---|---|---|
| Free $0 | ✓ | ✓ | ✓ | locked |
| Go $6.99 | ✓ | ✓ | ✓ | ✓ |
| Lite $14.99 | ✓ | ✓ | ✓ | ✓ |
| Pro $29 | ✓ | ✓ | ✓ | ✓ |
| Max $79 | ✓ | ✓ | ✓ | ✓ |
The ceiling. Free reaches up to Pro-tier models. Max-tier flagships are paid-only. Paid plans reach every tier. Go reaches all tiers too, across a set of efficient models.
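The access table above boils down to a single exception, which a client could encode as follows. This is only a sketch of the published matrix; the names `can_reach`, `PLANS`, `TIERS` and `LOCKED` are illustrative, not part of any API.

```python
PLANS = ("Free", "Go", "Lite", "Pro", "Max")
TIERS = ("Freemium", "Lite", "Pro", "Max")

# Per the table: only Free is locked out of a tier, and only the Max tier.
LOCKED = {"Free": {"Max"}}


def can_reach(plan: str, tier: str) -> bool:
    """Return True if the plan can call models in the given tier."""
    if plan not in PLANS:
        raise ValueError(f"unknown plan: {plan}")
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return tier not in LOCKED.get(plan, set())
```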
A budget that refills itself.
Request credits use rolling windows: a moving window, not a calendar reset. As old requests age out, that budget comes back automatically. Paid plans use a 5-hour and a weekly window; Free uses a simple daily allowance.
| Plan | Per day | Per 5h | Per week | Rate limit |
|---|---|---|---|---|
| Free $0 | 50 | — | — | metered |
| Go $6.99 | — | 500 | 2K | unmetered |
| Lite $14.99 | — | 700 | 3K | metered |
| Pro $29 | — | 2K | 7K | metered |
| Max $79 | — | 3K | 15K | metered |
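One way to picture a rolling window is as a sliding sum over timestamped spends. The sketch below is an illustration under that assumption, not the service's actual accounting: it reads "2K" as 2,000 credits, and the class name `RollingBudget` is made up for the example.

```python
from collections import deque

HOUR = 3600.0


class RollingBudget:
    """Sliding-window budget: a spend stops counting once it is older than the window."""

    def __init__(self, windows):
        # windows: list of (window_seconds, credit_limit), e.g. 5-hour and weekly
        self.windows = windows
        self.longest = max(seconds for seconds, _ in windows)
        self.spends = deque()  # (timestamp, credits), oldest first

    def _spent_within(self, seconds, now):
        return sum(c for t, c in self.spends if now - t < seconds)

    def available(self, now):
        # Forget spends that have aged out of every window.
        while self.spends and now - self.spends[0][0] >= self.longest:
            self.spends.popleft()
        # The tightest window decides how much is left.
        return min(limit - self._spent_within(s, now) for s, limit in self.windows)

    def spend(self, credits, now):
        if credits > self.available(now):
            return False
        self.spends.append((now, credits))
        return True


# Pro plan figures from the table: 2K per 5 hours, 7K per week.
pro = RollingBudget([(5 * HOUR, 2000), (7 * 24 * HOUR, 7000)])
```

In this model, credits spent at time 0 count against the 5-hour limit until exactly five hours later, while they keep counting against the weekly limit for seven days.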
Lighter work costs a fraction of a credit; the biggest models cost several. Each action draws between 0.05 and 5 credits from your budget.
Tool-call round-trips on a model are discounted to ⅓ of its cost.
Paid plans get a head-start. Your first 50 small requests every 5 hours (short prompts, short replies) don't spend any request credits at all.
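The two discounts can be sketched as simple arithmetic. Several things here are assumptions: per-model costs are not listed in this section, so `model_cost` is a parameter; the sketch reads the tool-call rule as "the first call costs full price and each tool round-trip costs a third"; and what counts as a "small" request is left to the caller as a flag.

```python
from fractions import Fraction

FREE_SMALL_REQUESTS = 50  # per 5-hour window, paid plans only


def exchange_cost(model_cost, tool_round_trips=0):
    """Credits for one call plus its tool-call round-trips (each at 1/3 of the model's cost)."""
    cost = Fraction(model_cost)
    return cost + tool_round_trips * cost / 3


def charge(model_cost, is_small, small_used_this_window, paid_plan=True):
    """Credits actually spent, after the paid-plan head-start on small requests."""
    if paid_plan and is_small and small_used_this_window < FREE_SMALL_REQUESTS:
        return Fraction(0)
    return Fraction(model_cost)
```

For example, under these assumptions a model costing 3 credits with two tool round-trips would draw 3 + 1 + 1 credits.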
How fast, per minute.
Request credits cap your total; RPM credits cap your pace. Your per-minute limit depends on the model's tier and your plan. Lighter tiers and bigger plans get more. These are the live, enforced numbers.
| Model tier ↓ / Plan → | Free $0 | Lite $14.99 | Pro $29 | Max $79 | Burst |
|---|---|---|---|---|---|
| Max | 1 | 4 | 5 | 6 | 2× |
| Pro | 2 | 8 | 10 | 12 | 2× |
| Lite | 3 | 10 | 12 | 14 | 2× |
| Freemium | 30 | 60 | 120 | 200 | 3× |
- You earn credits at your steady rate. RPM credits refill at your per-minute limit (one every 60 ÷ RPM sec). Stay at or below that rate and the bucket stays topped up, because each credit you spend has already been refilled.
- Bursts spend the extra. The bucket holds up to 2× a minute's worth of credits (Freemium 3×). Go above your rate and the extra requests draw the bucket down; short spikes clear instantly.
- Empty means throttled to baseline. Once it's drained you're held at the steady refill; the overflow gets a 429 until credits build back up. Spreading work across tiers gives you more buckets.
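The three rules above imply some quick arithmetic: the bucket's capacity is RPM × burst, and a full bucket sent at a rate above your limit drains at (rate − RPM) credits per minute. The sketch below encodes the table and that estimate; it treats requests as a continuous flow, so it is an approximation, and the function names are illustrative.

```python
# RPM limits from the table: tier -> plan -> requests per minute.
RPM_LIMITS = {
    "Max":      {"Free": 1,  "Lite": 4,  "Pro": 5,   "Max": 6},
    "Pro":      {"Free": 2,  "Lite": 8,  "Pro": 10,  "Max": 12},
    "Lite":     {"Free": 3,  "Lite": 10, "Pro": 12,  "Max": 14},
    "Freemium": {"Free": 30, "Lite": 60, "Pro": 120, "Max": 200},
}
BURST = {"Max": 2, "Pro": 2, "Lite": 2, "Freemium": 3}


def refill_interval(tier, plan):
    """Seconds between RPM-credit refills."""
    return 60.0 / RPM_LIMITS[tier][plan]


def bucket_capacity(tier, plan):
    """Maximum RPM credits the bucket can hold."""
    return RPM_LIMITS[tier][plan] * BURST[tier]


def seconds_until_throttled(tier, plan, requests_per_min):
    """Approximate time for a full bucket to drain; None if the rate is sustainable."""
    rpm = RPM_LIMITS[tier][plan]
    if requests_per_min <= rpm:
        return None
    return 60.0 * bucket_capacity(tier, plan) / (requests_per_min - rpm)
```

For a Pro-tier model on the Pro plan (10 RPM, 2× burst), sending 30 requests a minute drains the 20-credit bucket at 20 credits a minute, so throttling starts after roughly one minute.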
Burst credits, like a meter you can drive.
RPM credits work like burstable (AWS T-series) credits. You earn them at your steady rate; stay at or below it and the bucket holds full. Go above and you spend the difference, until it empties and you're throttled back to baseline.
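The behaviour described here matches a classic token bucket, so a small offline simulation can stand in for driving the meter by hand. This is a sketch of that general technique, not the service's implementation: `TokenBucket` and `simulate` are illustrative names, and the bucket is assumed to start full.

```python
class TokenBucket:
    """Token bucket: refills at rpm/60 credits per second, capped at rpm * burst."""

    def __init__(self, rpm, burst, now=0.0):
        self.rate = rpm / 60.0              # credits earned per second
        self.capacity = float(rpm * burst)  # e.g. 2x or 3x a minute's worth
        self.tokens = self.capacity         # assumption: bucket starts full
        self.last = now

    def try_request(self, now):
        # Credit the time elapsed since the last request, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                        # the caller would see a 429


def simulate(rpm, burst, requests_per_min, minutes):
    """Send evenly spaced requests; return (accepted, rejected)."""
    bucket = TokenBucket(rpm, burst)
    total = requests_per_min * minutes
    step = 60.0 / requests_per_min
    accepted = sum(bucket.try_request(i * step) for i in range(total))
    return accepted, total - accepted


# Freemium tier on the Lite plan: 60 RPM, 3x burst, pushed at double the limit.
accepted, rejected = simulate(rpm=60, burst=3, requests_per_min=120, minutes=4)
```

At or below the limit nothing is rejected. Above it, every request is accepted until the bucket drains, after which only the baseline rate gets through.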
Search, scrape and extract, no extra keys.
A web-tools suite runs behind the same key at /v1/tools/*. Each successful call spends a flat amount of request credits and draws from your Lite-tier RPM credits, so tools share a pace with your lightweight models.
- cost: 0.15 request credits / call
- available from: Free
- pace: Lite-tier RPM credits
- billing: only on success
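Because tool calls are flat-rate and billed only on success, estimating their cost is simple arithmetic. The sketch below uses `Decimal` to keep the 0.15 figure exact; the function names are illustrative, and `max_tool_calls` assumes the whole budget goes to tool calls.

```python
from decimal import Decimal

TOOL_CALL_COST = Decimal("0.15")  # request credits per successful call


def tools_bill(outcomes):
    """Credits spent for a sequence of call outcomes (True = success); failures are free."""
    successes = sum(1 for ok in outcomes if ok)
    return TOOL_CALL_COST * successes


def max_tool_calls(budget):
    """How many successful tool calls fit in a request-credit budget."""
    return int(Decimal(budget) // TOOL_CALL_COST)
```

On the Free plan's 50 credits a day, that works out to 333 successful tool calls if nothing else is spent.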
A vector DB per key.
Call /v1/vectors/* and a private vector database is set up for you on first use: collections, upserts, semantic search. Each operation spends just 0.05 request credits, and capacity grows with your plan.
| Plan | Storage | Collections | Read rpm | Write rpm | Vectors / upsert | Cost / op |
|---|---|---|---|---|---|---|
| Free | 20 MB | 2 | 300 | 10 | 100 | 0.05 |
| Lite | 200 MB | 5 | 1,200 | 60 | 1,000 | 0.05 |
| Pro | 800 MB | 20 | 6,000 | 300 | 5,000 | 0.05 |
| Max | 2 GB | ∞ | ∞ | ∞ | 10,000 | 0.05 |
Its own speed limits. The vector DB enforces separate read and write rate limits per plan, independent of your model RPM credits. 0 / ∞ on Max means unmetered. Storage shown is per-key.
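The "Vectors / upsert" column suggests splitting large writes into batches client-side. The sketch below shows one way to do that and to estimate the cost, assuming one upsert call counts as one 0.05-credit operation regardless of how many vectors it carries; this page does not confirm that, and the helper names are illustrative.

```python
from decimal import Decimal

# "Vectors / upsert" limits from the table.
VECTORS_PER_UPSERT = {"Free": 100, "Lite": 1000, "Pro": 5000, "Max": 10000}
COST_PER_OP = Decimal("0.05")


def batch_upserts(vectors, plan):
    """Split a list of vectors into chunks no larger than the plan's per-upsert limit."""
    size = VECTORS_PER_UPSERT[plan]
    return [vectors[i:i + size] for i in range(0, len(vectors), size)]


def upsert_cost(n_vectors, plan):
    """Request credits to upsert n_vectors, assuming one op per upsert call."""
    size = VECTORS_PER_UPSERT[plan]
    batches = -(-n_vectors // size)  # ceiling division
    return batches * COST_PER_OP
```

Batches also count against the plan's write rpm, so on Free (10 writes a minute) a large import may need to be paced as well as chunked.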
Both meters are per API key and reset on their own, so there is nothing to top up. Need more room? Spread work across model tiers, or move up a plan.
