Usage · Credits · Rate limits

Usage,
made simple.

Your plan buys two things: a request-credit budget (how much you can do) and RPM credits (how fast). Here is the whole system: tiers, credits, the burst bucket, built-in web tools and a per-key vector DB, with the real, live numbers and a simulator you can drive.

113 models
4 model tiers
2 credit systems
1 vector DB / key
01 The two systems

Two meters. That's the whole model.

Every call you make uses two separate meters. One caps your total volume, the other caps your pace. Understand these two and you understand everything else on this page.

Request credits · how much

Your rolling budget. Every action spends some: a chat call, a tool call, a vector op. Different actions cost different amounts (0.05 to 5 credits). Budgets refill continuously over a 5-hour and weekly window (Free: daily).

spent on every request · refills as old requests age out
RPM credits · how fast

Your per-minute pace. A bucket refills one RPM credit every 60 ÷ RPM seconds; each request spends one. The bucket holds more than a minute's worth, so short bursts go through fine. Empty it and you get a 429.

one spent per request · the limit depends on model tier + plan

The difference that matters: run out of RPM credits and you just slow down (retry in a few seconds). Run out of request credits and you've used your budget for that window. It comes back as the window rolls forward.
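A minimal client-side sketch of the "slow down and retry" case. It assumes a hypothetical `send` callable that returns an HTTP status code. The function name and the choice to wait one refill interval (60 ÷ RPM seconds) between attempts are illustrative assumptions, not documented client behaviour.

```python
import time
from typing import Callable


def send_with_retry(send: Callable[[], int], rpm: float,
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it stops returning 429 or attempts run out.

    `send` is a hypothetical callable returning an HTTP status code.
    Between attempts we wait one refill interval (60 / rpm seconds),
    the time the bucket needs to earn a single RPM credit.
    """
    interval = 60.0 / rpm
    status = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        sleep(interval)   # give the bucket time to earn one credit
        status = send()
    return status
```

At 6 RPM the wait between attempts is 10 seconds. Passing `sleep` as a parameter keeps the helper easy to test without real delays.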

02 Model tiers

Every model lives in a tier.

All 113 models sort into four tiers, from small embedding models to the biggest flagship models. A model's tier sets two things: the RPM-credit limit for calls to it and whether your plan can reach it at all.

Freemium · 3× burst
Embeddings, transcription & the free community models.
Lite · 2× burst
Fast, lightweight coding & chat models.
Pro · 2× burst
Strong general-purpose workhorses.
Max · 2× burst
The biggest, smartest flagship models.
Plan reaches →   Freemium   Lite   Pro   Max
Free $0          yes        yes    yes   locked
Go $6.99         yes        yes    yes   yes
Lite $14.99      yes        yes    yes   yes
Pro $29          yes        yes    yes   yes
Max $79          yes        yes    yes   yes

The ceiling. Free reaches up to Pro-tier models. Max-tier flagships are paid-only. Paid plans reach every tier. Go reaches all tiers too, across a set of efficient models.

03 Request credits

A budget that refills itself.

Request credits use rolling windows: a moving window, not a calendar reset. As old requests age out, that budget comes back automatically. Paid plans use a 5-hour and a weekly window; Free uses a simple daily allowance.
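One way to picture a rolling window is a log of timestamped spends, where each spend stops counting once it is older than the window. The sketch below is an illustration under that assumption. The class name and the exact aging rule are hypothetical, and the production accounting may differ.

```python
from collections import deque


class RollingBudget:
    """Rolling-window budget: a spend ages out `window_s` seconds after it happened."""

    def __init__(self, limit: float, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._events = deque()  # (timestamp, cost) pairs, oldest first

    def _expire(self, now: float) -> None:
        # Drop spends that have left the window; their credits come back.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()

    def remaining(self, now: float) -> float:
        self._expire(now)
        return self.limit - sum(cost for _, cost in self._events)

    def try_spend(self, cost: float, now: float) -> bool:
        if self.remaining(now) < cost:
            return False  # budget for this window is used up
        self._events.append((now, cost))
        return True
```

With a 500-credit, 5-hour window, 300 credits spent at hour 0 become available again at hour 5 regardless of what was spent in between.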

Plan           Per day   Per 5h   Per week   Rate limit
Free $0        50        -        -          metered
Go $6.99       -         500      2K         unmetered
Lite $14.99    -         700      3K         metered
Pro $29        -         2K       7K         metered
Max $79        -         3K       15K        metered
Credits per call

Lighter work costs a fraction of a credit; the biggest models cost several. What each call draws from your budget:

Vector DB operation    0.05
Embeddings             0.1
Web tool call          0.15
Free / community       0.2
Light models           0.5-0.75
Standard model         1
Premium flagship       2-3
Top-tier / image       5

Tool-call round-trips on a model are discounted to ⅓ of its cost.
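To estimate what a session draws from your budget, you can multiply the per-call cost by the number of calls and add the discounted tool-call round-trips. The sketch below assumes round-trips are billed in addition to ordinary calls at ⅓ of the model's cost. The dictionary keys and the function name are made up for illustration.

```python
from fractions import Fraction

# Per-call costs from the list above, in request credits. Categories given as
# a range (light models, premium flagships) are left out because the exact
# figure depends on the model.
COST = {
    "vector_op": Fraction("0.05"),
    "embeddings": Fraction("0.1"),
    "web_tool": Fraction("0.15"),
    "free_community": Fraction("0.2"),
    "standard": Fraction(1),
    "top_tier": Fraction(5),
}


def session_cost(model_cost: Fraction, calls: int, tool_round_trips: int = 0) -> Fraction:
    """Credits for `calls` ordinary calls plus tool-call round-trips at 1/3 cost."""
    return model_cost * calls + model_cost * tool_round_trips / 3
```

For example, 10 calls to a standard model plus 6 tool-call round-trips come to 10 + 6 × ⅓ = 12 credits. `Fraction` is used so that small costs such as 0.05 add up exactly.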

Paid plans get a head-start. Your first 50 small requests every 5 hours (short prompts, short replies) don't spend any request credits at all.

04 RPM credits

How fast, per minute.

Request credits cap your total; RPM credits cap your pace. Your per-minute limit depends on the model's tier and your plan. Lighter tiers and bigger plans get more. These are the live, enforced numbers.

Model tier ↓ / Plan →   Free $0   Lite $14.99   Pro $29   Max $79   Burst
Max                     1         4             5         6         2×
Pro                     2         8             10        12        2×
Lite                    3         10            12        14        2×
Freemium                30        60            120       200       3×
Read it: RPM credits per minute, per key.
Go plan: unmetered, no per-minute limit, only request credits.
Burst: the bucket holds this multiple of one minute of credits.
  1. You earn credits at your steady rate. RPM credits refill at your per-minute limit (one every 60 ÷ RPM sec). Stay at or below that rate and the bucket stays topped up: each credit you spend is replaced before the next request needs it.
  2. Bursts spend the extra. The bucket holds up to 2× a minute of credits (Freemium 3×). Go above your rate and the extra requests draw the bucket down; short spikes clear instantly.
  3. Empty means throttled to baseline. Once it's drained you're held at the steady refill; the overflow gets a 429 until credits build back up. Spreading work across tiers gives you more buckets.
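The three steps above describe a token bucket. Here is a minimal sketch, assuming continuous refill and a bucket that starts full. The class and method names are hypothetical and this is not the production limiter.

```python
class RpmBucket:
    """Token bucket for RPM credits: steady refill, capped at a burst multiple."""

    def __init__(self, rpm: float, burst: float = 2.0):
        self.rate = rpm / 60.0        # credits earned per second
        self.capacity = rpm * burst   # e.g. 6 rpm x 2 = 12 credits
        self.credits = self.capacity  # assumption: the bucket starts full
        self.last = 0.0               # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Step 1: earn credits for the time elapsed, never above capacity.
        elapsed = now - self.last
        self.last = now
        self.credits = min(self.capacity, self.credits + elapsed * self.rate)
        # Step 2: each request spends one credit.
        if self.credits >= 1.0:
            self.credits -= 1.0
            return True
        # Step 3: an empty bucket means the caller would see a 429.
        return False
```

A 6 RPM bucket with 2× burst lets 12 back-to-back requests through, rejects the 13th, and earns a new credit every 10 seconds after that.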
05 Live simulator

Burst credits, like a meter you can drive.

RPM credits work like burstable (AWS T-series) credits. You earn them at your steady rate; stay at or below it and the bucket holds full. Go above and you spend the difference, until it empties and you're throttled back to baseline. Pick a plan and tier, then push the rate.

RPM-credit simulator · live limits
Example reading: baseline 6 rpm · burst credits 12 (2× baseline) · net -2 credits/min at 8 rpm
Above baseline, spending. 8 rpm burns 2 credits/min from your 12-credit bucket; it empties in about 6 min, then you're throttled to 6/min and the overflow gets a 429. Tip: send the overflow to Pro-tier models, which have a separate, bigger bucket.
Legend: served at baseline · spending burst credits · throttled (429). Animation sped up · token-bucket math matches production.
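The simulator's example reading (8 rpm against a 6 rpm baseline draining a 12-credit bucket in about 6 minutes) is simple arithmetic: bucket size divided by the net burn rate. A small sketch, assuming the bucket starts full; the function name is illustrative.

```python
def minutes_until_empty(baseline_rpm: float, burst_multiple: float, send_rpm: float) -> float:
    """Minutes a full bucket lasts when sending at `send_rpm`."""
    bucket = baseline_rpm * burst_multiple   # credits in a full bucket
    net_burn = send_rpm - baseline_rpm       # credits lost per minute
    if net_burn <= 0:
        return float("inf")                  # at or below baseline: never empties
    return bucket / net_burn
```

`minutes_until_empty(6, 2, 8)` gives 12 ÷ 2 = 6 minutes, matching the reading above. Doubling the send rate to 12 rpm would drain the same bucket in 2 minutes.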
06 Web tools

Search, scrape and extract, no extra keys.

A web-tools suite runs behind the same key at /v1/tools/*. Each successful call spends a flat amount of request credits and draws from your Lite-tier RPM credits, so tools share a pace with your lightweight models.

Cost: 0.15 request credits / call
Available from: Free
Pace: Lite-tier RPM credits
Billing: only on success
07 Vector DB

A vector DB per key.

Call /v1/vectors/* and a private vector database is set up for you on first use: collections, upserts, semantic search. Each operation spends just 0.05 request credits, and capacity grows with your plan.

Plan   Storage   Collections   Read rpm   Write rpm   Vectors / upsert   Cost / op
Free   20 MB     2             300        10          100                0.05
Lite   200 MB    5             1,200      60          1,000              0.05
Pro    800 MB    20            6,000      300         5,000              0.05
Max    2 GB      ∞             ∞          ∞           10,000             0.05

Its own speed limits. The vector DB enforces separate read and write rate limits per plan, independent of your model RPM credits. 0 / ∞ on Max means unmetered. Storage shown is per-key.
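Because each upsert is capped at a plan-specific number of vectors, a large load has to be split into batches. The sketch below only does the splitting. It assumes the per-upsert limits from the table above, uses made-up plan keys, and makes no request to the actual /v1/vectors/* endpoints.

```python
from typing import Iterator, List, Sequence

# Vectors-per-upsert limits from the table above, keyed by hypothetical plan names.
MAX_VECTORS_PER_UPSERT = {"free": 100, "lite": 1_000, "pro": 5_000, "max": 10_000}


def upsert_batches(vectors: Sequence, plan: str) -> Iterator[List]:
    """Yield consecutive slices no larger than the plan's per-upsert limit."""
    size = MAX_VECTORS_PER_UPSERT[plan]
    for start in range(0, len(vectors), size):
        yield list(vectors[start:start + size])
```

On Free, 250 vectors split into batches of 100, 100 and 50. If each upsert counts as one operation, that is 3 × 0.05 = 0.15 request credits, and the three writes fit within the 10 write rpm limit.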

Both meters are per API key and reset on their own; there is nothing to top up. Need more room? Spread work across model tiers, or move up a plan.