Rate Limits
Overview
Aleatoric Data enforces rate limits at the gateway layer to ensure fair resource allocation and platform stability. Limits are applied per API key and per region independently — a request routed to US-East and another to AP-Tokyo each decrement their respective regional counters.
Token Bucket Model
Rate limiting follows a token bucket algorithm. Each API key is assigned a bucket with capacity $B$ tokens. Tokens refill at a constant rate of $r$ tokens per second, up to the maximum capacity. Each inbound request consumes exactly one token.
Let $T(t)$ denote the number of tokens available at time $t$. The bucket state evolves as:

$$T(t) = \min\bigl(B,\ T(t_0) + r \cdot (t - t_0)\bigr)$$

where $t_0$ is the time of the last token update. When a request arrives:
- If $T \geq 1$: the request is admitted and $T \leftarrow T - 1$.
- If $T < 1$: the request is rejected with HTTP 429 Too Many Requests.
This model permits short bursts of up to $B$ requests while enforcing a sustained average rate of $r$ requests per second. The bucket drains during bursts and refills during idle periods, providing natural smoothing without requiring fixed time windows.
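For intuition, here is a minimal sketch of this admission logic in Python. It is purely illustrative (the gateway's actual implementation is not published); the `rate` and `capacity` arguments correspond to $r$ and $B$ in the tier table below.

```python
import time

class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/second
    up to `capacity`; each admitted request consumes one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # r: sustained tokens per second
        self.capacity = capacity  # B: maximum burst size
        self.tokens = capacity    # bucket starts full
        self.last_update = time.monotonic()

    def try_admit(self) -> bool:
        now = time.monotonic()
        # T(t) = min(B, T(t0) + r * (t - t0))
        self.tokens = min(self.capacity,
                          self.tokens + self.rate * (now - self.last_update))
        self.last_update = now
        if self.tokens >= 1:
            self.tokens -= 1      # admit: T <- T - 1
            return True
        return False              # reject: maps to HTTP 429

# Basic tier: r = 2 req/s sustained, B = 10 burst
bucket = TokenBucket(rate=2, capacity=10)
print(bucket.try_admit())  # True until the burst capacity drains
```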
Tier Limits
| Tier | Requests/min | Sustained req/s ($r$) | Burst Capacity ($B$) | Concurrent Streams |
|---|---|---|---|---|
| Basic | 100 | 2 | 10 | 1 |
| Pro | 120,000 | 200 | 1,000 | 10 |
| Quant | Unlimited | Unlimited | Unlimited | Unlimited |
Notes:
- Basic tier is designed for development, testing, and low-frequency monitoring use cases.
- Pro tier sustains high-throughput production workloads. The burst capacity of 1,000 tokens permits short spikes (e.g., initial snapshot hydration) without throttling.
- Quant tier removes all rate constraints. Quant clients connect via dedicated gateway endpoints with isolated capacity.
Protocol-Specific Behavior
Each transport protocol interacts with the rate limiter differently. Understanding these semantics is essential for optimizing throughput.
JSON-RPC
- The global Nginx rate limit is 5,000 req/s per API key, applied before tier-specific limits.
- Batch requests count as a single token consumption. A JSON-RPC batch containing up to 100 method calls consumes 1 token from the bucket. Batches exceeding 100 calls are rejected with error code `-32600`.
- Individual method calls within a batch are validated independently — a partial failure returns per-call error objects without consuming additional tokens. A batch construction sketch follows this list.
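To illustrate batching under these rules, the sketch below sends one batch that consumes a single token regardless of how many calls it contains (up to 100). The endpoint URL, method name, and parameter shape are placeholders for illustration, not a documented API surface.

```python
import json
import urllib.request

# Placeholder endpoint and method names; substitute your actual
# gateway URL, API key, and the methods you need.
URL = "https://api.example.com/rpc"

# One batch of up to 100 calls consumes a single token.
batch = [
    {"jsonrpc": "2.0", "id": i, "method": "getTicker", "params": {"symbol": s}}
    for i, s in enumerate(["BTC-USD", "ETH-USD", "SOL-USD"])
]

req = urllib.request.Request(
    URL,
    data=json.dumps(batch).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer <API_KEY>"},
)
with urllib.request.urlopen(req) as resp:
    # Per-call results (or per-call error objects) come back as an array.
    results = json.loads(resp.read())
```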
gRPC
- Streaming RPCs consume one token at connection time. Once a bidirectional or server-streaming RPC is established, individual messages within the stream are not rate-limited.
- Unary RPCs consume one token per call, identical to JSON-RPC single requests.
- Connection limits per tier apply to the total number of concurrent active gRPC channels. Exceeding the limit returns gRPC status `RESOURCE_EXHAUSTED` (code 8); a handling sketch follows this list.
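A minimal sketch of surfacing that status with the `grpcio` package. Generated stubs vary by service definition, so the wrapper takes the stub method as a parameter rather than assuming a client API.

```python
import grpc

def call_checking_channel_limit(stub_method, request):
    """Invoke a unary RPC and surface the tier's channel-limit error.

    `stub_method` is any callable from a generated gRPC stub; the
    stub itself depends on the service's .proto definitions.
    """
    try:
        return stub_method(request)
    except grpc.RpcError as err:
        if err.code() == grpc.StatusCode.RESOURCE_EXHAUSTED:
            # Concurrent-channel limit exceeded (code 8): close an
            # idle channel or back off before opening another.
            raise RuntimeError("gRPC channel limit reached") from err
        raise
```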
Unified Stream (SSE)
- Each SSE connection counts against both the request rate and the concurrent stream limit for the tier.
- Once connected, event throughput is unlimited — the server pushes events at line rate without per-message throttling.
- Reconnection attempts (including automatic retry after disconnect) count as new requests. Implement client-side jitter to avoid reconnection storms; a reconnect sketch follows this list.
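A reconnect loop sketch under those constraints. The stream URL is a placeholder; the point is that every reconnect is a fresh rate-limited request, so the loop backs off exponentially and adds jitter.

```python
import random
import time
import urllib.request

STREAM_URL = "https://api.example.com/stream"  # placeholder endpoint

def consume_stream():
    attempt = 0
    while True:
        try:
            req = urllib.request.Request(
                STREAM_URL,
                headers={"Accept": "text/event-stream",
                         "Authorization": "Bearer <API_KEY>"})
            with urllib.request.urlopen(req) as resp:
                attempt = 0  # connected: reset the backoff counter
                for raw in resp:
                    line = raw.decode().rstrip("\n")
                    if line.startswith("data:"):
                        print(line[5:].strip())  # handle one event payload
        except OSError:
            # Each reconnect counts as a new request, so back off
            # exponentially with jitter to avoid reconnection storms.
            attempt += 1
            delay = min(30, 2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```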
Disk-Sync WebSocket
- Available to Quant tier only. No rate limits of any kind are applied.
- One WebSocket connection per API key. Attempting a second connection closes the first with close code `4008`.
- WebSocket frames are not metered. Full L2 book synchronization runs at native throughput.
Rate Limit Headers
All HTTP responses (including 2xx successes) include rate limit metadata headers:
| Header | Type | Description |
|---|---|---|
| `X-RateLimit-Limit` | Integer | Maximum requests permitted in the current window |
| `X-RateLimit-Remaining` | Integer | Tokens remaining in the bucket at response time |
| `X-RateLimit-Reset` | Unix timestamp | Time at which the bucket will be fully replenished |
| `X-RateLimit-Bucket` | String | Bucket identifier (useful when debugging multi-key setups) |
| `Retry-After` | Integer | Seconds until the next request will be accepted (present only on 429 responses) |
For gRPC, equivalent metadata is returned in trailing headers using the `x-ratelimit-*` prefix.
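A short sketch of reading these headers from a successful response; the endpoint path is a placeholder.

```python
import urllib.request

req = urllib.request.Request(
    "https://api.example.com/v1/instruments",  # placeholder endpoint
    headers={"Authorization": "Bearer <API_KEY>"},
)
with urllib.request.urlopen(req) as resp:
    limit = int(resp.headers["X-RateLimit-Limit"])
    remaining = int(resp.headers["X-RateLimit-Remaining"])
    reset_at = int(resp.headers["X-RateLimit-Reset"])  # Unix timestamp

    # Alerting rule from Best Practices below: flag workloads that
    # have burned through 90% of their budget.
    if remaining < 0.1 * limit:
        print(f"warning: {remaining}/{limit} tokens left, resets at {reset_at}")
```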
429 Response
When the token bucket is empty, the gateway returns:
HTTP:

```json
{
  "error": "rate_limit_exceeded",
  "message": "Token bucket exhausted. Retry after the indicated interval.",
  "retry_after": 1,
  "limit": 100,
  "remaining": 0,
  "reset": 1741872000
}
```

JSON-RPC:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32000,
    "message": "Rate limit exceeded",
    "data": { "retry_after": 1 }
  }
}
```

gRPC: Status `RESOURCE_EXHAUSTED` with a detail message containing `retry_after` in metadata.

SSE: An `error` event is emitted with code `rate_limit`, followed by connection termination.
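A retry sketch that honors `Retry-After` and adds jitter, anticipating the backoff guidance in Best Practices below. The URL is a placeholder and the JSON body shape is assumed, not pinned by the API.

```python
import json
import random
import time
import urllib.error
import urllib.request

def request_with_retry(url: str, api_key: str, max_attempts: int = 5):
    """GET with retry on 429: wait Retry-After seconds plus jitter."""
    for _ in range(max_attempts):
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"})
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            # Honor Retry-After, then add uniform jitter so distributed
            # clients don't all retry in the same instant.
            retry_after = int(err.headers.get("Retry-After", "1"))
            time.sleep(retry_after + random.uniform(0, 1))
    raise RuntimeError("still rate limited after max retries")
```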
Best Practices
- **Prefer streaming over polling.** A single SSE or gRPC stream replaces thousands of polling requests per minute. Once connected, event delivery is unlimited and lower-latency than repeated HTTP round-trips.
- **Implement exponential backoff with jitter.** On receiving a 429, wait for `Retry-After` seconds, then add uniform random jitter to prevent thundering herd effects across distributed clients (see the retry sketch in the 429 Response section above).
- **Use JSON-RPC batching.** Combine related calls into a single batch request. A batch of 50 calls consumes 1 token instead of 50, yielding a 50× efficiency gain for batch-compatible workflows.
- **Cache immutable data client-side.** Instrument metadata, contract addresses, and coin configuration change infrequently. Cache these responses with a TTL of 60 seconds to eliminate redundant requests. A minimal cache sketch follows this list.
- **Select the nearest region.** Aleatoric Data serves from multiple geographic regions. Route traffic to the region closest to your infrastructure to minimize round-trip time and avoid wasting rate limit tokens on retries caused by timeouts.
- **Monitor your headers.** Track `X-RateLimit-Remaining` in your client metrics. Set an alert when remaining tokens drop below 10% of the limit to proactively identify workloads approaching their ceiling.
- **Distribute keys by function.** If your architecture has distinct read-heavy and write-heavy components, use separate API keys for each. This prevents a burst in one subsystem from starving the other.
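As referenced in the caching item above, a minimal TTL cache sketch. The `fetch` callable stands in for whatever client call retrieves the data; the name is illustrative.

```python
import time

class TTLCache:
    """Minimal client-side cache for slow-changing responses such as
    instrument metadata, contract addresses, and coin configuration."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value, calling `fetch()` (one API request)
        only when the entry is missing or older than the TTL."""
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._store[key] = (now, value)
        return value

cache = TTLCache(ttl_seconds=60)
# Usage (hypothetical client call):
# instruments = cache.get("instruments", lambda: client.list_instruments())
```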