Rate Limits
Overview
Aleatoric Data enforces rate limits at the gateway layer to ensure fair resource allocation and platform stability. Limits are applied per API key and per region independently — a request routed to US-East and another to AP-Tokyo each decrement their respective regional counters.
Token Bucket Model
Rate limiting follows a token bucket algorithm. Each API key is assigned a bucket with capacity $B$ tokens. Tokens refill at a constant rate of $r$ tokens per second, up to the maximum capacity. Each inbound request consumes exactly one token.
Let $T(t)$ denote the number of tokens available at time $t$. The bucket state evolves as:

$$T(t) = \min\bigl(B,\ T(t_0) + r \cdot (t - t_0)\bigr)$$

where $t_0$ is the time of the last token update. When a request arrives:
- If $T \geq 1$: the request is admitted and $T \leftarrow T - 1$.
- If $T < 1$: the request is rejected with HTTP 429 Too Many Requests.
This model permits short bursts of up to $B$ requests while enforcing a sustained average rate of $r$ requests per second. The bucket drains during bursts and refills during idle periods, providing natural smoothing without requiring fixed time windows.
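For intuition, here is a minimal sketch of this admission logic in Python. It is purely illustrative (the gateway's actual implementation is not published); the `rate` and `capacity` arguments correspond to $r$ and $B$ in the tier table below.

```python
import time

class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/second
    up to `capacity`; each admitted request consumes one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # r: sustained tokens per second
        self.capacity = capacity  # B: maximum burst size
        self.tokens = capacity    # bucket starts full
        self.last_update = time.monotonic()

    def try_admit(self) -> bool:
        now = time.monotonic()
        # T(t) = min(B, T(t0) + r * (t - t0))
        self.tokens = min(self.capacity,
                          self.tokens + self.rate * (now - self.last_update))
        self.last_update = now
        if self.tokens >= 1:
            self.tokens -= 1      # admit: T <- T - 1
            return True
        return False              # reject: maps to HTTP 429

# Basic tier: r = 2 req/s sustained, B = 10 burst
bucket = TokenBucket(rate=2, capacity=10)
print(bucket.try_admit())  # True until the burst capacity drains
```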
Tier Limits
| Tier | Requests/min | Sustained req/s ($r$) | Burst Capacity ($B$) | Concurrent Streams |
|---|---|---|---|---|
| Basic | 100 | 2 | 10 | 1 |
| Pro | 120,000 | 200 | 1,000 | 10 |
| Quant | Unlimited | Unlimited | Unlimited | Unlimited |
Notes:
- Basic tier is designed for development, testing, and low-frequency monitoring use cases.
- Pro tier sustains high-throughput production workloads. The burst capacity of 1,000 tokens permits short spikes (e.g., initial snapshot hydration) without throttling.
- Quant tier removes all rate constraints. Quant clients connect via dedicated gateway endpoints with isolated capacity.
Protocol-Specific Behavior
Each transport protocol interacts with the rate limiter differently. Understanding these semantics is essential for optimizing throughput.
JSON-RPC
- The global Nginx rate limit is 5,000 req/s per API key, applied before tier-specific limits.
- Batch requests count as a single token consumption. A JSON-RPC batch containing up to 100 method calls consumes 1 token from the bucket. Batches exceeding 100 calls are rejected with error code `-32600`.
- Individual method calls within a batch are validated independently — a partial failure returns per-call error objects without consuming additional tokens. A batch construction sketch follows this list.
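To illustrate batching under these rules, the sketch below sends one batch that consumes a single token regardless of how many calls it contains (up to 100). The endpoint URL, method name, and parameter shape are placeholders for illustration, not a documented API surface.

```python
import json
import urllib.request

# Placeholder endpoint and method names; substitute your actual
# gateway URL, API key, and the methods you need.
URL = "https://api.example.com/rpc"

# One batch of up to 100 calls consumes a single token.
batch = [
    {"jsonrpc": "2.0", "id": i, "method": "getTicker", "params": {"symbol": s}}
    for i, s in enumerate(["BTC-USD", "ETH-USD", "SOL-USD"])
]

req = urllib.request.Request(
    URL,
    data=json.dumps(batch).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer <API_KEY>"},
)
with urllib.request.urlopen(req) as resp:
    # Per-call results (or per-call error objects) come back as an array.
    results = json.loads(resp.read())
```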
gRPC
- Streaming RPCs consume one token at connection time. Once a bidirectional or server-streaming RPC is established, individual messages within the stream are not rate-limited.
- Unary RPCs consume one token per call, identical to JSON-RPC single requests.
- Connection limits per tier apply to the total number of concurrent active gRPC channels. Exceeding the limit returns gRPC status `RESOURCE_EXHAUSTED` (code 8); a handling sketch follows this list.
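A minimal sketch of surfacing that status with the `grpcio` package. Generated stubs vary by service definition, so the wrapper takes the stub method as a parameter rather than assuming a client API.

```python
import grpc

def call_checking_channel_limit(stub_method, request):
    """Invoke a unary RPC and surface the tier's channel-limit error.

    `stub_method` is any callable from a generated gRPC stub; the
    stub itself depends on the service's .proto definitions.
    """
    try:
        return stub_method(request)
    except grpc.RpcError as err:
        if err.code() == grpc.StatusCode.RESOURCE_EXHAUSTED:
            # Concurrent-channel limit exceeded (code 8): close an
            # idle channel or back off before opening another.
            raise RuntimeError("gRPC channel limit reached") from err
        raise
```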
Unified Stream (SSE)
- Each SSE connection counts against both the request rate and the concurrent stream limit for the tier.
- Once connected, event throughput is unlimited — the server pushes events at line rate without per-message throttling.
- Reconnection attempts (including automatic retry after disconnect) count as new requests. Implement client-side jitter to avoid reconnection storms; a reconnect sketch follows this list.
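A reconnect loop sketch under those constraints. The stream URL is a placeholder; the point is that every reconnect is a fresh rate-limited request, so the loop backs off exponentially and adds jitter.

```python
import random
import time
import urllib.request

STREAM_URL = "https://api.example.com/stream"  # placeholder endpoint

def consume_stream():
    attempt = 0
    while True:
        try:
            req = urllib.request.Request(
                STREAM_URL,
                headers={"Accept": "text/event-stream",
                         "Authorization": "Bearer <API_KEY>"})
            with urllib.request.urlopen(req) as resp:
                attempt = 0  # connected: reset the backoff counter
                for raw in resp:
                    line = raw.decode().rstrip("\n")
                    if line.startswith("data:"):
                        print(line[5:].strip())  # handle one event payload
        except OSError:
            # Each reconnect counts as a new request, so back off
            # exponentially with jitter to avoid reconnection storms.
            attempt += 1
            delay = min(30, 2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```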
Disk-Sync WebSocket
- Available to Quant tier only. No rate limits of any kind are applied.
- One WebSocket connection per API key. Attempting a second connection closes the first with close code `4008`.
- WebSocket frames are not metered. Full L2 book synchronization runs at native throughput.
Rate Limit Headers
All HTTP responses (including 2xx successes) include rate limit metadata headers:
| Header | Type | Description |
|---|---|---|
| `X-RateLimit-Limit` | Integer | Maximum requests permitted in the current window |
| `X-RateLimit-Remaining` | Integer | Tokens remaining in the bucket at response time |
| `X-RateLimit-Reset` | Unix timestamp | Time at which the bucket will be fully replenished |
| `X-RateLimit-Bucket` | String | Bucket identifier (useful when debugging multi-key setups) |
| `Retry-After` | Integer | Seconds until the next request will be accepted (present only on 429 responses) |
For gRPC, equivalent metadata is returned in trailing headers using the `x-ratelimit-*` prefix.
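A short sketch of reading these headers from a successful response; the endpoint path is a placeholder.

```python
import urllib.request

req = urllib.request.Request(
    "https://api.example.com/v1/instruments",  # placeholder endpoint
    headers={"Authorization": "Bearer <API_KEY>"},
)
with urllib.request.urlopen(req) as resp:
    limit = int(resp.headers["X-RateLimit-Limit"])
    remaining = int(resp.headers["X-RateLimit-Remaining"])
    reset_at = int(resp.headers["X-RateLimit-Reset"])  # Unix timestamp

    # Alerting rule from Best Practices below: flag workloads that
    # have burned through 90% of their budget.
    if remaining < 0.1 * limit:
        print(f"warning: {remaining}/{limit} tokens left, resets at {reset_at}")
```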
429 Response
When the token bucket is empty, the gateway returns:
HTTP:

```json
{
  "error": "rate_limit_exceeded",
  "message": "Token bucket exhausted. Retry after the indicated interval.",
  "retry_after": 1,
  "limit": 100,
  "remaining": 0,
  "reset": 1741872000
}
```

JSON-RPC:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32000,
    "message": "Rate limit exceeded",
    "data": { "retry_after": 1 }
  }
}
```

gRPC: Status `RESOURCE_EXHAUSTED` with a detail message containing `retry_after` in metadata.

SSE: An `error` event is emitted with code `rate_limit`, followed by connection termination.
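A retry sketch that honors `Retry-After` and adds jitter, anticipating the backoff guidance in Best Practices below. The URL is a placeholder and the JSON body shape is assumed, not pinned by the API.

```python
import json
import random
import time
import urllib.error
import urllib.request

def request_with_retry(url: str, api_key: str, max_attempts: int = 5):
    """GET with retry on 429: wait Retry-After seconds plus jitter."""
    for _ in range(max_attempts):
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"})
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            # Honor Retry-After, then add uniform jitter so distributed
            # clients don't all retry in the same instant.
            retry_after = int(err.headers.get("Retry-After", "1"))
            time.sleep(retry_after + random.uniform(0, 1))
    raise RuntimeError("still rate limited after max retries")
```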
Best Practices
- **Prefer streaming over polling.** A single SSE or gRPC stream replaces thousands of polling requests per minute. Once connected, event delivery is unlimited and lower-latency than repeated HTTP round-trips.
- **Implement exponential backoff with jitter.** On receiving a 429, wait for `Retry-After` seconds, then add uniform random jitter to prevent thundering herd effects across distributed clients (see the retry sketch in the 429 Response section above).
- **Use JSON-RPC batching.** Combine related calls into a single batch request. A batch of 50 calls consumes 1 token instead of 50, yielding a 50× efficiency gain for batch-compatible workflows.
- **Cache immutable data client-side.** Instrument metadata, contract addresses, and coin configuration change infrequently. Cache these responses with a TTL of 60 seconds to eliminate redundant requests. A minimal cache sketch follows this list.
- **Select the nearest region.** Aleatoric Data serves from multiple geographic regions. Route traffic to the region closest to your infrastructure to minimize round-trip time and avoid wasting rate limit tokens on retries caused by timeouts.
- **Monitor your headers.** Track `X-RateLimit-Remaining` in your client metrics. Set an alert when remaining tokens drop below 10% of the limit to proactively identify workloads approaching their ceiling.
- **Distribute keys by function.** If your architecture has distinct read-heavy and write-heavy components, use separate API keys for each. This prevents a burst in one subsystem from starving the other.
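As referenced in the caching item above, a minimal TTL cache sketch. The `fetch` callable stands in for whatever client call retrieves the data; the name is illustrative.

```python
import time

class TTLCache:
    """Minimal client-side cache for slow-changing responses such as
    instrument metadata, contract addresses, and coin configuration."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value, calling `fetch()` (one API request)
        only when the entry is missing or older than the TTL."""
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._store[key] = (now, value)
        return value

cache = TTLCache(ttl_seconds=60)
# Usage (hypothetical client call):
# instruments = cache.get("instruments", lambda: client.list_instruments())
```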