Table of Contents
- Introduction
- Why Rate Limiting Matters
- The Four Pillars
- Core Concepts
- Seven Strategies Overview
- HTTP 429 Response
- Next Steps
Introduction
In today's API-driven world, rate limiting is not optional; it's essential. Whether you're building a public REST API, a microservices architecture, or a SaaS platform, protecting your infrastructure while ensuring fair resource distribution is critical.
What is Rate Limiting?
Rate limiting controls the number of requests a client can make to your API within a specific time window. Think of it as a bouncer at a nightclub: only a certain number of people can enter at a time.
Simple Example:
Without Rate Limiting:
Client → 10,000 requests/second → Server 💥 Crash
With Rate Limiting:
Client → 10,000 requests/second → Rate Limiter (100 allowed) → Server ✅ Stable
This Article Series
This is Part 1 of 10 in our comprehensive rate limiting series:
- Overview (This article) - Why, what, and how
- Fixed Window Strategy
- Sliding Window Strategy
- Token Bucket Strategy
- Concurrency Strategy
- Per-User Strategy
- Chained Strategy
- Tiered Strategy
- Comparison & Selection Guide
- Testing & Production Guide
Why Rate Limiting Matters
Real-World Problem
Scenario: E-commerce Flash Sale
9:59 AM: Normal traffic (100 req/s)
10:00 AM: Flash sale starts
10:00 AM: Traffic spikes to 50,000 req/s
10:00:30 AM: Database overwhelmed
10:00:45 AM: Complete service outage
10:15 AM: Customers angry, revenue lost
With Rate Limiting:
9:59 AM: Normal traffic (100 req/s)
10:00 AM: Flash sale starts
10:00 AM: Rate limiter allows 1,000 req/s
10:00 AM: Queue system handles overflow
10:00-10:15 AM: All customers served fairly
Result: Happy customers, no outage, maximum revenue
The Cost of NOT Having Rate Limiting
Example: Startup API (Real numbers)
Incident Report - January 15, 2025
Without Rate Limiting:
- 14:23: Buggy mobile app released
- 14:24: App makes 10 req/s per user instead of 1 req/s
- 14:25: 10,000 active users = 100,000 req/s
- 14:26: API Gateway crashes
- 14:27-16:30: Complete service outage (2 hours)
Impact:
- Lost revenue: $50,000
- AWS overage charges: $8,500
- Customer churn: 15%
- Engineer overtime: $2,000
- Reputation damage: Priceless
Total Cost: $60,500+ for a 2-hour outage
With Rate Limiting (2 req/s per user, double the normal 1 req/s):
- 14:23: Buggy app released
- 14:24: Rate limiter kicks in
- 14:25: App gets 429 responses
- 14:26: Monitoring alerts triggered
- 14:30: Buggy version pulled
Impact:
- Lost revenue: $500
- AWS costs: Normal
- Customer churn: 0%
- Total Cost: $500
ROI: $60,000 saved with rate limiting!
The Four Pillars
1. 🛡️ Security
Protection Against Attacks:
| Attack Type | Without Rate Limit | With Rate Limit |
| --- | --- | --- |
| DDoS | Server down in seconds | Attacker gets 429, others unaffected |
| Brute Force | 1M password attempts in 10 min | 10 attempts, then blocked for 1 hour |
| Credential Stuffing | 100K accounts compromised | Only 5 login attempts per IP |
| Web Scraping | Entire database downloaded | Slow drip, scraper gives up |
Real Example: GitHub
GitHub Rate Limits:
- Authenticated: 5,000 requests/hour
- Unauthenticated: 60 requests/hour
Result:
- Legitimate users: Never hit limit
- Scrapers: Frustrated and blocked
- API stays stable
2. 💰 Cost Control
Cloud Cost Reduction:
Scenario: Weather API
Without Rate Limiting:
Monthly Traffic:
- Expected: 10M requests
- Actual: 500M requests (bot traffic)
AWS Costs:
- API Gateway: 500M × $3.50/M = $1,750
- Lambda: 500M × $0.20/M = $100
- DynamoDB: 500M reads × $0.25/M = $125
- Data Transfer: $300
Total: $2,275/month
With Rate Limiting (1K req/hour per IP):
Monthly Traffic:
- Expected: 10M requests
- Actual: 12M requests (2M legitimate bursts)
AWS Costs:
- API Gateway: 12M × $3.50/M = $42
- Lambda: 12M × $0.20/M = $2.40
- DynamoDB: 12M reads × $0.25/M = $3
- Data Transfer: $10
Total: $57.40/month
Savings: $2,217.60/month (97% reduction!)
Annual Savings: $26,611
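The comparison above is simple arithmetic, so it can be checked directly. A quick sketch (Python, purely for the calculation; the per-million unit prices and flat data-transfer figures are the ones quoted above):

```python
# Recompute the AWS cost comparison above. Unit prices are the article's
# quoted per-million-request rates; data transfer is a flat monthly figure.
def monthly_cost(millions_of_requests: float, data_transfer: float) -> float:
    api_gateway = millions_of_requests * 3.50
    lambda_fn = millions_of_requests * 0.20
    dynamodb = millions_of_requests * 0.25
    return api_gateway + lambda_fn + dynamodb + data_transfer

without_limit = monthly_cost(500, 300)   # 500M requests, mostly bot traffic
with_limit = monthly_cost(12, 10)        # 12M requests after rate limiting

print(round(without_limit, 2))               # 2275.0
print(round(with_limit, 2))                  # 57.4
print(round(without_limit - with_limit, 2))  # 2217.6
```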
3. ⚖️ Fair Usage
The Noisy Neighbor Problem:
Shared API (100 requests/second capacity)
Without Rate Limiting:
┌───────────────────────────────────────────────────┐
│ User A (Power User): ████████████████████  90%    │
│ User B:              █                      5%    │
│ User C:              █                      5%    │
└───────────────────────────────────────────────────┘
Result: 99% of users have poor experience
With Rate Limiting (10 req/s per user):
┌───────────────────────────────────────────────────┐
│ User A: ██████████  10 req/s                      │
│ User B: ██████████  10 req/s                      │
│ User C: ██████████  10 req/s                      │
└───────────────────────────────────────────────────┘
Result: Fair access for everyone
4. 📊 Predictability
Capacity Planning Benefits:
Known Rate Limits = Predictable Infrastructure
Example:
- API Limit: 1,000 req/s
- Average Response Time: 50ms
- Required Capacity:
* Concurrent Requests: 1,000 × 0.05 = 50
* CPU Cores: 50 / 10 = 5 cores
* Memory: 5 × 2GB = 10GB RAM
* Instances: ceil(10GB / 4GB) = 3 instances
Budget:
- 3 instances × $100/month = $300/month
- Predictable, no surprises!
Without Rate Limiting:
- Peak load unknown (could be 1K or 100K)
- Must overprovision (10x capacity)
- Cost: $3,000/month or risk outages
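The capacity estimate above follows Little's law (concurrent requests = arrival rate × response time). A sketch of the same calculation (Python for illustration; the per-core figures are the planning assumptions used above, not universal constants):

```python
import math

# Reproduce the capacity estimate above. Planning assumptions from the
# example: 10 concurrent requests per CPU core, 2 GB RAM per core,
# 4 GB usable RAM per instance.
rate_limit = 1_000            # req/s, the advertised API limit
avg_response_s = 0.05         # 50 ms average response time

concurrent = rate_limit * avg_response_s      # Little's law: L = lambda * W
cores = concurrent / 10                       # 10 concurrent requests per core
ram_gb = cores * 2                            # 2 GB per core
instances = math.ceil(ram_gb / 4)             # 4 GB per instance

print(concurrent, cores, ram_gb, instances)   # 50.0 5.0 10.0 3
```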
Core Concepts
Request Rate
Formula:
Rate = Number of Requests / Time Window
Examples:
Web API: 100 requests/minute = 1.67 requests/second
Payment API: 5 requests/minute = 0.08 requests/second
CDN: 10K requests/second = 600K requests/minute
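The formula is a straight division, which makes the conversions above easy to verify (Python for illustration; the helper name is ours):

```python
def to_per_second(requests: float, window_seconds: float) -> float:
    """Rate = number of requests / time window, normalized to req/s."""
    return requests / window_seconds

# The examples above:
print(round(to_per_second(100, 60), 2))    # Web API: 1.67
print(round(to_per_second(5, 60), 2))      # Payment API: 0.08
print(int(to_per_second(10_000, 1) * 60))  # CDN: 600000 req/min
```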
Time Windows
Common Window Sizes:
| Window Size | Use Case | Example |
| --- | --- | --- |
| 1 second | Real-time APIs | Chat, streaming |
| 1 minute | Standard APIs | REST endpoints |
| 1 hour | Bulk operations | Report generation |
| 1 day | Freemium tiers | Free API access |
Burst Handling
What is a Burst?
Normal Traffic:
00:00 → 10 req/s
00:10 → 10 req/s
00:20 → 10 req/s
Bursty Traffic:
00:00 → 5 req/s
00:10 → 5 req/s
00:20 → 100 req/s (BURST!)
00:30 → 5 req/s
Burst Strategies:
- Reject Immediately (Strict)
Burst detected → All requests > limit rejected
Use: Payment processing, critical operations
- Queue (Tolerant)
Burst detected → Extra requests queued
Use: File uploads, batch processing
- Allow with Token Bucket (Flexible)
Burst detected → Use accumulated tokens
Use: User-facing APIs, bursty workloads
Queue Management
Queue States:
Request arrives:
  ↓
Is rate limit exceeded?
  ├─ No  → Process immediately ✅
  └─ Yes → Is queue full?
           ├─ No  → Add to queue 🟡 (wait)
           └─ Yes → Reject request ❌ (429)
Queue Processing:
Request completes → Dequeue next request → Process
Queue Configuration:
options.QueueLimit = 10; // Max 10 requests waiting
options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst; // FIFO
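The snippet above configures ASP.NET's `System.Threading.RateLimiting` options; the decision flow itself is language-agnostic. A minimal sketch of that flow (Python, illustrative only; the class name and string statuses are ours):

```python
from collections import deque

class QueueingLimiter:
    """Sketch of the flow above: process while under the limit,
    queue while there is room, otherwise reject with 429."""
    def __init__(self, limit: int, queue_limit: int):
        self.limit = limit              # max requests processed at once
        self.queue_limit = queue_limit  # max requests allowed to wait
        self.in_flight = 0
        self.queue = deque()            # FIFO = OldestFirst processing order

    def try_acquire(self, request_id: str) -> str:
        if self.in_flight < self.limit:
            self.in_flight += 1
            return "processed"
        if len(self.queue) < self.queue_limit:
            self.queue.append(request_id)
            return "queued"
        return "rejected (429)"

    def release(self) -> None:
        """A request completed: the oldest waiter takes the freed slot."""
        if self.queue:
            self.queue.popleft()        # promoted waiter keeps the slot busy
        else:
            self.in_flight -= 1
```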
Seven Strategies Overview
Quick Comparison Matrix
| Strategy | Complexity | Burst Handling | Memory | Best For |
| --- | --- | --- | --- | --- |
| Fixed Window | ⭐ Low | ❌ Poor | ⭐ Low | Internal APIs |
| Sliding Window | ⭐⭐ Medium | ✅ Good | ⭐⭐ Medium | Production APIs |
| Token Bucket | ⭐⭐⭐ High | ✅ Excellent | ⭐⭐ Medium | Bursty traffic |
| Concurrency | ⭐ Low | N/A | ⭐ Low | Long operations |
| Per-User | ⭐⭐ Medium | Varies | ⭐⭐⭐ High | Multi-tenant |
| Chained | ⭐⭐⭐ High | ✅ Strict | ⭐⭐⭐ High | Critical resources |
| Tiered | ⭐⭐ Medium | Varies | ⭐⭐ Medium | SaaS/Monetization |
1. Fixed Window
Concept: Fixed time buckets with counters
[00:00-00:59] → 10 requests allowed
[01:00-01:59] → Counter resets, 10 more allowed
Pros: Simple, low memory
Cons: Boundary burst problem
Read More: Fixed Window Strategy
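A minimal sketch of the counter-reset behavior (Python for illustration; the class name and the injected `now` clock are ours, not from a specific library):

```python
class FixedWindowLimiter:
    """Fixed window sketch: one counter per time bucket, reset at each
    boundary. Simple and cheap, but up to 2x the limit can slip through
    around a boundary (the burst problem noted above)."""
    def __init__(self, limit: int, window_s: float):
        self.limit, self.window_s = limit, window_s
        self.window_start = 0.0
        self.count = 0

    def allow(self, now: float) -> bool:
        if now - self.window_start >= self.window_s:
            self.window_start = now    # new bucket: counter resets
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Passing `now` explicitly (instead of reading the clock inside) keeps the sketch deterministic and easy to test.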
2. Sliding Window
Concept: Weighted sliding time window
Uses previous window data to smooth traffic
No boundary burst problem
Pros: Smooth, production-ready
Cons: More complex, higher memory
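The weighting idea can be sketched in a few lines: the previous window's count is discounted by how much of it still overlaps the sliding window (Python for illustration; names are ours, and a real implementation would prune old entries):

```python
class SlidingWindowLimiter:
    """Weighted sliding window (sketch): estimated count =
    current window count + previous window count * remaining overlap."""
    def __init__(self, limit: int, window_s: float):
        self.limit, self.window_s = limit, window_s
        self.windows = {}        # window index -> request count (unpruned)

    def allow(self, now: float) -> bool:
        idx = int(now // self.window_s)
        elapsed = (now % self.window_s) / self.window_s   # fraction into window
        prev = self.windows.get(idx - 1, 0)
        cur = self.windows.get(idx, 0)
        estimated = cur + prev * (1 - elapsed)            # smoothed count
        if estimated < self.limit:
            self.windows[idx] = cur + 1
            return True
        return False
```

With a limit of 10/minute, filling the first window means that 30 seconds into the next window only about 5 more requests pass, instead of a fresh 10 at the boundary.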
3. Token Bucket
Concept: Bucket of tokens, refills at constant rate
Start: 20 tokens
Request: -1 token
Refill: +5 tokens every 10 seconds
Pros: Excellent burst handling
Cons: Complex configuration
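Using the numbers above (20-token bucket, +5 tokens every 10 seconds, i.e. 0.5 tokens/s), the refill logic can be sketched as (Python for illustration; continuous refill is our simplification of the stepped "+5 every 10s" schedule):

```python
class TokenBucket:
    """Token bucket sketch with the example's numbers: a 20-token bucket
    refilled at 0.5 tokens/s (5 tokens per 10 seconds), capped at capacity."""
    def __init__(self, capacity: float = 20, refill_per_s: float = 0.5):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity    # bucket starts full: bursts are allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, never exceeding the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1      # each request costs one token
            return True
        return False
```

A full bucket absorbs a 20-request burst instantly; once drained, throughput settles to the steady refill rate.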
4. Concurrency
Concept: Limits simultaneous requests
Max 5 concurrent requests
Request 6-15: Queued
Request 16+: Rejected
Pros: Protects resources
Cons: Doesn't limit rate
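The bookkeeping behind "5 running, 10 queued, rest rejected" is just two counters (Python sketch; a real server would use a semaphore and an actual queue rather than counts):

```python
class ConcurrencyLimiter:
    """Sketch of the example above: 5 concurrent slots, 10 queue slots.
    Counts stand in for real request objects."""
    def __init__(self, max_concurrent: int = 5, max_queued: int = 10):
        self.max_concurrent = max_concurrent
        self.max_queued = max_queued
        self.running = 0
        self.queued = 0

    def on_request(self) -> str:
        if self.running < self.max_concurrent:
            self.running += 1
            return "running"
        if self.queued < self.max_queued:
            self.queued += 1
            return "queued"
        return "rejected (429)"

    def on_complete(self) -> None:
        if self.queued:            # promote the oldest waiter into the slot
            self.queued -= 1
        else:
            self.running -= 1
```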
5. Per-User Partitioned
Concept: Independent limits per user/IP
User A: 30 req/min
User B: 30 req/min
Independent buckets
Pros: Fair allocation
Cons: Higher memory
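Partitioning is just "one limiter state per key". A sketch combining it with a fixed-window counter (Python for illustration; the key could be a user ID or client IP, and the per-key dict is exactly where the "higher memory" cost comes from):

```python
from collections import defaultdict

class PerUserLimiter:
    """Independent fixed-window counter per user key (sketch).
    Memory grows with the number of active keys."""
    def __init__(self, limit: int = 30, window_s: float = 60):
        self.limit, self.window_s = limit, window_s
        # key -> [window_start, count]; one bucket per user/IP
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, user: str, now: float) -> bool:
        state = self.counters[user]
        if now - state[0] >= self.window_s:
            state[0], state[1] = now, 0    # this user's window resets
        if state[1] < self.limit:
            state[1] += 1
            return True
        return False
```

One user exhausting their 30 req/min has no effect on anyone else's bucket.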
6. Chained
Concept: Multiple limiters in sequence
Request → Concurrency Check → Rate Check → Token Check
All must pass
Pros: Comprehensive protection
Cons: Complex, restrictive
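The "all must pass" rule can be sketched as a pipeline of independent checks (Python for illustration; the helper names are ours, and the caveat in the comment is a real design concern for chained limiters):

```python
def make_counter_limiter(limit: int):
    """One link in the chain: a plain counting limiter (sketch)."""
    state = {"count": 0}
    def allow() -> bool:
        if state["count"] < limit:
            state["count"] += 1
            return True
        return False
    return allow

def chained_allow(limiters) -> bool:
    # Every link must admit the request. Caveat: a production chain should
    # refund earlier limiters when a later one rejects, otherwise requests
    # that fail late still consume early quota.
    return all(allow() for allow in limiters)

# The strictest link caps the chain: with limits of 10 and 5,
# only 5 requests pass end to end.
chain = [make_counter_limiter(10), make_counter_limiter(5)]
```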
7. Tiered
Concept: Different limits by subscription
Free: 10 req/min
Basic: 50 req/min
Premium: 200 req/min
Enterprise: 1000 req/min
Pros: Monetization-ready
Cons: Requires user management
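At its core, tiering is a lookup from the caller's subscription to a limit, which then feeds whichever limiter algorithm you use. A sketch with the tiers above (Python for illustration; in practice the tier comes from the authenticated user's subscription record, a hypothetical lookup here):

```python
# Per-tier limits (req/min) from the tiers listed above.
TIER_LIMITS = {
    "free": 10,
    "basic": 50,
    "premium": 200,
    "enterprise": 1000,
}

def limit_for(user_tier: str) -> int:
    """Resolve a caller's rate limit, defaulting unknown tiers to free."""
    return TIER_LIMITS.get(user_tier, TIER_LIMITS["free"])
```

Defaulting unknown tiers to the free limit fails safe: a misconfigured account gets throttled, not unlimited access.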
HTTP 429 Response
Standard Response Format
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1642521600
{
"error": "rate_limit_exceeded",
"message": "API rate limit exceeded. Please retry after 60 seconds.",
"documentation_url": "https://api.example.com/docs/rate-limits",
"limit": 100,
"remaining": 0,
"reset": "2025-01-15T10:35:00Z"
}
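A well-behaved client reads these headers and backs off accordingly. A parsing sketch (Python for illustration; the function name and returned dict shape are ours, the header names are the ones shown above):

```python
def parse_rate_limit_headers(status: int, headers: dict) -> dict:
    """Extract the standard rate-limit fields from a response (sketch)."""
    info = {
        "limited": status == 429,
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        "reset": int(headers.get("X-RateLimit-Reset", 0)),   # Unix timestamp
    }
    if info["limited"]:
        # Honor Retry-After; fall back to a short default backoff.
        info["retry_after_s"] = int(headers.get("Retry-After", 1))
    return info
```

A retry loop would sleep for `retry_after_s` seconds before reissuing the request, rather than hammering the API and staying limited.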
Response Headers
Standard Headers:
| Header | Description | Example |
| --- | --- | --- |
| Retry-After | Seconds to wait | 60 |
| X-RateLimit-Limit | Max requests allowed | 100 |
| X-RateLimit-Remaining | Requests left | 0 |
| X-RateLimit-Reset | Unix timestamp of reset | 1642521600 |