

How we scale our system to support millions of users:

  • Keep web tier stateless
  • Build redundancy at every tier
  • Cache data as much as you can
  • Support multiple data centers
  • Host static assets in CDN
  • Scale your data tier by sharding
  • Split tiers into individual services
  • Monitor your system and use automation tools

One cache cluster to rule them all

Be careful not to have multiple services share the same cache cluster. When services share one memory pool, a write-heavy service can evict critical data belonging to another, and the resulting cache misses are hard to detect and trace back to their cause.
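The eviction problem can be illustrated with a small simulation. This is a minimal sketch, assuming an LRU eviction policy and a toy in-memory cache; the `LRUCache` class, the key names, and the capacity of 100 are all hypothetical stand-ins for a real cache cluster.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory LRU cache standing in for one cache cluster (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)  # mark as most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]


def simulate(shared):
    """Return True if the billing key survives a burst of search writes."""
    billing_cache = LRUCache(capacity=100)
    # With shared=True both services write into the same cluster.
    search_cache = billing_cache if shared else LRUCache(capacity=100)

    billing_cache.set("billing:plan:42", "premium")
    for i in range(1000):  # the search service floods its cache
        search_cache.set(f"search:result:{i}", "cached page")

    return billing_cache.get("billing:plan:42") is not None
```

Calling `simulate(shared=True)` shows the billing entry being evicted by search traffic, while `simulate(shared=False)` keeps it, because each service then has its own memory budget.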

Queues are non-negotiable

Queues are essential. They buffer work during traffic spikes so that downstream services are not overwhelmed, and queue depth is a useful signal for autoscaling consumers.
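One way queue depth can drive autoscaling is sketched below. This is an illustration under stated assumptions, not a description of any particular autoscaler: the function `desired_workers`, its parameters, and the bounds of 1 and 50 workers are hypothetical, and it assumes each worker processes messages at a roughly constant rate.

```python
import math
import queue


def desired_workers(backlog, per_worker_rate, target_drain_seconds,
                    min_workers=1, max_workers=50):
    """Estimate how many consumers are needed to drain the backlog in time.

    backlog: number of messages currently waiting in the queue
    per_worker_rate: messages one worker handles per second (assumed constant)
    target_drain_seconds: how quickly the backlog should be cleared
    """
    needed = math.ceil(backlog / (per_worker_rate * target_drain_seconds))
    # Clamp to the allowed range so the pool never scales to zero or runs away.
    return max(min_workers, min(max_workers, needed))


# A traffic spike lands in the queue instead of hitting the service directly.
spike = queue.Queue()
for i in range(6000):
    spike.put(i)

workers = desired_workers(spike.qsize(), per_worker_rate=10, target_drain_seconds=60)
```

The queue absorbs the burst immediately, and the consumer pool can then grow in proportion to the backlog rather than to the instantaneous request rate.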

Measuring end-to-end latency

Don’t forget to monitor async message latency. Dequeue latency (the time a message waits in the queue before a consumer starts processing it) can account for a significant share of your total end-to-end latency.
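A minimal sketch of measuring dequeue latency: stamp each message when it is enqueued and compute the wait when a consumer picks it up. The `Message` class, the `consume` function, and the `latencies` list are hypothetical names for illustration. The sketch uses `time.monotonic()`, which only makes sense inside one process; across machines you would likely carry a wall-clock timestamp in the message and accept some clock skew.

```python
import queue
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    # Stamped at creation time, which here is treated as the enqueue time.
    enqueued_at: float = field(default_factory=time.monotonic)


def consume(q, handler, latencies):
    """Drain the queue, recording how long each message waited before processing."""
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return
        # Dequeue latency: time spent waiting in the queue, excluding handler time.
        latencies.append(time.monotonic() - msg.enqueued_at)
        handler(msg.payload)
```

In a real system the recorded values would typically be exported as a histogram metric, so that percentiles of queue wait time can be tracked alongside handler latency.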

Design for failure

Failures will happen: network issues, rate limiting, downstream crashes. Plan for them. Use retries, circuit breakers, and dead-letter queues to handle failures gracefully.
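Two of these techniques, retries and a dead-letter queue, can be sketched together as follows (a circuit breaker is omitted to keep the example short). This is a simplified illustration: `TransientError`, `process_with_retries`, and the plain list used as the dead-letter queue are hypothetical, and the backoff uses exponential delays with random jitter.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def process_with_retries(message, handler, dead_letters,
                         max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Try the handler up to max_attempts times, then dead-letter the message.

    Returns True if the handler succeeded, False if the message was dead-lettered.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except TransientError as exc:
            if attempt == max_attempts:
                # Out of retries: park the message for later inspection or replay.
                dead_letters.append((message, repr(exc)))
                return False
            # Exponential backoff with jitter so retries do not arrive in lockstep.
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)
    return False
```

Only errors classified as transient are retried here; any other exception propagates to the caller, which is usually the safer default for bugs that a retry cannot fix.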

Design for idempotency

Duplicate messages are inevitable. Your consumers must be idempotent, so that processing the same event more than once has no additional effect (e.g., no double charges or duplicate records).
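A common way to get idempotency is to deduplicate on a unique event ID. The sketch below assumes every event carries such an ID; the `PaymentConsumer` class and the `event_id`, `account`, and `amount_cents` fields are hypothetical. It keeps processed IDs in memory for brevity, whereas a real consumer would probably persist them and commit them atomically with the side effect.

```python
class PaymentConsumer:
    """Applies each charge event at most once, keyed by its event ID."""

    def __init__(self):
        self.processed_ids = set()  # in-memory only; assumed durable in practice
        self.balances = {}

    def handle(self, event):
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self.processed_ids:
            return False  # duplicate delivery: safely ignored

        account = event["account"]
        self.balances[account] = self.balances.get(account, 0) + event["amount_cents"]
        self.processed_ids.add(event_id)
        return True


consumer = PaymentConsumer()
charge = {"event_id": "evt-1", "account": "acct-7", "amount_cents": 500}
first = consumer.handle(charge)
second = consumer.handle(charge)  # the broker redelivers the same event
```

After both calls the account has been charged once, because the second delivery is recognized by its ID and skipped.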