How would you design Twitter?
31 Mar 2024

Twitter's core product is deceptively simple: users post 280-character messages and other users see them. But behind that simplicity sits one of the most challenging system design problems in production: the timeline.
When a user with 50 million followers tweets, that message needs to appear in 50 million timelines. Fast. That's the design problem worth talking about.
Non-Technical Requirements
- Scalability — hundreds of millions of users, billions of tweets in storage, millions of concurrent readers. The system must scale horizontally across every layer.
- Reliability — tweets are the product. Losing a tweet is losing user trust. The system must be highly durable and available.
- Availability — 24/7 global service. Twitter is used during breaking news events when traffic spikes are massive and unpredictable.
- Data Privacy and Security — protect user accounts from takeover. Enforce privacy settings (protected accounts, blocked users). Comply with global regulations.
- Content Moderation — detect spam, abuse, and misinformation. This is both a technical challenge (ML-based detection at scale) and a policy challenge (what counts as misinformation?).
Technical Requirements
- Real-Time Data Processing — process and distribute tweets, likes, retweets, and replies in real-time. Trending topics must update within minutes.
- High Throughput — handle massive spikes during live events. The Super Bowl, elections, and breaking news can 10x normal traffic.
- Reliable Message Delivery — tweets must appear in followers' timelines. Notifications must reach users. Direct messages must be delivered reliably.
- Search and Discovery — full-text tweet search, trending topics, hashtag exploration, account suggestions. Search must be near-real-time.
- API Support — third-party apps, bots, analytics tools, and media organizations all depend on Twitter's API.
Low-Level Design
Tweet Storage. Tweets are stored in distributed databases. Twitter historically used MySQL with heavy caching. For new systems, Cassandra or similar wide-column stores handle the write-heavy tweet ingestion workload well. Partition by tweet ID for even distribution.
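A minimal sketch of partitioning by tweet ID, assuming a fixed partition count and integer tweet IDs. `NUM_PARTITIONS` and `partition_for` are illustrative names, not the API of Cassandra or any other store; real wide-column stores apply their own partitioner to the partition key.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 16  # assumption: illustrative partition count


def partition_for(tweet_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a tweet ID to a partition with a stable hash.

    Hashing (rather than tweet_id % n directly) keeps sequential or
    time-ordered IDs from clustering on a few partitions.
    """
    digest = hashlib.sha256(str(tweet_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Sequential IDs still spread across all partitions.
counts = Counter(partition_for(tweet_id) for tweet_id in range(1_000_000, 1_010_000))
```

The trade-off of this key choice is that listing one user's tweets touches many partitions, so a design like this usually pairs it with a separate per-user index.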
Messaging Infrastructure. Kafka for event streaming. Every tweet, like, retweet, and reply is an event. These events flow to timeline generation, notification delivery, search indexing, and analytics pipelines.
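The event flow can be illustrated with an in-memory stand-in for a single topic. This is not Kafka client code: `Event`, `EventLog`, and the consumer names are hypothetical, and the sketch only shows the core idea that each downstream pipeline reads the same append-only log at its own offset.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str       # "tweet", "like", "retweet", or "reply"
    user_id: int
    tweet_id: int


@dataclass
class EventLog:
    """In-memory stand-in for one topic: an append-only list of events."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next index to read

    def publish(self, event: Event) -> None:
        self.events.append(event)

    def poll(self, consumer: str) -> list:
        """Return the events this consumer has not seen yet and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:]
        self.offsets[consumer] = len(self.events)
        return batch


log = EventLog()
log.publish(Event("tweet", user_id=1, tweet_id=100))
log.publish(Event("like", user_id=2, tweet_id=100))

# Each pipeline consumes independently; one consumer does not drain the log for another.
timeline_batch = log.poll("timeline")
search_batch = log.poll("search-indexer")

log.publish(Event("reply", user_id=3, tweet_id=100))
timeline_next = log.poll("timeline")  # only the new reply
```

Because consumers track their own offsets, a slow analytics pipeline does not hold back timeline generation or notifications.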
Content Moderation Pipeline. ML models analyze tweet text and media at ingestion time. NLP for text classification. Image/video recognition for media. Confidence scores determine auto-action vs. human review.
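The last step, routing on a confidence score, might look like the sketch below. The two thresholds are assumptions chosen for illustration, not values Twitter is known to use, and `route` is a hypothetical helper that takes a model's violation score in the range 0 to 1.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    HUMAN_REVIEW = "human_review"
    AUTO_REMOVE = "auto_remove"


REVIEW_THRESHOLD = 0.60  # assumption: at or above this, a human looks at it
REMOVE_THRESHOLD = 0.95  # assumption: at or above this, act automatically


def route(score: float) -> Action:
    """Turn a model confidence score into a moderation action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= REMOVE_THRESHOLD:
        return Action.AUTO_REMOVE
    if score >= REVIEW_THRESHOLD:
        return Action.HUMAN_REVIEW
    return Action.ALLOW
```

Raising `REMOVE_THRESHOLD` sends more borderline content to reviewers; lowering it automates more decisions at the cost of more false positives.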
Search Indexing. Elasticsearch indexes tweets in near-real-time. Supports full-text search, hashtag queries, and user search. Must handle billions of documents with sub-second query times.
API Gateway. Handles authentication (OAuth), rate limiting (per-app and per-user), request routing, and API versioning. Critical for protecting backend services from abuse.
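Per-app and per-user rate limiting is commonly built on token buckets. The sketch below is one way to do it, under assumptions: `TokenBucket` and `RateLimiter` are hypothetical classes, state lives in process memory (a real gateway would keep it in a shared store), and a request passes only if both the app's bucket and the user's bucket have a token.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def has_token(self):
        # Refill based on elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        return self.tokens >= 1.0

    def take(self):
        self.tokens -= 1.0


class RateLimiter:
    """Allows a request only if both the per-app and per-user buckets have a token."""

    def __init__(self, app_capacity, user_capacity, refill_per_sec, clock=time.monotonic):
        self.app_capacity = app_capacity
        self.user_capacity = user_capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.app_buckets = {}
        self.user_buckets = {}

    def _bucket(self, buckets, key, capacity):
        if key not in buckets:
            buckets[key] = TokenBucket(capacity, self.refill_per_sec, self.clock)
        return buckets[key]

    def allow(self, app_id, user_id):
        app = self._bucket(self.app_buckets, app_id, self.app_capacity)
        user = self._bucket(self.user_buckets, user_id, self.user_capacity)
        # Check both before consuming, so a rejected request costs no tokens.
        if app.has_token() and user.has_token():
            app.take()
            user.take()
            return True
        return False
```

The `clock` parameter exists so the limiter can be driven by a fake clock in tests; in production the default monotonic clock applies.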
High-Level Design
- Client Applications — mobile apps (iOS, Android), web interface, and TweetDeck. Each implements feed rendering, compose, search, and notifications.
- Backend Services — microservices for authentication, tweet creation, timeline generation, direct messaging, search, notifications, and content moderation.
- Database Layer — different stores for different access patterns. Relational for user data. Wide-column for tweets. Graph or denormalized adjacency lists for the social graph. Time-series for analytics.
- CDN — cache and serve profile images, media attachments, and static assets from edge servers globally.
- Infrastructure — cloud-based with auto-scaling, load balancing, and multi-region deployment.
The Hardest Problem: Timeline Generation
This is where Twitter's system design gets interesting. Two approaches:
Fanout on write (push model). When a user tweets, the system writes that tweet into every follower's timeline cache. Reads are fast — just fetch the pre-built timeline. But writes are expensive. A user with 50 million followers triggers 50 million writes per tweet.
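A minimal in-memory sketch of the push model, to make the write cost concrete. The dictionaries stand in for a follower store and a timeline cache, and `TIMELINE_LIMIT` is an assumed cap on cached entries; all names here are illustrative.

```python
from collections import defaultdict, deque
from itertools import islice

TIMELINE_LIMIT = 800  # assumption: entries kept per cached timeline

followers = defaultdict(set)  # author_id -> set of follower ids
# user_id -> tweet ids, newest first; the oldest entries fall off the right end
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def post_tweet(author_id, tweet_id):
    """Push the tweet into every follower's timeline; return the number of writes."""
    writes = 0
    for follower_id in followers[author_id]:
        timelines[follower_id].appendleft(tweet_id)
        writes += 1
    return writes


def read_timeline(user_id, count=20):
    """Reads are a single lookup of the pre-built timeline."""
    return list(islice(timelines[user_id], count))
```

The return value of `post_tweet` equals the author's follower count, which is exactly the write amplification described above.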
Fanout on read (pull model). When a user opens their timeline, the system queries all accounts they follow and assembles the timeline on the fly. Writes are cheap. But reads are expensive and slow, especially for users who follow thousands of accounts.
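The pull model can be sketched the same way. Here a tweet is written once, and the read path merges the per-author lists of everyone the reader follows. The names are illustrative, and the sketch assumes tweets arrive with increasing timestamps so each author's list stays sorted newest first.

```python
import heapq
from collections import defaultdict
from itertools import islice

following = defaultdict(set)          # user_id -> set of author ids
tweets_by_author = defaultdict(list)  # author_id -> [(timestamp, tweet_id)], newest first


def post_tweet(author_id, timestamp, tweet_id):
    """One write per tweet, regardless of follower count."""
    tweets_by_author[author_id].insert(0, (timestamp, tweet_id))


def read_timeline(user_id, count=20):
    """Merge every followed author's tweets at read time, newest first."""
    streams = [tweets_by_author[author_id] for author_id in following[user_id]]
    # heapq.merge is lazy; reverse=True expects inputs sorted largest first.
    merged = heapq.merge(*streams, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, count)]
```

The read cost grows with the number of followed accounts, since each one contributes a stream to the merge.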
Twitter's actual approach: hybrid. Accounts below a follower threshold (roughly 10,000 followers) use fanout on write: their tweets are pushed into follower timelines immediately. Accounts above it, up to celebrities with millions of followers, use fanout on read: their tweets are fetched and merged at read time.
The merge happens in a timeline service that combines the pre-built timeline (from push) with recent tweets from celebrity accounts (from pull). The result is cached.
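A sketch of that merge, under stated assumptions: `HybridTimeline` is a hypothetical class, the threshold mirrors the ~10,000 figure above, timestamps increase across posts, and the caching of the merged result is left out to keep the example short.

```python
import heapq
from collections import defaultdict
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # assumption: follower count where push switches to pull


class HybridTimeline:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)           # author_id -> follower ids
        self.following = defaultdict(set)           # user_id -> author ids
        self.pushed = defaultdict(list)             # user_id -> [(ts, tweet_id)], newest first
        self.celebrity_tweets = defaultdict(list)   # author_id -> [(ts, tweet_id)], newest first

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) >= self.threshold

    def post(self, author_id, ts, tweet_id):
        """Store the tweet; return how many writes it cost."""
        entry = (ts, tweet_id)
        if self.is_celebrity(author_id):
            # Pull path: a single write, merged into timelines at read time.
            self.celebrity_tweets[author_id].insert(0, entry)
            return 1
        # Push path: one write per follower.
        for follower_id in self.followers[author_id]:
            self.pushed[follower_id].insert(0, entry)
        return len(self.followers[author_id])

    def read(self, user_id, count=20):
        """Merge the pre-built timeline with followed celebrities' recent tweets."""
        streams = [self.pushed[user_id]]
        for author_id in self.following[user_id]:
            if author_id in self.celebrity_tweets:
                streams.append(self.celebrity_tweets[author_id])
        merged = heapq.merge(*streams, reverse=True)
        return [tweet_id for _, tweet_id in islice(merged, count)]
```

The read path only pays the merge cost for the handful of celebrity accounts a user follows, while everyone else's tweets are already sitting in the pre-built timeline.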
The Trade-Offs
Write amplification vs. read latency. The push model gives fast reads but massive write amplification. The pull model gives cheap writes but slow reads. The hybrid approach is more complex to implement and operate, but it handles both regular users and celebrities efficiently, which neither pure model does.
Search freshness vs. cost. Indexing every tweet in real-time is expensive. Twitter indexes tweets within seconds, but that requires a massive search cluster (Twitter built its own Lucene-based engine, Earlybird, for this; Elasticsearch fills that role in the design above). For a smaller platform, indexing every few minutes might be acceptable and much cheaper.
Consistency vs. performance. When you like a tweet, the like count on your screen updates immediately. But other users might see the old count for a few seconds. This eventual consistency is acceptable for social metrics and avoids expensive distributed transactions.
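A toy model of that behavior, assuming one authoritative counter and one lagging replica. `LikeCounter` and its methods are hypothetical; real systems replicate asynchronously in the background, which `replicate()` stands in for here.

```python
class LikeCounter:
    """Toy model: an authoritative count plus a replica that lags behind it."""

    def __init__(self):
        self.primary = 0  # authoritative count, updated on every like
        self.replica = 0  # stale copy served to other readers

    def like(self):
        """Apply a like; the liker sees the new count immediately."""
        self.primary += 1
        return self.primary

    def read_as_other_user(self):
        """Other users read the replica, which may be a few seconds behind."""
        return self.replica

    def replicate(self):
        """Stand-in for asynchronous replication catching up."""
        self.replica = self.primary


counter = LikeCounter()
seen_by_liker = counter.like()
seen_by_others_before = counter.read_as_other_user()
counter.replicate()
seen_by_others_after = counter.read_as_other_user()
```

No lock or cross-node transaction is involved in the like itself; the cost is the short window in which the two views disagree.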
Timeline ranking vs. chronological. A ranked timeline keeps users engaged but reduces transparency. A chronological timeline is predictable but surfaces less relevant content. Twitter offers both — users can choose. But the default (ranked) is what drives engagement metrics.
Twitter's architecture is a case study in managing write amplification at scale. The key insight: treat high-follower accounts differently from regular accounts. One size doesn't fit all.