System Design Interview Playbook: A 20–30 Minute Structure with TypeScript Examples

I get asked about system design interviews a lot — partly because I’ve both interviewed at and conducted interviews for big teams, and partly because I run workshops through PersiaJS and on my newsletter, Monday by Gazar. Over the last decade I’ve learned that doing well in these interviews isn’t about memorizing buzzwords; it’s about a repeatable, clear thought process and being able to justify trade-offs under time pressure.
Below is the playbook I use and teach. I’ll walk through the structure I follow in interviews, give concrete examples (including TypeScript snippets), and share real mistakes I’ve made so you can avoid them.
High-level principle I live by
- Clarify first. Sketch quickly. Iterate. Be explicit about trade-offs.
I used to jump straight into diagrams and then realize I hadn’t clarified a crucial constraint (e.g., “Is eventual consistency acceptable?”). That wasted time and sometimes lost the interviewer’s trust. Now I always spend the first 2–3 minutes asking clarifying questions.
Interview structure I use (repeatable; the step timings below add up to roughly 20–30 minutes)
1. Clarify requirements (2–3 min)
- Who are the users? QPS? P95 latency targets? Data retention? Consistency requirements?
- Ask about traffic patterns, read/write ratio, size of objects.
2. High-level design (3–5 min)
- Draw the main components: client, API gateway, application services, caches, DB, async pipelines, storage.
3. Define APIs and data model (2–3 min)
- Show the external API shape and the minimal data schema.
4. Capacity, scaling & bottlenecks (3–5 min)
- Estimate traffic, sizing, identify bottlenecks & how to scale them.
5. Deep dive into one component (5–10 min)
- Pick the most interesting/hard part (caching, sharding, consistency, search) and go deep.
6. Operational concerns & trade-offs (2–3 min)
- Monitoring, deployments, SLOs, rate limiting, security, cost.
7. Wrap up, alternatives, and follow-up questions (1–2 min)
This structure maps nicely to how interviewers score: clarity, trade-offs, architecture, scalability, and operational awareness.
Concrete example: URL shortener (short walkthrough)
I use this example in workshops because it touches on hashing, collision handling, data model, caching, and analytics.
Clarify
- QPS: let’s assume 10k read QPS, 100 write QPS.
- Latency target: <100ms for redirects.
- We should support custom aliases.
- Data retention: indefinite (but soft delete allowed).
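A quick back-of-envelope from those numbers, assuming roughly 500 bytes per stored record (an assumption, not a given): 100 writes/s is about 8.6M new URLs per day, which is roughly 4.3 GB/day and ~1.6 TB/year. Storage is clearly not the bottleneck, so the design should be optimized around the 10k read QPS.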
High-level components
- Clients -> API Gateway -> Shortener Service -> Storage (primary DB) + Cache (Redis) -> Analytics pipeline (Kafka -> batch store)
APIs and data model
- API: CreateShort(url, optional alias) -> code
- Redirect endpoint: GET /r/{code} -> 302 to original URL
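Sketched in TypeScript, that contract and a minimal stored record might look like this (field names are illustrative, not a fixed schema):

// Create endpoint: request body and response (illustrative shapes).
interface CreateShortRequest {
  url: string;
  alias?: string; // optional custom alias; must be globally unique
}

interface CreateShortResponse {
  code: string; // generated base62 code, or the custom alias
}

// Minimal record behind GET /r/{code}.
interface ShortUrlRecord {
  code: string; // primary key
  originalUrl: string;
  createdAt: Date;
  deletedAt?: Date; // soft delete, per the retention requirement
}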
Generating codes
- Use a base62 encoding of an auto-increment ID, or a hash of the URL (with collision handling).
- For global scale, prefer a partitioned keyspace: e.g., use a prefix for region/cluster or a shard id.
Simple base62 encoder example in TypeScript:
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const BASE = ALPHABET.length; // 62

// Encodes a non-negative integer as a base62 string.
// Note: safe for IDs up to Number.MAX_SAFE_INTEGER (2^53 - 1);
// switch to bigint if the counter can exceed that.
export function encodeBase62(num: number): string {
  if (num === 0) return ALPHABET[0];
  let s = "";
  while (num > 0) {
    s = ALPHABET[num % BASE] + s; // take the least-significant digit, prepend
    num = Math.floor(num / BASE);
  }
  return s;
}
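For example, encodeBase62(125) yields "21". Capacity is generous: 62^7 is about 3.5 trillion, so seven characters cover any realistic ID range, and an ID of 1,000,000 still fits in four characters.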
Scaling notes
- Reads are dominant. Cache hot redirects in Redis (see the cache-aside sketch after this list).
- Write QPS is low and can be handled by an RDBMS (for consistency) or a write-sharded NoSQL store.
- Partition by code prefix using consistent hashing or range shards.
- For custom aliases, check uniqueness with DB transactions or a dedicated index.
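To make the read path concrete, here is a minimal cache-aside sketch of the redirect endpoint. I'm assuming an Express app and an ioredis client, and findUrlByCode is a hypothetical primary-store lookup:

import express from "express";
import Redis from "ioredis";

const app = express();
const redis = new Redis();

// Hypothetical lookup against the primary store.
declare function findUrlByCode(code: string): Promise<string | null>;

app.get("/r/:code", async (req, res) => {
  const { code } = req.params;

  // 1. Cache first: this should absorb most of the 10k read QPS.
  const cached = await redis.get(`url:${code}`);
  if (cached) return res.redirect(302, cached);

  // 2. Cache miss: fall back to the primary store.
  const url = await findUrlByCode(code);
  if (!url) return res.status(404).send("Not found");

  // 3. Populate the cache with a TTL so cold keys age out.
  await redis.set(`url:${code}`, url, "EX", 60 * 60);
  return res.redirect(302, url);
});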
Real-world trade-offs I’ve seen
- I once designed a system that used only auto-increment IDs and a single DB master. It worked at low scale but became painful when we needed geo-replication. We should have planned sharding earlier and split ID assignment into its own service.
- Using UUIDs avoids central counters but makes short codes longer. You must choose based on product constraints.
Example deep-dive I often pick: rate limiting (since interviewers like algorithms & distributed state)
Common approaches: fixed window, sliding window, token bucket, leaky bucket. In a distributed system you either centralize the counters in a shared store (Redis) or keep them local to each node and reconcile periodically.
In interviews, mention the distributed design:
- Use Redis (INCR + expire; a sketch follows this list) or a per-user token bucket stored in Redis (one hash per user).
- For large scale consider approximate algorithms (e.g., leaky bucket via local counters + periodic sync) to reduce write pressure.
- Call out fairness, bursts allowed, and what happens if Redis fails.
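As a concrete starting point, here is a minimal fixed-window limiter using the INCR-plus-expire pattern from the first bullet, again assuming an ioredis client; the key scheme and limits are illustrative:

import Redis from "ioredis";

const redis = new Redis();

// Fixed-window limiter: at most `limit` requests per `windowSec` per user.
// Cheap and simple, but allows up to 2x bursts at window boundaries
// (a trade-off worth naming in the interview).
export async function allowRequest(
  userId: string,
  limit = 100,
  windowSec = 60
): Promise<boolean> {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const key = `rl:${userId}:${window}`;

  const count = await redis.incr(key);
  if (count === 1) {
    // First request in this window: set a TTL so the key cleans itself up.
    await redis.expire(key, windowSec);
  }
  return count <= limit;
}

And decide up front what allowRequest should do if Redis is unreachable: failing open keeps the product up, failing closed protects the backend.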
What interviewers are actually listening for
- Do you ask clarifying questions? Good.
- Can you draw a coherent high-level design and then zoom in? Good.
- Can you estimate capacity and identify bottlenecks? Good.
- Can you articulate trade-offs and operational concerns (monitoring, backups, SLOs, failure modes)? Great.
- Can you implement a small algorithm and reason about correctness/performance? Excellent.
Common mistakes I've made and seen
- Over-optimizing too early. I used to design distributed queues and leader election for features that never reached scale. I wasted time and complicated the design. Now I ask: “What scale do you actually need?” and often propose a simpler solution with a clear migration path.
- Ignoring operational concerns. You can make a pretty architecture on a whiteboard, but if it’s impossible to operate (opaque failure modes, no metrics), it’s a non-starter.
- Not justifying “why”. Don’t just list components — explain why you chose them. If you pick Dynamo-style storage, say that you’re choosing availability over consistency and explain why that fits the product.
A short cheat-sheet of common building blocks (one-line descriptions)
- Load Balancer: distributes requests; use health checks.
- API Gateway: authentication, rate limiting, routing.
- Cache (Redis, Memcached): reduce read latency; think about expiration and cache stampedes (sketch after this list).
- RDBMS: strong consistency, transactions.
- NoSQL (Cassandra, Dynamo): high write throughput, partition-tolerant.
- Queue (Kafka, RabbitMQ): async workloads and reliability.
- Object Store (S3): large binary storage, cheap.
- CDNs: global caching for static content.
- Monitoring (Prometheus + Grafana): latency, errors, QPS.
- Tracing (Jaeger): distributed request tracing.
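One bullet above deserves a concrete illustration: a cache stampede, where many requests miss the same key at once and all hit the database. A common in-process mitigation is single-flight deduplication; here is a minimal sketch (the loader parameter is whatever expensive lookup you are protecting):

// Deduplicate concurrent loads of the same key: the first caller starts
// the load, later callers await the same in-flight promise.
const inFlight = new Map<string, Promise<string>>();

export function singleFlight(
  key: string,
  loader: (key: string) => Promise<string>
): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) return pending;

  const p = loader(key).finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

Pair this with jittered TTLs so a hot key's expiry doesn't line up across many nodes.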
How I recommend practicing (actionable plan)
- Start small: design a URL shortener, file store, or rate limiter. Timebox yourself to 20–30 minute sketches.
- Do mock interviews with peers or use platforms like Interviewing.io. I learned more from doing five live mocks than from reading 50 articles.
- Learn a couple of common algorithms deeply: consistent hashing (sketched after this list), leader election basics, token bucket.
- Read real postmortems (SRE postmortems from major companies). They teach failure modes.
- Keep a cheatsheet of capacities: average object sizes, network costs, disk IOPS — approximate numbers help make realistic trade-offs.
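Since consistent hashing shows up repeatedly (it appeared in the sharding notes above), here is a minimal hash ring with virtual nodes. The FNV-1a hash and the replica count are illustrative choices; production systems often use stronger hashes and binary search over the ring:

// 32-bit FNV-1a: a small, deterministic string hash, fine for a sketch.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

export class HashRing {
  // Points on the ring, kept sorted by hash.
  private ring: { hash: number; node: string }[] = [];

  constructor(nodes: string[], private replicas = 100) {
    for (const node of nodes) this.addNode(node);
  }

  addNode(node: string): void {
    // Virtual nodes (replicas) smooth out the key distribution.
    for (let i = 0; i < this.replicas; i++) {
      this.ring.push({ hash: fnv1a(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.hash - b.hash);
  }

  // The first point clockwise from the key's hash owns the key.
  // (Linear scan for clarity; use binary search at scale.)
  getNode(key: string): string {
    const h = fnv1a(key);
    for (const point of this.ring) {
      if (point.hash >= h) return point.node;
    }
    return this.ring[0].node; // wrap around the ring
  }
}

Usage: new HashRing(["db-1", "db-2", "db-3"]).getNode(code) maps a short code to a shard, and adding a node remaps only about 1/N of the keys.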
Resources I actually read or send people
- High Scalability (case studies)
- “Designing Data-Intensive Applications” by Martin Kleppmann — read, re-read, and annotate.
- Blog posts from major infra teams — they’re gold for real-world trade-offs.
- My newsletter, Monday by Gazar, often covers architecture patterns and practical lessons (shameless plug — I send stuff I wish someone told me years ago).
Final advice — what I tell candidates before an interview
- Talk out loud. Don’t be silent while thinking.
- Be explicit about assumptions. If you assume 10k QPS, say it.
- Prioritize: solve the core problem first, add improvements later.
- If time is low, pick a single component to go deep on (caching, sharding, consistency), and make sure you show you can think end-to-end.
If you want, we can do a mock system design now: pick a prompt (news feed, chat, ecommerce search, or video streaming) and I’ll walk you through how I’d tackle it in a 30-minute interview. I’ll even provide follow-up feedback and a short TypeScript prototype for key pieces.