System Design Interview Playbook: 10-20min Structure and TypeScript

22 Oct 2025

I've sat on both sides of system design interviews for large engineering teams, as a candidate and as an interviewer. I also run workshops through PersiaJS and my newsletter. Over the past decade I've learned one thing: doing well in these interviews isn't about memorizing buzzwords. It's about a clear, repeatable thought process and the ability to justify trade-offs under time pressure.

Here's the playbook I use and teach.

The One Principle That Matters

Clarify first. Sketch quickly. Iterate. Be explicit about trade-offs.

I used to jump straight into diagrams and then realize I hadn't clarified a crucial constraint (e.g., "Is eventual consistency acceptable?"). That wasted time and sometimes lost the interviewer's trust. Now I always spend the first 2-3 minutes asking clarifying questions.

The Structure (Repeatable, 10-20 Minutes)

1. Clarify Requirements (2-3 min)

Who are the users? QPS? P95 latency targets? Data retention? Consistency requirements? Read/write ratio? Object sizes? Traffic patterns?

This is where you show the interviewer you think before you build.

2. High-Level Design (3-5 min)

Draw the main components: client, API gateway, application services, caches, database, async pipelines, storage. Keep it simple. Don't over-detail yet.

3. Define APIs and Data Model (2-3 min)

Show the external API shape and the minimal data schema. This grounds the discussion in concrete terms.

4. Capacity, Scaling, and Bottlenecks (3-5 min)

Estimate traffic and storage. Identify bottlenecks. Explain how to scale them. This is where napkin math matters.

5. Deep Dive Into One Component (5-10 min)

Pick the hardest or most interesting part (caching, sharding, consistency, search) and go deep. This is where you differentiate yourself.

6. Operational Concerns and Trade-Offs (2-3 min)

Monitoring, deployments, SLOs, rate limiting, security, cost. Show you think about running the system, not just building it.

7. Wrap Up (1-2 min)

Summarize decisions, acknowledge alternatives, invite follow-up questions.

This maps to how interviewers actually score: clarity, trade-offs, architecture, scalability, and operational awareness.

Concrete Example: URL Shortener

I use this in workshops because it touches hashing, collision handling, data models, caching, and analytics.

Clarify

  • QPS: 10k read, 100 write.
  • Latency target: <100ms for redirects.
  • Support custom aliases.
  • Data retention: indefinite (soft delete allowed).
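With those numbers on the board, I like to do the napkin math out loud. Here is a minimal sketch of it. The 500-byte average record size is my assumption for illustration, not part of the requirements above.

```typescript
// Napkin math for the URL shortener, using the QPS figures from the list above.
// ASSUMPTION: ~500 bytes per stored record (URL + code + metadata).
const WRITE_QPS = 100;
const READ_QPS = 10000;
const BYTES_PER_RECORD = 500; // assumed, not from the requirements

const SECONDS_PER_DAY = 86400;
const newUrlsPerDay = WRITE_QPS * SECONDS_PER_DAY;
const newUrlsPerYear = newUrlsPerDay * 365;
const storagePerYearGB = (newUrlsPerYear * BYTES_PER_RECORD) / 1e9;
const readWriteRatio = READ_QPS / WRITE_QPS;

// Smallest Base62 code length whose keyspace covers `count` distinct codes.
function minCodeLength(count: number): number {
  let length = 1;
  let capacity = 62;
  while (capacity < count) {
    capacity *= 62;
    length += 1;
  }
  return length;
}

// Ten years of writes at a constant 100 QPS.
const tenYearCodes = newUrlsPerYear * 10;

console.log({
  newUrlsPerDay,
  newUrlsPerYear,
  storagePerYearGB,
  readWriteRatio,
  codeLength: minCodeLength(tenYearCodes),
});
```

Under these assumptions that is roughly 8.6 million new URLs per day, about 1.6 TB of new data per year, a 100:1 read/write ratio, and a six-character code that covers ten years of writes.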

High-Level Components

Clients → API Gateway → Shortener Service → Storage (primary DB) + Cache (Redis) → Analytics pipeline (Kafka → batch store)

APIs and Data Model

  • CreateShort(url, optional alias) → code
  • GET /r/{code} → 302 to original URL
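To ground the two endpoints above, here is a rough sketch of the types and a toy in-memory service. Every field name, the `InMemoryShortener` class, and the counter-based code generator are hypothetical placeholders of mine. A real service would sit behind HTTP handlers and a database.

```typescript
// Hypothetical request shape and record schema for the two endpoints above.
interface CreateShortRequest {
  url: string;
  alias?: string; // optional custom alias
}

interface ShortLinkRecord {
  code: string; // primary key: generated code or custom alias
  originalUrl: string;
  createdAt: Date;
  deletedAt: Date | null; // soft delete, since retention is indefinite
}

type RedirectResult = { status: 302; location: string } | { status: 404 };

class InMemoryShortener {
  private readonly records = new Map<string, ShortLinkRecord>();
  private nextId = 1;

  // Placeholder generator: skips counter values already claimed by an alias.
  private generateCode(): string {
    let code = (this.nextId++).toString(36);
    while (this.records.has(code)) code = (this.nextId++).toString(36);
    return code;
  }

  createShort(req: CreateShortRequest): string {
    if (req.alias !== undefined && this.records.has(req.alias)) {
      throw new Error(`alias already taken: ${req.alias}`);
    }
    const code = req.alias ?? this.generateCode();
    this.records.set(code, {
      code,
      originalUrl: req.url,
      createdAt: new Date(),
      deletedAt: null,
    });
    return code;
  }

  // Mirrors GET /r/{code}: 302 for live links, 404 for unknown or deleted ones.
  resolve(code: string): RedirectResult {
    const record = this.records.get(code);
    if (record === undefined || record.deletedAt !== null) return { status: 404 };
    return { status: 302, location: record.originalUrl };
  }

  // Soft delete keeps the row, so the code stays reserved.
  softDelete(code: string): void {
    const record = this.records.get(code);
    if (record !== undefined) record.deletedAt = new Date();
  }
}
```

Modeling the redirect result as a union forces the caller to handle the 404 case, which is a useful thing to point out when the interviewer asks about deleted links.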

Generating Codes

Use Base62 encoding of an auto-increment ID or a hash (with collision handling). For global scale, prefer a partitioned keyspace.

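// Base62 alphabet: digits, then lowercase, then uppercase.
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const BASE = ALPHABET.length; // 62

// Encodes a non-negative safe integer (e.g. an auto-increment ID) as a Base62 string.
export function encodeBase62(num: number): string {
  if (!Number.isSafeInteger(num) || num < 0) {
    throw new RangeError("encodeBase62 expects a non-negative safe integer");
  }
  if (num === 0) return ALPHABET[0];
  let s = "";
  while (num > 0) {
    s = ALPHABET[num % BASE] + s; // prepend the least significant digit
    num = Math.floor(num / BASE);
  }
  return s;
}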
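The encoder above covers the auto-increment route. For the hash route, here is one way the collision handling could look. This is a sketch under assumptions: FNV-1a is chosen only because it fits in a few lines without dependencies, a `Map` stands in for the database, and `assignCode` and its salt format are names I made up for illustration.

```typescript
const ALPHABET62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function toBase62(num: number): string {
  if (num === 0) return "0";
  let out = "";
  let n = num;
  while (n > 0) {
    out = ALPHABET62.charAt(n % 62) + out;
    n = Math.floor(n / 62);
  }
  return out;
}

// 32-bit FNV-1a over UTF-16 code units. Non-cryptographic; fine for a sketch.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Hypothetical helper: hash the URL, and on collision re-hash with a salt.
// `store` maps code -> original URL and stands in for the primary DB.
function assignCode(url: string, store: Map<string, string>, maxAttempts = 5): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const salted = attempt === 0 ? url : `${url}#${attempt}`;
    const code = toBase62(fnv1a32(salted));
    const existing = store.get(code);
    if (existing === undefined) {
      store.set(code, url);
      return code;
    }
    if (existing === url) return code; // same URL shortened again: reuse the code
  }
  throw new Error(`no free code after ${maxAttempts} attempts`);
}
```

A 32-bit hash keeps codes at six characters or fewer, but collisions become likely long before the keyspace fills, so in an interview I would mention widening the hash or falling back to counter-based IDs.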

Scaling Notes

  • Reads dominate. Cache redirects in Redis (hot keys).
  • Writes are low QPS — an RDBMS handles them fine.
  • Partition by a hash of the code using consistent hashing; a raw prefix of sequential Base62 codes would send most new writes to one shard.
  • For custom aliases, enforce uniqueness with DB transactions or a dedicated index.
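The first bullet, caching redirects, usually means a cache-aside read path. Here is a minimal sketch of it, assuming one `Map` stands in for Redis and another for the primary DB. The `RedirectResolver` name, the TTL parameter, and the injectable clock are my own illustrative choices.

```typescript
interface CacheEntry {
  url: string;
  expiresAt: number; // epoch milliseconds
}

class RedirectResolver {
  private readonly cache = new Map<string, CacheEntry>(); // stand-in for Redis
  private readonly db: Map<string, string>; // stand-in for the primary DB
  private readonly ttlMs: number;
  private readonly now: () => number;
  dbReads = 0; // exposed only so the effect of the cache is observable

  constructor(db: Map<string, string>, ttlMs: number, now: () => number = () => Date.now()) {
    this.db = db;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  resolve(code: string): string | undefined {
    const hit = this.cache.get(code);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.url;

    // Cache miss or expired entry: fall through to the DB, then repopulate.
    this.dbReads += 1;
    const url = this.db.get(code);
    if (url !== undefined) {
      this.cache.set(code, { url, expiresAt: this.now() + this.ttlMs });
    }
    return url;
  }
}
```

This version does not cache misses, so a flood of lookups for unknown codes would still hit the DB. Whether to add negative caching is a good trade-off to raise.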

Real-World Trade-Offs I've Seen

I once designed a system using only auto-increment IDs and a single DB master. It worked at low scale but became painful when we needed geo-replication. We should have planned sharding earlier and separated the "assign ID" service.
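In hindsight, one shape the separate "assign ID" service could have taken is a block allocator: a central component leases disjoint ID ranges, and each application node consumes its range locally. The sketch below is not what we built. The class names and the block size are hypothetical, and a real allocator would persist its counter.

```typescript
interface IdBlock {
  start: number; // inclusive
  end: number; // exclusive
}

// Central allocator: the only component that touches the global counter.
class IdBlockAllocator {
  private nextStart = 1;
  private readonly blockSize: number;

  constructor(blockSize: number) {
    this.blockSize = blockSize;
  }

  leaseBlock(): IdBlock {
    const start = this.nextStart;
    this.nextStart += this.blockSize;
    return { start, end: start + this.blockSize };
  }
}

// Per-node generator: contacts the allocator once per block, not once per ID.
class NodeIdGenerator {
  private current = 0;
  private end = 0;
  private readonly allocator: IdBlockAllocator;

  constructor(allocator: IdBlockAllocator) {
    this.allocator = allocator;
  }

  nextId(): number {
    if (this.current >= this.end) {
      const block = this.allocator.leaseBlock();
      this.current = block.start;
      this.end = block.end;
    }
    return this.current++;
  }
}
```

The trade-offs are worth saying out loud: IDs are unique but no longer ordered across nodes, and a node that crashes takes the unused remainder of its block with it.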

UUIDs avoid central counters but make short codes longer. Choose based on product constraints.

Example Deep-Dive: Rate Limiting

Interviewers love this because it combines algorithms and distributed state.

Common approaches: fixed window, sliding window, token bucket, leaky bucket.

For distributed systems, keep the limiter state in a centralized store (Redis), or hand each client a share of the tokens and coordinate so the shares stay consistent.

  • Use Redis (INCR + EXPIRE) for a fixed-window counter, or store a token bucket per user in Redis.
  • For large scale, consider approximate algorithms (leaky bucket via local counters + periodic sync) to reduce write pressure.
  • Call out fairness, burst handling, and what happens if Redis fails.
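If the interviewer asks for code, the token bucket is the one I would write. Below is a minimal single-process sketch: the `Map` stands in for Redis, and the class name and injectable clock are assumptions of mine. In a real distributed setup the refill-and-take step has to be atomic, for example inside a Redis Lua script.

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>(); // stand-in for Redis
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
  }

  // Returns true and consumes one token if the user is within their limit.
  allow(userId: string): boolean {
    const nowMs = this.now();
    // New users start with a full bucket, which is what permits bursts.
    const bucket = this.buckets.get(userId) ?? { tokens: this.capacity, lastRefillMs: nowMs };

    // Lazy refill: add tokens for the elapsed time, capped at capacity.
    const elapsedSeconds = (nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.lastRefillMs = nowMs;
    this.buckets.set(userId, bucket);

    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

Capacity controls how large a burst is tolerated and the refill rate controls sustained throughput, which gives you a concrete way to talk about the burst-handling bullet above.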

What Interviewers Actually Listen For

  • Do you ask clarifying questions? Good.
  • Can you draw a coherent high-level design and zoom in? Good.
  • Can you estimate capacity and identify bottlenecks? Good.
  • Can you articulate trade-offs and operational concerns? Great.
  • Can you implement a small algorithm and reason about correctness? Excellent.

Common Mistakes I've Made

Over-optimizing too early. I used to design distributed queues and leader election for features that never reached scale. Now I ask: "What scale do you actually need?" and propose a simpler solution with a clear migration path.

Ignoring operational concerns. A pretty architecture on a whiteboard that's impossible to operate (opaque failure modes, no metrics) is a non-starter.

Not justifying "why." Don't just list components — explain why you chose them. If you pick Dynamo-style storage, say whether you want availability over consistency and why.

Building Blocks Cheat Sheet

  • Load Balancer — distributes requests; use health checks.
  • API Gateway — authentication, rate limiting, routing.
  • Cache (Redis, Memcached) — reduce read latency; think expiration and cache stampede.
  • RDBMS — strong consistency, transactions.
  • NoSQL (Cassandra, DynamoDB) — high write throughput, partition-tolerant.
  • Queue (Kafka, RabbitMQ) — async workloads, reliability, decoupling.
  • Object Store (S3) — large binary storage, cheap and durable.
  • CDN — global caching for static content.
  • Monitoring (Prometheus + Grafana) — latency, errors, QPS dashboards.
  • Tracing (Jaeger) — distributed request tracing for debugging.

How to Practice

  • Start small. Design a URL shortener, file store, or rate limiter. Timebox yourself: 20-30 minutes.
  • Do mock interviews. I learned more from five live mocks than from reading fifty articles.
  • Learn key algorithms deeply. Consistent hashing, leader election basics, token bucket.
  • Read real postmortems. SRE postmortems from major companies teach failure modes better than any textbook.
  • Keep a capacity cheat sheet. Average object sizes, network costs, disk IOPS. Approximate numbers help make realistic trade-offs.
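Consistent hashing, the first algorithm named in the list above, is small enough to implement from scratch as practice. Here is a rough sketch with virtual nodes. The FNV-1a hash, the default of 100 virtual nodes per server, and the linear scan are simplifications I chose for brevity; a production ring would typically use binary search.

```typescript
// 32-bit FNV-1a over UTF-16 code units; any well-distributed hash would do.
function hash32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

interface RingEntry {
  point: number; // position on the ring
  node: string;
}

class HashRing {
  private ring: RingEntry[] = [];
  private readonly replicas: number;

  constructor(nodes: string[], replicas = 100) {
    this.replicas = replicas;
    for (const node of nodes) this.addNode(node);
  }

  // Each node is placed at `replicas` points to smooth out the distribution.
  addNode(node: string): void {
    for (let i = 0; i < this.replicas; i++) {
      this.ring.push({ point: hash32(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  removeNode(node: string): void {
    this.ring = this.ring.filter((entry) => entry.node !== node);
  }

  // A key belongs to the first point at or after its hash, wrapping around.
  getNode(key: string): string | undefined {
    const h = hash32(key);
    const owner = this.ring.find((entry) => entry.point >= h) ?? this.ring[0];
    return owner?.node;
  }
}
```

The property to verify for yourself: after `removeNode`, only the keys that lived on the removed node move. Every other key keeps its owner.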

Resources I Actually Recommend

  • Designing Data-Intensive Applications by Martin Kleppmann — read it, re-read it, annotate it.
  • High Scalability blog — real case studies from production systems.
  • Engineering blog posts from major infra teams — gold for real-world trade-offs.

Final Advice

  • Talk out loud. Don't go silent while thinking.
  • Be explicit about assumptions. If you assume 10k QPS, say it.
  • Prioritize: solve the core problem first, add improvements later.
  • If time is running low, pick a single component to go deep on and show you can think end-to-end.