System Design Interview Playbook: A 20–30 Minute Structure with TypeScript Examples

I get asked about system design interviews a lot — partly because I’ve both interviewed at and conducted interviews for big teams, and partly because I run workshops through PersiaJS and on my newsletter, Monday by Gazar. Over the last decade I’ve learned that doing well in these interviews isn’t about memorizing buzzwords; it’s about a repeatable, clear thought process and being able to justify trade-offs under time pressure.
Below is the playbook I use and teach. I’ll walk through the structure I follow in interviews, give concrete examples (including TypeScript snippets), and share real mistakes I’ve made so you can avoid them.
High-level principle I live by
- Clarify first. Sketch quickly. Iterate. Be explicit about trade-offs.
I used to jump straight into diagrams and then realize I hadn’t clarified a crucial constraint (e.g., “Is eventual consistency acceptable?”). That wasted time and sometimes lost the interviewer’s trust. Now I always spend the first 2–3 minutes asking clarifying questions.
Interview structure I use (repeatable; the step timings below add up to roughly 20–30 minutes)
1. Clarify requirements (2–3 min)
- Who are the users? QPS? P95 latency targets? Data retention? Consistency requirements?
- Ask about traffic patterns, read/write ratio, size of objects.
2. High-level design (3–5 min)
- Draw the main components: client, API gateway, application services, caches, DB, async pipelines, storage.
3. Define APIs and data model (2–3 min)
- Show the external API shape and the minimal data schema.
4. Capacity, scaling & bottlenecks (3–5 min)
- Estimate traffic, sizing, identify bottlenecks & how to scale them.
5. Deep dive into one component (5–10 min)
- Pick the most interesting/hard part (caching, sharding, consistency, search) and go deep.
6. Operational concerns & trade-offs (2–3 min)
- Monitoring, deployments, SLOs, rate limiting, security, cost.
7. Wrap up, alternatives, and follow-up questions (1–2 min)
This structure maps nicely to how interviewers score: clarity, trade-offs, architecture, scalability, and operational awareness.
Concrete example: URL shortener (short walkthrough)
I use this example in workshops because it touches on hashing, collision handling, data model, caching, and analytics.
Clarify
- QPS: let’s assume 10k read QPS, 100 write QPS.
- Latency target: <100ms for redirects.
- We should support custom aliases.
- Data retention: indefinite (but soft delete allowed).
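A quick back-of-envelope from those numbers, assuming roughly 500 bytes per stored record (an assumption, not a given): 100 writes/s is about 8.6M new URLs per day, which is roughly 4.3 GB/day and ~1.6 TB/year. Storage is clearly not the bottleneck, so the design should be optimized around the 10k read QPS.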
High-level components
- Clients -> API Gateway -> Shortener Service -> Storage (primary DB) + Cache (Redis) -> Analytics pipeline (Kafka -> batch store)
APIs and data model
- API: CreateShort(url, optional alias) -> code
- Redirect endpoint: GET /r/{code} -> 302 to original URL
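Sketched in TypeScript, that contract and a minimal stored record might look like this (field names are illustrative, not a fixed schema):

// Create endpoint: request body and response (illustrative shapes).
interface CreateShortRequest {
  url: string;
  alias?: string; // optional custom alias; must be globally unique
}

interface CreateShortResponse {
  code: string; // generated base62 code, or the custom alias
}

// Minimal record behind GET /r/{code}.
interface ShortUrlRecord {
  code: string; // primary key
  originalUrl: string;
  createdAt: Date;
  deletedAt?: Date; // soft delete, per the retention requirement
}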
Generating codes
- Use a base62 encoding of an auto-increment ID, or a hash of the URL (with collision handling).
- For global scale, prefer a partitioned keyspace: e.g., use a prefix for region/cluster or a shard id.
Simple base62 encoder example in TypeScript:
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const BASE = ALPHABET.length; // 62

// Encodes a non-negative integer as a base62 string.
// Note: safe for IDs up to Number.MAX_SAFE_INTEGER (2^53 - 1);
// switch to bigint if the counter can exceed that.
export function encodeBase62(num: number): string {
  if (num === 0) return ALPHABET[0];
  let s = "";
  while (num > 0) {
    s = ALPHABET[num % BASE] + s; // take the least-significant digit, prepend
    num = Math.floor(num / BASE);
  }
  return s;
}
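For example, encodeBase62(125) yields "21". Capacity is generous: 62^7 is about 3.5 trillion, so seven characters cover any realistic ID range, and an ID of 1,000,000 still fits in four characters.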
Scaling notes
- Reads are dominant. Cache hot redirects in Redis (see the cache-aside sketch after this list).
- Write QPS is low and can be handled by an RDBMS (for consistency) or a write-sharded NoSQL store.
- Partition by code prefix using consistent hashing or range shards.
- For custom aliases, check uniqueness with DB transactions or a dedicated index.
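To make the read path concrete, here is a minimal cache-aside sketch of the redirect endpoint. I'm assuming an Express app and an ioredis client, and findUrlByCode is a hypothetical primary-store lookup:

import express from "express";
import Redis from "ioredis";

const app = express();
const redis = new Redis();

// Hypothetical lookup against the primary store.
declare function findUrlByCode(code: string): Promise<string | null>;

app.get("/r/:code", async (req, res) => {
  const { code } = req.params;

  // 1. Cache first: this should absorb most of the 10k read QPS.
  const cached = await redis.get(`url:${code}`);
  if (cached) return res.redirect(302, cached);

  // 2. Cache miss: fall back to the primary store.
  const url = await findUrlByCode(code);
  if (!url) return res.status(404).send("Not found");

  // 3. Populate the cache with a TTL so cold keys age out.
  await redis.set(`url:${code}`, url, "EX", 60 * 60);
  return res.redirect(302, url);
});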
Real-world trade-offs I’ve seen
- I once designed a system that used only auto-increment IDs and a single DB master. It worked at low scale but became painful when we needed geo-replication. We should have planned sharding earlier and split ID assignment into its own service.
- Using UUIDs avoids central counters but makes short codes longer. You must choose based on product constraints.
Example deep-dive I often pick: rate limiting (since interviewers like algorithms & distributed state)
Common approaches: fixed window, sliding window, token bucket, leaky bucket. In a distributed system you either centralize the counters in a shared store (Redis) or keep them local to each node and reconcile periodically.
In interviews, mention the distributed design:
- Use Redis (INCR + expire; a sketch follows this list) or a per-user token bucket stored in Redis (one hash per user).
- For large scale consider approximate algorithms (e.g., leaky bucket via local counters + periodic sync) to reduce write pressure.
- Call out fairness, bursts allowed, and what happens if Redis fails.
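As a concrete starting point, here is a minimal fixed-window limiter using the INCR-plus-expire pattern from the first bullet, again assuming an ioredis client; the key scheme and limits are illustrative:

import Redis from "ioredis";

const redis = new Redis();

// Fixed-window limiter: at most `limit` requests per `windowSec` per user.
// Cheap and simple, but allows up to 2x bursts at window boundaries
// (a trade-off worth naming in the interview).
export async function allowRequest(
  userId: string,
  limit = 100,
  windowSec = 60
): Promise<boolean> {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const key = `rl:${userId}:${window}`;

  const count = await redis.incr(key);
  if (count === 1) {
    // First request in this window: set a TTL so the key cleans itself up.
    await redis.expire(key, windowSec);
  }
  return count <= limit;
}

And decide up front what allowRequest should do if Redis is unreachable: failing open keeps the product up, failing closed protects the backend.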
What interviewers are actually listening for
- Do you ask clarifying questions? Good.
- Can you draw a coherent high-level design and then zoom in? Good.
- Can you estimate capacity and identify bottlenecks? Good.
- Can you articulate trade-offs and operational concerns (monitoring, backups, SLOs, failure modes)? Great.
- Can you implement a small algorithm and reason about correctness/performance? Excellent.
Common mistakes I've made and seen
- Over-optimizing too early. I used to design distributed queues and leader election for features that never reached scale. I wasted time and complicated the design. Now I ask: “What scale do you actually need?” and often propose a simpler solution with a clear migration path.
- Ignoring operational concerns. You can make a pretty architecture on a whiteboard, but if it’s impossible to operate (opaque failure modes, no metrics), it’s a non-starter.
- Not justifying “why”. Don’t just list components — explain why you chose them. If you pick Dynamo-style storage, say that you’re choosing availability over consistency and explain why that fits the product.
A short cheat-sheet of common building blocks (one-line descriptions)
- Load Balancer: distributes requests; use health checks.
- API Gateway: authentication, rate limiting, routing.
- Cache (Redis, Memcached): reduce read latency; think about expiration and cache stampedes (sketch after this list).
- RDBMS: strong consistency, transactions.
- NoSQL (Cassandra, Dynamo): high write throughput, partition-tolerant.
- Queue (Kafka, RabbitMQ): async workloads and reliability.
- Object Store (S3): large binary storage, cheap.
- CDNs: global caching for static content.
- Monitoring (Prometheus + Grafana): latency, errors, QPS.
- Tracing (Jaeger): distributed request tracing.
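One bullet above deserves a concrete illustration: a cache stampede, where many requests miss the same key at once and all hit the database. A common in-process mitigation is single-flight deduplication; here is a minimal sketch (the loader parameter is whatever expensive lookup you are protecting):

// Deduplicate concurrent loads of the same key: the first caller starts
// the load, later callers await the same in-flight promise.
const inFlight = new Map<string, Promise<string>>();

export function singleFlight(
  key: string,
  loader: (key: string) => Promise<string>
): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) return pending;

  const p = loader(key).finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

Pair this with jittered TTLs so a hot key's expiry doesn't line up across many nodes.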
How I recommend practicing (actionable plan)
- Start small: design a URL shortener, file store, or rate limiter. Timebox yourself to 20–30 minute sketches.
- Do mock interviews with peers or use platforms like Interviewing.io. I learned more from doing five live mocks than from reading 50 articles.
- Learn a couple of common algorithms deeply: consistent hashing (sketched after this list), leader election basics, token bucket.
- Read real postmortems (SRE postmortems from major companies). They teach failure modes.
- Keep a cheatsheet of capacities: average object sizes, network costs, disk IOPS — approximate numbers help make realistic trade-offs.
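Since consistent hashing shows up repeatedly (it appeared in the sharding notes above), here is a minimal hash ring with virtual nodes. The FNV-1a hash and the replica count are illustrative choices; production systems often use stronger hashes and binary search over the ring:

// 32-bit FNV-1a: a small, deterministic string hash, fine for a sketch.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

export class HashRing {
  // Points on the ring, kept sorted by hash.
  private ring: { hash: number; node: string }[] = [];

  constructor(nodes: string[], private replicas = 100) {
    for (const node of nodes) this.addNode(node);
  }

  addNode(node: string): void {
    // Virtual nodes (replicas) smooth out the key distribution.
    for (let i = 0; i < this.replicas; i++) {
      this.ring.push({ hash: fnv1a(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.hash - b.hash);
  }

  // The first point clockwise from the key's hash owns the key.
  // (Linear scan for clarity; use binary search at scale.)
  getNode(key: string): string {
    const h = fnv1a(key);
    for (const point of this.ring) {
      if (point.hash >= h) return point.node;
    }
    return this.ring[0].node; // wrap around the ring
  }
}

Usage: new HashRing(["db-1", "db-2", "db-3"]).getNode(code) maps a short code to a shard, and adding a node remaps only about 1/N of the keys.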
Resources I actually read or send people
- High Scalability (case studies)
- “Designing Data-Intensive Applications” by Martin Kleppmann — read, re-read, and annotate.
- Blog posts from major infra teams — they’re gold for real-world trade-offs.
- My newsletter, Monday by Gazar, often covers architecture patterns and practical lessons (shameless plug — I send stuff I wish someone told me years ago).
Final advice — what I tell candidates before an interview
- Talk out loud. Don’t be silent while thinking.
- Be explicit about assumptions. If you assume 10k QPS, say it.
- Prioritize: solve the core problem first, add improvements later.
- If time is low, pick a single component to go deep on (caching, sharding, consistency), and make sure you show you can think end-to-end.
If you want, we can do a mock system design now: pick a prompt (news feed, chat, ecommerce search, or video streaming) and I’ll walk you through how I’d tackle it in a 30-minute interview. I’ll even provide follow-up feedback and a short TypeScript prototype for key pieces.