database

CAP Theorem: Understanding Distributed Systems

Every distributed system makes a bet. The CAP theorem tells you which bets are available.

29 Apr 2024

CAP Theorem: Understanding Distributed Systems

Every distributed system makes a bet. The CAP theorem tells you which bets are available.

Proposed by Eric Brewer in 2000, the CAP theorem states that a distributed system can only guarantee two out of three properties at any given time:

Consistency -- Every read returns the most recent write. All nodes see the same data at the same time. Ask any node, get the same answer.

Availability -- Every request gets a response. The system stays up and serves requests, even if some nodes are down. You always get an answer, though it might be stale.

Partition Tolerance -- The system keeps working even when network communication between nodes breaks. Messages get lost or delayed, and the system still functions.

Why You Can't Have All Three

Here's the key insight: network partitions are not optional. In any real distributed system, networks fail. Cables get cut. Switches die. Cloud regions go dark.

So partition tolerance isn't a choice. It's a given.

That leaves you with a real decision: when the network splits, do you sacrifice consistency or availability?

CP Systems: Consistency + Partition Tolerance

When a partition happens, the system blocks or returns errors rather than serving stale data. Every successful read is guaranteed to be current.

Examples: MongoDB (in default config), HBase, Redis (in cluster mode)

Use when: Correctness matters more than uptime. Financial transactions, inventory systems, anything where stale reads cause real damage.

Cost: During a partition, some requests fail. Users see errors.

AP Systems: Availability + Partition Tolerance

When a partition happens, the system keeps serving requests but may return stale or conflicting data. It reconciles later.

Examples: Cassandra, DynamoDB, CouchDB

Use when: Uptime matters more than perfect accuracy. Social feeds, analytics, shopping carts -- places where eventual consistency is acceptable.

Cost: You might read outdated data. Conflict resolution gets complicated.

CA Systems: Consistency + Availability

This only works when there are no partitions -- meaning a single-node system or a system that assumes perfect network reliability.

Examples: Traditional single-node PostgreSQL, MySQL (non-replicated)

Cost: The moment you distribute across nodes, you lose this guarantee. CA doesn't exist in practice for distributed systems.

The Real-World Nuance

CAP is a spectrum, not a switch. Most modern databases let you tune the trade-off per query or per operation.

Cassandra lets you choose consistency levels per read/write. MongoDB lets you configure write concerns and read preferences. DynamoDB offers strongly consistent reads as an option.

The question isn't "which two do I pick forever." It's "which trade-off do I make for this specific operation."

Understanding CAP doesn't give you the answer. It gives you the right question to ask every time you design a distributed data layer.

Keep reading