Non-Functional Requirements in Software Architecture: How to Define Them

Most engineers I work with can tell you what a system should do. Few can tell you how well it should do it. That gap between "what" and "how well" is where non-functional requirements live, and ignoring them is how you end up with systems that technically work but fail in production.

I've seen this play out at Tipalti, at Mecca Brands, and in every startup I've consulted for. A feature ships. It passes all the functional tests. Then it hits real traffic and everything falls apart. Why? Because nobody asked: "How fast should this be? How many users? What happens when the database is slow? Can we recover from a failure?"

Non-functional requirements answer those questions. They define the quality attributes of your system: performance, reliability, security, scalability, maintainability. They're the constraints that shape your architecture decisions, the metrics that tell you if you're building the right thing, and the guardrails that prevent you from shipping something that works in theory but breaks in practice.

Let me show you how I define them, why they matter, and how to turn vague wishes into measurable, actionable requirements.

What Are Non-Functional Requirements, Really?

Functional requirements tell you what the system does. "Users can create an account." "The system processes payments." "Reports generate monthly summaries." These are the features, the behaviors, the business logic.

Non-functional requirements tell you how well the system does those things. "Account creation must complete in under 200ms for 95% of requests." "Payment processing must handle 10,000 transactions per second." "Reports must generate within 5 minutes for datasets up to 100GB."

The difference matters because it changes how you build. If your only requirement is "users can create accounts," you might build a simple CRUD API with a single database. If your requirement is "users can create accounts in under 200ms at 10k QPS," you need caching, connection pooling, maybe read replicas, and a completely different architecture.

Non-functional requirements are constraints. They force trade-offs. They make you choose between consistency and availability, between speed and cost, between simplicity and resilience. They're what separate a prototype from a production system.

The Categories That Actually Matter

I organize non-functional requirements into categories that map to real engineering decisions. Not academic theory. Practical buckets that help me make architecture choices.

Performance and Scalability

This is where most teams start, and where most teams get it wrong. They say "it needs to be fast" or "it should scale." That's useless. Fast compared to what? Scale to how many users?

I break performance down into measurable dimensions:

Latency: How long does an operation take? I specify percentiles, not averages. P50, P95, P99. Average latency hides outliers. If your P99 is 5 seconds but your average is 200ms, you have a problem that averages won't reveal.

Throughput: How many operations per second? Reads and writes separately. A system that handles 100k reads per second but only 100 writes per second needs a different architecture than one with balanced traffic.

Capacity: How much data? How many concurrent users? Storage size, memory usage, network bandwidth. These numbers determine whether you need sharding, whether you can fit everything in memory, whether you need a CDN.

Scalability: How does performance change as load increases? Linear? Sub-linear? Does it degrade gracefully or fall off a cliff?

I write these as specific, measurable targets. "API endpoints must respond in under 100ms for P95 latency at 5,000 requests per second." "The system must support 10 million active users with 1TB of data storage." "Throughput must scale linearly with additional servers up to 50 instances."
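
To keep those targets honest, I compute percentiles from real samples instead of trusting averages. Here's a minimal sketch in Python; the 100ms target is just the example number from above, and the samples would come from a load test, not a hardcoded list.

Python
import statistics

def latency_percentiles(samples_ms):
    """Compute P50/P95/P99 from per-request latencies in milliseconds."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 1st..99th percentile cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

TARGET_P95_MS = 100.0  # example target from above
samples = [42, 48, 55, 61, 70, 72, 81, 88, 90, 95]  # stand-in; use real load-test data
p = latency_percentiles(samples)
print(f"P50={p['p50']:.0f}ms  P95={p['p95']:.0f}ms  P99={p['p99']:.0f}ms")
print("P95 target met:", p["p95"] <= TARGET_P95_MS)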

Reliability and Availability

Reliability is about correctness. Availability is about uptime. They're related but different.

Availability: What percentage of time is the system operational? 99.9% means 8.76 hours of downtime per year. 99.99% means 52.56 minutes. These numbers drive decisions about redundancy, failover, disaster recovery.

Fault Tolerance: What happens when components fail? Can the system degrade gracefully? Do we need active-active replication or is active-passive enough?

Recovery Time: How quickly can we recover from a failure? Recovery Time Objective (RTO) and Recovery Point Objective (RPO) matter. If we can lose 1 hour of data and be back online in 15 minutes, that's very different from needing zero data loss and 30-second recovery.

Error Rates: What's acceptable? 0.1% error rate? 0.01%? This affects how much error handling, retry logic, and circuit breakers you need.

I've seen teams aim for 99.99% availability without understanding what that costs. It means multiple data centers, automated failover, constant monitoring, and a budget for redundancy. Sometimes 99.9% is enough, and the engineering effort is better spent elsewhere.
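
The downtime math is easy to fumble in a meeting, so I keep a small helper around to turn an availability target into a concrete downtime budget. A minimal sketch:

Python
def downtime_budget_hours(availability_pct: float, period_hours: float = 365 * 24) -> float:
    """Allowed downtime for a period (default: one year) given an availability target."""
    return period_hours * (1 - availability_pct / 100)

for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_budget_hours(target) * 60
    print(f"{target}% availability -> {minutes:,.1f} minutes of downtime per year")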

Security

Security requirements are non-functional because they constrain how you implement features, not what features you build. They're also the requirements that get ignored until there's a breach.

Authentication and Authorization: Who can access what? How do we verify identity? What's the session timeout? Do we need multi-factor authentication?

Data Protection: Encryption at rest and in transit. What algorithms? What key management? Compliance requirements (GDPR, PCI-DSS, HIPAA) drive specific technical choices.

Vulnerability Management: How quickly do we patch known vulnerabilities? What's our process for security updates?

Audit and Compliance: What do we need to log? How long do we retain logs? What compliance frameworks apply?

I write security requirements as specific technical constraints. "All API communication must use TLS 1.3." "Passwords must be hashed using bcrypt with cost factor 12." "Personal data must be encrypted at rest using AES-256." "Security patches must be applied within 7 days of release."
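
As one concrete example, the password requirement above maps almost directly onto code. A minimal sketch using the Python bcrypt package with cost factor 12; it shows how the constraint appears in implementation, not a complete credential-handling module.

Python
import bcrypt

BCRYPT_COST_FACTOR = 12  # the cost factor from the requirement above

def hash_password(plaintext: str) -> bytes:
    # gensalt(rounds=...) bakes the cost factor into the stored hash
    return bcrypt.hashpw(plaintext.encode("utf-8"), bcrypt.gensalt(rounds=BCRYPT_COST_FACTOR))

def verify_password(plaintext: str, stored_hash: bytes) -> bool:
    return bcrypt.checkpw(plaintext.encode("utf-8"), stored_hash)

stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", stored)
assert not verify_password("wrong password", stored)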

Maintainability and Operability

This is where junior engineers and most product managers stop thinking. They assume the system will just work. It won't. You need to plan for how you'll maintain it.

Observability: What metrics do we need? What logs? What traces? Can we debug production issues? I've spent too many nights debugging production problems with insufficient observability. Now I define this upfront.

Deployment: How do we deploy? Zero-downtime? Blue-green? Canary? How long do deployments take? What's the rollback process?

Monitoring and Alerting: What do we alert on? What's the on-call process? How do we know when something is wrong?

Documentation: What needs to be documented? API docs? Runbooks? Architecture diagrams? This affects tooling choices and team velocity.

Testability: How do we test this? Unit tests? Integration tests? Can we test failure scenarios? This affects how we structure code and what abstractions we use.

I write these as operational constraints. "All services must expose Prometheus metrics for latency, error rate, and throughput." "Deployments must complete in under 5 minutes with zero downtime." "We must be able to roll back to the previous version within 2 minutes."
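
For the Prometheus constraint, the instrumentation itself is small. A minimal sketch with the Python prometheus_client library; the metric names and the port are assumptions for illustration.

Python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Assumed metric names; match them to your own conventions.
REQUEST_LATENCY = Histogram("http_request_latency_seconds", "Request latency in seconds")
REQUEST_ERRORS = Counter("http_request_errors_total", "Total failed requests")

def handle_request():
    start = time.perf_counter()
    try:
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work
    except Exception:
        REQUEST_ERRORS.inc()
        raise
    finally:
        REQUEST_LATENCY.observe(time.perf_counter() - start)

if __name__ == "__main__":
    start_http_server(8000)  # metrics become scrapeable at :8000/metrics
    while True:
        handle_request()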

Usability and Accessibility

For user-facing systems, these matter. They're non-functional because they constrain implementation, not functionality.

Response Time: How quickly does the UI respond? Perceived performance sometimes matters more than actual performance.

Accessibility: WCAG compliance? Screen reader support? Keyboard navigation?

Browser Support: Which browsers? Which versions? This affects what JavaScript features you can use, what CSS, what polyfills you need.

Mobile Support: Responsive design? Native apps? Performance on mobile networks?

I specify these as concrete constraints. "Page load time must be under 2 seconds on 3G networks." "The application must be WCAG 2.1 AA compliant." "We support Chrome, Firefox, Safari, and Edge in their latest two versions."

How to Define Non-Functional Requirements: A Practical Process

I use a process that turns vague wishes into specific, measurable requirements. It's not complicated, but it requires discipline.

Step 1: Ask the Right Questions

Start with questions, not assumptions. I use a checklist:

Performance Questions:

  • What's the expected traffic? Peak? Average?
  • How many concurrent users?
  • What's the acceptable response time?
  • How much data are we storing? Growing how fast?
  • What's the read/write ratio?

Reliability Questions:

  • What's the cost of downtime? Per minute? Per hour?
  • How quickly do we need to recover?
  • How much data can we lose?
  • What are the critical user journeys that must always work?

Security Questions:

  • What data are we handling? Personal? Financial? Health?
  • What compliance requirements apply?
  • Who are the users? Internal? External? Both?
  • What's the threat model?

Operational Questions:

  • Who's operating this? What's their skill level?
  • What's the deployment frequency?
  • What's the on-call process?
  • What tools do we already have?

I ask these questions in meetings with product, business, and operations. I take notes. I don't assume I know the answers.

Step 2: Quantify Everything

Vague requirements are useless. "Fast" means nothing. "Scalable" means nothing. "Secure" means nothing.

I turn every requirement into a number with a unit:

  • "Fast" becomes "P95 latency under 200ms"
  • "Scalable" becomes "handles 10k QPS with linear scaling to 50 instances"
  • "Secure" becomes "TLS 1.3, AES-256 encryption at rest, OWASP Top 10 compliance"

If I can't quantify it, I push back. I ask: "What does 'fast' mean to you? Give me a number. What happens if it's slower? What's the business impact?"

Sometimes the answer is "I don't know." That's fine. We make an educated guess, document the assumption, and plan to measure and adjust. But we still write down a number. A wrong number is better than no number because it gives us something to test against.
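
One trick that keeps those numbers from drifting back into vagueness: codify them as constants that load tests and alerting scripts import. A minimal sketch, with the values carried over from the examples above:

Python
from dataclasses import dataclass

@dataclass(frozen=True)
class PerformanceTargets:
    p95_latency_ms: float
    peak_qps: int
    max_scale_out_instances: int

# Example values from above; yours will differ, and that's the point -- write them down.
API_TARGETS = PerformanceTargets(p95_latency_ms=200.0, peak_qps=10_000, max_scale_out_instances=50)

def meets_latency_target(measured_p95_ms: float, targets: PerformanceTargets = API_TARGETS) -> bool:
    return measured_p95_ms <= targets.p95_latency_ms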

Step 3: Prioritize and Trade Off

Not all requirements are equal. Some are must-haves. Some are nice-to-haves. Some conflict with each other.

I categorize requirements:

Must Have (P0): System doesn't work without these. "Payment processing must be secure and compliant with PCI-DSS." "User data must be encrypted."

Should Have (P1): Important but system can function without them temporarily. "API should respond in under 100ms for 95% of requests."

Nice to Have (P2): Improve experience but not critical. "Dashboard should load in under 1 second."

I also identify conflicts. High availability and low cost conflict. Strong consistency and high performance conflict. Perfect security and developer velocity conflict.

I document the trade-offs explicitly. "We prioritize availability over consistency for the read path, accepting eventual consistency for non-critical data." "We accept higher latency for stronger security in authentication flows."

Step 4: Make Them Testable

A requirement you can't test is a requirement you can't verify. I write requirements as testable assertions.

Instead of: "The system should be fast"

I write: "The /api/users endpoint must respond in under 100ms for 95% of requests, measured over a 24-hour period with production traffic patterns."

That's testable. I can write a load test. I can set up monitoring. I can verify it in production.

I also define how to measure. What tools? What metrics? What's the test environment? What are the acceptance criteria?
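
Here's what a minimal verification script for that assertion might look like, using the Python requests library against a hypothetical /api/users endpoint. A real check runs against production-shaped traffic with a proper load tool, but the shape is the same: measure, compute the percentile, compare against the target.

Python
import statistics

import requests

ENDPOINT = "https://staging.example.com/api/users"  # hypothetical endpoint
TARGET_P95_MS = 100.0
SAMPLE_COUNT = 200

latencies_ms = []
for _ in range(SAMPLE_COUNT):
    response = requests.get(ENDPOINT, timeout=5)
    response.raise_for_status()
    latencies_ms.append(response.elapsed.total_seconds() * 1000)

p95 = statistics.quantiles(latencies_ms, n=100)[94]
print(f"P95 over {SAMPLE_COUNT} requests: {p95:.1f}ms (target: {TARGET_P95_MS}ms)")
assert p95 <= TARGET_P95_MS, "P95 latency requirement not met"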

Step 5: Document and Communicate

Requirements that live only in my head are useless. I document them. I share them. I make them part of the architecture decision record.

I use a simple format:

Text
Requirement: [Clear statement]
Category: [Performance | Reliability | Security | etc.]
Priority: [P0 | P1 | P2]
Measurement: [How we measure it]
Target: [Specific number or criteria]
Current State: [Where we are now]
Trade-offs: [What we're giving up]

I put these in Confluence, in ADRs, in design docs. I reference them in code reviews. I use them in architecture discussions.

When someone proposes a change that violates a requirement, I point to the document. "This violates our P95 latency requirement of 100ms. Here's the requirement. How do we address this?"

Real Examples from Production Systems

Let me show you how this works in practice with examples from systems I've built.

Example 1: Payment Processing API

Context: Building a payment processing API at Tipalti. High stakes. Money involved. Compliance required.

Functional Requirements: Process payment requests, validate cards, charge accounts, return results.

Non-Functional Requirements I Defined:

Performance:

  • P95 latency: Under 500ms for payment processing
  • Throughput: 5,000 transactions per second
  • Capacity: Support 100 million transactions per month

Reliability:

  • Availability: 99.99% (52 minutes downtime per year)
  • RTO: 5 minutes (recover from failure in 5 minutes)
  • RPO: 0 (zero data loss)
  • Error rate: Under 0.01% (1 in 10,000 transactions)

Security:

  • PCI-DSS Level 1 compliance (required for payment processing)
  • TLS 1.3 for all communication
  • AES-256 encryption at rest for card data
  • Tokenization for card numbers (never store full numbers)
  • Audit logging for all payment operations

Operational:

  • Deployments: Zero-downtime, canary deployments
  • Monitoring: Real-time alerts for error rate spikes, latency degradation
  • Observability: Full request tracing, payment flow visibility

Trade-offs Made:

  • Chose strong consistency over high performance (money is involved, we need ACID transactions)
  • Chose higher cost (redundant infrastructure) over lower availability
  • Chose slower development velocity (compliance overhead) over faster feature delivery

These requirements drove architecture decisions: PostgreSQL for ACID guarantees, Redis for caching with careful invalidation, multiple data centers for redundancy, comprehensive monitoring and alerting.

Example 2: Content Delivery System

Context: Building a content delivery system for a media company. High traffic, global users, large files.

Functional Requirements: Upload content, store files, serve content to users, generate thumbnails.

Non-Functional Requirements I Defined:

Performance:

  • P95 latency: Under 200ms for metadata API, under 2 seconds for file downloads
  • Throughput: 50,000 reads per second, 1,000 writes per second
  • Capacity: 10TB storage, growing at 100GB per month
  • Scalability: Linear scaling with additional CDN nodes

Reliability:

  • Availability: 99.9% (acceptable for non-critical content)
  • RTO: 30 minutes
  • RPO: 1 hour (acceptable to lose recent uploads in disaster scenario)
  • Error rate: Under 0.1%

Performance (Global):

  • CDN coverage: Serve content from edge locations in North America, Europe, Asia
  • Cache hit ratio: Above 90% for popular content

Operational:

  • Cost: Storage costs must stay under $10,000 per month
  • Deployment: Blue-green deployments, 10-minute deployment window
  • Monitoring: Track cache hit rates, bandwidth usage, storage growth

Trade-offs Made:

  • Chose eventual consistency over strong consistency (content updates can be slightly delayed)
  • Chose lower availability target (99.9% vs 99.99%) to reduce costs
  • Chose CDN caching over always-fresh content (acceptable staleness for media)

These requirements drove decisions: S3 for storage, CloudFront for CDN, eventual consistency for metadata, aggressive caching strategies.

Example 3: Real-Time Collaboration Feature

Context: Adding real-time collaboration to a document editing app. Multiple users, low latency critical.

Functional Requirements: Show other users' cursors, sync edits in real-time, handle conflicts.

Non-Functional Requirements I Defined:

Performance:

  • Latency: Under 50ms for cursor updates, under 100ms for text edits (P95)
  • Throughput: Support 100 concurrent users per document
  • Scalability: Support 10,000 concurrent documents

Reliability:

  • Availability: 99.9% (acceptable for collaboration features)
  • Message delivery: At-least-once delivery (idempotent operations)
  • Conflict resolution: Automatic merge for 95% of conflicts

Operational:

  • WebSocket connection management: Handle 100k concurrent connections
  • Monitoring: Track message latency, connection drop rates, conflict frequency

Trade-offs Made:

  • Chose eventual consistency over strong consistency (real-time requires accepting some inconsistency)
  • Chose higher infrastructure costs (WebSocket servers) over simpler polling
  • Chose complex conflict resolution logic over simpler "last write wins"

These requirements drove architecture: WebSocket servers, operational transforms for conflict resolution, Redis for presence tracking, careful connection management.
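
To make one of those requirements concrete: "at-least-once delivery with idempotent operations" boils down to a dedupe check before applying each message. A minimal sketch using Redis to remember processed message IDs; the key naming and TTL are assumptions, and the actual merge still needs operational transforms underneath.

Python
import redis

r = redis.Redis()
DEDUPE_TTL_SECONDS = 3600  # assumed retention window for processed message IDs

def apply_edit(doc_id: str, message_id: str, edit: dict) -> bool:
    """Apply an edit at most once, even if the message is delivered multiple times."""
    # SET with nx=True succeeds only if the key does not already exist.
    first_delivery = r.set(f"edit:{doc_id}:{message_id}", 1, nx=True, ex=DEDUPE_TTL_SECONDS)
    if not first_delivery:
        return False  # duplicate delivery; the edit was already applied
    # ... apply the edit to the document state here (operational transform, etc.) ...
    return True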

Common Mistakes I've Made (So You Don't Have To)

I've made every mistake possible with non-functional requirements. Here's what I learned.

Mistake 1: Assuming Instead of Asking

I used to assume I knew what "fast" meant. I'd build something that I thought was fast, then discover the product team had different expectations.

Fix: Always ask. Get numbers. Document assumptions. If you don't know, make an educated guess and label it as an assumption.

Mistake 2: Setting Requirements Without Business Context

I'd set aggressive performance targets without understanding the business impact. Why does it need to be this fast? What happens if it's slower?

Fix: Connect requirements to business outcomes. "P95 latency under 100ms because user studies show abandonment increases 5% for every 100ms delay." That context helps prioritize and justify trade-offs.

Mistake 3: Requirements That Can't Be Measured

"The system should feel responsive." How do you measure "feel"? You can't. That requirement is useless.

Fix: Every requirement must be measurable. If you can't measure it, you can't verify it. If you can't verify it, it's not a requirement.

Mistake 4: Ignoring Operational Requirements

I'd design systems that worked great in theory but were impossible to operate. No monitoring. No deployment process. No runbooks.

Fix: Include operational requirements from the start. How will this be deployed? Monitored? Debugged? Maintained? Those requirements shape architecture just as much as performance requirements.

Mistake 5: Requirements That Conflict Without Acknowledgment

I'd set requirements that directly conflicted and then act surprised when we couldn't meet both.

Fix: Explicitly document conflicts and trade-offs. "We prioritize X over Y because Z." That clarity helps make decisions and sets expectations.

Mistake 6: Requirements That Never Get Updated

I'd set requirements at the start of a project and never revisit them. The system would evolve, requirements would become outdated, but we'd still be optimizing for the wrong things.

Fix: Review and update requirements regularly. As you learn more about the system, as business needs change, as you measure actual performance, update the requirements.

How to Validate Non-Functional Requirements

Defining requirements is half the battle. Validating them is the other half. I use several techniques.

Load Testing

I write load tests that simulate production traffic patterns. I measure latency, throughput, error rates. I push the system until it breaks, then I know the limits.

I use tools like k6, Apache JMeter, or custom scripts. I run tests in staging environments that mirror production. I test not just happy paths but failure scenarios: database slow, cache down, network partitions.

Monitoring and Observability

I instrument everything. Prometheus for metrics. Grafana for dashboards. Distributed tracing for request flows. Log aggregation for debugging.

I set up alerts based on requirements. "Alert if P95 latency exceeds 100ms." "Alert if error rate exceeds 0.1%." "Alert if availability drops below 99.9%."

I review these metrics regularly. Are we meeting requirements? Where are we close to limits? What's trending in the wrong direction?
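
Because the requirements are numbers, this review can be a script instead of a meeting. A sketch that pulls a P95 value from Prometheus's HTTP query API and compares it against the target; the host and the PromQL expression are assumptions you'd adapt to your own metric names.

Python
import requests

PROMETHEUS_URL = "http://prometheus:9090"  # assumed host
TARGET_P95_SECONDS = 0.100

# Assumed metric names; adapt the PromQL to your own instrumentation.
QUERY = 'histogram_quantile(0.95, sum(rate(http_request_latency_seconds_bucket[5m])) by (le))'

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
result = resp.json()["data"]["result"]
p95 = float(result[0]["value"][1]) if result else float("nan")
print(f"Current P95: {p95 * 1000:.1f}ms (target: {TARGET_P95_SECONDS * 1000:.0f}ms)")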

Chaos Engineering

I intentionally break things to see if the system meets reliability requirements. Kill a database. Slow down network. Fill up disk space. Does the system degrade gracefully? Can it recover? Does it meet RTO and RPO targets?

Tools like Chaos Monkey, Gremlin, or custom scripts help. I run these in staging first, then carefully in production.
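
You don't need a full chaos platform to start. A toy sketch of injecting latency into a dependency call during tests, to check whether timeouts and fallbacks actually hold up; it illustrates the idea, it doesn't replace real fault-injection tooling.

Python
import random
import time
from contextlib import contextmanager

@contextmanager
def injected_latency(min_s: float = 0.5, max_s: float = 2.0, probability: float = 0.3):
    """Randomly delay the wrapped call, simulating a slow downstream dependency."""
    if random.random() < probability:
        time.sleep(random.uniform(min_s, max_s))
    yield

def fetch_profile(user_id: str) -> dict:
    with injected_latency():
        # stand-in for a real database or service call
        return {"id": user_id, "name": "example"}

# In a test: call fetch_profile under load and verify that the caller's timeout,
# retry, and fallback behavior still meet the graceful-degradation requirements.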

Code Reviews and Architecture Reviews

I use requirements as criteria in reviews. "Does this change violate our latency requirements?" "Does this meet our security requirements?" "How does this affect our operational requirements?"

Requirements become part of the definition of done. A feature isn't complete until it meets non-functional requirements.

Integrating Requirements into Your Development Process

Requirements aren't a one-time exercise. They're part of how you build software. I integrate them into every phase.

During Planning

I define non-functional requirements alongside functional requirements. I don't wait until implementation. I ask about performance, reliability, security, operations during initial discussions.

During Design

Requirements drive architecture decisions. Do we need caching? What database? What deployment strategy? Requirements answer these questions.

I document decisions in Architecture Decision Records (ADRs) and reference the requirements that drove them.

During Implementation

I write code with requirements in mind. I instrument for observability. I write tests that verify requirements. I make trade-offs explicit in code comments and documentation.

During Review

I use requirements as acceptance criteria. Does this meet performance targets? Security requirements? Operational requirements?

During Operations

I monitor against requirements. I alert on violations. I review and update requirements as the system evolves.

Tools and Templates I Use

I've developed templates and tools that make this process easier. Here's what I use.

Requirements Template

I use a simple markdown template:

Markdown
## Non-Functional Requirements

### Performance
- **Latency**: [P50, P95, P99 targets]
- **Throughput**: [QPS, TPS targets]
- **Capacity**: [Storage, users, data size]

### Reliability
- **Availability**: [Percentage, downtime budget]
- **RTO**: [Recovery time objective]
- **RPO**: [Recovery point objective]
- **Error Rate**: [Acceptable error percentage]

### Security
- [Specific security constraints]

### Operational
- [Deployment, monitoring, observability requirements]

### Trade-offs
- [Explicit trade-offs and priorities]

Measurement Checklist

I maintain a checklist of what to measure:

  • Latency percentiles (P50, P95, P99)
  • Throughput (requests per second)
  • Error rates
  • Availability percentage
  • Resource utilization (CPU, memory, disk, network)
  • Cache hit rates
  • Database query performance
  • API response times
  • User-facing metrics (page load, time to interactive)

Architecture Decision Record Template

When requirements drive architecture decisions, I document them:

Markdown
## ADR: [Decision Title]

### Context
[Why this decision is needed]

### Requirements
[Non-functional requirements that drive this decision]

### Decision
[The decision made]

### Consequences
[Positive and negative consequences]

### Alternatives Considered
[Other options and why they were rejected]

Closing Thoughts

Non-functional requirements aren't optional. They're not nice-to-haves. They're the difference between software that works in a demo and software that works in production.

I've seen too many projects fail because they ignored non-functional requirements. Features that worked perfectly in development but collapsed under load. Systems that passed all tests but couldn't be deployed. Applications that met every functional requirement but violated security or compliance standards.

The process I've outlined here isn't complicated. Ask questions. Quantify everything. Prioritize and trade off. Make requirements testable. Document and communicate. But it requires discipline. It requires pushing back when requirements are vague. It requires making hard choices about trade-offs.

Start with the questions I've shared. Use the templates. Integrate requirements into your development process. Measure and validate. Update as you learn.

Your future self will thank you when you're debugging a production issue and you have clear requirements to guide you. Your team will thank you when architecture decisions are driven by data, not opinions. Your users will thank you when the system actually works the way it's supposed to.

If you're working on a system right now, take 30 minutes. Write down the non-functional requirements. Make them specific. Make them measurable. Share them with your team. I guarantee you'll find gaps, assumptions, and conflicts you didn't know existed.

And once you've defined them, the real work begins: building a system that actually meets them.

What non-functional requirements have you ignored that came back to bite you in production? Share your war stories.

