Top System Design Interview Questions

system-design caching performance optimization latency redis

Definition: A technique for storing copies of frequently used data in a fast temporary storage layer (typically RAM) so that it can be served more quickly.

Why use it?

  • Latency: Reading from RAM takes on the order of nanoseconds, whereas a spinning-disk seek or a network round trip to a database typically takes milliseconds.
  • Load Reduction: It prevents the Database from being hammered by the same queries over and over (e.g., fetching "Top Products").
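
As a rough illustration of the idea, here is a minimal cache-aside sketch in Python. The `TTLCache` class, the `FakeDatabase` stand-in, and the 60-second expiry are all assumptions made for the example; a production setup would more likely use a shared cache such as Redis.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it so the caller reloads
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl_seconds)


class FakeDatabase:
    """Hypothetical stand-in for a slow database; counts how often it is hit."""

    def __init__(self):
        self.calls = 0

    def top_products(self):
        self.calls += 1
        return ["laptop", "phone", "headphones"]


db = FakeDatabase()
cache = TTLCache(ttl_seconds=60)


def get_top_products():
    # Cache-aside: check the cache first, fall back to the database on a miss.
    products = cache.get("top_products")
    if products is None:
        products = db.top_products()
        cache.set("top_products", products)
    return products
```

Calling `get_top_products()` repeatedly within the expiry window should reach the database only once; later calls are served from memory.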

system-design fundamentals architecture software-engineering

Definition: The process of defining the architecture, components, interfaces, and data for a system to satisfy specific requirements.

Importance: It bridges the gap between business requirements and code. Good design ensures the system is Scalable (can grow), Reliable (doesn't crash), and Maintainable (easy to fix). Without it, systems become "technical debt" traps that fail under load.

system-design api security throttling reliability ddos-protection

Definition: Controlling the number of requests a client can send to an API within a specified time window (e.g., 100 requests per minute).

Why needed?

  • Prevention: Helps mitigate DDoS attacks and slows down brute-force login attempts.
  • Fairness: Ensures one heavy user doesn't hog all system resources, slowing it down for others.
  • Cost Control: Prevents auto-scaling bills from exploding due to bots.
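
One simple way to enforce such a limit is a fixed-window counter. The sketch below is an assumption-laden illustration, not a reference implementation: the class name, the injectable `clock`, and the in-process dictionary are choices made for the example (a real deployment would usually keep counters in a shared store and expire old windows).

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        # (client_id, window_index) -> requests seen; never pruned in this sketch
        self._counts = defaultdict(int)

    def allow(self, client_id):
        window_index = int(self.clock() // self.window_seconds)
        key = (client_id, window_index)
        if self._counts[key] >= self.limit:
            return False  # over the limit: the caller would return HTTP 429
        self._counts[key] += 1
        return True
```

A fixed window is easy to reason about but lets a client burst at the boundary between two windows; sliding-window and token-bucket variants smooth that out at the cost of more bookkeeping.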

system-design consistency distributed-systems database availability

Concept: Instead of guaranteeing that everyone sees the latest data instantly (Strong Consistency), we guarantee that if no new updates are made, all accesses will eventually return the last updated value.

Example: When you post on Instagram, your friend might not see it for a few seconds. That is acceptable. We trade immediate accuracy for High Availability and speed.

system-design cdn caching performance latency networking

CDN (Content Delivery Network): A geographically distributed network of proxy servers. They cache content closer to the end-user.

When to use:

  • Static Assets: Serving images, CSS, JS, and Video files.
  • Global Audience: To ensure a user in London gets data from a London server, not one in New York.
  • Load Reduction: It offloads traffic from your main application servers.

system-design database sharding partitioning scaling distributed-data

Definition: Breaking up a large database into smaller, faster, more manageable parts called "Shards."

Horizontal Partitioning (Sharding): Splitting rows across multiple servers. Example: Users A-M go to DB Server 1, Users N-Z go to DB Server 2.

Why? It allows a database to exceed the storage and computing limits of a single physical machine (Horizontal Scaling).
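
The routing decision can be sketched in a few lines of Python. The server names are hypothetical, and the hash-based variant is an alternative added for comparison rather than something the example above prescribes.

```python
import hashlib

SHARDS = ["db-server-1", "db-server-2"]  # hypothetical server names


def shard_by_range(username):
    """Range-based routing: A-M -> first server, everything else -> second.

    Assumes a non-empty username; names that do not start with a letter
    fall through to the second server in this sketch.
    """
    first_letter = username[0].upper()
    return SHARDS[0] if "A" <= first_letter <= "M" else SHARDS[1]


def shard_by_hash(user_id, shards=SHARDS):
    """Hash-based routing: spreads keys evenly regardless of their spelling."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Range routing keeps related keys together but can create hot shards if names cluster; hash routing balances load better but makes range queries span every shard.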

system-design messaging asynchronous decoupling kafka rabbitmq

Definition: A buffer that stores messages between a Producer (sender) and a Consumer (receiver) to allow asynchronous communication (e.g., RabbitMQ, Kafka, SQS).

When to use:

  • Decoupling: Allow services to evolve independently.
  • Load Smoothing: If 1000 requests come in at once, the queue buffers them so the worker can process them at its own pace without crashing.
  • Reliability: If the consumer is down, the message stays in the queue until it recovers.
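
The buffering behaviour can be imitated inside a single process with Python's standard `queue` module. This is only an analogy for a real broker: the function name and the `None` shutdown sentinel are assumptions of the sketch, and nothing here is durable across restarts.

```python
import queue
import threading


def run_pipeline(messages):
    """Enqueue all messages first, then let a worker drain them at its own pace."""
    q = queue.Queue()
    processed = []

    def consumer():
        while True:
            message = q.get()
            if message is None:  # sentinel: no more work
                break
            processed.append(message.upper())  # stand-in for real processing

    # Producer side: put() returns immediately, even though no consumer
    # is running yet. The messages simply wait in the queue.
    for message in messages:
        q.put(message)

    worker = threading.Thread(target=consumer)
    worker.start()  # the consumer "comes back up" and catches up
    q.put(None)
    worker.join()
    return processed
```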

system-design database performance optimization indexing b-tree

Definition: A data structure (like a B-Tree) that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space.

Analogy: It is like the index at the back of a book. Instead of reading every page (Full Table Scan) to find a topic, you look at the index to find the exact page number immediately.

Trade-off: Indexes make Reads faster but Writes slower, because every insert, delete, or update to an indexed column must also update the index.
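
To see the difference between a full scan and an index lookup, one option is to ask SQLite for its query plan before and after creating an index. The table, column, and index names below are made up for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup, ("user500@example.com",))

# With an index on email, it can search the B-tree instead.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("user500@example.com",))
```

Printing `plan_before` and `plan_after` should show a table scan in the first case and a search that names `idx_users_email` in the second.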

system-design requirements architecture fundamentals non-functional-requirements

Functional Requirements (The "What")

Behaviors the system must perform. If these are missing, the system doesn't work.

  • Example: "User must be able to login."
  • Example: "System must send an email invoice."

Non-Functional Requirements (The "How")

Quality attributes regarding performance and reliability.

  • Example: "System must respond in < 200ms." (Latency)
  • Example: "System must store data for 5 years." (Retention)

system-design scaling infrastructure capacity-planning performance

Vertical Scaling (Scale Up)

Adding more power (CPU, RAM, Storage) to an existing single server. Easy to do, but has a hardware limit (ceiling) and creates a single point of failure.

Horizontal Scaling (Scale Out)

Adding more servers to the pool. This removes the single-machine ceiling and enables high availability, but it adds complexity in application logic (statelessness, partitioning) and requires load balancing.

system-design networking security infrastructure proxy nginx

Reverse Proxy: A server that sits in front of web servers and forwards client requests to them. Primary goals are Security, Anonymity (hiding backend IP), and Performance (SSL termination, caching, compression). Example: Nginx.

Difference: While a Load Balancer specifically distributes traffic across multiple servers to manage load, a Reverse Proxy is often used even with a single server to handle security and protocol management. (Note: Most modern tools like Nginx perform both roles).

system-design load-balancing scalability availability networking

A Load Balancer is like a traffic cop sitting in front of your servers. It directs incoming client requests across a group of backend servers to ensure no single server is overwhelmed.

  • Availability: If a server dies, the LB stops sending traffic to it.
  • Scalability: It allows you to add more servers easily to handle more traffic.
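
A minimal round-robin sketch shows both properties. The class and method names are assumptions for illustration; real load balancers detect failures through health checks rather than explicit `mark_down` calls.

```python
class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting from the cursor.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```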

system-design distributed-systems consistency availability partition-tolerance theory

The CAP theorem states that a distributed data system can provide at most two of the following three guarantees at the same time:

  1. Consistency (C): Every read receives the most recent write or an error.
  2. Availability (A): Every request receives a response (no error), but data might be stale.
  3. Partition Tolerance (P): The system continues to operate despite network failures between nodes.

The Trade-off: Since network partitions will happen (P is mandatory), during a partition you must choose between CP (wait for consistency, risk a timeout or error) and AP (respond immediately, risk staleness).

system-design database replication high-availability read-scaling failover

Definition: The process of copying data from a primary database (Master) to one or more secondary databases (Slaves/Replicas).

Why use it?

  • High Availability (Failover): If the Master crashes, a Slave can be promoted to keep the app running.
  • Read Scaling: You can distribute read queries (SELECT) across multiple Replicas to reduce the load on the Master.
  • Backup/Analytics: Run heavy analysis reports on a Replica to avoid slowing down the production Master.
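
The read/write split and failover can be sketched as a small router. The class, the node names, and the "starts with SELECT" test are simplifying assumptions; real drivers and proxies make this decision with far more care (transactions, replication lag, and so on).

```python
import itertools


class ReplicatedDatabase:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._replica_cycle = itertools.cycle(self.replicas)

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return next(self._replica_cycle)  # read scaling
        return self.primary  # writes (and reads with no replicas) hit the primary

    def promote_replica(self):
        """Failover: the first replica becomes the new primary.

        Assumes at least one replica exists.
        """
        self.primary = self.replicas.pop(0)
        self._replica_cycle = itertools.cycle(self.replicas)
        return self.primary
```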

system-design session-management security caching redis jwt

1. Client-Side (Stateless)

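Use JWT (JSON Web Tokens). The session data is signed (and optionally encrypted) and stored in the user's browser (Cookie/LocalStorage). The server verifies the signature on every request instead of looking up session state, which is good for scalability. A signed-only JWT is readable by the client, so it should not carry secrets.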

2. Server-Side (Stateful)

Use a Distributed Cache (Redis/Memcached). The browser holds a simple SessionID cookie. The server looks up the data in Redis. This is secure and allows for easy session revocation (banning a user instantly).

system-design reliability load-balancing devops availability monitoring

Definition: API endpoints (e.g., /health) that a Load Balancer or container orchestrator polls to determine if a service is running correctly.

  • Liveness Probe: "Is the container running?" If no, restart it.
  • Readiness Probe: "Is the service ready to accept traffic?" (e.g., DB connection established). If no, stop sending traffic.

Importance: They ensure high availability by automatically removing crashing or overwhelmed servers from rotation, preventing users from hitting dead endpoints.
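
The two probes can be sketched as plain functions that return an HTTP status code and a body. The function names and the `dependencies` mapping are assumptions for the example; wiring them to actual routes depends on the web framework in use.

```python
def liveness():
    """If this code runs at all, the process is alive."""
    return 200, {"status": "alive"}


def readiness(dependencies):
    """Report ready only when every dependency check passes.

    `dependencies` maps a name to a zero-argument callable returning
    True when that dependency (database, cache, ...) is reachable.
    """
    failing = sorted(name for name, check in dependencies.items() if not check())
    if failing:
        # 503 tells the poller to stop routing traffic here for now.
        return 503, {"status": "not ready", "failing": failing}
    return 200, {"status": "ready"}
```

Keeping the liveness check trivial is deliberate in this sketch: if it depended on the database, a database outage could trigger pointless restarts of otherwise healthy instances.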

system-design database sql nosql data-modeling storage

SQL (Relational)

  • Structure: Pre-defined schema (Tables, Rows, Columns).
  • Scaling: Primarily Vertical (Scale Up - bigger CPU/RAM); read replicas and sharding are possible but add complexity.
  • Consistency: Strong Consistency (ACID transactions).
  • Best For: Financial systems, complex relationships, structured data (PostgreSQL, MySQL).

NoSQL (Non-Relational)

  • Structure: Dynamic schema (Documents, Key-Value, Graphs).
  • Scaling: Horizontal (Scale Out - sharding across servers).
  • Consistency: Often Eventual Consistency (BASE), although some NoSQL stores offer tunable or strong consistency.
  • Best For: Big data, real-time analytics, flexible content management (MongoDB, Cassandra, Redis).

system-design load-balancing session-management scaling stateful

Definition: A Load Balancer feature that ensures a specific user is always routed to the same specific server for the duration of their session.

When to use: Used when the application stores Session State locally in the server's RAM (e.g., a shopping cart stored in memory). If the user were routed to a different server, they would lose their cart.

Downside: It creates uneven load distribution and makes scaling difficult. Ideally, use a distributed cache (Redis) for sessions instead.
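
A sticky router can be sketched as a lookup table from session to server. The class name and the first-come round-robin assignment are assumptions; actual load balancers commonly implement stickiness with a cookie or a hash of the client address.

```python
class StickyBalancer:
    """Pin each session to the server it was first assigned to."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._assignments = {}  # session_id -> server
        self._next = 0

    def route(self, session_id):
        if session_id not in self._assignments:
            # First request of this session: assign the next server in rotation.
            server = self.servers[self._next % len(self.servers)]
            self._assignments[session_id] = server
            self._next += 1
        return self._assignments[session_id]
```

The table also makes the downside visible: if a server is removed, every session pinned to it loses its in-memory state, and long-lived sessions keep load uneven.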

system-design devops configuration-management deployment environment-variables

1. Environment Variables: Store config values (DB connection strings, API keys) in the OS environment variables, not the code.

2. Configuration Files: Use separate files (e.g., appsettings.dev.json, appsettings.prod.json) that are loaded based on the active environment flag.

3. Centralized Config Store: For distributed systems, use tools like Consul, etcd, or AWS Systems Manager Parameter Store to manage config dynamically without redeploying code.

Rule of Thumb: Never commit secrets (passwords/keys) to version control (Git).
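
The first two approaches can be combined in a small loader. The variable names `APP_ENV`, `DATABASE_URL`, and `DEBUG` are hypothetical, chosen only for this sketch.

```python
import os


def load_config(environ=os.environ):
    """Build the app config from environment variables.

    `environ` is injectable so the function can be exercised without
    touching the real process environment.
    """
    env = environ.get("APP_ENV", "dev")
    return {
        "env": env,
        # Required: a missing value raises KeyError at startup, not mid-request.
        "database_url": environ["DATABASE_URL"],
        "debug": environ.get("DEBUG", "false").lower() == "true",
        # Which per-environment settings file to load next.
        "config_file": f"appsettings.{env}.json",
    }
```

Failing fast on a missing required variable is a design choice of this sketch; it surfaces misconfiguration at deploy time instead of at the first database call.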

system-design recommendation-system machine-learning algorithms collaborative-filtering

For a simple MVP (Minimum Viable Product) without complex Machine Learning:

  • Most Popular: Recommend items with the highest total sales or views in the last 24 hours.
  • Recency: Recommend the newest items added to the catalog.
  • Content-Based (Simple Tags): "Because you liked [Action Movie], here are more [Action Movies]." Match based on categories or tags.
  • Collaborative Filtering (Basic): "Users who bought this also bought that" (using simple SQL aggregations).
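
The "also bought" idea needs little more than co-occurrence counts. The sketch below uses an in-memory list of orders instead of SQL; the function name and the data shape are assumptions for illustration.

```python
from collections import Counter


def also_bought(orders, item, top_n=3):
    """Rank the items that most often share an order with `item`.

    `orders` is a list of baskets, each a list of item names.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for basket in orders:
        items = set(basket)
        if item in items:
            counts.update(items - {item})
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]
```

The same counting can be expressed as a self-join with GROUP BY in SQL once the order data lives in a database.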

system-design observability logging distributed-tracing monitoring debugging

Logging (The "What"): Records discrete events (e.g., "Database connection failed", "User logged in"). Logs provide the details of specific errors or state changes within a single service.

Tracing (The "Where"): Tracks the lifecycle of a request as it flows across multiple microservices. It visualizes the path and latency of a request (e.g., Service A -> Service B -> Database).

Why both? In a distributed system, a request might fail in Service C, but the error originated from bad data in Service A. Logs tell you what error occurred, but Tracing tells you where it happened and how the request got there.
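
The link between the two is usually a trace ID that every service attaches to its log entries. The hand-rolled sketch below only illustrates that propagation; the service functions and the list used as a log sink are hypothetical, and real systems rely on a tracing library rather than passing IDs by hand.

```python
import uuid


def log(service, trace_id, message, sink):
    """Record one discrete event, tagged with the request's trace ID."""
    sink.append({"service": service, "trace_id": trace_id, "message": message})


def service_b(trace_id, sink):
    log("service-b", trace_id, "querying database", sink)


def service_a(sink, trace_id=None):
    # Start a new trace at the edge unless the caller already supplied one.
    trace_id = trace_id or uuid.uuid4().hex
    log("service-a", trace_id, "request received", sink)
    service_b(trace_id, sink)  # propagate the same ID downstream
    return trace_id
```

Filtering the collected events by one `trace_id` reconstructs the path of a single request across services, which is what a tracing UI visualizes.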

system-design security api authentication authorization https

1. Encryption in Transit (HTTPS)

Always use TLS (the successor to SSL) to encrypt data between the client and server to prevent Man-in-the-Middle (MITM) attacks.

2. Authentication & Authorization

Use robust standards like OAuth2 and OpenID Connect. Issue JWTs (JSON Web Tokens) for stateless authentication. Ensure users only access data they are permitted to (RBAC/ABAC).

3. Rate Limiting & Throttling

Prevent DDoS and brute-force attacks by limiting the number of requests a user/IP can make within a time window.

4. Input Validation

Validate and sanitize all incoming data. Use parameterized queries to prevent SQL Injection and encode output to prevent XSS (Cross-Site Scripting). Never trust user input.

5. API Gateway

Use a gateway as a single entry point to enforce security policies, manage CORS, and hide internal service architecture.
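
As an illustration of the input-validation point, the sketch below validates a username and then passes it to SQLite through a placeholder rather than string concatenation. The table, the 32-character limit, and the function name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])


def find_user(username):
    """Look up a user by name without ever building SQL from user input."""
    # Validation: reject obviously malformed input before touching the database.
    if not isinstance(username, str) or not (1 <= len(username) <= 32):
        raise ValueError("username must be a string of 1-32 characters")
    # The ? placeholder makes the driver treat the value strictly as data.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()
```

With the placeholder in place, a classic injection string such as `alice' OR '1'='1` is compared literally against the username column and matches nothing.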
