Database Design Interview Questions


Your Guide to Database Design & Theory

An interviewer doesn't just want a 'code monkey.' They want an engineer who understands the theory behind the database. Why are databases reliable? Why do we structure tables in a certain way?

These questions are designed to test your core computer science knowledge. The two biggest topics are:

  1. ACID Properties: This is the definition of a reliable transaction. You must know what A, C, I, and D stand for.
  2. Normalization: This is the process of designing a clean, efficient database. You must be able to explain the first three normal forms (1NF, 2NF, 3NF).

If you can explain these two concepts clearly, you will prove you have a solid theoretical foundation. This cluster will give you simple, clear definitions for each.

ACID Properties: Explained with a Bank Transfer Example

Interview Question: 'What are the ACID properties?'

Answer: 'ACID is an acronym that describes the four guarantees of a reliable database transaction. The classic example is a bank transfer of $100 from your checking to your savings account.'


A - Atomicity

'All or nothing.' A transaction is 'atomic' or indivisible. It must either fully complete or fully fail (rollback). It cannot be left in a partial state.

Bank Example: The transfer must both subtract $100 from checking and add $100 to savings. If the system crashes after the subtraction but before the addition, the entire transaction is rolled back. The $100 is returned to checking, as if nothing ever happened.
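The rollback behavior can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production pattern; the table and account names are invented for the example, and the 'crash' is simulated with an exception between the two steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("checking", 500), ("savings", 0)])

def transfer(amount, crash_between_steps=False):
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'checking'",
                     (amount,))
        if crash_between_steps:
            raise RuntimeError("simulated crash after step 1")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'savings'",
                     (amount,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo step 1: back to the starting state

transfer(100, crash_between_steps=True)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'checking': 500, 'savings': 0}
```

Because the failure triggers a ROLLBACK, the $100 never leaves checking; the transaction is all-or-nothing.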


C - Consistency

'The database stays valid.' A transaction must bring the database from one valid (consistent) state to another. It cannot violate any of the database's integrity rules (like constraints, or business rules).

Bank Example: A rule says the checking account balance cannot go below $0. If you only have $50 in checking and try to transfer $100, the transaction violates the 'consistency' of the data and is rolled back.
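The 'balance cannot go below $0' rule can be expressed as a CHECK constraint, which the database enforces automatically. A hedged sketch using sqlite3 again (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""CREATE TABLE accounts (
    name    TEXT PRIMARY KEY,
    balance INTEGER CHECK (balance >= 0))""")  # the integrity rule
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("checking", 50), ("savings", 0)])

try:
    conn.execute("BEGIN")
    # Would drive checking to -50, violating the CHECK constraint.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'checking'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'savings'")
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")  # the invalid transfer never happened

final = dict(conn.execute("SELECT name, balance FROM accounts"))
print(final)  # {'checking': 50, 'savings': 0}
```

The database itself rejects the transition to an invalid state, so the application cannot accidentally commit a negative balance.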


I - Isolation

'Transactions don't interfere with each other.' If multiple transactions are running at the same time (concurrently), the end result must be the same as if they were run one after another (serially). One transaction's partial, uncommitted work is not visible to other transactions.

Bank Example: While your transfer is in progress (Step 1 done, Step 2 not yet), another transaction tries to calculate your total net worth. Because of isolation, that second transaction will either see the state before you started the transfer, or the state after you finished, but never the intermediate state where the $100 is 'missing'.


D - Durability

'Once it's saved, it stays saved.' Once a transaction has been successfully committed, the changes are permanent and will survive any system failure (like a power outage or crash). The changes are written to a transaction log on non-volatile storage (disk) before the commit is acknowledged.

Bank Example: Once the transfer is complete and you see 'Success', that change is permanent. Even if the database server crashes one second later, your $100 is safely in savings when it reboots.

Database Normalization: 1NF, 2NF, 3NF Explained

Interview Question: 'What is database normalization? Can you explain 1NF, 2NF, and 3NF?'

Answer: Normalization is the process of structuring a database to reduce data redundancy (duplication) and improve data integrity. The normal forms are a set of rules to achieve this. A good summary is that all data should depend on 'the key, the whole key, and nothing but the key.'


First Normal Form (1NF) - 'The Key'

  • Rule: Each column must contain 'atomic' (indivisible) values, and there can be no repeating groups.
  • Bad (Un-normalized): A Users table with columns UserID, Name, Phone1, Phone2 (a repeating group of phone numbers).
  • Good (1NF): You split this into two tables:
    • Users (UserID, Name)
    • PhoneNumbers (UserID, PhoneNumber)
    This keeps every value atomic and makes the design flexible: a user can now have 3 phones, or 0, without needing a Phone3 column or nulls.
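
The 1NF split above can be sketched as a working schema with sqlite3 (table and column names taken from the example; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    UserID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE PhoneNumbers (
    UserID      INTEGER NOT NULL REFERENCES Users(UserID),
    PhoneNumber TEXT    NOT NULL,
    PRIMARY KEY (UserID, PhoneNumber)
);
""")
conn.execute("INSERT INTO Users VALUES (1, 'Alice')")
# Three phone numbers, no Phone3 column, no nulls.
conn.executemany("INSERT INTO PhoneNumbers VALUES (?, ?)",
                 [(1, '555-0100'), (1, '555-0101'), (1, '555-0102')])

rows = conn.execute("""
    SELECT u.Name, p.PhoneNumber
    FROM Users u JOIN PhoneNumbers p ON p.UserID = u.UserID
""").fetchall()
print(len(rows))  # 3
```

Adding a fourth number is just another INSERT into PhoneNumbers; the schema never changes.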

Second Normal Form (2NF) - 'The Whole Key'

  • Rule: Must be in 1NF and every non-key column must depend on the entire primary key. (This rule only applies to tables with a composite primary key.)
  • Bad (1NF, not 2NF): A table OrderDetails with a composite key (OrderID, ProductID):
    • (OrderID, ProductID), Quantity, ProductName
    Here, Quantity depends on the whole key (it's the quantity for that order and product). But ProductName depends only on ProductID (a partial dependency). This is redundant; if the product name changes, you have to update it in many rows.
  • Good (2NF): Split into two tables:
    • OrderDetails ((OrderID, ProductID), Quantity)
    • Products (ProductID, ProductName)
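
A quick way to see the payoff of the 2NF split: renaming a product touches exactly one row, no matter how many orders reference it. A sketch with sqlite3 (sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL
);
CREATE TABLE OrderDetails (
    OrderID   INTEGER NOT NULL,
    ProductID INTEGER NOT NULL REFERENCES Products(ProductID),
    Quantity  INTEGER NOT NULL,
    PRIMARY KEY (OrderID, ProductID)   -- the composite key
);
""")
conn.execute("INSERT INTO Products VALUES (7, 'Widget')")
# Three orders all reference the same product.
conn.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?)",
                 [(1, 7, 2), (2, 7, 5), (3, 7, 1)])

# Renaming the product updates one row, not one row per order line.
cur = conn.execute("UPDATE Products SET ProductName = 'Widget Pro' WHERE ProductID = 7")
print(cur.rowcount)  # 1
```

In the un-normalized design, the same rename would have required updating all three OrderDetails rows and risked missing one.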

Third Normal Form (3NF) - 'Nothing But The Key'

  • Rule: Must be in 2NF and no non-key column can depend on another non-key column (called a transitive dependency).
  • Bad (2NF, not 3NF): A Users table:
    • UserID (Key), Name, ZipCode, City
    Here, City depends on ZipCode, and ZipCode depends on UserID. The City has a transitive dependency on the UserID through ZipCode. This is redundant; if a zip code's city name changes, you have to update many users.
  • Good (3NF): Split into two tables:
    • Users (UserID, Name, ZipCode (Foreign Key))
    • ZipCodes (ZipCode (Key), City)
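
The 3NF split can be sketched the same way: the city is stored once per zip code, and every user resolves it through the foreign key. A sqlite3 sketch with invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ZipCodes (
    ZipCode TEXT PRIMARY KEY,
    City    TEXT NOT NULL
);
CREATE TABLE Users (
    UserID  INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL,
    ZipCode TEXT REFERENCES ZipCodes(ZipCode)
);
""")
conn.execute("INSERT INTO ZipCodes VALUES ('10001', 'New York')")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)",
                 [(1, 'Alice', '10001'), (2, 'Bob', '10001')])

# City lives in exactly one place; both users see it via the join.
rows = conn.execute("""
    SELECT u.Name, z.City
    FROM Users u JOIN ZipCodes z ON u.ZipCode = z.ZipCode
    ORDER BY u.UserID
""").fetchall()
print(rows)  # [('Alice', 'New York'), ('Bob', 'New York')]
```

If the city name for '10001' ever changed, a single UPDATE on ZipCodes would fix it for every user at once.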

Transactions and Isolation Levels Explained

Interview Question: 'What is an isolation level?'

This is a senior-level follow-up to the 'ACID' question. It specifically tests your understanding of the 'I' (Isolation).

Answer: 'A transaction is a wrapper for one or more SQL statements that makes them 'atomic' (all-or-nothing) with BEGIN TRAN, COMMIT, and ROLLBACK. An Isolation Level is a setting that controls how much one transaction is isolated from other concurrent transactions. It's a trade-off between consistency (safety) and performance.'

Higher isolation levels are safer but can cause 'blocking', reducing performance. Lower levels are faster but can cause data integrity problems.


The 3 Problems Isolation Levels Solve:

You should know the problems that isolation levels prevent.

  1. Dirty Read: The worst! Transaction A reads data that Transaction B has changed but not yet committed. If B rolls back, A has read 'dirty' data that never existed.
  2. Non-Repeatable Read: Transaction A reads a row. Transaction B updates that same row and commits. Transaction A reads the same row again and gets a different value.
  3. Phantom Read: Transaction A runs a query (e.g., SELECT COUNT(*) FROM Users WHERE Age > 30). Transaction B inserts a new row that matches that WHERE clause and commits. Transaction A runs the same query again and gets a different count. A 'phantom' row has appeared.
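
Dirty-read prevention can be demonstrated with two connections to the same SQLite database: one connection's uncommitted update is invisible to the other. This is a hedged sketch (SQLite's default behavior here roughly corresponds to Read Committed; the file path and table names are invented):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")
writer = sqlite3.connect(path, isolation_level=None)  # manual BEGIN/COMMIT
writer.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES ('checking', 500)")

reader = sqlite3.connect(path)

writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'checking'")

# The reader cannot see the uncommitted -100: no dirty read.
during = reader.execute("SELECT balance FROM accounts").fetchall()[0][0]

writer.execute("COMMIT")
after = reader.execute("SELECT balance FROM accounts").fetchall()[0][0]
print(during, after)  # 500 400
```

If dirty reads were allowed, the reader would have seen 400 before the commit, and a rollback would have left it holding a value that never officially existed.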

The 4 Common Isolation Levels:

Each level prevents (or allows) these problems.

  • Read Uncommitted: The lowest level. Allows Dirty Reads, Non-Repeatable Reads, and Phantom Reads. Very fast, very dangerous.
  • Read Committed: (The default for SQL Server). Prevents Dirty Reads. This is a good, standard balance.
  • Repeatable Read: Prevents Dirty Reads and Non-Repeatable Reads. It does this by 'locking' all rows it has read, which can block other transactions.
  • Serializable: The highest level. Prevents all three problems. It 'locks' the entire range of data being queried, which can severely impact performance. It's as if the transactions are truly running one at a time.

The simple answer: 'Isolation levels are settings that control the trade-off between performance and data consistency in concurrent transactions. The default, 'Read Committed', prevents 'dirty reads' and is usually the best balance.'
