Why Your SQL Query is Slow: An Interview Guide

1 min read

Your Interview Guide to SQL Performance

'A query is running slow. What do you do?' This is a staple of technical interviews. Your answer reveals your entire troubleshooting process.

A good answer doesn't just list one solution. It describes a process.

1. Check the Execution Plan: First, I'd get the 'Actual Execution Plan' for the query. This is the 'roadmap' the database engine uses. I'd look for expensive operations.
2. Look for 'Table Scans': The most common problem is a 'Full Table Scan' on a large table. This means the database is reading every single row because it can't find a better way.
3. Analyze Indexes: A Table Scan usually means a missing or incorrect index. I'd check if the columns in the WHERE clause and JOIN conditions are properly indexed.
4. Check Query Logic: I'd look for 'non-SARGable' queries (e.g., using a function on a column in the WHERE clause).
5. Analyze Joins: Are the joins efficient? Am I using a correlated subquery where a JOIN would be faster?

This cluster will dive into each of these steps, giving you the ammo you need to prove you can write fast, efficient SQL.

`Clustered` vs. `Non-Clustered` Indexes: A Simple Analogy

Interview Question: 'Explain Clustered vs. Non-Clustered Indexes.'

This is the most common SQL performance question. Using a simple analogy is the best way to answer.

Answer: 'An index is like an index in the back of a textbook; it helps the database find data without scanning the whole book. The two main types are Clustered and Non-Clustered.'

1. Clustered Index (The Dictionary)

Analogy: A dictionary. The data itself is the index. The words in a dictionary are physically sorted in alphabetical order.
How it works: A clustered index defines the physical sort order of the data rows in the table. Because the data can only be physically sorted in one way, you can only have one clustered index per table.
Common Use: The PRIMARY KEY of a table is automatically created as a clustered index by default in SQL Server.

-- When you make 'UserID' the PRIMARY KEY,
-- the table's rows are physically stored on disk in order of UserID.
CREATE TABLE Users (
  UserID INT PRIMARY KEY CLUSTERED,
  LastName VARCHAR(100)
);

2. Non-Clustered Index (The Textbook Index)

Analogy: The index at the back of a textbook. It's a separate structure from the data. The index is sorted alphabetically, and each entry has a 'pointer' (a page number) telling you where to find the actual data.
How it works: It's a data structure (like a B-tree) that is separate from the table. It stores the indexed column values, sorted in order, and a 'pointer' (called a 'row locator' or the clustered key) back to the actual data row.
Common Use: You can have many non-clustered indexes. You create them on columns that are frequently used in WHERE clauses or JOINs (like foreign keys, LastName, Email, etc.) to speed up lookups.

-- We want to find users by 'LastName' quickly.
-- This creates a new, separate data structure.
CREATE NONCLUSTERED INDEX IX_Users_LastName
ON Users (LastName);

-- Now, a query like this is super fast:
-- SELECT * FROM Users WHERE LastName = 'Smith';
-- The DB will 'seek' the 'IX_Users_LastName' index (fast)
-- find 'Smith', get the pointer, and jump directly to the data row.

How to Read a SQL Execution Plan

Interview Question: 'What do you look for in an Execution Plan?'

This question separates junior and senior candidates. You don't need to be a DBA, but you need to know the basics.

Answer: An execution plan is the database's 'roadmap' for how it will get your data. I look for the most expensive operations, which are usually shown as a high percentage cost. The main things I look for are 'scans' vs. 'seeks'.

The Big Two: Scans vs. Seeks

When you read a plan (usually from right-to-left), you'll see these operations on your tables.

1. Table Scan (or Clustered Index Scan) - BAD 👎

What it means: The database is reading the entire table from top to bottom to find your data.
Analogy: You're looking for 'Smith' in a phone book with no index, so you have to read every single page.
Why it happens: You have no index on your WHERE clause column, or your query is 'non-SARGable'.
The Fix: Add a non-clustered index on the WHERE clause column.

2. Index Seek (or Clustered Index Seek) - GOOD 👍

What it means: The database is using an index to 'jump' directly to the data it needs.
Analogy: You're using the index in the back of the book. You find 'Smith', see it's on page 245, and flip directly there.
Why it happens: You have a good query and a good index.
The Fix: Nothing! This is what you want to see.

Other Things to Look For:

Key Lookup (RID Lookup): This is a 'Seek' (good) followed by another lookup (less good). It happens when your non-clustered index doesn't contain all the columns your query needs (e.g., you SELECT FirstName, LastName but your index is only on LastName). The fix is to 'include' FirstName in your index.
Thick Arrows: The arrows connecting the operations show the number of rows being processed. If you see a thick arrow (millions of rows) going into an operation that suddenly becomes a thin arrow (10 rows), you know that operation is your filter, and you need to make it as efficient as possible.

`JOIN` vs. `Subquery`: Which is Faster?

Interview Question: 'Which is faster, a `JOIN` or a subquery?'

This is a common question, and the answer is nuanced.

Answer: 'It depends, but modern SQL optimizers are very good, and in most cases, a JOIN is preferred because it's often more readable and the optimizer can handle it more efficiently. The real performance killer is a correlated subquery.'

1. Standard Subquery (in `WHERE` clause)

This is often fine. The optimizer will usually rewrite this as a JOIN under the hood.

-- Get all products that have been sold
SELECT *
FROM Products
WHERE ProductID IN (SELECT ProductID FROM OrderDetails);

-- The optimizer will likely treat this exactly like:
SELECT p.*
FROM Products p
INNER JOIN OrderDetails od ON p.ProductID = od.ProductID;

2. Correlated Subquery (The Performance Killer)

This is the one to avoid. A correlated subquery is a subquery in the SELECT or WHERE clause that runs once for every single row in the outer query. This is a hidden N+1 loop.

'Before': Correlated Subquery (BAD 👎)

Imagine you have 10,000 products. This query will run 10,001 queries total (1 outer, 10,000 inner).

-- For each product, run a new query to get its order count
SELECT 
  p.ProductID,
  p.Name,
  (SELECT COUNT(*)
   FROM OrderDetails od
   WHERE od.ProductID = p.ProductID) AS OrderCount
FROM Products p;

'After': `LEFT JOIN` with `GROUP BY` (GOOD 👍)

This is the correct way. The database does one scan of both tables, joins them, and aggregates the result. This is vastly more efficient.

-- Join the tables once, then group the results
SELECT
  p.ProductID,
  p.Name,
  COUNT(od.OrderID) AS OrderCount
FROM Products p
LEFT JOIN OrderDetails od ON p.ProductID = od.ProductID
GROUP BY p.ProductID, p.Name;

The simple answer: 'A simple subquery in a WHERE IN clause is often fine, but I prefer JOINs for clarity. The main thing to avoid is a correlated subquery, which runs for every row and scales terribly. I would always rewrite that using a JOIN and GROUP BY.'

What are SARGable Queries?

Interview Question: 'What does SARGable mean?'

This is a fantastic 'deep cut' term to know. It shows you have a senior-level understanding of how indexes really work.

Answer: 'SARGable stands for Search-ARGument-able. It means a query is written in a way that allows the database engine to use an index to 'seek' the data. A non-SARGable query forces the engine to ignore the index and do a 'table scan'.'

The golden rule is: Do not use functions on your columns in the WHERE clause.

'Before': Non-SARGable Query (BAD 👎)

Imagine you have an index on the OrderDate column.

-- Find all orders from 2024
SELECT * FROM Orders
WHERE YEAR(OrderDate) = 2024;

-- WHY IT'S BAD:
-- To run this query, the database must:
-- 1. Take every single row in the 'Orders' table.
-- 2. Run the 'YEAR()' function on its 'OrderDate' value.
-- 3. Check if that result is '2024'.
-- The index on 'OrderDate' is 100% useless.
-- This is a 'Table Scan'.

'After': SARGable Query (GOOD 👍)

By rewriting the query to not use a function on the column, we make the index 'searchable'.

-- Find all orders from 2024
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';

-- WHY IT'S GOOD:
-- The values '2024-01-01' and '2025-01-01' are calculated once.
-- The database can now use the index on 'OrderDate'.
-- It can 'seek' directly to the start of the 2024 data.
-- This is an 'Index Seek' and is thousands of times faster.

Common Non-SARGable Mistakes:

WHERE YEAR(MyDate) = 2024
WHERE DATEDIFF(day, MyDate, GETDATE()) = 1
WHERE SUBSTRING(MyColumn, 1, 3) = 'ABC'
WHERE MyColumn LIKE '%MyString' (A leading wildcard is non-SARGable)
WHERE MyColumn + ' ' + MyOtherColumn = 'Full Name'

In all these cases, you are hiding the true value of the indexed column from the engine.

Advanced Querying
12 Feb 2025
Database Design & Theory
12 Feb 2025