SQL Performance Tuning at Scale: From Hours to Milliseconds

The query was timing out. It had been running for 23 minutes before the connection dropped. The business team needed the report by 9am. It was 7:30am.

This is the story of how we went from hours to milliseconds — and what we learned about SQL performance along the way.

The Anatomy of a Slow Query

Before you can fix a slow query, you need to understand why it's slow. The answer is almost always one of:

Missing index — the database is scanning millions of rows to find the ones you want
Bad join order — joining large tables before filtering, instead of filtering before joining
N+1 problem — running one query per row instead of one query for all rows
Bad statistics — the query planner has wrong information about your data distribution
Locking — your query is waiting for another transaction to release a lock

Start with EXPLAIN (MySQL) or EXPLAIN ANALYZE (PostgreSQL). Read the query plan. Find where the cost is concentrated. That's your target.

Indexing: More Art Than Science

The textbook answer is "add an index on the columns in your WHERE clause." The real answer is more nuanced.

Composite index column order matters.

If you have WHERE status = 'active' AND created_at > '2024-01-01', the right composite index is (status, created_at) if status has low cardinality (few distinct values) and created_at has high cardinality.

The index should go from high-selectivity to low-selectivity? Actually no — it depends. If you're always filtering by status first (and status eliminates most rows), put it first. The optimizer will use the index efficiently.

Covering indexes are underrated.

A covering index includes all the columns the query needs — not just the WHERE columns, but also the SELECT columns. The database can answer the query entirely from the index, without touching the actual table rows.

-- Without covering index: hit index, then fetch row for each match
SELECT customer_id, rental_status, created_at 
FROM rentals 
WHERE fleet_id = 42 AND rental_status = 'active';

-- With covering index on (fleet_id, rental_status, customer_id, created_at)
-- The database never needs to read the actual table

On a table with millions of rows, this can turn a 2-second query into a 10ms query.

Index on expressions, not just columns.

-- Can't use an index on created_at
WHERE DATE(created_at) = '2024-08-20'

-- Can use an index on created_at
WHERE created_at >= '2024-08-20' AND created_at < '2024-08-21'

Never apply functions to indexed columns in your WHERE clause if you want the index to be used.

The Report That Took 23 Minutes

The query that was killing us was a financial reconciliation report:

3 million rental records
5 million payment records
1.2 million customer records
Aggregations across 60-day windows
8 joins
Running on the production database

Three problems:

No composite indexes on the join columns
Full table scans on the payments table
Running on production, competing with live traffic

Fix 1: Indexes

Added composite indexes on the most-filtered columns:

rentals(customer_id, status, created_at)
payments(rental_id, payment_date, status)
customers(id, account_status)

Query time: 23 minutes → 4 minutes.

Fix 2: Query restructuring

The original query joined all tables first, then filtered. We restructured to filter first, then join:

-- Before: join everything, then filter
SELECT ... FROM rentals r
JOIN payments p ON r.id = p.rental_id
JOIN customers c ON r.customer_id = c.id
WHERE r.created_at > '2024-06-01'
AND p.status = 'settled'

-- After: filter early using subquery/CTE
WITH active_rentals AS (
  SELECT id, customer_id 
  FROM rentals 
  WHERE created_at > '2024-06-01'
    AND status IN ('active', 'completed')
),
settled_payments AS (
  SELECT rental_id, amount
  FROM payments
  WHERE status = 'settled'
    AND payment_date > '2024-06-01'
)
SELECT ... FROM active_rentals ar
JOIN settled_payments sp ON ar.id = sp.rental_id
JOIN customers c ON ar.customer_id = c.id

Query time: 4 minutes → 45 seconds.

Fix 3: Move to BigQuery

For reporting workloads, we moved off the production MySQL database entirely. All data syncs nightly to BigQuery, and reports run against BigQuery.

BigQuery is designed for analytical workloads across massive datasets. The same query that took 45 seconds in MySQL runs in 3 seconds in BigQuery across 5x more data.

Reporting queries: 45 seconds → 3 seconds. Never touching production again.

BigQuery-Specific Optimizations

If you're running analytics on BigQuery, a few things matter a lot:

Partition tables by date. If your queries always filter by date, partition the table by date column. BigQuery only reads partitions that match your filter — massively reducing data scanned and cost.

Cluster related columns. Clustering sorts the data within each partition by specified columns. Queries filtering by clustered columns scan less data.

Avoid SELECT *. BigQuery charges by data scanned. Selecting only the columns you need can reduce query cost by 10-100x on wide tables.

Use approximate aggregation functions for analytics. APPROX_COUNT_DISTINCT is dramatically faster than COUNT(DISTINCT ...) and accurate enough for most reporting use cases.

The MySQL Query Cache Problem

One counterintuitive lesson: the MySQL query cache can hurt performance on write-heavy tables.

On tables with frequent updates (like our rentals table, updated with every status change), the query cache is invalidated on every write. The overhead of invalidating cached queries can actually make the database slower than without caching.

We disabled the query cache for write-heavy tables and saw a measurable improvement in write throughput.

Monitoring for Query Performance

You can't optimize what you don't measure.

We monitor:

Slow query log — queries taking more than 1 second go to our logging system
Query latency by endpoint — which API endpoints generate the slowest database queries?
Lock wait time — how much time is being spent waiting for locks?
Index utilization — which indexes are being used? Which were created and never used?

The unused index question is important. Indexes aren't free — they slow down writes and consume storage. An index that exists but is never used is a liability.

SQL performance is one of those skills that pays dividends for years. Every optimization you learn compounds. The query that used to scare you becomes straightforward, and you start seeing the patterns before the problem manifests.

Start with EXPLAIN. Read the query plan. Find the cost. Fix the cause.

It's rarely magic — just methodical diagnosis.