The query was timing out. It had been running for 23 minutes before the connection dropped. The business team needed the report by 9am. It was 7:30am.
This is the story of how we went from hours to milliseconds — and what we learned about SQL performance along the way.
The Anatomy of a Slow Query
Before you can fix a slow query, you need to understand why it's slow. The answer is almost always one of:
- Missing index — the database is scanning millions of rows to find the ones you want
- Bad join order — joining large tables before filtering, instead of filtering before joining
- N+1 problem — running one query per row instead of one query for all rows
- Bad statistics — the query planner has wrong information about your data distribution
- Locking — your query is waiting for another transaction to release a lock
Start with EXPLAIN (MySQL) or EXPLAIN ANALYZE (PostgreSQL). Read the query plan. Find where the cost is concentrated. That's your target.
Indexing: More Art Than Science
The textbook answer is "add an index on the columns in your WHERE clause." The real answer is more nuanced.
Composite index column order matters.
If you have WHERE status = 'active' AND created_at > '2024-01-01', the right composite index is (status, created_at) if status has low cardinality (few distinct values) and created_at has high cardinality.
The index should go from high-selectivity to low-selectivity? Actually no — it depends. If you're always filtering by status first (and status eliminates most rows), put it first. The optimizer will use the index efficiently.
Covering indexes are underrated.
A covering index includes all the columns the query needs — not just the WHERE columns, but also the SELECT columns. The database can answer the query entirely from the index, without touching the actual table rows.
-- Without covering index: hit index, then fetch row for each match
SELECT customer_id, rental_status, created_at
FROM rentals
WHERE fleet_id = 42 AND rental_status = 'active';
-- With covering index on (fleet_id, rental_status, customer_id, created_at)
-- The database never needs to read the actual table
On a table with millions of rows, this can turn a 2-second query into a 10ms query.
Index on expressions, not just columns.
-- Can't use an index on created_at
WHERE DATE(created_at) = '2024-08-20'
-- Can use an index on created_at
WHERE created_at >= '2024-08-20' AND created_at < '2024-08-21'
Never apply functions to indexed columns in your WHERE clause if you want the index to be used.
The Report That Took 23 Minutes
The query that was killing us was a financial reconciliation report:
- 3 million rental records
- 5 million payment records
- 1.2 million customer records
- Aggregations across 60-day windows
- 8 joins
- Running on the production database
Three problems:
- No composite indexes on the join columns
- Full table scans on the payments table
- Running on production, competing with live traffic
Fix 1: Indexes
Added composite indexes on the most-filtered columns:
rentals(customer_id, status, created_at)payments(rental_id, payment_date, status)customers(id, account_status)
Query time: 23 minutes → 4 minutes.
Fix 2: Query restructuring
The original query joined all tables first, then filtered. We restructured to filter first, then join:
-- Before: join everything, then filter
SELECT ... FROM rentals r
JOIN payments p ON r.id = p.rental_id
JOIN customers c ON r.customer_id = c.id
WHERE r.created_at > '2024-06-01'
AND p.status = 'settled'
-- After: filter early using subquery/CTE
WITH active_rentals AS (
SELECT id, customer_id
FROM rentals
WHERE created_at > '2024-06-01'
AND status IN ('active', 'completed')
),
settled_payments AS (
SELECT rental_id, amount
FROM payments
WHERE status = 'settled'
AND payment_date > '2024-06-01'
)
SELECT ... FROM active_rentals ar
JOIN settled_payments sp ON ar.id = sp.rental_id
JOIN customers c ON ar.customer_id = c.id
Query time: 4 minutes → 45 seconds.
Fix 3: Move to BigQuery
For reporting workloads, we moved off the production MySQL database entirely. All data syncs nightly to BigQuery, and reports run against BigQuery.
BigQuery is designed for analytical workloads across massive datasets. The same query that took 45 seconds in MySQL runs in 3 seconds in BigQuery across 5x more data.
Reporting queries: 45 seconds → 3 seconds. Never touching production again.
BigQuery-Specific Optimizations
If you're running analytics on BigQuery, a few things matter a lot:
Partition tables by date. If your queries always filter by date, partition the table by date column. BigQuery only reads partitions that match your filter — massively reducing data scanned and cost.
Cluster related columns. Clustering sorts the data within each partition by specified columns. Queries filtering by clustered columns scan less data.
Avoid SELECT *. BigQuery charges by data scanned. Selecting only the columns you need can reduce query cost by 10-100x on wide tables.
Use approximate aggregation functions for analytics. APPROX_COUNT_DISTINCT is dramatically faster than COUNT(DISTINCT ...) and accurate enough for most reporting use cases.
The MySQL Query Cache Problem
One counterintuitive lesson: the MySQL query cache can hurt performance on write-heavy tables.
On tables with frequent updates (like our rentals table, updated with every status change), the query cache is invalidated on every write. The overhead of invalidating cached queries can actually make the database slower than without caching.
We disabled the query cache for write-heavy tables and saw a measurable improvement in write throughput.
Monitoring for Query Performance
You can't optimize what you don't measure.
We monitor:
- Slow query log — queries taking more than 1 second go to our logging system
- Query latency by endpoint — which API endpoints generate the slowest database queries?
- Lock wait time — how much time is being spent waiting for locks?
- Index utilization — which indexes are being used? Which were created and never used?
The unused index question is important. Indexes aren't free — they slow down writes and consume storage. An index that exists but is never used is a liability.
SQL performance is one of those skills that pays dividends for years. Every optimization you learn compounds. The query that used to scare you becomes straightforward, and you start seeing the patterns before the problem manifests.
Start with EXPLAIN. Read the query plan. Find the cost. Fix the cause.
It's rarely magic — just methodical diagnosis.