# PostgreSQL Query Optimization

Techniques and patterns for optimizing PostgreSQL queries in high-traffic applications.
## Understanding Query Plans

Always start with `EXPLAIN ANALYZE`:

```sql
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;
```
Focus on: Seq Scan vs Index Scan, nested loops, hash joins, and actual vs estimated rows.
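A large gap between estimated and actual rows usually means stale planner statistics. A quick check and fix, assuming the `users` table from the example above (`pg_stat_user_tables` is a built-in statistics view):

```sql
-- When were statistics last gathered for this table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'users';

-- Recompute statistics so the planner's row estimates improve
ANALYZE users;
```

Adding `BUFFERS` to the plan (`EXPLAIN (ANALYZE, BUFFERS) ...`) also shows shared-buffer hits versus disk reads per plan node, which helps separate "bad plan" from "cold cache" problems.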
## Indexing Strategies

### Basic Indexes

```sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_posts_user_created
ON posts(user_id, created_at DESC);
```
### Partial Indexes

For frequently filtered queries:

```sql
-- Only index active users
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
```
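The planner will only use a partial index when the query's `WHERE` clause implies the index predicate, so queries outside the predicate fall back to other plans:

```sql
-- Can use idx_active_users: the filter matches the index predicate
SELECT id FROM users
WHERE status = 'active' AND email = 'user@example.com';

-- Cannot use it: 'pending' rows were never indexed
SELECT id FROM users
WHERE status = 'pending' AND email = 'user@example.com';
```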
### Expression Indexes

For computed values:

```sql
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));

-- Now this query can use the index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```
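Expression indexes can also enforce constraints on computed values; for example, case-insensitive uniqueness on the same `users` table (index name invented for illustration):

```sql
-- Rejects 'User@example.com' when 'user@example.com' already exists
CREATE UNIQUE INDEX idx_users_email_ci_unique
ON users (LOWER(email));
```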
## N+1 Query Problem

Loading related data in a loop issues one query per parent row: 1 query for the list plus N for the children, so query volume grows linearly with the result set and round-trip latency dominates.

Bad:

```typescript
// N+1 queries: 1 + N
const users = await db.users.findMany();
for (const user of users) {
  user.posts = await db.posts.findMany({ where: { userId: user.id } });
}
```

Good:

```typescript
// Single query with join
const users = await db.users.findMany({
  include: { posts: true }
});
```
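When an ORM-level `include` isn't available, the same shape can be assembled from two queries plus an in-memory group-by. A sketch with hypothetical `User`/`Post` row types and an `attachPosts` helper (all names invented for illustration):

```typescript
interface Post { id: number; userId: number; title: string }
interface User { id: number; name: string; posts?: Post[] }

// Query 1 fetches users, query 2 fetches all their posts at once
// (e.g. WHERE user_id = ANY($1)); this groups the flat post list
// by userId in a single O(N + M) pass.
function attachPosts(users: User[], posts: Post[]): User[] {
  const byUser = new Map<number, Post[]>();
  for (const post of posts) {
    const bucket = byUser.get(post.userId);
    if (bucket) bucket.push(post);
    else byUser.set(post.userId, [post]);
  }
  return users.map(u => ({ ...u, posts: byUser.get(u.id) ?? [] }));
}
```

Two queries instead of 1 + N keeps round trips constant regardless of how many users the first query returns.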
## Connection Pooling

```typescript
import { Pool } from 'pg';

const pool = new Pool({
  host: 'localhost',
  database: 'mydb',
  max: 20,                        // Max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Use the pool instead of individual connections
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);
```
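One caveat: each `pool.query` call may run on a different connection, so multi-statement transactions must pin a single client via `pool.connect()`. A sketch of a transaction wrapper; `QueryableClient` and `withTransaction` are invented names, with `QueryableClient` narrowing the shape of pg's `PoolClient` so the helper stays testable without a database:

```typescript
// Minimal client shape (hypothetical narrowing of pg's PoolClient)
interface QueryableClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
  release(): void;
}

// Run work inside BEGIN/COMMIT, roll back on error,
// and always release the client back to the pool.
async function withTransaction<T>(
  client: QueryableClient,
  work: (client: QueryableClient) => Promise<T>
): Promise<T> {
  try {
    await client.query('BEGIN');
    const result = await work(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```

With pg this would be used as `const client = await pool.connect();` followed by `await withTransaction(client, async (c) => { ... });`.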
## Batch Operations

Instead of individual inserts:

```sql
-- Bad: Multiple round trips
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');
INSERT INTO users (name, email) VALUES ('Charlie', 'charlie@example.com');

-- Good: Single query
INSERT INTO users (name, email) VALUES
  ('Alice', 'alice@example.com'),
  ('Bob', 'bob@example.com'),
  ('Charlie', 'charlie@example.com');
```
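From application code, the multi-row form still needs numbered placeholders so values stay parameterized. A sketch of a query builder (`buildBulkInsert` is an invented helper, not part of pg):

```typescript
// Build one parameterized multi-row INSERT with $1..$N placeholders.
// NOTE: table and column names are interpolated directly, so they must
// be trusted identifiers from your own code, never user input.
function buildBulkInsert(
  table: string,
  columns: string[],
  rows: unknown[][]
): { text: string; values: unknown[] } {
  const values: unknown[] = [];
  const tuples = rows.map(row => {
    const placeholders = row.map(v => {
      values.push(v);
      return `$${values.length}`;
    });
    return `(${placeholders.join(', ')})`;
  });
  const text =
    `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${tuples.join(', ')}`;
  return { text, values };
}
```

pg's `pool.query` accepts a `{ text, values }` config object, so the result can be passed straight through: `await pool.query(buildBulkInsert('users', ['name', 'email'], rows));`.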
## Materialized Views

For expensive aggregate queries:

```sql
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
  u.id,
  u.name,
  COUNT(DISTINCT p.id) AS post_count,
  COUNT(DISTINCT c.id) AS comment_count,
  MAX(p.created_at) AS last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.name;

-- Refresh periodically
REFRESH MATERIALIZED VIEW user_statistics;
```
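A plain `REFRESH` locks the view against reads while it rebuilds. `REFRESH ... CONCURRENTLY` lets readers keep seeing the old contents during the rebuild, but it requires a unique index on the view (index name invented for illustration):

```sql
-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_user_statistics_id ON user_statistics (id);

-- Readers are not blocked while this runs
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
```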
Materialized views reduced our dashboard load time from 8s to 200ms.
## Monitoring Queries

Track slow queries:

```sql
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();

-- View currently running queries
SELECT
  pid,
  now() - query_start AS duration,
  query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
```
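For aggregate statistics across all queries, not just the ones running right now, the `pg_stat_statements` extension is the standard tool. A sketch, assuming the extension is listed in `shared_preload_libraries` (the `*_exec_time` column names below are the PostgreSQL 13+ spellings):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 normalized queries by average execution time
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```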
## Production Metrics

From our optimization work:

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Avg Query Time | 450ms | 85ms | 81% faster |
| P95 Latency | 2.3s | 320ms | 86% faster |
| DB CPU Usage | 78% | 32% | 59% reduction |