# PostgreSQL Query Optimization

Techniques and patterns for optimizing PostgreSQL queries in high-traffic applications.
## Understanding Query Plans

Always start with `EXPLAIN ANALYZE`:

```sql
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;
```
Focus on: Seq Scan vs Index Scan, nested loops, hash joins, and actual vs estimated rows.
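A large gap between estimated and actual rows usually means stale planner statistics. A quick check and fix, assuming the `users` table from the example above (`pg_stat_user_tables` is a built-in statistics view):

```sql
-- When were statistics last gathered for this table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'users';

-- Recompute statistics so the planner's row estimates improve
ANALYZE users;
```

Adding `BUFFERS` to the plan (`EXPLAIN (ANALYZE, BUFFERS) ...`) also shows shared-buffer hits versus disk reads per plan node, which helps separate "bad plan" from "cold cache" problems.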
## Indexing Strategies

### Basic Indexes

```sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_posts_user_created
ON posts(user_id, created_at DESC);
```
### Partial Indexes

For frequently filtered queries:

```sql
-- Only index active users
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
```
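The planner will only use a partial index when the query's `WHERE` clause implies the index predicate, so queries outside the predicate fall back to other plans:

```sql
-- Can use idx_active_users: the filter matches the index predicate
SELECT id FROM users
WHERE status = 'active' AND email = 'user@example.com';

-- Cannot use it: 'pending' rows were never indexed
SELECT id FROM users
WHERE status = 'pending' AND email = 'user@example.com';
```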
### Expression Indexes

For computed values:

```sql
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));

-- Now this query can use the index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```
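Expression indexes can also enforce constraints on computed values; for example, case-insensitive uniqueness on the same `users` table (index name invented for illustration):

```sql
-- Rejects 'User@example.com' when 'user@example.com' already exists
CREATE UNIQUE INDEX idx_users_email_ci_unique
ON users (LOWER(email));
```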
## N+1 Query Problem

Loading related data in a loop issues one query per parent row: 1 query for the list plus N for the children, so query volume grows linearly with the result set and round-trip latency dominates.

Bad:

```typescript
// N+1 queries: 1 + N
const users = await db.users.findMany();
for (const user of users) {
  user.posts = await db.posts.findMany({ where: { userId: user.id } });
}
```

Good:

```typescript
// Single query with join
const users = await db.users.findMany({
  include: { posts: true }
});
```
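When an ORM-level `include` isn't available, the same shape can be assembled from two queries plus an in-memory group-by. A sketch with hypothetical `User`/`Post` row types and an `attachPosts` helper (all names invented for illustration):

```typescript
interface Post { id: number; userId: number; title: string }
interface User { id: number; name: string; posts?: Post[] }

// Query 1 fetches users, query 2 fetches all their posts at once
// (e.g. WHERE user_id = ANY($1)); this groups the flat post list
// by userId in a single O(N + M) pass.
function attachPosts(users: User[], posts: Post[]): User[] {
  const byUser = new Map<number, Post[]>();
  for (const post of posts) {
    const bucket = byUser.get(post.userId);
    if (bucket) bucket.push(post);
    else byUser.set(post.userId, [post]);
  }
  return users.map(u => ({ ...u, posts: byUser.get(u.id) ?? [] }));
}
```

Two queries instead of 1 + N keeps round trips constant regardless of how many users the first query returns.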
## Connection Pooling

```typescript
import { Pool } from 'pg';

const pool = new Pool({
  host: 'localhost',
  database: 'mydb',
  max: 20,                        // Max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Use the pool instead of individual connections
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);
```
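One caveat: each `pool.query` call may run on a different connection, so multi-statement transactions must pin a single client via `pool.connect()`. A sketch of a transaction wrapper; `QueryableClient` and `withTransaction` are invented names, with `QueryableClient` narrowing the shape of pg's `PoolClient` so the helper stays testable without a database:

```typescript
// Minimal client shape (hypothetical narrowing of pg's PoolClient)
interface QueryableClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
  release(): void;
}

// Run work inside BEGIN/COMMIT, roll back on error,
// and always release the client back to the pool.
async function withTransaction<T>(
  client: QueryableClient,
  work: (client: QueryableClient) => Promise<T>
): Promise<T> {
  try {
    await client.query('BEGIN');
    const result = await work(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```

With pg this would be used as `const client = await pool.connect();` followed by `await withTransaction(client, async (c) => { ... });`.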
## Batch Operations

Instead of individual inserts:

```sql
-- Bad: Multiple round trips
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');
INSERT INTO users (name, email) VALUES ('Charlie', 'charlie@example.com');

-- Good: Single query
INSERT INTO users (name, email) VALUES
  ('Alice', 'alice@example.com'),
  ('Bob', 'bob@example.com'),
  ('Charlie', 'charlie@example.com');
```
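From application code, the multi-row form still needs numbered placeholders so values stay parameterized. A sketch of a query builder (`buildBulkInsert` is an invented helper, not part of pg):

```typescript
// Build one parameterized multi-row INSERT with $1..$N placeholders.
// NOTE: table and column names are interpolated directly, so they must
// be trusted identifiers from your own code, never user input.
function buildBulkInsert(
  table: string,
  columns: string[],
  rows: unknown[][]
): { text: string; values: unknown[] } {
  const values: unknown[] = [];
  const tuples = rows.map(row => {
    const placeholders = row.map(v => {
      values.push(v);
      return `$${values.length}`;
    });
    return `(${placeholders.join(', ')})`;
  });
  const text =
    `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${tuples.join(', ')}`;
  return { text, values };
}
```

pg's `pool.query` accepts a `{ text, values }` config object, so the result can be passed straight through: `await pool.query(buildBulkInsert('users', ['name', 'email'], rows));`.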
## Materialized Views

For expensive aggregate queries:

```sql
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
  u.id,
  u.name,
  COUNT(DISTINCT p.id) AS post_count,
  COUNT(DISTINCT c.id) AS comment_count,
  MAX(p.created_at) AS last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.name;

-- Refresh periodically
REFRESH MATERIALIZED VIEW user_statistics;
```
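A plain `REFRESH` locks the view against reads while it rebuilds. `REFRESH ... CONCURRENTLY` lets readers keep seeing the old contents during the rebuild, but it requires a unique index on the view (index name invented for illustration):

```sql
-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_user_statistics_id ON user_statistics (id);

-- Readers are not blocked while this runs
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
```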
Materialized views reduced our dashboard load time from 8s to 200ms.
## Monitoring Queries

Track slow queries:

```sql
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();

-- View currently running queries
SELECT
  pid,
  now() - query_start AS duration,
  query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;
```
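For aggregate statistics across all queries, not just the ones running right now, the `pg_stat_statements` extension is the standard tool. A sketch, assuming the extension is listed in `shared_preload_libraries` (the `*_exec_time` column names below are the PostgreSQL 13+ spellings):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 normalized queries by average execution time
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```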
## Production Metrics

From our optimization work:

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Avg Query Time | 450ms | 85ms | 81% faster |
| P95 Latency | 2.3s | 320ms | 86% faster |
| DB CPU Usage | 78% | 32% | 59% reduction |