Data & Analytics

Mastering CTE: Escape Subquery Hell Once and For All

One WITH clause transforms unreadable queries into clear, logical steps. Recursive CTEs handle hierarchies with ease.

Mastering CTE: Escape Subquery Hell Once and For All

Mastering CTE: Escape Subquery Hell Once and For All

One WITH clause transforms unreadable queries into clear, logical steps. Recursive CTEs handle hierarchies with ease.

TL;DR

  • CTE (Common Table Expression): Temporary named result set within a query
  • Readability: Break complex subqueries into named, logical steps
  • Reusability: Reference the same CTE multiple times
  • Recursive CTE: Handle hierarchies, generate date series, and more

1. What is a CTE?

The Subquery Problem

sql
-- Hard to read nested subqueries
SELECT
    user_id,
    total_amount,
    user_avg,
    total_amount - user_avg as diff
FROM (
    SELECT
        user_id,
        SUM(amount) as total_amount,
        AVG(SUM(amount)) OVER () as user_avg
    FROM (
        SELECT
            user_id,
            amount
        FROM orders
        WHERE status = 'completed'
          AND created_at >= '2024-01-01'
    ) filtered_orders
    GROUP BY user_id
) user_totals
WHERE total_amount > user_avg;

What's going on here?

With CTE

sql
WITH filtered_orders AS (
    SELECT user_id, amount
    FROM orders
    WHERE status = 'completed'
      AND created_at >= '2024-01-01'
),
user_totals AS (
    SELECT
        user_id,
        SUM(amount) as total_amount,
        AVG(SUM(amount)) OVER () as user_avg
    FROM filtered_orders
    GROUP BY user_id
)
SELECT
    user_id,
    total_amount,
    user_avg,
    total_amount - user_avg as diff
FROM user_totals
WHERE total_amount > user_avg;

Each step is crystal clear!

2. CTE Basic Syntax

Single CTE

sql
WITH cte_name AS (
    SELECT ...
)
SELECT * FROM cte_name;

Multiple CTEs

sql
WITH
first_cte AS (
    SELECT ...
),
second_cte AS (
    SELECT * FROM first_cte  -- Can reference previous CTEs
    WHERE ...
),
third_cte AS (
    SELECT * FROM second_cte
    JOIN first_cte ON ...    -- Can join multiple CTEs
)
SELECT * FROM third_cte;

3. Practical Example: Revenue Analysis

Problem: Monthly revenue with month-over-month growth rate

sql
WITH monthly_sales AS (
    -- Step 1: Aggregate monthly revenue
    SELECT
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE_TRUNC('month', order_date)
),
sales_with_growth AS (
    -- Step 2: Get previous month's revenue
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue
    FROM monthly_sales
)
-- Step 3: Calculate growth rate
SELECT
    month,
    revenue,
    prev_revenue,
    ROUND((revenue - prev_revenue) / prev_revenue * 100, 2) as growth_rate
FROM sales_with_growth
WHERE prev_revenue IS NOT NULL
ORDER BY month;

Result:

monthrevenueprev_revenuegrowth_rate
2024-02-0115000012000025.00
2024-03-0118000015000020.00
2024-04-01165000180000-8.33

4. CTE vs Subquery vs Temp Table

FeatureSubqueryCTETemp Table
ReadabilityLowHighHigh
ReusabilityNoWithin same queryWithin session
PerformanceExecutes each timeDB dependentStored once
IndexingNoNoYes
Use caseSimple filteringComplex multi-stepLarge data/repeated use

When to Use What?

  • Subquery: Simple filtering, EXISTS checks
  • CTE: Complex multi-step processing, readability matters
  • Temp Table: Large datasets, need indexes, multiple query reuse

5. Recursive CTE: Hierarchical Data

Organization Chart Example

sql
-- Employees table
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT  -- Manager's id (NULL for CEO)
);

-- Sample data
INSERT INTO employees VALUES
(1, 'Alice CEO', NULL),
(2, 'Bob VP', 1),
(3, 'Carol Dir', 2),
(4, 'Dave Mgr', 3),
(5, 'Eve Staff', 4),
(6, 'Frank VP', 1),
(7, 'Grace Dir', 6);

Build Org Chart with Recursive CTE

sql
WITH RECURSIVE org_tree AS (
    -- Base case: Top level (CEO)
    SELECT
        id,
        name,
        manager_id,
        1 as level,
        name as path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: Subordinates
    SELECT
        e.id,
        e.name,
        e.manager_id,
        ot.level + 1,
        ot.path || ' → ' || e.name
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT
    REPEAT('  ', level - 1) || name as org_chart,
    level,
    path
FROM org_tree
ORDER BY path;

Result:

org_chartlevelpath
Alice CEO1Alice CEO
  Bob VP2Alice CEO → Bob VP
    Carol Dir3Alice CEO → Bob VP → Carol Dir
      Dave Mgr4Alice CEO → Bob VP → Carol Dir → Dave Mgr
        Eve Staff5Alice CEO → Bob VP → Carol Dir → Dave Mgr → Eve Staff
  Frank VP2Alice CEO → Frank VP
    Grace Dir3Alice CEO → Frank VP → Grace Dir

6. Recursive CTE: Generate Date Series

Problem: Show zero for dates with no orders

sql
-- Dates with no orders don't appear
SELECT DATE(order_date) as date, COUNT(*) as orders
FROM orders
GROUP BY DATE(order_date);

Solution: Generate date series, then LEFT JOIN

sql
WITH RECURSIVE date_series AS (
    -- Start date
    SELECT DATE '2024-01-01' as date

    UNION ALL

    -- Increment by one day
    SELECT date + INTERVAL '1 day'
    FROM date_series
    WHERE date < '2024-01-31'
),
daily_orders AS (
    SELECT DATE(order_date) as date, COUNT(*) as orders
    FROM orders
    WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'
    GROUP BY DATE(order_date)
)
SELECT
    ds.date,
    COALESCE(do.orders, 0) as orders
FROM date_series ds
LEFT JOIN daily_orders do ON ds.date = do.date
ORDER BY ds.date;

Now zero-order days appear too!

7. Recursive CTE: Running Totals

Daily Cumulative Revenue

sql
WITH RECURSIVE daily_sales AS (
    SELECT
        DATE(order_date) as date,
        SUM(amount) as daily_amount
    FROM orders
    GROUP BY DATE(order_date)
),
cumulative AS (
    -- First day
    SELECT
        date,
        daily_amount,
        daily_amount as cumulative_amount,
        1 as day_num
    FROM daily_sales
    WHERE date = (SELECT MIN(date) FROM daily_sales)

    UNION ALL

    -- Subsequent days
    SELECT
        ds.date,
        ds.daily_amount,
        c.cumulative_amount + ds.daily_amount,
        c.day_num + 1
    FROM daily_sales ds
    JOIN cumulative c ON ds.date = c.date + INTERVAL '1 day'
)
SELECT * FROM cumulative ORDER BY date;
💡 In practice, `SUM() OVER (ORDER BY date)` window function is more efficient, but this demonstrates recursive CTE mechanics.

8. Category Hierarchy

Get Full Category Path

sql
-- Categories table (self-referencing)
CREATE TABLE categories (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    parent_id INT
);

-- Sample data
INSERT INTO categories VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Gaming Laptops', 3),
(5, 'Clothing', NULL),
(6, 'Men', 5);

-- Get full paths
WITH RECURSIVE category_path AS (
    SELECT
        id,
        name,
        parent_id,
        name as full_path,
        1 as depth
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    SELECT
        c.id,
        c.name,
        c.parent_id,
        cp.full_path || ' > ' || c.name,
        cp.depth + 1
    FROM categories c
    JOIN category_path cp ON c.parent_id = cp.id
)
SELECT id, name, full_path, depth
FROM category_path
ORDER BY full_path;

Result:

idnamefull_pathdepth
5ClothingClothing1
6MenClothing > Men2
1ElectronicsElectronics1
2ComputersElectronics > Computers2
3LaptopsElectronics > Computers > Laptops3
4Gaming LaptopsElectronics > Computers > Laptops > Gaming Laptops4

9. Preventing Infinite Loops

Recursive CTEs can loop forever if written incorrectly.

Safety Net 1: LIMIT

sql
WITH RECURSIVE bad_cte AS (
    SELECT 1 as n
    UNION ALL
    SELECT n + 1 FROM bad_cte  -- No termination condition!
)
SELECT * FROM bad_cte
LIMIT 100;  -- Safety net

Safety Net 2: Depth Limit

sql
WITH RECURSIVE safe_tree AS (
    SELECT id, name, parent_id, 1 as depth
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    SELECT c.id, c.name, c.parent_id, st.depth + 1
    FROM categories c
    JOIN safe_tree st ON c.parent_id = st.id
    WHERE st.depth < 10  -- Max 10 levels
)
SELECT * FROM safe_tree;

Safety Net 3: Visited Check (Prevent Cycles)

sql
WITH RECURSIVE safe_tree AS (
    SELECT
        id,
        name,
        parent_id,
        ARRAY[id] as visited  -- Array of visited ids
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    SELECT
        c.id,
        c.name,
        c.parent_id,
        st.visited || c.id
    FROM categories c
    JOIN safe_tree st ON c.parent_id = st.id
    WHERE NOT c.id = ANY(st.visited)  -- Skip if already visited
)
SELECT * FROM safe_tree;

10. Pro Tips

1. Use Clear CTE Names

sql
-- Bad
WITH a AS (...), b AS (...), c AS (...)

-- Good
WITH daily_orders AS (...),
     user_totals AS (...),
     top_customers AS (...)

2. Debug Step by Step

sql
WITH step1 AS (...),
     step2 AS (...),
     step3 AS (...)

-- Debug: Check intermediate step
SELECT * FROM step2;  -- Check step2 instead of step3

3. Consider Performance

sql
-- PostgreSQL: MATERIALIZED hint to store CTE result
WITH MATERIALIZED expensive_calc AS (
    -- Complex calculation
)
SELECT * FROM expensive_calc a
JOIN expensive_calc b ON ...;  -- Computed only once despite two references

Conclusion

SituationSolution
Complex nested subqueriesRegular CTE for step separation
Hierarchical data (org chart, categories)Recursive CTE
Generate date/number sequencesRecursive CTE
Reuse same result multiple timesDefine once with CTE

CTE Key Points:

  • WITH clause separates queries into logical steps
  • Good naming alone improves readability 200%
  • Recursive CTEs handle hierarchies and sequences
  • Infinite loop prevention is essential!

References

  1. PostgreSQL Documentation - WITH Queries
  2. MySQL 8.0 - Recursive CTE
  3. SQL Server - Common Table Expressions