A single table can answer simple questions. Most interesting questions require two or more. Which customers generated the most revenue? Which product categories are underperforming? Answering those means pulling together data that lives in separate tables — and that's what JOINs do. Aggregations turn those joined rows into summaries. Together, they're the core of analytical SQL.
This project uses sqlite3 from Python's standard library, so the database itself needs no install; pandas, used here only to display query results as DataFrames, is the one third-party dependency (`pip install pandas`). The database has three related tables: orders, customers, and products.
Step 1 — Build the schema. Three tables that mirror a real transactional data model:
import sqlite3
import pandas as pd
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name TEXT,
city TEXT,
tier TEXT -- 'enterprise', 'mid-market', 'smb'
);
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
product_name TEXT,
category TEXT,
unit_price REAL
);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
product_id INTEGER REFERENCES products(product_id),
quantity INTEGER,
order_date TEXT
);
""")
conn.commit()
conn.executescript("""
INSERT INTO customers VALUES
(1, 'Apex Corp', 'Chicago', 'enterprise'),
(2, 'Bluewave LLC', 'Austin', 'mid-market'),
(3, 'Cortex Inc', 'New York', 'enterprise'),
(4, 'Driftwood Co', 'Seattle', 'smb'),
(5, 'Echo Systems', 'Boston', 'mid-market');
INSERT INTO products VALUES
(1, 'Analytics Pro', 'Software', 1200),
(2, 'Data Starter', 'Software', 450),
(3, 'Support Plus', 'Service', 800),
(4, 'Onboarding Kit', 'Service', 350);
INSERT INTO orders VALUES
(1, 1, 1, 2, '2026-01-10'),
(2, 2, 2, 1, '2026-01-15'),
(3, 3, 1, 3, '2026-01-22'),
(4, 1, 3, 1, '2026-02-03'),
(5, 4, 2, 2, '2026-02-11'),
(6, 5, 4, 1, '2026-02-19'),
(7, 2, 1, 1, '2026-03-05'),
(8, 3, 3, 2, '2026-03-14'),
(9, 4, 4, 3, '2026-03-20'),
(10, 5, 1, 1, '2026-03-28');
""")
conn.commit()
print("Schema ready.")
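One caveat before relying on those REFERENCES clauses: SQLite ships with foreign-key enforcement switched off, and Python's sqlite3 does not turn it on for you, so the schema above will happily accept orphaned orders. A minimal sketch of enabling it with PRAGMA foreign_keys, using a cut-down hypothetical two-table schema rather than the project's full one:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default

conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id)
);
""")
conn.execute("INSERT INTO customers VALUES (1)")

try:
    # customer 99 doesn't exist, so this insert should now be rejected
    conn.execute("INSERT INTO orders VALUES (1, 99)")
    fk_enforced = False
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
    fk_enforced = True
```

Without the PRAGMA, the second insert would succeed silently and later joins would quietly drop or mangle that row.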
Step 2 — INNER JOIN. Returns only rows that have a match in both tables. The most common join in practice:
# Order details enriched with customer name and product name
df = pd.read_sql_query("""
SELECT
o.order_id,
c.name AS customer,
p.product_name,
o.quantity,
(o.quantity * p.unit_price) AS revenue
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN products p ON o.product_id = p.product_id
ORDER BY revenue DESC
""", conn)
print(df)
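Since the results already land in pandas, it can help to see the same inner-join semantics expressed with DataFrame.merge. A small sketch on toy frames (the mini DataFrames below are illustrative, not the project tables):

```python
import pandas as pd

# Toy stand-ins for the orders and customers tables
orders = pd.DataFrame({'order_id': [1, 2], 'customer_id': [1, 3], 'quantity': [2, 1]})
customers = pd.DataFrame({'customer_id': [1, 2], 'name': ['Apex Corp', 'Bluewave LLC']})

# how='inner' keeps only rows whose customer_id appears in both frames,
# just like INNER JOIN ... ON o.customer_id = c.customer_id
joined = orders.merge(customers, on='customer_id', how='inner')
print(joined)  # one row: order 1 for Apex Corp; order 2's customer 3 has no match
```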
Step 3 — LEFT JOIN. Returns all rows from the left table, with NULLs where there's no match on the right — useful for finding customers with no orders:
# Order counts per customer — a count of 0 would flag customers who never ordered (none here, but the pattern matters)
df = pd.read_sql_query("""
SELECT
c.name,
c.tier,
COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.tier
ORDER BY order_count ASC
""", conn)
print(df)
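The query above counts orders per customer; to return only the customers with no orders, the usual follow-up is the anti-join: LEFT JOIN, then keep the rows where the right side came back NULL. A minimal sketch on a cut-down schema ('Zenith Ltd' is a hypothetical customer added so there is actually a zero-order row to find):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Apex Corp'), (2, 'Zenith Ltd');
INSERT INTO orders VALUES (1, 1);  -- only Apex Corp has an order
""")

no_orders = pd.read_sql_query("""
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL  -- keep only customers the join failed to match
""", conn)
print(no_orders)  # just 'Zenith Ltd'
conn.close()
```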
Step 4 — Aggregate across joins. The real power comes from combining joins with GROUP BY — summarizing data that's been assembled from multiple tables:
# Revenue by customer tier
df = pd.read_sql_query("""
SELECT
c.tier,
COUNT(DISTINCT c.customer_id) AS customers,
COUNT(o.order_id) AS orders,
SUM(o.quantity * p.unit_price) AS total_revenue,
ROUND(AVG(o.quantity * p.unit_price), 0) AS avg_order
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY c.tier
ORDER BY total_revenue DESC
""", conn)
print(df)
# Revenue by product category — only categories over $3,000 total
df = pd.read_sql_query("""
SELECT
p.category,
SUM(o.quantity * p.unit_price) AS revenue
FROM orders o
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
HAVING SUM(o.quantity * p.unit_price) > 3000
ORDER BY revenue DESC
""", conn)
print(df)
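That HAVING clause is easy to confuse with WHERE, and the difference matters: WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation. A minimal side-by-side on a toy sales table (hypothetical, not part of this project's schema):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('east', 100), ('east', 900), ('west', 50), ('west', 60);
""")

# WHERE drops individual rows before any grouping happens:
# both of west's rows fail the test, so west disappears entirely
where_rows = conn.execute("""
SELECT region, SUM(amount) FROM sales
WHERE amount > 75
GROUP BY region
ORDER BY region
""").fetchall()
print(where_rows)  # [('east', 1000.0)]

# HAVING sees each group's aggregate and drops whole groups:
# west's total (110) clears the bar, so the group survives
having_rows = conn.execute("""
SELECT region, SUM(amount) FROM sales
GROUP BY region
HAVING SUM(amount) > 75
ORDER BY region
""").fetchall()
print(having_rows)  # [('east', 1000.0), ('west', 110.0)]
conn.close()
```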
conn.close()
Once you can join and aggregate, you can answer almost any reporting question from a normalized database. The variations — which join type, what to group by, what to filter in HAVING — are judgment calls, not new syntax.