A single table can answer simple questions. Most interesting questions require two or more. Which customers generated the most revenue? Which product categories are underperforming? Answering those means pulling together data that lives in separate tables — and that's what JOINs do. Aggregations turn those joined rows into summaries. Together, they're the core of analytical SQL.
This project uses sqlite3 from Python's standard library, so the database itself needs no install; pandas, used here only to display query results as DataFrames, is the one third-party dependency (`pip install pandas`). The database has three related tables: orders, customers, and products.
Step 1 — Build the schema. Three tables that mirror a real transactional data model:
import sqlite3
import pandas as pd
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name TEXT,
city TEXT,
tier TEXT -- 'enterprise', 'mid-market', 'smb'
);
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
product_name TEXT,
category TEXT,
unit_price REAL
);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
product_id INTEGER REFERENCES products(product_id),
quantity INTEGER,
order_date TEXT
);
""")
conn.commit()
conn.executescript("""
INSERT INTO customers VALUES
(1, 'Apex Corp', 'Chicago', 'enterprise'),
(2, 'Bluewave LLC', 'Austin', 'mid-market'),
(3, 'Cortex Inc', 'New York', 'enterprise'),
(4, 'Driftwood Co', 'Seattle', 'smb'),
(5, 'Echo Systems', 'Boston', 'mid-market');
INSERT INTO products VALUES
(1, 'Analytics Pro', 'Software', 1200),
(2, 'Data Starter', 'Software', 450),
(3, 'Support Plus', 'Service', 800),
(4, 'Onboarding Kit', 'Service', 350);
INSERT INTO orders VALUES
(1, 1, 1, 2, '2026-01-10'),
(2, 2, 2, 1, '2026-01-15'),
(3, 3, 1, 3, '2026-01-22'),
(4, 1, 3, 1, '2026-02-03'),
(5, 4, 2, 2, '2026-02-11'),
(6, 5, 4, 1, '2026-02-19'),
(7, 2, 1, 1, '2026-03-05'),
(8, 3, 3, 2, '2026-03-14'),
(9, 4, 4, 3, '2026-03-20'),
(10, 5, 1, 1, '2026-03-28');
""")
conn.commit()
print("Schema ready.")
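One caveat before relying on those REFERENCES clauses: SQLite ships with foreign-key enforcement switched off, and Python's sqlite3 does not turn it on for you, so the schema above will happily accept orphaned orders. A minimal sketch of enabling it with PRAGMA foreign_keys, using a cut-down hypothetical two-table schema rather than the project's full one:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default

conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id)
);
""")
conn.execute("INSERT INTO customers VALUES (1)")

try:
    # customer 99 doesn't exist, so this insert should now be rejected
    conn.execute("INSERT INTO orders VALUES (1, 99)")
    fk_enforced = False
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
    fk_enforced = True
```

Without the PRAGMA, the second insert would succeed silently and later joins would quietly drop or mangle that row.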
Step 2 — INNER JOIN. Returns only rows that have a match in both tables. The most common join in practice:
# Order details enriched with customer name and product name
df = pd.read_sql_query("""
SELECT
o.order_id,
c.name AS customer,
p.product_name,
o.quantity,
(o.quantity * p.unit_price) AS revenue
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN products p ON o.product_id = p.product_id
ORDER BY revenue DESC
""", conn)
print(df)
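Since the results already land in pandas, it can help to see the same inner-join semantics expressed with DataFrame.merge. A small sketch on toy frames (the mini DataFrames below are illustrative, not the project tables):

```python
import pandas as pd

# Toy stand-ins for the orders and customers tables
orders = pd.DataFrame({'order_id': [1, 2], 'customer_id': [1, 3], 'quantity': [2, 1]})
customers = pd.DataFrame({'customer_id': [1, 2], 'name': ['Apex Corp', 'Bluewave LLC']})

# how='inner' keeps only rows whose customer_id appears in both frames,
# just like INNER JOIN ... ON o.customer_id = c.customer_id
joined = orders.merge(customers, on='customer_id', how='inner')
print(joined)  # one row: order 1 for Apex Corp; order 2's customer 3 has no match
```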
Step 3 — LEFT JOIN. Returns all rows from the left table, with NULLs where there's no match on the right — useful for finding customers with no orders:
# Order counts per customer — a count of 0 would flag customers who never ordered (none here, but the pattern matters)
df = pd.read_sql_query("""
SELECT
c.name,
c.tier,
COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.tier
ORDER BY order_count ASC
""", conn)
print(df)
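The query above counts orders per customer; to return only the customers with no orders, the usual follow-up is the anti-join: LEFT JOIN, then keep the rows where the right side came back NULL. A minimal sketch on a cut-down schema ('Zenith Ltd' is a hypothetical customer added so there is actually a zero-order row to find):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Apex Corp'), (2, 'Zenith Ltd');
INSERT INTO orders VALUES (1, 1);  -- only Apex Corp has an order
""")

no_orders = pd.read_sql_query("""
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL  -- keep only customers the join failed to match
""", conn)
print(no_orders)  # just 'Zenith Ltd'
conn.close()
```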
Step 4 — Aggregate across joins. The real power comes from combining joins with GROUP BY — summarizing data that's been assembled from multiple tables:
# Revenue by customer tier
df = pd.read_sql_query("""
SELECT
c.tier,
COUNT(DISTINCT c.customer_id) AS customers,
COUNT(o.order_id) AS orders,
SUM(o.quantity * p.unit_price) AS total_revenue,
ROUND(AVG(o.quantity * p.unit_price), 0) AS avg_order
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY c.tier
ORDER BY total_revenue DESC
""", conn)
print(df)
# Revenue by product category — only categories over $3,000 total
df = pd.read_sql_query("""
SELECT
p.category,
SUM(o.quantity * p.unit_price) AS revenue
FROM orders o
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY p.category
HAVING SUM(o.quantity * p.unit_price) > 3000
ORDER BY revenue DESC
""", conn)
print(df)
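That HAVING clause is easy to confuse with WHERE, and the difference matters: WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation. A minimal side-by-side on a toy sales table (hypothetical, not part of this project's schema):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('east', 100), ('east', 900), ('west', 50), ('west', 60);
""")

# WHERE drops individual rows before any grouping happens:
# both of west's rows fail the test, so west disappears entirely
where_rows = conn.execute("""
SELECT region, SUM(amount) FROM sales
WHERE amount > 75
GROUP BY region
ORDER BY region
""").fetchall()
print(where_rows)  # [('east', 1000.0)]

# HAVING sees each group's aggregate and drops whole groups:
# west's total (110) clears the bar, so the group survives
having_rows = conn.execute("""
SELECT region, SUM(amount) FROM sales
GROUP BY region
HAVING SUM(amount) > 75
ORDER BY region
""").fetchall()
print(having_rows)  # [('east', 1000.0), ('west', 110.0)]
conn.close()
```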
conn.close()
Once you can join and aggregate, you can answer almost any reporting question from a normalized database. The variations — which join type, what to group by, what to filter in HAVING — are judgment calls, not new syntax.