Most business analysis questions boil down to the same handful of operations: group the data, aggregate it, sort it, and present it clearly. Pandas handles all of that. This project walks through a representative sales analysis from a raw DataFrame to a formatted summary — the kind of thing you'd produce before a weekly review meeting.
No API key needed. No credentials. Just pandas, which Colab has pre-installed.
Build a sample dataset inline — this represents the kind of raw export you'd get from a CRM or order management system:
import pandas as pd
data = {
"order_id": [1,2,3,4,5,6,7,8,9,10],
"rep": ["Alice","Bob","Alice","Carol","Bob","Carol","Alice","Bob","Carol","Alice"],
"region": ["East","West","East","North","West","North","East","West","North","East"],
"product": ["Pro","Starter","Pro","Pro","Starter","Pro","Starter","Pro","Starter","Pro"],
"revenue": [1200,450,1800,1500,380,1100,420,1650,390,1350],
"closed_date":["2026-01-05","2026-01-08","2026-01-15","2026-01-20","2026-02-02",
"2026-02-10","2026-02-18","2026-03-01","2026-03-12","2026-03-22"],
}
df = pd.DataFrame(data)
df["closed_date"] = pd.to_datetime(df["closed_date"])
df["month"] = df["closed_date"].dt.to_period("M")
With the data structured, the analysis is a series of groupby operations. Revenue by rep, win rate by product, and monthly trend are the three most common asks in a sales review:
# Revenue and deal count by rep — sorted by revenue
by_rep = (df.groupby("rep")
.agg(total_revenue=("revenue","sum"), deals=("order_id","count"))
.sort_values("total_revenue", ascending=False))
print("── Revenue by Rep ──")
print(by_rep)
# Average deal size by product
by_product = df.groupby("product")["revenue"].mean().round(0)
print("\n── Avg Deal Size by Product ──")
print(by_product)
# Monthly revenue trend
by_month = df.groupby("month")["revenue"].sum()
print("\n── Monthly Revenue ──")
print(by_month)
The .agg() pattern with named outputs keeps the result readable and avoids column renaming after the fact. Name your aggregations at the point of creation — it saves time every time someone else reads your code.
One more useful slice: ranking reps within their region. This is the kind of comparison that takes a meeting to surface manually but a few lines to compute:
df["region_rank"] = (df.groupby("region")["revenue"]
.rank(ascending=False, method="min")
.astype(int))
print(df[["rep","region","revenue","region_rank"]]
.sort_values(["region","region_rank"]))
The analysis itself is fast. What takes time is knowing which question to ask — and whether the answer is actually telling you something useful about the business.
Want to go deeper? Browse our full resource library →