Day 31 - Computing columns with Pandas
Computing New Columns and Basic Statistics in Pandas
1. Introduction (10 minutes)
- Overview:
- Recap our restaurant orders DataFrame from the previous lecture.
- Today we extend it by computing new columns and aggregating data to derive insights.
- Learning Goals:
- Learn how to create new columns using operator lifting.
- Understand how to use boolean masks for conditional updates.
- Perform basic aggregation with groupby.
2. Computing New Columns (20 minutes)
- Adding a Total Price Column:
- Assume each dish has a fixed price given in a separate dictionary.
- Example Code:
# Price list for dishes
prices = {'Pasta': 12.0, 'Salad': 8.0, 'Burger': 10.0, 'Steak': 20.0}
# Create a new column 'total_price'
orders['total_price'] = orders.apply(lambda row: row['quantity'] * prices.get(row['dish'], 0), axis=1)
print(orders[['dish', 'quantity', 'total_price']])
- Updating Columns Conditionally:
- Suppose we want to apply a 10% discount to takeout orders.
- Example Code:
discount_mask = orders['order_type'] == 'takeout'
orders.loc[discount_mask, 'total_price'] = orders.loc[discount_mask, 'total_price'] * 0.90
print(orders[['order_type', 'total_price']])
- Interactive Exercise:
- Ask: “How would you create a column that flags orders as 'high' if quantity ≥ 3, otherwise 'low'?”
- Have students work in pairs to write a solution using a lambda function and
apply
.
3. Aggregating and Grouping Data (20 minutes)
- Aggregating Overall Totals:
- Compute the total quantity of dishes sold.
total_quantity = orders['quantity'].sum()
print("Total dishes sold:", total_quantity)
- Compute the total quantity of dishes sold.
- GroupBy Operations:
- Example 1: Total sales per dish.
sales_per_dish = orders.groupby('dish')['total_price'].sum()
print(sales_per_dish) - Example 2: Average quantity per order type.
avg_quantity_by_type = orders.groupby('order_type')['quantity'].mean()
print(avg_quantity_by_type)
- Example 1: Total sales per dish.
- Discussion:
- Explain that groupby creates a grouped object on which aggregation functions (sum, mean, etc.) can be applied.
- Interactive Exercise:
- “How many orders (rows) were made for each order type?” (Use
count()
aggregation.) - “What is the overall average total_price across orders?”
- “How many orders (rows) were made for each order type?” (Use
4. Wrap-Up (5 minutes)
- Recap:
- We computed new columns (e.g., total_price) using apply and lambda functions.
- We applied conditional updates using masks and .loc.
- We performed aggregations with groupby to derive totals and averages.