Day 30 - Computing columns with Pandas

Skills: None

Pre-reading: 10.1.4 (PROBABLY EXPAND), 10.1.5

Intro (10 mins)

Today we extend our work with Pandas DataFrames by computing new columns and aggregating data to answer questions.

You can create new columns using arithmetic and apply with lambda:

# Assume orders DataFrame and prices dictionary already defined
prices = {'Pasta': 12.0, 'Salad': 8.0, 'Burger': 10.0, 'Steak': 19.0}
orders['total_price'] = orders.apply(lambda row: row['quantity'] * prices.get(row['dish'], 0), axis=1)
print(orders[['dish', 'quantity', 'total_price']])

You can update columns conditionally using boolean masks and .loc:

discount_mask = orders['order_type'] == 'takeout'
orders.loc[discount_mask, 'total_price'] = orders.loc[discount_mask, 'total_price'] * 0.90

Aggregation and grouping let you summarize data:

total_quantity = orders['quantity'].sum()
sales_per_dish = orders.groupby('dish')['total_price'].sum()
avg_quantity_by_type = orders.groupby('order_type')['quantity'].mean()

These tools let you answer questions like "What is the total sales per dish?" or "What is the average quantity per order type?"

Class Exercises (35 mins)

Add a new column to your DataFrame called is_large_order that is True if quantity ≥ 3, otherwise False.
Create a new column called discounted_price that applies a 10% discount to takeout orders, and is equal to total_price otherwise.
Compute the total revenue (sum of discounted_price) for all orders.
Group the DataFrame by dish and compute the average quantity ordered for each dish.
For each order type, compute the total number of orders (rows).
What is the overall average total_price across all orders?
Sort the DataFrame by discounted_price in descending order and print the top 3 rows.
Save the DataFrame with your new columns to a new CSV file.
What happens if you try to group by a column that doesn't exist? Try it and note the error.
Create a column called order_label that is "big" if quantity ≥ 3, "medium" if quantity is 2, and "small" otherwise.
Filter the DataFrame to show only orders where discounted_price is greater than $20.

Wrap-up (5 mins)

You can compute new columns, update values conditionally, and aggregate data using Pandas.
These tools let you answer complex questions about your data efficiently.

Skills: None​

Pre-reading: 10.1.4 (PROBABLY EXPAND), 10.1.5​

Intro (10 mins)​

Class Exercises (35 mins)​

Wrap-up (5 mins)​

Skills: None

Pre-reading: 10.1.4 (PROBABLY EXPAND), 10.1.5

Intro (10 mins)

Class Exercises (35 mins)

Wrap-up (5 mins)