Day 30 - Computing columns with Pandas
Skills: None
Pre-reading: 10.1.4 (PROBABLY EXPAND), 10.1.5
Intro (10 mins)
- Today we extend our work with Pandas DataFrames by computing new columns and aggregating data to answer questions.
- You can create new columns using arithmetic and
apply
withlambda
:# Assume orders DataFrame and prices dictionary already defined
prices = {'Pasta': 12.0, 'Salad': 8.0, 'Burger': 10.0, 'Steak': 19.0}
orders['total_price'] = orders.apply(lambda row: row['quantity'] * prices.get(row['dish'], 0), axis=1)
print(orders[['dish', 'quantity', 'total_price']]) - You can update columns conditionally using boolean masks and
.loc
:discount_mask = orders['order_type'] == 'takeout'
orders.loc[discount_mask, 'total_price'] = orders.loc[discount_mask, 'total_price'] * 0.90 - Aggregation and grouping let you summarize data:
total_quantity = orders['quantity'].sum()
sales_per_dish = orders.groupby('dish')['total_price'].sum()
avg_quantity_by_type = orders.groupby('order_type')['quantity'].mean() - These tools let you answer questions like "What is the total sales per dish?" or "What is the average quantity per order type?"
Class Exercises (35 mins)
- Add a new column to your DataFrame called
is_large_order
that isTrue
if quantity ≥ 3, otherwiseFalse
. - Create a new column called
discounted_price
that applies a 10% discount to takeout orders, and is equal tototal_price
otherwise. - Compute the total revenue (sum of
discounted_price
) for all orders. - Group the DataFrame by
dish
and compute the average quantity ordered for each dish. - For each order type, compute the total number of orders (rows).
- What is the overall average
total_price
across all orders? - Sort the DataFrame by
discounted_price
in descending order and print the top 3 rows. - Save the DataFrame with your new columns to a new CSV file.
- What happens if you try to group by a column that doesn't exist? Try it and note the error.
- Create a column called
order_label
that is"big"
ifquantity
≥ 3,"medium"
ifquantity
is 2, and"small"
otherwise. - Filter the DataFrame to show only orders where
discounted_price
is greater than $20.
Wrap-up (5 mins)
- You can compute new columns, update values conditionally, and aggregate data using Pandas.
- These tools let you answer complex questions about your data efficiently.