Skip to main content

Day 30 - Computing columns with Pandas

Skills: None

Pre-reading: 10.1.4 (PROBABLY EXPAND), 10.1.5

Intro (10 mins)

  • Today we extend our work with Pandas DataFrames by computing new columns and aggregating data to answer questions.
  • You can create new columns using arithmetic and apply with lambda:
    # Assume orders DataFrame and prices dictionary already defined
    prices = {'Pasta': 12.0, 'Salad': 8.0, 'Burger': 10.0, 'Steak': 19.0}
    orders['total_price'] = orders.apply(lambda row: row['quantity'] * prices.get(row['dish'], 0), axis=1)
    print(orders[['dish', 'quantity', 'total_price']])
  • You can update columns conditionally using boolean masks and .loc:
    discount_mask = orders['order_type'] == 'takeout'
    orders.loc[discount_mask, 'total_price'] = orders.loc[discount_mask, 'total_price'] * 0.90
  • Aggregation and grouping let you summarize data:
    total_quantity = orders['quantity'].sum()
    sales_per_dish = orders.groupby('dish')['total_price'].sum()
    avg_quantity_by_type = orders.groupby('order_type')['quantity'].mean()
  • These tools let you answer questions like "What is the total sales per dish?" or "What is the average quantity per order type?"

Class Exercises (35 mins)

  • Add a new column to your DataFrame called is_large_order that is True if quantity ≥ 3, otherwise False.
  • Create a new column called discounted_price that applies a 10% discount to takeout orders, and is equal to total_price otherwise.
  • Compute the total revenue (sum of discounted_price) for all orders.
  • Group the DataFrame by dish and compute the average quantity ordered for each dish.
  • For each order type, compute the total number of orders (rows).
  • What is the overall average total_price across all orders?
  • Sort the DataFrame by discounted_price in descending order and print the top 3 rows.
  • Save the DataFrame with your new columns to a new CSV file.
  • What happens if you try to group by a column that doesn't exist? Try it and note the error.
  • Create a column called order_label that is "big" if quantity ≥ 3, "medium" if quantity is 2, and "small" otherwise.
  • Filter the DataFrame to show only orders where discounted_price is greater than $20.

Wrap-up (5 mins)

  • You can compute new columns, update values conditionally, and aggregate data using Pandas.
  • These tools let you answer complex questions about your data efficiently.