Skip to main content

Day 31 - Computing columns with Pandas

Computing New Columns and Basic Statistics in Pandas

1. Introduction (10 minutes)

  • Overview:
    • Recap our restaurant orders DataFrame from the previous lecture.
    • Today we extend it by computing new columns and aggregating data to derive insights.
  • Learning Goals:
    • Learn how to create new columns using operator lifting.
    • Understand how to use boolean masks for conditional updates.
    • Perform basic aggregation with groupby.

2. Computing New Columns (20 minutes)

  • Adding a Total Price Column:
    • Assume each dish has a fixed price given in a separate dictionary.
    • Example Code:
      # Price list for dishes
      prices = {'Pasta': 12.0, 'Salad': 8.0, 'Burger': 10.0, 'Steak': 20.0}

      # Create a new column 'total_price'
      orders['total_price'] = orders.apply(lambda row: row['quantity'] * prices.get(row['dish'], 0), axis=1)
      print(orders[['dish', 'quantity', 'total_price']])
  • Updating Columns Conditionally:
    • Suppose we want to apply a 10% discount to takeout orders.
    • Example Code:
      discount_mask = orders['order_type'] == 'takeout'
      orders.loc[discount_mask, 'total_price'] = orders.loc[discount_mask, 'total_price'] * 0.90
      print(orders[['order_type', 'total_price']])
  • Interactive Exercise:
    • Ask: “How would you create a column that flags orders as 'high' if quantity ≥ 3, otherwise 'low'?”
    • Have students work in pairs to write a solution using a lambda function and apply.

3. Aggregating and Grouping Data (20 minutes)

  • Aggregating Overall Totals:
    • Compute the total quantity of dishes sold.
      total_quantity = orders['quantity'].sum()
      print("Total dishes sold:", total_quantity)
  • GroupBy Operations:
    • Example 1: Total sales per dish.
      sales_per_dish = orders.groupby('dish')['total_price'].sum()
      print(sales_per_dish)
    • Example 2: Average quantity per order type.
      avg_quantity_by_type = orders.groupby('order_type')['quantity'].mean()
      print(avg_quantity_by_type)
  • Discussion:
    • Explain that groupby creates a grouped object on which aggregation functions (sum, mean, etc.) can be applied.
  • Interactive Exercise:
    • “How many orders (rows) were made for each order type?” (Use count() aggregation.)
    • “What is the overall average total_price across orders?”

4. Wrap-Up (5 minutes)

  • Recap:
    • We computed new columns (e.g., total_price) using apply and lambda functions.
    • We applied conditional updates using masks and .loc.
    • We performed aggregations with groupby to derive totals and averages.