Skip to main content

Day 30 - Tables in Python

TODO: make an actual CSV with restaurant orders, replace placeholder below

Introduction to Tables in Python with Pandas

1. Introduction (10 minutes)

  • Motivation & Overview:
    • Introduce Pandas as a popular package for table-based data analysis in Python.
  • Key Concepts:
    • DataFrame: A table with rows and columns.
    • Series: A single column (or row) of data.
  • Fresh Example Context:
    • A restaurant order dataset, where each order has a date, dish name, quantity, and order type (dine-in or takeout).

2. Creating and Loading DataFrames (15 minutes)

  • Manually Creating a DataFrame:
    • Example Code:
      import pandas as pd

      # Create a simple restaurant orders DataFrame manually
      data = {
      'date': ['2023-07-01', '2023-07-01', '2023-07-02', '2023-07-02', '2023-07-03'],
      'dish': ['Pasta', 'Salad', 'Burger', 'Pasta', 'Steak'],
      'quantity': [2, 1, 3, 2, 1],
      'order_type': ['dine-in', 'takeout', 'dine-in', 'dine-in', 'takeout']
      }
      orders = pd.DataFrame(data)
      print(orders)
  • Loading Data from CSV:
    • Show how to load a CSV from a URL (use a different CSV URL or assume a local file for this example).
      orders_url = "https://example.com/restaurant_orders.csv"  # replace with a valid URL
      orders = pd.read_csv(orders_url, header=0)
  • Accessing Data:
    • Access columns by label (e.g., orders['dish']) and rows by index (e.g., orders.iloc[2]).
  • Interactive Exercise:
    • Ask: “What does orders['quantity'][1] return?”
    • Have students experiment using interactive Python sessions.

3. Data Cleaning and Filtering Rows (20 minutes)

  • Cleaning Data:
    • Suppose the CSV sometimes uses missing values (NaN) for order type.
    • Example Code to Replace NaN:
      # Replace missing values in order_type with 'unknown'
      orders['order_type'] = orders['order_type'].fillna('unknown')
  • Filtering Rows:
    • Filter rows to show only dine-in orders.
      dine_in_orders = orders[orders['order_type'] == 'dine-in']
      print(dine_in_orders)
  • Using Boolean Masks:
    • Explain that masks are arrays of booleans. For example:
      mask = orders['quantity'] > 1
      filtered_orders = orders[mask]
      print(filtered_orders)
  • Interactive Questions:
    • “What does orders[orders['dish'] == 'Pasta'] return?”
    • “Try filtering orders to show only those from 2023-07-02.”

4. Wrap-Up (5 minutes)

  • Recap:
    • We introduced Pandas DataFrames and Series, manually creating DataFrames and loading from CSV files.
    • We practiced basic data cleaning (filling missing values) and filtering rows using boolean masks.