Day 30 - Tables in Python
TODO: make an actual CSV with restaurant orders, replace placeholder below
Introduction to Tables in Python with Pandas
1. Introduction (10 minutes)
- Motivation & Overview:
- Introduce Pandas as a popular package for table-based data analysis in Python.
- Key Concepts:
- DataFrame: A table with rows and columns.
- Series: A single column (or row) of data.
- Fresh Example Context:
- A restaurant order dataset, where each order has a date, dish name, quantity, and order type (dine-in or takeout).
2. Creating and Loading DataFrames (15 minutes)
- Manually Creating a DataFrame:
- Example Code:
import pandas as pd
# Create a simple restaurant orders DataFrame manually
data = {
'date': ['2023-07-01', '2023-07-01', '2023-07-02', '2023-07-02', '2023-07-03'],
'dish': ['Pasta', 'Salad', 'Burger', 'Pasta', 'Steak'],
'quantity': [2, 1, 3, 2, 1],
'order_type': ['dine-in', 'takeout', 'dine-in', 'dine-in', 'takeout']
}
orders = pd.DataFrame(data)
print(orders)
- Example Code:
- Loading Data from CSV:
- Show how to load a CSV from a URL (use a different CSV URL or assume a local file for this example).
orders_url = "https://example.com/restaurant_orders.csv" # replace with a valid URL
orders = pd.read_csv(orders_url, header=0)
- Show how to load a CSV from a URL (use a different CSV URL or assume a local file for this example).
- Accessing Data:
- Access columns by label (e.g.,
orders['dish']
) and rows by index (e.g.,orders.iloc[2]
).
- Access columns by label (e.g.,
- Interactive Exercise:
- Ask: “What does
orders['quantity'][1]
return?” - Have students experiment using interactive Python sessions.
- Ask: “What does
3. Data Cleaning and Filtering Rows (20 minutes)
- Cleaning Data:
- Suppose the CSV sometimes uses missing values (NaN) for order type.
- Example Code to Replace NaN:
# Replace missing values in order_type with 'unknown'
orders['order_type'] = orders['order_type'].fillna('unknown')
- Filtering Rows:
- Filter rows to show only dine-in orders.
dine_in_orders = orders[orders['order_type'] == 'dine-in']
print(dine_in_orders)
- Filter rows to show only dine-in orders.
- Using Boolean Masks:
- Explain that masks are arrays of booleans. For example:
mask = orders['quantity'] > 1
filtered_orders = orders[mask]
print(filtered_orders)
- Explain that masks are arrays of booleans. For example:
- Interactive Questions:
- “What does
orders[orders['dish'] == 'Pasta']
return?” - “Try filtering orders to show only those from 2023-07-02.”
- “What does
4. Wrap-Up (5 minutes)
- Recap:
- We introduced Pandas DataFrames and Series, manually creating DataFrames and loading from CSV files.
- We practiced basic data cleaning (filling missing values) and filtering rows using boolean masks.