Day 29 - Tables in Python

Skills: None

Pre-reading: 10.1.1, 10.1.2, 10.1.3

Intro (10 mins)

Today we introduce Pandas, a popular Python library for working with tables (DataFrames).
A DataFrame is a table with rows and columns, similar to a spreadsheet or a Pyret table.
A Series is a single column (or row) of data from a DataFrame.

Example: Restaurant orders table, with columns for date, dish, quantity, and order type.

import pandas as pd

data = {
    'date': ['2023-07-01', '2023-07-01', '2023-07-02', '2023-07-02', '2023-07-03'],
    'dish': ['Pasta', 'Salad', 'Burger', 'Pasta', 'Steak'],
    'quantity': [2, 1, 3, 2, 1],
    'order_type': ['dine-in', 'takeout', 'dine-in', 'dine-in', 'takeout']
}
orders = pd.DataFrame(data)
print(orders)

You can also load data from a CSV file (locally or, as shown here, from a URL):
```
orders = pd.read_csv("https://pdi.run/f25-restaurant-orders.csv")
```
Access columns by label: orders['dish'], rows by index: orders.iloc[2].

Filtering: Use boolean masks to select rows, e.g.:

dine_in_orders = orders[orders['order_type'] == 'dine-in']

Cleaning: Replace missing values (NaN) with a default:

orders['order_type'] = orders['order_type'].fillna('unknown')

All of these operations are similar to what you did with tables in Pyret, but with different syntax and, more complexity (though also, more power).

Class Exercises (35 mins)

Load a CSV file (either from a URL or local file) into a DataFrame. Consider using an example from earlier in class. Print the first 5 rows.
Create a DataFrame manually with at least 5 rows and columns: date, dish, quantity, order_type.
Access the dish column and print all unique dish names.
What does orders['quantity'][1] return? Try it and explain.
Filter the DataFrame to show only orders where order_type is "takeout".
Replace all missing values in the order_type column with "unknown".
Filter the DataFrame to show only orders where quantity is greater than 1.
What does orders[orders['dish'] == 'Pasta'] return? Try it and explain.
Filter the DataFrame to show only orders from a specific date (e.g., "2023-07-02").
Add a new column called total_price that is quantity * 10 (assume each dish costs $10).
What happens if you try to access a column that doesn't exist? Try it and note the error.
Save your filtered DataFrame to a new CSV file.
Try sorting the DataFrame by quantity in descending order.

Wrap-up (5 mins)

Pandas DataFrames let you work with tables in Python, similar to Pyret tables but with more features.
You can create, load, clean, filter, and manipulate tabular data efficiently.

Skills: None​

Pre-reading: 10.1.1, 10.1.2, 10.1.3​

Intro (10 mins)​

Class Exercises (35 mins)​

Wrap-up (5 mins)​

Skills: None

Pre-reading: 10.1.1, 10.1.2, 10.1.3

Intro (10 mins)

Class Exercises (35 mins)

Wrap-up (5 mins)