Day 29 - Tables in Python
Skills: None
Pre-reading: 10.1.1, 10.1.2, 10.1.3
Intro (10 mins)
- Today we introduce Pandas, a popular Python library for working with tables (DataFrames).
- A DataFrame is a table with rows and columns, similar to a spreadsheet or a Pyret table.
- A Series is a single column (or row) of data from a DataFrame.
- Example: Restaurant orders table, with columns for date, dish, quantity, and order type.
import pandas as pd
data = {
'date': ['2023-07-01', '2023-07-01', '2023-07-02', '2023-07-02', '2023-07-03'],
'dish': ['Pasta', 'Salad', 'Burger', 'Pasta', 'Steak'],
'quantity': [2, 1, 3, 2, 1],
'order_type': ['dine-in', 'takeout', 'dine-in', 'dine-in', 'takeout']
}
orders = pd.DataFrame(data)
print(orders) - You can also load data from a CSV file (locally or, as shown here, from a URL):
orders = pd.read_csv("https://pdi.run/f25-restaurant-orders.csv")
- Access columns by label:
orders['dish']
, rows by index:orders.iloc[2]
. - Filtering: Use boolean masks to select rows, e.g.:
dine_in_orders = orders[orders['order_type'] == 'dine-in']
- Cleaning: Replace missing values (NaN) with a default:
orders['order_type'] = orders['order_type'].fillna('unknown')
- All of these operations are similar to what you did with tables in Pyret, but with different syntax and, more complexity (though also, more power).
Class Exercises (35 mins)
- Load a CSV file (either from a URL or local file) into a DataFrame. Consider using an example from earlier in class. Print the first 5 rows.
- Create a DataFrame manually with at least 5 rows and columns:
date
,dish
,quantity
,order_type
. - Access the
dish
column and print all unique dish names. - What does
orders['quantity'][1]
return? Try it and explain. - Filter the DataFrame to show only orders where
order_type
is"takeout"
. - Replace all missing values in the
order_type
column with"unknown"
. - Filter the DataFrame to show only orders where
quantity
is greater than 1. - What does
orders[orders['dish'] == 'Pasta']
return? Try it and explain. - Filter the DataFrame to show only orders from a specific date (e.g.,
"2023-07-02"
). - Add a new column called
total_price
that isquantity * 10
(assume each dish costs $10). - What happens if you try to access a column that doesn't exist? Try it and note the error.
- Save your filtered DataFrame to a new CSV file.
- Try sorting the DataFrame by
quantity
in descending order.
Wrap-up (5 mins)
- Pandas DataFrames let you work with tables in Python, similar to Pyret tables but with more features.
- You can create, load, clean, filter, and manipulate tabular data efficiently.