Day 29 - Pandas basics (creating, loading, accessing)
Skills: None
Pre-reading: 10.1.1, 10.1.2, 10.1.3
Reference: For all work with tables, refer to the Tables page in the menu at the top of the page! It covers Pyret tables, but it is a good reference regardless.
Intro (20 mins)
Today we learn Pandas, Python's library for working with tables (called DataFrames). Everything you learned about Pyret tables has a Pandas equivalent, but with different syntax.
Creating tables literally (like Pyret's table: ... row: ... end):
# Pyret way:
# orders = table: date, dish, quantity, order_type
#   row: "2023-07-01", "Pasta", 2, "dine-in"
#   row: "2023-07-01", "Salad", 1, "takeout"
#   row: "2023-07-02", "Burger", 3, "dine-in"
# end
# Pandas way -- note that data is given column by column, rather than row by row.
import pandas as pd
data = {
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in']
}
orders = pd.DataFrame(data)
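If thinking row by row (as in Pyret) feels more natural, pd.DataFrame also accepts a list of dictionaries, one dictionary per row. The sketch below rebuilds the same three orders both ways and checks that the two tables come out identical; the names by_column and by_row are just for illustration.

```python
import pandas as pd

# Column by column: each key is a column name, each value is that column's list.
by_column = pd.DataFrame({
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in'],
})

# Row by row, closer to Pyret's row: syntax: each dictionary is one row.
by_row = pd.DataFrame([
    {'date': '2023-07-01', 'dish': 'Pasta', 'quantity': 2, 'order_type': 'dine-in'},
    {'date': '2023-07-01', 'dish': 'Salad', 'quantity': 1, 'order_type': 'takeout'},
    {'date': '2023-07-02', 'dish': 'Burger', 'quantity': 3, 'order_type': 'dine-in'},
])

print(by_row)
print(by_column.equals(by_row))  # True: same columns, same values, same order
```

Column-by-column is usually more convenient when you already have each column as a list; the list-of-dictionaries form mirrors how rows are written in Pyret.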
Loading from CSV (like Pyret's load-table: ... source: csv-table-url(...)):
# Pyret way:
# orders = load-table: date, dish, quantity, order_type
# source: csv-table-url("https://pdi.run/f25-restaurant-orders.csv", default-options)
# end
# Pandas way -- note that read_csv takes the column names from the CSV's header row, so you don't list them yourself.
orders = pd.read_csv("https://pdi.run/f25-restaurant-orders.csv")
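To experiment with read_csv without a network connection, you can hand it CSV text wrapped in io.StringIO, since read_csv accepts file-like objects as well as URLs and paths. The CSV text below is made-up sample data in the same shape as the orders table, not the contents of the course file.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a downloaded file.
csv_text = """date,dish,quantity,order_type
2023-07-01,Pasta,2,dine-in
2023-07-01,Salad,1,takeout
2023-07-02,Burger,3,dine-in
"""

orders = pd.read_csv(io.StringIO(csv_text))

print(list(orders.columns))  # column names come from the header row
print(orders.dtypes)         # quantity is parsed as an integer column
print(orders.head())         # first 5 rows (here, all 3)
```

Besides naming the columns, read_csv also guesses each column's type, which is why quantity can be averaged right away.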
Accessing data (like Pyret's .row-n(N) and row["column"]):
# Pyret way:
# orders.row-n(1)["dish"]
# Pandas way:
orders.iloc[1] # Get row by position, counting from 0 (like .row-n)
orders.iloc[1]['dish'] # Get single value from row (like .row-n(1)["dish"])
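The same single value can be reached in a few equivalent ways. A minimal sketch on the three-row orders table (rebuilt here so the block runs on its own). Note that .loc looks rows up by their index label rather than their position; the two agree here only because pd.DataFrame gives rows the default labels 0, 1, 2, and so on.

```python
import pandas as pd

orders = pd.DataFrame({
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in'],
})

row = orders.iloc[1]        # a Series whose labels are the column names
print(row)

# Three ways to get the dish in row position 1 (the second row):
a = orders.iloc[1]['dish']  # row first, then column (as in the notes)
b = orders['dish'].iloc[1]  # column first, then row
c = orders.loc[1, 'dish']   # row label and column name together
print(a, b, c)              # Salad Salad Salad
```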
Extracting columns as lists (like Pyret's .get-column()):
# Pyret way:
# quantities = orders.get-column("quantity") # Extract column as list
# Pandas way:
quantities = orders['quantity'] # Extract column as Series -- not quite a list, but similar
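When you need an actual Python list (for example, to pass the values to a function written for lists), a Series can be converted with .tolist(). A short sketch on a trimmed-down version of the orders table:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
})

quantities = orders['quantity']      # a pandas Series
quantity_list = quantities.tolist()  # a plain Python list

print(type(quantities).__name__)     # Series
print(quantity_list)                 # [2, 1, 3]
```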
Built-in statistics (like Pyret's mean(table, "column")):
# Pyret way:
# mean(orders, "quantity") # Direct table operation
# sum(orders, "quantity") # Direct table operation
# Pandas way:
orders['quantity'].mean() # Series operation
orders['quantity'].sum() # Series operation
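These methods return the same numbers you would compute by hand. For the three sample orders (quantities 2, 1 and 3), the sum is 6 and the mean is the sum divided by the number of rows, 6 / 3 = 2.0. A quick check:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
})

total = orders['quantity'].sum()     # 2 + 1 + 3 = 6
average = orders['quantity'].mean()  # 6 / 3 = 2.0
count = len(orders)                  # number of rows: 3

print(total, average, count)
print(average == total / count)      # True
```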
Class Exercises (35 mins)
Creating and loading tables:
- Create a DataFrame manually with workout data: columns date, activity, duration. Make at least 5 rows.
- Load the CSV from https://pdi.run/f25-2000-photos.csv into a DataFrame. Print the first 5 rows.
Accessing data:
- Get the second row from your workout DataFrame (remember: Python uses 0-based indexing).
- Extract the activity column and print all unique activity names.
- Get the duration value from the third workout (combining row and column access).
- What happens if you try to access a row that doesn't exist? Try it and note the error.
- What happens if you try to access a column that doesn't exist? Try it and note the error.
Extracting columns and statistics:
- Extract the duration column from your workout DataFrame and store it in a variable called durations.
- Work with the durations Series to find its .mean(), .sum(), .max(), and .min().
- Calculate the range (difference between max and min) of workout durations.
- For the photos dataset, extract a numeric column and calculate its median using .median().
Wrap-up (5 mins)
- Pandas DataFrames provide the same basic table operations you learned with Pyret on Day 7: creating tables, loading from CSV, accessing rows and columns, and computing statistics.
- The syntax is different, but the concepts are identical. Tomorrow we'll see filtering, sorting, and column operations.