
Day 29 - Pandas basics (creating, loading, accessing)

Skills: None

Pre-reading: 10.1.1, 10.1.2, 10.1.3

Reference: For all work with tables, refer to the Tables page in the menu at the top of the page. It covers Pyret tables, but it is a good reference regardless.

Intro (20 mins)

Today we learn Pandas, Python's library for working with tables (called DataFrames). Everything you learned about Pyret tables has a Pandas equivalent, but with different syntax.

Creating tables literally (like Pyret's table: ... row: ... end):

# Pyret way:
# orders = table: date, dish, quantity, order_type
# row: "2023-07-01", "Pasta", 2, "dine-in"
# row: "2023-07-01", "Salad", 1, "takeout"
# end

# Pandas way -- note that data is given column by column, rather than row by row.
import pandas as pd
data = {
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in']
}
orders = pd.DataFrame(data)
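
After building a DataFrame it helps to check that it has the shape you expect. The sketch below rebuilds the same orders table and inspects it. It also shows a row-by-row construction from a list of dictionaries, which may feel closer to Pyret's row: syntax. The names rows and orders_by_row are made up for this illustration.

```python
import pandas as pd

# Column-by-column construction, as in the example above.
data = {
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in'],
}
orders = pd.DataFrame(data)

print(orders.shape)          # (number of rows, number of columns)
print(list(orders.columns))  # column names, in the order given
print(orders.head(2))        # first two rows

# Row-by-row alternative: one dictionary per row (illustrative names).
rows = [
    {'date': '2023-07-01', 'dish': 'Pasta', 'quantity': 2, 'order_type': 'dine-in'},
    {'date': '2023-07-01', 'dish': 'Salad', 'quantity': 1, 'order_type': 'takeout'},
    {'date': '2023-07-02', 'dish': 'Burger', 'quantity': 3, 'order_type': 'dine-in'},
]
orders_by_row = pd.DataFrame(rows)
```

Both constructions should produce the same table; which one reads better usually depends on whether your data arrives as columns or as records.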

Loading from CSV (like Pyret's load-table: ... source: csv-table-url(...)):

# Pyret way:
# orders = load-table: date, dish, quantity, order_type
# source: csv-table-url("https://pdi.run/f25-restaurant-orders.csv", default-options)
# end

# Pandas way -- note that it automatically infers column names from the header row.
orders = pd.read_csv("https://pdi.run/f25-restaurant-orders.csv")
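
To see what read_csv infers without needing a network connection, here is a minimal sketch that reads a small CSV from an in-memory string. The CSV text is invented for illustration and is not the contents of the course file; only the column names are borrowed from the example above.

```python
import io
import pandas as pd

# A tiny stand-in CSV (made-up data); the first line is the header row.
csv_text = """date,dish,quantity,order_type
2023-07-01,Pasta,2,dine-in
2023-07-01,Salad,1,takeout
"""

# read_csv accepts a URL, a file path, or a file-like object such as StringIO.
orders = pd.read_csv(io.StringIO(csv_text))

print(list(orders.columns))  # names taken from the header row
print(orders.dtypes)         # column types inferred from the values
```

Notice that quantity is read as a number while date stays as text unless you ask Pandas to parse it.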

Accessing data (like Pyret's .row-n(N) and row["column"]):

# Pyret way:
# orders.row-n(1)["dish"]

# Pandas way:
orders.iloc[1] # Get row by position, counting from 0 (like .row-n)
orders.iloc[1]['dish'] # Get single value from row (like .row-n(1)["dish"])
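
As a small worked example of row and value access, the sketch below uses the three-row orders table from earlier. It also shows .loc, an alternative that takes a row label and a column name together; with the default numbering, row labels and positions happen to coincide.

```python
import pandas as pd

orders = pd.DataFrame({
    'date': ['2023-07-01', '2023-07-01', '2023-07-02'],
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
    'order_type': ['dine-in', 'takeout', 'dine-in'],
})

second_row = orders.iloc[1]        # a single row comes back as a Series
print(type(second_row).__name__)   # Series
print(second_row['dish'])          # Salad

# Same value in one step: row label 1, column 'dish'.
dish = orders.loc[1, 'dish']
print(dish)                        # Salad
```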

Extracting columns as lists (like Pyret's .get-column()):

# Pyret way:
# quantities = orders.get-column("quantity") # Extract column as list

# Pandas way:
quantities = orders['quantity'] # Extract column as Series -- not quite a list, but similar
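
If you need an ordinary Python list rather than a Series, .tolist() converts one. A minimal sketch, again using the small orders table (the name quantity_list is made up here):

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
})

quantities = orders['quantity']       # a Series: values plus row labels
quantity_list = quantities.tolist()   # a plain Python list of the values

print(quantity_list)         # [2, 1, 3]
print(len(quantities))       # 3
print(quantities.iloc[0])    # 2, first value by position
```

A Series keeps its row labels and comes with built-in methods, which is why it is usually worth keeping as a Series until you actually need a list.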

Built-in statistics (like Pyret's mean(table, "column")):

# Pyret way:
# mean(orders, "quantity") # Direct table operation
# sum(orders, "quantity") # Direct table operation

# Pandas way:
orders['quantity'].mean() # Series operation
orders['quantity'].sum() # Series operation
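
For a concrete check of these methods, here they are on the three quantities 2, 1, and 3 from the earlier table, along with a few related Series methods:

```python
import pandas as pd

orders = pd.DataFrame({
    'dish': ['Pasta', 'Salad', 'Burger'],
    'quantity': [2, 1, 3],
})

quantities = orders['quantity']

print(quantities.mean())    # 2.0
print(quantities.sum())     # 6
print(quantities.max())     # 3
print(quantities.min())     # 1
print(quantities.median())  # 2.0
```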

Class Exercises (35 mins)

Creating and loading tables:

  • Create a DataFrame manually with workout data: columns date, activity, duration. Make at least 5 rows.
  • Load the CSV from https://pdi.run/f25-2000-photos.csv into a DataFrame. Print the first 5 rows.

Accessing data:

  • Get the second row from your workout DataFrame (remember: Python uses 0-based indexing).
  • Extract the activity column and print all unique activity names.
  • Get the duration value from the third workout (combining row and column access).
  • What happens if you try to access a row that doesn't exist? Try it and note the error.
  • What happens if you try to access a column that doesn't exist? Try it and note the error.

Extracting columns and statistics:

  • Extract the duration column from your workout DataFrame and store it in a variable called durations.
  • Work with the durations Series to find: .mean(), .sum(), .max(), .min().
  • Calculate the range (difference between max and min) of workout durations.
  • For the photos dataset, extract a numeric column and calculate its median using .median().

Wrap-up (5 mins)

  • Pandas DataFrames provide the same basic table operations you learned in Pyret Day 7: creating tables, loading from CSV, accessing rows/columns, and computing statistics.
  • The syntax is different, but the concepts are identical. Tomorrow we'll see filtering, sorting, and column operations.