Skip to main content

Day 29 - Tables in Python

Skills: None

Pre-reading: 10.1.1, 10.1.2, 10.1.3

Intro (10 mins)

  • Today we introduce Pandas, a popular Python library for working with tables (DataFrames).
  • A DataFrame is a table with rows and columns, similar to a spreadsheet or a Pyret table.
  • A Series is a single column (or row) of data from a DataFrame.
  • Example: Restaurant orders table, with columns for date, dish, quantity, and order type.
    import pandas as pd

    data = {
    'date': ['2023-07-01', '2023-07-01', '2023-07-02', '2023-07-02', '2023-07-03'],
    'dish': ['Pasta', 'Salad', 'Burger', 'Pasta', 'Steak'],
    'quantity': [2, 1, 3, 2, 1],
    'order_type': ['dine-in', 'takeout', 'dine-in', 'dine-in', 'takeout']
    }
    orders = pd.DataFrame(data)
    print(orders)
  • You can also load data from a CSV file (locally or, as shown here, from a URL):
    orders = pd.read_csv("https://pdi.run/f25-restaurant-orders.csv")
  • Access columns by label: orders['dish'], rows by index: orders.iloc[2].
  • Filtering: Use boolean masks to select rows, e.g.:
    dine_in_orders = orders[orders['order_type'] == 'dine-in']
  • Cleaning: Replace missing values (NaN) with a default:
    orders['order_type'] = orders['order_type'].fillna('unknown')
  • All of these operations are similar to what you did with tables in Pyret, but with different syntax and, more complexity (though also, more power).

Class Exercises (35 mins)

  • Load a CSV file (either from a URL or local file) into a DataFrame. Consider using an example from earlier in class. Print the first 5 rows.
  • Create a DataFrame manually with at least 5 rows and columns: date, dish, quantity, order_type.
  • Access the dish column and print all unique dish names.
  • What does orders['quantity'][1] return? Try it and explain.
  • Filter the DataFrame to show only orders where order_type is "takeout".
  • Replace all missing values in the order_type column with "unknown".
  • Filter the DataFrame to show only orders where quantity is greater than 1.
  • What does orders[orders['dish'] == 'Pasta'] return? Try it and explain.
  • Filter the DataFrame to show only orders from a specific date (e.g., "2023-07-02").
  • Add a new column called total_price that is quantity * 10 (assume each dish costs $10).
  • What happens if you try to access a column that doesn't exist? Try it and note the error.
  • Save your filtered DataFrame to a new CSV file.
  • Try sorting the DataFrame by quantity in descending order.

Wrap-up (5 mins)

  • Pandas DataFrames let you work with tables in Python, similar to Pyret tables but with more features.
  • You can create, load, clean, filter, and manipulate tabular data efficiently.