Skip to main content

TODO: needs same CSV file as L30

File I/O and CSV Processing in Python

1. Introduction and Motivation (10 minutes)

A. Overview

  • Context:
    • In previous lectures, you learned how high‑level libraries like Pandas work with CSV files. Today, we’ll “go under the hood” and process CSVs manually using basic file I/O, loops, and string methods.
  • Goals:
    • Learn how to read a text file and convert it into a table-like structure (a list of lists).
    • Perform basic data cleaning and filtering using loops.
    • Write the filtered data back to a new CSV file.
  • Motivating Example:
    • Imagine a CSV file called orders.csv that contains restaurant orders. Each row has: date, dish, quantity, and order type (e.g., "dine-in" or "takeout").

B. Why Process CSVs Manually?

  • Understand the mechanics behind file reading and writing.
  • Gain insight into how libraries like Pandas automate these tasks.

2. Reading and Parsing CSV Files Manually (20 minutes)

A. Reading a File in Python

  • Basic File Reading:
    • Use the built‑in open function and readlines() to read the entire file into memory.
    • Code Example:
      # Open the file for reading
      file = open('orders.csv', 'r')
      lines = file.readlines() # Read all lines into a list
      file.close() # Always close the file after reading

      # Print out the raw lines to see what we have
      print(lines)
  • Discussion:
    • Explain the purpose of using open and readlines().
    • Emphasize the importance of closing the file after use.

B. Converting Lines to a List of Lists

  • Processing Each Line:
    • We need to remove newline characters and split each line by commas.
    • Step-by-Step Using a Loop (No List Comprehensions):
      # Initialize an empty list to hold our rows
      data = []

      # Process each line using a for loop
      for line in lines:
      # Remove newline characters from the end of the line
      clean_line = line.strip()
      # Split the line into individual cells using a comma as separator
      cells = clean_line.split(',')
      # Append the list of cells to our data list
      data.append(cells)

      # Print the resulting table (list of lists)
      print(data)
  • Interactive Exercise:
    • Ask students: “What do you expect to see printed? What does the first row look like (should be the header)?”

C. Separating Headers from Data

  • Storing Headers and Data Separately:
    # Assume the first row contains the header
    headers = data[0]
    # All subsequent rows are data
    orders = []
    for i in range(1, len(data)):
    orders.append(data[i])

    # Print headers and orders separately
    print("Headers:", headers)
    print("Orders:", orders)
  • Discussion:
    • Explain why separating headers from data can be useful for later processing.

D. Basic Data Cleaning and Filtering

  • Converting String Values to Appropriate Types:
    • For example, converting the quantity (third column) from string to integer.
    for row in orders:
    # Assume quantity is in the third cell (index 2)
    row[2] = int(row[2])
  • Filtering Rows:
    • For instance, selecting only orders that are "dine-in".
    dine_in_orders = []  # new list for filtered rows
    for row in orders:
    # Assume order type is in the fourth cell (index 3)
    if row[3] == 'dine-in':
    dine_in_orders.append(row)

    print("Dine-in Orders:", dine_in_orders)
  • Interactive Exercise:
    • Ask: “How many rows do you expect to see if you filter by 'dine-in'? Try running the code.”

3. Writing Data Back to a CSV File (20 minutes)

A. Preparing Data for Output

  • Combining Headers and Filtered Data:
    # Create a new data structure for output: header followed by filtered rows.
    output_data = []
    # Append headers first
    output_data.append(headers)
    # Append each filtered row
    for row in dine_in_orders:
    output_data.append(row)

B. Converting Lists to CSV Format

  • Converting Each Row to a String:
    • We need to join the cells of each row back into a comma-separated string.
    output_lines = []  # List to hold string versions of rows

    # Loop over each row in output_data
    for row in output_data:
    # Convert each cell to a string (if not already) and join them with commas
    line = ""
    for i in range(len(row)):
    # Append the cell to the line; add a comma if not the last cell
    line = line + str(row[i])
    if i < len(row) - 1:
    line = line + ","
    # Append the resulting string to output_lines
    output_lines.append(line)

    # Print output_lines to see the final CSV-formatted strings
    print(output_lines)
  • Interactive Discussion:
    • Ask: “Why do we need to convert each cell to a string? What does the join operation accomplish?”

C. Writing the Output to a File

  • Using the File Write Process:
    # Open a new file for writing the filtered data
    output_file = open('dine_in_orders.csv', 'w')

    # Write each line to the file, adding a newline at the end
    for line in output_lines:
    output_file.write(line + "\n")

    output_file.close()
  • Discussion:
    • Explain that the file is created/overwritten and that each line is written with a newline character.
    • Mention that Pandas’ to_csv automates these steps for us.

4. Wrap-Up (10 minutes)

Recap

  • We learned how to:
    • Read a CSV file using open() and readlines().
    • Process each line by stripping newlines and splitting on commas (using loops, not list comprehensions).
    • Separate headers from data.
    • Clean data by converting string values to proper types.
    • Filter rows using for loops.
    • Convert the processed data back to CSV format by joining cells in each row.
    • Write the final output to a new CSV file.
  • Connection to Pandas:
    • Manual CSV processing shows the “under the hood” work that Pandas automates (reading, cleaning, filtering, and writing data).