TODO: needs same CSV file as L30
File I/O and CSV Processing in Python
1. Introduction and Motivation (10 minutes)
A. Overview
- Context:
- In previous lectures, you learned how high‑level libraries like Pandas work with CSV files. Today, we’ll “go under the hood” and process CSVs manually using basic file I/O, loops, and string methods.
- Goals:
- Learn how to read a text file and convert it into a table-like structure (a list of lists).
- Perform basic data cleaning and filtering using loops.
- Write the filtered data back to a new CSV file.
- Motivating Example:
- Imagine a CSV file called
orders.csv
that contains restaurant orders. Each row has: date, dish, quantity, and order type (e.g., "dine-in" or "takeout").
- Imagine a CSV file called
B. Why Process CSVs Manually?
- Understand the mechanics behind file reading and writing.
- Gain insight into how libraries like Pandas automate these tasks.
2. Reading and Parsing CSV Files Manually (20 minutes)
A. Reading a File in Python
- Basic File Reading:
- Use the built‑in
open
function andreadlines()
to read the entire file into memory. - Code Example:
# Open the file for reading
file = open('orders.csv', 'r')
lines = file.readlines() # Read all lines into a list
file.close() # Always close the file after reading
# Print out the raw lines to see what we have
print(lines)
- Use the built‑in
- Discussion:
- Explain the purpose of using
open
andreadlines()
. - Emphasize the importance of closing the file after use.
- Explain the purpose of using
B. Converting Lines to a List of Lists
- Processing Each Line:
- We need to remove newline characters and split each line by commas.
- Step-by-Step Using a Loop (No List Comprehensions):
# Initialize an empty list to hold our rows
data = []
# Process each line using a for loop
for line in lines:
# Remove newline characters from the end of the line
clean_line = line.strip()
# Split the line into individual cells using a comma as separator
cells = clean_line.split(',')
# Append the list of cells to our data list
data.append(cells)
# Print the resulting table (list of lists)
print(data)
- Interactive Exercise:
- Ask students: “What do you expect to see printed? What does the first row look like (should be the header)?”
C. Separating Headers from Data
- Storing Headers and Data Separately:
# Assume the first row contains the header
headers = data[0]
# All subsequent rows are data
orders = []
for i in range(1, len(data)):
orders.append(data[i])
# Print headers and orders separately
print("Headers:", headers)
print("Orders:", orders) - Discussion:
- Explain why separating headers from data can be useful for later processing.
D. Basic Data Cleaning and Filtering
- Converting String Values to Appropriate Types:
- For example, converting the quantity (third column) from string to integer.
for row in orders:
# Assume quantity is in the third cell (index 2)
row[2] = int(row[2]) - Filtering Rows:
- For instance, selecting only orders that are "dine-in".
dine_in_orders = [] # new list for filtered rows
for row in orders:
# Assume order type is in the fourth cell (index 3)
if row[3] == 'dine-in':
dine_in_orders.append(row)
print("Dine-in Orders:", dine_in_orders) - Interactive Exercise:
- Ask: “How many rows do you expect to see if you filter by 'dine-in'? Try running the code.”
3. Writing Data Back to a CSV File (20 minutes)
A. Preparing Data for Output
- Combining Headers and Filtered Data:
# Create a new data structure for output: header followed by filtered rows.
output_data = []
# Append headers first
output_data.append(headers)
# Append each filtered row
for row in dine_in_orders:
output_data.append(row)
B. Converting Lists to CSV Format
- Converting Each Row to a String:
- We need to join the cells of each row back into a comma-separated string.
output_lines = [] # List to hold string versions of rows
# Loop over each row in output_data
for row in output_data:
# Convert each cell to a string (if not already) and join them with commas
line = ""
for i in range(len(row)):
# Append the cell to the line; add a comma if not the last cell
line = line + str(row[i])
if i < len(row) - 1:
line = line + ","
# Append the resulting string to output_lines
output_lines.append(line)
# Print output_lines to see the final CSV-formatted strings
print(output_lines) - Interactive Discussion:
- Ask: “Why do we need to convert each cell to a string? What does the join operation accomplish?”
C. Writing the Output to a File
- Using the File Write Process:
# Open a new file for writing the filtered data
output_file = open('dine_in_orders.csv', 'w')
# Write each line to the file, adding a newline at the end
for line in output_lines:
output_file.write(line + "\n")
output_file.close() - Discussion:
- Explain that the file is created/overwritten and that each line is written with a newline character.
- Mention that Pandas’
to_csv
automates these steps for us.
4. Wrap-Up (10 minutes)
Recap
- We learned how to:
- Read a CSV file using
open()
andreadlines()
. - Process each line by stripping newlines and splitting on commas (using loops, not list comprehensions).
- Separate headers from data.
- Clean data by converting string values to proper types.
- Filter rows using for loops.
- Convert the processed data back to CSV format by joining cells in each row.
- Write the final output to a new CSV file.
- Read a CSV file using
- Connection to Pandas:
- Manual CSV processing shows the “under the hood” work that Pandas automates (reading, cleaning, filtering, and writing data).