Day 32 - File I/O
Skills: None
Pre-reading: 11.1
Intro (15 mins)
- Today we’ll learn how to read and write CSV files "by hand" in Python, using only basic file I/O and string methods—no Pandas.
- This helps you understand what's happening under the hood when you use high-level libraries.
- Example: Reading a CSV file of library loans and filtering for home-loan records.
# Read the file
# First download the file from the URL:
# https://pdi.run/f25-library-loans.csv
# and save it as 'library_loans.csv' in the same directory as this file.
file = open('library_loans.csv', 'r')
lines = file.readlines()
file.close()
# Convert lines to list of lists
data = []
for line in lines:
    cells = line.strip().split(',')
    data.append(cells)
# Separate header and data
headers = data[0]
loans = data[1:]
# Convert days to int and filter for home-loan
days_index = headers.index('days')
loan_type_index = headers.index('loan_type')
home_loans = []
for row in loans:
    row[days_index] = int(row[days_index])
    if row[loan_type_index] == 'home-loan':
        home_loans.append(row)
# Write filtered data to a new CSV
output = [headers] + home_loans
out_file = open('home_loans.csv', 'w')
for row in output:
    out_file.write(','.join([str(cell) for cell in row]) + '\n')
out_file.close()
- All of these steps (reading, splitting, cleaning, filtering, and writing) are done manually here, but are automated by Pandas.
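The open/close pairs in the example can also be written with `with` blocks, which close the file automatically when the block ends, even if an error occurs partway through. The following is a minimal sketch, not the course's reference solution. It writes a small made-up sample to a temporary directory instead of using the downloaded file, and the `borrower` and `book_title` columns are assumptions based on the exercises; the real file may differ.

```python
import os
import tempfile

# Hypothetical sample rows; the real library_loans.csv may have other columns.
sample = (
    "borrower,book_title,days,loan_type\n"
    "Ada,Dune,10,home-loan\n"
    "Ben,Emma,2,in-library\n"
    "Cy,Dune,21,home-loan\n"
)

# Work in a temporary directory so nothing in the class folder is overwritten.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "library_loans.csv")
out_path = os.path.join(workdir, "home_loans.csv")

with open(in_path, "w") as f:
    f.write(sample)

# Read: the file is closed automatically when the with block ends.
with open(in_path, "r") as f:
    data = [line.strip().split(",") for line in f]

headers, loans = data[0], data[1:]
loan_type_index = headers.index("loan_type")
home_loans = [row for row in loans if row[loan_type_index] == "home-loan"]

# Write: header first, then the filtered rows.
with open(out_path, "w") as f:
    for row in [headers] + home_loans:
        f.write(",".join(row) + "\n")

# Read the output back to check what was written.
with open(out_path, "r") as f:
    written = f.read()
print(written)
```

After each `with` block, `f.closed` is `True`, so there is no separate `close()` call to forget.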
Class Exercises (40 mins)
- Filter the data to include only loans where loan_type is "home-loan". Print the filtered rows.
- Write the filtered data (including the header) to a new CSV file called home_loans.csv.
- What happens if you forget to close the file after writing? Try it and see if you get an error or warning.
- Add a new column to each row called overdue_fee (assume each day over 14 costs $0.25). Write the updated data to a new CSV file.
- Try reading a CSV file that is missing a column in one row. What happens? How could you handle this?
- Change your code to skip rows that are missing data, and write only the complete rows to a new file. Print a warning for each skipped row.
- Write code to count the total number of loans in the file (excluding the header).
- Write code to count how many loans are for each unique loan_type (e.g., in-library, home-loan).
- Write code to find the book_title with the highest total days borrowed across all rows.
- Write code to filter for loans where days is greater than 21, and write these rows (with the header) to a new CSV file called long_loans.csv.
- Write code to compute the total overdue fees (sum of overdue_fee) for all loans, assuming each day over 14 costs $0.25.
- Write code to find and print all unique borrower names in the file.
- Write code to sort the loans by days (largest first) and write the sorted data (with header) to a new CSV file called sorted_loans.csv.
- (Optional) Write code to add a new column called duration_label that is "short" if days ≤ 7, "medium" if days is 8–14, and "long" otherwise, and write the updated data to a new CSV file.
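One possible approach to three of the exercises (skipping incomplete rows, computing overdue_fee, and counting loans per loan_type) is sketched below. It is one way to do it under stated assumptions, not the expected answer. The helper names `parse_rows`, `overdue_fee`, and `count_by_column` are hypothetical, the sample rows are made up, and a row is treated as incomplete if it has the wrong number of cells or any empty cell.

```python
FEE_PER_DAY = 0.25  # dollars per overdue day, from the exercise text
FREE_DAYS = 14      # days allowed before fees start, from the exercise text


def parse_rows(lines):
    """Split CSV lines into a header and complete rows, warning on incomplete ones."""
    headers = lines[0].strip().split(",")
    rows = []
    # start=2 because line 1 of the file is the header.
    for line_number, line in enumerate(lines[1:], start=2):
        cells = line.strip().split(",")
        # Assumption: incomplete means wrong cell count or an empty cell.
        if len(cells) != len(headers) or "" in cells:
            print(f"Warning: skipping line {line_number}: {line.strip()!r}")
            continue
        rows.append(cells)
    return headers, rows


def overdue_fee(days):
    """Fee for a loan: nothing up to FREE_DAYS, then FEE_PER_DAY per extra day."""
    return max(0, days - FREE_DAYS) * FEE_PER_DAY


def count_by_column(headers, rows, column):
    """Count how many rows have each distinct value in the named column."""
    index = headers.index(column)
    counts = {}
    for row in rows:
        counts[row[index]] = counts.get(row[index], 0) + 1
    return counts


# Hypothetical sample data; the second data line is missing its loan_type.
lines = [
    "borrower,book_title,days,loan_type\n",
    "Ada,Dune,10,home-loan\n",
    "Ben,Emma,2\n",
    "Cy,Dune,21,home-loan\n",
    "Di,Ulysses,30,in-library\n",
]

headers, rows = parse_rows(lines)
days_index = headers.index("days")
fees = [overdue_fee(int(row[days_index])) for row in rows]

print(fees)
print(sum(fees))
print(count_by_column(headers, rows, "loan_type"))
```

Checking the cell count before converting days to int avoids an IndexError or ValueError on a short row, which is what the unguarded intro code would raise.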
Wrap-up (5 mins)
- Manual file I/O gives you insight into how data is read, cleaned, filtered, and written in Python.
- Libraries like Pandas automate these steps, but it’s important to understand what’s happening behind the scenes.