Day 32 - File I/O

Skills: None

Pre-reading: 11.1

Intro (15 mins)

  • Today we’ll learn how to read and write CSV files "by hand" in Python, using only basic file I/O and string methods, with no Pandas.
  • This helps you understand what's happening under the hood when you use high-level libraries.
  • Example: Reading a CSV file of library loans and filtering for home-loan records.
    # Read the file
    # First download the file from the URL:
    # https://pdi.run/f25-library-loans.csv
    # and save it as 'library_loans.csv' in the same directory as this file.
    file = open('library_loans.csv', 'r')
    lines = file.readlines()
    file.close()

    # Convert lines to list of lists
    data = []
    for line in lines:
        cells = line.strip().split(',')
        data.append(cells)

    # Separate header and data
    headers = data[0]
    loans = data[1:]

    # Convert days to int and filter for home-loan
    days_index = headers.index('days')
    loan_type_index = headers.index('loan_type')

    home_loans = []
    for row in loans:
        row[days_index] = int(row[days_index])
        if row[loan_type_index] == 'home-loan':
            home_loans.append(row)

    # Write filtered data to a new CSV
    output = [headers] + home_loans
    out_file = open('home_loans.csv', 'w')
    for row in output:
        out_file.write(','.join([str(cell) for cell in row]) + '\n')
    out_file.close()
  • Every step here (reading, splitting, cleaning, filtering, and writing) is done manually; Pandas automates all of them.
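The same read, filter, and write pipeline can also be sketched with `with` blocks, which close each file automatically even if an error occurs partway through. This is a minimal sketch, not the course's reference solution: the helper names `read_rows` and `write_rows`, the `borrower` column name, and the three sample rows are all made up for illustration, and the sample is written to a temporary directory so the snippet runs without downloading anything. Like the example above, it assumes no cell contains a comma.

```python
import os
import tempfile


def read_rows(path):
    """Read a simple CSV (no quoted commas) into a list of lists of strings."""
    with open(path, 'r') as f:  # the file is closed automatically on exit
        return [line.strip().split(',') for line in f if line.strip()]


def write_rows(path, rows):
    """Write rows to path, one comma-joined line per row."""
    with open(path, 'w') as f:
        for row in rows:
            f.write(','.join(str(cell) for cell in row) + '\n')


# Hypothetical sample data standing in for library_loans.csv
sample = (
    "borrower,book_title,loan_type,days\n"
    "Ana,Dune,home-loan,10\n"
    "Ben,Emma,in-library,1\n"
    "Cy,Dune,home-loan,30\n"
)
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'library_loans.csv')
out_path = os.path.join(workdir, 'home_loans.csv')
with open(in_path, 'w') as f:
    f.write(sample)

# Same steps as the manual example: split header from data, filter, write
data = read_rows(in_path)
headers, loans = data[0], data[1:]
type_idx = headers.index('loan_type')
home_loans = [row for row in loans if row[type_idx] == 'home-loan']
write_rows(out_path, [headers] + home_loans)

print(read_rows(out_path))
```

Reading the output file back with `read_rows` is a quick way to check that the header and the filtered rows were written as expected.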

Class Exercises (40 mins)

  • Filter the data to include only loans where loan_type is "home-loan". Print the filtered rows.
  • Write the filtered data (including the header) to a new CSV file called home_loans.csv.
  • What happens if you forget to close the file after writing? Try it and see if you get an error or warning.
  • Add a new column to each row called overdue_fee (assume each day over 14 costs $0.25). Write the updated data to a new CSV file.
  • Try reading a CSV file that is missing a column in one row. What happens? How could you handle this?
  • Change your code to skip rows that are missing data, and write only the complete rows to a new file. Print a warning for each skipped row.
  • Write code to count the total number of loans in the file (excluding the header).
  • Write code to count how many loans are for each unique loan_type (e.g., in-library, home-loan).
  • Write code to find the book_title with the highest total days borrowed across all rows.
  • Write code to filter for loans where days is greater than 21, and write these rows (with the header) to a new CSV file called long_loans.csv.
  • Write code to compute the total overdue fees (sum of overdue_fee) for all loans, assuming each day over 14 costs $0.25.
  • Write code to find and print all unique borrower names in the file.
  • Write code to sort the loans by days (largest first) and write the sorted data (with header) to a new CSV file called sorted_loans.csv.
  • (Optional) Write code to add a new column called duration_label that is "short" if days ≤ 7, "medium" if days is 8–14, and "long" otherwise, and write the updated data to a new CSV file.
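As a starting point for several of the exercises above, here is one possible sketch of skipping incomplete rows, adding an overdue_fee column, and counting loans per loan_type. It works on a small in-memory list of lines shaped like what `readlines()` returns. The `borrower` column name, the sample rows, and the constant names `FEE_PER_DAY` and `GRACE_DAYS` are assumptions made for illustration; the real file may look different.

```python
# Hypothetical lines, as file.readlines() might return them.
# The third data row is deliberately missing its 'days' column.
lines = [
    "borrower,book_title,loan_type,days\n",
    "Ana,Dune,home-loan,10\n",
    "Ben,Emma,in-library,1\n",
    "Cy,Dune,home-loan\n",
    "Di,Ulysses,home-loan,30\n",
]

headers = lines[0].strip().split(',')
days_idx = headers.index('days')
type_idx = headers.index('loan_type')

# Keep only rows with the right number of non-empty cells
complete_rows = []
for line_number, line in enumerate(lines[1:], start=2):
    cells = line.strip().split(',')
    if len(cells) != len(headers) or '' in cells:
        print(f"Warning: skipping line {line_number}: {line.strip()!r}")
        continue
    cells[days_idx] = int(cells[days_idx])
    complete_rows.append(cells)

# Add an overdue_fee column: $0.25 for each day over 14
FEE_PER_DAY = 0.25
GRACE_DAYS = 14
out_headers = headers + ['overdue_fee']
rows_with_fee = []
for row in complete_rows:
    fee = max(0, row[days_idx] - GRACE_DAYS) * FEE_PER_DAY
    rows_with_fee.append(row + [fee])

# Count loans per loan_type with a plain dictionary
counts = {}
for row in complete_rows:
    loan_type = row[type_idx]
    counts[loan_type] = counts.get(loan_type, 0) + 1

total_fees = sum(row[-1] for row in rows_with_fee)
print(counts)
print(total_fees)
```

Checking `len(cells) != len(headers)` catches rows with a missing column before `int()` or an index lookup can fail on them, which is the kind of failure the missing-column exercise is meant to surface.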

Wrap-up (5 mins)

  • Manual file I/O gives you insight into how data is read, cleaned, filtered, and written in Python.
  • Libraries like Pandas automate these steps, but it’s important to understand what’s happening behind the scenes.