Day 31 - File I/O

Skills: None

Pre-reading: 11.1

Intro (15 mins)

  • Today we’ll learn how to read and write CSV files "by hand" in Python, using only basic file I/O and string methods—no Pandas.
  • This helps you understand what's happening under the hood when you use high-level libraries.
  • Example: Reading a CSV file of library loans and filtering for home-loan records.
    # First download the file from
    # https://pdi.run/f25-library-loans.csv
    # and save it as 'library_loans.csv' in the same directory as this script.
    # Then read the file.
    file = open('library_loans.csv', 'r')
    lines = file.readlines()
    file.close()

    # Convert lines to list of lists
    data = []
    for line in lines:
        cells = line.strip().split(',')
        data.append(cells)

    # Separate header and data
    headers = data[0]
    loans = data[1:]

    # Convert days to int and filter for home-loan
    days_index = headers.index('days')
    loan_type_index = headers.index('loan_type')

    home_loans = []
    for row in loans:
        row[days_index] = int(row[days_index])
        if row[loan_type_index] == 'home-loan':
            home_loans.append(row)

    # Write filtered data to a new CSV
    output = [headers] + home_loans
    out_file = open('home_loans.csv', 'w')
    for row in output:
        out_file.write(','.join([str(cell) for cell in row]) + '\n')
    out_file.close()
  • All of these steps (reading, splitting, cleaning, filtering, and writing) are done manually here; Pandas performs them automatically.
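
The same read and write steps can also be wrapped in `with` blocks, which close the file automatically even if an error occurs partway through. The sketch below is one possible way to do that, not the lesson's own code: the helper names `read_rows` and `write_rows` are hypothetical, and it assumes a simple CSV in which no cell contains a comma.

```python
def read_rows(path):
    """Read a simple CSV file into a list of lists of strings."""
    rows = []
    with open(path, 'r') as file:
        for line in file:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing empty line
                rows.append(line.split(','))
    return rows


def write_rows(path, rows):
    """Write rows to a CSV file, one comma-joined row per line."""
    with open(path, 'w') as out_file:
        for row in rows:
            out_file.write(','.join(str(cell) for cell in row) + '\n')
```

With these helpers, `data = read_rows('library_loans.csv')` would return the header row followed by the loan rows, and `write_rows('home_loans.csv', [headers] + home_loans)` would write the filtered output.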

Class Exercises (40 mins)

  1. Filter the data to include only loans where loan_type is "home-loan". Print the filtered rows.
  2. Write the filtered data (including the header) to a new CSV file called home_loans.csv.
  3. What happens if you forget to close the file after writing? Try it and see if you get an error or warning.
  4. Add a new column to each row called overdue_fee (assume each day over 14 costs $0.25). Write the updated data to a new CSV file.
  5. Try reading a CSV file that is missing a column in one row. What happens? How could you handle this?
  6. Change your code to skip rows that are missing data, and write only the complete rows to a new file. Print a warning for each skipped row.
  7. Write code to count the total number of loans in the file (excluding the header).
  8. Write code to count how many loans are for each unique loan_type (e.g., in-library, home-loan).
  9. Write code to find the book_title with the highest total days borrowed across all rows.
  10. Write code to filter for loans where days is greater than 21, and write these rows (with the header) to a new CSV file called long_loans.csv.
  11. Write code to compute the total overdue fees (sum of overdue_fee) for all loans, assuming each day over 14 costs $0.25.
  12. Write code to find and print all unique borrower names in the file.
  13. Write code to sort the loans by days (largest first) and write the sorted data (with header) to a new CSV file called sorted_loans.csv.
  14. (Optional) Write code to add a new column called duration_label that is "short" if days ≤ 7, "medium" if days is 8–14, and "long" otherwise, and write the updated data to a new CSV file.
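
As a possible starting point for exercises 4 to 8, here is a minimal sketch of three helpers that work on the `headers` and `loans` lists from the intro example. The function names are hypothetical. The sketch assumes every cell is still a string as read from the file, and it treats a row as incomplete if it has the wrong number of cells or an empty cell.

```python
def split_complete_rows(headers, rows):
    """Separate rows with a full set of non-empty cells from the rest."""
    complete, skipped = [], []
    for row in rows:
        if len(row) == len(headers) and all(cell.strip() for cell in row):
            complete.append(row)
        else:
            print('Warning: skipping incomplete row:', row)
            skipped.append(row)
    return complete, skipped


def count_by_loan_type(headers, rows):
    """Count how many loans there are for each loan_type value."""
    loan_type_index = headers.index('loan_type')
    counts = {}
    for row in rows:
        loan_type = row[loan_type_index]
        counts[loan_type] = counts.get(loan_type, 0) + 1
    return counts


def overdue_fee(days, free_days=14, fee_per_day=0.25):
    """Fee for each day beyond free_days; zero if within the limit."""
    return max(0, int(days) - free_days) * fee_per_day
```

Filtering out incomplete rows before converting `days` to `int` avoids a `ValueError` on empty cells and an `IndexError` on short rows.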

Wrap-up (5 mins)

  • Manual file I/O gives you insight into how data is read, cleaned, filtered, and written in Python.
  • Libraries like Pandas automate these steps, but it’s important to understand what’s happening behind the scenes.