Skip to main content

Homework 6 — Bookstore Sales: Lists (Iteration)

Skills: 3, 11

Due

Thursday, October 16, 2025 at 6PM (Oakland) or 9PM (Boston)


Introduction

You're working as a data analyst for an online bookstore. The company has provided you with a transaction table containing information about individual book purchases. Your job is to extract columns from this table and perform various calculations to help the business understand their sales patterns. Assume you have access to a table called book-transactions with the following columns:

book-transactions = table: transaction-id, date-time, book-title, book-genres, total-amount
row: "TXN001", "2024-10-01 14:23", "1984", "Fiction", 18.98
row: "TXN002", "2024-10-01 16:45", "Crime and Punishment", "Fiction", 20.98
row: "TXN003", "2024-10-02 09:12", "Where the Crawdads Sing", "Fiction", 19.98
row: "TXN004", "2024-10-02 11:30", "Atomic Habits", "Self-Help", 22.98
row: "TXN005", "2024-10-03 13:15", "Klara and the Sun", "Sci-Fi", 32.98
row: "TXN006", "2024-10-03 15:42", "Project Hail Mary", "Sci-Fi", 34.98
row: "TXN007", "2024-10-04 10:05", "Evicted", "Non-Fiction", 20.98
row: "TXN008", "2024-10-04 12:18", "1984", "Fiction", 18.98
row: "TXN009", "2024-10-05 14:33", "Atomic Habits", "Self-Help", 22.98
row: "TXN010", "2024-10-05 16:20", "The Seven Husbands of Evelyn Hugo", "Fiction", 14.50
row: "TXN011", "2024-10-06 08:45", "Educated", "Memoir", 17.25
row: "TXN012", "2024-10-06 10:15", "Dune", "Sci-Fi", 29.99
row: "TXN013", "2024-10-06 12:30", "The Midnight Library", "Fiction", 16.75
row: "TXN014", "2024-10-07 09:00", "Sapiens", "Non-Fiction", 21.00
row: "TXN015", "2024-10-07 11:45", "The Alchemist", "Fiction", 13.95
row: "TXN016", "2024-10-07 14:20", "Becoming", "Memoir", 19.50
row: "TXN017", "2024-10-08 07:30", "The Handmaid's Tale", "Fiction", 15.99
row: "TXN018", "2024-10-08 13:15", "Thinking, Fast and Slow", "Psychology", 24.95
row: "TXN019", "2024-10-08 15:40", "The Hobbit", "Fantasy", 12.99
row: "TXN020", "2024-10-09 10:25", "Born a Crime", "Memoir", 18.00
row: "TXN022", "2024-10-09 16:10", "Circe", "Fantasy", 26.50
row: "TXN023", "2024-10-10 08:20", "The Power of Now", "Self-Help", 14.99
row: "TXN024", "2024-10-10 11:35", "Normal People", "Fiction", 17.99
row: "TXN025", "2024-10-10 14:45", "The Silent Patient", "Thriller", 23.50
row: "TXN026", "2024-10-11 09:15", "Untamed", "Memoir", 20.25
row: "TXN027", "2024-10-11 13:00", "The Thursday Murder Club", "Mystery", 22.75
row: "TXN028", "2024-10-11 15:30", "Atomic Habits", "Self-Help", 18.99
row: "TXN029", "2024-10-12 10:40", "The Song of Achilles", "Fantasy", 25.00
end

All of the below problems will work on a single column (as a list) -- your tests can be standalone, but you are also welcome to try your functions on the table above by extracting the column with book-transactions.get-column("column-name")).

Problem 1

Design a function count-high-value that takes a list of transaction amounts and counts how many are above $25. Use for each to iterate through the amounts and build up a count.

Problem 2

Design a function format-transaction-ids that takes a list of transaction IDs and converts them to a "display format" by removing the "TXN" prefix and adding "Order #" at the beginning (e.g., "TXN001" becomes "Order #001"). Use for each to transform each ID.

Problem 3

Design a function count-fiction-books that takes a list of book genres and counts how many are exactly "Fiction" (case-sensitive). Use for each to iterate through the genres and build up a count.

Problem 4

Design a function extract-dates that takes a list of date-time strings and extracts just the date portion (everything before the space). For example, "2024-10-01 14:23" should become "2024-10-01". Use for each to transform each date-time string.

Problem 5

The bookstore owner is considering updating the transaction system to add a customer_id column that would link each purchase to a specific person. This would enable the bookstore to provide:

  • Personalized book recommendations
  • Targeted email marketing
  • Customer loyalty rewards
  • Returns for customers who don’t have a receipt

However, this change raises significant privacy concerns, especially given the sensitive nature of reading materials.

Part A

Complete the privacy analysis table describing flows of information before and after this change. In this case, the transmission principles remain the same:

QuestionOld DatabaseProposed Database
What type of information is shared?
Who is the subject of the information?
Who is the sender of the information?
Who are the potential recipients of the information?Intended recipients:

Unintended recipients:
Intended recipients:

Unintended recipients:
What principles govern the collection and transmission of this information?Customers consent to data collection as a condition of their use of the site. Data is stored on remote servers.Customers consent to data collection as a condition of their use of the site. Data is stored on remote servers.

Note: you do not need to create a table in your solution file. Simply write answers to the questions in comments, e.g.,:

# old database information shared: ...
# proposed database information shared: ...
...

Part B

The principle of "secondary use" refers to data collected for one purpose being used for another purpose without the subject's consent. For example, after the USA PATRIOT Act was passed in 2001, the FBI gained expanded powers to access library records. The FBI’s use of the records is a “secondary use.” In response, many libraries began deliberately deleting patron borrowing records as soon as books were returned. Using the privacy analysis above, how might you create a marketing plan that minimizes potential “secondary use” of book sales data? Specify the following information in your plan:

  • What data is collected
  • How long it is retained
  • What functionality is preserved
  • What privacy risk is reduced