Lab 4 — Customer Research
Skills: 2, 11
Reference: For all work with tables, refer to the Tables page in the menu at the top of the page!
Introduction
We've been asked to help filter a large set of market research data to identify potential customers for a clothing brand. Each record in the data contains a customer's age, zip code, and yearly income.
Please create a table based on the customer-data.csv file; you will use this table for the rest of the lab.
Problem 1
Design a function young-customers that takes a table and returns a new table containing only the customers who are under 30 years old.
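The row-filtering pattern behind this kind of function can be sketched on a different question. The example below is a minimal sketch, not the expected solution: it assumes the `dcic2024` context (which provides `filter-with`), and the table `sample-customers`, its column names, and the functions `earns-six-figures` and `six-figure-customers` are hypothetical names made up for illustration. Check the Tables page for the column names in the real file.

```pyret
use context dcic2024

# Hypothetical sample data; the column names are assumed and may not
# match the headers in customer-data.csv.
sample-customers = table: age, zipcode, income
  row: 24, "02115", 38000
  row: 52, "10001", 120000
  row: 31, "60614", 67000
end

fun earns-six-figures(r :: Row) -> Boolean:
  doc: "Is this customer's yearly income at least 100,000 dollars?"
  r["income"] >= 100000
where:
  earns-six-figures(sample-customers.row-n(0)) is false
  earns-six-figures(sample-customers.row-n(1)) is true
end

fun six-figure-customers(t :: Table) -> Table:
  doc: "Keep only the rows whose income is at least 100,000 dollars"
  filter-with(t, earns-six-figures)
where:
  six-figure-customers(sample-customers).length() is 1
end
```

Writing the row predicate as its own named function lets it carry its own `where:` tests, separate from the tests on the table-level function.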
Problem 2
Retailers collect a lot of data about customers in order to target them with personalized advertisements and coupons. As reported in 2012, Target’s data analysis team used purchases such as unscented lotion, diaper-bag-sized purses, and prenatal vitamins to predict that a customer was in the early stages of pregnancy. Target then mailed coupons for baby-related items, such as diapers, to those customers’ homes. In doing so, it may have unintentionally revealed a customer’s pregnancy to other members of the household, as described in this anecdote (https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html):
“A man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation.
“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”…
The manager apologized and then called a few days later to apologize again. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.””
This is an example of data inference: using one or more pieces of information to infer some other piece of information. Data inference tools are being marketed for hiring employees, predicting shoppers' moods, and predicting criminal behavior (https://www.nytimes.com/2019/04/21/opinion/computational-inference.html). However, inferring new pieces of information and then sharing them can also put privacy at risk.
According to our privacy framework, privacy involves maintaining "appropriate flows of information in a particular context." We've analyzed the flows of information in this situation by answering the following five questions, but we have left the unintended recipients blank:
Question | Answer |
---|---|
What type of information is shared? | Coupons with names and prices of products; coupons have been selected based on past purchases and Target's inferences about the customer's current life stage and needs |
Who is the subject of the information? | A Target customer |
Who is the sender of the information? | Target |
Who are the potential recipients of the information? | Intended recipient: the customer. Unintended recipient(s): [to be filled in] |
What principles govern the collection and transmission of the information? | Information is collected from Target's internal purchase history data and purchased from data brokers. No consent was given for the initial collection of information or for transmission of information. |
Part A
Based on the anecdote above, identify 2 unintended recipients of the information that this customer is likely to be pregnant. One unintended recipient is directly mentioned in the anecdote; you should think of at least one more distinct category of unintended recipient.
Please fill in your answers as comments.
Part B
How could Target change the transmission of coupon information to customers in order to reduce the number of unintended recipients?
Problem 3
Design a function income-category that takes a row and returns a string representing the income category: "low" for incomes below $40,000, "middle" for incomes from $40,000 to $80,000 (inclusive), and "high" for incomes above $80,000.
Then, use build-column to add this categorization to the table. Visualize this new column using freq-bar-chart.
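The general shape of "compute a label per row, add it as a column, then chart it" can be sketched with a different labeling rule. This is a minimal sketch, not the expected solution: it assumes the `dcic2024` context (which provides `build-column` and `freq-bar-chart`), and the table `sample-customers`, its column names, the function `age-status`, and the new column name `"status"` are all hypothetical.

```pyret
use context dcic2024

# Hypothetical sample data; the column names are assumed and may not
# match the headers in customer-data.csv.
sample-customers = table: age, zipcode, income
  row: 16, "02115", 4000
  row: 52, "10001", 120000
  row: 31, "60614", 67000
  row: 70, "94110", 45000
end

fun age-status(r :: Row) -> String:
  doc: "Label a customer as a minor (under 18), a senior (65 or over), or an adult"
  if r["age"] < 18:
    "minor"
  else if r["age"] >= 65:
    "senior"
  else:
    "adult"
  end
where:
  age-status(sample-customers.row-n(0)) is "minor"
  age-status(sample-customers.row-n(1)) is "adult"
  age-status(sample-customers.row-n(3)) is "senior"
end

# build-column returns a new table; sample-customers itself is unchanged.
with-status = build-column(sample-customers, "status", age-status)

# One bar per distinct label in the "status" column.
freq-bar-chart(with-status, "status")
```

Because each branch of the `if` tests a boundary explicitly, the `where:` examples are a good place to pin down what should happen to values that sit exactly on a cutoff.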
Problem 4
Create a function marketing-summary that takes a table and returns a new table with an added column indicating each customer's age group ("under 25", "25-35", or "over 35"). Experiment with visualizing the results of this binning using freq-bar-chart, compared with using histogram on the age column.