Lab 4 — Customer Research
Skills: 2, 11
Introduction
We've been asked to help filter a large set of market research data to identify potential customers for a clothing brand. Each record in the data contains a customer's age, zipcode, and yearly income.
Problem 1
Design a function young-customers that keeps only the customers who are under 30 years old.
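One possible shape for such a filter is sketched below. It assumes the data is a Pyret table loaded with the dcic2024 context and that the columns are named age, zipcode, and income; the sample rows and the helper is-under-30 are made up for illustration and are not part of the lab materials.

```pyret
use context dcic2024

# Hypothetical sample data; the real column names and values may differ.
customers = table: age, zipcode, income
  row: 24, "02115", 35000
  row: 41, "02139", 92000
  row: 29, "10001", 58000
  row: 30, "60614", 40000
end

fun is-under-30(r :: Row) -> Boolean:
  doc: "Determines whether the customer in this row is younger than 30"
  r["age"] < 30
end

fun young-customers(t :: Table) -> Table:
  doc: "Keeps only the rows whose age is below 30"
  filter-with(t, is-under-30)
where:
  young-customers(customers).get-column("age") is [list: 24, 29]
end
```

Note that a customer who is exactly 30 is excluded, since the problem asks for customers under 30.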
Problem 2 (Pregnancy Prediction: Privacy/Contextual Integrity)
Retailers collect a lot of data about customers in order to target them with personalized advertisements and coupons. In 2012, Target's data analysis team began using purchases like unscented lotion, diaper-bag-sized purses, and prenatal vitamins to predict that a customer was in the early stages of pregnancy. Target then sent coupons for baby-related items such as diapers to those customers' homes. In doing so, it may have unintentionally revealed information about people's pregnancy status to other household members, as described in this anecdote (https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html):
“A man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation.
“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”…
The manager apologized and then called a few days later to apologize again. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.””
This is an example of data inference: using one or many pieces of information to infer some other piece of information. Data inference tools are being marketed for hiring employees, predicting shoppers' moods, and predicting criminal behavior (https://www.nytimes.com/2019/04/21/opinion/computational-inference.html). However, inferring new pieces of information and then sharing them can also be a risk to privacy.
According to the contextual integrity framework, privacy involves maintaining "appropriate flows of information in a particular context." We've analyzed the flows of information in this situation by answering the following five questions but have left some unintended recipients blank:
Question | Answer |
---|---|
What type of information is shared? | Coupons with names and prices of products; coupons have been selected based on past purchases and Target's inferences about the customer's current life stage and needs |
Who is the subject of the information? | A Target customer |
Who is the sender of the information? | Target |
Who are the potential recipients of the information? | Intended recipient: the customer. Unintended recipient(s): [to be filled in] |
What principles govern the collection and transmission of the information? | Information is collected from Target's internal purchase history data and purchased from data brokers. No consent was given for the initial collection of information or for transmission of information. |
Part A
Based on the anecdote above, identify two unintended recipients of the information that this customer is likely to be pregnant. One unintended recipient is directly mentioned in the anecdote; you should think of at least one more distinct category of unintended recipient.
Please fill in your answers as comments in the file.
Part B
How could Target change the transmission of coupon information to customers in order to reduce the number of unintended recipients?
Problem 3 (binning)
Design a function income-category that takes a row and returns a string representing the income category: "low" for incomes below $40,000, "middle" for incomes from $40,000 to $80,000 (inclusive), and "high" for incomes above $80,000.
Then, use build-column to add this categorization to the table.
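The binning described above could be sketched roughly as follows. This assumes the dcic2024 context, an income column named income, and that incomes of exactly $40,000 and $80,000 count as "middle"; the sample table and the new column name "income-category" are illustrative choices, not something the lab specifies.

```pyret
use context dcic2024

# Hypothetical sample data; the real column names and values may differ.
customers = table: age, zipcode, income
  row: 24, "02115", 35000
  row: 41, "02139", 92000
  row: 29, "10001", 58000
  row: 30, "60614", 40000
end

fun income-category(r :: Row) -> String:
  doc: "Bins the row's yearly income into low, middle, or high"
  income = r["income"]
  if income < 40000: "low"
  else if income <= 80000: "middle"  # assumes both boundaries are "middle"
  else: "high"
  end
where:
  income-category(customers.row-n(0)) is "low"
  income-category(customers.row-n(3)) is "middle"
  income-category(customers.row-n(1)) is "high"
end

# build-column returns a new table; the original table is unchanged.
categorized = build-column(customers, "income-category", income-category)
```

Checking the boundary values first means each later branch only needs one comparison.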
Problem 4
Create a function marketing-summary that generates a summary table showing the count of customers in each combination of age group ("under 25", "25-35", "over 35") and income category ("low", "middle", "high"), which will help the marketing team target their campaigns.
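One way to approach the summary is to count the rows matching each of the nine combinations and list them in a new table. The sketch below assumes the dcic2024 context, whole-number ages, and columns named age and income; the helpers age-group-of and count-segment, the output column names, and the sample rows are all hypothetical.

```pyret
use context dcic2024

# Hypothetical sample data; the real column names and values may differ.
customers = table: age, zipcode, income
  row: 24, "02115", 35000
  row: 41, "02139", 92000
  row: 29, "10001", 58000
  row: 30, "60614", 40000
  row: 22, "02115", 20000
end

fun age-group-of(r :: Row) -> String:
  doc: "Bins the row's age into under 25, 25-35, or over 35"
  age = r["age"]
  if age < 25: "under 25"
  else if age <= 35: "25-35"
  else: "over 35"
  end
end

fun income-category(r :: Row) -> String:
  doc: "Bins the row's yearly income into low, middle, or high"
  income = r["income"]
  if income < 40000: "low"
  else if income <= 80000: "middle"
  else: "high"
  end
end

fun count-segment(t :: Table, group :: String, category :: String) -> Number:
  doc: "Counts the rows of t in the given age group and income category"
  matching = filter-with(t,
    lam(r): (age-group-of(r) == group) and (income-category(r) == category) end)
  matching.length()
end

fun marketing-summary(t :: Table) -> Table:
  doc: "Builds one row per age-group and income-category combination"
  table: age-group, income-cat, num-customers
    row: "under 25", "low", count-segment(t, "under 25", "low")
    row: "under 25", "middle", count-segment(t, "under 25", "middle")
    row: "under 25", "high", count-segment(t, "under 25", "high")
    row: "25-35", "low", count-segment(t, "25-35", "low")
    row: "25-35", "middle", count-segment(t, "25-35", "middle")
    row: "25-35", "high", count-segment(t, "25-35", "high")
    row: "over 35", "low", count-segment(t, "over 35", "low")
    row: "over 35", "middle", count-segment(t, "over 35", "middle")
    row: "over 35", "high", count-segment(t, "over 35", "high")
  end
where:
  marketing-summary(customers).get-column("num-customers") is [list: 2, 0, 0, 0, 2, 0, 0, 0, 1]
end
```

Listing every combination explicitly keeps segments with zero customers visible in the summary, which a grouping-based approach might omit.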