Extra - Recommendation Systems
Skills: None
Pre-reading: 9.2
Intro (15 mins)
- Dictionaries are useful for fast lookups and tracking relationships, which we will leverage to build a movie recommendation system.
- Today we explore how to use dictionaries to analyze movie preferences and make recommendations based on what other people liked.
- Our goal: Given a movie someone likes, recommend other movies they might enjoy based on what movies tend to appear together in people's lists.
Understanding Co-occurrence with Dictionaries
When we say movies "co-occur," we mean they appear together in the same person's list. For example:
# If Alice likes these movies:
alice_movies = ["Inception", "The Matrix", "Interstellar"]
# We can track which movies appear together using a dictionary
cooccurrence = {
"Inception": ["Inception", "The Matrix", "Interstellar"],
"The Matrix": ["Inception", "The Matrix", "Interstellar"],
"Interstellar": ["Inception", "The Matrix", "Interstellar"]
}
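The dictionary above is written out by hand. As a minimal sketch, the same structure for a single person's list could be built with a loop:

```python
alice_movies = ["Inception", "The Matrix", "Interstellar"]

# Build the same structure with a loop instead of typing it out.
cooccurrence = {}
for movie in alice_movies:
    # Store a copy of the full list (including the movie itself) under each title.
    cooccurrence[movie] = list(alice_movies)

print(cooccurrence["The Matrix"])  # ['Inception', 'The Matrix', 'Interstellar']
```

Plain assignment is enough here because there is only one list. With several people, assigning again would overwrite what was stored earlier for the same title.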
The .get() Method: Safely Adding to Dictionary Values
When building up our dictionary, we need to handle keys that might not exist yet:
movies_dict = {}
# Without .get(): raises KeyError if the key doesn't exist yet (left commented out so the example runs)
# movies_dict["Inception"] = movies_dict["Inception"] + ["The Matrix"]
# With .get(): returns [] when the key is missing, so the concatenation is safe
movies_dict["Inception"] = movies_dict.get("Inception", []) + ["The Matrix"]
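To see why this pattern matters once more than one person is involved, here is a small sketch that accumulates lists across two people. The second list (`bob_movies`) is made-up illustration data, not part of the course dataset.

```python
alice_movies = ["Inception", "The Matrix", "Interstellar"]
bob_movies = ["Inception", "The Dark Knight"]  # hypothetical second student

movies_dict = {}
for movies in [alice_movies, bob_movies]:
    for movie in movies:
        # .get() returns [] the first time a title is seen, so the
        # concatenation works whether or not the key already exists.
        movies_dict[movie] = movies_dict.get(movie, []) + movies

print(movies_dict["Inception"])
# ['Inception', 'The Matrix', 'Interstellar', 'Inception', 'The Dark Knight']
print(movies_dict["The Dark Knight"])
# ['Inception', 'The Dark Knight']
```

"Inception" appears in both lists, so its entry grows on the second pass instead of being replaced.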
Counting with Counter
Once we have all co-occurrences, we can count which movies appear most frequently:
from collections import Counter
# If "Inception" appeared with these movies across all students:
inception_cooccurrences = ["The Matrix", "Interstellar", "The Matrix", "The Dark Knight", "The Matrix"]
# Count occurrences
counts = Counter(inception_cooccurrences)
print(counts.most_common(2)) # [('The Matrix', 3), ('Interstellar', 1)]
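A Counter can also be queried like a dictionary. The short sketch below reuses the list from the example above; "Toy Story" is just an arbitrary title that is not in the list.

```python
from collections import Counter

inception_cooccurrences = ["The Matrix", "Interstellar", "The Matrix", "The Dark Knight", "The Matrix"]
counts = Counter(inception_cooccurrences)

print(counts["The Matrix"])  # 3
print(counts["Toy Story"])   # 0 (a missing key counts as zero, no KeyError)

# With no argument, most_common() returns every title, most frequent first.
# Titles with equal counts keep the order in which they were first seen.
print(counts.most_common())
# [('The Matrix', 3), ('Interstellar', 1), ('The Dark Knight', 1)]
```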
You'll build a recommendation system in several steps:
- Load and convert movie data to dictionaries
- Extract movies from each student's preferences
- Build a co-occurrence dictionary tracking which movies appear together
- Use Counter to find the most frequently co-occurring movies
- Create a recommendation function that suggests movies
Class Exercises (40 mins)
Part 1: Loading and Converting Data
- Download the CSV file from here. Load the CSV file using pd.read_csv.
- Create a list containing the strings "Movie 1", "Movie 2", ..., "Movie 10" for your column names.
- Convert the DataFrame to a list of dictionaries called data_lst using df.to_dict(orient="records"). Print the first dictionary to observe the data structure.
- Write a loop that iterates through data_lst. For each row, print the value stored under "Movie 1".
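The steps in Part 1 might be sketched as follows. Because the course CSV is not reproduced here, the sketch reads a tiny in-memory stand-in with three movie columns instead of ten, and it assumes the file has a header row. The name `movie_cols` and the sample titles are illustrative assumptions.

```python
import io
import pandas as pd

# Stand-in for the course CSV: two students, three movie columns (the real file has ten).
csv_text = """Movie 1,Movie 2,Movie 3
Inception,The Matrix,Interstellar
The Matrix,The Dark Knight,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Generate the column names instead of typing each one; movie_cols is a hypothetical name.
movie_cols = [f"Movie {i}" for i in range(1, 4)]

# One dictionary per row, keyed by column name.
data_lst = df.to_dict(orient="records")
print(data_lst[0])

for row in data_lst:
    print(row["Movie 1"])
```

The second student left the last column empty, so that value is read as NaN. This is the case Part 2 has to handle.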
Part 2: Extracting Movies from a Row
- Write a function get_movies that takes a single row dictionary as input and returns a list of all non-NaN movie titles from that row. Use pd.notna() to check for NaN values. You can use the list of column names you created in Part 1 here.
- Test your function with the first few rows. Do all students list the same number of movies?
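One possible shape for get_movies is sketched below. It is a minimal version under stated assumptions, not the only valid answer: `movie_cols` is a hypothetical name for the column list from Part 1 (shortened to three columns), and the sample row is made up.

```python
import pandas as pd

movie_cols = ["Movie 1", "Movie 2", "Movie 3"]  # hypothetical; the course data uses ten columns

def get_movies(row):
    """Return the non-NaN movie titles from one row dictionary."""
    movies = []
    for col in movie_cols:
        title = row.get(col)  # None if the column is absent from this row
        if pd.notna(title):   # False for NaN and for None
            movies.append(title)
    return movies

row = {"Movie 1": "Inception", "Movie 2": float("nan"), "Movie 3": "Interstellar"}
print(get_movies(row))  # ['Inception', 'Interstellar']
```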
Part 3: Building the Co-occurrence Dictionary
- What does the dict.get() method do? What happens if you call .get(key, []) on a key that doesn't exist?
- Suppose one student listed these movies: ["Inception", "The Matrix", "Interstellar"]. We want to track that these three movies appeared together. For each movie in this list, we need to store ALL the movies (including itself) in a dictionary. Write a function update_occurrences that takes in the movies list and a dictionary. The function should loop through the movie list and update the dictionary by adding all the movies for each movie in this list.
- Write a function build_cooccurrence_dict(df) that:
- Converts the DataFrame to a list of dictionaries
- Creates an empty dictionary for tracking co-occurrences,
- For each row, uses your get_movies function to extract the list of movies,
- Uses your update_occurrences function to update the co-occurrence dictionary, and
- Returns the dictionary
- Test your build_cooccurrence_dict() function. Pick a movie and examine what's stored in the dictionary for that movie. Does it make sense?
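The Part 3 functions could fit together roughly as in the sketch below. It is one possible arrangement, not a reference solution: the small DataFrame is made-up data, and get_movies is simplified here to assume every column in the row holds a movie title.

```python
import pandas as pd

def get_movies(row):
    # Simplified stand-in: keep every non-missing value in the row.
    return [title for title in row.values() if pd.notna(title)]

def update_occurrences(movies, occurrences):
    # For each movie, append the whole list (including the movie itself).
    for movie in movies:
        occurrences[movie] = occurrences.get(movie, []) + movies

def build_cooccurrence_dict(df):
    data_lst = df.to_dict(orient="records")
    cooccurrence = {}
    for row in data_lst:
        update_occurrences(get_movies(row), cooccurrence)
    return cooccurrence

# Made-up data: two students, the second listing only two movies.
df = pd.DataFrame({
    "Movie 1": ["Inception", "Inception"],
    "Movie 2": ["The Matrix", "The Dark Knight"],
    "Movie 3": ["Interstellar", None],
})
cooccurrence = build_cooccurrence_dict(df)
print(cooccurrence["Inception"])
# ['Inception', 'The Matrix', 'Interstellar', 'Inception', 'The Dark Knight']
```

Note that update_occurrences modifies the dictionary it is given rather than returning a new one, which is why build_cooccurrence_dict can reuse the same dictionary across rows.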
Part 4: Making Recommendations
- Import Counter from collections. Given a list with duplicates, use Counter() to count occurrences. What type of object does it return?
- Write a function recommend_v1(movie_name, cooccurrence_dct) that:
- Takes a movie name and the co-occurrence dictionary
- Uses Counter on the list from the dictionary
- Returns the top 10 most common items (look up .most_common() for counters)
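A minimal sketch of recommend_v1 follows. The sample dictionary is made up, and returning an empty list for an unknown title is an assumption, since the exercise does not say what should happen in that case.

```python
from collections import Counter

def recommend_v1(movie_name, cooccurrence_dct):
    # Assumption: an unknown title yields no recommendations instead of a KeyError.
    counts = Counter(cooccurrence_dct.get(movie_name, []))
    # most_common(10) returns up to ten (title, count) pairs, most frequent first.
    return counts.most_common(10)

# Made-up co-occurrence data for illustration.
cooccurrence_dct = {
    "Inception": ["Inception", "The Matrix", "Interstellar", "Inception", "The Matrix"],
}
print(recommend_v1("Inception", cooccurrence_dct))
# [('Inception', 2), ('The Matrix', 2), ('Interstellar', 1)]
print(recommend_v1("Unknown Title", cooccurrence_dct))  # []
```

Because each movie is stored alongside itself, the queried movie appears in its own recommendations.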
Part 5: Testing the System
- Test your system! Connect your functions to the DataFrame from Part 1 and try a movie (for example, "The Shawshank Redemption") to see the recommendations.
Wrap-up (5 mins)
- You can add more bells and whistles to your system (remove the movie itself, track student names, etc.)
- Dictionaries are useful for their quick lookup and key-based access.
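As an example of one such extension, the sketch below removes the queried movie from its own recommendations. The name `recommend_v2`, the parameter `n`, and the sample data are hypothetical and not part of the exercises above.

```python
from collections import Counter

def recommend_v2(movie_name, cooccurrence_dct, n=10):
    counts = Counter(cooccurrence_dct.get(movie_name, []))
    # Drop the queried movie so it cannot recommend itself.
    counts.pop(movie_name, None)
    # Return only the titles, most frequent first.
    return [title for title, _ in counts.most_common(n)]

# Made-up co-occurrence data for illustration.
cooccurrence_dct = {
    "Inception": ["Inception", "The Matrix", "Interstellar", "Inception", "The Matrix"],
}
print(recommend_v2("Inception", cooccurrence_dct))  # ['The Matrix', 'Interstellar']
```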