Building a Movie Recommendation System With KNN

This week (3/12), we covered recommendation systems in the Tuesday Tutorial. We will explore a use case for recommendation systems in the Friday Fun episode that will be published today (3/15). This blog post shares the dataset and the code notebook for that use case.

You do not need these files to follow along. The episode explains the use case and what the algorithm we built does. But if you like doing nerdy things, these files will come in handy.

The company in our use case is “What Will You Play Next Inc.” They want to develop a recommendation system: when a viewer is considering a movie and is on the “Watch Now” page of that title, the system shows a list titled “Users like you also liked” below the movie description.

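We will be using Python to build a collaborative filtering recommendation system based on k-nearest neighbors (KNN). Strictly speaking, KNN is a neighbor-based method rather than a clustering algorithm, but it relies on the same idea of measuring how close data points are to each other that we covered in the clustering episode of Tuesday Tutorial.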
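
To make the idea concrete, here is a minimal sketch of item-based collaborative filtering with nearest neighbors. It is not the code from the notebook, which may well use a library such as scikit-learn; this version uses only the Python standard library so that each step is visible. The function names (`cosine_similarity`, `recommend`), the choice of cosine similarity, and the toy ratings are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors.

    Each vector is a dict mapping userId -> rating. Users who have not
    rated a movie are treated as zeros.
    """
    shared_users = set(a) & set(b)
    if not shared_users:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared_users)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)


def recommend(title, ratings_by_movie, k=3):
    """Return the k titles whose rating vectors are most similar to `title`."""
    target = ratings_by_movie[title]
    scored = [
        (cosine_similarity(target, vector), other)
        for other, vector in ratings_by_movie.items()
        if other != title
    ]
    # Highest similarity first; the k nearest neighbors are the recommendations.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


# Toy, made-up ratings (hypothetical titles and users, not from the dataset).
toy_ratings = {
    "Movie A": {1: 5.0, 2: 4.0, 3: 1.0},
    "Movie B": {1: 4.5, 2: 5.0, 3: 1.5},
    "Movie C": {1: 1.0, 2: 1.5, 3: 5.0},
    "Movie D": {4: 3.0},
}

also_liked = recommend("Movie A", toy_ratings, k=2)
print(also_liked)
```

Here "Movie B" ranks first because the same users rated it much like "Movie A", which is the intuition behind a “Users like you also liked” list. On the full dataset a sparse matrix and a library nearest-neighbor search would likely be needed for speed.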

The dataset that will be used can be downloaded here. The description of data from Kaggle is below:

Data Description

The datasets describe ratings and free-text tagging activities from MovieLens, a movie recommendation service. It contains 20000263 ratings and 465564 tag applications across 27278 movies. These data were created by 138493 users between January 09, 1995 and March 31, 2015. This dataset was generated on October 17, 2016.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided. The data are contained in six files.

tag.csv that contains tags applied to movies by users:

  • userId
  • movieId
  • tag
  • timestamp

rating.csv that contains ratings of movies by users:

  • userId
  • movieId
  • rating
  • timestamp

movie.csv that contains movie information:

  • movieId
  • title
  • genres

link.csv that contains identifiers that can be used to link to other sources:

  • movieId
  • imdbId
  • tmdbId

genome_scores.csv that contains movie-tag relevance data:

  • movieId
  • tagId
  • relevance

genome_tags.csv that contains tag descriptions:

  • tagId
  • tag
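
For a collaborative filtering system, the two files that matter most are rating.csv and movie.csv. The sketch below shows one possible way to join them on movieId and group ratings by title, using only the column names listed above. The function name `build_ratings_by_title` is hypothetical, and the in-memory sample rows are made up so the example runs without downloading anything; with the real files you would pass open file handles instead.

```python
import csv
import io
from collections import defaultdict


def build_ratings_by_title(rating_file, movie_file):
    """Group ratings by movie title: {title: {userId: rating}}."""
    # Map movieId -> title using movie.csv.
    titles = {row["movieId"]: row["title"] for row in csv.DictReader(movie_file)}

    ratings_by_title = defaultdict(dict)
    for row in csv.DictReader(rating_file):
        title = titles.get(row["movieId"])
        if title is None:
            continue  # skip ratings for movies missing from movie.csv
        ratings_by_title[title][int(row["userId"])] = float(row["rating"])
    return dict(ratings_by_title)


# Made-up sample rows that mimic the documented columns (not real data).
sample_movies = io.StringIO(
    "movieId,title,genres\n"
    "1,Example Movie One (2000),Comedy\n"
    "2,Example Movie Two (2001),Drama\n"
)
sample_ratings = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,2005-01-01 00:00:00\n"
    "1,2,3.5,2005-01-01 00:05:00\n"
    "2,1,5.0,2006-02-02 12:00:00\n"
    "3,99,2.0,2007-03-03 08:30:00\n"
)

ratings_by_title = build_ratings_by_title(sample_ratings, sample_movies)
print(ratings_by_title)
```

The full rating.csv has about 20 million rows, so in practice you may prefer a dataframe library and some filtering (for example, keeping only frequently rated movies) before building this structure.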

Code Notebook

The code notebook can be downloaded here:

