Project #1 – Collaborative Filtering using the NETFLIX Data

In this project, your group will predict 100,000 movie ratings for users in a subset of the original NETFLIX data issued for the NETFLIX Prize. The challenge aimed to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. It was issued by Netflix, and on September 21, 2009, a $1 million Grand Prize was awarded to the winning team.

Project Goal: Analyze the NETFLIX data using MapReduce or Spark and, based on the outcomes of this analysis, develop a feasible and efficient implementation of the collaborative filtering algorithm in MapReduce or Spark. Execute your MapReduce or Spark program on Amazon EMR. Note that MapReduce/Spark is only required to find the k most similar users or items; once you have them, you do not need MapReduce or Spark for the predictions and evaluation. After computing the predicted ratings, evaluate them by comparing them to the true ratings (gold standard). This is a competition! Part of the grade for the results (up to 10%) will be assigned according to a ranking of the number and quality of all teams’ predictions!
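To make the three stages concrete (find the k most similar users, predict, evaluate), here is a minimal single-machine sketch in plain Python. It is not the required implementation: the similarity step would run in MapReduce or Spark on the real data. Cosine similarity, the similarity-weighted average, the fallback to the user's mean rating, and RMSE as the error measure are all assumptions chosen for illustration, and the function names and `toy_ratings` data are hypothetical.

```python
import math

# Hypothetical toy data: {user_id: {movie_id: rating}}
toy_ratings = {
    "u1": {1: 5, 2: 3},
    "u2": {1: 5, 2: 3, 3: 4},
    "u3": {1: 1, 3: 2},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (assumed measure)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def top_k_neighbors(ratings, user, k):
    """Return up to k (similarity, other_user) pairs, most similar first."""
    sims = [
        (cosine_similarity(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user
    ]
    sims = [s for s in sims if s[0] > 0]
    # Sort by descending similarity; break ties by user id for determinism.
    sims.sort(key=lambda s: (-s[0], s[1]))
    return sims[:k]

def predict(ratings, neighbors, user, movie):
    """Similarity-weighted average of the neighbors' ratings for `movie`."""
    num = den = 0.0
    for sim, other in neighbors:
        if movie in ratings[other]:
            num += sim * ratings[other][movie]
            den += sim
    if den == 0:
        # Assumed fallback: no neighbor rated the movie, use the user's mean.
        own = list(ratings[user].values())
        return sum(own) / len(own)
    return num / den

def rmse(predicted, actual):
    """Root mean squared error between predictions and gold-standard ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

A typical call sequence would be `neighbors = top_k_neighbors(toy_ratings, "u1", k=2)` followed by `predict(toy_ratings, neighbors, "u1", 3)`, with `rmse` applied to the full list of predictions and true ratings at the end. Only the neighbor search needs to scale to the full data set, which is why the sketch keeps prediction and evaluation as ordinary functions.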

Detailed instructions can be found HERE.
