Movie Recommendations With Spark Collaborative Filtering

Collaborative filtering (CF)[1] based on the alternating least squares (ALS) technique[2] is another algorithm used to generate recommendations. It produces automatic predictions (filtering) about the interests of a user by collecting preferences from many other users (collaborating). The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than the opinion of a randomly chosen person. This algorithm gained a lot of traction in the data science community after it was used by the winning team of the Netflix Prize.
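
To make the alternating least squares idea concrete, here is a minimal pure-Python sketch with a single latent factor per user and per movie. It is only an illustration under stated assumptions, not the Spark MLlib implementation: the ratings, the names `als_rank1` and `loss`, and the values of `n_iters` and `lam` are all hypothetical, and production implementations use several latent factors and different regularization scaling.

```python
def loss(ratings, user_f, item_f, lam):
    """Regularized squared error of the rank-1 model r[u, i] ~ user_f[u] * item_f[i]."""
    sq = sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in ratings.items())
    reg = lam * (sum(f * f for f in user_f.values())
                 + sum(f * f for f in item_f.values()))
    return sq + reg


def als_rank1(ratings, n_iters=20, lam=0.1):
    """Fit one latent factor per user and per item by alternating least squares."""
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    user_f = {u: 1.0 for u in users}
    item_f = {i: 1.0 for i in items}
    history = []
    for _ in range(n_iters):
        # Hold item factors fixed: each user factor has a closed-form ridge solution.
        for u in users:
            num = sum(r * item_f[i] for (uu, i), r in ratings.items() if uu == u)
            den = lam + sum(item_f[i] ** 2 for (uu, i) in ratings if uu == u)
            user_f[u] = num / den
        # Hold user factors fixed: solve the same problem for each item factor.
        for i in items:
            num = sum(r * user_f[u] for (u, ii), r in ratings.items() if ii == i)
            den = lam + sum(user_f[u] ** 2 for (u, ii) in ratings if ii == i)
            item_f[i] = num / den
        history.append(loss(ratings, user_f, item_f, lam))
    return user_f, item_f, history


# Hypothetical ratings on a 1-5 scale: (user, movie) -> rating.
ratings = {
    ("alice", "A"): 5.0, ("alice", "B"): 4.0,
    ("bob", "A"): 5.0, ("bob", "B"): 4.0, ("bob", "C"): 2.0,
    ("carol", "A"): 4.0, ("carol", "C"): 1.0,
}

user_f, item_f, history = als_rank1(ratings)

# Predicted rating for a movie alice has not rated yet.
prediction = user_f["alice"] * item_f["C"]
print(f"predicted rating of alice for C: {prediction:.2f}")
```

Each half-step solves a small least squares problem exactly while the other set of factors is held fixed, so the regularized error recorded in `history` never increases from one sweep to the next. The prediction for an unrated movie is simply the product of the learned user and movie factors.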

The CF algorithm has also been implemented in Spark MLlib[3], with the aim of fast execution on very large datasets. KNIME Analytics Platform, with its Big Data Extensions, provides it through the Spark Collaborative Filtering node. We will use this node here to recommend movies to a new user, in a KNIME implementation of the collaborative filtering solution described in the Infofarm blog post[4].