How to Build a Collaborative Filtering Recommender Engine with Memgraph and Cypher

Introduction

A recommendation engine is a system that suggests relevant items to users. These items could be movies (e.g., Netflix), products (e.g., Amazon), flights (e.g., Skyscanner), and so on. Recommendation engines have become a key component of today’s online-first world, and when engineered properly they can help increase revenue for commercial applications.

Although many different approaches to building a recommendation engine exist, this tutorial focuses on one of the most widely used: collaborative filtering. We will use a movie dataset to build a simple movie recommender system with Memgraph and Cypher.

Movie Recommendations With Spark Collaborative Filtering

Collaborative filtering (CF)[1] based on the alternating least squares (ALS) technique[2] is a widely used algorithm for generating recommendations. It produces automatic predictions (filtering) about the interests of a user by collecting preferences from many other users (collaborating). The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than the opinion of a randomly chosen person. The algorithm gained a lot of traction in the data science community after the winning team of the Netflix Prize used it.
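To make the underlying assumption concrete, here is a minimal Python sketch of neighborhood-based (user-based) collaborative filtering. It is not the ALS technique mentioned above and not the implementation of any library. It only illustrates the idea that users who agreed in the past are used to predict ratings for unseen items. The toy ratings, the user names, and the helpers pearson_similarity and recommend are all hypothetical and exist purely for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating on a 1-5 scale}).
ratings = {
    "alice": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 5, "Heat": 5, "Up": 1, "Blade Runner": 5},
    "carol": {"Alien": 1, "Heat": 2, "Up": 5, "Blade Runner": 1, "Coco": 5},
}


def pearson_similarity(a, b):
    """Pearson correlation computed over the movies both users rated."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    mean_a = sum(a[m] for m in shared) / len(shared)
    mean_b = sum(b[m] for m in shared) / len(shared)
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in shared)
    norm_a = sqrt(sum((a[m] - mean_a) ** 2 for m in shared))
    norm_b = sqrt(sum((b[m] - mean_b) ** 2 for m in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no variance in the shared ratings, so no signal
    return cov / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank movies the user has not rated by similarity-weighted average rating."""
    seen = ratings[user]
    weighted, total_sim = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = pearson_similarity(seen, other_ratings)
        if sim <= 0:
            continue  # ignore users whose taste is unrelated or opposite
        for movie, rating in other_ratings.items():
            if movie in seen:
                continue
            weighted[movie] = weighted.get(movie, 0.0) + sim * rating
            total_sim[movie] = total_sim.get(movie, 0.0) + sim
    scores = {movie: weighted[movie] / total_sim[movie] for movie in weighted}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recommendations = recommend("alice", ratings)
print(recommendations)
```

In this toy data, bob is the only user whose ratings correlate positively with alice's, so the sketch suggests Blade Runner to alice and ignores carol's favorite, Coco. ALS takes a different route: rather than comparing users directly, it factorizes the user-item rating matrix into latent factors.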

The CF algorithm has also been implemented in Spark MLlib[3], with the aim of fast execution on very large datasets. KNIME Analytics Platform, with its Big Data Extensions, offers it in the Spark Collaborative Filtering node. We will use this node to recommend movies to a new user, in a KNIME implementation of the collaborative filtering solution described in the Infofarm blog post[4].