KNIME's Path To Empowering Developers in the Evolving Data Science Landscape

In the rapidly evolving world of data science, companies are constantly seeking tools and platforms that can help them harness the power of their data. KNIME, an open-source data science platform, has been at the forefront of this revolution, providing a comprehensive environment for data preparation, machine learning, and analysis. I recently had the opportunity to catch up with Michael Berthold, Founder and CEO of KNIME, at the Snowflake Data Cloud Summit, where we discussed the company's journey over the past five years and its vision for empowering developers, engineers, and architects in the data science landscape.

Evolving With the Times

Over the past five years, KNIME has undergone significant changes to stay ahead of the curve. "We completely changed both of our technologies," Berthold revealed. The analytics platform is now browser-ready, and the KNIME server has been replaced with a cloud-native business hub. The company also recently launched a SaaS offering, allowing users to access KNIME's powerful features without the need for on-premises installation.

Movie Recommendations With Spark Collaborative Filtering

Collaborative filtering (CF)[1] based on the alternating least squares (ALS) technique[2] is another algorithm used to generate recommendations. It produces automatic predictions (filtering) about the interests of a user by collecting preferences from many other users (collaborating). The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than to share the opinion of a randomly chosen person. This algorithm gained a lot of traction in the data science community after it was used by the winning team of the Netflix Prize.
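
To make the ALS idea concrete, here is the standard regularized matrix-factorization objective that ALS is usually described as minimizing. The notation is introduced here for illustration and does not come from the original post: r_{ui} is the rating user u gave item i, \Omega is the set of observed (user, item) pairs, x_u and y_i are latent factor vectors, and \lambda is a regularization weight. Specific implementations, including Spark MLlib's, may scale the regularization term differently.

```latex
\min_{X,\,Y} \; \sum_{(u,i)\in\Omega} \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)

% With the item factors Y held fixed, each user vector has a closed-form
% least-squares solution (\Omega_u = items rated by user u):
x_u = \Bigl( \sum_{i\in\Omega_u} y_i y_i^{\top} + \lambda I \Bigr)^{-1}
      \sum_{i\in\Omega_u} r_{ui}\, y_i
```

The "alternating" part is that the same closed-form update is then applied to every y_i with the user factors held fixed, and the two steps repeat until the factors stabilize.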

The CF algorithm has also been implemented in Spark MLlib[3], with the aim of fast execution on very large datasets. KNIME Analytics Platform, with its Big Data Extensions, offers it in the Spark Collaborative Filtering node. We will use it here to recommend movies to a new user, within a KNIME implementation of the collaborative filtering solution provided in the Infofarm blog post[4].
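
Before turning to the node itself, it can help to see what ALS-based collaborative filtering does on a toy scale. The sketch below is a minimal pure-Python stand-in, not Spark MLlib's implementation and not the KNIME workflow: the ratings, the user and movie names, the helper functions (solve, update, als, recommend) and the parameter values are all hypothetical and chosen only for illustration.

```python
import random

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]

def update(fixed, ratings_by_key, rank, lam):
    """Regularized least-squares update of one side, other side held fixed."""
    out = {}
    for key, rated in ratings_by_key.items():
        # A = lam * I + sum of outer products, b = sum of rating * factor
        A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in rated.items():
            f = fixed[other]
            for i in range(rank):
                b[i] += r * f[i]
                for j in range(rank):
                    A[i][j] += f[i] * f[j]
        out[key] = solve(A, b)
    return out

def als(ratings, rank=2, lam=0.1, iters=20, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(items, by_user, rank, lam)
        items = update(users, by_item, rank, lam)
    return users, items

def recommend(user, ratings, users, items, n=2):
    """Rank the movies the user has not rated by predicted score."""
    seen = {i for (u, i) in ratings if u == user}
    scores = {
        i: sum(a * b for a, b in zip(users[user], f))
        for i, f in items.items()
        if i not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical toy ratings on a 1-5 scale: (user, movie) -> rating.
ratings = {
    ("alice", "Alien"): 5, ("alice", "Blade Runner"): 5, ("alice", "Notting Hill"): 1,
    ("bob", "Alien"): 4, ("bob", "Blade Runner"): 5, ("bob", "Love Actually"): 1,
    ("carol", "Notting Hill"): 5, ("carol", "Love Actually"): 4, ("carol", "Alien"): 1,
    ("new_user", "Alien"): 5, ("new_user", "Notting Hill"): 2,
}

users, items = als(ratings)
recs = recommend("new_user", ratings, users, items)
print(recs)
```

The new user is handled here simply by including their few ratings in the training data, so that a factor vector is learned for them alongside everyone else. A distributed implementation such as the one behind the Spark Collaborative Filtering node follows the same alternating pattern but partitions the work across a cluster.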