How To Use SingleStore Pipelines With Kafka, Part 1 of 3

Abstract

In this article series, we'll look at a compelling SingleStore feature called Pipelines, which enables vast quantities of data to be ingested into a SingleStore database in parallel. We'll also see an example of how to use this feature in conjunction with Apache Kafka™. This first article focuses on loading some data into SingleStore using Spark; in a previous article, we noted that Spark works well for ETL with SingleStore. We'll also perform some analysis of the data. In the example application, we'll simulate sensors distributed around the globe that generate temperature readings, and these readings will be ingested into SingleStore via Confluent Cloud. We'll implement a producer-consumer model in Java using JDBC, and then simplify this approach using SingleStore Pipelines.
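To give a flavour of the Spark-based loading step covered in this article, here is a minimal sketch of writing a small DataFrame of simulated temperature readings into SingleStore with the singlestore-spark-connector. The endpoint, credentials, column names, and the iot_demo.temperatures table are placeholders for illustration, not the exact names used in the accompanying notebooks.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("sensor-load").getOrCreate()

// Connector configuration (placeholder values; assumes the
// singlestore-spark-connector is on the classpath).
spark.conf.set("spark.datasource.singlestore.ddlEndpoint", "cluster-host:3306")
spark.conf.set("spark.datasource.singlestore.user", "admin")
spark.conf.set("spark.datasource.singlestore.password", "<password>")

import spark.implicits._

// A few simulated readings: (sensor_id, latitude, longitude, temperature).
val readings = Seq(
  (1, 51.5074,   -0.1278, 14.2),
  (2, 40.7128,  -74.0060, 18.9),
  (3, 35.6762,  139.6503, 21.4)
).toDF("sensor_id", "lat", "lon", "temperature_c")

// Append the readings to a pre-created SingleStore table.
readings.write
  .format("singlestore")
  .mode(SaveMode.Append)
  .save("iot_demo.temperatures")
```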

The SQL scripts, Java code, and notebook files used in this article series are available on GitHub. The notebook files are available in DBC, HTML, and IPython formats.