Spark Structured Streaming Using Java

Spark provides a streaming library for processing data that flows continuously from real-time systems.

Concept 

Spark Streaming was originally implemented with the DStream API, which runs on Spark RDDs: data from the streaming source is divided into chunks, processed, and then sent to the destination.
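To make the chunking idea concrete, here is a minimal plain-Java sketch of micro-batching. It is not the Spark API: the class name MicroBatchSketch and the methods toBatches and run are hypothetical names chosen for illustration, and the batches are cut by record count for simplicity, whereas DStreams cut them by a time interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of micro-batching; this is NOT the Spark API.
// All names here are hypothetical and exist only for this sketch.
class MicroBatchSketch {

    // Split the incoming records into fixed-size chunks (the "micro-batches").
    // Assumption: batches are cut by record count, not by time interval.
    static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    // Process each batch in turn and append the results to the sink,
    // which stands in for the destination system.
    static <T, R> List<R> run(List<T> source, int batchSize, Function<T, R> transform) {
        List<R> sink = new ArrayList<>();
        for (List<T> batch : toBatches(source, batchSize)) {
            for (T record : batch) {
                sink.add(transform.apply(record));
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c", "d", "e");
        System.out.println(toBatches(source, 2));                // [[a, b], [c, d], [e]]
        System.out.println(run(source, 2, String::toUpperCase)); // [A, B, C, D, E]
    }
}
```

In this sketch the source is a finite list, so the loop ends; with a real streaming source the same divide, process, send cycle would repeat for as long as data keeps arriving.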

Redis Streams + Apache Spark Structured Streaming

Recently, I had the honor of presenting my talk, "Redis + Structured Streaming: A Perfect Combination to Scale-out Your Continuous Applications" at the Spark+AI Summit.

My interest in this topic was fueled by new features introduced in Apache Spark and Redis over the last couple of months. Based on my previous use of Apache Spark, I appreciate how elegantly it runs batch processes, and the introduction of Structured Streaming in version 2.0 is further progress in that direction.

Spark Streaming vs. Structured Streaming

Fan of Apache Spark? I am too. The reason is simple: interesting APIs to work with, fast and distributed processing, far less disk I/O overhead than MapReduce because intermediate data can stay in memory, fault tolerance, and much more. With this, you can do a lot in the world of big data and fast data. From "processing huge chunks of data" to "working on streaming data," Spark handles both well. In this post, we will be talking about the streaming power we get from Spark.

Spark provides us with two ways of working with streaming data: