Online Machine Learning: Into the Deep

Introduction

Online Learning is a branch of Machine Learning that has attracted significant interest in recent years because its characteristics fit many of today's tasks remarkably well. Let's dive deeper into this topic.

What Exactly Is Online Learning?

In traditional machine learning, often called batch learning, the training data is first gathered in its entirety and then a chosen machine learning model is trained on that data; the resulting model is then deployed to make predictions on new, unseen data. In online learning, by contrast, the model is updated incrementally as each observation arrives.
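To make the contrast concrete, here is a minimal plain-Python sketch of online learning: a single weight is updated after every observation via stochastic gradient descent, rather than fitting once on the full dataset. The function name and toy data are illustrative, not from any particular library.

```python
# Online learning sketch: update the model one (x, y) observation at a time,
# instead of training once on the whole dataset as in batch learning.

def online_sgd(stream, lr=0.05, epochs=1):
    """Update a single weight w after every (x, y) pair in the stream."""
    w = 0.0
    for _ in range(epochs):
        for x, y in stream:
            error = w * x - y      # prediction error for this sample
            w -= lr * error * x    # gradient step on the squared loss
    return w

# Toy stream drawn from y = 2x; replaying it simulates repeated arrivals.
data = [(x, 2.0 * x) for x in [1, 2, 3, 4, 5]]
w = online_sgd(data, epochs=50)
print(round(w, 2))  # converges toward the true slope 2.0
```

In a real online setting the inner loop would consume an unbounded stream and there would be no `epochs` parameter; replaying a small list here simply makes the convergence visible.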

ETL and How it Changed Over Time

What Is ETL?

ETL is the abbreviation for Extract, Transform, and Load. In simple terms, it is just copying data between two locations.[1]

  • Extract: The process of reading the data from different types of sources including databases.
  • Transform: Converting the extracted data to a particular format. Conversion also involves enriching the data using other data in the system.
  • Load: The process of writing the data to a target database, data warehouse, or another system.
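The three stages above can be sketched in a few lines of Python, using in-memory lists as stand-ins for real sources and targets. All names here are illustrative.

```python
# Minimal ETL sketch: lists stand in for a source database and a warehouse.

def extract(source_rows):
    """Extract: read raw records from a source."""
    return list(source_rows)

def transform(rows, country_names):
    """Transform: normalize fields and enrich them with other data
    in the system (here, a country-code lookup table)."""
    return [
        {"name": r["name"].strip().title(),
         "country": country_names.get(r["country_code"], "Unknown")}
        for r in rows
    ]

def load(rows, target):
    """Load: write the transformed records to a target store."""
    target.extend(rows)
    return target

source = [{"name": " ada lovelace ", "country_code": "UK"}]
lookup = {"UK": "United Kingdom"}
warehouse = []
load(transform(extract(source), lookup), warehouse)
print(warehouse)  # [{'name': 'Ada Lovelace', 'country': 'United Kingdom'}]
```

Real pipelines replace the lists with database reads and writes, but the shape of the three stages stays the same.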

ETL can be differentiated into two categories with regard to the infrastructure.

Deep Dive Into Apache Flink’s TumblingWindow – Part 1

In this article, I will share coding examples of some of the key aspects of TumblingWindow in Flink. Those not familiar with Flink streaming can get an introduction here.

Before we get into TumblingWindow, let us get a basic understanding of a "Window" when it comes to stream processing or streaming computation. In a data stream, you have a source that is continuously producing data, making it infeasible to compute a final value over the stream as a whole; a window restricts the computation to a finite slice of the stream.
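Before looking at Flink's own API, here is a conceptual plain-Python sketch (not Flink code) of what a tumbling window does: it splits an unbounded stream into fixed-size, non-overlapping buckets, so an aggregate can be emitted per bucket instead of over the whole infinite stream.

```python
# Conceptual tumbling-window sketch (not Flink's API): events are assigned
# to fixed-size, non-overlapping buckets by timestamp, and one aggregate
# (here, a sum) is produced per bucket.

def tumbling_window_sums(events, window_size):
    """Group (timestamp, value) events into windows of `window_size` time
    units and return the per-window sum, keyed by window start time."""
    sums = {}
    for timestamp, value in events:
        window_start = (timestamp // window_size) * window_size
        sums[window_start] = sums.get(window_start, 0) + value
    return sums

events = [(0, 1), (3, 2), (5, 10), (9, 4), (12, 7)]
print(tumbling_window_sums(events, window_size=5))
# {0: 3, 5: 14, 10: 7}
```

Because the buckets never overlap, every event lands in exactly one window; that is the defining property that distinguishes tumbling windows from sliding windows.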

Streaming ETL With Apache Flink

Streaming data computation is becoming more and more common with the growing Big Data landscape. Many enterprises are also adopting or moving towards streaming for message passing instead of relying solely on REST APIs. 

Apache Flink has emerged as a popular framework for streaming data computation in a very short amount of time. It has many advantages in comparison to Apache Spark (e.g. lightweight, rich APIs, developer-friendly, high throughput, an active and vibrant community).

Make Crucial Predictions as Data Comes

Flink: as fast as a squirrel

If you walk the hottest IT streets these days, you've likely heard about Streaming Machine Learning, i.e., moving AI toward streaming scenarios and exploiting real-time capabilities alongside new Artificial Intelligence techniques. You will also notice the lack of research on this topic, despite the growing interest in it.

If we investigate a little deeper, we realize that a step is missing: today's well-known streaming applications still don't handle the concept of Model Serving properly, and industries still lean on the lambda architecture to achieve this goal. Suppose a bank has a concrete, frequently updated, batch-trained Machine Learning model (e.g., an optimized Gradient Descent model applied to past buffer overflow attack attempts) and wants to deploy that model directly to its own canary environment.
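The serving gap described above can be sketched in a few lines: a scorer keeps running on the stream while the batch layer periodically swaps in a freshly trained model. This is an illustrative plain-Python sketch under assumed names (`ModelServer`, the threshold models), not the bank's actual system or any Flink API.

```python
# Illustrative model-serving sketch: a stream scorer holds the current
# batch-trained model and hot-swaps it when the batch layer publishes a
# new version. All names and thresholds are hypothetical.

class ModelServer:
    """Holds the current model and scores incoming stream events with it."""
    def __init__(self, model):
        self.model = model

    def update_model(self, new_model):
        # In a lambda architecture, the batch layer would publish this.
        self.model = new_model

    def score(self, event):
        return self.model(event)

# "Batch-trained" models: flag requests whose payload exceeds a threshold.
model_v1 = lambda event: event["payload_bytes"] > 1000
model_v2 = lambda event: event["payload_bytes"] > 800  # retrained, stricter

server = ModelServer(model_v1)
print(server.score({"payload_bytes": 900}))   # False under v1
server.update_model(model_v2)
print(server.score({"payload_bytes": 900}))   # True under v2
```

The point of the sketch is that the scoring path never stops: only the model reference changes, which is exactly the step that today's streaming applications often delegate to a separate batch layer.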