Online Machine Learning: Into the Deep

Introduction

Online Learning is a branch of Machine Learning that has attracted significant interest in recent years because its characteristics fit many of today's tasks remarkably well. Let's dive deeper into this topic.

What Exactly Is Online Learning?

In traditional machine learning, often called batch learning, the training data is first gathered in its entirety and then a chosen machine learning model is trained on that data; the resulting model is then deployed to make predictions on new, unseen data. In online learning, by contrast, the model is updated incrementally as each observation arrives.
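To make the contrast concrete, here is a minimal plain-Python sketch of online learning: a single weight is updated after every observation via stochastic gradient descent, rather than fitting once on the full dataset. The function name and toy data are illustrative, not from any particular library.

```python
# Online learning sketch: update the model one (x, y) observation at a time,
# instead of training once on the whole dataset as in batch learning.

def online_sgd(stream, lr=0.05, epochs=1):
    """Update a single weight w after every (x, y) pair in the stream."""
    w = 0.0
    for _ in range(epochs):
        for x, y in stream:
            error = w * x - y      # prediction error for this sample
            w -= lr * error * x    # gradient step on the squared loss
    return w

# Toy stream drawn from y = 2x; replaying it simulates repeated arrivals.
data = [(x, 2.0 * x) for x in [1, 2, 3, 4, 5]]
w = online_sgd(data, epochs=50)
print(round(w, 2))  # converges toward the true slope 2.0
```

In a real online setting the inner loop would consume an unbounded stream and there would be no `epochs` parameter; replaying a small list here simply makes the convergence visible.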

ETL and How it Changed Over Time

What Is ETL?

ETL is the abbreviation for Extract, Transform, and Load. In simple terms, it is just copying data between two locations.[1]

  • Extract: The process of reading the data from different types of sources including databases.
  • Transform: Converting the extracted data to a particular format. Conversion also involves enriching the data using other data in the system.
  • Load: The process of writing the data to a target database, data warehouse, or another system.
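The three stages above can be sketched in a few lines of Python, using in-memory lists as stand-ins for real sources and targets. All names here are illustrative.

```python
# Minimal ETL sketch: lists stand in for a source database and a warehouse.

def extract(source_rows):
    """Extract: read raw records from a source."""
    return list(source_rows)

def transform(rows, country_names):
    """Transform: normalize fields and enrich them with other data
    in the system (here, a country-code lookup table)."""
    return [
        {"name": r["name"].strip().title(),
         "country": country_names.get(r["country_code"], "Unknown")}
        for r in rows
    ]

def load(rows, target):
    """Load: write the transformed records to a target store."""
    target.extend(rows)
    return target

source = [{"name": " ada lovelace ", "country_code": "UK"}]
lookup = {"UK": "United Kingdom"}
warehouse = []
load(transform(extract(source), lookup), warehouse)
print(warehouse)  # [{'name': 'Ada Lovelace', 'country': 'United Kingdom'}]
```

Real pipelines replace the lists with database reads and writes, but the shape of the three stages stays the same.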

ETL can be differentiated into two categories with regard to the infrastructure.

Deep Dive Into Apache Flink’s TumblingWindow – Part 1

In this article, I will share coding examples of some of the key aspects of TumblingWindow in Flink. Those not familiar with Flink streaming can get an introduction here.

Before we get into TumblingWindow, let us get a basic understanding of a "Window" when it comes to stream processing or streaming computation. In a data stream, you have a source that is continuously producing data, making it infeasible to compute a final value over the stream as a whole; a window restricts the computation to a finite slice of the stream.
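Before looking at Flink's own API, here is a conceptual plain-Python sketch (not Flink code) of what a tumbling window does: it splits an unbounded stream into fixed-size, non-overlapping buckets, so an aggregate can be emitted per bucket instead of over the whole infinite stream.

```python
# Conceptual tumbling-window sketch (not Flink's API): events are assigned
# to fixed-size, non-overlapping buckets by timestamp, and one aggregate
# (here, a sum) is produced per bucket.

def tumbling_window_sums(events, window_size):
    """Group (timestamp, value) events into windows of `window_size` time
    units and return the per-window sum, keyed by window start time."""
    sums = {}
    for timestamp, value in events:
        window_start = (timestamp // window_size) * window_size
        sums[window_start] = sums.get(window_start, 0) + value
    return sums

events = [(0, 1), (3, 2), (5, 10), (9, 4), (12, 7)]
print(tumbling_window_sums(events, window_size=5))
# {0: 3, 5: 14, 10: 7}
```

Because the buckets never overlap, every event lands in exactly one window; that is the defining property that distinguishes tumbling windows from sliding windows.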

Streaming ETL With Apache Flink

Streaming data computation is becoming more and more common with the growing Big Data landscape. Many enterprises are also adopting or moving towards streaming for message passing instead of relying solely on REST APIs. 

Apache Flink has emerged as a popular framework for streaming data computation in a very short amount of time. It has many advantages in comparison to Apache Spark (e.g. lightweight, rich APIs, developer-friendly, high throughput, an active and vibrant community).

Make Crucial Predictions as Data Comes

Flink: as fast as a squirrel

If you walk the hottest IT streets these days, you've likely heard about Streaming Machine Learning, i.e., moving AI toward streaming scenarios and exploiting real-time capabilities alongside new Artificial Intelligence techniques. You will also notice the lack of research on this topic, despite the growing interest in it.

If we investigate a little deeper, we realize that a step is missing: today's well-known streaming applications still don't handle the concept of Model Serving properly, and industries still lean on the lambda architecture to achieve this goal. Suppose a bank has a concrete, frequently updated, batch-trained Machine Learning model (e.g., an optimized Gradient Descent model applied to past buffer overflow attack attempts) and wants to deploy that model directly to its own canary environment.
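The serving gap described above can be sketched in a few lines: a scorer keeps running on the stream while the batch layer periodically swaps in a freshly trained model. This is an illustrative plain-Python sketch under assumed names (`ModelServer`, the threshold models), not the bank's actual system or any Flink API.

```python
# Illustrative model-serving sketch: a stream scorer holds the current
# batch-trained model and hot-swaps it when the batch layer publishes a
# new version. All names and thresholds are hypothetical.

class ModelServer:
    """Holds the current model and scores incoming stream events with it."""
    def __init__(self, model):
        self.model = model

    def update_model(self, new_model):
        # In a lambda architecture, the batch layer would publish this.
        self.model = new_model

    def score(self, event):
        return self.model(event)

# "Batch-trained" models: flag requests whose payload exceeds a threshold.
model_v1 = lambda event: event["payload_bytes"] > 1000
model_v2 = lambda event: event["payload_bytes"] > 800  # retrained, stricter

server = ModelServer(model_v1)
print(server.score({"payload_bytes": 900}))   # False under v1
server.update_model(model_v2)
print(server.score({"payload_bytes": 900}))   # True under v2
```

The point of the sketch is that the scoring path never stops: only the model reference changes, which is exactly the step that today's streaming applications often delegate to a separate batch layer.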