7 Reasons to Choose Apache Pulsar over Apache Kafka

So why did we build our messaging service using Apache Pulsar?

At DataStax, our mission is to empower developers to build cloud-native distributed applications by making cloud-agnostic, high-performance messaging technology easily available to everyone. Developers want to write distributed applications or microservices but don’t want the hassle of managing complex message infrastructure or getting locked into a particular cloud vendor. They need a solution that just works. Everywhere.

Solving the Problem of Working Remotely With Resource-Intensive Applications Using Moonlight

For a number of reasons, you cannot simply take high-powered equipment and resource-intensive software home with you, but you can still set up high-quality remote access from anywhere at no extra cost. In this article, we describe the first approach we have tested for convenient remote management from almost any device.

What’s Up, Doc?

The average employee only needs to connect to a remote desktop over the RDP protocol to access corporate resources from a laptop; for IT specialists, the problem therein is ensuring security. But if a specialist needs resource-intensive applications that use 3D acceleration, the problem is of a completely different kind.

Real-Time Pulsar and Python Apps on a Pi

Today we will look at an easy way to build Python streaming applications from the edge to the cloud. Let's walk through how to build a Python application on a Raspberry Pi that streams sensor data and more from the edge to any and all data stores while processing data in event time.

My GitHub repository has all of the code, configuration, and scripts needed to build and run this application.

Pulsar in Python on Pi for Sensors

I have a new Raspberry Pi with a Breakout Garden with a thermal camera, 1.12" OLED screen, and a CO2+ sensor.

We first need to install the Pulsar Python Client. On certain architectures, you will need to compile the Apache Pulsar C++ Client first.
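
Once the client library is installed, producing sensor readings takes only a few lines. Here is a minimal sketch, assuming a broker at localhost:6650 and an illustrative topic name; the reading itself is a made-up example:

    import json
    import pulsar

    # Connect to a Pulsar broker (hypothetical local address).
    client = pulsar.Client("pulsar://localhost:6650")
    producer = client.create_producer("persistent://public/default/pi-sensors")

    # Encode one illustrative sensor reading as JSON bytes and send it.
    reading = {"device": "rpi", "sensor": "co2", "ppm": 412}
    producer.send(json.dumps(reading).encode("utf-8"))

    client.close()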

Pulsar on KubeSphere: Installing Distributed Messaging and Streaming Platform

KubeSphere, an open-source container platform running on Kubernetes, provides users with an app-centric experience. To that end, it features a comprehensive set of tools for developers to manage apps across their entire lifecycle. In this article, I will demonstrate how to install Apache Pulsar on a KubeSphere cluster as an example. Apache Pulsar, a cloud-native, distributed messaging and streaming platform, is a go-to choice for the real-time event-streaming needs of enterprises.

Before You Begin

To install Pulsar on KubeSphere, you need to do the following beforehand:

Video Codecs and Encoding

Streaming has always held an important place in our lives, but it has become an essential need for all of us now, especially after Covid-19.

As one of the core elements of video streaming, video codecs allow publishers to compress a video file for distribution over the Internet through a process called video encoding. Codecs are what let you watch your favorite series on Netflix or Amazon Prime in the evening, and what let you see your loved ones through video-calling programs, even with limited bandwidth.

Introducing Cloudera SQL Stream Builder (SSB)


The initial release of Cloudera SQL Stream Builder, part of the CSA 1.3.0 release of Apache Flink and friends from Cloudera, provides an environment that is well integrated into the Cloudera Data Platform. SSB is an improved release of Eventador's SQL Stream Builder, with integration into Cloudera Manager, Cloudera Flink, and other streaming tools.

Spring Cloud Stream Channel Interceptor

Introduction

A channel interceptor is a means of capturing a message before it is sent or received, in order to view or modify it. Channel interceptors let us keep the code well structured when we want to add extra message processing, or embed additional data that relates purely to a technical concern, without affecting the business code.

Message interceptors are used in frameworks like Spring Cloud Sleuth and Spring Security to propagate the tracing and security context through a message queue: headers are added to the message on the producer side, then read back and used to restore the context on the consumer side.
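
To make the idea concrete without tying it to Spring's Java interface, here is a minimal Python sketch of the interceptor pattern itself (not Spring's actual API): one hook touches the message before send to embed a made-up tracing header, and another reads it back on receive to restore the context.

    class Message:
        def __init__(self, payload, headers=None):
            self.payload = payload
            self.headers = headers or {}

    class TracingInterceptor:
        def pre_send(self, message):
            # Producer side: embed technical metadata without touching the payload.
            message.headers["trace-id"] = "abc-123"  # hypothetical trace id
            return message

        def post_receive(self, message):
            # Consumer side: read the header back to restore the tracing context.
            print("restored trace context:", message.headers.get("trace-id"))
            return message

    class Channel:
        def __init__(self, interceptor):
            self.interceptor, self.queue = interceptor, []

        def send(self, message):
            self.queue.append(self.interceptor.pre_send(message))

        def receive(self):
            return self.interceptor.post_receive(self.queue.pop(0))

    channel = Channel(TracingInterceptor())
    channel.send(Message({"order": 42}))
    print(channel.receive().payload)

Note how the business code only ever deals with the payload; the tracing header travels in and out entirely inside the interceptor.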

How to Build and Debug a Flink Pipeline Based in Event Time

One of the most important concepts for stream-processing frameworks is the concept of time. There are different concepts of time:

  • Processing time: the time based on the clock of the machine where the event is being processed. It's easy to use, but because that clock reading changes each time the job is executed, the results aren't consistent: each run of the job may produce different output. For many use cases, this isn't an acceptable trade-off.
  • Event time: the time based on a field of the event itself, typically a timestamp field. Each time you execute the pipeline with the same input, you obtain the same result, which is a good thing. But it also tends to be a bit harder to work with, for several reasons we'll cover later in the article.
  • Ingestion time: the time based on the timestamp at which the event was ingested into the streaming platform (e.g., Kafka), usually carried in the metadata. From a Flink perspective, we can consider it a particular mix of event time and processing time, with the disadvantages of both.

Apache Flink has excellent support for Event time processing, probably the best of the different stream-processing frameworks available. For more information, you can read Notions of Time: Event Time and Processing Time in the official documentation. If you prefer videos, Streaming Concepts and Introduction to Flink - Event Time and Watermarks is a good explanation.
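
As a concrete illustration, here is a minimal PyFlink sketch, assuming records shaped like ("key", epoch_millis) (an invented layout), that tells Flink to use event time by extracting the timestamp field and tolerating five seconds of out-of-orderness:

    from pyflink.common import Duration, WatermarkStrategy
    from pyflink.common.watermark_strategy import TimestampAssigner
    from pyflink.datastream import StreamExecutionEnvironment

    class FieldTimestampAssigner(TimestampAssigner):
        def extract_timestamp(self, value, record_timestamp):
            return value[1]  # event time: an epoch-millis field in the record

    env = StreamExecutionEnvironment.get_execution_environment()
    events = env.from_collection([("a", 1_000), ("b", 2_000), ("a", 1_500)])

    # Watermarks bound how late an event may arrive (5 s here) and drive
    # event-time progress, so reruns on the same input give the same result.
    with_event_time = events.assign_timestamps_and_watermarks(
        WatermarkStrategy.for_bounded_out_of_orderness(Duration.of_seconds(5))
        .with_timestamp_assigner(FieldTimestampAssigner())
    )
    with_event_time.print()
    env.execute("event_time_sketch")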

Quality Control in OTT

OTT and streaming services are becoming increasingly popular: people watch videos on mobile devices and computers much more often. For instance, in the US, 86% of smartphone owners use their mobile devices to watch video content. Television broadcasters and content makers do need their own OTT services; what is more, they should constantly monitor the quality of streaming and content in order to stay competitive in an actively developing market.

Media Format Is Changeable

The traditional format for delivering video content, via cable or satellite TV, is linear. Viewers have to watch what is shown at that moment and use a device connected to a cable or dish. But viewers want to watch what they are interested in right now, using the device that is comfortable for them. OTT provides such an opportunity, and therefore, this format is becoming mainstream. Now, it is the viewer who dictates what they want to watch and when, while the content-makers try to offer the user a product that best suits their preferences.

Azure DevOps: Getting Started With Audit Streaming With Event Grid

Streaming of Audit Logs in Azure DevOps Is Available in Public Preview

With audit logs in Azure DevOps, an administrator can monitor changes throughout the DevOps instance.

By default, audit data is displayed for up to 90 days. But what if your organization wants to send this data to a destination inside or outside Azure, such as Kibana/Logstash, and build some fancy visualization graphs?

gRPC Client and Bi-directional Streaming with Ballerina Swan Lake

Resembling a graceful rendition of Tchaikovsky's famous ballet, the namesake Swan Lake release of the Ballerina programming language (referred to as Ballerinalang in this article) comes packed with a revamped gRPC library that provides a more elegant way of handling client- and bidirectional-streaming cases. This article discusses the improved gRPC streaming functionality by referring to an example for better understanding. However, if you are new to gRPC in Ballerinalang and seeking in-depth knowledge of the basics and the implementation of a unary application, read this blog on Ballerina + gRPC.

Microservices Bill Calculator with Ballerinalang

Let’s look at a basic microservices-based bill calculator as an example. The client would stream the price and quantity (the input values) of various items to be included in the total bill, and the server would essentially multiply and add them together to return the total bill as a reply.
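
The same client-streaming shape can be sketched in any gRPC language. Below is a minimal Python version, shown as an illustration of the pattern rather than the article's Ballerina code; it assumes a hypothetical calculator.proto declaring rpc calculateBill(stream Item) returns (Total) and stubs generated from it:

    import grpc

    # Hypothetical modules generated by protoc from calculator.proto, which
    # declares: rpc calculateBill(stream Item) returns (Total);
    import calculator_pb2
    import calculator_pb2_grpc

    def items():
        # The client streams (price, quantity) pairs, one per bill entry.
        for price, quantity in [(2.50, 4), (1.25, 2)]:
            yield calculator_pb2.Item(price=price, quantity=quantity)

    channel = grpc.insecure_channel("localhost:9090")
    stub = calculator_pb2_grpc.CalculatorStub(channel)

    # A client-streaming call takes an iterator of requests and returns a
    # single reply: the server multiplies and sums the streamed entries.
    total = stub.calculateBill(items())
    print(total.amount)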

Apache Spark vs Apache Storm

Introduction

Apache Storm and Apache Spark are two powerful, open-source tools used extensively in the Big Data ecosystem. Many people have doubts regarding the suitability and applicability of each tool. In this post, I would like to draw a comparison between them.

Apache Storm: Apache Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Apache Storm is simple, can be used with any programming language, and is a lot of fun to use!

ETL and How it Changed Over Time

What Is ETL?

ETL is an abbreviation for Extract, Transform, and Load. In simple terms, it is just copying data between two locations.[1]

  • Extract: The process of reading data from different types of sources, including databases.
  • Transform: Converting the extracted data to a particular format. The conversion can also involve enriching the data using other data in the system.
  • Load: The process of writing the data to a target database, data warehouse, or another system.
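
Put together, the three steps can be as small as the following sketch, which copies an entirely hypothetical orders table between two SQLite files, converting cents to dollars as the transformation:

    import sqlite3

    # Extract: read rows from the source database (hypothetical schema).
    source = sqlite3.connect("source.db")
    rows = source.execute("SELECT id, amount_cents FROM orders").fetchall()

    # Transform: convert to the target format, here cents -> dollars.
    transformed = [(order_id, cents / 100.0) for order_id, cents in rows]

    # Load: write the converted rows into the target warehouse table.
    target = sqlite3.connect("warehouse.db")
    target.executemany(
        "INSERT INTO orders_usd (id, amount_usd) VALUES (?, ?)", transformed
    )
    target.commit()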

With regard to infrastructure, ETL can be differentiated into two categories.

Deep Dive Into Apache Flink’s TumblingWindow – Part 1

In this article, I will share coding examples covering some of the key aspects of TumblingWindow in Flink. Those not familiar with Flink streaming can get an introduction here.

Before we get into TumblingWindow, let us get a basic understanding of a "window" when it comes to stream processing or streaming computation. In a data stream, you have a source that is continuously producing data, making it infeasible to compute a final value. A window groups that unbounded stream into finite buckets over which a computation can finish; a tumbling window does so with fixed-size, non-overlapping intervals.
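
For example, the following PyFlink sketch, using made-up (word, count) records, buckets a keyed stream into fixed 10-second processing-time tumbling windows and sums the counts inside each bucket:

    from pyflink.common.time import Time
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.window import TumblingProcessingTimeWindows

    env = StreamExecutionEnvironment.get_execution_environment()
    words = env.from_collection([("a", 1), ("b", 1), ("a", 1)])

    # A tumbling window chops the stream into fixed-size, non-overlapping
    # 10-second buckets, so a per-key sum can complete for each bucket.
    counts = (
        words.key_by(lambda record: record[0])
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .reduce(lambda a, b: (a[0], a[1] + b[1]))
    )
    counts.print()
    env.execute("tumbling_window_sketch")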

Streaming ETL With Apache Flink

Streaming data computation is becoming more and more common with the growing Big Data landscape. Many enterprises are also adopting or moving towards streaming for message passing instead of relying solely on REST APIs. 

Apache Flink has emerged as a popular framework for streaming data computation in a very short amount of time. It has many advantages in comparison to Apache Spark (e.g. lightweight, rich APIs, developer-friendly, high throughput, an active and vibrant community).

KSQL: A SQL Streaming Engine for Apache Kafka

KSQL is a SQL streaming engine for Apache Kafka. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language like Java or Python. KSQL is scalable, elastic, and fault-tolerant. It supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
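
Since KSQL statements are submitted over the server's REST API, a sketch needs no client library at all. The snippet below assumes a KSQL server at localhost:8088 and a pre-existing pageviews stream (both invented for illustration) and submits a filtering CREATE STREAM ... AS SELECT statement:

    import json
    import urllib.request

    # Hypothetical server address and source stream; the /ksql endpoint
    # accepts statements as JSON.
    statement = {
        "ksql": (
            "CREATE STREAM important_pageviews AS "
            "SELECT userid, pageid FROM pageviews WHERE pageid != 'home';"
        ),
        "streamsProperties": {},
    }
    request = urllib.request.Request(
        "http://localhost:8088/ksql",
        data=json.dumps(statement).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))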

What Is Streaming?

In stream processing, data is continuously processed as new data becomes available for analysis. Data is processed sequentially as an unbounded stream and may be pulled in by a "listening" analytics system as records in key-value pairs.