Collecting Logs in Azure Databricks

Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. In this blog, we are going to see how we can collect logs from Azure Databricks and send them to Azure Log Analytics (ALA). Before going further, we need to look at how to set up a Spark cluster in Azure.

Create a Spark Cluster in Databricks

  1. In the Azure portal, go to the Databricks workspace that you created, and then click Launch Workspace.
  2. You are redirected to the Azure Databricks portal. From the portal, click New Cluster.
  3. Under “Advanced Options,” click the “Init Scripts” tab and go to the last line of the “Init Scripts” section. In the “Destination” dropdown, select “DBFS,” enter “dbfs:/databricks/spark-monitoring/spark-monitoring.sh” in the text box, and click the “Add” button.
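Before attaching the init script, it is worth confirming that spark-monitoring.sh is actually present at that DBFS path. A minimal check from a notebook cell might look like the sketch below; it assumes the script (from Microsoft's spark-monitoring library) has already been uploaded, for example with the Databricks CLI:

    # Verify that the init script is present in DBFS before
    # attaching it to the cluster. Run this in a notebook cell,
    # where dbutils is available automatically.
    files = dbutils.fs.ls("dbfs:/databricks/spark-monitoring/")
    for f in files:
        print(f.path, f.size)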

Run a Spark SQL Job

  1. In the left pane, select Azure Databricks. Under Common Tasks, select New Notebook.
  2. In the Create Notebook dialog box, enter a name, select a language, and select the Spark cluster that you created earlier. A minimal example cell is sketched below.
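With the notebook attached to the cluster, a single cell can register a table and run a Spark SQL query against it. The following sketch assumes the diamonds sample CSV that ships with Databricks under /databricks-datasets; adjust the path if your workspace differs:

    # Minimal Spark SQL job: load a sample CSV into a temp view
    # and query it. The dataset path is an assumption.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv"))

    df.createOrReplaceTempView("diamonds")

    # Aggregate with Spark SQL; job metrics from this query will
    # flow into the monitoring logs configured earlier.
    result = spark.sql("""
        SELECT color, AVG(price) AS avg_price
        FROM diamonds
        GROUP BY color
        ORDER BY color
    """)
    result.show()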


KSQL: A SQL Streaming Engine for Apache Kafka

KSQL is a SQL streaming engine for Apache Kafka. It provides an easy-to-use yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language like Java or Python. KSQL is scalable, elastic, and fault-tolerant. It supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
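As an illustration, the sketch below submits two KSQL statements to a KSQL server's REST endpoint: the first creates a stream over a Kafka topic, and the second derives a filtered stream from it. The server URL, topic name, and column names are assumptions for illustration:

    import json
    import requests  # third-party HTTP client: pip install requests

    # KSQL servers expose a REST endpoint for statements; the host,
    # port, topic, and columns below are illustrative assumptions.
    KSQL_URL = "http://localhost:8088/ksql"

    statements = """
        CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
          WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
        CREATE STREAM checkout_views AS
          SELECT user_id, page FROM pageviews WHERE page LIKE '/checkout%';
    """

    resp = requests.post(
        KSQL_URL,
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
        data=json.dumps({"ksql": statements, "streamsProperties": {}}),
    )
    print(resp.status_code, resp.json())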

What Is Streaming?

In stream processing, data is processed continuously as new data becomes available for analysis. Data is processed sequentially as an unbounded stream and may be pulled in by a “listening” analytics system as records of key-value pairs.
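The idea is easy to see without any Kafka machinery at all. Here is a minimal Python sketch of an unbounded stream of key-value records being consumed one record at a time as each arrives; the names and values are invented for illustration:

    import time
    from typing import Iterator, Tuple

    def sensor_stream() -> Iterator[Tuple[str, float]]:
        """An unbounded source: yields (key, value) records forever."""
        reading = 0.0
        while True:
            reading += 0.5
            yield ("sensor-1", reading)
            time.sleep(0.1)  # new data "arrives" over time

    # The "listening" system processes each record as it becomes
    # available, rather than waiting for a complete, bounded dataset.
    for i, (key, value) in enumerate(sensor_stream()):
        if value > 1.0:  # filtering: one simple streaming operation
            print(f"{key}: {value}")
        if i >= 9:       # stop the sketch after ten records
            break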

TensorFlow With Keras (Part 2)

This article is a continuation of Part 1, TensorFlow for deep learning. Make sure you go through it for a better understanding of this case study.

Keras is a high-level neural network API written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. In this article, we are going to cover a small case study using the Fashion-MNIST dataset.
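As a preview of that case study, the following is a minimal Keras sketch on Fashion-MNIST; the layer sizes and number of epochs are illustrative choices, not necessarily those used in the article:

    from tensorflow import keras

    # Fashion-MNIST: 60,000 training and 10,000 test images of
    # clothing items, 28x28 grayscale, across 10 classes.
    (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    # A small fully connected network for fast experimentation.
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(x_train, y_train, epochs=5)
    test_loss, test_acc = model.evaluate(x_test, y_test)
    print("Test accuracy:", test_acc)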

Apache Kafka With Scala Tutorial

Before the introduction of Apache Kafka, data pipelines were very complex and time-consuming: a separate streaming pipeline was needed for every consumer. You can see this complexity in the diagram below.

Apache Kafka solved this problem by providing a universal pipeline that is fault-tolerant, scalable, and simple to use. A single pipeline can now serve multiple consumers, as the diagram below shows.
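The tutorial's own code is in Scala; purely to illustrate the single-pipeline idea, here is a compact Python sketch using the kafka-python client, where one topic can serve any number of independent consumer groups. The broker address, topic, and group name are assumptions:

    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    # One producer writes to a single topic ("orders" is an assumed
    # name); any number of consumer groups can read the same stream
    # independently, replacing the per-consumer pipelines above.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("orders", key=b"order-1", value=b'{"item": "book", "qty": 2}')
    producer.flush()

    # Each group_id keeps its own offsets over the same topic, so a
    # new downstream consumer does not require a new pipeline.
    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers="localhost:9092",
        group_id="billing-service",
        auto_offset_reset="earliest",
    )
    for record in consumer:
        print(record.key, record.value)
        break  # sketch only: read one record and stop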