Extract Text From Sales Receipt Using Pre-Built Model: Azure Form Recognizer

Nowadays, with almost everything moving to online and virtual modes, a very common problem any organization faces is processing receipts that were scanned and submitted electronically for reimbursement.

For any claim or reimbursement to be cleared, it must first reach the proper accounts department, which depends on the organization and the sector. One way to perform this activity is through manual intervention: a person or a team goes through all the digitally scanned receipts and filters them by department or by whatever other validation and eligibility criteria apply.
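
The article goes on to automate this with the pre-built receipt model. As a hedged illustration of that idea, here is a minimal Python sketch using the azure-ai-formrecognizer SDK; the endpoint, key, and receipt URL are placeholders you would replace with your own values.

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

# Endpoint and key come from your Form Recognizer resource in the Azure portal (placeholders here).
client = FormRecognizerClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Point the pre-built receipt model at a scanned receipt (illustrative URL).
poller = client.begin_recognize_receipts_from_url("https://example.com/receipt.png")

for receipt in poller.result():
    merchant = receipt.fields.get("MerchantName")
    total = receipt.fields.get("Total")
    if merchant:
        print(f"Merchant: {merchant.value} (confidence: {merchant.confidence})")
    if total:
        print(f"Total: {total.value} (confidence: {total.confidence})")
```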

Get Started With Kafka Connector for Azure Cosmos DB Using Docker

Having a local development environment is quite handy when trying out a new service or technology. Docker has emerged as the de facto choice in such cases. It is especially useful in scenarios where you're trying to integrate multiple services, and it gives you the ability to start fresh before each run.

This blog post is a getting started guide for the Kafka Connector for Azure Cosmos DB. All the components (including Azure Cosmos DB) will run on your local machine, thanks to:

Bulk Copy Data Sharing Pattern for Applications in Azure With Data Explorer, Data Factory, and Cosmos DB

In the initial stages of data platform development, data sizes are small, and you can easily share data via email or services such as Power BI. However, once the platform grows and different parts of the business become dependent on it, sharing data between systems becomes a big challenge.

In the majority of data-driven systems, one of two patterns is used for consuming data.

Redis Streams in Action — Part 3 (Tweet Processor App)

Welcome to this series of blog posts that covers Redis Streams with the help of a practical example. We will use a sample application to make Twitter data available for search and query in real time. RediSearch and Redis Streams serve as the backbone of this solution, which consists of several cooperating components, each of which will be covered in a dedicated blog post.

The code is available in this GitHub repo - https://github.com/abhirockzz/redis-streams-in-action
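
To give a flavor of what the tweet processor does, here is a minimal Python sketch (using redis-py) of reading from a Redis Stream with a consumer group and acknowledging processed entries; the stream, group, and key names are illustrative rather than taken from the repo.

```python
import redis

r = redis.Redis(decode_responses=True)

stream = "tweets_stream"          # illustrative stream name
group = "tweet-processor-group"   # illustrative consumer group

# Create the consumer group if it does not exist yet.
try:
    r.xgroup_create(stream, group, id="0", mkstream=True)
except redis.ResponseError as e:
    if "BUSYGROUP" not in str(e):
        raise

while True:
    # Block for up to 5 seconds waiting for new entries assigned to this consumer.
    entries = r.xreadgroup(group, "consumer-1", {stream: ">"}, count=10, block=5000)
    for _, messages in entries:
        for message_id, fields in messages:
            # Store the tweet as a hash (which a search index could cover), then acknowledge it.
            r.hset(f"tweet:{message_id}", mapping=fields)
            r.xack(stream, group, message_id)
```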

Implementing Zero Trust Architecture on Azure Hybrid Cloud

This article outlines an approach to modeling NIST's Zero Trust Security Architecture while migrating to Microsoft Azure but still operating hybrid cloud deployments, using tools and services offered by Azure.

What Is a Zero Trust Architecture (ZTA)?

The term ZTA has been in use in the domain of enterprise security models and architectures since 2010, when Forrester coined it, but it became popular after NIST published it as a framework (SP 800-207, final version published in August 2020). ZTA gained further visibility after the US government recently mandated that all federal agencies adopt it.

Reference Architecture: Deploying WSO2 API Manager on Microsoft Azure

Introduction

WSO2 is a software engineering organization, now more than 15 years old, that provides a set of open-source products/platforms for API Management, Enterprise Integration, and Identity and Access Management.

To meet current industry demands, all WSO2 products can be deployed on any of the following infrastructure choices:

Redis Streams in Action (Part 2): Tweets Consumer App

Welcome to this series of blog posts that covers Redis Streams with the help of a practical example. We will use a sample application to make Twitter data available for search and query in real time. RediSearch and Redis Streams serve as the backbone of this solution, which consists of several cooperating components, each of which will be covered in a dedicated blog post.

The code is available in this GitHub repo - https://github.com/abhirockzz/redis-streams-in-action
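
As a rough sketch of the idea behind the tweets consumer (not the exact code from the repo), adding an incoming tweet to a Redis Stream with redis-py looks like this; the stream name and tweet fields are illustrative.

```python
import redis

r = redis.Redis(decode_responses=True)

# A tweet received from the Twitter API (fields shown here are illustrative).
tweet = {"id": "1400000000000000000", "user": "redislabs", "text": "Redis Streams in action!"}

# Append the tweet to a stream; the processor app (covered in part 3) reads and indexes it.
entry_id = r.xadd("tweets_stream", tweet)
print(f"Added tweet to stream with ID {entry_id}")
```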

Azure Databricks: 14 Best Practices For a Developer

1. Choice of Programming Language

  • The language depends on the type of cluster. A cluster can run in one of two modes: Standard or High Concurrency. A High Concurrency cluster supports R, Python, and SQL, whereas a Standard cluster supports Scala, Java, SQL, Python, and R.
  • Spark is developed in Scala and is the underlying processing engine of Databricks. Scala performs better than Python and SQL. Hence, for the Standard cluster, Scala is the recommended language for developing Spark jobs.

2. ADF for Invoking Databricks Notebooks

  • Eliminate Hardcoding: In certain scenarios, Databricks requires configuration information related to other Azure services, such as the storage account name or database server name. The ADF pipeline stores these configuration details in pipeline variables. When the Databricks notebook is invoked within the ADF pipeline, the details are transferred from pipeline variables to Databricks widget variables, eliminating hardcoding in the notebooks.
  • Notebook Dependencies: It is easier to establish notebook dependencies in ADF than in Databricks itself, and when something fails, debugging a series of notebook invocations in an ADF pipeline is convenient.

  • Cheap: When a notebook is invoked through ADF, the ephemeral job cluster pattern is used to process the Spark job, because the lifecycle of the cluster is tied to the lifecycle of the job. These short-lived job clusters cost less than clusters created through the Databricks UI.

3. Using Widget Variables

The configuration details are made accessible to the Databricks code through widget variables. The configuration data is transferred from pipeline variables to widget variables when the notebook is invoked in the ADF pipeline. During the development phase, to model the behavior of a notebook run by ADF, widget variables are created manually in the notebook.
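
The article's exact snippet is not reproduced in this excerpt, but a typical widget definition and lookup with the Databricks dbutils API looks like the sketch below; the widget name, default value, and label are illustrative.

```python
# dbutils is available inside Databricks notebooks without an import.
# Create a text widget so the notebook can be run standalone during development;
# when ADF invokes the notebook, the pipeline supplies the value instead.
dbutils.widgets.text("storage_account_name", "", "Storage Account Name")

# Read the widget value wherever the configuration detail is needed.
storage_account_name = dbutils.widgets.get("storage_account_name")
```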

Azure Synapse Analytics – New Insights Into Data Security

Azure Synapse Analytics is a new product in the Microsoft Azure portfolio. It brings a whole new control plane layer over well-known services such as SQL Warehouse (rebranded to SQL Provisioned Pool), integrated Data Factory pipelines, and Azure Data Lake Storage, and it adds new components such as Serverless SQL and Spark pools. The integrated Azure Synapse workspace helps handle security and protection of data in one place for all data lake, data analytics, and warehousing needs, but it also requires learning some new concepts. At GFT, working with financial institutions all over the world, we pay particular attention to the security aspects of the solutions we provide to our customers. Synapse Analytics is a welcome new tool in this area.

The New Workspace Portal

The first visible difference compared to other services is that Synapse Analytics has a separate workspace portal, https://web.azuresynapse.net/, that provides access to code, notebooks, SQL, pipelines, monitoring, and management panels. The portal is available on the public Internet and uses Azure AD access controls to govern access to any Synapse Analytics instance in any tenant we have access to. However, Synapse Analytics introduces a new way to connect to the portal from Internet-isolated on-premises networks and offices using Private Link Hubs. Compared to Private Links, which protect access to services and databases, this solution is used for routing traffic to the web portal. In conjunction with an Azure AD Conditional Access policy, the new Synapse Analytics workspace can be protected with both network and authentication policies.

Redis Streams in Action (Part 1)

Welcome to this series of blog posts that covers Redis Streams with the help of a practical example. We will use a sample application to make Twitter data available for search and query in real time. RediSearch and Redis Streams serve as the backbone of this solution, which consists of several cooperating components, each of which will be covered in a dedicated blog post.

The code is available in this GitHub repo - https://github.com/abhirockzz/redis-streams-in-action

Distributed Tracing in ASP.NET Core With Jaeger and Tye, Part 2: Project Tye

In This Series:

  1. Distributed Tracing With Jaeger
  2. Simplifying the Setup With Tye (this article)

Tye is an experimental dotnet tool from Microsoft that aims to make developing, testing, and deploying microservices easier. Tye's opinionated nature greatly simplifies the lifecycle of development and deployment of .NET Core microservices.

To understand the benefits of Tye, let's enumerate the steps involved in the development and deployment of the DCalculator application to Kubernetes:

Distributed Tracing in ASP.NET Core With Jaeger and Tye Part 1: Distributed Tracing

In This Series:

  1. Distributed Tracing With Jaeger (this article)
  2. Simplifying the Setup With Tye (coming soon)

Modern microservices applications consist of many services deployed on various hosts such as Kubernetes, AWS ECS, and Azure App Service, or on serverless compute services such as AWS Lambda and Azure Functions. One of the key challenges of microservices is the reduced visibility of requests that span multiple services. In distributed systems that perform operations such as querying databases, publishing and consuming messages, and triggering jobs, how would you quickly find issues and monitor the behavior of services? The answer to this perplexing problem is Distributed Tracing.

Distributed Tracing, Open Tracing, and Jaeger

Distributed Tracing is the capability of a tracing solution to track a request across multiple services. Tracing solutions use one or more correlation IDs to collate the request traces (structured log events recorded across different services) and store them in a central database.
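
The article instruments ASP.NET Core services with the OpenTracing C# client; purely as a conceptual stand-in, here is a minimal Python sketch using the jaeger-client package, with the service and span names being illustrative.

```python
from jaeger_client import Config

# Configure a tracer that samples every request and reports spans to a local Jaeger agent.
config = Config(
    config={"sampler": {"type": "const", "param": 1}, "logging": True},
    service_name="orders-service",
    validate=True,
)
tracer = config.initialize_tracer()

# A parent span for the incoming request, with a child span for a downstream operation.
with tracer.start_active_span("process-order") as scope:
    scope.span.set_tag("order.id", "12345")
    with tracer.start_active_span("query-database") as child:
        child.span.log_kv({"event": "db.query", "rows": 3})

tracer.close()
```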

Migrate HDFS Data to Azure

During the middle of last year, my team decided to move our Hadoop workloads to Azure, including our data and applications. This article covers some of the best practices we used in migrating on-premises HDFS data to Azure HDInsight. Below are the two approaches we adopted to transfer the data over the network with TLS encryption.

Method 1

ExpressRoute is an Azure service that provides a private connection between Azure and on-premises data centers; it offers higher security, reliability, and speed, with lower latency than typical connections over the Internet. We took advantage of Data Factory's native data copy functionality, using the integration runtime to migrate the data. Data Factory's self-hosted integration runtime (SHIR) should be installed on a pool of Windows VMs in an Azure virtual network. The pool can be scaled out to multiple VMs to fully utilize network and storage IOPS or bandwidth.

Azure and Confluent: Real-Time Search Powered by Azure Cache for Redis, Spring Cloud

Self-managing a distributed system like Apache Kafka®, along with building and operating Kafka connectors, is complex and resource-intensive. It requires significant Kafka skills and expertise in the development and operations teams of your organization. Additionally, the higher the volumes of real-time data that you work with, the more challenging it becomes to ensure that all of the infrastructure scales efficiently and runs reliably.

Confluent and Microsoft are working together to make the process of adopting event streaming easier than ever by alleviating the typical infrastructure management needs that often pull developers away from building critical applications. With Azure and Confluent seamlessly integrated, you can collect, store, and process event streams in real time and feed them to multiple Azure data services. The integration helps reduce the burden of managing resources across Azure and Confluent.

Event-Driven Architecture With Apache Kafka for .NET Developers Part 2: Event Consumer

In This Series:

  1. Development Environment and Event Producer
  2. Event Consumer (this article)
  3. Azure Integration (coming soon)

Let's carry our discussion forward and implement a consumer of the events published by the Employee service to the leave-applications Kafka topic. We will extend the application we developed earlier with two new services that demonstrate how Kafka consumers work: the Manager service and the Result reader service.
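
The article's services are written in C#; purely as a conceptual sketch, here is what a consumer subscribed to the leave-applications topic might look like using the Python confluent-kafka client, which shares its librdkafka configuration keys with the .NET client. The broker address and group ID are assumptions.

```python
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed local broker
    "group.id": "manager-service",           # illustrative consumer group
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
consumer.subscribe(["leave-applications"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Deserialize and process the leave application event.
        application = json.loads(msg.value())
        print(f"Received leave application: {application}")
        consumer.commit(msg)  # commit the offset only after successful processing
finally:
    consumer.close()
```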

Source Code

The complete source code of the application and other artifacts is available in my GitHub repository.

RediSearch in Action

Redis has a versatile set of data structures ranging from simple Strings all the way to powerful abstractions such as Redis Streams. The native data types can take you a long way, but there are certain use cases that may require a workaround. One example is the requirement to use secondary indexes in Redis in order to go beyond the key-based search/lookup for richer query capabilities. Though you can use Sorted Sets, Lists, and so on to get the job done, you’ll need to factor in some trade-offs.

Enter RediSearch! Available as a Redis module, RediSearch provides flexible search capabilities, thanks to a first-class secondary indexing engine. It offers powerful features such as full-text search, auto-completion, geographical indexing, and many more.
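
As a quick taste of the secondary indexing engine, here is a hedged sketch using the search commands bundled with redis-py; the index name, key prefix, and fields are made up for illustration.

```python
import redis
from redis.commands.search.field import NumericField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(decode_responses=True)

# Create a secondary index over hashes whose keys start with "user:".
r.ft("idx:users").create_index(
    (TextField("name"), NumericField("age")),
    definition=IndexDefinition(prefix=["user:"], index_type=IndexType.HASH),
)

# Any hash written under the prefix is indexed automatically.
r.hset("user:1", mapping={"name": "Jane Doe", "age": 30})

# Full-text search over the indexed fields.
results = r.ft("idx:users").search(Query("Jane"))
for doc in results.docs:
    print(doc.id, doc.name, doc.age)
```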

Event-Driven Architecture With Apache Kafka for .NET Developers Part 1: Event Producer

In This Series:

  1. Development Environment and Event Producer (this article)
  2. Event Consumer (coming soon)
  3. Azure Integration (coming soon)

Introduction

An event-driven architecture uses events to trigger and communicate between microservices. An event is a change in a service's state, such as an item being added to the shopping cart. When an event occurs, the service produces an event notification, which is a packet of information about the event.

The architecture consists of an event producer, an event router, and an event consumer. The producer sends events to the router, and the consumer receives events from the router. Depending on its capabilities, the router can push events to the consumer or deliver them when the consumer requests (polls) them. The producer and consumer services are decoupled, which allows them to scale, deploy, and update independently.
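
To make the producer/router/consumer flow concrete, here is a minimal hedged sketch of publishing the shopping-cart event mentioned above, written with the Python confluent-kafka client (the article series itself uses the .NET client); the topic name, key, and payload are illustrative.

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed local broker

def delivery_report(err, msg):
    # Called once per message to confirm delivery or surface an error.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ {msg.offset()}")

# The event notification: a small packet of information about the state change.
event = {"type": "item-added-to-cart", "sku": "ABC-123", "quantity": 1}

producer.produce(
    "cart-events",            # illustrative topic name
    key="user-42",            # illustrative partition key
    value=json.dumps(event),
    callback=delivery_report,
)
producer.flush()
```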

Build a Basic GraphQL Server With ASP.NET Core and Entity Framework in 10 Minutes

Since I wrote my first GraphQL post in 2019, much has changed with GraphQL in the .NET space. The ongoing changes have also affected most of the documentation available online. This article will walk you through the steps to create a basic GraphQL API on ASP.NET Core using GraphQL for .NET, Entity Framework Core, Autofac, and the Repository design pattern. I chose the tech stack for the sample application based on the popularity of the frameworks and patterns. You can substitute the frameworks or libraries with equivalent components in your implementation.

If you are not familiar with the concepts of GraphQL, please take some time to read the learn series of articles on the GraphQL website. Let's now fire up our preferred editor or IDE to get started.