How to Process Nasty Fixed Width Files Using Apache Spark

A fixed width file is a very common flat file format when working with SAP, mainframe systems, and web logs. Converting that data into a DataFrame using metadata is a perennial challenge for Spark developers. This article covers the typical scenarios a developer might face while working with a fixed width file. The solution is generic to any fixed width file and very easy to implement, and it also stays stack-safe as the RDD is threaded through the foldLeft operator.

Let's check the source file first and then the metadata file:
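The original sample files are not reproduced here, but a minimal sketch of the approach might look like the following. The column metadata (name, start position, length) and the input path are hypothetical stand-ins for the article's source and metadata files; note how foldLeft threads the DataFrame through one `withColumn` per metadata entry:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring, trim}

object FixedWidthParser {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FixedWidthParser")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical metadata: (columnName, startPosition, length), 1-based positions
    val meta = Seq(("id", 1, 5), ("name", 6, 20), ("amount", 26, 10))

    // Read each fixed width record as a single string column named "value"
    val raw = spark.read.text("data/source.txt")

    // Slice each column out of the record with substring, trimming the padding
    val parsed = meta.foldLeft(raw) { case (df, (name, start, len)) =>
      df.withColumn(name, trim(substring(col("value"), start, len)))
    }.drop("value")

    parsed.show()
    spark.stop()
  }
}
```

Because the metadata drives the foldLeft, adding or reordering columns only requires editing the metadata sequence, not the parsing code.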

Akka Stream: Map And MapAsync

In this post, we will discuss what "map" and "mapAsync" are in Akka Streams and how to use them.

The difference is highlighted in their signatures:
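The following sketch shows the two operators side by side. The simplified signatures come from `akka.stream.scaladsl`; the ActorSystem name and the parallelism value are illustrative, and it assumes Akka 2.6+, where an implicit ActorSystem also provides the materializer:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import scala.concurrent.Future

// Simplified signatures:
//   def map[T](f: Out => T): Repr[T]
//   def mapAsync[T](parallelism: Int)(f: Out => Future[T]): Repr[T]

object MapVsMapAsync extends App {
  implicit val system: ActorSystem = ActorSystem("map-demo")
  import system.dispatcher

  // map: a synchronous, one-in-one-out transformation
  Source(1 to 5)
    .map(_ * 2)
    .runForeach(n => println(s"map: $n"))

  // mapAsync: a Future-returning transformation; up to `parallelism`
  // futures run concurrently, but downstream emission order is preserved
  Source(1 to 5)
    .mapAsync(parallelism = 2)(n => Future(n * 2))
    .runForeach(n => println(s"mapAsync: $n"))
}
```

In short, `map` blocks the stage while its function runs, whereas `mapAsync` hands the work to Futures and keeps several in flight at once without reordering elements.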

Apache Kafka With Scala Tutorial

Before the introduction of Apache Kafka, data pipelines were very complex and time-consuming: a separate streaming pipeline was needed for every consumer. The diagram below illustrates this complexity.

Apache Kafka solved this problem by providing a universal pipeline that is fault-tolerant, scalable, and simple to use. A single pipeline can now serve multiple consumers, as the second diagram below shows.
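To make the "single pipeline" idea concrete, here is a minimal Scala producer sketch using the standard Kafka client API. The broker address (`localhost:9092`) and topic name (`events`) are assumptions for illustration; any number of consumer groups can then read the same topic independently:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object SimpleProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)

    val producer = new KafkaProducer[String, String](props)

    // One write to the topic; every consumer group reading "events"
    // receives it, so a single pipeline serves many consumers
    producer.send(new ProducerRecord[String, String]("events", "key-1", "hello kafka"))

    producer.flush()
    producer.close()
  }
}
```

Each consumer subscribes with its own group ID, so Kafka tracks offsets per group and the producer never needs to know who is downstream.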