Maxim Stefanov | The Blog Pros

January 10, 2020

Apache Hadoop Code Quality: Production VS Test

In order to get high-quality production code, it's not enough just to ensure maximum coverage with tests. No doubt, great results require the main project code and tests to work efficiently together. Therefore, tests have to be paid as much attention to as source code. A decent test is a key success factor, as it will catch regression in production. Let's take a look at PVS-Studio static analyzer warnings to see the importance of the fact that errors in tests are no worse than the ones in production. Today's focus: Apache Hadoop.

About the Project

Those who were formerly interested in Big Data have probably heard or worked with the Apache Hadoop project. In a nutshell, Hadoop is a framework that can be used as a basis for building and working with Big Data systems.

August 20, 2019

PVS-Studio Visits Apache Hive

For the past ten years, the open-source movement has been one of the key drivers of the IT industry's development. The role of open source projects is becoming more and more prominent, not only in terms of quantity but also in terms of quality. This changes the very concept of how open source software is positioned on the IT market in general. Today, we are going to talk about Apache Hive.

Hadoop and Apache Hive

About Apache Hive

Apache Hadoop is currently thought to be one of the pioneering Big Data technologies. Its primary tasks are storing, processing, and managing large amounts of data. The main components comprising the framework are Hadoop Common, HDFS, Hadoop MapReduce, and Hadoop YARN. Over time, a large ecosystem of related projects and technologies has developed around Hadoop — many of which originally started as part of the project and then budded off to become independent. Apache Hive is one of them.

GBase 8a Implementation Guide: Resource Assessment
No categories
1. Disk Storage Space Evaluation The storage space requirements for a GBase cluster are calculated based on the data volume of the business system, the choice of compression algorithm, and the number of cluster replicas. The data volume of a business s... […]
A Look Into Netflix System Architecture
No categories
Ever wondered how Netflix keeps you glued to your screen with uninterrupted streaming bliss? Netflix Architecture is responsible for the smooth streaming experience that attracts viewers worldwide behind the scenes. Netflix's system architecture emphas... […]
High Availability and Disaster Recovery (HADR) in SQL Server on AWS
No categories
High Availability and Disaster Recovery (HADR) play a vital role in maintaining the integrity of data, reducing downtime, and safeguarding against data loss in enterprise database systems. AWS offers a range of HADR options for SQL Server, which levera... […]
Terraform Tips for Efficient Infrastructure Management
No categories
Terraform is a popular tool for defining and provisioning infrastructure as code (IaC), improving consistency, repeatability, and version control. But you need to know how to use it properly to extract maximum value from it as an infrastructure managem... […]
Integration Testing With Keycloak, Spring Security, Spring Boot, and Spock Framework
No categories
In today's security landscape, OAuth2 has become a standard for securing APIs, providing a more robust and flexible approach than basic authentication. My journey into this domain began with a critical solution architecture decision: migrating from bas... […]