big data tutorial | The Blog Pros

May 20, 2021

Creating Live Dashboards With QuickSight

For many enterprise-grade applications, providing a place where users can access in-depth analysis of their data has become a crucial feature. There are many approaches to this. You can build your own web application and backend with views that let customers filter and analyze data. Alternatively, you can use the embedded analytics capabilities of Looker, Tableau, or Sisense, all of which are large business intelligence tools with a host of features and connectors to all sorts of data sources.
But if you’re already on AWS, it is well worth considering QuickSight for presenting analytics in your web application.

This series will guide you through the intricacies of creating a multi-tenant solution with QuickSight, dealing with data security across customers and within organizations. We’ll need to go beyond the AWS console and dive into the CLI/API commands that you’ll need to manage all of this.

May 16, 2021

Data Processing Using Functions in Prosto: An Alternative to Map-Reduce and SQL

Why Prosto? Having Only Set Operations Is Not Enough

Typical data processing tasks have to access and analyze data stored in multiple tables. Different systems may call them relations, collections, or sets, but we will refer to them as tables for simplicity. The general task of data processing is to derive new data from these multiple tables. Each solid data (processing) model must therefore answer three important questions: how to compute new columns within one table, how to link tables, and how to aggregate data. Below we briefly describe how these tasks are solved in a traditional set-oriented model and where these solutions have significant flaws.

Calculation. Given a table, we frequently need to add a new column with values computed from other columns in the same table. Conceptually, the task is similar to defining a cell in a spreadsheet, for example, C1=A1+B1. Easy and natural? Yes. However, it is not so easy in traditional data processing frameworks. The main problem is that we need to define a new table, because adding a column to an existing table is not possible. The Prosto toolkit is intended to fix this flaw by providing a dedicated operation that adds a new column, as in this example: ColumnC=ColumnA+ColumnB.
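To make the contrast concrete, here is a minimal plain-Python sketch. It is not Prosto's actual API. It assumes a table is stored as a dict that maps column names to lists of values, and the helper names `select_with_new_column` and `add_calculated_column` are hypothetical. Both helpers compute ColumnC=ColumnA+ColumnB row by row. The set-oriented version has to return a new table, while the column-oriented version attaches the calculated column to the existing one, much like filling C1=A1+B1 down a spreadsheet.

```python
from typing import Any, Callable, Dict, List

# Hypothetical column-oriented table: column name -> list of values (one per row).
Table = Dict[str, List[Any]]


def select_with_new_column(table: Table, name: str,
                           fn: Callable[..., Any], inputs: List[str]) -> Table:
    """Set-oriented style (illustrative): derive a NEW table that copies every
    existing column and appends the computed one, similar in spirit to
    SELECT *, A + B AS C. The input table is left untouched."""
    n_rows = len(table[inputs[0]])
    new_table = {col: list(values) for col, values in table.items()}
    new_table[name] = [fn(*(table[c][i] for c in inputs)) for i in range(n_rows)]
    return new_table


def add_calculated_column(table: Table, name: str,
                          fn: Callable[..., Any], inputs: List[str]) -> None:
    """Column-oriented style (illustrative): attach a calculated column to the
    EXISTING table by evaluating fn on each row's input values."""
    n_rows = len(table[inputs[0]])
    table[name] = [fn(*(table[c][i] for c in inputs)) for i in range(n_rows)]


if __name__ == "__main__":
    data = {"ColumnA": [1, 2, 3], "ColumnB": [10, 20, 30]}

    # Set-oriented: a second table now exists just to hold ColumnC.
    derived = select_with_new_column(data, "ColumnC", lambda a, b: a + b,
                                     ["ColumnA", "ColumnB"])
    print(derived["ColumnC"])  # [11, 22, 33]
    print("ColumnC" in data)   # False: the original table is unchanged

    # Column-oriented: ColumnC lives directly in the original table.
    add_calculated_column(data, "ColumnC", lambda a, b: a + b,
                          ["ColumnA", "ColumnB"])
    print(data["ColumnC"])     # [11, 22, 33]
```

The arithmetic is identical in both cases. The only difference is where the result lives. In the set-oriented style, every derived column produces yet another table that has to be named and managed.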

May 6, 2021

Step-by-Step Tutorial: From Data Preprocessing to Using Graph Database

This article was contributed by Jiayi98, a Nebula Graph user. She shares her experience deploying Nebula Graph offline and preprocessing a dataset provided by LDBC. It is a beginner-friendly, step-by-step guide to learning Nebula Graph.

This is not a standard stress test but a small-scale one. Through this test, I got familiar with deploying Nebula Graph, its data import tool, its graph query language, its Java API, and data migration. I also now have a basic understanding of its cluster performance.

April 3, 2019

Getting Started With Alluxio and Spark in 5 Minutes

Co-authored by Alex Ma.

Introduction

Apache Spark has brought significant innovation to Big Data computing, but its results are even more extraordinary when paired with Alluxio. Alluxio provides Spark with a reliable data sharing layer, enabling Spark to excel at performing application logic while Alluxio handles storage. Bazaarvoice uses the combination of Spark and Alluxio to provide a real-time big data platform that can not only handle the intake of 1.5 billion page views during peak events like Black Friday but also provide real-time analytics against that data. At this scale, the gain in speed is an enabler for new workloads. We’ve established a clean and simple way to integrate Alluxio and Spark.