Getting Started With Alluxio and Spark in 5 Minutes

Co-authored by Alex Ma.

Introduction

Apache Spark has brought significant innovation to Big Data computing, but its results are even more extraordinary when it is paired with Alluxio. Alluxio provides Spark with a reliable data sharing layer, so Spark can concentrate on application logic while Alluxio handles storage. Bazaarvoice uses the combination of Spark and Alluxio to provide a real-time big data platform that not only handles the intake of 1.5 billion page views during peak events like Black Friday but also provides real-time analytics against that data. At this scale, the gain in speed is an enabler for new workloads. We’ve established a clean and simple way to integrate Alluxio and Spark.