The Practice of Alluxio in Ctrip Real-Time Computing Platform

Real-time computation platforms are becoming increasingly important in many organizations. In this article, we describe how ctrip.com uses Alluxio to accelerate Spark SQL real-time jobs and to keep those jobs consistent during downtime of our internal data lake (HDFS). We also use Alluxio as a caching layer to reduce the load on our HDFS NameNode.

Background and Architecture

Ctrip.com is the largest online travel booking website in China. It provides online travel services including hotel reservations, transportation ticketing, and packaged tours, and receives hundreds of millions of online visits every day. Driven by this demand, a massive amount of data is stored in our big data platforms in various formats. Our main Hadoop cluster handles nearly 300,000 offline and real-time analytics jobs every day. It runs on the order of a thousand servers and stores more than 50 PB of data, growing by 400 TB daily.