Avoid Data Silos in Presto in Meta: The Journey From Raptor to RaptorX

Raptor is a Presto connector (presto-raptor) used to power some critical interactive query workloads in Meta (previously Facebook). Though referred to in the ICDE 2019 paper Presto: SQL on Everything, it remains somewhat mysterious to many Presto users because there is no available documentation for this feature. This article will shed some light on the history of Raptor and why Meta eventually replaced it in favor of a new architecture based on local caching, namely RaptorX.

The Story of Raptor

Generally speaking, Presto, as a query engine, does not own storage. Instead, connectors were developed to query different external data sources. This framework is very flexible, but it is hard to offer low latency guarantees in disaggregated compute and storage architectures. Network and storage latency adds difficulty in avoiding variability. To address this limitation, Raptor was designed as a shared-nothing storage engine for Presto.

CategoriesUncategorizedTags