Trino, Superset, and Ranger on Kubernetes: What, Why, How?

This article is an opinionated SRE point of view of an open-source stack to easily request, graph, audit and secure any kind of data access of multiple data sources. This post is the first part of a series of articles dedicated to MLOps topics. So, let’s start with the theory!

What Is Trino?

Trino is an open-source distributed SQL query engine that can be used to run ad hoc and batch queries against multiple types of data sources. Trino is not a database, it is an engine that aims to run fast analytical queries on big data file systems (like Hadoop, AWS S3, Google Cloud Storage, etc), but also on various sources of distributed data (like MySQL, MongoDB, Cassandra, Kafka, Druid, etc).  One of the great advantages of Trino is its ability to query different datasets and then join information to facilitate access to data.