Success Story: From AWS EMR to Kubernetes

Motivation

This article is an overview of the path we followed to migrate Spark Workloads to Kubernetes and to avoid EMR dependency. EMR was an important support tool at Empathy.co to orchestrate Spark workloads, but once the workloads became more complex, the use of EMR also became more complicated. So, back in December 2020, the Step Function flow to orchestrate the different EMR clusters was like this:

In January 2021, an initial Spark on Kubernetes RFC was proposed by the Platform Engineering team. The aim was to have a better solution with less possible friction between teams, especially the Data and Ether teams, the main users of the Spark workloads.