Progressive Delivery: Argo Rollouts Adoption

Progressive Delivery is emerging as a worthy successor to Continuous Delivery by enabling developers to control how new features are launched to end users. Its growing popularity stems from the demand for faster and more reliable software releases. The increasing emphasis on customer experience has begun to leave the Continuous Delivery methodology by the wayside. Large enterprises like Netflix, Amazon, and Uber are turning to Progressive Delivery to test and release code in a phased and controlled manner.

In a nutshell, Progressive Delivery empowers developers to roll out code changes to a subset of users first and then expand them to all users. The progressive rollout of features is executed through techniques like blue-green deployment, feature flagging, and canary deployments. You can mitigate issues by promoting a version to all users only when you’re confident that it is performant and reliable. If it fails in production, the blast radius is restricted to a subset of users, and the update can be rolled back immediately.
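As a concrete illustration of the canary technique, here is a minimal sketch of an Argo Rollouts `Rollout` manifest that shifts traffic to a new version in stages. The application name, image, and step durations are illustrative, not taken from the article:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: demo-app                 # illustrative name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: registry.example.com/demo-app:v2   # hypothetical image
  strategy:
    canary:
      steps:
        - setWeight: 20          # expose the new version to 20% of users
        - pause: {duration: 10m} # observe before expanding
        - setWeight: 50
        - pause: {duration: 10m}
        # after the final step, the rollout is promoted to 100%
```

If a problem appears during any pause, the rollout can be aborted and traffic returns to the stable version, keeping the blast radius limited to the canary subset.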

Progressive Delivery in Kubernetes: Analysis

The native Kubernetes Deployment object supports the Rolling Update strategy, which provides basic safety guarantees during an update but comes with limitations:

  • Few controls over the speed of the rollout.
  • Inability to control traffic flow to the new version.
  • Readiness probes are unsuitable for deeper, stress, or one-time checks.
  • No ability to check external metrics to verify an update.
  • No ability to automatically abort and roll back the update.

For the reasons above, a Rolling Update can be risky in a complex production environment: it provides no control over the blast radius, may roll out too aggressively, and offers no rollback automation in case of failure.
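The last two limitations (no external metric checks, no automatic abort) are what Argo Rollouts addresses with analysis runs. As a sketch, assuming a Prometheus instance is reachable in the cluster, an `AnalysisTemplate` can gate a canary on a success-rate query and abort the rollout automatically when measurements fail; the template name, query, and address below are assumptions for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate             # illustrative name
spec:
  metrics:
    - name: success-rate
      interval: 1m               # measure every minute during the canary
      successCondition: result[0] >= 0.95
      failureLimit: 3            # abort and roll back after 3 failed measurements
      provider:
        prometheus:
          address: http://prometheus.example:9090   # hypothetical endpoint
          query: |
            sum(rate(http_requests_total{status!~"5.*"}[2m]))
            / sum(rate(http_requests_total[2m]))
```

A `Rollout` references this template from its canary strategy (`strategy.canary.analysis.templates`), so a dip in the success rate aborts the update and restores the stable version without human intervention.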

Success Story: From AWS EMR to Kubernetes

Motivation

This article is an overview of the path we followed to migrate Spark workloads to Kubernetes and remove the EMR dependency. EMR was an important support tool at Empathy.co for orchestrating Spark workloads, but as the workloads grew more complex, so did our use of EMR. Back in December 2020, the Step Function flow that orchestrated the different EMR clusters looked like this:

In January 2021, the Platform Engineering team proposed an initial Spark on Kubernetes RFC. The aim was a better solution with as little friction as possible between teams, especially the Data and Ether teams, the main users of the Spark workloads.