Oxidizing the Kubernetes Operator

Some applications are hard to manage in Kubernetes, requiring significant manual labor and time to operate. Some are hard to set up, some need special care for restarts, some both; it varies. What if the knowledge of how to operate an application could be extracted into dedicated software, relieving human operators? That is exactly what Kubernetes operators are about. An operator, a.k.a. a custom controller, automates what a human operator would do with the application to make it run successfully. First, Kubernetes is taught about the application to be operated by the custom controller. This is done by creating a custom resource definition (CRD), which extends the Kubernetes API and makes Kubernetes recognize the new resource type. The operator then watches events on such custom resources and acts upon them, hence the name custom controller: a controller for custom resources.
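
As a sketch of what this looks like in Rust, the kube crate's derive macro can generate both the typed resource and its CRD from a plain struct. The resource below is illustrative only; the `h2o.ai` API group and the `nodes`/`memory` fields are assumptions for the example, not the actual H2O operator's schema.

```rust
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

// Illustrative custom resource: a hypothetical H2O cluster specification.
// Deriving CustomResource generates an `H2O` type wrapping this spec, plus
// (via the kube::CustomResourceExt trait) an `H2O::crd()` helper that
// produces the CustomResourceDefinition to submit to the API server.
#[derive(CustomResource, Clone, Debug, Deserialize, Serialize, JsonSchema)]
#[kube(group = "h2o.ai", version = "v1", kind = "H2O", namespaced)]
pub struct H2OSpec {
    /// Number of H2O nodes the operator should keep running.
    pub nodes: u32,
    /// Memory request for each node, e.g. "4Gi".
    pub memory: String,
}
```

Once the generated CRD is applied to the cluster, the new resource behaves like any built-in one: it can be created, listed, and watched through the standard Kubernetes API.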

Rust is an excellent language for implementing operators. A typical Kubernetes operator makes lots of calls to the Kubernetes API: watching resource state, creating new resources, updating or deleting existing ones. An operator should also be able to manage multiple resources at a time, ideally in parallel. Rust's asynchronous programming model is a perfect match for building high-performance, high-throughput operators. With a multi-threaded runtime such as Tokio, a vast number of custom resources can be managed in parallel. Rust is compiled ahead of time, with no heavyweight runtime, no garbage collection pauses, and C-level performance. An operator typically resides inside a pod in Kubernetes, and such a pod can be restarted at any time. Near-instant startup minimizes the delay before the operator resumes managing the state of its resources. Rust also guarantees memory safety and eliminates data races, which is especially helpful for heavily concurrent, parallel applications like Kubernetes operators.
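
To make this concrete, here is a minimal sketch of watching a resource asynchronously with the kube and kube-runtime crates on a Tokio runtime. The watcher API has shifted across crate versions; this follows the 0.x-era interface, and the `default` namespace and printed messages are placeholders.

```rust
use futures::{StreamExt, TryStreamExt};
use k8s_openapi::api::core::v1::Pod;
use kube::{api::ListParams, Api, Client};
use kube_runtime::watcher;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Infers configuration from kubeconfig or the in-cluster environment.
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::namespaced(client, "default");

    // An asynchronous stream of watch events. Many such streams, one per
    // watched resource type, can be polled concurrently on Tokio's
    // multi-threaded runtime without blocking each other.
    let mut events = watcher(pods, ListParams::default()).boxed();
    while let Some(event) = events.try_next().await? {
        match event {
            watcher::Event::Applied(p) => println!("applied: {:?}", p.metadata.name),
            watcher::Event::Deleted(p) => println!("deleted: {:?}", p.metadata.name),
            watcher::Event::Restarted(ps) => println!("watch restarted: {} objects", ps.len()),
        }
    }
    Ok(())
}
```

A full operator would typically feed such a stream into a reconcile loop (kube-runtime's Controller provides one), but at its core it is just these non-blocking API calls running side by side.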

Parallel Grid Search in H2O

H2O is, at its core, a platform for distributed, in-memory computing. Machine learning algorithms are implemented on top of this distributed computation platform. At H2O, we design every operation, be it data transformation, training of machine learning models, or even parsing, to utilize the distributed computation model. This is necessary to work with big data fast.

However, a single operation usually cannot utilize a cluster's computational resources to the fullest. Data needs to be distributed across the cluster, and many operations consist of sequential tasks which, even when implemented in a distributed manner, follow one another and require data exchange. These and many other smaller factors, summed together, may introduce significant overhead.