data dependencies | The Blog Pros

October 15, 2021

Fantastic ML Pipelines and Tips for Building Them

A machine learning (ML) pipeline is an automated workflow that transforms data, funnels it through a model, and evaluates the outcome. To do this, an ML pipeline consists of several steps, such as model training, model evaluation, and visualization after post-processing. Each step is crucial to the success of the whole pipeline, both in the short term and in the long run. To keep a pipeline sustainable over the long run, ML engineers and organizations need to account for several ML-specific risk factors in the system design. Authors from Google pinpoint risk factors such as boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns [1]. In this article, we dive into the root causes of some of these risk factors.

Figure 1: Automated pipeline (source: 123.rf)

1. Boundary Erosion

Suppose you are given an ML pipeline and your data team approaches you with a change to the input features, such as an increase or reduction in dimension. Would you be able to guarantee that the change won't affect the entire pipeline? In most cases, the answer is no.
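One way to make such a change fail loudly instead of rippling silently through the pipeline is to declare the expected input dimension at the boundary of each step and check it before use. The following is a minimal sketch under stated assumptions, not a method from the article: `FeatureContract`, `ContractViolation`, `check_batch`, and the toy linear `predict` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FeatureContract:
    """Hypothetical declaration of the input a pipeline step expects."""
    n_features: int


class ContractViolation(ValueError):
    """Raised when a batch does not match the declared contract."""


def check_batch(batch: Sequence[Sequence[float]], contract: FeatureContract) -> None:
    """Reject any row whose dimension differs from the contract."""
    for i, row in enumerate(batch):
        if len(row) != contract.n_features:
            raise ContractViolation(
                f"row {i} has {len(row)} features, expected {contract.n_features}"
            )


def predict(batch: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Toy linear model (an assumption for this sketch).

    Without the check, zip() would silently drop extra features and
    return plausible-looking but wrong scores.
    """
    check_batch(batch, FeatureContract(n_features=len(weights)))
    return [sum(x * w for x, w in zip(row, weights)) for row in batch]
```

With this in place, a batch whose rows gained or lost a feature raises `ContractViolation` at the step boundary rather than producing silently wrong predictions downstream.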

Data Dependency Analyses in Backend Applications

Data Processing Patterns

Every application deals with data in some way. After all, that is why applications exist in the first place.

In the case of backend applications, there are well-known stable patterns for how code deals with data. Everyone knows about CRUD, for example.
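For readers who want a concrete reminder, CRUD stands for create, read, update, and delete. The sketch below shows the four operations against an in-memory SQLite database using Python's standard `sqlite3` module. The `users` table and the function names are assumptions made for illustration and do not come from the article.

```python
import sqlite3
from typing import Optional


def make_store() -> sqlite3.Connection:
    """Open an in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def create_user(conn: sqlite3.Connection, name: str) -> int:
    """Create: insert a row and return its generated id."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cur.lastrowid


def read_user(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Read: return the user's name, or None if the row does not exist."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def update_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Update: change the name stored for an existing row."""
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))


def delete_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete: remove the row."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
```

Each function reads or writes the same table, which is exactly the kind of data dependency between routines that a dependency analysis would need to track.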

