Best Practices for Setting up Monitoring Operations for Your AI Team

In recent years, the term MLOps has become a buzzword in the world of AI, usually discussed in the context of tools and technology. Yet while much attention is given to the technical aspects of MLOps, the operations side is frequently overlooked, particularly the operations needed to run and monitor machine learning (ML) in production. Accountability for AI performance, timely alerts for the relevant stakeholders, and the processes required to resolve issues are set aside in favor of discussions about specific tools and tech stacks.

ML teams have traditionally been research-oriented, focusing heavily on training models to achieve high test scores. Once a model is ready to be deployed into real business processes and applications, however, the culture around establishing production-oriented operations is often missing. As a consequence, it is unclear who is responsible for the model's outcomes and performance. Without the right operations in place, even the most advanced tools and technology won't be enough to ensure healthy governance for your AI-driven processes.

The Real Democratization of AI, and Why It Has to Be Closely Monitored

In recent years, the topic of AI democratization has gained a lot of attention. But what does it really mean, and why is it important? And most importantly, how can we make sure that the democratization of AI is safe and responsible? In this article, we'll explore the concept of AI democratization, how it has evolved, and why its use must be closely monitored and managed.

What AI Democratization Used to Be

In the past, AI democratization was primarily associated with "Auto ML" companies and tools. These promised to allow anyone, regardless of their technical knowledge, to build their own AI models. While this may have seemed like a democratization of AI, the reality was that these tools often produced mediocre results at best. Most companies realized that to truly derive value from AI, they needed teams of knowledgeable professionals who understood how to build and optimize models.

How to Monitor for Data and Concept Drift

Data and concept drift are frequently mentioned in ML monitoring, but what exactly are they, and how are they detected? Furthermore, given the common misconceptions, are data and concept drift things to be avoided at all costs, or natural and acceptable consequences of deploying models in production? Read on to find out.

Data Drift

What Is It?

Perhaps the more common of the two is data drift, which refers to any change in the data distribution after the model has been trained. In other words, data drift commonly occurs when the inputs a model is presented with in production do not correspond to the distribution it was trained on. This typically presents itself as a change in the feature distribution: specific values for a given feature may become more common in production, while other values see a decrease in prevalence.

For example, consider an e-commerce company serving an LTV prediction model to optimize marketing efforts. A reasonable feature for such a model would be a customer’s age. However, suppose this same company changed its marketing strategy, perhaps by initiating a new campaign targeted at a specific age group. In this scenario, the distribution of ages being fed to the model would likely change, causing a distribution shift in the age feature and perhaps a degradation in the model’s predictive capacity. This would be considered data drift.
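As a rough illustration of how such a shift might be detected, the age feature's training distribution can be compared against a recent window of production inputs with a two-sample statistical test. The snippet below is a minimal sketch using a Kolmogorov-Smirnov test; the feature, the synthetic numbers, and the significance threshold are illustrative assumptions rather than a prescribed setup.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_values, prod_values, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on one feature.

    Returns (drifted, statistic, p_value), where drifted is True when the
    test rejects the hypothesis that training and production values come
    from the same distribution at significance level alpha.
    """
    statistic, p_value = ks_2samp(train_values, prod_values)
    return p_value < alpha, statistic, p_value

# Illustrative data: customer ages seen at training time vs. ages arriving
# after a hypothetical campaign aimed at a younger segment.
train_ages = np.random.normal(loc=45, scale=12, size=5000)
prod_ages = np.random.normal(loc=28, scale=6, size=1000)

drifted, stat, p = detect_feature_drift(train_ages, prod_ages)
print(f"drift detected: {drifted} (KS statistic={stat:.3f}, p-value={p:.4f})")
```

In practice, the same check would run on each monitored feature over a sliding production window, with the threshold tuned to balance sensitivity against alert fatigue.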

The Three Must-Haves for Machine Learning Monitoring

Machine learning models are not static pieces of code but dynamic predictors that depend on data, hyperparameters, evaluation metrics, and many other variables, so it is vital to have insight into the training and deployment process to prevent model drift and predictive stasis. That said, not all monitoring solutions are created equal. These are the three must-haves for a machine learning monitoring tool, whether you decide to build or buy a solution.
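Because a model's behavior is a function of its training data, hyperparameters, and evaluation metrics, monitoring needs a record of those variables at training time to compare production behavior against. Below is a minimal sketch of such a training snapshot; the field names, the SHA-256 data fingerprint, and the JSON file destination are assumptions for illustration, not any particular tool's format.

```python
import hashlib
import json
from datetime import datetime, timezone

def snapshot_training_run(model_name, data_path, hyperparams, eval_metrics):
    """Record the variables a deployed model depends on, so production
    behavior can later be compared against the training-time baseline."""
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()

    snapshot = {
        "model": model_name,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "training_data_sha256": data_hash,
        "hyperparameters": hyperparams,
        "evaluation_metrics": eval_metrics,
    }
    with open(f"{model_name}_training_snapshot.json", "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot

# Hypothetical usage for an LTV model:
# snapshot_training_run("ltv_model", "train.parquet",
#                       {"max_depth": 6, "n_estimators": 300},
#                       {"rmse": 41.2, "r2": 0.78})
```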

Complete Process Visibility

Many applications involve multiple models working in tandem, and these models serve a higher business purpose that may be two or three steps downstream. Furthermore, a model's behavior will likely depend on data transformations that are multiple steps upstream. A simple monitoring system that focuses on the behavior of a single model will therefore not capture the holistic picture of model performance in the global business context. A deeper understanding of model viability only comes from complete process visibility: insight into the entire data flow, metadata, context, and overarching business processes on which the modeling is predicated.

For example, as part of a credit approval application, a bank may deploy a suite of models that assess creditworthiness, screen for potential fraud, and dynamically allocate trending offers and promos. A simple monitoring system might be able to evaluate any one of these models individually, but solving the overall business problem demands an understanding of the interplay between them. While the models may have divergent goals, each rests on a shared foundation of training data, context, and business metadata. An effective monitoring solution will take these disparate pieces into account and generate unified insights that harness this shared information: identifying niche and underutilized customer segments in the training data, flagging potential instances of concept and data drift, understanding the aggregate impact of the models on business KPIs, and more. The best monitoring solutions can also work not only on ML models but on generic tabular data, allowing them to be extended to all business use cases, not just those with an ML component.
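One rough way to make complete process visibility concrete is to log every pipeline stage, from upstream feature preparation through each model's output to the downstream business decision, against a shared request identifier, so drift and KPI impact can be analyzed together rather than per model. The sketch below is illustrative only; the stage names, the three example models, and the payload fields are assumptions, not a reference design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List

@dataclass
class PipelineEvent:
    """One logged step in a multi-model business process, keyed by a shared
    request id so upstream data, model outputs, and the final business
    outcome can be joined during monitoring."""
    request_id: str
    stage: str  # e.g. "feature_prep", "credit_model", "fraud_model", "decision"
    payload: Dict[str, Any] = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

event_log: List[PipelineEvent] = []

def log_stage(request_id: str, stage: str, payload: Dict[str, Any]) -> None:
    """Append one pipeline stage to the shared event log."""
    event_log.append(PipelineEvent(request_id, stage, payload))

# Hypothetical credit-approval flow for a single application:
log_stage("req-001", "feature_prep", {"income": 58000, "age": 31})
log_stage("req-001", "credit_model", {"score": 0.71})
log_stage("req-001", "fraud_model", {"fraud_risk": 0.04})
log_stage("req-001", "decision", {"approved": True, "offer": "promo_a"})

# Monitoring can now join every stage of a request and relate model scores
# to the business outcome (approval rates, offer uptake, and so on).
stages_for_request = [(e.stage, e.payload) for e in event_log if e.request_id == "req-001"]
print(stages_for_request)
```

The design choice here is simply that the join key (the request id) spans the whole business process, which is what lets a monitoring layer reason across models rather than about each one in isolation.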