Using Machine Learning to Find Root Cause of App Failure Changes Everything

It is inevitable that a website or app will fail or encounter problems from time to time, ranging from broken functionality to performance issues or even complete outages. Development cycles are too fast, conditions too dynamic, and infrastructure and code too complex to expect flawless operations all the time. When a problem does occur, the pressure is high and teams scramble to find a solution. The root cause can usually be found somewhere among millions (or even billions) of log events from a large number of different sources. The ensuing investigation is usually slow and painful, and it can take valuable hours away from already busy engineering teams. It also involves handoffs between experts in different components of the app, particularly when interconnected microservices and third-party services are involved, since these can produce a wide range of failure permutations.

Finding the root cause and a solution takes both time and experience. Development teams are usually short-staffed and overworked, so the urgent “fire drill” of dropping everything to find the cause of an app problem stalls other important development work. Observability tools, such as APM, tracing, monitoring, and log management solutions, help team productivity, but they are not enough. These tools still require knowing what to look for, and interpreting the results they uncover takes significant time.

A Primer on ML and Jupyter Notebook

Recently, I was working on an edge computing demo[1] that used ML (machine learning) to detect anomalies for a manufacturing use case. While I had a generic understanding of what ML is, I lacked a practitioner's understanding of how to use it. Similarly, I’d heard of Jupyter Notebook and was vaguely aware that it was connected with ML, but I didn’t really know what it was or how to use it. This article is geared towards people who just want to understand ML and Jupyter Notebook. There are plenty of great resources available if you want to learn how to build ML models.

Caution: If you’re a data scientist then this article is not for you! We’ll be using very simple analysis techniques to serve as a teaching aid. 
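
To give a flavor of what a "very simple analysis technique" for anomaly detection can look like, here is a minimal sketch that flags readings whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. It is only an illustration under stated assumptions and not necessarily the method used in the demo: the function name find_anomalies, the threshold of 2.0, and the temperature readings are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold.

    Assumes `readings` holds at least two numeric values; the default
    threshold of 2.0 is an arbitrary choice for illustration.
    """
    mu = mean(readings)
    sigma = stdev(readings)  # sample standard deviation
    if sigma == 0:
        # All readings are identical, so nothing stands out.
        return []
    return [
        (i, x)
        for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical sensor data: one reading is clearly out of line.
temperatures = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 85.0, 70.1, 69.7, 70.0]
anomalies = find_anomalies(temperatures)
print(anomalies)
```

With this made-up data, only the 85.0 reading at index 6 is flagged. A real ML model would learn what "normal" looks like from far more data, but the underlying idea of measuring how far a point sits from expected behavior is similar.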

Using Scikit-Learn for Machine Learning Application Development in Python

Python is arguably the best programming language for machine learning. However, many aspiring machine learning developers don’t know where to start. They should look into the scikit-learn library, which is one of the best for developing machine learning applications. It is free and relatively easy to install and learn.

Why Machine Learning Programmers Should Be Familiar With Scikit-Learn

If you are trying to develop machine learning applications, then you are going to need a robust toolkit. Scikit-learn is just the solution that you need. This library began in 2007 as a Google Summer of Code project. Three years later, the code was publicly released as a library of machine learning algorithms.