Using Machine Learning for Log Analysis and Anomaly Detection: A Practical Approach to Finding the Root Cause

There are many articles on applying machine learning to log analysis. However, most of them are dated, academic in nature, or don’t focus on practical outcomes. On DZone, the last article covering how ML can be used for log analysis was published five years ago.

In this article, we want to share our real-life experience of using ML/AI for log analysis and anomaly detection, with the specific purpose of automatically uncovering the root cause of software issues.

Anomaly Detection Using the Bag-of-Words Model

I am going to show in detail one use case of unsupervised learning: behavior-based anomaly detection. Imagine you are collecting the daily activity of a group of people. In this example, there are six people (S1-S6). Once all the data are sorted and pre-processed, the result may look like this list:

  • S1 = eat, read book, ride bicycle, eat, play computer games, write homework, read book, eat, brush teeth, sleep
  • S2 = read book, eat, walk, eat, play tennis, go shopping, eat snack, write homework, eat, brush teeth, sleep
  • S3 = wake up, walk, eat, sleep, read book, eat, write homework, wash bicycle, eat, listen music, brush teeth, sleep
  • S4 = eat, ride bicycle, read book, eat, play piano, write homework, eat, exercise, sleep
  • S5 = wake up, eat, walk, read book, eat, write homework, watch television, eat, dance, brush teeth, sleep
  • S6 = eat, hang out, date girl, skating, use mother's CC, steal clothes, talk, cheating on taxes, fighting, sleep

S1 is the list of daily activities of the first person, S2 of the second, and so on. Looking at this list, you can easily see that the activity of S6 differs from the others. That is only easy because there are six people. What if there were six thousand? Or six million? At that scale, there is no way you could spot the anomalies by eye. But machines can. Once a machine can solve a problem on a small scale, it can usually handle the large scale relatively easily. Therefore, the goal here is to build an unsupervised learning model that will identify S6 as an anomaly.
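
To make the idea concrete, here is a minimal sketch in Python that uses only the standard library. It is one possible way to flag S6, not necessarily the model built in this article. Each person's activities are turned into a bag of words (activity counts, order ignored), and each person is scored by their average cosine similarity to everyone else. The names bag_of_words, cosine, and mean_similarity are illustrative, and the scoring rule itself is an assumption chosen for simplicity.

```python
from collections import Counter
from math import sqrt

# Daily activities copied from the list above, one comma-separated string per person.
activities = {
    "S1": "eat, read book, ride bicycle, eat, play computer games, write homework, read book, eat, brush teeth, sleep",
    "S2": "read book, eat, walk, eat, play tennis, go shopping, eat snack, write homework, eat, brush teeth, sleep",
    "S3": "wake up, walk, eat, sleep, read book, eat, write homework, wash bicycle, eat, listen music, brush teeth, sleep",
    "S4": "eat, ride bicycle, read book, eat, play piano, write homework, eat, exercise, sleep",
    "S5": "wake up, eat, walk, read book, eat, write homework, watch television, eat, dance, brush teeth, sleep",
    "S6": "eat, hang out, date girl, skating, use mother's CC, steal clothes, talk, cheating on taxes, fighting, sleep",
}


def bag_of_words(text):
    """Count how often each activity occurs; the order of activities is ignored."""
    return Counter(item.strip() for item in text.split(","))


def cosine(a, b):
    """Cosine similarity between two bags of words (Counter objects)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


bags = {name: bag_of_words(text) for name, text in activities.items()}


def mean_similarity(name):
    """Average similarity of one person to all the others (assumed scoring rule)."""
    sims = [cosine(bags[name], bag) for other, bag in bags.items() if other != name]
    return sum(sims) / len(sims)


scores = {name: mean_similarity(name) for name in bags}

# The person who is least similar to everyone else is the anomaly candidate.
most_anomalous = min(scores, key=scores.get)

for name in sorted(scores, key=scores.get):
    print(f"{name}: {scores[name]:.2f}")
print("Most anomalous:", most_anomalous)
```

With this data, S6 shares only "eat" and "sleep" with the other five people, so it receives the lowest average similarity. Nothing in the code depends on there being six people, so the same loop works on a larger population, although comparing every pair becomes expensive at scale and a production system would need a cheaper scoring scheme.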