anomaly detection | The Blog Pros

November 5, 2021

Using Machine Learning for Log Analysis and Anomaly Detection: A Practical Approach to Finding the Root Cause

There are many articles on applying machine learning for log analysis. However, most of them are dated, academic in nature, or don’t focus on practical outcomes. On DZone, the last time an article covering how ML can be used for log analysis was published 5 years ago.

In this article, we want to share our real-life experience on using ML/AI for log analysis and anomaly detection with the specific purpose of automatically uncovering the root cause of software issues.

September 5, 2019

Anomaly Detection Using the Bag-of-Words Model

I am going to show in detail one use case of unsupervised learning: behavioral-based anomaly detection. Imagine you are collecting daily activity from people. In this example, there are six people (S1-S6). When all the data are sorted and pre-processed, the result may look like this list:

S1 = eat, read book, ride bicycle, eat, play computer games, write homework, read book, eat, brush teeth, sleep
S2 = read book, eat, walk, eat, play tennis, go shopping, eat snack, write homework, eat, brush teeth, sleep
S3 = wake up, walk, eat, sleep, read book, eat, write homework, wash bicycle, eat, listen music, brush teeth, sleep
S4 = eat, ride bicycle, read book, eat, play piano, write homework, eat, exercise, sleep
S5 = wake up, eat, walk, read book, eat, write homework, watch television, eat, dance, brush teeth, sleep
S6 = eat, hang out, date girl, skating, use mother's CC, steal clothes, talk, cheating on taxes, fighting, sleep

S1 is the set of the daily activity of the first person, S2 of the second, and so on. If you look at this list, then you can pretty easily recognize that activity of S6 is somehow different from the others. That's because there are only six people. What if there were six thousand? Or six million? Unfortunately, there is no way you could recognize the anomalies. But machines can. Once a machine can solve a problem on a small scale, it can usually handle the large scale relatively easily. Therefore, the goal here is to build an unsupervised learning model that will identify S6 as an anomaly.

GBase 8a Implementation Guide: Resource Assessment
No categories
1. Disk Storage Space Evaluation The storage space requirements for a GBase cluster are calculated based on the data volume of the business system, the choice of compression algorithm, and the number of cluster replicas. The data volume of a business s... […]
A Look Into Netflix System Architecture
No categories
Ever wondered how Netflix keeps you glued to your screen with uninterrupted streaming bliss? Netflix Architecture is responsible for the smooth streaming experience that attracts viewers worldwide behind the scenes. Netflix's system architecture emphas... […]
High Availability and Disaster Recovery (HADR) in SQL Server on AWS
No categories
High Availability and Disaster Recovery (HADR) play a vital role in maintaining the integrity of data, reducing downtime, and safeguarding against data loss in enterprise database systems. AWS offers a range of HADR options for SQL Server, which levera... […]
Terraform Tips for Efficient Infrastructure Management
No categories
Terraform is a popular tool for defining and provisioning infrastructure as code (IaC), improving consistency, repeatability, and version control. But you need to know how to use it properly to extract maximum value from it as an infrastructure managem... […]
Integration Testing With Keycloak, Spring Security, Spring Boot, and Spock Framework
No categories
In today's security landscape, OAuth2 has become a standard for securing APIs, providing a more robust and flexible approach than basic authentication. My journey into this domain began with a critical solution architecture decision: migrating from bas... […]

Proudly powered by WordPress