Anomaly Detection Using the Bag-of-Words Model

I am going to show in detail one use case of unsupervised learning: behavior-based anomaly detection. Imagine you are collecting the daily activities of a group of people. In this example, there are six people (S1-S6). Once the data are sorted and pre-processed, the result may look like this list:

  • S1 = eat, read book, ride bicycle, eat, play computer games, write homework, read book, eat, brush teeth, sleep
  • S2 = read book, eat, walk, eat, play tennis, go shopping, eat snack, write homework, eat, brush teeth, sleep
  • S3 = wake up, walk, eat, sleep, read book, eat, write homework, wash bicycle, eat, listen music, brush teeth, sleep
  • S4 = eat, ride bicycle, read book, eat, play piano, write homework, eat, exercise, sleep
  • S5 = wake up, eat, walk, read book, eat, write homework, watch television, eat, dance, brush teeth, sleep
  • S6 = eat, hang out, date girl, skating, use mother's CC, steal clothes, talk, cheating on taxes, fighting, sleep

S1 is the sequence of daily activities of the first person, S2 of the second, and so on. Looking at this list, you can easily see that the activity of S6 differs from the others. That is only possible because there are six people. What if there were six thousand? Or six million? At that scale, you could no longer spot the anomalies by eye. But machines can. Once a machine can solve a problem on a small scale, it can usually handle the large scale relatively easily. Therefore, the goal here is to build an unsupervised learning model that will identify S6 as an anomaly.
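To make the goal concrete, here is a minimal sketch of one way the six lists could be scored. It is an illustration under stated assumptions, not necessarily the model this article builds: each list is turned into a bag of words (activity counts), and a person's anomaly score is taken to be one minus the mean cosine similarity to everyone else. The class name `BagOfWordsAnomaly`, its method names, and the choice of cosine similarity are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names and scoring rule are assumptions for illustration.
class BagOfWordsAnomaly {

    // The six activity lists from the example above, one string per person.
    static final List<String> DATA = Arrays.asList(
        "eat, read book, ride bicycle, eat, play computer games, write homework, read book, eat, brush teeth, sleep",
        "read book, eat, walk, eat, play tennis, go shopping, eat snack, write homework, eat, brush teeth, sleep",
        "wake up, walk, eat, sleep, read book, eat, write homework, wash bicycle, eat, listen music, brush teeth, sleep",
        "eat, ride bicycle, read book, eat, play piano, write homework, eat, exercise, sleep",
        "wake up, eat, walk, read book, eat, write homework, watch television, eat, dance, brush teeth, sleep",
        "eat, hang out, date girl, skating, use mother's CC, steal clothes, talk, cheating on taxes, fighting, sleep");

    // Turn a comma-separated activity list into a bag: activity -> number of occurrences.
    static Map<String, Integer> toBag(String activities) {
        Map<String, Integer> bag = new HashMap<>();
        for (String activity : activities.split(",")) {
            bag.merge(activity.trim(), 1, Integer::sum);
        }
        return bag;
    }

    // Euclidean length of the count vector.
    static double norm(Map<String, Integer> bag) {
        double sum = 0;
        for (int count : bag.values()) {
            sum += count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of two non-empty bags: 0.0 means no shared activities.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        return dot / (norm(a) * norm(b));
    }

    // Score per person: 1 - mean cosine similarity to all other people.
    static double[] anomalyScores(List<String> people) {
        List<Map<String, Integer>> bags = new ArrayList<>();
        for (String person : people) {
            bags.add(toBag(person));
        }
        double[] scores = new double[bags.size()];
        for (int i = 0; i < bags.size(); i++) {
            double total = 0;
            for (int j = 0; j < bags.size(); j++) {
                if (i != j) {
                    total += cosine(bags.get(i), bags.get(j));
                }
            }
            scores[i] = 1.0 - total / (bags.size() - 1);
        }
        return scores;
    }

    // Index of the highest score.
    static int mostAnomalous(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] scores = anomalyScores(DATA);
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("S%d anomaly score: %.2f%n", i + 1, scores[i]);
        }
        System.out.println("Most anomalous: S" + (mostAnomalous(scores) + 1));
    }
}
```

On this data, S6 shares only "eat" and "sleep" with the other lists, so it receives the highest score. Comparing every pair is quadratic in the number of people, so this sketch only demonstrates the idea at small scale.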

Robust Exception Handling

Oh no, don't do this to me...

// Writing a comment that says the exception is skipped
try {
    throw new IOException("Made up");
} catch (IOException e) {
    // skip it
}

// Logging and moving on
try {
    throw new IOException("Made up");
} catch (IOException e) {
    log.error("blah blah blah", e);
}

// Creating TODO instead of actually doing the job
try {
    throw new IOException("Made up");
} catch (IOException e) {
    // TODO - handle it (;
}
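
For contrast, here is a minimal sketch of one alternative to the three snippets above: catch the exception only to add context, then rethrow it so the caller still learns that something failed. The class `RethrowWithContext`, the methods `readConfig` and `loadConfig`, and the file name are hypothetical, made up for this example. Rethrowing is only one option; retrying or falling back to a default can be equally valid when the code can genuinely recover.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example class; names are assumptions for illustration.
class RethrowWithContext {

    // Stand-in for real I/O; always fails, like the made-up exceptions above.
    static void readConfig(String path) throws IOException {
        throw new IOException("Made up");
    }

    // Adds context and rethrows, keeping the original exception as the cause.
    static void loadConfig(String path) {
        try {
            readConfig(path);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not load config from " + path, e);
        }
    }

    public static void main(String[] args) {
        try {
            loadConfig("app.properties");
        } catch (UncheckedIOException e) {
            // The failure is visible here, with both the context and the root cause.
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
```

Because the original `IOException` is passed along as the cause, its stack trace is preserved for whoever finally handles or logs the failure.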