Don’t Make a Hash of Analysis, Go Fuzzy

Security Operations Center (SOC) analysts spend much of their time trying to determine whether a document has changed, which may indicate that it has been compromised. The most common way to do so is to compare hashes of the document produced by a hashing algorithm.

Using a hash, we can tell whether there has been even the slightest change to a document. But what happens when the change is insignificant, or when our goal is to locate similar files that don’t share the exact same hash?
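To make that concrete, here is a minimal sketch in Python using the standard library's hashlib module. The two sample messages and the helper name sha256_hex are made up for illustration, and SHA-256 is simply one common choice of algorithm. The messages differ by a single character, yet their digests have nothing obvious in common, which is exactly why an ordinary hash cannot tell us that two files are "almost the same."

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


# Two hypothetical documents that differ by exactly one character.
original = b"Quarterly report: revenue up 4%"
modified = b"Quarterly report: revenue up 5%"

d1 = sha256_hex(original)
d2 = sha256_hex(modified)

# Count how many of the 64 hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(d1, d2))

print("original:", d1)
print("modified:", d2)
print("hex positions that differ:", differing, "of", len(d1))
```

Running this should show two digests that look unrelated even though the inputs are nearly identical. That is great for spotting tampering, and not much help for finding similar files.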

Decoded: Examples of How Hashing Algorithms Work

If cryptography were a body, its hashing algorithm would be the heart. If cryptography were a car, its hashing algorithm would be the engine. If cryptography were a movie, its hashing algorithm would be the star. If cryptography were the solar system, its hashing algorithm would be the sun. Okay, that’s probably going too far, but you get the point, right? Before we get to what a hashing algorithm is, why it’s there, and how it works, it’s important to understand its nuts and bolts. Let’s start with hashing.

What Is Hashing?

Let’s imagine a hypothetical situation. Suppose you want to send a message or file to someone, and it is absolutely imperative that it reaches its intended recipient exactly as you sent it. How would you do it? One option is to send it multiple times and check that every copy matches, to verify that it wasn’t tampered with. But what if the message is too long? What if the file measures in gigabytes? It would be utterly absurd, impractical, and, quite frankly, boring to verify every single letter, right? Well, that’s where hashing comes into play.
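As a rough illustration of the scenario above, the sketch below hashes a file in chunks, so even a multi-gigabyte file never has to fit in memory, and then compares the result with a digest the sender published separately. The function names file_digest and matches, the 1 MiB chunk size, and the choice of SHA-256 are all assumptions made for this example, not a prescribed method.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file with SHA-256, reading chunk_size bytes at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest equals the digest the sender shared."""
    return file_digest(path) == expected_hex.strip().lower()
```

The recipient would call matches() with the path of the received file and the hex digest the sender shared through some other channel. Instead of checking every single letter, they compare two short strings.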