How to Generate and Compare Perceptual Image Hashes in Java

Perceptual image hashing is a relatively new process used primarily in the multimedia industry for content identification and authentication. The process itself uses an algorithm to extract specific features from an image and calculate a hash value based on that information. The hash value that is generated acts as a kind of ‘fingerprint’ for the image; it is a distinct identifier that is unique to its parent image. 

As you may have guessed by the fingerprint comparison, perceptual image hashing is particularly useful for digital forensics, but it has become an important player in prohibiting online copyright infringement as well. By comparing the hash value of an original/authentic image with the hash value of a similar image, you can identify and match various images and calculate the Hamming Distance between them. For reference, Hamming Distance measures the minimum number of substitutions it takes to change one image to the other, so hash values that are closer together are more similar. 

Decoded: Examples of How Hashing Algorithms Work

If cryptography was a body, its hashing algorithm would be the heart of it. If cryptography was a car, its hashing algorithm would be its engine. If cryptography was a movie, its hashing algorithm would be the star. If cryptography was the solar system, its hashing algorithm would be the sun. Okay, that’s probably too far, but you’ve got the point, right? Before we get to the what hashing algorithm is, why it’s there, and how it works, it’s important to understand where its nuts and bolts are. Let’s start with hashing.

What Is Hashing?

Let’s try to imagine a hypothetical situation here. Suppose you want to send a message/file to someone and it is absolutely imperative that it reaches its intended recipient in the exact same format. How would you do it? One option is to send it multiple times and verify that it wasn’t tampered with. But what if the message is too long? What if the file measures in gigabytes? It would be utterly absurd, impractical, and, quite frankly, boring to verify every single letter, right? Well, that’s where hashing comes into play.