June 18, 2021 by Ilya Dudkin

Using Machine Learning to Detect Dupes: Some Real-Life Examples

As companies collect more and more data about their customers, duplicate records start appearing in that data as well, causing a lot of confusion among internal teams. Because manually combing through all of the data to delete the duplicates is impractical, companies have turned to machine learning solutions that do this work for them. Today we would like to take a look at some interesting uses of machine learning to catch duplicates in all kinds of environments. Before we dive in, let’s look at how machine learning systems identify duplicates in the first place.

How Do Machine Learning Systems Identify Duplicates?

When a person looks at two images or two strings of data, it is fairly easy to tell whether they are duplicates. But how would you train a machine to spot such duplicates? A good starting point would be to identify all of the similarities, but then you would need to define exactly what 'similar' means. Are there degrees of similarity? To overcome such challenges, researchers use string metrics, which quantify how closely two strings match, to train machine learning models.
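The article doesn't say which string metric is used, so the sketch below assumes one common choice, Levenshtein (edit) distance: the minimum number of single-character insertions, deletions, or substitutions that turn one string into another. Normalizing that distance gives a similarity score between 0 and 1, which answers the "degrees of similarity" question with a number. The helpers `levenshtein`, `similarity`, and `pair_features` are hypothetical names for illustration, not part of any specific product or library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (Levenshtein distance)."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the edit distance to [0, 1]; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def pair_features(rec_a: dict, rec_b: dict, fields: list[str]) -> list[float]:
    """Hypothetical feature builder: one similarity score per field.
    A model trained on labeled duplicate / non-duplicate pairs could
    take vectors like this as input."""
    return [
        similarity(rec_a[f].lower().strip(), rec_b[f].lower().strip())
        for f in fields
    ]
```

For example, `levenshtein("kitten", "sitting")` returns `3`, and comparing `{"name": "Jon Smith", "email": "jon.smith@example.com"}` with `{"name": "John Smith", "email": "jon.smith@example.com"}` on `["name", "email"]` yields `[0.9, 1.0]`: a near match on the name and an exact match on the email. Rather than hard-coding a cutoff, the idea is to let a trained model decide how much each field's score should count toward calling two records duplicates.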
