July 16, 2019August 26, 2019 by Roberto Cervantes

How to Protect Dataset Privacy Using Python and Pandas

Working with datasets that contain sensitive information is risky, and as a data scientist, you should be extremely careful whenever this type of data is present in a dataset. People dealing with sensitive information are often under the misunderstanding that by removing names, ID’s, and credit card numbers that the privacy risk is eliminated. While removing direct identifiers can help, there are more information elements in a dataset that can be used to re-identify an individual. For example, Latanya Sweeney, Director of the Data Privacy Lab in the Institute of Quantitative Social Science (IQSS) at Harvard, proved that 87 percent of US population can be re-identified using zip code, gender, and date of birth.

In this post, I am going to show you how to effectively reduce the privacy risk of a dataset while maintaining its analytical value for machine learning.

Inspirational Websites Roundup: Webflow Special #5
In Articles, Inspiration, roundup
In collaboration with Webflow, we’re thrilled to present this roundup, featuring a handpicked selection of awe-inspiring websites. […]
Building Scalable Software Solutions for Display Manufacturing Automation
No categories
As display manufacturing continues to evolve, the demand for scalable software solutions to support automation has become more critical than ever. Scalable software architectures are the backbone of efficient and flexible production lines, enabling man... […]
Collective #850
No categories
Menu-Design Checklist * STL Twister * DGM.js * ASCII Silhouettify […]
How To Use Thread.sleep() in Selenium
No categories
When performing Selenium automation testing, you might encounter a NoSuchElementException() when the element you're trying to interact with isn't found. This error typically occurs due to the element existing on the page but takes a considerable amount... […]
Leveraging Test Containers With Docker for Efficient Unit Testing
No categories
In the changing world of software development, testing is crucial for making sure that applications are reliable, functional, and perform well. Unit testing is a method used to check how parts or units of code behave. However, testing can get tricky wh... […]