HIPAA Compliant Solution Requirements

In today’s connected era, cloud computing is changing the way doctors, nurses, and hospitals deliver quality, cost-effective services to their patients. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a US law enacted to protect the privacy of patients’ medical records and other health-related information provided by or to patients, known as PHI (Protected Health Information).

HIPAA applies to “covered entities” and “business associates,” including doctors, hospitals, other health-related providers, clearinghouses, and health insurance providers. It also applies to any company that provides health-related services or that handles or stores patients’ health information.

How to Protect Dataset Privacy Using Python and Pandas

Working with datasets that contain sensitive information is risky, and as a data scientist, you should be extremely careful whenever this type of data is present. People dealing with sensitive information often mistakenly believe that removing names, IDs, and credit card numbers eliminates the privacy risk. While removing direct identifiers helps, other fields in a dataset (often called quasi-identifiers) can be combined to re-identify an individual. For example, Latanya Sweeney, Director of the Data Privacy Lab in the Institute for Quantitative Social Science (IQSS) at Harvard, showed that 87 percent of the US population can be uniquely identified by the combination of zip code, gender, and date of birth.
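To make the re-identification risk concrete, here is a minimal sketch of how you might measure it with Pandas. It counts how many rows share each combination of zip code, gender, and date of birth, then reports the smallest group size and the fraction of rows that are unique on those fields. The toy data, the column names, and the helper reidentification_summary are all made up for illustration; they are assumptions, not part of any library.

```python
import pandas as pd

# Toy records with direct identifiers already removed (all values are invented).
df = pd.DataFrame(
    {
        "zip_code": ["02138", "02138", "02139", "02139", "02139"],
        "gender": ["F", "F", "M", "M", "F"],
        "birth_date": [
            "1980-01-15", "1980-01-15", "1975-06-30", "1975-06-30", "1990-11-02",
        ],
        "diagnosis": ["asthma", "diabetes", "flu", "asthma", "flu"],
    }
)

# Hypothetical choice of quasi-identifier columns for this toy dataset.
QUASI_IDENTIFIERS = ["zip_code", "gender", "birth_date"]


def reidentification_summary(frame, columns):
    """Summarize how identifying a set of columns is (illustrative helper)."""
    # Number of rows sharing each distinct combination of values.
    group_sizes = frame.groupby(columns).size()
    # A row is unique when no other row has the same combination.
    is_unique = ~frame.duplicated(subset=columns, keep=False)
    return {
        "smallest_group": int(group_sizes.min()),
        "unique_fraction": float(is_unique.mean()),
    }


summary = reidentification_summary(df, QUASI_IDENTIFIERS)
print(summary)  # {'smallest_group': 1, 'unique_fraction': 0.2}
```

In this toy example, one of the five rows is the only record with its combination of zip code, gender, and date of birth, so anyone who knows those three facts about that person can also learn the diagnosis, even though no name appears in the data.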

In this post, I am going to show you how to effectively reduce the privacy risk of a dataset while maintaining its analytical value for machine learning.