15 Places to Find Free Datasets for Your Data Science Projects

If you’ve ever worked on a personal data science project, you’ve probably spent a lot of time scouring the internet for interesting datasets to analyze.

It can be fun to sift through dozens of datasets to find the best fit, but it can also be frustrating to download and import multiple CSV files, only to find that the data is incomplete or simply not that interesting. Fortunately, there are online repositories that curate datasets and (mostly) filter out the uninteresting ones.
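One way to reduce that frustration is to run a quick check for missing values on each CSV before investing time in it. The following is a minimal sketch using only the Python standard library. The function name `missing_fraction`, the list of tokens treated as "missing", and the sample data are illustrative assumptions, not part of any particular repository's tooling.

```python
import csv
import io


def missing_fraction(csv_text, missing_tokens=("", "NA", "NaN", "null")):
    """Return {column: fraction of rows whose value counts as missing}.

    `missing_tokens` is an assumed convention; adjust it to match how a
    given dataset encodes absent values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    counts = {name: 0 for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None; empty cells yield "".
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in columns}
    return {name: counts[name] / total for name in columns}


# Hypothetical sample: two of the four temperature readings are absent.
sample = "city,temp\nOslo,3\nLima,\nPune,NA\nRome,18\n"
report = missing_fraction(sample)
for column, fraction in report.items():
    print(f"{column}: {fraction:.0%} missing")
```

For a file on disk, the same function could be called with the file's contents, for example `missing_fraction(open(path, newline="").read())`. For very large files, a streaming variant that passes the file object straight to `csv.DictReader` would likely be the better choice.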

The 10 Commandments for Performing a Data Science Project

In designing a data science project, establishing what we, or the users we are building models for, want to achieve is vital, but this understanding only provides a blueprint for success. To truly deliver against a well-established brief, data science teams must follow best practices in executing the project. To help establish what that might mean, I have come up with ten points to provide a framework that can be applied to any data science project.

1. Understand the Problem 

The most fundamental part of solving any problem is knowing exactly what problem you are solving. Make sure that you understand what you are trying to predict, any constraints, and the ultimate purpose of the project. Ask questions early on and validate your understanding with peers, domain experts, and end-users. If their answers align with your understanding, you know that you are on the right path.

The 10 Commandments for Designing a Data Science Project

Introduction

As businesses across industries seek to improve workflows and the delivery of products and services through increased automation, there is an ever-growing demand for the adoption of more advanced data science capabilities and projects. 

Artificial intelligence and machine learning can, of course, deliver great ROI — but only under the right conditions. In every instance, a data science project must be framed in the right way, both from a business and a technical point of view. To help provide this framework, I have devised the following “10 commandments” for designing a data science project. 

Successful Data Science Project Planning: CRISP-DM Is Not Dead!

It is very difficult to find a concise article that offers a comprehensive guide to implementing a Machine Learning or Data Science project. Many online articles provide detailed information on how to implement individual parts of a Machine Learning/Data Science project, but companies sometimes need only the high-level steps that give a clear overview.

A lot of Data Science project leads today forget about CRISP-DM, the Cross-Industry Standard Process for Data Mining, which was created in 1996. In 2015, IBM released a new methodology called Analytics Solutions Unified Method for Data Mining/Predictive Analytics (also known as ASUM-DM), which refines and extends CRISP-DM.[1]