Scalable Select of Random Rows in SQL

If you’re new to the big data world and are migrating from tools like Google Analytics or Mixpanel for your web analytics, you’ve probably noticed performance differences. Google Analytics can show you predefined reports in seconds, while the same query over the same data in your data warehouse can take several minutes or longer.

Such performance boosts are achieved by selecting random rows, a technique known as sampling. We often see this technique used in Cube.js deployments that handle massive amounts of data. Let’s learn how to select random rows in SQL.
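
To make the idea concrete before going further, here is a minimal sketch of sampling: count only a random tenth of the rows and scale the result back up. It runs against an in-memory SQLite database from Python. The `events` table, its columns, and the `estimate_count` helper are hypothetical names made up for this illustration, and the `random() % N = 0` filter is SQLite syntax; other databases spell their random function differently.

```python
import sqlite3


def estimate_count(conn, sample_every=10):
    """Estimate the row count of the hypothetical `events` table from a random sample."""
    # random() is evaluated once per row; `random() % N = 0` keeps roughly 1 row in N.
    (sampled,) = conn.execute(
        "SELECT COUNT(*) FROM events WHERE random() % ? = 0",
        (sample_every,),
    ).fetchone()
    # Scale the sampled count back up to approximate the full table.
    return sampled * sample_every


# Build a toy table with 100,000 rows (an assumption for this demo only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events (user_id) VALUES (?)",
    ((i % 1000,) for i in range(100_000)),
)

exact = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
estimate = estimate_count(conn)
print(f"exact={exact}, estimate={estimate}")
```

The estimate changes from run to run but should land close to the exact count. On a toy table like this there is no speedup to observe; the sketch only shows the sample-then-scale arithmetic that sampling relies on.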