Building Open Source Google Analytics from Scratch

From an engineering standpoint, the technology behind Google Analytics was pretty sophisticated when it was created. Custom, tailored-made algorithms were implemented for event collection, sampling, aggregation, and storing output for reporting purposes. Back then, it required years of engineering time to ship such a piece of software. Big data landscapes have changed drastically since then. In this tutorial, we’re going to rebuild an entire Google Analytics pipeline. We’ll start from data collection and reporting. By using the most recent big data technology available, we’ll see how simple it is to reproduce such software nowadays.

TL;DR

Here’s an analytics dashboard with an embedded tracking code that collects data about its visitors while visualizing it at the same time.