Building a Real-Time Bike-Share Data Pipeline with StreamSets, Kafka and MapD

In this post, we will use the Ford GoBike Real-Time System, StreamSets Data Collector, Apache Kafka, and MapD to build a real-time data pipeline for bike availability in the Ford GoBike bike-share ecosystem. We’ll walk through the architecture and configuration that enable this pipeline and share a simple auto-updating dashboard built in MapD Immerse.

High-Level Architecture

The high-level architecture consists of two data pipelines. The first polls the GoBike system and publishes the data to Kafka. The second uses Data Collector to consume from Kafka, transform the data, and write it to MapD.
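To make the two-stage shape concrete, here is a minimal Python sketch of the same flow. It is only an illustration and not how the pipelines in this post are built, since those are configured in StreamSets Data Collector. The sketch assumes the GoBike feed follows the GBFS station_status layout (fields such as station_id, num_bikes_available, and num_docks_available), uses an in-memory queue as a stand-in for Kafka, and collects rows in a list as a stand-in for MapD. The topic name and all function names are hypothetical.

```python
import json
from collections import deque
from typing import Callable, Iterator

# Hypothetical topic name; the post does not specify one here.
TOPIC = "station_status"


def split_stations(feed: dict) -> Iterator[dict]:
    """Yield one flat record per station from a GBFS-style station_status document."""
    last_updated = feed.get("last_updated")
    for station in feed.get("data", {}).get("stations", []):
        record = dict(station)
        # Carry the feed-level timestamp onto every station record.
        record["feed_last_updated"] = last_updated
        yield record


def poll_once(fetch: Callable[[], dict],
              publish: Callable[[str, bytes], None]) -> int:
    """Pipeline 1: fetch the feed once and publish one message per station."""
    count = 0
    for record in split_stations(fetch()):
        publish(TOPIC, json.dumps(record).encode("utf-8"))
        count += 1
    return count


def consume_and_load(broker: deque, write_row: Callable[[tuple], None]) -> int:
    """Pipeline 2: drain messages, reshape each into a row, hand it to the sink."""
    loaded = 0
    while broker:
        _topic, payload = broker.popleft()
        record = json.loads(payload)
        write_row((
            record["station_id"],
            record["num_bikes_available"],
            record["num_docks_available"],
            record["feed_last_updated"],
        ))
        loaded += 1
    return loaded


# In-memory run: a deque stands in for Kafka and a list stands in for MapD.
sample_feed = {
    "last_updated": 1520000000,
    "data": {"stations": [
        {"station_id": "3", "num_bikes_available": 5, "num_docks_available": 10},
        {"station_id": "4", "num_bikes_available": 0, "num_docks_available": 15},
    ]},
}
broker = deque()
rows = []
published = poll_once(lambda: sample_feed,
                      lambda topic, payload: broker.append((topic, payload)))
loaded = consume_and_load(broker, rows.append)
```

Keeping the fetch, publish, and write steps as plain callables mirrors the decoupling that Kafka provides in the real architecture: the polling side never needs to know how, or how quickly, the consuming side loads data into the database.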