Tom’s Tech Notes: Big Data Advice for Devs [Podcast]

Welcome to our latest episode of Tom's Tech Notes! This week, we'll hear advice from industry experts tailored specifically for developers. From general tips for Big Data app development to the formats and architectures you need to know about, here's what they have to say about the modern Big Data ecosystem.

As a primer and reminder from our initial post, these podcasts are compiled from conversations our analyst Tom Smith has had with experts from around the world as part of his work on our research guides.

Apache Parquet vs. CSV Files

You have surely read about Google Cloud services (e.g., BigQuery, Dataproc), Amazon Redshift Spectrum, and Amazon Athena, and now you are looking to take advantage of one or two. Before you jump into the deep end, however, you will want to familiarize yourself with the benefits of using Apache Parquet instead of regular text, CSV, or TSV files. Query services such as Athena and Redshift Spectrum bill by the amount of data each query scans, so if you are not thinking about how to optimize for these new query service models, you are throwing money out the window.
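To make the cost argument concrete, here is a minimal sketch in plain Python that compares how many bytes a query has to touch when the same data is laid out row by row (as in a CSV file) versus column by column. It is a toy illustration only: it does not use Parquet or reproduce Parquet's actual on-disk format, and the sample data, column names, and helper functions are all hypothetical.

```python
import csv
import io

# Hypothetical sample data; column names and values are made up for illustration.
COLUMNS = ["user_id", "country", "amount"]
ROWS = [
    {"user_id": str(i), "country": "US" if i % 2 == 0 else "DE", "amount": str(i * 10)}
    for i in range(1000)
]


def to_row_layout(rows):
    """Serialize rows as CSV text: each row keeps all of its fields together."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_column_layout(rows):
    """Serialize each column as its own contiguous block of bytes."""
    return {
        name: "\n".join(row[name] for row in rows).encode("utf-8")
        for name in COLUMNS
    }


def sum_amount_from_rows(blob):
    """Row layout: the whole blob is parsed just to reach the 'amount' field."""
    reader = csv.DictReader(io.StringIO(blob.decode("utf-8")))
    total = sum(int(row["amount"]) for row in reader)
    return total, len(blob)


def sum_amount_from_columns(blocks):
    """Column layout: only the 'amount' block has to be read."""
    block = blocks["amount"]
    total = sum(int(value) for value in block.decode("utf-8").split("\n"))
    return total, len(block)


row_blob = to_row_layout(ROWS)
col_blocks = to_column_layout(ROWS)

row_total, row_bytes = sum_amount_from_rows(row_blob)
col_total, col_bytes = sum_amount_from_columns(col_blocks)

print(f"row layout:    total={row_total}, bytes scanned={row_bytes}")
print(f"column layout: total={col_total}, bytes scanned={col_bytes}")
```

Both layouts produce the same total, but the column layout answers the query after reading only one of the three blocks. A real columnar format typically adds compression and encoding on top of this, so treat the numbers printed here as a rough intuition rather than a benchmark.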

What Is Apache Parquet?

Apache Parquet is a columnar storage format with the following characteristics: