Apache Parquet vs. CSV Files

You have surely read about Google Cloud (e.g. BigQuery, Dataproc), Amazon Redshift Spectrum, and Amazon Athena, and now you are looking to take advantage of one or more of them. However, before you jump into the deep end, you will want to familiarize yourself with the advantages of using Apache Parquet instead of regular text, CSV, or TSV files. Services such as Athena and Redshift Spectrum bill by the amount of data each query scans, so if you are not thinking about how to optimize for these new query service models, you are throwing money out the window.

What Is Apache Parquet?

Apache Parquet is a columnar storage format with the following characteristics: