Apache Avro to ORC Using Apache Gobblin

Apache Avro and Apache ORC 

Apache Avro and Apache ORC (Optimized Row Columnar) are top-level projects under the Apache Software Foundation. Fundamentally, they are data serialization formats with different strengths. 

Apache Avro is a compact, row-based binary format for serializing data in transit or at rest. It uses a schema to define the structure of the data being serialized, and that schema is stored alongside the data in the Avro data file. To meet a common need in the big data space, Avro was designed to support schema evolution: new fields can be added to the data structure without a complete recompilation of the code that uses it.
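To make the schema evolution point concrete, here is a minimal sketch in plain Python. It does not use the Avro library; it only mimics the resolution rule that a reader schema can add a field as long as that field declares a default. The `User` record, its fields, and the `resolve_record` helper are hypothetical names chosen for illustration.

```python
import json

# Writer schema (v1): the schema the data was originally written with.
# Avro schemas are declared as JSON; this "User" record is hypothetical.
WRITER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
""")

# Reader schema (v2): adds an optional "email" field with a default value.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
""")


def resolve_record(record, writer_schema, reader_schema):
    """Project a record decoded with writer_schema onto reader_schema.

    Simplified illustration only: fields unknown to the reader are dropped,
    and fields missing from the writer are filled from the reader's default.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} has no default; schemas are incompatible")
    return resolved


# A record written before "email" existed can still be read with the new schema.
old_record = {"id": 1, "name": "Ada"}
print(resolve_record(old_record, WRITER_SCHEMA, READER_SCHEMA))
# prints {'id': 1, 'name': 'Ada', 'email': None}
```

Old data stays readable after the schema grows because the missing field is filled from its default. A new field without a default would make the two schemas incompatible in this sketch, which is why added fields are usually declared with one.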
