Over the last decade, the volume and variety of data and data storage technologies have been soaring. Businesses in every industry have been looking for cost-effective ways to store it, and storage has been one of the main requirements for data retention. With a change to [near] real-time data pipelines and growing adoption of Apache Kafka, companies must find ways to reduce the Total Cost of Ownership (TCO) of their data platform.
Currently, Kafka is configured with a shorter, and typically three days, retention period. Data, older than the retention period, is copied, via streaming data pipelines, to scalable external storage for long-term use like AWS S3 or Hadoop Distributed File System (HDFS).