A New Approach to Solve I/O Challenges in the Machine Learning Pipeline

Background

The drive for training accuracy leads companies to develop complicated training algorithms and collect large amounts of training data, to the point where single-machine training takes an intolerably long time. Distributed training is a promising way to meet training speed requirements, but it faces challenges in data accessibility, performance, and storage system stability when handling I/O in the machine learning pipeline.

Solutions

These challenges can be addressed in different ways. Two solutions have traditionally been used to resolve data access challenges in distributed training. Beyond those, Alluxio provides a different approach.