Deep Learning at Alibaba Cloud With Alluxio – Running PyTorch on HDFS

Google’s TensorFlow and Facebook’s PyTorch are two Deep Learning frameworks that have been popular with the open source community. Although PyTorch is still a relatively new framework, many developers have successfully adopted it due to its ease of use.

By default, PyTorch cannot train Deep Learning models directly on data stored in HDFS, which creates challenges for users who keep their data sets there. These users must either copy the data out of HDFS at the start of each training job or modify the PyTorch source code so that it can read from HDFS. Neither approach is ideal: both require extra manual work, and that work may introduce new sources of uncertainty into the training job.