Building Pyarrow With CUDA Support

The other day, I was looking to read an Arrow buffer on GPU using Python, but as far as I could tell, none of the provided pyarrow packages on conda or pip are built with CUDA support. Like many of the packages in the compiled-C-wrapped-by-Python ecosystem, Apache Arrow is thoroughly documented, but the number of permutations of how you could choose to build pyarrow with CUDA support quickly becomes overwhelming.

In this post, I’ll show you how to build pyarrow with CUDA support on Ubuntu using Docker and virtualenv. These directions are approximately the same as the official Apache Arrow docs, but here, I explain them step-by-step and show only the single build toolchain I used.