PyTorch Neural Quirks

PyTorch uses some different software models than you might be used to, especially if you are migrating to it from something like Keras or TensorFlow. The first is, of course, the Tensor concept (something shared with TensorFlow, but not so obvious in Keras). The second is the nn.Module hierarchy you need to use when building a particular network. The final one is implied dimensionality and the channel concept. Of these, I'd really like to focus on the last in its own article, so let's get the first two out of the way here.

Tensors in PyTorch are really just values, and they mirror many of the creation methods available on NumPy arrays: ones(), zeros(), and so on. They also follow specific naming conventions on instance methods. For example, Tensor::add_() adds to the calling tensor in place, while Tensor::add() returns a new Tensor holding the sum. They support list-like indexing semantics and slicing, and they can be iterated over in comprehensions. They convert easily to and from NumPy arrays via torch.from_numpy() and Tensor::numpy(). They also have a sense of location and are affiliated with a specific device, and this is where things can get tricky.
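
Here's a quick sketch of those behaviors; the shapes and the "cuda" device string are just illustrative, and the GPU branch only runs if one is actually available:

```python
import numpy as np
import torch

# Creation mirrors NumPy: ones(), zeros(), etc.
t = torch.zeros(3, 4)
u = torch.ones(3, 4)

# A trailing underscore means "in place": add_() mutates t,
# while add() returns a new tensor and leaves t untouched.
t.add_(u)        # t is now all ones
v = t.add(u)     # v is all twos; t is unchanged

# List-like indexing and slicing work as expected.
first_row = v[0]
corner = v[:2, :2]
doubled = [row * 2 for row in v]   # tensors are iterable, so comprehensions work

# Round-tripping to and from NumPy.
a = np.arange(12, dtype=np.float32).reshape(3, 4)
w = torch.from_numpy(a)   # shares memory with the NumPy array
b = w.numpy()             # back to a NumPy array

# Tensors live on a specific device; moving them is explicit.
if torch.cuda.is_available():
    w_gpu = w.to("cuda")       # copy to the GPU
    b2 = w_gpu.cpu().numpy()   # must come back to the CPU before .numpy()
```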