RNN, Seq2Seq, Transformers: Introduction to Neural Architectures Commonly Used in NLP

Just a few years ago, RNNs and their gated variants such as the LSTM and GRU (which add multiplicative gating interactions to improve gradient flow across time steps) were the most popular architectures in NLP.
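To make "multiplicative gating" concrete, here is a minimal sketch of a GRU cell in plain NumPy (biases omitted for brevity; the weight initialization and toy sequence are illustrative assumptions, not a production implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (Cho et al., 2014), illustrating multiplicative gates."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # One weight matrix per gate, acting on the concatenation [x_t; h_{t-1}].
        self.W_z = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.W_r = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.W_h = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))

    def step(self, x_t, h_prev):
        xh = np.concatenate([x_t, h_prev])
        z = sigmoid(self.W_z @ xh)   # update gate
        r = sigmoid(self.W_r @ xh)   # reset gate
        h_tilde = np.tanh(self.W_h @ np.concatenate([x_t, r * h_prev]))
        # Multiplicative interpolation: the update gate decides how much of
        # the old state flows through unchanged, which eases gradient flow.
        return (1.0 - z) * h_prev + z * h_tilde

# Toy usage: run the cell over a short random sequence.
cell = GRUCell(input_size=8, hidden_size=16)
h = np.zeros(16)
for x in np.random.default_rng(1).normal(size=(5, 8)):
    h = cell.step(x, h)
```

The key design point is the final line of `step`: instead of overwriting the hidden state at every step, the cell computes a gated blend of old and new state, which is what lets gradients travel through long sequences more reliably than in a vanilla RNN.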

Prominent researchers such as Andrej Karpathy sang odes to the "unreasonable effectiveness" of RNNs, and large corporations were keen to adopt the models for virtual agents and other NLP applications.