Definitely more than meets the eye, Transformers are a class of neural network built largely around the “attention mechanism”. A major benefit of this architecture is that it processes every position in a sequence in parallel rather than one step at a time, which opens the door to training very large models on very large datasets far more quickly and efficiently than earlier recurrent models. A minimal sketch of that idea is below.
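To make the parallelism concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer, in plain NumPy. Note that a single matrix multiplication scores every query against every key at once, so the whole sequence is processed in one shot; a recurrent model would instead loop over tokens one by one. The function name and toy dimensions are my own for illustration, and this omits the multi-head splitting, masking, and learned projections a real Transformer layer uses.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over all positions of a sequence in one set of matrix ops.

    Q, K, V: arrays of shape (seq_len, d_k) holding the query, key,
    and value vectors for every token in the sequence.
    """
    d_k = Q.shape[-1]
    # One matmul compares every query with every key simultaneously --
    # this is where the sequence-wide parallelism comes from.
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len)
    # Softmax over each row turns raw scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all the value vectors.
    return weights @ V                          # (seq_len, d_k)

# Toy self-attention: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because everything reduces to dense matrix multiplications with no dependence between time steps, the computation maps neatly onto GPUs and scales to long sequences and huge training batches, which is exactly the efficiency advantage over recurrent models described above.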