“Attention Is All You Need” described attention as a sequence of multiplication and concatenation operations, but… what if I told you those operations are actually additive?
The post Transformers (and Attention) are Just Fancy Addition Machines appeared first on Towards Data Science.