Implementing "Attention Is All You Need"
- Transformer Base
- Sliding Attention Window
- Mamba
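
As a rough illustration of the first two items, here is a minimal sketch of scaled dot-product attention with an optional causal sliding-window mask. It uses only the Python standard library and is not this repo's implementation. The names `sliding_window_mask`, `softmax`, and `attention` are hypothetical, and it assumes a single head, no batching, and `window >= 1`.

```python
import math


def sliding_window_mask(seq_len, window=None):
    """Return mask[i][j], True when query position i may attend to key j.

    Causal rule: j <= i. With a window, also require i - j < window,
    so each position sees itself and at most window - 1 earlier positions.
    Assumes window >= 1, so every row keeps at least its diagonal entry.
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(k[0])
    scale = math.sqrt(d_k)
    out = []
    for i, q_i in enumerate(q):
        scores = []
        for j, k_j in enumerate(k):
            if mask is not None and not mask[i][j]:
                # Masked positions get -inf, which softmax maps to weight 0.
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(q_i, k_j)) / scale)
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output column at a time.
        out.append(
            [sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out
```

Calling `attention(x, x, x, mask=sliding_window_mask(len(x), window=w))` gives causal self-attention restricted to the last `w` positions. Passing `window=None` falls back to the plain causal mask used in a Transformer decoder.
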
References:
https://www.youtube.com/watch?v=kCc8FmEb1nY - Andrej Karpathy
https://github.com/jadore801120/attention-is-all-you-need-pytorch
https://nlp.seas.harvard.edu/2018/04/03/attention.html