[Image: "Dive into Deep Learning", redone by Quanta Magazine]
Attention (wip)
This repository will house a visualization that attempts to convey instant enlightenment about how Attention works in the field of artificial intelligence. Obviously, I believe this algorithm to be one of the most important developments in the history of deep learning. We may be able to use it to solve, well, everything.

In my mind, one good intuitive visualization can bring about more insight and understanding than lengthy, expensive tutoring or courses.
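Until the visualization exists, a minimal code sketch may help ground the discussion. The following is plain-numpy scaled dot-product attention, following the softmax(QKᵀ/√d)·V formulation of the Vaswani et al. paper cited below; the function and variable names here are my own.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the per-row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k: (seq_len, d_k); v: (seq_len, d_v)
    # scores measure the similarity of every query with every key
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # each row of weights sums to 1: a soft lookup over the whole sequence
    weights = softmax(scores, axis=-1)
    # each output token is a weighted average of the value vectors
    return weights @ v

# toy usage: self-attention over 4 tokens of dimension 8 (q = k = v = x)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8)
```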
Why does it work?
Attention has many interpretations, ranging from physics-based ones to speculations about biological plausibility.

Update: Recently, three papers have concurrently closed in on a connection between self-attention and gradient descent while investigating the in-context learning properties of Transformers (a toy sketch of the equivalence follows the list below)!
- Transformers learn in-context by gradient descent
- What learning algorithm is in-context learning? Investigations with linear models
- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
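To make that connection concrete, here is a toy numpy sketch of my own (not code from any of those papers), in the simplest setting they study: for in-context linear regression, one step of gradient descent from zero weights produces exactly the same prediction as an unnormalized linear self-attention readout, with the in-context examples as keys and values and the test input as the query.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, lr = 5, 16, 0.1

# in-context dataset: n examples of a noiseless linear task y_i = w* . x_i
w_star = rng.normal(size=d)
xs = rng.normal(size=(n, d))
ys = xs @ w_star
x_test = rng.normal(size=d)

# one gradient descent step on 0.5 * sum_i (w . x_i - y_i)^2, starting at w = 0:
# the gradient at w = 0 is -sum_i y_i * x_i, so the step lands at lr * sum_i y_i * x_i
w_one_step = lr * ys @ xs
pred_gd = w_one_step @ x_test

# the same number, computed as unnormalized linear self-attention:
# query = x_test, keys = xs, values = ys, dot-product scores, no softmax
scores = xs @ x_test          # similarity of the query with every key
pred_attn = lr * ys @ scores  # score-weighted sum of the values

print(np.allclose(pred_gd, pred_attn))  # True
```

The papers above go much further than this toy case, but this identity is the kernel of the connection.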
What has Attention accomplished?
- Protein Folding
- Language
- Vision
- Image Segmentation
- Speech Recognition
- Symbolic Mathematics
- Midi Generation
- Theorem Proving
- Gene Expression
- Text to Image
- Attention-only Text to Image
- Text to Video
- Text to Video 2
- Code Generation
- Language+
- Protein Generation
- Multimodal Model
- Video Understanding
- Heart Disease Classification
- Weather Forecasting
- Text to Speech
- Few-Shot Visual Question Answering
- Generalist Agent
- Audio Generation from Raw Waveform
- Sample Efficient World Model
- Audio / Speech Generation
- Nucleic Acid / Protein Binding
- Generalizable Prompting for Robotic Arm Control
- Zero-shot Text to Speech
- Music Generation
- Designing Molecular Scissors for DNA
- Nucleic Language Model
I will keep adding to this list as time goes on.
Other resources
Is it all we need?
No one really knows. All I know is that if attention were ever dethroned by a better algorithm, it would be a momentous event. Part of what motivates me to do some scalable, 21st-century teaching is the hope that someone may find a way to improve on it, or find its replacement. It just takes one discovery!
Potential improvements
Appreciation
A large thanks goes to 3Blue1Brown for showing us that complex mathematics can be taught with such elegance and potency through visualizations.
Citations
```bibtex
@misc{vaswani2017attention,
    title         = {Attention Is All You Need},
    author        = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year          = {2017},
    eprint        = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}
```

```bibtex
@article{Bahdanau2015NeuralMT,
    title   = {Neural Machine Translation by Jointly Learning to Align and Translate},
    author  = {Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio},
    journal = {CoRR},
    year    = {2015},
    volume  = {abs/1409.0473}
}
```
*Gotta teach the AGI to love.* - Ilya Sutskever