vishnushukla1729 / history-of-deep-learning

This project forked from saurabhaloneai/history-of-deep-learning

learningggggggg

Python 0.02% C 0.01% Jupyter Notebook 99.98%

history-of-deep-learning's Introduction

Speedrun implementation of History-of-Deep-Learning (inspired by "adam-maj", with a few more papers added).

Three stages of implementation: from scratch, in PyTorch, and in JAX (some concepts, not all).

Total count: 9/32

01-deep-neural-networks

Concept	Complete
BackPropagation	✅
CNN	✅
AlexNet	✅
U-net	✅

02-optimization-and-regularization

Concept	Complete
weights-decay	✅
relu	✅
residuals
dropout	✅
batch-norm
layer-norm
gelu	✅
adam
early-stopping	✅

03-sequence-modeling

Concept	Complete
rnn
lstm
learning-to-forget
word2vec
seq2seq
attention
mixture-of-experts

04-transformer

Concept	Complete
transformer
bert
t5
gpt
lora
rlhf
vision-transformer

05-image-generation

Concept	Complete
gans
vae
diffusion
clip
dall-e

Papers

DNN - Learning Internal Representations by Error Propagation (1986), D. E. Rumelhart et al. [PDF]
CNN - Backpropagation Applied to Handwritten Zip Code Recognition (1989), Y. LeCun et al. [PDF]
LeNet - Gradient-Based Learning Applied to Document Recognition (1998), Y. LeCun et al. [PDF]
AlexNet - ImageNet Classification with Deep Convolutional Neural Networks (2012), A. Krizhevsky et al. [PDF]
U-Net - U-Net: Convolutional Networks for Biomedical Image Segmentation (2015), O. Ronneberger et al. [PDF]
Weight Decay - A Simple Weight Decay Can Improve Generalization (1991), A. Krogh and J. Hertz [PDF]
ReLU - Deep Sparse Rectifier Neural Networks (2011), X. Glorot et al. [PDF]
Residuals - Deep Residual Learning for Image Recognition (2015), K. He et al. [PDF]
Dropout - Dropout: A Simple Way to Prevent Neural Networks from Overfitting (2014), N. Srivastava et al. [PDF]
BatchNorm - Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015), S. Ioffe and C. Szegedy [PDF]
LayerNorm - Layer Normalization (2016), J. Lei Ba et al. [PDF]
GELU - Gaussian Error Linear Units (GELUs) (2016), D. Hendrycks and K. Gimpel [PDF]
Adam - Adam: A Method for Stochastic Optimization (2014), D. P. Kingma and J. Ba [PDF]
RNN - A Learning Algorithm for Continually Running Fully Recurrent Neural Networks (1989), R. J. Williams and D. Zipser [PDF]
LSTM - Long Short-Term Memory (1997), S. Hochreiter and J. Schmidhuber [PDF]
Learning to Forget - Learning to Forget: Continual Prediction with LSTM (2000), F. A. Gers et al. [PDF]
Word2Vec - Efficient Estimation of Word Representations in Vector Space (2013), T. Mikolov et al. [PDF]
Phrase2Vec - Distributed Representations of Words and Phrases and their Compositionality (2013), T. Mikolov et al. [PDF]
Encoder-Decoder - Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014), K. Cho et al. [PDF]
Seq2Seq - Sequence to Sequence Learning with Neural Networks (2014), I. Sutskever et al. [PDF]
Attention - Neural Machine Translation by Jointly Learning to Align and Translate (2014), D. Bahdanau et al. [PDF]
Mixture of Experts - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017), N. Shazeer et al. [PDF]
Transformer - Attention Is All You Need (2017), A. Vaswani et al. [PDF]
BERT - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018), J. Devlin et al. [PDF]
RoBERTa - RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019), Y. Liu et al. [PDF]
T5 - Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019), C. Raffel et al. [PDF]
GPT-2 - Language Models are Unsupervised Multitask Learners (2019), A. Radford et al. [PDF]
GPT-3 - Language Models are Few-Shot Learners (2020), T. B. Brown et al. [PDF]
LoRA - LoRA: Low-Rank Adaptation of Large Language Models (2021), E. J. Hu et al. [PDF]
RLHF - Fine-Tuning Language Models From Human Preferences (2019), D. Ziegler et al. [PDF]
PPO - Proximal Policy Optimization Algorithms (2017), J. Schulman et al. [PDF]
InstructGPT - Training language models to follow instructions with human feedback (2022), L. Ouyang et al. [PDF]
Helpful & Harmless - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (2022), Y. Bai et al. [PDF]
Vision Transformer - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020), A. Dosovitskiy et al. [PDF]
GAN - Generative Adversarial Networks (2014), I. J. Goodfellow et al. [PDF]
VAE - Auto-Encoding Variational Bayes (2013), D. P. Kingma and M. Welling [PDF]
VQ VAE - Neural Discrete Representation Learning (2017), A. van den Oord et al. [PDF]
VQ VAE 2 - Generating Diverse High-Fidelity Images with VQ-VAE-2 (2019), A. Razavi et al. [PDF]
Diffusion - Deep Unsupervised Learning using Nonequilibrium Thermodynamics (2015), J. Sohl-Dickstein et al. [PDF]
Denoising Diffusion - Denoising Diffusion Probabilistic Models (2020), J. Ho et al. [PDF]
Denoising Diffusion 2 - Improved Denoising Diffusion Probabilistic Models (2021), A. Nichol and P. Dhariwal [PDF]
Diffusion Beats GANs - Diffusion Models Beat GANs on Image Synthesis (2021), P. Dhariwal and A. Nichol [PDF]
CLIP - Learning Transferable Visual Models From Natural Language Supervision (2021), A. Radford et al. [PDF]
DALL E - Zero-Shot Text-to-Image Generation (2021), A. Ramesh et al. [PDF]
DALL E 2 - Hierarchical Text-Conditional Image Generation with CLIP Latents (2022), A. Ramesh et al. [PDF]
Deep Learning - Deep Learning (2015), Y. LeCun, Y. Bengio, and G. Hinton [PDF]
DCGAN - Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (2016), A. Radford et al. [PDF]
BigGAN - Large Scale GAN Training for High Fidelity Natural Image Synthesis (2018), A. Brock et al. [PDF]
WaveNet - WaveNet: A Generative Model for Raw Audio (2016), A. van den Oord et al. [PDF]
BERTology - A Primer in BERTology: What We Know About How BERT Works (2020), A. Rogers et al. [PDF]
GPT - Improving Language Understanding by Generative Pre-Training (2018), A. Radford et al. [PDF]
GPT-4 - GPT-4 Technical Report (2023), OpenAI [PDF]
Deep Reinforcement Learning - Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (2017), D. Silver et al. [PDF]
Deep Q-Learning - Playing Atari with Deep Reinforcement Learning (2013), V. Mnih et al. [PDF]
AlphaGo - Mastering the Game of Go with Deep Neural Networks and Tree Search (2016), D. Silver et al. [PDF]
AlphaFold - Highly accurate protein structure prediction with AlphaFold (2021), J. Jumper et al. [PDF]
ELECTRA - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (2020), K. Clark et al. [PDF]
SimCLR - A Simple Framework for Contrastive Learning of Visual Representations (2020), T. Chen et al. [PDF]