

Count-Sketch Optimizers

Compressing Gradient Optimizers via Count-Sketches

An ICML 2019 paper by Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava

BERT-Large Training Results

Trained with Activation Checkpointing and Mixed Precision Training (FP16) on Nvidia V100 DGX-1 servers

BERT-Large        Adam     Count-Min Sketch (CMS) - RMSprop
Time (Days)       5.32     5.52
Size (MB)         7,097    5,133
Test Perplexity   4.04     4.18
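The relative trade-off in the table can be computed directly. The short sketch below only re-derives ratios from the reported numbers; the variable names are illustrative and nothing here is taken from the repository's code.

```python
# Numbers copied from the BERT-Large results table (Adam vs. CMS-RMSprop).
adam = {"days": 5.32, "size_mb": 7097, "perplexity": 4.04}
cms_rmsprop = {"days": 5.52, "size_mb": 5133, "perplexity": 4.18}

# Fraction of the reported size saved by CMS-RMSprop relative to Adam.
memory_saving = 1 - cms_rmsprop["size_mb"] / adam["size_mb"]
# Extra wall-clock training time relative to Adam.
time_overhead = cms_rmsprop["days"] / adam["days"] - 1
# Absolute difference in test perplexity (lower is better).
perplexity_gap = cms_rmsprop["perplexity"] - adam["perplexity"]

print(f"memory saved:   {memory_saving:.1%}")
print(f"extra time:     {time_overhead:.1%}")
print(f"perplexity gap: {perplexity_gap:.2f}")
```

This prints a saving of about 27.7% of the reported size for roughly 3.8% more training time and a test-perplexity increase of 0.14.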

Figures: Convergence Rate (Adam vs. CMS-RMSprop); Faster Convergence Rate with Larger Batch Size (CMS-RMSprop)

Instructions

  1. Install Requirements
  2. Add the optimizers folder to $PYTHONPATH
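If exporting $PYTHONPATH in the shell is inconvenient (for example, inside a notebook), the same effect can be had for the current interpreter by prepending the folder to sys.path. This is a minimal sketch assuming the script runs from the root of a clone; REPO_ROOT is a hypothetical placeholder to adjust, and no module names inside the optimizers folder are assumed.

```python
import os
import sys

# Hypothetical: assumes the current working directory is the repository root.
REPO_ROOT = os.path.abspath(".")
OPTIMIZERS_DIR = os.path.join(REPO_ROOT, "optimizers")

# Equivalent, for this interpreter only, to prepending the folder to $PYTHONPATH.
# Child processes are unaffected; export PYTHONPATH in the shell for those.
if OPTIMIZERS_DIR not in sys.path:
    sys.path.insert(0, OPTIMIZERS_DIR)
```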

Requirements

  1. torch
  2. torchvision
  3. cupy
  4. pynvrtc

Examples

  1. ImageNet - ResNet-18
  2. LM1B - Transformer / LSTM
  3. Wikitext-2 - LSTM

Dense Layer Support

We also support compressing the dense layers of the neural network, whose gradient updates are not sparse. During training, we update the auxiliary variables and perform the gradient update for each parameter in a single fused CUDA kernel. The dense kernel is equivalent to the sparse kernel; the main difference is that, for the dense layers, we explicitly avoid materializing the full-size auxiliary variables in global memory. Instead, we access them in the shared memory of the GPU's Streaming Multiprocessor. Without this key feature, our approach would not save any GPU memory for the dense layers. In the sparse case, we assume that the number of non-zero gradient updates is significantly smaller than the size of the auxiliary variables. (See dense_exp_cms.py for more details.)
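The idea behind CMS-RMSprop can be illustrated on the CPU. The sketch below is a minimal pure-Python illustration, assuming the RMSprop second-moment vector is stored in a count-min sketch of depth × width cells instead of one slot per parameter. The names CountMinSketch and cms_rmsprop_step, the universal hash family, and the global per-step decay are illustrative assumptions; this is not the repository's fused CUDA kernel and not the API of dense_exp_cms.py.

```python
import math
import random


class CountMinSketch:
    """Count-min sketch over integer parameter indices (hypothetical, illustrative)."""

    def __init__(self, depth, width, seed=0):
        self.depth, self.width = depth, width
        rng = random.Random(seed)
        # Assumed hash family: h_r(i) = ((a_r * i + b_r) mod p) mod width.
        self.p = 2_147_483_647
        self.hashes = [(rng.randrange(1, self.p), rng.randrange(0, self.p))
                       for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, i):
        a, b = self.hashes[row]
        return ((a * i + b) % self.p) % self.width

    def scale(self, factor):
        # The sketch is linear, so decaying every cell decays every stored estimate.
        for row in self.table:
            for j in range(self.width):
                row[j] *= factor

    def add(self, i, value):
        for r in range(self.depth):
            self.table[r][self._bucket(r, i)] += value

    def query(self, i):
        # With non-negative values each row overestimates, so take the minimum.
        return min(self.table[r][self._bucket(r, i)] for r in range(self.depth))


def cms_rmsprop_step(params, grads, sketch, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop step whose second moment v lives in a count-min sketch."""
    # v <- beta * v + (1 - beta) * g^2, applied to the sketch instead of a dense vector.
    sketch.scale(beta)
    for i, g in enumerate(grads):
        sketch.add(i, (1.0 - beta) * g * g)
    # Query the (over)estimated second moment and apply the RMSprop update.
    for i, g in enumerate(grads):
        v = sketch.query(i)
        params[i] -= lr * g / (math.sqrt(v) + eps)
    return params


if __name__ == "__main__":
    sketch = CountMinSketch(depth=3, width=8, seed=1)  # 24 cells for 50 parameters
    params = [1.0] * 50
    grads = [0.1 * (i % 5 - 2) for i in range(50)]
    cms_rmsprop_step(params, grads, sketch)
    print(params[:5])
```

Because the second moment is non-negative and the sketch is linear, every cell stays an upper bound on the true value for each index hashed into it, so the min query can only overestimate v. An overestimate enlarges the denominator, which makes a collision shrink the step rather than blow it up.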

References

  1. Transformer Architecture - Nvidia Megatron Language Model
  2. Compressing Gradient Optimizers via Count-Sketches (ICML 2019)
