Giter VIP home page Giter VIP logo

webgpt's Introduction

WebGPT

webGPT

After six years of development, WebGPU is about to launch across most major web browsers. This is massive: web applications now have near-native access to the GPU, with the added capacity of compute shaders.

WebGPT is a vanilla JS and HTML implementation of a transformer model, intended as a proof-of-concept as well as educational resource. WebGPT has been tested to be working with models up to 500 M parameters, though could likely support far more with further testing/optimization.

At the moment, WebGPT averages ~50ms per token on GPT-2 124M running on a 2020 M1 Mac with Chrome Canary. This could be 500% faster, if not more, with better optimization of the kernels, buffers, the WebGPU interface. WebGPU should also receive significant speed increases as it matures.

WebGPT.Demo.mp4

Running WebGPT

Running WebGPT is remarkably simple, as it's just a set of HTML + JS files. Since WebGPU is still in the process of being released, you'll need to open with a compatible browser. WebGPU is currently available on Chrome v113 but the most straightforward way to ensure proper functionality is to install Chrome Canary and enable "Unsafe WebGPU" at chrome://flags/#enable-unsafe-webgpu.

I've included two different models: a toy GPT-Shakespeare model (which is severly undertrained haha) and GPT-2 117M. See main.js for more information on how to run these models. If you want to import custom models, take a look at misc/conversion_scripts.

If you want to try out WebGPT, visit the demo website here KMeans.org. I'd generally reccomend cloning the repo and running locally, just because loading the weights remotely is significantly slower. Note: You'll need to use Git LFS to download the model files, after cloning the repository.

file sizes

Roadmap / Fixing Stupid Decisions

  • Embeddings / de-embeddings on GPU.
  • Initializing pipelines on every step is incredibly inefficient.
  • Key-value caching.
  • Reuse buffers.
  • Kernel shared memory for matmul!
  • Destroy buffers after use!
  • Investigate why attention cache isn't giving proper speed-ups.
  • Make simple instructional version without special stuff.
  • Optimize workgroup sizes, specifically for single row/col operations.
  • Optimize all other kernels.
  • Convert into a package.
  • Compute pass splitting for larger models (maxStorageBufferBindingSize)
  • Write better comments + make Youtube explainer.

Acknowledgements

When I started this project I had no idea how transformers worked or how to implement them (or GPUs or matmul kernels or WebGPU or tokenization for that matter), so Andrej Karpathy's series on neural networks and building GPT from scratch were invaluable: Andrej's Youtube. I've also used some code as well from the nanoGPT repository: nanoGPT.

I copied from LatitudeGames' implementation of OpenAI's GPT-3 tokenizer in Javascript: GPT-3-Encoder.

Note: I'm looking for work!

I'm currently in the process of switching into the AI field. I'm specifically looking for opportunites at larger research labs in a variety of jobs, with the goal of breaking into the space and finding an area in which to specialize. If you're interested, check out my personal website: Personal Website

webgpt's People

Contributors

0hq avatar chenglou avatar m1guelpf avatar felladrin avatar josephrocca avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.