
WebGPT

After six years of development, WebGPU is about to launch across most major web browsers. This is massive: web applications now have near-native access to the GPU, with the added capability of compute shaders.

WebGPT is a vanilla JS and HTML implementation of a transformer model, intended as a proof of concept as well as an educational resource. WebGPT has been tested and works with models of up to 500M parameters, though it could likely support far more with further testing/optimization.
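For context on what "near-native access" means here, everything starts from the standard WebGPU entry points; a minimal sketch of acquiring a device (the error strings are illustrative):

```js
// Minimal sketch of standard WebGPU setup; any compute-shader work
// (including a transformer's matmuls) starts from a device like this.
async function getDevice() {
  if (!navigator.gpu) {
    throw new Error("WebGPU is not supported in this browser.");
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    throw new Error("No available adapters."); // the failure some issues below report
  }
  return adapter.requestDevice();
}
```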

Current Stats

2020 M1 Mac: 3ms/token at 5M parameters with f32 precision.
2020 M1 Mac: 30ms/token at 117M parameters with f32 precision.
2020 M1 Mac: 70ms/token at 377M parameters with f32 precision.
2020 M1 Mac: 120ms/token at 775M parameters with f32 precision.
A 1.5B-parameter model works but is unstable, sitting at around 1000ms/token due to inefficiencies.

Running WebGPT

Running WebGPT is remarkably simple, as it's just a set of HTML + JS files. Since WebGPU is still in the process of being released, you'll need to open it in a compatible browser. WebGPU is currently available on Chrome v113, but the most straightforward way to ensure proper functionality is to install Chrome Canary or Edge Canary.

I've included two different models: a toy GPT-Shakespeare model (which is severely undertrained, haha) and GPT-2 117M. See main.js for more information on how to run these models. If you want to import custom models, take a look at misc/conversion_scripts.

If you want to try out WebGPT, visit the demo website here: KMeans.org. I'd generally recommend cloning the repo and running it locally, as loading the weights remotely is significantly slower.
Note: after cloning the repository, you'll need to use Git LFS to download the model files.

[Figure: model file sizes]

Roadmap / Fixing Stupid Decisions

  • Embeddings / de-embeddings on GPU.
  • Initializing pipelines on every step is incredibly inefficient.
  • Key-value caching.
  • Reuse buffers.
  • Kernel shared memory for matmul!
  • Destroy buffers after use! (see the sketch after this list)
  • Create kernel instruction classes + optimize pipeline creation.
  • Fuse all kernels.
  • Optimize all other kernels.
  • Compute pass splitting for larger models (maxStorageBufferBindingSize).
  • Run selection ops on GPU (top-k, selection softmax).
  • The attention kernel is optimized for small models, not for large models, where giving each head its own matmul is more efficient.
  • Investigate why attention cache isn't giving proper speed-ups.
  • Make simple instructional version without special stuff.
  • Optimize workgroup sizes, specifically for single row/col operations.
  • Convert into a package.
  • Write better comments + make Youtube explainer.
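On the buffer-destruction item above, a hedged sketch of the pattern, assuming a scratch-buffer workflow (withScratchBuffer is a hypothetical helper; createBuffer and destroy are the standard WebGPU calls):

```js
// Hypothetical helper for the "destroy buffers after use" roadmap item:
// a WebGPU buffer holds GPU memory until destroy() is called, so scoping
// scratch buffers like this frees memory promptly instead of waiting on GC.
async function withScratchBuffer(device, size, fn) {
  const buffer = device.createBuffer({
    size,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  try {
    return await fn(buffer); // works for sync or async encoding callbacks
  } finally {
    buffer.destroy();
  }
}
```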

Acknowledgements

When I started this project I had no idea how transformers worked or how to implement them (or GPUs or matmul kernels or WebGPU or tokenization, for that matter), so Andrej Karpathy's series on neural networks and building GPT from scratch was invaluable: Andrej's YouTube. I've also used some code from the nanoGPT repository: nanoGPT.

I copied from LatitudeGames' JavaScript implementation of OpenAI's GPT-3 tokenizer: GPT-3-Encoder.

Contributors

0hq, brandon-lb, carsonpo, chenglou, felladrin, flbn, josephrocca, m1guelpf


Issues

Add a note to the README or index that Git LFS is required

I was running into errors when running locally saying weights/gpt2/params_gpt.json wasn't valid. Looking inside the file, it turned out the repo won't clone correctly without Git LFS installed. Mentioning this in the README, or in index.html where it says "PS: Loading models is 5x slower on the web rather than running locally. Just clone the repo and open!", might save others like me from troubleshooting.

Does "Unsafe WebGPU" flag still need to be set to "enabled"?

Hi! 👋 Thanks for sharing your project!

I've got a question about this part of the Readme:

Running WebGPT is remarkably simple, as it's just a set of HTML + JS files. Since WebGPU is still in the process of being released, you'll need to open with a compatible browser. WebGPU is currently available on Chrome v113 but the most straightforward way to ensure proper functionality is to install Chrome Canary and enable "Unsafe WebGPU" in settings.

On Chrome Canary v114, I couldn't see the difference between running it with the flag enabled and disabled.
Could you clarify why it's recommended to enable this flag?

Strange behavior

Hello @0hq,
I think the results below are not the expected behavior.

[Screenshot: Shakespeare model output]

[Screenshot: GPT-2 model output]

Environment

Arch Linux
Chrome Canary Version 114.0.5720.4 (Official Build) dev (64-bit)
Vulkan
google-chrome-unstable --enable-features=Vulkan

No available adapters.

I get an error when running. My Chrome version is 115.0.5790.171.

No available adapters.
model.js:26 Uncaught (in promise) TypeError: Cannot read properties of null (reading 'requestDevice')
    at GPT.initialize (model.js:26:33)
    at async loadModel (index.html:98:9)
initialize @ model.js:26
await in initialize (async)
onclick @ index.html:21


I get this error lol

Access to fetch at 'file:///C:/Users/idgaf/OneDrive/Desktop/WebGPT/models/better_shakespeare/params_gpt.json' from origin 'null' has been blocked by CORS policy: Cross origin requests are only supported for protocol schemes: http, data, isolated-app, chrome-extension, chrome, https, chrome-untrusted.

Because we are loading the model using either file:// or C:/, which matches the error message, since those are not http://.

I'll PR a fix to the readme.
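The usual fix is to serve the folder over http:// instead of opening index.html from disk; a minimal Node.js static server is enough (a sketch, not a file in the repo; any static server such as the serve npm package mentioned below works too):

```js
// Hypothetical minimal static server so the model files load over http://
// rather than file://, avoiding the CORS error above (Node 18+, ESM).
import { createServer } from "node:http";
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

const types = { ".html": "text/html", ".js": "text/javascript", ".json": "application/json" };

createServer(async (req, res) => {
  const path = "." + (req.url === "/" ? "/index.html" : req.url);
  try {
    const body = await readFile(path);
    res.writeHead(200, { "Content-Type": types[extname(path)] ?? "application/octet-stream" });
    res.end(body);
  } catch {
    res.writeHead(404);
    res.end("not found");
  }
}).listen(8080, () => console.log("Serving on http://localhost:8080"));
```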

Error on macOS

After cloning the repo and enabling WebGPU in Chrome Canary 114, when I try to load either of the models, I get a console error:

Loading model from folder: gpt2

VM16:1 Uncaught (in promise) SyntaxError: Unexpected token 'v', "version ht"... is not valid JSON

await (async)
onclick @ (index):18

This is on an M1 MacBook Pro running macOS 13.3.

Any ideas? Thanks!

Error loading GPT2 on local

Steps to reproduce

  1. Clone the repo
  2. Serve the files using the serve npm package
  3. Open in Canary with Unsafe WebGPU enabled
  4. Click on "Load GPT2 Model"

Current behavior: console error Uncaught (in promise) SyntaxError: Unexpected token 'v', "version ht"... is not valid JSON

Expected: the model loads successfully
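Both "version ht"... errors above are consistent with the Git LFS problem reported earlier: without Git LFS, the checked-out weight files are small pointer stubs whose first line starts with version https://git-lfs.github.com/spec/v1, which is not valid JSON. A hypothetical guard that would surface this directly (loadJSON is illustrative, not a function in the repo):

```js
// Hypothetical check: an un-fetched Git LFS file is a small text pointer
// starting with "version https://git-lfs.github.com/spec/v1", which is
// exactly what produces the Unexpected token 'v', "version ht"... error.
async function loadJSON(url) {
  const text = await (await fetch(url)).text();
  if (text.startsWith("version https://git-lfs.github.com/spec/v1")) {
    throw new Error(
      `${url} is a Git LFS pointer - install Git LFS and re-pull to fetch the real file`
    );
  }
  return JSON.parse(text);
}
```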

How slow is slow?

I downloaded the GitHub repo and placed it on a localhost server.

I opened the page and clicked on the "Load GPT2 117Mb" model.

I've been waiting for a few minutes now, with the output stuck on Loading token embeddings.... Is that normal behaviour?

Loading model from folder: gpt2
Loading params...
Warning: Buffer size calc result exceeds GPU limit, are you using this value for a tensor size? 50257 768 1 154389504
bufferSize @ model.js:510
loadParameters @ model.js:298
await in loadParameters (async)
loadModel @ model.js:276
initialize @ model.js:32
await in initialize (async)
loadModel @ gpt/:105
onclick @ gpt/:23
Params: {n_layer: 12, n_head: 12, n_embd: 768, vocab_size: 50257, n_ctx: 1024, …}
Loading token embeddings...
  • Apple M1 Pro
  • Brave 1.61 - "WebGPU is supported in your browser!"
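The warning lines up with the numbers: the token-embedding tensor is vocab_size × n_embd f32 values, i.e. 50257 × 768 × 4 bytes = 154,389,504, which exceeds WebGPU's default maxStorageBufferBindingSize of 134,217,728 (128 MiB). A hedged sketch of how a page can inspect and raise that limit (whether WebGPT does exactly this isn't shown in the log):

```js
// Sketch of where the logged warning comes from: GPT-2's token embedding
// needs more storage-binding space than WebGPU's 128 MiB default allows.
const needed = 50257 * 768 * 4; // 154,389,504 bytes - the value in the log

const adapter = await navigator.gpu.requestAdapter();
const supported = adapter.limits.maxStorageBufferBindingSize;
if (needed > supported) {
  console.warn("Tensor exceeds this adapter's max binding size:", needed, supported);
}
// Request the higher limit explicitly; a default device only gets 128 MiB.
const device = await adapter.requestDevice({
  requiredLimits: { maxStorageBufferBindingSize: Math.min(needed, supported) },
});
```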
