
femtoGPT's Introduction

🤖 femtoGPT


femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer.

It can be used for both inference and training of GPT-style language models using CPUs and GPUs!

(HEY! I'm also writing a book, which will soon discuss the implementation of an LLM in detail! Check it out here: The Super Programmer)

Intro

Everything is implemented from scratch, including the tensor processing logic along with training/inference code of a minimal GPT architecture.

The architecture is very similar to (almost identical with) the one in Andrej Karpathy's nanoGPT video lecture.

femtoGPT is a great start for those who are fascinated by LLMs and would like to understand how these models work at a very deep level.

femtoGPT uses nothing but random-generation libraries (rand/rand-distr), data-serialization libraries (serde/bincode, for saving/loading already-trained models), and a parallel-computing library (rayon).

femtoGPT is ~~EXTREMELY SLOW~~ relatively fast on CPU 😉, and most of the primitive operations (e.g. matrix multiplication) are implemented in the simplest way possible.

The correctness of gradients is checked using the gradient-checking method, though it is still very possible that some layers are implemented incorrectly.
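Gradient checking here means comparing the analytic gradient from backpropagation against a numerical estimate obtained by nudging each parameter and re-evaluating the loss. A minimal sketch of the idea (illustrative only, not femtoGPT's actual API):

```rust
/// Compare an analytic gradient against a central-difference estimate.
/// Relative errors much larger than ~1e-3 (for f32) usually point to a
/// buggy backward pass.
fn gradient_check(
    loss_fn: impl Fn(&[f32]) -> f32,
    params: &mut [f32],
    analytic_grad: &[f32],
    eps: f32,
) -> f32 {
    let mut max_rel_err = 0.0f32;
    for i in 0..params.len() {
        let orig = params[i];
        params[i] = orig + eps;
        let loss_plus = loss_fn(params);
        params[i] = orig - eps;
        let loss_minus = loss_fn(params);
        params[i] = orig; // restore the parameter
        let numeric = (loss_plus - loss_minus) / (2.0 * eps);
        let denom = numeric.abs().max(analytic_grad[i].abs()).max(1e-8);
        max_rel_err = max_rel_err.max((numeric - analytic_grad[i]).abs() / denom);
    }
    max_rel_err
}
```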

(Discord server for discussions around the project!)

Usage

Make sure you have the Rust toolchain installed on your system in order to compile and run the project:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

If you want to train using a GPU, you will first need to make sure your GPU drivers are correctly installed on your system, and their OpenCL runtimes are available.

On Debian systems, you can set up OpenCL runtimes by installing the package ocl-icd-opencl-dev:

sudo apt install ocl-icd-opencl-dev
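You can then verify that an OpenCL platform is actually visible to your system with the clinfo tool (also available as a Debian package):

sudo apt install clinfo && clinfo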

GOOD NEWS! Since femtoGPT's GPU implementation is based on OpenCL, it can run on both NVIDIA and AMD cards, and you won't need to install heavyweight CUDA toolkits on your system. OpenCL runtimes would suffice!

Now you'll just need to put the text you want to train your GPT model on inside dataset.txt. Make sure it has a small number of unique characters! (E.g., the current dataset uses only 65 unique characters!)
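If you want to sanity-check a dataset before training, a tiny standalone program like this (an illustrative sketch, separate from femtoGPT itself) counts the unique characters, which is presumably what the simple character-level tokenizer builds its vocabulary from:

```rust
use std::collections::BTreeSet;
use std::fs;

fn main() {
    // A character-level vocabulary is the set of distinct characters in
    // the training text, so fewer unique characters means a smaller vocab.
    let text = fs::read_to_string("dataset.txt").expect("couldn't read dataset.txt");
    let chars: BTreeSet<char> = text.chars().collect();
    println!("{} unique characters", chars.len());
}
```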

Then you'll need to run:

cargo run --release

It will start training the model and will put the training data in the train_data directory. You can stop the training and continue later!
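If you set up the OpenCL runtimes as described above, you can train on your GPU instead by enabling the gpu feature (the same command that appears in the GPU issue below):

cargo run --release --features gpu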

Output samples

After hours of training a 300k-parameter model on the Shakespeare dataset, this was the output:

LIS:
Tore hend shater sorerds tougeng an herdofed seng he borind,
Ound ourere sthe, a sou so tousthe ashtherd, m se a man stousshan here hat mend serthe fo witownderstesther s ars at atheno sel theas,
thisth t are sorind bour win soutinds mater horengher

This is embarrassingly bad, but looking at the bright side, it seems like it has been able to generate words that are easy to pronounce.

I'm currently training a 10M parameter model to further examine the correctness of my implementation.

UPDATE 5th June 2023:

This is a newer output, after more hours of training a model of similar scale:

What like but wore pad wo me che nogns yous dares,
As supt it nind bupart 'the reed:
And hils not es

Obviously the model has started to learn some words and punctuation rules!

UPDATE 9th June 2023:

The model was able to reach a loss value of ~1.4.

Here is an example output:

Adistition gone; true; schistoes for mine souls!
Before your home, bariechts should be
Carlam on that's a worf quirer of him so.
What look'd lack away more
To him foot; one hour fortious of saves:
Son;
'Tis all Earl mmistling me.

HARSARTIO:
Why, idless of my mocks fan that were percious.
Having I will thou should and the muour ne'er shor
To purple, when duke thy are out him.
But will bid you doth remember nature.
Even OF hencomey, carniffeit, I joy
Warming; my deed, but it on of mortard,

UPDATE 10th June 2023, sparks of AGI? LOL

Having trained a 5-layer model on a dataset gathered from Reddit, tokenized with the SentencePiece tokenizer (vocab-size of 500), for about 10 hours on a 32-core CPU:

Prompt "I think":

I think it s not a scientific method of atheism
I think it s not a fairly simple possible to have a
I think that s a possible point, but I m not sure if you re not
I think that s a great idea, but I m not sure how these are you talking
I think it s a simple thing to do, but I m not sure how thes
I think that s a good thing, but I don t think that s what I mean, I think I
I think that s the only way to be able to use it to be in the same way, but
I think that the police are relatively relatively relative
I think that s a simple moron is a simple character
I think that s the only way to go to the second side of the pol

Prompt "Life is":

Life is not a good idea, but it s not a perfection of the opp
Life is a good thing to say that I don t know what you re talking about, but I don
Life is not the same as a religion, but I m not sure if you re a
Life is a perfectly good job of arguing that you are alm
Life is a perfectly good job of the opposite of the f
Life is a fundamentalist, and then I m not sure how the h
Life is not a good idea, and it s not a perfectly good job, but I
Life is not the same as atheists, but that s the only way to be ac
Life is a bit of a single one of these industry is a f
Life is a good idea to get the opposite of the police offic

Prompt "So sad that":

So sad that you can tell you what? I think I ve been using it on the scre
So sad that I don t know about it, but I don t think I m not afraid to
So sad that I m not sure if you re not arguing with the fact that you
So sad that I was involved in the future, and I have a few we
So sad that s what I said, I m sure you are almost everything you
So sad that you can do it, and I don t think that the fact that it s a po
So sad that I m not sure if you re arguing with the fact that they are
So sad that s the one too much time, but I m not sure if you re arg
So sad that you are sadly supposed to be a big deal in the world
So sad that I don t know about this, but I m not sure how you can do it, but

UPDATE 29th June 2023

After the implementation of the GPU trainer, we were able to train larger models. Here are some samples from an 8-layer, 8-head, 128-embedding-degree model, trained on the TinyStories dataset with a vocab-size of 1000:

Once upon a time, there was a little girl named Lily.
She loved to play with her toys and she had a lot of fun.
One day, Lily saw a big chicky playing with her toys.
She asked her mom, "Can I play with her toys?" Her mom said,
"Sure, Lily. But we have to clean the pales. Let's suet some candy, Lily."
Lily nodded and went to her mom. They played with the mots and staugning her toys.  
Once upon a time, there was a little girl named Lily.
She loved to play outside and explore. One day, she found a jung on the ground.
She picked it up and tecked it. She ran around and saw it. She was very sad.
She asked her mom for her mom. Her mom said, "Lily, I'm going to find it!" Lily said.
She ran to the slock and took her to the teplace. She went to the park and found a molla.
There was a boy named Tim. Tim loved to play with his toys.
One day, Tim's mom came to the park. Tim saw a big, red ball and wanted to play with it.
Tim wanted to play with the ball. Tim was very excited. He wanted to play with the ball.
But the ball was too fast. Tim wanted to play with the ball. But the ball was too fast.
Tim tried to catch it, but it was too fast. Tim was sad. He tried to run away,
but he did not want to play. Tim was sad. He did not want to play with the ball.

femtoGPT's People

Contributors

cutoken, eltociear, keyvank


femtoGPT's Issues

AMD GPU training not working

Hello, I'm trying to run femtoGPT on my RX 6600 under Ubuntu Linux. I've installed the required ROCm OpenCL drivers, but when I run the program using

cargo run --release --features gpu

I get an index-out-of-bounds panic:

thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', src/graph/gpu/mod.rs:119:22
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Edit

This is the log with RUST_BACKTRACE=1:

0: rust_begin_unwind
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
1: core::panicking::panic_fmt
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
2: core::panicking::panic_bounds_check
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:162:5
3: femto_gpt::graph::gpu::GpuGraph::new
4: femto_gpt::main
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.

Cleaner exception handling

Currently, I have been using .unwrap() all over the code, which is very bad. It would be good to have different error types and return them instead of panicking.

E.g., TensorError when a faulty tensor operation happens, GraphError on faulty computation graphs, or GptError for core LLM-related errors.
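A sketch of what that hierarchy could look like; the type names are the ones suggested above, while the variants are hypothetical:

```rust
// Hypothetical variants illustrating the proposed error types.
#[derive(Debug)]
pub enum TensorError {
    ShapeMismatch { expected: Vec<usize>, got: Vec<usize> },
}

#[derive(Debug)]
pub enum GraphError {
    Tensor(TensorError),
    UnknownTensorId(usize),
}

#[derive(Debug)]
pub enum GptError {
    Graph(GraphError),
    Io(std::io::Error),
}

// From impls let the `?` operator propagate low-level errors upward
// instead of panicking with .unwrap().
impl From<TensorError> for GraphError {
    fn from(e: TensorError) -> Self {
        GraphError::Tensor(e)
    }
}

impl From<GraphError> for GptError {
    fn from(e: GraphError) -> Self {
        GptError::Graph(e)
    }
}
```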

How to let the model fill text

Hi @keyvank,
Instead of letting the model generate the whole text in infer, how do we make it complete user-provided text?

I'm guessing it should be done by filling the context vector in the infer method.
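Along those lines, a completion loop could seed the context with the tokenized prompt and then keep sampling. In this sketch the closures stand in for femtoGPT's actual tokenizer and inference calls (hypothetical, for illustration only):

```rust
// Seed the context with a user-provided prompt, then extend it token
// by token; `tokenize`, `next_token`, and `untokenize` are stand-ins.
fn complete(
    tokenize: impl Fn(&str) -> Vec<usize>,
    next_token: impl Fn(&[usize]) -> usize,
    untokenize: impl Fn(&[usize]) -> String,
    prompt: &str,
    count: usize,
) -> String {
    let mut context = tokenize(prompt);
    for _ in 0..count {
        let next = next_token(&context); // sample given everything so far
        context.push(next);
    }
    untokenize(&context)
}
```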

loss jittering

Is it normal for the loss to be jittering?

I've been training my model for a few hours (around 6?). At the start the loss was mostly decreasing, but at around 3.00 it started jittering, and the jittering only gets more intense. Currently my loss ranges from ~2.67 to ~2.92.

I'm training on my own dataset (10k lines, ~150KB) with 78 unique characters and 312k parameters (not sure if that matters).

Make femtoGPT the easiest GPT library ever made

It would be good if people could store their entire model+description+training_data in a single file, so that others could easily run inference on them or fine-tune them.

Let's make femtoGPT a library!
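Since the project already depends on serde/bincode for saving and loading models, one way such a single-file format could be sketched (struct and field names are hypothetical):

```rust
use serde::{Deserialize, Serialize};

// Hypothetical single-file bundle: weights, tokenizer state, and a
// human-readable description, serialized together with bincode.
#[derive(Serialize, Deserialize)]
struct ModelBundle {
    description: String,
    vocab: Vec<char>,
    params: Vec<Vec<f32>>,
}

fn save(bundle: &ModelBundle, path: &str) -> std::io::Result<()> {
    let bytes = bincode::serialize(bundle).expect("serialize failed");
    std::fs::write(path, bytes)
}

fn load(path: &str) -> std::io::Result<ModelBundle> {
    let bytes = std::fs::read(path)?;
    Ok(bincode::deserialize(&bytes).expect("deserialize failed"))
}
```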

Add more documentation/comments

Right now, the code is largely undocumented. As femtoGPT is also an educational library, it's good to have comprehensive guides, documentation, and comments.

How to add a new decoder after the GPT is created with the ::new call?

Hi @keyvank,
Let's say I want to add a new decoder layer (the one that gets constructed as part of the 0..num_layers loop) at run time, after the gpt::new() call. How do I go about it? As I understand it, you are just pushing the computations one by one, each incrementing the tensor id, so adding a layer at a later point in time would require incrementing the ids of the following layers as well (for example, adding one more decoder layer, along with all its sub-layers like attention, means incrementing the vocab-out and the other variables outside the for loop?).

Also, why keep computations in a BTree when in reality it's being used more like a Vec? We aren't even using the id against which each computation is stored (please correct me if I missed something :)).

Error when generating text

thread 'main' panicked at src/tokenizer/simple.rs:43:47:
called `Option::unwrap()` on a `None` value
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Fine-tune meta-parameters

Find the best num-layers/embedding-degree/learning-rate/etc. in terms of accuracy and training time.

How to increase the number of parameters

Hi,
It's mentioned that this has 300k parameters. How do I increase the number of parameters? Is it by increasing the number of layers/heads? (Also, how is the total number of parameters calculated?)

Thank you for attempting a pure Rust implementation of GPT. It helps a lot to understand things without having to understand Python voodoo.

Edit: Also, if possible, having some comments around important parts of the code would help a lot from a learning perspective. It would be great if you wrote a blog post on it :)
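For a rough sense of where the parameters come from in a standard GPT (femtoGPT's exact layer shapes may differ, e.g. in bias terms), the count is dominated by the embeddings plus the per-layer attention and MLP matrices:

```rust
// Back-of-the-envelope parameter count for a standard GPT-style model,
// assuming a 4x-wide MLP; a rough estimate, not femtoGPT's exact math.
fn approx_params(vocab: usize, ctx: usize, embed: usize, layers: usize) -> usize {
    let embeddings = vocab * embed + ctx * embed; // token + positional
    let attention = 4 * embed * embed;            // Q, K, V and output projections
    let mlp = 2 * 4 * embed * embed;              // two linears, 4x hidden width
    let norms = 2 * 2 * embed;                    // two layer-norms (gain + bias)
    embeddings + layers * (attention + mlp + norms) + embed * vocab // plus the output head
}
```

Note that with a fixed embedding degree, changing the number of heads does not change the parameter count in this scheme; the layer count and embedding degree do.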

Random discussions

Hi,
An assert fails in tensor/mod.rs when the number of heads equals the number of layers. Not sure if that's a bad combination, but in nanoGPT equal values are supported; see the CPU section (apples to oranges, so please ignore if it's not a good comparison):

https://github.com/karpathy/nanoGPT
