
smol-GPT

I built a small (22M-parameter) GPT-2-like transformer language model from scratch and trained it on the WikiText-103 dataset.

Results

Trained on a T4 GPU on Kaggle. val_loss can be brought down further by simply training for more iterations, or by tuning the learning rate and other hyperparameters in config.py.

[Figure: GPT-22M model training run visualized]

Dependencies

pip install torch numpy tiktoken wandb tqdm scikit-learn pandas

A Weights & Biases (wandb) API key is also needed; it is free. See the wandb documentation for how to get one.

Dataset

I used the WikiText-103 dataset with the following split:

  • Train: 122M tokens
  • Validation: 250K tokens
  • Test: 290K tokens
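A sketch of one common way to store such token splits, assuming (not confirmed by this README) the nanoGPT convention of GPT-2 BPE token ids written as a flat binary file of unsigned 16-bit integers. GPT-2's vocabulary has 50,257 tokens, so every id fits in 16 bits. In practice the ids would come from something like tiktoken's `get_encoding("gpt2").encode_ordinary(text)`; here they are passed in directly, and `save_tokens`/`load_tokens` are hypothetical helper names.

```python
import tempfile
from array import array
from pathlib import Path


def save_tokens(token_ids, path):
    """Write token ids to `path` as raw unsigned 16-bit ints (native byte order)."""
    arr = array("H", token_ids)  # "H" = unsigned short; raises OverflowError for ids > 65535
    with open(path, "wb") as f:
        arr.tofile(f)


def load_tokens(path):
    """Read a file written by save_tokens back into an array of ints."""
    arr = array("H")
    with open(path, "rb") as f:
        arr.frombytes(f.read())
    return arr


if __name__ == "__main__":
    example_ids = [464, 3290, 50256]  # arbitrary example ids; 50256 is GPT-2's <|endoftext|>
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "train.bin"
        save_tokens(example_ids, path)
        print(list(load_tokens(path)), path.stat().st_size)
```

A file written this way can also be opened lazily with `numpy.memmap(path, dtype=numpy.uint16, mode="r")`, which avoids loading all 122M training tokens (about 244 MB at 2 bytes each) into memory at once.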

Instructions to prepare the dataset:

Training

A GPU is recommended for training. The code is flexible: you can scale the model up or down by changing its hyperparameters (number of layers, heads, embedding dimension, etc.). To train the model, run the following (--config defaults to config.py):

python train.py --config path_to_config
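To get a feel for how the model-size hyperparameters translate into a parameter count, here is a back-of-the-envelope sketch. It assumes a nanoGPT-style GPT-2 block (learned token and position embeddings, output head tied to the token embedding, 4x MLP expansion, LayerNorm with optional bias); smol-GPT's exact layers may differ, and `count_params` is a hypothetical helper. Note that `n_head` does not appear: splitting attention into more heads does not change the number of weights.

```python
def count_params(n_layer, n_embd, vocab_size, context_length, bias=True):
    """Parameter count of a nanoGPT-style GPT-2 model (assumed architecture)."""
    d = n_embd
    embeddings = vocab_size * d + context_length * d  # token + position tables
    layer_norm = 2 * d if bias else d                  # scale (+ shift)
    attn = 3 * d * d + d * d                           # fused q/k/v + output projection
    mlp = d * 4 * d + 4 * d * d                        # up- and down-projection
    linear_biases = (3 * d + d + 4 * d + d) if bias else 0
    per_block = attn + mlp + linear_biases + 2 * layer_norm
    # lm_head is tied to the token embedding, so it adds no extra weights;
    # the trailing term is the final LayerNorm.
    return embeddings + n_layer * per_block + layer_norm


if __name__ == "__main__":
    # GPT-2 small settings (the model commonly quoted as "124M"); prints 124439808
    print(count_params(n_layer=12, n_embd=768, vocab_size=50257, context_length=1024))
```

With GPT-2's 50,257-token vocabulary, the token embedding alone is `vocab_size * n_embd` weights (about 19.3M at n_embd=384), so in models of this size the embedding table can account for most of the parameters.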

You can also pass the following training parameters as command-line arguments to train.py, or set them in your config file:

Training parameters

options:
  -h, --help            show this help message and exit
  --config CONFIG       Path to the configuration file (config.py by default)
  --batch_size BATCH_SIZE
                        Batch size
  --grad_accum_steps GRAD_ACCUM_STEPS
                        Gradient accumulation steps
  --context_length CONTEXT_LENGTH
                        Context length
  --n_layer N_LAYER     Number of transformer layers
  --n_head N_HEAD       Number of attention heads
  --vocab_size VOCAB_SIZE
                        Vocabulary size
  --n_embd N_EMBD       Embedding dimension
  --bias                Use bias in the model
  --dropout DROPOUT     Dropout rate
  --lr LR               Learning rate
  --min_lr MIN_LR       Minimum learning rate
  --weight_decay WEIGHT_DECAY
                        Weight decay
  --max_iters MAX_ITERS
                        Maximum iterations to train for
  --lr_decay_iters LR_DECAY_ITERS
                        Decay learning rate up to this iteration
  --warmup_iters WARMUP_ITERS
                        Warm up iterations
  --eval_epochs EVAL_EPOCHS
                        Number of evaluation epochs
  --eval_intervel EVAL_INTERVEL
                        Evaluation interval
  --device DEVICE       Device to use 'cuda' or 'cpu'
  --save_chkpt_epoch SAVE_CHKPT_EPOCH
                        Save checkpoint every N epochs
  --checkpoint_path CHECKPOINT_PATH
                        Path to save checkpoints
  --resume              Resume training
  --load_checkpoint_path LOAD_CHECKPOINT_PATH
                        Path to the checkpoint to resume training from
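The `--lr`, `--min_lr`, `--warmup_iters` and `--lr_decay_iters` flags suggest a warmup-then-decay learning-rate schedule. Assuming it follows nanoGPT (linear warmup, then cosine decay down to `min_lr`, then constant), which this README does not confirm, the schedule could be sketched like this; `get_lr` is a hypothetical name.

```python
import math


def get_lr(it, lr, min_lr, warmup_iters, lr_decay_iters):
    """Linear warmup, then cosine decay from lr to min_lr, then constant min_lr."""
    if it < warmup_iters:
        # ramp linearly up to lr over the first warmup_iters iterations
        return lr * (it + 1) / warmup_iters
    if it >= lr_decay_iters:
        return min_lr
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)  # 0 -> 1
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))                # 1 -> 0
    return min_lr + coeff * (lr - min_lr)


if __name__ == "__main__":
    # hypothetical settings, not smol-GPT's defaults
    for it in (0, 100, 5050, 10_000, 12_000):
        print(it, get_lr(it, lr=6e-4, min_lr=6e-5, warmup_iters=100, lr_decay_iters=10_000))
```

The usual motivation: warmup avoids large, noisy updates while the optimizer's statistics are still unreliable, and the cosine tail lets training settle at a small learning rate.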

Inference

Here is a sample generated by the 22M model:

The Amazon rainforest is not known from the Glasgow and St. Nicholas is of the 1999 Nintendo Entertainment Expo , Japanese retail at the Toronto International Exhibition Exhibition . The impact of ancient jewelry is tourism and horticultural conditions . The city of Dunmoreland Castle is one of the general rocks at least , a 150 @-@ metre long gallery sold in the winter . In town , workforce fluctuates in the 15 @,@ 000 litres ( 393 @,@ 300 yd ) of high quality annual energy revenue of 20 items stalls . At Pecke muts of about thirty per cent of visitors attended the pool , ranking at the top of the roofs on court scale and the top of the system 's rooms . Trade and suspension of buildings are easy to assess and mouse design , with possible planning for the opening down until the first floor has scale restoration .
"
" The largest operating tower in Coatbridge is the main gate coach is the Hathazwalburam Lakkenwa , the symbol of the NTwaqs ( Taddoa City @-@ Peir ) .
"
""
" = = = Outcrossing and sash = = =
"
""
" The top of Coatbridge is the lie in Coatbridge railway , at Eckfeldwe , in Burythreshold and the Yellow River subdivision near the head of Coatbridge . Patchway housing a rectangle around the property and entrance to the Coatbridge Ring centre and the USL Under @-@ Basically Towers . These are operated by Great YOG . Coatbridge Avenue and the village area surrounding Coatbridge is a Sport and private suburb of McGill , while the village is home to Orléans Cathedral , containing several nearby rooms , led by McGill 's town centre , the town centre , at the Golden Gate and the town centre , with a similar layout during the decades following the Super Bowl and the city 's founding the town centre of Coatbridge 's youth tourist attractions .
"
""
" = = = Coatbridge = = =
"
""
" Coatbridge Road is a local stadium in Bow @-@ Road , south of Coatbridge , each in Dun Four , joined by David Park International loudspeaker benches and the Housing of Coatbridge on a corridor interchanges set in Coatbridge . Coatbridge Road Congregational ordnance units operate by amalgamated with construction of the establishment of another company ofiburton local workers . They are typically at the same venue . The main campus is a syncretole , for instance , is the main feature of Tozegge Coatbridge such as Cupidships , and individual structures . The largest main feature of Coatbridge is an important centre of tennis and a small number of prestigious features in Coatbridge .
"
""
" = = = Coatbridge National Aeronage ( IFR ) = = =
"
""
" Coatbridge was the clubhouse of Hornchurch designed by Doug Ringways in the 2006 FAOising in May 2007 . Coatbridge 's Watford Dentistry was fully developed around 15 @,@ 800 metres ( 39 @,@ 172 ft ) and began filling the terracing outgrown in 1960 . Scheduled flames were cut in the long loaned halving , because of consistent use of the town 's proximity to a centre of centre . Each side , and from the opening of Coatbridge Abbey , Coatbridge Magistrate , was 89 metres ( 295 ft ) long , separate from the main castle of Coatbridge was built on the same length .
"
" Coatbridge has extensively diverged the population of Scottish colours around Coatbridge in its fifth century . Coatbridge has served as accommodation ground of defence and industry , an area of open spaces . Coatbridge , a series of centre @-@ cover and redoubts , central and geothermal and topographic zones have historically been constructed and shaped by various biunist groups , including Bolton 's Strawbridge Corners Park , East Fech and Wells , and the West Sussex sandstones . Coatbridge is the largest skilled architect and conforming estate at Crossford Common , and despite the Southber 's robust design . Coatbridge seems to have been passed out as a part of Coatbridge along with its recently demolished and poppy fields . Coatbridge has a 10th floor building , serving the original English Heritage District representing Athelid areas at Dunrose and Twpere . Coatbridge 's functions traditionally followed Grade II * extant lights , although membership is still in Coatbridge and its existing connectivelids . Coatbridge is home to numerous major heritage sites .
"


Haha, not bad I guess. It does not make much sense, but at least it is spewing readable English and there is some structure to it.

Run infer.py to sample from the model:

python infer.py

You can set the following args when executing infer.py:

Infer parameters

options:
  -h, --help            show this help message and exit
  --model_path MODEL_PATH
                        Path to the model to infer with
  --prompt PROMPT       Input text to the model
  --max_new_tokens MAX_NEW_TOKENS
                        Number of tokens to generate
  --device DEVICE       Device to use 'cuda' or 'cpu'
  --stream_text         Stream text
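Conceptually, generating `--max_new_tokens` tokens is an autoregressive loop. The sketch below assumes the usual nanoGPT-style approach (crop the running sequence to the model's context length, turn the last position's logits into probabilities with a softmax, sample one token, append, repeat); infer.py's actual implementation may differ, for example in temperature or top-k handling. To keep it dependency-free, `logits_fn` is a hypothetical stand-in for the model, and `sample_next`/`generate` are hypothetical names.

```python
import math
import random


def sample_next(logits, rng):
    """Sample a token index from unnormalized logits via a numerically stable softmax."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # proportional to softmax probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(logits_fn, prompt_ids, max_new_tokens, context_length, seed=None):
    """Extend prompt_ids by max_new_tokens sampled tokens.

    logits_fn(window) must return next-token logits (one float per vocabulary
    entry) for the last position of `window`.
    """
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context_length:]  # the model only sees the last context_length tokens
        ids.append(sample_next(logits_fn(window), rng))
    return ids
```

With a real PyTorch model, `logits_fn` would run the model on the cropped window and return the logits at the last position; streaming output (as `--stream_text` suggests) would amount to decoding and printing each token as it is appended.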

Evaluation

I evaluated the 22M model checkpoint (after about 10k training iterations) on the test split. Here are the results:

Test loss: 3.7307
Test perplexity: 41.7096
Test precision: 0.2762
Test recall: 0.2569
Test f1: 0.2591
Test accuracy: 50.9237%

Not bad for a small model with a relatively simple architecture, no data cleaning, and only 10k iterations :)
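Perplexity is the exponential of the mean per-token cross-entropy loss (in nats), so the loss and perplexity above are consistent: exp(3.7307) ≈ 41.71, and the small gap to 41.7096 comes from the loss being rounded to four decimals. The accuracy is presumably next-token accuracy expressed as a percentage. A minimal sketch of both, where `perplexity` and `token_accuracy` are hypothetical helpers rather than eval.py's code:

```python
import math


def perplexity(mean_loss):
    """Perplexity from the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_loss)


def token_accuracy(logits_rows, targets):
    """Percentage of positions whose highest-scoring token equals the target."""
    hits = sum(
        max(range(len(row)), key=row.__getitem__) == target
        for row, target in zip(logits_rows, targets)
    )
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    print(round(perplexity(3.7307), 2))  # 41.71
    print(token_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))  # 50.0
```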

Running eval.py is easy:

python eval.py

Provide the following arguments to the script:

Eval parameters

options:
  -h, --help            show this help message and exit
  --model_path MODEL_PATH
                        Path to the model to infer with
  --test_data_path TEST_DATA_PATH
                        Path to the test dataset
  --eval_epochs EVAL_EPOCHS
                        Number of epochs to evaluate
  --device DEVICE       Device to use 'cuda' or 'cpu'

To do:

  • Introduce more useful metrics such as ROUGE and BLEU.
  • Implement Flash Attention.
  • Make the code, especially training, more efficient and faster.
  • Introduce more complexity and new architectural changes (e.g., RoPE embeddings).

Acknowledgement

The code is heavily inspired by 🐐 Andrej Karpathy's nanoGPT and his tutorial on building GPT.

Contributors

  • avenger-py
