mlxtinygpt's Introduction

Introduction

This is a Swift implementation of Andrej Karpathy's excellent "Let's build GPT" video. It uses the MLX-Swift framework.

The code is all in a single Swift file (< 300 LOC) that implements the Transformer architecture, trains it for 5000 epochs, and then generates Shakespeare-like text.
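The generation step can be sketched in plain Swift, without MLX. This is a minimal illustration of autoregressive sampling, not the code from this repo: `nextTokenLogits` is a hypothetical stand-in for the trained network, and `softmax`, `sampleIndex` and `generate` are names chosen for this example.

```swift
import Foundation

/// Numerically stable softmax: subtracting the max keeps exp() from overflowing.
func softmax(_ logits: [Double]) -> [Double] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Draws one index from the categorical distribution given by `probs`.
func sampleIndex<G: RandomNumberGenerator>(from probs: [Double], using rng: inout G) -> Int {
    precondition(!probs.isEmpty, "cannot sample from an empty distribution")
    let r = Double.random(in: 0..<1, using: &rng)
    var cumulative = 0.0
    for (i, p) in probs.enumerated() {
        cumulative += p
        if r < cumulative { return i }
    }
    return probs.count - 1 // fallback in case rounding leaves the sum just below 1
}

/// Autoregressive generation: ask the model for next-token logits, sample one
/// token, append it to the context, and repeat.
/// `nextTokenLogits` is a hypothetical closure standing in for the network.
func generate<G: RandomNumberGenerator>(
    context: [Int],
    maxNewTokens: Int,
    blockSize: Int,
    using rng: inout G,
    nextTokenLogits: ([Int]) -> [Double]
) -> [Int] {
    var tokens = context
    for _ in 0..<maxNewTokens {
        let window = Array(tokens.suffix(blockSize)) // crop to the context length
        let probs = softmax(nextTokenLogits(window))
        tokens.append(sampleIndex(from: probs, using: &rng))
    }
    return tokens
}
```

Cropping to the last `blockSize` tokens matters because the positional embeddings only cover that many positions.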

How to run

Clone the project, open MLXTinyGPT.xcodeproj and run it. You can either train the network from scratch, or download weights to generate text.

Performance

Karpathy's scaled-up network achieved a validation loss of 1.4873. On my M3 MBP with 18 GB of memory, I was able to achieve a validation loss of 1.538302 with a slightly scaled-down network and a dropout probability of 0.1, but the same number of epochs.

Karpathy said:

"I would not run this on a CPU or Macbook. You'll have to break down the number of layers and the embedding dimension and so on".

He's definitely right, but it's cool to see that MLX's unified memory model and Apple silicon let you get quite close to GPU performance.

Improvements

  1. Scaling up parameters like blockSize or nEmbedPerHead started causing issues such as NaN weights or Metal errors. Figure out whether there's a way to scale up further without hitting them.
  2. Try to MLX.compile the training loop step. On the initial try, I hit some C++ exceptions, presumably because I wasn't capturing the right set of inputs.
  3. Better tokenization?
  4. Try to save the trained model
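
As a reference point for the tokenization item, Karpathy's video uses a character-level tokenizer. The sketch below assumes this project follows the same scheme; `CharTokenizer` is a hypothetical name for illustration and is not taken from the repo. A "better" tokenizer (for example a subword scheme) would replace this type while keeping the same encode/decode shape.

```swift
/// Character-level tokenizer: the vocabulary is the sorted set of characters
/// that occur in the training corpus. (Illustrative; not the repo's code.)
struct CharTokenizer {
    let idToChar: [Character]
    let charToId: [Character: Int]

    init(corpus: String) {
        let chars = Set(corpus).sorted()
        idToChar = chars
        charToId = Dictionary(
            uniqueKeysWithValues: chars.enumerated().map { ($0.element, $0.offset) }
        )
    }

    var vocabSize: Int { idToChar.count }

    /// Maps text to token ids. Characters not seen in the corpus are dropped.
    func encode(_ text: String) -> [Int] {
        text.compactMap { charToId[$0] }
    }

    /// Maps token ids back to text. Ids must be in 0..<vocabSize.
    func decode(_ ids: [Int]) -> String {
        String(ids.map { idToChar[$0] })
    }
}
```

With a character vocabulary the embedding table stays tiny, but each token carries little information, so the model needs a longer context to cover the same amount of text.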

Sample generation

Here's some Shakespeare text the model generated:

VIRGHARD III: Come to thee, king we would I would seee with That bandire, and say I will take with pray him, Thus hat not husband two any you.

MARIANA: Together lies than new's, Which she's untim's not atch hidle the bred; Leavefore I long me persing to her precedly.

ISABELLA: I he disquanger and

mlxtinygpt's Issues

Safetensors error

I keep getting this error, is there any dependency needed to make this work?

Btw I also replaced the link for the dataset.

How to run after build succeeded?

image

I got this but not sure what to do now. I don't see any binaries produced.

I am curious how you worked on this repo, like what is the run/debug cycle like?

Thanks. 🖤
