🦙 llama.swift

License: MIT

A fork of @ggerganov's llama.cpp to use Facebook's LLaMA models in Swift.

See the llama.cpp repository for info about the original goals of the project and implementation.

🚀 llama.swift → future

Version 1 of llama.swift provides a simple, clean wrapper around the original LLaMA models and some of their early derivatives.

The future of llama.swift is CameLLM, which provides clean Swift interfaces for running LLMs locally on macOS (and, hopefully, on iOS in the future). CameLLM is still in development; star or watch the main repository for updates.


🔨 Setup

Clone the repo:

git clone https://github.com/alexrozanski/llama.swift.git
cd llama.swift

Grab the LLaMA model weights and place them in ./models. ls should print something like:

ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

To convert the LLaMA-7B model and quantize:

# install Python dependencies
python3 -m pip install torch numpy sentencepiece

# the command-line tools are in `./tools` instead of the repo root like in llama.cpp
cd tools

# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py ../models/7B/ 1

# quantize the model to 4 bits
make
./quantize.sh 7B

When running the larger models, make sure you have enough disk space to store all of the intermediate files.

⬇️ Installation

Swift Package Manager

Add llama.swift to your project using Xcode (File > Add Packages...) or by adding it to your project's Package.swift file:

dependencies: [
  .package(url: "https://github.com/alexrozanski/llama.swift.git", .upToNextMajor(from: "1.0.0"))
]
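
For context, a complete manifest might look like the sketch below. The package name `MyApp`, the tools version, and the platform requirement are placeholders, and the product name `llama` is an assumption inferred from the `import llama` statement used later in this README; check the package's own Package.swift if resolution fails.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
  name: "MyApp", // hypothetical package name
  platforms: [.macOS(.v13)], // assumption: adjust to the minimum version llama.swift supports
  dependencies: [
    .package(url: "https://github.com/alexrozanski/llama.swift.git", .upToNextMajor(from: "1.0.0"))
  ],
  targets: [
    .executableTarget(
      name: "MyApp",
      dependencies: [
        // Assumption: the library product is named "llama", matching `import llama`.
        .product(name: "llama", package: "llama.swift")
      ]
    )
  ]
)
```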

👩‍💻 Usage

Swift library

To generate output from a prompt, first instantiate a LlamaRunner instance with the URL to your LLaMA model file:

import llama

let url = ... // URL to the ggml-model-q4_0.bin model file
let runner = LlamaRunner(modelURL: url)
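
It can help to confirm that the model file exists before handing its URL to LlamaRunner. The helper below is a minimal sketch using only Foundation; `modelFileURL(atPath:)` and `ModelFileError` are hypothetical names and not part of llama.swift.

```swift
import Foundation

// Hypothetical error type for this sketch; not part of llama.swift.
enum ModelFileError: Error {
  case notFound(String)
}

/// Returns a file URL for `path`, or throws if no regular file exists there.
func modelFileURL(atPath path: String) throws -> URL {
  var isDirectory: ObjCBool = false
  let exists = FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
  // Reject missing paths and directories; a model must be a single file.
  guard exists, !isDirectory.boolValue else {
    throw ModelFileError.notFound(path)
  }
  return URL(fileURLWithPath: path)
}

// Usage (requires llama.swift):
//   let url = try modelFileURL(atPath: "/path/to/ggml-model-q4_0.bin")
//   let runner = LlamaRunner(modelURL: url)
```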

Generating output is as simple as calling run() with your prompt on the LlamaRunner instance. Since tokens are generated asynchronously, this returns an AsyncThrowingStream which you can iterate over to process tokens as they are returned:

do {
  for try await token in runner.run(with: "Building a website can be done in 10 simple steps:") {
    print(token, terminator: "")
  }
} catch let error {
  // Handle error
}

Note that tokens don't necessarily correspond to a single word, and also include any whitespace and newlines.
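
Because tokens carry their own whitespace, they can be concatenated as-is without inserting separators. The sketch below illustrates this with a mock stream standing in for the one returned by run(); `mockTokenStream` and `collectText` are hypothetical helpers, and the token values are made up.

```swift
// Stand-in for the stream returned by `runner.run(with:)`; the token values are made up.
func mockTokenStream(_ tokens: [String]) -> AsyncThrowingStream<String, Error> {
  AsyncThrowingStream { continuation in
    for token in tokens {
      continuation.yield(token)
    }
    continuation.finish()
  }
}

/// Concatenates tokens exactly as received; no separator is added because
/// tokens already include any whitespace and newlines.
func collectText<S: AsyncSequence>(from tokens: S) async throws -> String
where S.Element == String {
  var text = ""
  for try await token in tokens {
    text += token
  }
  return text
}
```

With the real library, passing the stream returned by run() to `collectText(from:)` should work the same way, assuming its element type is String.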

Configuration

LlamaRunner.run() takes an optional LlamaRunner.Config instance which lets you control the number of threads inference is run on (default: 8), the maximum number of tokens returned (default: 512) and an optional reverse/negative prompt:

let prompt = "..."
let config = LlamaRunner.Config(numThreads: 8, numTokens: 20, reversePrompt: "...")
let tokenStream = runner.run(with: prompt, config: config)

do {
  for try await token in tokenStream {
    ...
  }
} catch let error {
  ...
}
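
Rather than hard-coding the thread count, you may want to derive it from the machine. The helper below is a sketch under the assumption that using more threads than active cores is not useful; `suggestedThreadCount` is a hypothetical name, and the cap of 8 simply mirrors the default mentioned above.

```swift
import Foundation

/// Picks a thread count: the number of active cores, clamped to 1...cap.
func suggestedThreadCount(
  cap: Int = 8,
  availableCores: Int = ProcessInfo.processInfo.activeProcessorCount
) -> Int {
  max(1, min(cap, availableCores))
}

// Usage (requires llama.swift; assumes `numThreads` accepts an Int):
//   let config = LlamaRunner.Config(numThreads: suggestedThreadCount(), numTokens: 20)
```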

State Changes

LlamaRunner.run() also takes an optional stateChangeHandler closure, which is invoked whenever the run state changes:

let prompt = "..."
let tokenStream = runner.run(
  with: prompt,
  config: .init(numThreads: 8, numTokens: 20),
  stateChangeHandler: { state in
    switch state {
      case .notStarted:
        // Initial state
        break
      case .initializing:
        // Loading the model and initializing
        break
      case .generatingOutput:
        // Generating tokens
        break
      case .completed:
        // Completed successfully
        break
      case .failed:
        // Failed. This is also the error thrown by the `AsyncThrowingStream` returned from `LlamaRunner.run()`
        break
    }
  })
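
A common use of the state handler is driving a status label. The sketch below maps each state to a display string. `RunState` is a hypothetical stand-in that mirrors the cases shown above so the example compiles on its own; the real type comes from llama.swift, and the shape of the associated value on `.failed` is an assumption.

```swift
// Hypothetical stand-in mirroring the cases above; the real type comes from llama.swift.
enum RunState {
  case notStarted
  case initializing
  case generatingOutput
  case completed
  case failed(error: Error) // assumption: the failure carries the underlying error
}

/// Maps a run state to a short, user-facing status string.
func statusText(for state: RunState) -> String {
  switch state {
  case .notStarted:
    return "Idle"
  case .initializing:
    return "Loading model"
  case .generatingOutput:
    return "Generating"
  case .completed:
    return "Done"
  case .failed(let error):
    return "Failed: \(error)"
  }
}
```

If the handler is not invoked on the main thread (the README does not say), hop to the main actor before updating any UI with the result.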

Closure-based API

If you don't want to use Swift concurrency, there is an alternative version of run() which returns tokens via a tokenHandler closure instead:

let prompt = "..."
runner.run(
  with: prompt,
  config: ...,
  tokenHandler: { token in
    ...
  },
  stateChangeHandler: ...
)
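
The README does not say which thread tokenHandler is called on, so it is safest to treat it as arbitrary. The sketch below accumulates tokens behind a lock; `TokenBuffer` is a hypothetical helper, not part of llama.swift.

```swift
import Foundation

/// Accumulates tokens behind a lock so `append` can be called from any thread.
final class TokenBuffer {
  private let lock = NSLock()
  private var storage = ""

  func append(_ token: String) {
    lock.lock()
    defer { lock.unlock() }
    storage += token
  }

  /// A snapshot of everything appended so far.
  var text: String {
    lock.lock()
    defer { lock.unlock() }
    return storage
  }
}

// Usage (requires llama.swift):
//   let buffer = TokenBuffer()
//   runner.run(with: prompt, config: ..., tokenHandler: { buffer.append($0) }, stateChangeHandler: ...)
```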

Other notes

  • Build for Release if you want token generation to be snappy, since llama will generate tokens slowly in Debug builds.
  • Because of the way the Swift package is structured (and some gaps in my knowledge around exported symbols from modules), including llama.swift also leaks the name of the internal module containing the Objective-C/C++ implementation, llamaObjCxx, as well as some internal classes prefixed with _Llama. Pull requests welcome if you have any ideas on fixing this!

llamaTest app

The repo contains a barebones command-line tool, llamaTest, which uses the llama framework to run a simple input loop that performs inference on each prompt you enter.

  • Make sure to set MODEL_PATH in LlamaTest.xcconfig to the path of your ggml-model-q4_0.bin file (without quotes, and without spaces after MODEL_PATH=), for example:
MODEL_PATH=/path/to/ggml-model-q4_0.bin

📃 Misc
