
wllama - Wasm binding for llama.cpp

Another WebAssembly binding for llama.cpp. Inspired by tangledgroup/llama-cpp-wasm, but unlike it, wllama aims to support low-level APIs like (de)tokenization, embeddings, and more.

Breaking changes

  • Version 1.4.0
    • Add single-thread/wllama.js and multi-thread/wllama.js to the list of CONFIG_PATHS
    • createEmbedding now adds BOS and EOS tokens by default

Features

  • Typescript support
  • Can run inference directly in the browser (using WebAssembly SIMD); no backend or GPU is needed!
  • No runtime dependency (see package.json)
  • High-level API: completions, embeddings
  • Low-level API: (de)tokenize, KV cache control, sampling control,...
  • Ability to split the model into smaller files and load them in parallel (same as split and cat)
  • Auto switch between single-thread and multi-thread build based on browser support
  • Inference runs inside a worker, so it does not block UI rendering
  • Pre-built npm package @wllama/wllama
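Since the split feature above works on plain byte ranges, standard split and cat are enough to produce and verify the chunks. A minimal sketch, using a small random dummy file in place of a real .gguf (all file names here are placeholders):

```shell
# A 1 MiB dummy file stands in for a real model; substitute your own .gguf
head -c 1048576 /dev/urandom > dummy.gguf

# Split into 256 KiB chunks: dummy.gguf.part-aa, dummy.gguf.part-ab, ...
split -b 256k dummy.gguf dummy.gguf.part-

# Reassemble and confirm the chunks are lossless
cat dummy.gguf.part-* > rejoined.gguf
cmp dummy.gguf rejoined.gguf && echo "chunks reassemble losslessly"
```

For a real model you would pick a larger chunk size (e.g. `-b 512M`) so that each chunk stays well under the 2GB ArrayBuffer limit described below.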

Limitations:

  • To enable multi-thread, you must add Cross-Origin-Embedder-Policy and Cross-Origin-Opener-Policy headers. See this discussion for more details.
  • No WebGL support yet, but it may be possible in the future
  • Max model size is 2GB, due to size restriction of ArrayBuffer

Demo and documentation

Documentation: https://ngxson.github.io/wllama/docs/

Demo:

How to use

Use Wllama inside React Typescript project

Install it:

npm i @wllama/wllama

For complete code, see examples/reactjs

NOTE: this example only covers completions usage. For embeddings, please see examples/embeddings/index.html

Simple usage with ES6 module

For complete code, see examples/basic/index.html

import { Wllama } from './esm/index.js';

(async () => {
  const CONFIG_PATHS = {
    'single-thread/wllama.js'       : './esm/single-thread/wllama.js',
    'single-thread/wllama.wasm'     : './esm/single-thread/wllama.wasm',
    'multi-thread/wllama.js'        : './esm/multi-thread/wllama.js',
    'multi-thread/wllama.wasm'      : './esm/multi-thread/wllama.wasm',
    'multi-thread/wllama.worker.mjs': './esm/multi-thread/wllama.worker.mjs',
  };
  // Automatically switch between single-thread and multi-thread version based on browser support
  // If you want to enforce single-thread, add { "n_threads": 1 } to LoadModelConfig
  const wllama = new Wllama(CONFIG_PATHS);
  await wllama.loadModelFromUrl('https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf', {});
  // The full example reads the prompt from an <input> element; a literal string is used here
  const outputText = await wllama.createCompletion('Once upon a time,', {
    nPredict: 50,
    sampling: {
      temp: 0.5,
      top_k: 40,
      top_p: 0.9,
    },
  });
  console.log(outputText);
})();

How to build

This repository already comes with pre-built binaries. But if you want to build them yourself, you can use the commands below:

# Requires docker compose to be installed
# First, build llama.cpp into wasm
npm run build:wasm
# (Optional) Build the ES module
npm run build

TODO

Short term:

  • Guide: how to split a gguf file into smaller ones
  • Add a more practical embedding example (using a better model)
  • Maybe a full RAG-in-browser example using tinyllama?

Long term:

  • Support GPU inference via WebGL
  • Support multi-sequences: given the resource limitations of WASM, I don't think having multi-sequences is a good idea
  • Multi-modal: waiting for the LLaVA implementation to be refactored in llama.cpp

wllama's People

Contributors

ngxson


wllama's Issues

Build issues with Vite in the React example

Hi, @ngxson!

First of all, thank you for creating this lib. I've previously tried rahuldshetty/llm.js and tangledgroup/llama-cpp-wasm, which are also great bindings, but neither made it as easy as running an npm install to get going. So, kudos for that!

I've successfully integrated wllama into MiniSearch, which also uses Vite as a bundler, and found some issues when building it: tsc complains about some TypeScript types, and vite build fails due to some Vite build rules.


You can reproduce those issues by running npm run build in the examples/reactjs folder of wllama repository.

Could you kindly review the issues that are preventing the Vite/React example from running successfully?

P.S. Meanwhile, what I'm doing as a workaround is copying the prebuilt wllama files to the dist folder and forcing Vite to skip checking them.
