react-llm's Introduction

@react-llm/headless

Easy-to-use headless React Hooks to run LLMs in the browser with WebGPU. As simple as useLLM().

Live Demo

Features:

Supports Vicuna 7B
Use custom system prompts and "user:"/"assistant:" role names
Completion options like max tokens and stop sequences
No data leaves the browser. Accelerated via WebGPU.
Hooks built to 'Bring your own UI'
Persistent storage for conversations in browser storage. Hooks for loading and saving conversations.
Model caching for faster subsequent loads

Installation

npm install @react-llm/headless

Packages in this repository

@react-llm/model - The LLM model and tokenizer compiled for the browser
@react-llm/retro-ui - Retro-themed UI for the hooks
@react-llm/extension - Chrome Extension that uses the hooks
@react-llm/headless - Headless React Hooks for running LLMs in the browser

useLLM API

Types

// Model Initialization
init: () => void;

// Model Generation
send: (msg: string, maxTokens: number, stopSequences: string[]) => void;
onMessage: (msg: GenerateTextResponse) => void;
setOnMessage: (cb: (msg: GenerateTextResponse) => void) => void;

// Model Status
loadingStatus: InitProgressReport;
isGenerating: boolean;
gpuDevice: GPUDeviceInfo;

// Model Configuration
userRoleName: string;
setUserRoleName: (roleName: string) => void;
assistantRoleName: string;
setAssistantRoleName: (roleName: string) => void;

// Conversation Management
conversation: Conversation | undefined;
allConversations: Conversation[] | undefined;
createConversation: (title?: string, prompt?: string) => void;
setConversationId: (conversationId: string) => void;
deleteConversation: (conversationId: string) => void;
deleteAllConversations: () => void;
deleteMessages: () => void;
setConversationTitle: (conversationId: string, title: string) => void;
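
Because `send` takes `maxTokens` and `stopSequences` as positional arguments, a thin wrapper can make call sites easier to read. The sketch below is only an illustration: `SendOptions`, `sendWithDefaults`, and the default values are assumptions and not part of @react-llm/headless.

```typescript
// Signature of `send`, copied from the type listing above.
type Send = (msg: string, maxTokens: number, stopSequences: string[]) => void;

// Hypothetical option bag; not part of @react-llm/headless.
interface SendOptions {
  maxTokens?: number;
  stopSequences?: string[];
}

// Hypothetical helper: fills in defaults before delegating to `send`.
// The default values are illustrative assumptions, not library defaults.
function sendWithDefaults(send: Send, msg: string, options: SendOptions = {}): void {
  const maxTokens = options.maxTokens ?? 100;
  const stopSequences = options.stopSequences ?? [];
  send(msg, maxTokens, stopSequences);
}
```

Inside a component, this could be called as `sendWithDefaults(send, "Hello", { stopSequences: ["user:"] })`, where `send` comes from `useLLM()`.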

Hooks

import useLLM from '@react-llm/headless';

const MyComponent = () => {
  const {
    conversation,
    allConversations,
    loadingStatus,
    isGenerating,
    createConversation,
    setConversationId,
    deleteConversation,
    deleteAllConversations,
    deleteMessages,
    setConversationTitle,
    onMessage,
    setOnMessage,
    userRoleName,
    setUserRoleName,
    assistantRoleName,
    setAssistantRoleName,
    gpuDevice,
    send,
    init,
  } = useLLM();

  // Component logic...

  return null;
};
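
The "Component logic" placeholder above usually has to decide what the UI should show while the model loads or generates. A minimal sketch of that decision follows. It assumes `loadingStatus` exposes a numeric `progress` between 0 and 1; the real `InitProgressReport` shape is not shown in this README and may differ. `ProgressLike`, `UiState`, and `deriveUiState` are hypothetical names.

```typescript
// Assumed shape: the real InitProgressReport may differ.
interface ProgressLike {
  progress: number; // assumption: 0 = not started, 1 = fully loaded
}

type UiState = "loading" | "generating" | "ready";

// Hypothetical helper: maps hook state to a single UI state.
function deriveUiState(loadingStatus: ProgressLike, isGenerating: boolean): UiState {
  if (loadingStatus.progress < 1) {
    return "loading"; // model not fully loaded yet
  }
  return isGenerating ? "generating" : "ready";
}
```

A component could then disable its input unless the state is "ready".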

Provider

import { ModelProvider } from "@react-llm/headless";

export default function Home() {
  return (
    <ModelProvider
      config={{
        kvConfig: {
          numLayers: 64,
          shape: [32, 32, 128],
          dtype: 'float32',
        },
        wasmUrl: 'https://your-custom-url.com/model.wasm',
        cacheUrl: 'https://your-custom-url.com/cache/',
        tokenizerUrl: 'https://your-custom-url.com/tokenizer.model',
        sentencePieceJsUrl: 'https://your-custom-url.com/sentencepiece.js',
        tvmRuntimeJsUrl: 'https://your-custom-url.com/tvmjs_runtime.wasi.js',
        maxWindowSize: 2048,
        persistToLocalStorage: true,
      }}
    >
      <Chat />
    </ModelProvider>
  );
}
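
When all assets are hosted under one base URL, the five URL fields in the config above can be derived instead of repeated. The helper below is a sketch: `AssetUrls` and `assetUrls` are hypothetical names, and the file names are simply the ones used in the example above.

```typescript
// The URL fields from the ModelProvider config shown above.
interface AssetUrls {
  wasmUrl: string;
  cacheUrl: string;
  tokenizerUrl: string;
  sentencePieceJsUrl: string;
  tvmRuntimeJsUrl: string;
}

// Hypothetical helper: derives every asset URL from a single base URL.
function assetUrls(baseUrl: string): AssetUrls {
  // Normalize so the base always ends with exactly one slash.
  const base = baseUrl.replace(/\/+$/, "") + "/";
  return {
    wasmUrl: base + "model.wasm",
    cacheUrl: base + "cache/",
    tokenizerUrl: base + "tokenizer.model",
    sentencePieceJsUrl: base + "sentencepiece.js",
    tvmRuntimeJsUrl: base + "tvmjs_runtime.wasi.js",
  };
}
```

The result can be spread into `config` alongside `kvConfig`, `maxWindowSize`, and `persistToLocalStorage`.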

How does it work?

This library is a set of React Hooks that provide a simple interface to run LLMs in the browser. It uses Vicuna 7B and is built from the following components:

SentencePiece tokenizer (compiled for the browser via Emscripten)
Vicuna 7B (transformed to Apache TVM format)
Apache TVM and MLC Relax (compiled for the browser via Emscripten)
Off-the-main-thread WebWorker to run the model (bundled with the library)

The model, tokenizer, and TVM runtime are loaded from a CDN (Hugging Face). The model is cached in browser storage for faster subsequent loads.
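
The caching idea can be illustrated with a cache-first lookup: return the stored value when present, otherwise load it once and store it. This is a generic sketch, not the library's implementation. `cacheFirst` is a hypothetical name, and the plain object stands in for browser storage such as the Cache API or IndexedDB.

```typescript
// Generic cache-first lookup. `cache` stands in for browser storage;
// `load` stands in for a download from the CDN.
function cacheFirst<T>(
  cache: Record<string, T>,
  key: string,
  load: (key: string) => T
): T {
  if (Object.prototype.hasOwnProperty.call(cache, key)) {
    return cache[key]; // cache hit: skip the download
  }
  const value = load(key); // cache miss: load once
  cache[key] = value; // remember the result for subsequent calls
  return value;
}
```

With `T` set to a promise type, the same function also deduplicates concurrent requests for one URL, because the in-flight promise is what gets cached.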

Example

See packages/retro-ui for the full demo code, which is a simple example of how to use the hooks. To run it after cloning the repo:

cd packages/retro-ui
pnpm install
pnpm dev

License

MIT

The code under packages/headless/worker/lib/tvm is licensed under Apache 2.0.

react-llm's Issues

Support for devices without WebGPU?

Add temperature, topP to GenerateTextRequest

The code at https://github.com/r2d4/react-llm/blob/main/packages/headless/src/worker/llm.ts#L227 also takes two additional parameters, temperature and topP, which are currently hardcoded at https://github.com/r2d4/react-llm/blob/main/packages/headless/src/worker/llm.ts#L330

These should be added to the GenerateTextRequest https://github.com/r2d4/react-llm/blob/main/packages/headless/src/types/worker_message.ts#L13-L18
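
One possible shape for this change, as a sketch only: the existing fields of GenerateTextRequest are not reproduced here, and `SamplingOptions`, `resolveSampling`, and the placeholder defaults are hypothetical. The defaults are not the values currently hardcoded in llm.ts.

```typescript
// Hypothetical optional fields that could be added to GenerateTextRequest.
interface SamplingOptions {
  temperature?: number;
  topP?: number;
}

// Placeholder defaults for illustration; NOT the values hardcoded in llm.ts.
const PLACEHOLDER_DEFAULTS: Required<SamplingOptions> = {
  temperature: 1.0,
  topP: 1.0,
};

// Hypothetical helper: applies defaults and rejects out-of-range values.
function resolveSampling(opts: SamplingOptions = {}): Required<SamplingOptions> {
  const temperature = opts.temperature ?? PLACEHOLDER_DEFAULTS.temperature;
  const topP = opts.topP ?? PLACEHOLDER_DEFAULTS.topP;
  if (temperature < 0) {
    throw new RangeError("temperature must be >= 0");
  }
  if (topP <= 0 || topP > 1) {
    throw new RangeError("topP must be in (0, 1]");
  }
  return { temperature, topP };
}
```

Keeping both fields optional would let existing callers of `send` work unchanged.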

Hi, I'm trying to use react-llm's browser extension with a different model (Llama2 7b). I have compiled the model to wasm per the webllm instructions, but I'm having difficulty figuring out how the tokenizer.model file is created. I can export the tokenizer for Llama 2 from huggingface, resulting in three json files, but I can't find documentation about how to port that to the necessary tokenizer.model file. Could you let me know how you generated it? Thank you for your help and for this great project!

Question: Adapting react-llm to other Llama 1/2 variants

Hi,
Thanks for the amazing project!
I'm currently trying to adapt react-llm to work with a Llama2 variant I've quantized and transformed to wasm using the MLC AI LLM library (q4f32_1). I've updated background/index.js to reflect my model location / information:

    wasmUrl: "/models/Llama-2-7b-f32-q4f32_1/Llama-2-7b-f32-q4f32_1-webgpu.wasm",
    cacheUrl: "https://huggingface.co/maryxxx/Llama-2-7b-f32_q4f32_1/resolve/main/params/",
    tokenizerUrl: "/models/Llama-2-7b-f32-q4f32_1/tokenizer.model",

The extension loads the params from hugging face successfully. It registers the following error after loading, but still presents me with the prompt popup form:
Uncaught (in promise) Error: Cannot find function encoding

Then, when I submit a prompt, I receive the following error:
Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'generate')

It seems to me like something is off in the loading process, but I'm not sure where to start with debugging -- and if the problem is my wasm model or additional parameters I need to change in the codebase.

If possible, could you please provide an overview of how you prepared the example vicuna model (in case I am missing a step during compilation) and any additional hints re: other parameters that might need to be changed in the app?

Thank you!
