
gpt-tokens's People

Contributors

cainier, kingchan818, linj121, lox, qlee3, sebastiansandqvist


gpt-tokens's Issues

Throw an error when content contains <|endoftext|>

When using gpt-tokens to calculate tokens, I found that if the content contains values similar to <|endoftext|>, the calculation reports an error.
How about using a filtering step like text = text.replace(/<\|\w*\|>/g, '')?
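A minimal sketch of that workaround (the helper and regex are only a suggestion, not part of gpt-tokens itself):

import { GPTTokens } from 'gpt-tokens'

// Hypothetical helper: strip anything that looks like a special token,
// e.g. <|endoftext|>, so the underlying tokenizer does not throw.
function stripSpecialTokens(text: string): string {
  return text.replace(/<\|\w*\|>/g, '')
}

const usage = new GPTTokens({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: stripSpecialTokens('Hello <|endoftext|> world') }],
})
console.log(usage.usedTokens)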

How to count tokens for cl100k_base embedding?

Sorry for a probably stupid question, but my task is to count tokens before sending text for vector embedding using cl100k_base, which seems to be the encoding used by the text-embedding-3-large model.
I haven't seen such an example in the readme, and I'm wondering whether this model is supported by this module and how to use it properly for this task.
Since I only have a single text, should I put it into the 'system' or the 'user' role field?
And which model do I have to specify?
Thanks in advance!
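What I'm doing as a workaround for now (a sketch that bypasses gpt-tokens and uses the underlying @dqbd/tiktoken encoder directly; I'm not sure this is the intended approach):

import { get_encoding } from '@dqbd/tiktoken'

// For a single embedding input there is no chat framing, so the count is just
// the cl100k_base tokens of the raw text (the same encoder gpt-tokens uses).
const encoding = get_encoding('cl100k_base')
const tokenCount = encoding.encode('Text that will be sent for embedding').length
encoding.free() // the WASM encoder must be freed explicitly
console.log(tokenCount)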

Add gpt-4-0125-preview

Please add 'gpt-4-0125-preview' and the new alias 'gpt-4-turbo-preview' to the supported models list.

Function calling?

OpenAI models now support what is called function calling.

const usageInfo = new GPTTokens({
    model   : 'gpt-3.5-turbo-16k-0613',
    messages,
    functions,
});

Is there a chance that this library is going to support it? If not, could you please let me know how I should count the tokens used by the function definitions? Thank you.
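In the meantime I'm approximating it like this (a rough sketch, not the library's method: the function definitions are serialized to JSON and counted with cl100k_base, which only approximates how OpenAI formats functions internally):

import { get_encoding } from '@dqbd/tiktoken'
import { GPTTokens } from 'gpt-tokens'

// Hypothetical function definition in the OpenAI "functions" shape.
const functions = [
  {
    name: 'get_weather',
    description: 'Get the current weather for a city',
    parameters: { type: 'object', properties: { city: { type: 'string' } } },
  },
]

// Tokens for the chat messages, counted by gpt-tokens as usual.
const messageTokens = new GPTTokens({
  model: 'gpt-3.5-turbo-16k-0613',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
}).usedTokens

// Rough add-on for the function definitions: serialize and count with cl100k_base.
const encoding = get_encoding('cl100k_base')
const functionTokens = encoding.encode(JSON.stringify(functions)).length
encoding.free()

console.log(messageTokens + functionTokens) // approximate total prompt tokens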

Tiktoken encodings not re-used between GPTTokens instances (slow performance)

When calling this library many times (for instance, using it to split text into parts), it appears the internal encodings aren't re-used, and there is no way to make the library reuse them.

The result is that it's quite slow:

import { GPTTokens } from 'gpt-tokens'
// import { GPTTokens } from '../src/libs/gptTokens.js'

for (let i = 0; i < 1000; i++) {
  console.time('GPTTokens')
  const usageInfo = new GPTTokens({
    plus: false,
    model: 'gpt-3.5-turbo-0613',
    messages: [
      {
        role: 'user',
        content: 'Hello world',
      },
    ],
  })

  usageInfo.usedTokens
  usageInfo.promptUsedTokens
  usageInfo.completionUsedTokens
  usageInfo.usedUSD
  console.timeEnd('GPTTokens')
}

Returns:

GPTTokens: 332.625ms
GPTTokens: 290.321ms
GPTTokens: 273.416ms
GPTTokens: 264.106ms
GPTTokens: 281.858ms
GPTTokens: 257.714ms
GPTTokens: 280.463ms
GPTTokens: 282.296ms
GPTTokens: 255.335ms
GPTTokens: 274.843ms
GPTTokens: 268.74ms
GPTTokens: 269.419ms
GPTTokens: 279.843ms
GPTTokens: 252.028ms
GPTTokens: 276.782ms
GPTTokens: 283.575ms
GPTTokens: 258.711ms
GPTTokens: 284.372ms

When the encodings are cached in the module:

GPTTokens: 64.708ms
GPTTokens: 1.558ms
GPTTokens: 1.12ms
GPTTokens: 1.114ms
GPTTokens: 0.876ms
GPTTokens: 0.838ms
GPTTokens: 0.954ms
GPTTokens: 0.92ms
GPTTokens: 0.765ms
GPTTokens: 0.84ms
GPTTokens: 0.72ms
GPTTokens: 0.789ms
GPTTokens: 0.822ms
GPTTokens: 0.782ms
GPTTokens: 0.78ms
GPTTokens: 0.737ms
GPTTokens: 0.746ms
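For reference, a minimal sketch of the kind of module-level cache that produced the faster timings above (the helper names are illustrative, not the library's actual internals):

import { encoding_for_model, Tiktoken, TiktokenModel } from '@dqbd/tiktoken'

// Illustrative module-level cache: the expensive part is constructing the
// encoder (loading the BPE ranks), so build it once per model and reuse it.
const encodingCache = new Map<string, Tiktoken>()

function getCachedEncoding(model: TiktokenModel): Tiktoken {
  let encoding = encodingCache.get(model)
  if (!encoding) {
    encoding = encoding_for_model(model)
    encodingCache.set(model, encoding)
  }
  return encoding
}

// Subsequent calls for the same model become a sub-millisecond lookup.
const tokens = getCachedEncoding('gpt-3.5-turbo').encode('Hello world').length
console.log(tokens)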

Import is broken

After upgrading to the new release I started to get import errors.

{
"errorType": "Error",
"errorMessage": "Cannot find module '/var/task/node_modules/gpt-tokens/dist/index.ts' imported from /var/task/index.js\nDid you mean to import gpt-tokens/dist/index.js?",
"code": "ERR_MODULE_NOT_FOUND",
"url": "file:///var/task/node_modules/gpt-tokens/dist/index.ts",

It looks like the package.json "exports" entry points "import" at the wrong dist file: it references the TypeScript source instead of the built index.js (the index.d.ts belongs under "types").
"exports": {
".": {
"import": "./dist/index.ts",
Previous version 1.2.0 works well.
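For reference, a sketch of what a corrected exports map might look like (an assumption about the intended layout, not the package's actual published config):

{
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js",
      "require": "./dist/index.js"
    }
  }
}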

Requesting your help with gpt-tokens project

Hello! I really appreciate your project gpt-tokens, which is a fast BPE tokenizer for working with OpenAI's models. I need to use it on the Vercel Edge Runtime to count the tokens of the OpenAI responses that I get from a Vercel reverse proxy. However, I don't have any programming background, so I can't do it myself.

Therefore, I would like to ask you for a favor, if you have the time and interest. Could you please tell me how much, in US dollars, you would need as compensation? I am happy to pay a reasonable fee.

Thank you very much for your time and help.

Yuan

Error while importing in React

My code:

I am writing a function to trim the message history so that the token count doesn't exceed the max tokens.

import { GPTTokens } from "gpt-tokens";

let reduceTokens = (history, model = "gpt-3.5-turbo") => {
  // let usageInfo = new GPTTokens({model : model ,messages : history});
  
  // console.log(usageInfo.usedTokens);
  return history;
};

export default reduceTokens; 

While importing I am getting this error:

[screenshot of the error]

Please help...

[Idea] self-discipline plugin

To manage API usage across multiple projects without exceeding limits, consider implementing a function that calculates a delay in milliseconds for each request. This function should factor in userPriority and botPriority, ensuring quicker responses for paid users and higher-priority projects; lower-priority requests can have longer delays. Here's a prototype from my current project for your review.

// RPM (requests per minute), RPD (requests per day), TPM (tokens per minute), TPD (tokens per day), and IPM (images per minute)

import { GPTTokens, supportModelType } from 'gpt-tokens'

import connector, { RatesResults } from './connector'
import logger from '../../lib/logger'
import Stats from '../../models/Stats'
import { StatKeys } from '../../models/Stats.types'
const log = logger({ module: 'RateLimiter.model.ts' })

export enum OpenAIModels {
  'gpt-4' = 'gpt-4',
  'gpt-4-1106-preview' = 'gpt-4-1106-preview',
  'gpt-4-vision-preview' = 'gpt-4-vision-preview',
  'gpt-3.5-turbo' = 'gpt-3.5-turbo',
  'text-embedding-ada-002' = 'text-embedding-ada-002',
  'whisper-1' = 'whisper-1',
  'tts-1' = 'tts-1',
  'dall-e-2' = 'dall-e-2',
  'dall-e-3' = 'dall-e-3',
}
export interface MessageItem {
  name?: string
  role: 'system' | 'user' | 'assistant'
  content: string
}
type RateSettings = {
  RPM: number //  requests per minute
  RPD: number //  requests per day
  TPM: number //  tokens per minute
  TPD: number // tokens per day
  CS: number // context size
}
const MSDAY = 1000 * 60 * 60 * 24
const MSMINUTE = 1000 * 60

const limits: { [model in OpenAIModels]?: RateSettings } = {
  [OpenAIModels['gpt-4']]: { RPM: 10000, RPD: -1, TPM: 300000, TPD: -1, CS: 8190 },
  [OpenAIModels['gpt-4-1106-preview']]: { RPM: 500, RPD: 10000, TPM: 300000, TPD: -1, CS: 128000 },
  [OpenAIModels['gpt-4-vision-preview']]: { RPM: 20, RPD: 100, TPM: 300000, TPD: -1, CS: 128000 },
  [OpenAIModels['gpt-3.5-turbo']]: { RPM: 10000, RPD: -1, TPM: 1000000, TPD: -1, CS: 4000 },
  [OpenAIModels['text-embedding-ada-002']]: { RPM: 10000, RPD: -1, TPM: 5000000, TPD: -1, CS: -1 },
  [OpenAIModels['whisper-1']]: { RPM: 100, RPD: -1, TPM: -1, TPD: -1, CS: -1 },
  [OpenAIModels['tts-1']]: { RPM: 100, RPD: -1, TPM: -1, TPD: -1, CS: -1 },
  [OpenAIModels['dall-e-2']]: { RPM: 100, RPD: -1, TPM: -1, TPD: -1, CS: -1 },
  [OpenAIModels['dall-e-3']]: { RPM: 15, RPD: -1, TPM: -1, TPD: -1, CS: -1 },
}
const lowestPriorityDelayMultiplier = 4
const highestPriorityDelayMultiplier = 1.1
class RateLimiter {
  constructor() {
    //
  }
  public async registerChatRequest(
    model: OpenAIModels,
    messages: MessageItem[],
    tools: any[] = [],
    userPriority = 1, // from 0(lowest) to 1(highest)
    botPriority = 1 // from 0(lowest) to 1(highest)
  ): Promise<boolean> {
    //
    try {
      const usageInfo = new GPTTokens({
        model: model as supportModelType | undefined,
        messages,
      })
      const rateSetting: RateSettings | undefined = limits[model]
      if (!rateSetting) {
        throw new Error(`Invalid model [${model}]`)
      }

      if (rateSetting.CS > 0 && usageInfo.usedTokens > rateSetting.CS) {
        throw new Error('Chat exceeds token limit')
      }

      const currentConsumeRates: RatesResults = await connector.getRates(model)
      const delayForRate: number = this.getDelayForRate(rateSetting, currentConsumeRates, userPriority, botPriority)

      // pause for a bit
      if (delayForRate > 0) {
        await new Promise((resolve) => setTimeout(resolve, delayForRate))
      }

      await connector.register(model, usageInfo.usedTokens)
      await Stats.addStat({
        [StatKeys.consumeRequests]: 1,
        [StatKeys.consumeTokens]: usageInfo.usedTokens,
        [StatKeys.consumeCash]: usageInfo.usedUSD,
      })
      return true
    } catch (e) {
      log.error(e, 'Error registering chat request')
      return false
    }
  }

  getDelayForRate(
    rateSettings: RateSettings,
    currentConsumeRates: RatesResults,
    userPriority: number,
    botPriority: number
  ): number {
    // find the proper delay for each rate
    const rpmDelay =
      rateSettings.RPM > 0
        ? ((currentConsumeRates.RPM / rateSettings.RPM) * MSMINUTE) / (rateSettings.RPM - currentConsumeRates.RPM)
        : 0
    const rpdDelay =
      rateSettings.RPD > 0
        ? ((currentConsumeRates.RPD / rateSettings.RPD) * MSDAY) / (rateSettings.RPD - currentConsumeRates.RPD)
        : 0
    const tpmDelay =
      rateSettings.TPM > 0
        ? ((currentConsumeRates.TPM / rateSettings.TPM) * MSMINUTE) / (rateSettings.TPM - currentConsumeRates.TPM)
        : 0
    const tpdDelay =
      rateSettings.TPD > 0
        ? ((currentConsumeRates.TPD / rateSettings.TPD) * MSDAY) / (rateSettings.TPD - currentConsumeRates.TPD)
        : 0

    // take the highest delay and apply an additional coefficient for low-priority users
    const delay = Math.max(rpmDelay, rpdDelay, tpmDelay, tpdDelay)
    // extra delay multiplier for unprioritized requests, from highestPriorityDelayMultiplier (highest priority) to lowestPriorityDelayMultiplier (lowest priority)
    const totalPriorityK =
      (1 - userPriority * botPriority) * lowestPriorityDelayMultiplier + highestPriorityDelayMultiplier
    // rounded delay in ms necessary to fit the rate limits
    return Math.round(delay * totalPriorityK)
  }
}

export default new RateLimiter()
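For illustration, a hypothetical usage sketch (the import path and the wiring of connector and Stats above are assumed):

import rateLimiter, { OpenAIModels, MessageItem } from './RateLimiter.model'

async function handleChatRequest(messages: MessageItem[]): Promise<void> {
  // Waits as long as needed to stay inside the model's limits, then records
  // the consumed requests/tokens; returns false if registration failed.
  const ok = await rateLimiter.registerChatRequest(
    OpenAIModels['gpt-3.5-turbo'],
    messages,
    [],  // tools
    0.5, // userPriority: free-tier user
    1    // botPriority: high-priority project
  )
  if (!ok) throw new Error('Rate limiter rejected the request')
  // ... call the OpenAI API here
}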

Messages token count algorithm differs from the one in the OpenAI official cookbook

On this line: https://github.com/Cainier/gpt-tokens/blob/main/index.js#L170

For model "gpt-3.5-turbo-0613", at line 170, the code is using tokens_per_message = 4, tokens_per_name = -1

but in openAI's cookbook code: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb , in the num_tokens_from_messages function (snippet IN [14]) , you can see it is using tokens_per_message = 3, tokens_per_name = 1

And from their code, gpt-3.5-turbo-0301 and gpt-3.5-turbo-0613 are using different token-count method, but code in this repo are using the same method: https://github.com/Cainier/gpt-tokens/blob/main/index.js#L164
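For reference, a sketch of the cookbook rule translated to TypeScript (the values mirror OpenAI's num_tokens_from_messages example for the -0613 models; the encoder call assumes @dqbd/tiktoken):

import { get_encoding } from '@dqbd/tiktoken'

interface ChatMessage {
  role: string
  content: string
  name?: string
}

// Cookbook values for gpt-3.5-turbo-0613 / gpt-4-0613 (gpt-3.5-turbo-0301 uses 4 and -1):
const TOKENS_PER_MESSAGE = 3
const TOKENS_PER_NAME = 1

function numTokensFromMessages(messages: ChatMessage[]): number {
  const encoding = get_encoding('cl100k_base')
  let numTokens = 0
  for (const message of messages) {
    numTokens += TOKENS_PER_MESSAGE
    numTokens += encoding.encode(message.role).length
    numTokens += encoding.encode(message.content).length
    if (message.name) numTokens += encoding.encode(message.name).length + TOKENS_PER_NAME
  }
  numTokens += 3 // every reply is primed with <|start|>assistant<|message|>
  encoding.free()
  return numTokens
}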

Counts are slightly off on completion for chat models

Awesome project, thank you for adding this to the ecosystem! My brother and I are currently working on https://github.com/openpipe/openpipe, and this package is incredibly useful to us.

I do notice that completion token counts are slightly off on some models. Specifically, it appears that GPTTokens always believes that the completion includes more tokens than it actually does. I created an experiment that compares the number of tokens OpenAI reports were used for a certain response (returned from non-streamed responses) against tokens calculated using GPTTokens (calculated on streamed responses). Here's the experiment: https://openpipe.ai/experiments/e2d5d255-5731-4dbc-9f83-7f642745404d.

I think we're using the latest version (1.0.10): https://github.com/OpenPipe/openpipe/blob/main/package.json#L39

And here are the relevant screenshots:

Non-streamed token counts (read from the response): [screenshot]

Streamed token counts (calculated using GPTTokens): [screenshot]

Again, amazing project! Starring now!
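For context, here's roughly how the comparison works (a simplified sketch assuming the openai v4 Node SDK; the real experiment counts text reassembled from streamed deltas, but counting the returned message text shows the same drift):

import OpenAI from 'openai'
import { GPTTokens } from 'gpt-tokens'

const openai = new OpenAI()

async function compareCompletionCounts() {
  // One non-streamed request: OpenAI reports the reference usage on it.
  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Say hello' }],
  })
  const reported = response.usage?.completion_tokens
  const completion = response.choices[0].message.content ?? ''

  // Count the same completion text with GPTTokens, as you would with text
  // reassembled from a stream (streamed responses carry no usage field).
  const counted = new GPTTokens({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'assistant', content: completion }],
  }).usedTokens

  console.log({ reported, counted }) // counted tends to come out a few tokens higher
}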

new GPTTokens() sometimes fails with RuntimeError: unreachable

When using this package, there are sometimes runtime errors in the downstream @dqbd/tiktoken dependency.

Code snippet:

new GPTTokens({
  model: 'gpt-3.5-turbo',
  messages: [
    {
      role: 'user',
      content: JSON.stringify(query),
    },
  ],
});

Error:

/Users/xxx/repos/gpt-pr-comment-summary/crawler/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:262
            wasm.tiktoken_encode(retptr, this.ptr, ptr0, len0, addHeapObject(allowed_special), addHeapObject(disallowed_special));
                 ^
RuntimeError: unreachable
    at wasm://wasm/00b5f812:wasm-function[563]:0x6a72a
    at wasm://wasm/00b5f812:wasm-function[665]:0x6fd7a
    at wasm://wasm/00b5f812:wasm-function[756]:0x70f7f
    at wasm://wasm/00b5f812:wasm-function[237]:0x5c43a
    at wasm://wasm/00b5f812:wasm-function[200]:0x4db89
    at wasm://wasm/00b5f812:wasm-function[34]:0x1f78a
    at wasm://wasm/00b5f812:wasm-function[159]:0x48dc3
    at Tiktoken.encode (/Users/xxx/repos/gpt-pr-comment-summary/crawler/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:262:18)
    at GPTTokens.num_tokens_from_messages (/Users/xxx/repos/gpt-pr-comment-summary/crawler/node_modules/gpt-tokens/index.js:162:40)
    at GPTTokens.num_tokens_from_messages (/Users/xxx/repos/gpt-pr-comment-summary/crawler/node_modules/gpt-tokens/index.js:126:25)

Add OpenAI fine-tuned model support

It seems there is no fine-tuned model in supportModelType. A fine-tuned model name looks like ft:gpt-3.5-turbo-0613:company::abcdefg. I think we should support fine-tuned models.


Here is the price reference for fine-tuned models: [screenshot of OpenAI fine-tuning pricing]
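Until that lands, a possible workaround is to map the fine-tuned model name back to its base model before counting (the helper below is hypothetical; fine-tuned pricing would still need separate handling):

import { GPTTokens, supportModelType } from 'gpt-tokens'

// Hypothetical helper: 'ft:gpt-3.5-turbo-0613:company::abcdefg' -> 'gpt-3.5-turbo-0613'.
// A fine-tuned model uses the same tokenizer as its base model, so the token
// count matches; only the USD cost differs.
function baseModelOf(model: string): supportModelType {
  return (model.startsWith('ft:') ? model.split(':')[1] : model) as supportModelType
}

const usage = new GPTTokens({
  model: baseModelOf('ft:gpt-3.5-turbo-0613:company::abcdefg'),
  messages: [{ role: 'user', content: 'Hello world' }],
})
console.log(usage.usedTokens)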
