The logic is here (line 33 in commit 3e86200).
So typically the cache directory is whatever `python -c 'import tempfile; import os; print(os.path.join(tempfile.gettempdir(), "data-gym-cache"))'` prints.
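A minimal sketch of that lookup in Python, assuming the behaviour described in this thread: a non-empty `TIKTOKEN_CACHE_DIR` wins, an empty one disables caching, and otherwise the default is `<tempdir>/data-gym-cache`. The helper `resolve_cache_dir` is hypothetical (not a tiktoken API), and other tiktoken versions may consult additional variables, so check the source of the version you have installed.

```python
import os
import tempfile
from typing import Mapping, Optional


def resolve_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the directory tiktoken would cache vocab files in, or None if caching is off.

    Hypothetical helper mirroring the behaviour described in this thread;
    it is not part of tiktoken's public API.
    """
    if "TIKTOKEN_CACHE_DIR" in env:
        cache_dir = env["TIKTOKEN_CACHE_DIR"]
        # An empty value disables the on-disk cache entirely.
        return cache_dir or None
    # Default location: <system temp dir>/data-gym-cache
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")
```

Calling `resolve_cache_dir()` with no arguments reads the current process environment, so it shows where to look for (or delete) cached vocab files on your machine.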
Hm, thanks for the detailed environment information, but I'm not able to reproduce.
Can you run `export TIKTOKEN_CACHE_DIR=""` and retry? Setting this environment variable to an empty string prevents tiktoken from caching the vocab files it downloads.
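The same retry can be done from Python. This is a minimal sketch using a hypothetical context manager `tiktoken_cache_disabled` (not part of tiktoken) that sets `TIKTOKEN_CACHE_DIR=""` for the duration of a block and restores the previous value afterwards:

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def tiktoken_cache_disabled() -> Iterator[None]:
    """Temporarily set TIKTOKEN_CACHE_DIR="" so vocab files are not read from the cache.

    Hypothetical helper; equivalent to `export TIKTOKEN_CACHE_DIR=""` in the shell,
    but scoped to the `with` block.
    """
    previous = os.environ.get("TIKTOKEN_CACHE_DIR")
    os.environ["TIKTOKEN_CACHE_DIR"] = ""
    try:
        yield
    finally:
        # Restore the earlier value, or remove the variable if it was unset.
        if previous is None:
            os.environ.pop("TIKTOKEN_CACHE_DIR", None)
        else:
            os.environ["TIKTOKEN_CACHE_DIR"] = previous
```

For example, `with tiktoken_cache_disabled(): enc = tiktoken.get_encoding("gpt2")`. This assumes no encoding was loaded earlier in the same process, since tiktoken may keep already-loaded encodings in memory and skip the file read entirely.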
Note that this code path is exercised even by the simple publicly available tests: `tiktoken/tests/test_simple_public.py` (line 9 in commit 3e86200).
I tried setting the variable, but it didn't solve the problem. Is there a specific path for the cache? I might need to delete the cache manually.
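To delete the cache manually, here is a minimal sketch assuming the default location `<tempdir>/data-gym-cache` (or `TIKTOKEN_CACHE_DIR` if it is set to a non-empty value). `clear_tiktoken_cache` is a hypothetical helper; it removes the whole directory, so only point it at a directory used solely as tiktoken's cache.

```python
import os
import shutil
import tempfile
from typing import Optional


def clear_tiktoken_cache(cache_dir: Optional[str] = None) -> bool:
    """Delete cached vocab files so tiktoken downloads fresh copies.

    Hypothetical helper. With no argument it targets TIKTOKEN_CACHE_DIR if set,
    otherwise <tempdir>/data-gym-cache. Returns True if a directory was removed.
    """
    if cache_dir is None:
        cache_dir = os.environ.get(
            "TIKTOKEN_CACHE_DIR",
            os.path.join(tempfile.gettempdir(), "data-gym-cache"),
        )
    # An empty path means caching is disabled; a missing dir means nothing is cached.
    if not cache_dir or not os.path.isdir(cache_dir):
        return False
    shutil.rmtree(cache_dir)
    return True
```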
If that doesn't help, maybe you could set a breakpoint and check how those two dictionaries (`bpe_ranks` and `encoder_json_loaded`) differ.
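Once stopped at the failing assert, a small helper like the hypothetical `diff_dicts` below can summarise the difference: keys present in only one dict, and keys whose ranks disagree. It is a generic sketch, not tiktoken code.

```python
from typing import Any, Dict, List


def diff_dicts(a: Dict[Any, Any], b: Dict[Any, Any], limit: int = 10) -> Dict[str, List[Any]]:
    """Summarise how two dicts differ, showing at most `limit` entries per category."""
    only_a = [k for k in a if k not in b]
    only_b = [k for k in b if k not in a]
    # Keys present in both dicts but mapped to different values (e.g. different ranks).
    changed = [(k, a[k], b[k]) for k in a if k in b and a[k] != b[k]]
    return {
        "only_in_a": only_a[:limit],
        "only_in_b": only_b[:limit],
        "changed": changed[:limit],
    }
```

One way to get there, assuming the assertion is raised in the function that holds both dicts: run the failing script with `python -m pdb your_script.py` (the script name is a placeholder), which enters post-mortem debugging on the uncaught `AssertionError`; then the `interact` command opens a Python shell with `bpe_ranks` and `encoder_json_loaded` in scope, where the function can be pasted and called.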
Woo, that works! After deleting the cached files, it runs correctly now. Thanks a lot!
The file may have been corrupted during or after downloading. It might be worth validating the cached file before using it, or having the `assert bpe_ranks == encoder_json_loaded` line print more information when it fails.
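One way to check the cached file before use, sketched under the assumption that a trusted SHA-256 for each vocab file is available: where `expected_sha256` comes from is hypothetical here, as is the helper name `read_verified_cache`. On a mismatch the file is deleted so the next load downloads a fresh copy, which automates the manual fix above.

```python
import hashlib
import os
from typing import Optional


def read_verified_cache(path: str, expected_sha256: str) -> Optional[bytes]:
    """Return the cached bytes if their SHA-256 matches, else delete the file and return None.

    Hypothetical helper: the source of `expected_sha256` (e.g. a hash recorded
    alongside the download URL) is an assumption, not part of this thread.
    """
    if not os.path.exists(path):
        return None  # nothing cached yet; caller should download
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        # Corrupt or truncated download: drop it so the caller re-downloads.
        os.remove(path)
        return None
    return data
```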