Comments (11)
Got the llama2-7b model working on macOS and Android.
- macOS local runtime: model load time 10.39s, time to first generated token 0.739s, generated token rate 0.3089 toks/sec
- Android (Samsung Galaxy S22) runtime: model load time 12.05s, time to first generated token 8.448s, generated token rate 0.0777 toks/sec
Updated list of issues:
Llama2 model
- vocab_size in params.json from HF downloads is -1; need to manually change it to 32000 to proceed. Update the script/readme steps.
- Export with SDPA failed with errors: AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'
- Update readme to add steps for generating tokenizer.bin for the llama2 model
- Optimize local model runtime on macOS (model load time: 10.39s, time to first generated token: 0.739s, generated token rate: 0.3089 toks/sec)
- Android Emulator: pte file transfer hangs / crashes the emulator for the 4GB model file
- Add steps for running on iOS
Stories Model
- Fix error: RuntimeError: mmap can only be used with files saved with torch.save(./stories/stories110M.pt, _use_new_zipfile_serialization=True)
from executorch.
Thanks @chauhang for reporting this issue! Could you confirm the vocab_size in llama2 7B model's params.json?
Also tested for llama2-7b after updating vocab_size to 32000, getting error AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'
Full error logs here
After removing the sdpa param I was able to proceed up to running the model on the computer. On running the model I get the error The tokenizer vocab size 84545034 is larger than the model vocab size 32000. .... In function generate(), assert failed (num_prompt_tokens >= 1): Expected at least 1 prompt token
Full logs here
@iseeyuan For the meta-llama/Llama-2-7b model the params.json on HF is:
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-05, "vocab_size": -1}
Also checked the 13b/70b base models and the chat models; all of them have vocab_size=-1 in their params.json
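Until export_llama handles this automatically, the downloaded params.json can be patched before export. A minimal stdlib-only sketch (the helper name is ours and the path is illustrative; 32000 is the Llama 2 tokenizer vocab size):

```python
import json
from pathlib import Path

def patch_vocab_size(params_path: str, vocab_size: int = 32000) -> None:
    """Replace the placeholder vocab_size (-1) in a Llama params.json."""
    path = Path(params_path)
    params = json.loads(path.read_text())
    # Only rewrite the file when the placeholder value is present.
    if params.get("vocab_size", -1) < 0:
        params["vocab_size"] = vocab_size
        path.write_text(json.dumps(params))

# Example: patch_vocab_size("Llama-2-7b/params.json")
```

The other keys (dim, n_heads, n_layers, ...) are left untouched, so the patched file stays compatible with the export script.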
@chauhang, it's a bug in our code. We should provide an option so that export_llama works out of the box given a downloaded folder, whether from the official llama website or from HuggingFace.
@chauhang , the second issue, Export with SDPA failed with [errors](https://gist.github.com/chauhang/ca75857c6a152df65b79302fefa1fe2c?permalink_comment_id=5015390#gistcomment-5015390) for AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'
should have been fixed in main branch over the weekend. Could you pull the updated version and give it another try?
The performance afterwards may also be affected, since sdpa_with_kv_cache will now be used.
Also tested for llama2-7b after updating vocab_size to 32000, getting error
AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'
Might be related to @larryliu0820's diff that got reverted recently
updated
we should just cherry-pick that, right?
Thanks @chauhang
Some fixes
Things are fixed now.