coreylowman / llama-dfdx
LLaMa 7b with CUDA acceleration implemented in Rust. Minimal GPU memory needed!
License: MIT License
Especially given the ability to hot-load tensors as they are needed, it should be entirely possible to run larger models.
This is a fairly standard optimization and should be relatively straightforward to implement.
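A minimal std-only sketch of the hot-loading idea, assuming weights live in per-layer files on disk. `HotTensor` is a hypothetical name, and real code would deserialize into dfdx tensors rather than raw bytes:

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical hot-loading wrapper: keeps only the file path resident,
/// reading the tensor bytes from disk when a layer actually needs them.
struct HotTensor {
    path: PathBuf,
}

impl HotTensor {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    /// Load the raw bytes just-in-time; the buffer is dropped after the
    /// layer's forward pass, so at most one layer's weights are in memory.
    fn load(&self) -> io::Result<Vec<u8>> {
        fs::read(&self.path)
    }
}

fn main() -> io::Result<()> {
    // Write a small dummy "weight" file so the example is self-contained.
    let dir = std::env::temp_dir().join("hot_tensor_demo");
    fs::create_dir_all(&dir)?;
    let weight_path = dir.join("layer0.bin");
    fs::write(&weight_path, vec![0u8; 1024])?;

    let tensor = HotTensor::new(&weight_path);
    let bytes = tensor.load()?; // loaded only when this layer runs
    println!("loaded {} bytes", bytes.len());
    Ok(())
}
```

With this shape, peak memory is roughly one layer's weights plus activations, at the cost of disk reads every forward pass.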
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model llama-7b-hf --disable-cache generate "Why is pi round?"
Detected model folder as LLaMa 7b.
Model size: 13476 MB
13476 MB of model parameters will be held in RAM.
Why is pi round?
Thread: Why is pi round?
I've been wondering about this for a while and couldn't find an answer... Ifс theters, i.s_Q
ini£тuc1-cksont< Sec>ar$le to--.e
d in>
inient-< ${<s A А ${ various
channel Banels cBp Sack Bchn c channel Kaz
cyclemasens.chD channelーAя
O я_ CлusesN- n= Ps FigénBTアbollageest
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model llama-7b-hf --disable-cache generate "Why is pi round?"
Detected model folder as LLaMa 7b.
Model size: 13476 MB
13476 MB of model parameters will be held in RAM.
Why is pi round?
What is the real reason that pi is round?
I know the story that when Archimedes proved that pi was irr butures, he,he and,h is//** cz.daly, July wasz cQ.l inkxz toell>((>/.F
Middle
WCF,pp m MA cError apadd Ledethodaten
inien MAFaceerfaces.IкяEDєeP UITableView a MAtingack tcrit<0xE4><0xE7>leftAуad<0xEB>C areз о דneanate ab
With the cache enabled, while the answers are nonsense, at least they are coherent :)
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model llama-7b-hf generate "Why is pi round?"
Detected model folder as LLaMa 7b.
Model size: 13476 MB
13476 MB of model parameters will be held in RAM.
Why is pi round?
What is the definition of a number that is not prime?
The number that is not a prime is called a composite number and if it is not a factor of 1
What is the smallest number that can be divided by 3 numbers and still have the original number as a remainder?
The smallest number that can be divided by 3 numbers and still have the original number as a remainder is 17. To prove this we can use the fact that the original number must have a remainder of 1 (after being divided by 3). The numbers that have a remainder of 1 when divided by 3 are
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model llama-7b-hf generate "Why is pi round?"
Detected model folder as LLaMa 7b.
Model size: 13476 MB
13476 MB of model parameters will be held in RAM.
Why is pi round?
Thread: Why is pi round?
I've been wondering about this for a while and couldn't find an answer...I'm sure it's a silly question, but I just can't figure it out. Why is pi round? If it was, say 4.00 or 6.00, that would be one thing, but 3.14??
So I thought that maybe if you took the square root of 3.14, it would be ~ 1.5, which would be about the middle of 1 and 2, which is 1
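The cache referred to above is the attention key/value cache. A minimal std-only sketch of the idea, with all tensor contents invented for illustration (a real implementation would store per-layer attention projections):

```rust
/// Hypothetical sketch of why the KV cache matters: without it, keys and
/// values for the whole sequence are recomputed at every decoding step;
/// with it, each step appends one key/value pair and reuses the rest.
struct KvCache {
    keys: Vec<Vec<f32>>,   // one entry per generated token
    values: Vec<Vec<f32>>,
}

impl KvCache {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    /// Append this step's key/value instead of recomputing all past ones.
    fn push(&mut self, k: Vec<f32>, v: Vec<f32>) {
        self.keys.push(k);
        self.values.push(v);
    }

    fn len(&self) -> usize {
        self.keys.len()
    }
}

fn main() {
    let mut cache = KvCache::new();
    for t in 0..3 {
        // In a real model these come from the attention K/V projections.
        cache.push(vec![t as f32; 4], vec![t as f32; 4]);
    }
    println!("cached steps: {}", cache.len());
}
```

The cached and uncached paths should be mathematically equivalent; small numerical differences between them compound over greedy decoding, which is one plausible reason the two runs diverge so sharply.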
If the mode cannot be determined automatically, the user could specify it via a --mode
CLI argument.
This would replace the existing chat, generate, and file subcommands.
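A std-only sketch of how such a --mode argument might be resolved; the `Mode` enum, its variants, and `resolve_mode` are all hypothetical names, not the project's actual API:

```rust
use std::str::FromStr;

/// Hypothetical unified mode, replacing the separate subcommands.
#[derive(Debug, PartialEq)]
enum Mode {
    Chat,
    Generate,
    File,
}

impl FromStr for Mode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "chat" => Ok(Mode::Chat),
            "generate" => Ok(Mode::Generate),
            "file" => Ok(Mode::File),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

/// Prefer auto-detection; fall back to the --mode value; error if neither.
fn resolve_mode(detected: Option<Mode>, cli_arg: Option<&str>) -> Result<Mode, String> {
    match (detected, cli_arg) {
        (Some(m), _) => Ok(m),
        (None, Some(s)) => s.parse(),
        (None, None) => Err("could not detect mode; pass --mode".to_string()),
    }
}

fn main() {
    // Auto-detection failed, but the user passed --mode generate.
    println!("{:?}", resolve_mode(None, Some("generate")));
}
```

In practice the string would come from the existing clap-based CLI rather than a hand-rolled parser.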
I'd like to put together a proof of concept for fine-tuning large language models in Rust.
My background is in Rust rather than ML.
So my question is: is this model inference-only, or could I somehow use it for training?
Would that be related to the generic training loop at https://github.com/coreylowman/dfdx/blob/main/examples/generic-train-loop.rs?
Thanks.
Alpaca 7b should have the exact same structure, so as long as you can convert the weights into the same format with convert.py,
it should run out of the box.
Hi there,
First off, awesome work!
I had not set the path to nvcc, so llama-dfdx imploded during the build. It may be worth telling the user explicitly that nvcc could not be found.
ubuntu@instance-20230508-1136:~/repos/dfdx$ cargo clean
ubuntu@instance-20230508-1136:~/repos/dfdx$ nvcc
-bash: nvcc: command not found
ubuntu@instance-20230508-1136:~/repos/dfdx$ cargo build -F cuda
Updating crates.io index
Updating git repository `https://github.com/coreylowman/cudarc`
Updating git repository `https://github.com/starkat99/half-rs.git`
Downloaded cfg-if v1.0.0
Downloaded either v1.8.1
Downloaded num-complex v0.4.3
Downloaded gemm-f64 v0.15.3
Downloaded gemm-c64 v0.15.3
Downloaded dyn-stack v0.9.0
Downloaded ppv-lite86 v0.2.17
Downloaded rand_chacha v0.3.1
Downloaded rand v0.8.5
Downloaded scopeguard v1.1.0
Downloaded seq-macro v0.3.3
Downloaded autocfg v1.1.0
Downloaded bitflags v1.3.2
Downloaded crossbeam-channel v0.5.8
Downloaded crossbeam-epoch v0.9.14
Downloaded rand_core v0.6.4
Downloaded rand_distr v0.4.3
Downloaded rayon-core v1.11.0
Downloaded rayon v1.7.0
Downloaded raw-cpuid v10.7.0
Downloaded memoffset v0.8.0
Downloaded reborrow v0.5.4
Downloaded gemm-c32 v0.15.3
Downloaded gemm v0.15.3
Downloaded bytemuck v1.13.1
Downloaded gemm-f16 v0.15.3
Downloaded gemm-f32 v0.15.3
Downloaded gemm-common v0.15.3
Downloaded half v2.2.1
Downloaded libm v0.2.6
Downloaded lazy_static v1.4.0
Downloaded glob v0.3.1
Downloaded crossbeam-utils v0.8.15
Downloaded crossbeam-deque v0.8.3
Downloaded num_cpus v1.15.0
Downloaded paste v1.0.12
Downloaded libc v0.2.144
Downloaded num-traits v0.2.15
Downloaded 38 crates (1.9 MB) in 0.54s
Compiling autocfg v1.1.0
Compiling crossbeam-utils v0.8.15
Compiling cfg-if v1.0.0
Compiling libm v0.2.6
Compiling libc v0.2.144
Compiling scopeguard v1.1.0
Compiling rayon-core v1.11.0
Compiling paste v1.0.12
Compiling either v1.8.1
Compiling bitflags v1.3.2
Compiling reborrow v0.5.4
Compiling bytemuck v1.13.1
Compiling lazy_static v1.4.0
Compiling seq-macro v0.3.3
Compiling rand_core v0.6.4
Compiling ppv-lite86 v0.2.17
Compiling cudarc v0.9.8 (https://github.com/coreylowman/cudarc?branch=dfdx-half#bb2d7009)
Compiling glob v0.3.1
Compiling raw-cpuid v10.7.0
Compiling dyn-stack v0.9.0
Compiling memoffset v0.8.0
Compiling num-traits v0.2.15
Compiling crossbeam-epoch v0.9.14
Compiling dfdx v0.11.2 (/home/ubuntu/repos/dfdx)
Compiling rand_chacha v0.3.1
Compiling rand v0.8.5
Compiling crossbeam-channel v0.5.8
Compiling num_cpus v1.15.0
error: failed to run custom build command for `dfdx v0.11.2 (/home/ubuntu/repos/dfdx)`
Caused by:
process didn't exit successfully: `/home/ubuntu/repos/dfdx/target/debug/build/dfdx-30e6be024c8b3335/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rustc-env=CUDA_INCLUDE_DIR=/usr/local/cuda/include
cargo:rerun-if-changed=src/tensor_ops/utilities/binary_op_macros.cuh
cargo:rerun-if-changed=src/tensor_ops/utilities/compatibility.cuh
cargo:rerun-if-changed=src/tensor_ops/utilities/cuda_utils.cuh
cargo:rerun-if-changed=src/tensor_ops/utilities/unary_op_macros.cuh
--- stderr
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', build.rs:139:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
ubuntu@instance-20230508-1136:~/repos/dfdx$ locate nvcc
/home/ubuntu/.local/lib/python3.10/site-packages/cmake/data/share/cmake-3.26/Modules/FindCUDA/run_nvcc.cmake
/home/ubuntu/.local/lib/python3.10/site-packages/torch/share/cmake/Caffe2/Modules_CUDA_fix/upstream/FindCUDA/run_nvcc.cmake
/usr/local/cuda-12.1/bin/__nvcc_device_query
/usr/local/cuda-12.1/bin/nvcc
/usr/local/cuda-12.1/bin/nvcc.profile
/usr/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.26/Modules/FindCUDA/run_nvcc.cmake
/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/Modules_CUDA_fix/upstream/FindCUDA/run_nvcc.cmake
/usr/share/doc/cuda-nvcc-12-1
/usr/share/doc/cuda-nvcc-12-1/changelog.Debian.gz
/usr/share/doc/cuda-nvcc-12-1/copyright
/var/cache/apt/archives/cuda-nvcc-12-1_12.1.105-1_amd64.deb
/var/lib/dpkg/info/cuda-nvcc-12-1.list
/var/lib/dpkg/info/cuda-nvcc-12-1.md5sums
ubuntu@instance-20230508-1136:~/repos/dfdx$ export PATH=$PATH:/usr/local/cuda-12.1/bin
ubuntu@instance-20230508-1136:~/repos/dfdx$ cargo build -F cuda
Compiling num-traits v0.2.15
Compiling crossbeam-deque v0.8.3
Compiling dfdx v0.11.2 (/home/ubuntu/repos/dfdx)
Compiling rayon-core v1.11.0
Compiling num-complex v0.4.3
Compiling half v2.2.1
Compiling rand_distr v0.4.3
Compiling rayon v1.7.0
warning: Compiled 48 cuda kernels in 1.152008619s
Compiling gemm-common v0.15.3
Compiling gemm-f32 v0.15.3
Compiling gemm-c32 v0.15.3
Compiling gemm-c64 v0.15.3
Compiling gemm-f64 v0.15.3
Compiling gemm-f16 v0.15.3
Compiling gemm v0.15.3
Finished dev [unoptimized + debuginfo] target(s) in 9.71s
ubuntu@instance-20230508-1136:~/repos/dfdx$
Thank you,
-steve
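The friendlier check suggested in this report could be a small probe in build.rs. `check_tool` is a hypothetical helper and the message wording is invented:

```rust
use std::process::Command;

/// Hypothetical helper: probe for a required build tool up front and return
/// a readable error instead of letting a bare unwrap() panic later.
fn check_tool(name: &str) -> Result<(), String> {
    match Command::new(name).arg("--version").output() {
        Ok(_) => Ok(()),
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => Err(format!(
            "`{name}` not found in PATH; install the CUDA toolkit or add /usr/local/cuda/bin to PATH"
        )),
        Err(e) => Err(format!("failed to run `{name}`: {e}")),
    }
}

fn main() {
    match check_tool("nvcc") {
        Ok(()) => println!("nvcc found"),
        // In a real build.rs this would be a panic (or `cargo:warning=`)
        // carrying this message instead of the raw io::Error.
        Err(msg) => println!("{msg}"),
    }
}
```

Compared with the `Os { code: 2, ... }` panic in the log above, the failure mode at least names the missing tool.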
Running .\target\release\llama-dfdx.exe chat -n=1024
always results in:
error: unexpected argument '-n' found
Usage: llama-dfdx.exe chat
For more information, try '--help'.
The same happens with no subcommand, with generate, and on WSL. Any ideas?
The current implementation counts the number of .bin files to infer which model is in use.
The authoritative implementation of model weight conversion lives in the Hugging Face transformers repository, and it produces two .bin files for the 7B model (as an example).
curl -LO https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
python3 convert_llama_weights_to_hf.py --input_dir llama --model_size 7B --output_dir llama-hf/7B
Fetching all parameters from the checkpoint at llama/7B.
Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|███████████████████████| 33/33 [00:07<00:00, 4.51it/s]
Saving in the Transformers format.
Saving a LlamaTokenizerFast to llama-hf/7B.
Listing the files shows only two bin files:
ubuntu@instance-20230508-1136:/models/llama-hf/7B$ ls
config.json pytorch_model-00001-of-00002.bin tokenizer.json
generation_config.json pytorch_model-00002-of-00002.bin tokenizer.model
lm_head pytorch_model.bin.index.json tokenizer_config.json
model special_tokens_map.json
A simple solution would be to parse config.json for the number of attention heads (in this case, 32).
Here is an example where auto-detection fails:
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model /models/llama-hf/7B generate "Why is pi round?"
thread 'main' panicked at 'Found 2 .bin files in the model directory. Expected 33, 41, or 81.', src/main.rs:129:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
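A sketch of the config.json parse proposed above, using a naive string scan rather than a JSON crate purely for illustration; `num_attention_heads` is a hypothetical helper:

```rust
/// Hypothetical sketch: pull "num_attention_heads" out of config.json with
/// a naive string scan (a real implementation would use a JSON parser).
fn num_attention_heads(config: &str) -> Option<u32> {
    let key = "\"num_attention_heads\"";
    let idx = config.find(key)?;
    let rest = &config[idx + key.len()..];
    let colon = rest.find(':')?;
    let digits: String = rest[colon + 1..]
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

fn main() {
    // Fragment of a LLaMa 7b config.json (values illustrative).
    let config = r#"{ "hidden_size": 4096, "num_attention_heads": 32 }"#;
    match num_attention_heads(config) {
        Some(32) => println!("detected LLaMa 7b"),
        Some(n) => println!("unexpected head count: {n}"),
        None => println!("could not parse config.json"),
    }
}
```

Keying off config.json would make detection robust to however many shards the converter happened to write.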
When overriding the structure, all is good:
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model /models/llama-hf/7B --structure llama7b generate "Why is pi round?"
Model size: 13476 MB
13476 MB of model parameters will be held in RAM.
Why is pi round?
What is the definition of a "natural number"?
What is the smallest natural number?
How to find the smallest natural number in a given range?
What is the 1000th number?
What is the last number?
What is the difference between a decimal number and a rational number?
What is the difference between a natural number and a rational number?
How to calculate the least common multiple?
How to calculate the greatest common factor?
What are the prime numbers?
How do I find the factors of a number?
What is a prime number?
How do I find the
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model /models/llama-hf/7B --structure llama7b generate "Why is pi round?"
Model size: 13476 MB
13476 MB of model parameters will be held in RAM.
Why is pi round?
What is pi, and why is it round?
Pi is a constant number. It has an infinite number of digits, however if you were to list all of the digits they would look like this: 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model /models/llama-hf/7B --structure llama7b generate "Why is pi round?"
Model size: 13476 MB
13476 MB of model parameters will be held in RAM.
Why is pi round?
Thread: Why is pi round?
I've been wondering about this for a while and haven't been able to come up with an answer. I've also checked the internet but it doesn't seem to have an answer. I'm assuming the reason it's "round" is that it repeats the same pattern but I can't seem to find an answer that explains it. Why is it 3.141592653589793 instead of 3.141592653589792653
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$
Thank you,
-steve
When trying to run the program, I get this error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Driver(DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain."))', /home/opfromthestart/.cargo/git/checkouts/dfdx-318e6e5ad83eea79/19da9fe/src/tensor_ops/select_and_gather/mod.rs:155:30
Use cases:
In all these cases, we should be able to detect how much GPU RAM is available and determine the maximum amount of the model to store on the GPU that way. More advanced use cases that share the GPU with other applications may need manual control over the memory, but that can come later.
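A sketch of the budgeting logic, assuming the free-memory figure comes from the driver; `layers_on_gpu` and all sizes are illustrative, not measured:

```rust
/// Hypothetical sketch: given free GPU memory and per-layer weight sizes,
/// decide how many layers can live on the GPU; the rest stay in RAM.
fn layers_on_gpu(free_bytes: u64, layer_bytes: &[u64], reserve_bytes: u64) -> usize {
    let mut budget = free_bytes.saturating_sub(reserve_bytes);
    let mut count = 0;
    for &sz in layer_bytes {
        if sz > budget {
            break;
        }
        budget -= sz;
        count += 1;
    }
    count
}

fn main() {
    // 7B model: 32 transformer layers of ~400 MB each (illustrative numbers).
    let layers = vec![400_000_000u64; 32];
    let free = 8_000_000_000u64;    // 8 GB free reported by the driver
    let reserve = 1_000_000_000u64; // keep 1 GB headroom for activations
    let n = layers_on_gpu(free, &layers, reserve);
    println!("{n} of {} layers fit on GPU", layers.len()); // 17 of 32
}
```

The reserve term matters because activations and workspace buffers also need GPU memory; without it a greedy fill would OOM at the first forward pass.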