

cake's People

Contributors

b0xtch, evilsocket, lukewwww, yaojunluo


cake's Issues

Unable to build without CUDA

Tried on a Debian server and in Termux; the results are the same.

CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true RUST_BACKTRACE=full cargo build --release
warning: /home/dankcat/cake/cake-ios/Cargo.toml: `crate_type` is deprecated in favor of `crate-type` and will not work in the 2024 edition
(in the `cake` library target)
   Compiling cudarc v0.11.7
   Compiling candle-kernels v0.6.0
   Compiling zstd-sys v2.0.12+zstd.1.5.6
   Compiling block-buffer v0.10.4
error: failed to run custom build command for `candle-kernels v0.6.0`

Caused by:
  process didn't exit successfully: `/home/dankcat/cake/target/release/build/candle-kernels-15ec0a2c0042f062/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-changed=src/compatibility.cuh
  cargo:rerun-if-changed=src/cuda_utils.cuh
  cargo:rerun-if-changed=src/binary_op_macros.cuh
  cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
  cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP

  --- stderr
  thread 'main' panicked at /home/dankcat/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:489:18:
  `nvidia-smi` failed. Ensure that you have CUDA installed and that `nvidia-smi` is in your PATH.: Os { code: 2, kind: NotFound, message: "No such file or directory" }
  stack backtrace:
     0:     0x55c93d687785 - std::backtrace_rs::backtrace::libunwind::trace::h1a07e5dba0da0cd2
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
     1:     0x55c93d687785 - std::backtrace_rs::backtrace::trace_unsynchronized::h61b9b8394328c0bc
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
     2:     0x55c93d687785 - std::sys_common::backtrace::_print_fmt::h1c5e18b460934cff
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:68:5
     3:     0x55c93d687785 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1e1a1972118942ad
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:44:22
     4:     0x55c93d6ac13b - core::fmt::rt::Argument::fmt::h07af2b4071d536cd
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/fmt/rt.rs:165:63
     5:     0x55c93d6ac13b - core::fmt::write::hc090a2ffd6b28c4a
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/fmt/mod.rs:1157:21
     6:     0x55c93d68420f - std::io::Write::write_fmt::h8898bac6ff039a23
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/io/mod.rs:1832:15
     7:     0x55c93d68755e - std::sys_common::backtrace::_print::h4e80c5803d4ee35b
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:47:5
     8:     0x55c93d68755e - std::sys_common::backtrace::print::ha96650907276675e
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:34:9
     9:     0x55c93d688a49 - std::panicking::default_hook::{{closure}}::h215c2a0a8346e0e0
    10:     0x55c93d68878d - std::panicking::default_hook::h207342be97478370
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:298:9
    11:     0x55c93d688ee3 - std::panicking::rust_panic_with_hook::hac8bdceee1e4fe2c
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:795:13
    12:     0x55c93d688dc4 - std::panicking::begin_panic_handler::{{closure}}::h00d785e82757ce3c
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:664:13
    13:     0x55c93d687c49 - std::sys_common::backtrace::__rust_end_short_backtrace::h1628d957bcd06996
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:171:18
    14:     0x55c93d688af7 - rust_begin_unwind
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
    15:     0x55c93d5e10f3 - core::panicking::panic_fmt::hdc63834ffaaefae5
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
    16:     0x55c93d5e1546 - core::result::unwrap_failed::h82b551e0ff2b2176
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1654:5
    17:     0x55c93d5f00d8 - core::result::Result<T,E>::expect::h0d780f1427a920a0
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1034:23
    18:     0x55c93d6058fc - bindgen_cuda::compute_cap::h544f29d1dbea88ae
                                 at /home/dankcat/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:485:19
    19:     0x55c93d60216f - <bindgen_cuda::Builder as core::default::Default>::default::hc8d3c33e79e06ed7
                                 at /home/dankcat/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:48:27
    20:     0x55c93d5e2e5f - build_script_build::main::h601c987ee98bf43b
                                 at /home/dankcat/.cargo/registry/src/index.crates.io-6f17d22bba15001f/candle-kernels-0.6.0/build.rs:7:19
    21:     0x55c93d5e270b - core::ops::function::FnOnce::call_once::h3413b6fc62df34af
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
    22:     0x55c93d5e1e6e - std::sys_common::backtrace::__rust_begin_short_backtrace::hbdfe41c52daab1ec
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:155:18
    23:     0x55c93d5e22d1 - std::rt::lang_start::{{closure}}::h51c795f7d1b1d218
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:159:18
    24:     0x55c93d67ead0 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h6abeee5a7794ceb5
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:284:13
    25:     0x55c93d67ead0 - std::panicking::try::do_call::hd6e966bb06877057
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
    26:     0x55c93d67ead0 - std::panicking::try::hc9b3807f5768cb19
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
    27:     0x55c93d67ead0 - std::panic::catch_unwind::h94a757c154076c6e
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
    28:     0x55c93d67ead0 - std::rt::lang_start_internal::{{closure}}::hc5223fb36050c743
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:48
    29:     0x55c93d67ead0 - std::panicking::try::do_call::hddf7b4e1ebeb3f69
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
    30:     0x55c93d67ead0 - std::panicking::try::h1842860a1f941a31
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
    31:     0x55c93d67ead0 - std::panic::catch_unwind::h009016ccf811d4c3
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
    32:     0x55c93d67ead0 - std::rt::lang_start_internal::h3ed4fe7b2f419135
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:20
    33:     0x55c93d5e22aa - std::rt::lang_start::hff6e3b582a875b8d
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:158:17
    34:     0x55c93d5e306e - main
    35:     0x7f40ce51524a - <unknown>
    36:     0x7f40ce515305 - __libc_start_main
    37:     0x55c93d5e1761 - _start
    38:                0x0 - <unknown>
warning: build failed, waiting for other jobs to finish...
error: failed to run custom build command for `cudarc v0.11.7`

Caused by:
  process didn't exit successfully: `/home/dankcat/cake/target/release/build/cudarc-5c6a5152ed8f4c4d/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-env-changed=CUDA_ROOT
  cargo:rerun-if-env-changed=CUDA_PATH
  cargo:rerun-if-env-changed=CUDA_TOOLKIT_ROOT_DIR

  --- stderr
  thread 'main' panicked at /home/dankcat/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/build.rs:55:10:
  Failed to execute `nvcc`: Os { code: 2, kind: NotFound, message: "No such file or directory" }
  stack backtrace:
     0:     0x564532b54cf5 - std::backtrace_rs::backtrace::libunwind::trace::h1a07e5dba0da0cd2
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
     1:     0x564532b54cf5 - std::backtrace_rs::backtrace::trace_unsynchronized::h61b9b8394328c0bc
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
     2:     0x564532b54cf5 - std::sys_common::backtrace::_print_fmt::h1c5e18b460934cff
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:68:5
     3:     0x564532b54cf5 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1e1a1972118942ad
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:44:22
     4:     0x564532b75a2b - core::fmt::rt::Argument::fmt::h07af2b4071d536cd
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/fmt/rt.rs:165:63
     5:     0x564532b75a2b - core::fmt::write::hc090a2ffd6b28c4a
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/fmt/mod.rs:1157:21
     6:     0x564532b5290f - std::io::Write::write_fmt::h8898bac6ff039a23
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/io/mod.rs:1832:15
     7:     0x564532b54ace - std::sys_common::backtrace::_print::h4e80c5803d4ee35b
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:47:5
     8:     0x564532b54ace - std::sys_common::backtrace::print::ha96650907276675e
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:34:9
     9:     0x564532b55d89 - std::panicking::default_hook::{{closure}}::h215c2a0a8346e0e0
    10:     0x564532b55acd - std::panicking::default_hook::h207342be97478370
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:298:9
    11:     0x564532b56223 - std::panicking::rust_panic_with_hook::hac8bdceee1e4fe2c
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:795:13
    12:     0x564532b56104 - std::panicking::begin_panic_handler::{{closure}}::h00d785e82757ce3c
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:664:13
    13:     0x564532b551b9 - std::sys_common::backtrace::__rust_end_short_backtrace::h1628d957bcd06996
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:171:18
    14:     0x564532b55e37 - rust_begin_unwind
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
    15:     0x564532b25f53 - core::panicking::panic_fmt::hdc63834ffaaefae5
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
    16:     0x564532b26366 - core::result::unwrap_failed::h82b551e0ff2b2176
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1654:5
    17:     0x564532b2c438 - core::result::Result<T,E>::expect::h33784a2d338b94a7
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1034:23
    18:     0x564532b316f6 - build_script_build::cuda_version_from_build_system::h4a38442c7c737c00
                                 at /home/dankcat/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/build.rs:52:18
    19:     0x564532b3133a - build_script_build::main::h77dc56d88b14ee07
                                 at /home/dankcat/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/build.rs:37:34
    20:     0x564532b2e5cb - core::ops::function::FnOnce::call_once::h2274ad654a6bbd1b
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
    21:     0x564532b340fe - std::sys_common::backtrace::__rust_begin_short_backtrace::hff1eff237bf98703
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:155:18
    22:     0x564532b2b3d1 - std::rt::lang_start::{{closure}}::h214b04bede10fd10
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:159:18
    23:     0x564532b4f850 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h6abeee5a7794ceb5
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:284:13
    24:     0x564532b4f850 - std::panicking::try::do_call::hd6e966bb06877057
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
    25:     0x564532b4f850 - std::panicking::try::hc9b3807f5768cb19
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
    26:     0x564532b4f850 - std::panic::catch_unwind::h94a757c154076c6e
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
    27:     0x564532b4f850 - std::rt::lang_start_internal::{{closure}}::hc5223fb36050c743
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:48
    28:     0x564532b4f850 - std::panicking::try::do_call::hddf7b4e1ebeb3f69
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
    29:     0x564532b4f850 - std::panicking::try::h1842860a1f941a31
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
    30:     0x564532b4f850 - std::panic::catch_unwind::h009016ccf811d4c3
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
    31:     0x564532b4f850 - std::rt::lang_start_internal::h3ed4fe7b2f419135
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:20
    32:     0x564532b2b3aa - std::rt::lang_start::ha16ce9452477e973
                                 at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:158:17
    33:     0x564532b331fe - main
    34:     0x7fec3bef624a - <unknown>
    35:     0x7fec3bef6305 - __libc_start_main
    36:     0x564532b26541 - _start
    37:                0x0 - <unknown>
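Both build failures come down to the `candle-kernels` and `cudarc` build scripts shelling out to `nvidia-smi` and `nvcc`, which are absent on a machine without the CUDA toolkit. As a rough illustration, a small pre-flight check (a hypothetical helper, pure stdlib — not part of cake) could surface this before `cargo build` fails deep inside a dependency:

```python
import shutil

def missing_cuda_tools(which=shutil.which):
    """Return the CUDA build tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without CUDA installed.
    """
    required = ["nvcc", "nvidia-smi"]
    return [tool for tool in required if which(tool) is None]

if __name__ == "__main__":
    missing = missing_cuda_tools()
    if missing:
        print("CUDA build will fail: missing " + ", ".join(missing))
    else:
        print("CUDA toolchain found")
```

Running this on the machine above would report both tools missing, matching the two panics in the logs.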

About the reason for having cluster nodes

Thanks for your valuable contribution.
I have a question that needs some clarification; it would probably also be worth mentioning in the README for clarity.
From my basic understanding, cake splits the model into its layers and distributes those layers to separate nodes, because a huge 70B model will not fit into a single ordinary GPU. So what is the benefit of having a cluster of these nodes on the network, instead of a single worker that loads and offloads each layer of the model one by one? My understanding is that model inference is sequential, so one node has to wait for the previous layers to finish before starting its own work; multiple nodes would therefore appear redundant, unless there is some pipelining mechanism that feeds batches to the nodes one at a time. Is that the intention here? Could you please provide some guidance and explanation? Thanks again.
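One rough way to see the benefit of multiple nodes is pipelining: once each worker holds its layer shard resident in memory, successive batches can overlap across workers, whereas a single worker pays a load/offload cost for every layer of every batch. The toy schedule model below is a sketch under assumed uniform costs, not cake's actual scheduler:

```python
def sequential_time(layers, batches, t_compute, t_reload):
    # One worker, one layer in memory at a time: every batch pays a
    # reload plus a compute for every layer.
    return batches * layers * (t_compute + t_reload)

def pipelined_time(layers, batches, workers, t_compute):
    # Each worker keeps layers/workers layers resident (no reloads).
    # Batches flow through the pipeline: after the fill phase, one
    # batch finishes every stage_time units.
    stage_time = (layers // workers) * t_compute
    return (workers + batches - 1) * stage_time

# Example: 80 layers, 16 batches, compute cost 1 per layer, reload cost 9.
print(sequential_time(80, 16, 1, 9))   # single worker with per-layer reloads
print(pipelined_time(80, 16, 8, 1))    # 8 pipelined workers, layers resident
```

Even for a single request stream, the pipelined cluster wins simply by never re-streaming weights; with multiple in-flight batches the fill/drain overhead amortizes further.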

[Feature request] Will cake be able to train models in the future?

Cake is very useful for me, as it can make full use of all my devices and share their memory (which allows me to run larger models).
However, it seems that cake currently cannot be used to train LoRA models...
Will this feature be added in the future? That would be even more useful (training uses far more resources).
Since cake depends on candle, there is candle-lora, which allows people to train models with candle.

(And, to save as much memory as possible, will cake add support for GGUF and 4-bit models (QLoRA)?)

Building on Ubuntu errors with `cuMemAdvise_v2` on CUDA 12.1

Compiling tracing-core v0.1.32
error[E0599]: no method named `cuMemAdvise_v2` found for reference `&'static driver::sys::sys_12010::Lib` in the current scope
   --> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/result.rs:613:10
    |
612 | /     lib()
613 | |         .cuMemAdvise_v2(dptr, num_bytes, advice, location)
    | |_________-^^^^^^^^^^^^^^
    |
help: there is a method `cuMemAdvise` with a similar name
    |
613 |         .cuMemAdvise(dptr, num_bytes, advice, location)
    |          ~~~~~~~~~~~

error[E0599]: no method named `cuMemPrefetchAsync_v2` found for reference `&'static driver::sys::sys_12010::Lib` in the current scope
     --> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/result.rs:628:10
      |
627   | /     lib()
628   | |         .cuMemPrefetchAsync_v2(dptr, num_bytes, location, 0, stream)
      | |_________-^^^^^^^^^^^^^^^^^^^^^
      |
help: there is a method `cuMemPrefetchAsync` with a similar name, but with different arguments
     --> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.7/src/driver/sys/sys_12010.rs:13548:5
      |
13548 | /     pub unsafe fn cuMemPrefetchAsync(
13549 | |         &self,
13550 | |         devPtr: CUdeviceptr,
13551 | |         count: usize,
13552 | |         dstDevice: CUdevice,
13553 | |         hStream: CUstream,
13554 | |     ) -> CUresult {
      | |_________________^

PTX code was compiled with an unsupported toolchain

Hello, I have run into a new problem while using cake.
The command I ran:

RUST_LOG=debug CUDA_VISIBLE_DEVICES=2 ./cake-cli --model /data1/pre_trained_model/Llama-3-8B-Instruct --topology /sdc/jky/cake/topology.yml

The error is as follows:

[2024-07-17T06:24:01Z DEBUG] device is cuda 0
[2024-07-17T06:24:01Z INFO ] [Master] dtype=F16 device=Cuda(CudaDevice(DeviceId(1))) mem=220.7 MiB
[2024-07-17T06:24:01Z INFO ] loading configuration from /data1/pre_trained_model/Llama-3-8B-Instruct/config.json
[2024-07-17T06:24:01Z INFO ] loading topology from /sdc/jky/cake/topology.yml
[2024-07-17T06:24:01Z DEBUG] cache::n_elem = 128
[2024-07-17T06:24:01Z DEBUG] cache::theta = [ 1.0000e0, 8.1462e-1, 6.6360e-1, 5.4058e-1, 4.4037e-1, 3.5873e-1, 2.9223e-1,
     2.3805e-1, 1.9392e-1, 1.5797e-1, 1.2869e-1, 1.0483e-1, 8.5397e-2, 6.9566e-2,
     5.6670e-2, 4.6164e-2, 3.7606e-2, 3.0635e-2, 2.4955e-2, 2.0329e-2, 1.6560e-2,
     1.3490e-2, 1.0990e-2, 8.9523e-3, 7.2927e-3, 5.9407e-3, 4.8394e-3, 3.9423e-3,
     3.2114e-3, 2.6161e-3, 2.1311e-3, 1.7360e-3, 1.4142e-3, 1.1520e-3, 9.3847e-4,
     7.6450e-4, 6.2277e-4, 5.0732e-4, 4.1327e-4, 3.3666e-4, 2.7425e-4, 2.2341e-4,
     1.8199e-4, 1.4825e-4, 1.2077e-4, 9.8381e-5, 8.0143e-5, 6.5286e-5, 5.3183e-5,
     4.3324e-5, 3.5292e-5, 2.8750e-5, 2.3420e-5, 1.9078e-5, 1.5542e-5, 1.2660e-5,
     1.0313e-5, 8.4015e-6, 6.8440e-6, 5.5752e-6, 4.5417e-6, 3.6997e-6, 3.0139e-6,
     2.4551e-6]
    Tensor[[64], f32, cuda:0]
Error: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain.") when loading cast_u32_f32

The second request fails with an error

Hello, the first request produces normal output, but the second request fails with an error and the master node's service also terminates.
Worker node command:

CUDA_VISIBLE_DEVICES=3 ./cake-cli --model /sdc/pre_trained_model/Llama3-Chinese-8B-Instruct --mode worker --name worker0 --topology /sdc/jky/cake/topology.yml --address 0.0.0.0:10128

Master node command:

CUDA_VISIBLE_DEVICES=3,4,5,6,7 ./cake-cli --model /home/pre_trained_model/Llama3-Chinese-8B-Instruct --api 0.0.0.0:8080 --topology /home/jky/cake/topology.yml

The error is as follows:

thread 'tokio-runtime-worker' panicked at /sdc/jky/cake/cake-core/src/cake/worker.rs:215:26:
called `Result::unwrap()` on an `Err` value: cannot broadcast [29, 29] to [1, 32, 29, 170]
   0: candle_core::error::Error::bt
   1: candle_core::layout::Layout::broadcast_as
   2: candle_core::tensor::Tensor::broadcast_as
   3: cake_core::models::llama3::cache::Cache::apply_attention_mask
   4: cake_core::models::llama3::attention::CausalSelfAttention::forward
   5: <cake_core::models::llama3::transformer::Transformer as cake_core::cake::Forwarder>::forward::{{closure}}
   6: cake_core::cake::worker::Worker<G>::run::{{closure}}::{{closure}}
   7: tokio::runtime::task::core::Core<T,S>::poll
   8: tokio::runtime::task::harness::Harness<T,S>::poll
   9: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  10: tokio::runtime::scheduler::multi_thread::worker::Context::run
  11: tokio::runtime::context::set_scheduler
  12: tokio::runtime::context::runtime::enter_runtime
  13: tokio::runtime::scheduler::multi_thread::worker::run
  14: tokio::runtime::task::core::Core<T,S>::poll
  15: tokio::runtime::task::harness::Harness<T,S>::poll
  16: tokio::runtime::blocking::pool::Inner::run
  17: std::sys_common::backtrace::__rust_begin_short_backtrace
  18: core::ops::function::FnOnce::call_once{{vtable.shim}}
  19: std::sys::pal::unix::thread::Thread::new::thread_start
  20: <unknown>
  21: <unknown>


Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: <cake_core::models::llama3::transformer::Transformer as cake_core::cake::Forwarder>::forward::{{closure}}
   2: cake_core::cake::worker::Worker<G>::run::{{closure}}::{{closure}}
   3: tokio::runtime::task::core::Core<T,S>::poll
   4: tokio::runtime::task::harness::Harness<T,S>::poll
   5: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   6: tokio::runtime::scheduler::multi_thread::worker::Context::run
   7: tokio::runtime::context::set_scheduler
   8: tokio::runtime::context::runtime::enter_runtime
   9: tokio::runtime::scheduler::multi_thread::worker::run
  10: tokio::runtime::task::core::Core<T,S>::poll
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: tokio::runtime::blocking::pool::Inner::run
  13: std::sys_common::backtrace::__rust_begin_short_backtrace
  14: core::ops::function::FnOnce::call_once{{vtable.shim}}
  15: std::sys::pal::unix::thread::Thread::new::thread_start
  16: <unknown>
  17: <unknown>
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: cake_core::cake::worker::Worker<G>::run::{{closure}}::{{closure}}
   4: tokio::runtime::task::core::Core<T,S>::poll
   5: tokio::runtime::task::harness::Harness<T,S>::poll
   6: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   7: tokio::runtime::scheduler::multi_thread::worker::Context::run
   8: tokio::runtime::context::set_scheduler
   9: tokio::runtime::context::runtime::enter_runtime
  10: tokio::runtime::scheduler::multi_thread::worker::run
  11: tokio::runtime::task::core::Core<T,S>::poll
  12: tokio::runtime::task::harness::Harness<T,S>::poll
  13: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
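The panic above is a tensor broadcasting failure: candle follows the same trailing-dimension broadcasting rule as numpy, and a [29, 29] attention mask cannot be broadcast to [1, 32, 29, 170] because the last dimensions (29 vs 170) disagree — the mask was built for the prompt length while the KV cache has grown across requests. A pure-Python sketch of the rule (an illustration, not candle's code):

```python
def broadcastable_to(shape, target):
    """numpy/candle-style rule: right-align the shapes; each source
    dimension must equal the target dimension or be 1."""
    if len(shape) > len(target):
        return False
    for s, t in zip(reversed(shape), reversed(target)):
        if s != t and s != 1:
            return False
    return True

# The failing case from the panic: mask [29, 29] vs scores [1, 32, 29, 170].
print(broadcastable_to((29, 29), (1, 32, 29, 170)))   # the mismatch
print(broadcastable_to((29, 170), (1, 32, 29, 170)))  # a compatible mask shape
```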

Bug with tokenizer and gibberish output

The tokenizer has issues resolving a few tokens, including special ones (they will be shown in the output as ), which causes all sorts of gibberish output... it's probably a matter of parsing the model's tokenizer.json properly.

Error in model.forward: error in forward batch operation for block

The first time I call the API, it works fine. However, when I call the REST API for the second time, the master node reports the error:
cake/api/mod.rs:98:10: called Result::unwrap() on an Err value: error in model.forward: error in forward batch operation for block 29: error receiving response for Batch
Additionally, one of my workers will also trigger an error and then stop:
src/cake/worker.rs:225:26: called Result::unwrap() on an Err value: cannot broadcast [28, 28] to [1, 32, 28, 65]

Error details for the second request


You encountered an error message displaying a stack backtrace, but the specific error details are not clearly shown. This typically indicates that the program encountered an unhandled error or crashed. The entries in the backtrace suggest that specific symbol information is not available, possibly due to missing debug symbols or optimizations that removed detailed information.

Thanks for the FOSS! Suggestion for future possible backend runtimes: Vulkan, OpenCL, SYCL/OpenVINO/Intel GPU, AMD GPU/ROCm/HIP

Thanks for the FOSS!

Suggestion for future possible backend runtimes: Vulkan, OpenCL, SYCL/OpenVINO/Intel GPU, AMD GPU/ROCm/HIP.

Vulkan and OpenCL are both potentially very portable across GPUs, and to some extent across CPUs that have supporting software.

SYCL can run on various CPU/GPU platforms; together with OpenVINO it is the primary target for supporting Intel GPUs.

May I ask why I am unable to download the model and use it through Hugging Face?

root@llama01:/www/cake# /www/cake/target/release/cake-cli --model /www/llama --mode worker --name linux_server_1 --address 0.0.0.0:9527 --topology /www/cake/topology.yml
[2024-08-08T16:11:12Z INFO ] [Worker] dtype=F16 device=Cpu mem=5.3 MiB
[2024-08-08T16:11:12Z INFO ] loading configuration from /www/llama/config.json
[2024-08-08T16:11:12Z INFO ] loading topology from /www/cake/topology.yml
[2024-08-08T16:11:12Z INFO ] loading tensors in /www/llama/model.safetensors.index.json
[2024-08-08T16:11:12Z INFO ] loading tensors from /www/llama/model.safetensors.index.json ...
[2024-08-08T16:11:12Z INFO ] loading model-00002-of-00004.safetensors ...
Error: cannot find tensor model-00002-of-00004.safetensors.self_attn.q_proj.weight
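For context, `model.safetensors.index.json` contains a `weight_map` from tensor names to the shard files that hold them. The error above shows a shard filename (`model-00002-of-00004.safetensors`) glued onto a tensor-name suffix, which suggests a name was assembled from the wrong side of that mapping. A minimal sketch of the correct lookup direction, using a hypothetical miniature index (the real file maps hundreds of tensors):

```python
import json

# A miniature stand-in for model.safetensors.index.json: "weight_map"
# maps each full tensor name to the shard file containing it.
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.0.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00002-of-00004.safetensors"
  }
}
"""

def shard_for_tensor(index_text, tensor_name):
    """Look up which shard file holds a tensor; raises KeyError for
    malformed names like the one in the error above."""
    weight_map = json.loads(index_text)["weight_map"]
    return weight_map[tensor_name]
```

Note that `model-00002-of-00004.safetensors.self_attn.q_proj.weight` is not a valid key in such a map, which is consistent with the "cannot find tensor" failure.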

Inquiry about the possibility of supporting Windows

Hello developers, I came across this project and it's really awesome. I have been struggling with a lack of device performance and don't have much money to buy an A100 (I need to buy milk powder for my child, haha). I'd like to ask whether there is any intention of supporting Windows: I see that Mac, Linux, and Android are all supported, but we mainly run Windows 7 here and have six computers, so a cluster that supports Windows would be great.

Dockerfile support

I have successfully compiled this project with Docker and am willing to share with anyone struggling to do the same.

Since this software is in alpha, I suggest the author use this as a reference and build an official Docker image for the project, before moving on to static linking and an AppImage.

The filesystem structure is:

├── build.sh # build script
├── cake # cloned repository
├── cargo_config.toml # cargo mirror config
├── Dockerfile_intermediate # building intermediate image
└── run.sh # run the final container

Content of build.sh:

INTERMEDIATE_IMAGE_NAME=cake_llm_intermediate
IMAGE_NAME=cake_llm

INTERMEDIATE_CONTAINER_NAME=cake_container_intermediate
CONTAINER_NAME=cake_container

git clone https://github.com/evilsocket/cake

docker kill $CONTAINER_NAME
docker rm $CONTAINER_NAME
docker rmi $INTERMEDIATE_IMAGE_NAME

docker build -t $INTERMEDIATE_IMAGE_NAME -f Dockerfile_intermediate .


read -p "Do you want to continue? (y/n): " answer

case $answer in
    [Yy]* ) echo "You chose yes.";;
    [Nn]* ) echo "You chose no."; exit 1;;
    * ) echo "Please answer yes or no."; exit 1;;
esac

docker kill $INTERMEDIATE_CONTAINER_NAME
docker rm $INTERMEDIATE_CONTAINER_NAME

docker rmi $IMAGE_NAME
docker run -d --privileged --gpus 1 --name $INTERMEDIATE_CONTAINER_NAME $INTERMEDIATE_IMAGE_NAME tail -f /dev/null
docker exec -w /root/cake $INTERMEDIATE_CONTAINER_NAME cargo build
docker commit $INTERMEDIATE_CONTAINER_NAME $IMAGE_NAME 

docker kill $INTERMEDIATE_CONTAINER_NAME
docker rm $INTERMEDIATE_CONTAINER_NAME

Content of Dockerfile_intermediate:

FROM nvidia/cuda:12.4.0-base-ubuntu22.04

RUN rm /etc/apt/apt.conf.d/docker-clean
RUN apt update
RUN apt install -y build-essential curl

RUN apt install -y cuda-nvcc-12-4 cuda-nvrtc-dev-12-4 libcublas-dev-12-4 libcurand-dev-12-4

RUN apt install -y cargo

COPY cake /root/cake

COPY cargo_config.toml /root/.cargo/config.toml

Content of run.sh:

IMAGENAME=cake_llm
CONTAINER_NAME=cake_container

docker kill $CONTAINER_NAME
docker rm $CONTAINER_NAME

MODEL_PATH=/root/data/Meta-Llama-3-8B-Instruct
TOPOFILE=/root/data/topology.yaml

docker run -it --rm --mount type=bind,source=<source_path>,target=/root/data,ro -e LD_LIBRARY_PATH=/usr/local/cuda-12.4/targets/x86_64-linux/lib/ --name $CONTAINER_NAME --privileged --gpus 1 $IMAGENAME /root/cake/target/debug/cake-cli --model $MODEL_PATH --topology $TOPOFILE 

Unable to compile successfully

C:\Users\Administrator\Desktop\cake>cargo build --release
warning: C:\Users\Administrator\Desktop\cake\cake-ios\Cargo.toml: crate_type is deprecated in favor of crate-type and will not work in the 2024 edition
(in the cake library target)
Compiling cudarc v0.11.7
Compiling candle-kernels v0.6.0
Compiling clap_lex v0.7.1
Compiling bit-vec v0.6.3
Compiling strsim v0.11.1
Compiling nom v7.1.3
Compiling console v0.15.8
Compiling esaxx-rs v0.1.10
error: failed to run custom build command for cudarc v0.11.7
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
process didn't exit successfully: C:\Users\Administrator\Desktop\cake\target\release\build\cudarc-95f6bdd5c33de08a\build-script-build (exit code: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rerun-if-env-changed=CUDA_ROOT
cargo:rerun-if-env-changed=CUDA_PATH
cargo:rerun-if-env-changed=CUDA_TOOLKIT_ROOT_DIR

--- stderr
thread 'main' panicked at C:\Users\Administrator\.cargo\registry\src\index.crates.io-6f17d22bba15001f\cudarc-0.11.7\build.rs:82:14:
Unsupported cuda toolkit version: 11.0. Please raise a github issue.
stack backtrace:
0: std::panicking::begin_panic_handler
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library\std\src\panicking.rs:652
1: core::panicking::panic_fmt
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library\core\src\panicking.rs:72
2: <alloc::vec::Vec as core::iter::traits::collect::FromIterator>::from_iter
3: <alloc::vec::Vec as core::iter::traits::collect::FromIterator>::from_iter
4: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
warning: build failed, waiting for other jobs to finish...
error: failed to run custom build command for candle-kernels v0.6.0
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
process didn't exit successfully: C:\Users\Administrator\Desktop\cake\target\release\build\candle-kernels-644872f2b8f06ed1\build-script-build (exit code: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rerun-if-changed=src/compatibility.cuh
cargo:rerun-if-changed=src/cuda_utils.cuh
cargo:rerun-if-changed=src/binary_op_macros.cuh
cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP

--- stderr
thread 'main' panicked at C:\Users\Administrator\.cargo\registry\src\index.crates.io-6f17d22bba15001f\bindgen_cuda-0.1.5\src\lib.rs:492:9:
assertion `left == right` failed
  left: "Field \"compute_cap\" is not a valid field to query."
 right: "compute_cap"
stack backtrace:
0: std::panicking::begin_panic_handler
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library\std\src\panicking.rs:652
1: core::panicking::panic_fmt
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library\core\src\panicking.rs:72
2: core::panicking::assert_failed_inner
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library\core\src\panicking.rs:409
3: core::panicking::assert_failed
4: bindgen_cuda::cuda_include_dir::{{closure}}
5: <bindgen_cuda::Builder as core::default::Default>::default
6: std::rt::lang_start
7: std::rt::lang_start
8: __ImageBase
9: std::rt::lang_start
10: std::rt::lang_start_internal
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library\std\src\rt.rs:141
11: std::rt::lang_start
12: main
13: invoke_main
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
14: __scrt_common_main_seh
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
15: BaseThreadInitThunk
16: RtlUserThreadStart
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
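This second panic comes from `bindgen_cuda` asking `nvidia-smi` for the `compute_cap` field, which older drivers (such as the 457.30 driver shown below) do not support. Since the build script watches the `CUDA_COMPUTE_CAP` environment variable (see the `rerun-if-env-changed` line above), a plausible workaround is to supply the capability yourself. A minimal sketch, assuming a POSIX shell, that bindgen_cuda accepts the dotless form (e.g. `61`), and that 6.1 is the right capability for a GTX 1080-class card:

```shell
# Sketch: choose a compute capability for the build, preferring nvidia-smi's
# answer and falling back to a caller-supplied value when the driver's
# nvidia-smi is too old to support the compute_cap query field.
detect_compute_cap() {
  fallback="$1"
  # nvidia-smi reports e.g. "6.1"; strip the dot to get the "61" form.
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null \
        | head -n 1 | tr -d '. ')
  case "$cap" in
    [0-9][0-9]*) printf '%s\n' "$cap" ;;      # driver answered the query
    *)           printf '%s\n' "$fallback" ;; # query unsupported or no GPU
  esac
}

# 61 (i.e. 6.1) is an assumption for a GTX 1080-class card; adjust for your GPU.
CUDA_COMPUTE_CAP=$(detect_compute_cap 61)
export CUDA_COMPUTE_CAP
echo "building with CUDA_COMPUTE_CAP=$CUDA_COMPUTE_CAP"
```

With the variable exported, re-running `cargo build --release` should let the build script skip the failing `nvidia-smi` query.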

C:\Users\Administrator\Desktop\cake>nvidia-smi
Tue Jul 16 01:15:46 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 457.30 Driver Version: 457.30 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... WDDM | 00000000:01:00.0 On | N/A |
|100% 29C P8 16W / 250W | 555MiB / 11264MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1340 C+G Insufficient Permissions N/A |
| 0 N/A N/A 12420 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 12928 C+G ...m Files\ToDesk\ToDesk.exe N/A |
| 0 N/A N/A 13352 C+G ...artMenuExperienceHost.exe N/A |
| 0 N/A N/A 13940 C+G ...d\runtime\WeChatAppEx.exe N/A |
| 0 N/A N/A 14476 C+G ...y\ShellExperienceHost.exe N/A |
| 0 N/A N/A 15492 C+G ...2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 17964 C+G ...ray\lghub_system_tray.exe N/A |
| 0 N/A N/A 18256 C+G ...e\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 18744 C+G ...5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 19444 C+G ...lPanel\SystemSettings.exe N/A |
+-----------------------------------------------------------------------------+

C:\Users\Administrator\Desktop\cake>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:48_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.relgpu_drvr445TC445_37.28540450_0
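The first panic is simpler: `cudarc` rejects the installed toolkit outright, because `nvcc` reports release 11.0, older than what cudarc 0.11 accepts. Before fighting the build, it can help to check the banner programmatically. A sketch assuming a POSIX shell and an assumed minimum of 11.4 (the exact floor should be checked against cudarc's release notes):

```shell
# Sketch: parse "release X.Y" out of an nvcc banner and compare it to an
# assumed minimum supported toolkit version (11.4 here).
cuda_ok() {
  banner="$1"
  major=$(printf '%s' "$banner" | sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p')
  minor=$(printf '%s' "$banner" | sed -n 's/.*release [0-9]*\.\([0-9][0-9]*\).*/\1/p')
  [ "$major" -gt 11 ] || { [ "$major" -eq 11 ] && [ "$minor" -ge 4 ]; }
}

banner="Cuda compilation tools, release 11.0, V11.0.194"
if cuda_ok "$banner"; then
  echo "toolkit is new enough"
else
  echo "toolkit too old: upgrade the CUDA toolkit before building"
fi
```

For the 11.0 banner above this prints the "too old" branch; note the driver here only advertises CUDA 11.1, so a newer toolkit will likely also require a driver update.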

The specified file cannot be found

The model path is correct; I can't tell which file this error says is missing.

[Worker] dtype=F16 device=Cuda(CudaDevice(DeviceId(1))) mem=207.4 MiB
 loading topology from topology.yml
loading configuration from /sdc/pre_trained_model/Llama3-Chinese-8B-Instruct/config.json
Error: No such file or directory (os error 2)
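The OS error carries no filename, but the log shows `config.json` was already being loaded, so the missing file is likely one of the other artifacts the loader opens next (tokenizer, weight index or shards). A quick hedged check, with assumed filenames for a safetensors Llama checkpoint:

```shell
# Sketch: report which of the files a safetensors Llama checkpoint usually
# contains are missing from the model directory. The filenames are
# assumptions; compare against what the checkpoint actually ships.
MODEL_PATH=/sdc/pre_trained_model/Llama3-Chinese-8B-Instruct
for f in config.json tokenizer.json model.safetensors.index.json; do
  [ -e "$MODEL_PATH/$f" ] || echo "missing: $MODEL_PATH/$f"
done
```

Whatever this prints as missing is a good candidate for the file behind "os error 2".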

Is it possible to use quantized models?

First of all, I want to thank you for your hard work. I love this project and I think it's awesome to be able to handle inference on different devices.
As for me, the point of splitting a model among different devices lies in my current RAM limitations, so I guess it would make much more sense to be able to use quantized versions of the big models.
