candle-lora's Introduction

Eric Buehler

Libraries

  • Trc
    • A faster Arc: a Rust thread reference counted smart pointer.
  • FlexRc
    • Fast, flexible, and safe reference counted type.

Programming Languages

  • Kestrel
    • Simple and safe - a statically and strongly typed programming language that compiles to LLVM.
  • Falcon Programming Language
    • An interpreted programming language built using C++ that is very similar to Python.
  • Merlin
    • A dynamically and strongly typed programming language built in Rust that does not have a GIL and unleashes the power of multithreading.

Machine Learning Libraries

Machine Learning Applications

  • PerceiverIO Classifier
    • An MNIST classifier built using PerceiverIO.
  • The Neuron
    • A fully autonomous robot, with a detailed build guide.
  • CodingGPT
    • A GPT-like autoregressive, decoder-only transformer trained on data scraped from Reddit programming and science subreddits.

Mathematics

  • Automatic Differentiation
    • My personal implementation of automatic differentiation, a method that powers ML and is used to calculate derivatives.
  • Proofling
    • An automatic proof checker. Just plug in truth statements, and it will check for inconsistencies.

candle-lora's People

Contributors

ericlbuehler, getong

candle-lora's Issues

Is there any way to save lora-converted model?

I tried to fine-tune TinyLlama with this crate. I used candle-lora/candle-lora-transformers/examples/llama.rs to load model.safetensors and ran my training, but eventually found that there is no way to save the model in safetensors format.

I tried to implement a save method myself by wrapping candle_core::safetensors::save(), but how can I get the weights of the LoRA part? All I can get is the raw model from before it was converted to a LoRA model.

For example, if you run /candle-lora-macro/examples/linear.rs and add println!("{:?}", model.a);, you will see it printed as a Linear struct, not a LoraLinear struct, and you cannot get ff_a / ff_b from model.a, even though the model has been converted to a LoRA model.
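One possible workaround, assuming training is driven through a candle_nn::VarMap (as in the candle examples) and that the adapter tensors carry "lora" in their names (e.g. lora_llama.a0 / lora_llama.b0 as reported in a later issue): filter the VarMap and save only those tensors. This is a minimal sketch, not an official API of this crate.

use std::collections::HashMap;
use candle_core::{safetensors, Result, Tensor};
use candle_nn::VarMap;

// Sketch: save only the LoRA adapter tensors tracked by the VarMap used for
// training. Assumes the adapter tensors have "lora" in their names.
fn save_lora_weights(varmap: &VarMap, path: &str) -> Result<()> {
    let data = varmap.data().lock().unwrap();
    let lora_tensors: HashMap<String, Tensor> = data
        .iter()
        .filter(|(name, _)| name.contains("lora"))
        .map(|(name, var)| (name.clone(), var.as_tensor().clone()))
        .collect();
    safetensors::save(&lora_tensors, path)
}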

Model Merging

Hello Eric,

This issue may not be relevant to this repo, but it seems like model merging is gathering some speed. Have you seen any examples? Any tips on how to implement this in candle?

Thanks!
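For reference, the simplest merging strategy (uniform weight averaging of two checkpoints that share tensor names and shapes) is straightforward to sketch with candle's safetensors helpers; fancier schemes such as SLERP or TIES would replace the per-tensor combine step. File paths below are placeholders.

use std::collections::HashMap;
use candle_core::{safetensors, Device, Result, Tensor};

// Sketch: uniform averaging of two checkpoints with identical tensor names/shapes.
fn average_checkpoints(path_a: &str, path_b: &str, out: &str) -> Result<()> {
    let dev = Device::Cpu;
    let a = safetensors::load(path_a, &dev)?;
    let b = safetensors::load(path_b, &dev)?;
    let mut merged: HashMap<String, Tensor> = HashMap::new();
    for (name, ta) in a.iter() {
        // Assumes both files contain exactly the same tensor names.
        let tb = &b[name];
        merged.insert(name.clone(), ((ta + tb)? * 0.5)?);
    }
    safetensors::save(&merged, out)
}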

Updated candle-core, candle-nn [0.5.0] release breaks installation of candle-lora and candle-lora-macro dependencies

When I try the following, I get an error.

[dependencies]
candle-core = { version = "0.4.1", features = ["metal"] }
candle-metal = "0.27.1"
candle-nn = { version = "0.4.1" }

cargo add -v --git https://github.com/EricLBuehler/candle-lora.git candle-lora candle-lora-macro
    Updating git repository `https://github.com/EricLBuehler/candle-lora.git`
      Adding candle-lora (git) to dependencies.
             Features:
             - accelerate
             - cuda
             - cudarc
             - cudnn
             - flash-attn
             - mkl
             - nccl
      Adding candle-lora-macro (git) to dependencies.
             Features:
             - accelerate
             - cuda
             - cudarc
             - cudnn
             - flash-attn
             - mkl
             - nccl
    Updating git repository `https://github.com/EricLBuehler/candle-lora.git`
    Updating crates.io index
    Updating git repository `https://github.com/huggingface/candle.git`
error: failed to select a version for the requirement `candle-core = "^0.4.1"`
candidate versions found which didn't match: 0.5.0
location searched: Git repository https://github.com/huggingface/candle.git
required by package `candle-lora v0.2.0 (https://github.com/EricLBuehler/candle-lora.git#a7ea48a7)`

I can resolve this by adding:

[patch."https://github.com/huggingface/candle.git"]
candle-core = { version = "0.4.1" }
candle-nn = { version = "0.4.1" }

to my Cargo.toml. Just thought you should know that the candle-core 0.5.0 release complicates adding this crate to a project if you are just following the instructions in this repo.
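Putting the pieces together, a complete Cargo.toml for this workaround might look like the following. This is a sketch assuming the metal feature set quoted above; the version numbers will need to track whatever candle-lora currently pins.

[dependencies]
candle-core = { version = "0.4.1", features = ["metal"] }
candle-nn = "0.4.1"
candle-lora = { git = "https://github.com/EricLBuehler/candle-lora.git" }
candle-lora-macro = { git = "https://github.com/EricLBuehler/candle-lora.git" }

# Force the git dependency on huggingface/candle back to the 0.4.1 crates.io
# release so it matches the version candle-lora was built against.
[patch."https://github.com/huggingface/candle.git"]
candle-core = { version = "0.4.1" }
candle-nn = { version = "0.4.1" }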

Thanks
Andy

replace_layer_fields and AutoLoraConvert not working as expected

The proc-macro does not work as expected:

From running:

cd candle-lora-macro/examples
cargo expand --test linear
Output (only the `Model` impl):
impl Model {
    /// Be sure to provide a configuration for each type!
    pub fn get_lora_model<'a>(
        &'a mut self,
        lora_config: candle_lora::LoraConfig,
        vb: &candle_nn::VarBuilder,
        linear_config: Option<candle_lora::LoraLinearConfig>,
        conv1d_config: Option<candle_lora::LoraConv1dConfig>,
        conv2d_config: Option<candle_lora::LoraConv2dConfig>,
        embed_config: Option<candle_lora::LoraEmbeddingConfig>,
    ) {
        let mut linear: ::std::collections::HashMap<
            String,
            &dyn candle_lora::LinearLayerLike,
        > = ::std::collections::HashMap::new();
        let mut conv1d: ::std::collections::HashMap<
            String,
            &dyn candle_lora::Conv1dLayerLike,
        > = ::std::collections::HashMap::new();
        let mut conv2d: ::std::collections::HashMap<
            String,
            &dyn candle_lora::Conv2dLayerLike,
        > = ::std::collections::HashMap::new();
        let mut embed: ::std::collections::HashMap<
            String,
            &dyn candle_lora::EmbeddingLayerLike,
        > = ::std::collections::HashMap::new();
        if !linear.is_empty() && linear_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for linear layers."),
                );
            };
        }
        if !conv1d.is_empty() && conv1d_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for conv1d layers."),
                );
            };
        }
        if !conv2d.is_empty() && conv2d_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for conv2d layers."),
                );
            };
        }
        if !embed.is_empty() && embed_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for embedding layers."),
                );
            };
        }
        let mut builder = candle_lora::SelectedLayersBuilder::new();
        if linear_config.is_some() {
            builder = builder.add_linear_layers(linear, linear_config.unwrap());
        }
        if conv1d_config.is_some() {
            builder = builder.add_conv1d_layers(conv1d, conv1d_config.unwrap());
        }
        if conv2d_config.is_some() {
            builder = builder.add_conv2d_layers(conv2d, conv2d_config.unwrap());
        }
        if embed_config.is_some() {
            builder = builder.add_embed_layers(embed, embed_config.unwrap());
        }
        let selection = builder.build();
        let new_layers = candle_lora::Lora::convert_model(selection, lora_config, &vb);
        let _ = "Start";
        let _ = "Done!";
    }
    /// Be sure to provide a configuration for each type!
    pub fn get_merged_lora_model<'a>(
        &'a mut self,
        lora_config: candle_lora::LoraConfig,
        vb: &candle_nn::VarBuilder,
        linear_config: Option<candle_lora::LoraLinearConfig>,
        conv1d_config: Option<candle_lora::LoraConv1dConfig>,
        conv2d_config: Option<candle_lora::LoraConv2dConfig>,
        embed_config: Option<candle_lora::LoraEmbeddingConfig>,
    ) {
        use candle_lora::Merge;
        let mut linear: ::std::collections::HashMap<
            String,
            &dyn candle_lora::LinearLayerLike,
        > = ::std::collections::HashMap::new();
        let mut conv1d: ::std::collections::HashMap<
            String,
            &dyn candle_lora::Conv1dLayerLike,
        > = ::std::collections::HashMap::new();
        let mut conv2d: ::std::collections::HashMap<
            String,
            &dyn candle_lora::Conv2dLayerLike,
        > = ::std::collections::HashMap::new();
        let mut embed: ::std::collections::HashMap<
            String,
            &dyn candle_lora::EmbeddingLayerLike,
        > = ::std::collections::HashMap::new();
        if !linear.is_empty() && linear_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for linear layers."),
                );
            };
        }
        if !conv1d.is_empty() && conv1d_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for conv1d layers."),
                );
            };
        }
        if !conv2d.is_empty() && conv2d_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for conv2d layers."),
                );
            };
        }
        if !embed.is_empty() && embed_config.is_none() {
            {
                ::core::panicking::panic_fmt(
                    format_args!("Config not speified for embedding layers."),
                );
            };
        }
        let mut builder = candle_lora::SelectedLayersBuilder::new();
        if linear_config.is_some() {
            builder = builder.add_linear_layers(linear, linear_config.unwrap());
        }
        if conv1d_config.is_some() {
            builder = builder.add_conv1d_layers(conv1d, conv1d_config.unwrap());
        }
        if conv2d_config.is_some() {
            builder = builder.add_conv2d_layers(conv2d, conv2d_config.unwrap());
        }
        if embed_config.is_some() {
            builder = builder.add_embed_layers(embed, embed_config.unwrap());
        }
        let selection = builder.build();
        let mut new_layers = candle_lora::Lora::convert_model(
            selection,
            lora_config,
            &vb,
        );
    }
}

Note that these lines:

#linear_stream
#conv1d_stream
#conv2d_stream
#embed_stream
#linear_option1_stream
#conv1d_option1_stream
#conv2d_option1_stream
#embed_option1_stream

are not expanded.
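For comparison with the llama expansion quoted in a later issue (where the wte embedding does get registered), a correct expansion would be expected to insert each swapped field into its HashMap before the emptiness checks, along the lines of the following. The field name `a` is taken from the linear example; this is what appears to be missing from the output above.

// Expected somewhere before the `if !linear.is_empty() ...` check:
[(linear.insert("a".to_string(), &*self.a))];
// ...and after convert_model, the field should be swapped back in:
// self.a = ::std::sync::Arc::new(new_layers.linear.get("a").unwrap().clone());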

Could we have a written walkthrough of finetuning llama/mistral with this?

I'm really excited about fine-tuning LLMs in Rust, but I'm a complete beginner to machine learning, so even though I've spent some time going through the repo, it would be great if there were a plainly written tutorial or step-by-step guide on fine-tuning some of the more popular models with this.

Thanks!

Examples for Llama model architecture

Hello Eric, this looks like great work! Thank you!

Can you please add examples for both training and inference for the Llama model using candle-lora? Is it supported through this work?

In the Llama model, only the embedding layer is converted to a LoRA layer.

I tried to fine-tune TinyLlama with this crate. After training, the saved safetensors file contains only two tensors:

lora_llama.b0
lora_llama.a0

I expanded the macro in mod llama and found that these two tensors are only used in the embedding layer:

        pub fn get_lora_model<'a>(
            &'a mut self,
            lora_config: candle_lora::LoraConfig,
            vb: &candle_nn::VarBuilder,
            linear_config: Option<candle_lora::LoraLinearConfig>,
            conv1d_config: Option<candle_lora::LoraConv1dConfig>,
            conv2d_config: Option<candle_lora::LoraConv2dConfig>,
            embed_config: Option<candle_lora::LoraEmbeddingConfig>,
        ) {
            let mut linear: ::std::collections::HashMap<
                String,
                &dyn candle_lora::LinearLayerLike,
            > = ::std::collections::HashMap::new();
            let mut conv1d: ::std::collections::HashMap<
                String,
                &dyn candle_lora::Conv1dLayerLike,
            > = ::std::collections::HashMap::new();
            let mut conv2d: ::std::collections::HashMap<
                String,
                &dyn candle_lora::Conv2dLayerLike,
            > = ::std::collections::HashMap::new();
            let mut embed: ::std::collections::HashMap<
                String,
                &dyn candle_lora::EmbeddingLayerLike,
            > = ::std::collections::HashMap::new();
            [(embed.insert("wte".to_string(), &*self.wte))];
            if !linear.is_empty() && linear_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for linear layers."),
                    );
                };
            }
            if !conv1d.is_empty() && conv1d_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for conv1d layers."),
                    );
                };
            }
            if !conv2d.is_empty() && conv2d_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for conv2d layers."),
                    );
                };
            }
            if !embed.is_empty() && embed_config.is_none() {
                {
                    ::core::panicking::panic_fmt(
                        format_args!("Config not speified for embedding layers."),
                    );
                };
            }
            let mut builder = candle_lora::SelectedLayersBuilder::new();
            if linear_config.is_some() {
                builder = builder.add_linear_layers(linear, linear_config.unwrap());
            }
            if conv1d_config.is_some() {
                builder = builder.add_conv1d_layers(conv1d, conv1d_config.unwrap());
            }
            if conv2d_config.is_some() {
                builder = builder.add_conv2d_layers(conv2d, conv2d_config.unwrap());
            }
            if embed_config.is_some() {
                builder = builder.add_embed_layers(embed, embed_config.unwrap());
            }
            let selection = builder.build();
            let new_layers = candle_lora::Lora::convert_model(selection, lora_config, &vb);
            [
                (self
                    .wte = ::std::sync::Arc::new(
                    new_layers.embed.get("wte").unwrap().clone(),
                )),
            ];
        }

So none of the linear layers in the self-attention blocks are converted to LoRA layers. When I use my fine-tuned model, it behaves exactly the same as before.
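For the attention projections to be converted as well, the expansion would presumably need to register them in the linear HashMap the same way wte is registered in the embed HashMap above. A hypothetical sketch follows; the field names q_proj/k_proj/v_proj/o_proj are assumptions based on the usual Llama layout, and in practice those fields live inside the attention module rather than on the top-level struct, so the registration would have to happen there.

// Hypothetical: register the attention projections so convert_model sees them.
[
    (linear.insert("q_proj".to_string(), &*self.q_proj)),
    (linear.insert("k_proj".to_string(), &*self.k_proj)),
    (linear.insert("v_proj".to_string(), &*self.v_proj)),
    (linear.insert("o_proj".to_string(), &*self.o_proj)),
];
// ...and after convert_model, swap the converted layers back in, e.g.:
// self.q_proj = ::std::sync::Arc::new(new_layers.linear.get("q_proj").unwrap().clone());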

How to use a candle_lora model with a Rust axum web server

The axum code is below:

let model = Llama::load(
        vb,
        &cache,
        &config,
        false,
        loraconfig,
        linearconfig,
        embedconfig,
    )
    .unwrap();

Router::new().layer(Extension(model))

The error is:

`(dyn EmbeddingLayerLike + 'static)` cannot be sent between threads safely
the trait `Send` is not implemented for `(dyn EmbeddingLayerLike + 'static)`
the trait `tower_layer::Layer<S>` is implemented for `Extension<T>`
required for `Arc<(dyn EmbeddingLayerLike + 'static)>` to implement `Send`
required for `Extension<LoraLLM>` to implement `tower_layer::Layer<Route>`
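One common workaround when a model type is not Send is to keep the model on a dedicated thread and talk to it over channels, so only the (Send) request sender is shared with axum. Below is a minimal sketch under that assumption (axum 0.7 + tokio); the request/response types, route, and the commented-out loading call are placeholders, not part of candle-lora.

// Sketch: the non-Send model lives on one thread; axum handlers only hold a
// Send-able channel sender.
use axum::{extract::State, routing::post, Json, Router};
use tokio::sync::{mpsc, oneshot};

struct GenRequest {
    prompt: String,
    respond: oneshot::Sender<String>,
}

async fn generate(
    State(tx): State<mpsc::Sender<GenRequest>>,
    Json(prompt): Json<String>,
) -> String {
    let (respond, wait) = oneshot::channel();
    tx.send(GenRequest { prompt, respond }).await.unwrap();
    wait.await.unwrap()
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<GenRequest>(32);

    // Dedicated model thread: load the model here (e.g. the Llama::load call
    // from the snippet above) and serve requests from the channel.
    std::thread::spawn(move || {
        // let model = Llama::load(vb, &cache, &config, false, loraconfig, linearconfig, embedconfig).unwrap();
        while let Some(req) = rx.blocking_recv() {
            let output = format!("(model output for: {})", req.prompt); // run inference here
            let _ = req.respond.send(output);
        }
    });

    let app = Router::new().route("/generate", post(generate)).with_state(tx);
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}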
