An LLM training and validation pipeline
Install the library:

```shell
pip install -e .
```
- Train a LLaMA-2 7B model with text2sql datasets:

  ```shell
  python -m dependent.dependent.examples.sft_spider
  ```

- Refer to the text2sql example for how to set up the configuration.
- Steps:
  - Configuration stage: create a fine-tune pipeline configuration as a dict/yaml/json and wrap it with the `DPConfig` class.
    - The configuration has 4 sections: `algorithm`, `llm`, `train`, and `data`.
    - Each section is wrapped with its corresponding configuration class.
    - The configuration classes are designed to maximize flexibility in composing different LLM libraries.
    - New arguments required by an LLM library can easily be added in the configuration stage.
  - Create the fine-tuning pipeline: build a customized fine-tuning pipeline from the `DPConfig` configuration.
    - Fine-tuning pipelines can differ across tasks; the text2sql pipeline is one example.
    - Different tasks may also need a dataset adapter; the text2sql dataset adapter is one example.
    - As mentioned above, the `DPConfig` class makes it easy to add new arguments.
  - Fine-tuning: run the fine-tuning pipeline.
    - K-split training: the `DataSplit` class can split, compose, and concatenate datasets with ease.
    - Trainer library: use different trainers for fine-tuning.
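A minimal sketch of the configuration stage. The section names come from the description above, but the keys inside each section and the `DPConfig` call are assumptions, not the library's actual API:

```python
# Hypothetical fine-tune pipeline configuration with the four sections
# described above; the keys inside each section are illustrative guesses.
config = {
    "algorithm": {"name": "sft"},                        # fine-tuning algorithm
    "llm": {"model_name": "meta-llama/Llama-2-7b-hf"},   # base model
    "train": {"epochs": 3, "learning_rate": 2e-5},       # trainer arguments
    "data": {"dataset": "spider"},                       # text2sql dataset
}

# Wrapping it would then look roughly like (DPConfig import path assumed):
# from dependent import DPConfig
# cfg = DPConfig(config)
```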
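The K-split idea behind the steps above can be illustrated in plain Python. This is not the library's `DataSplit` API, just a sketch of splitting a dataset into k folds and composing train/validation splits from them:

```python
def k_split(dataset, k):
    """Split a dataset into k roughly equal folds."""
    return [dataset[i::k] for i in range(k)]

def train_val_splits(dataset, k):
    """Yield (train, val) pairs, holding out one fold at a time."""
    folds = k_split(dataset, k)
    for i in range(k):
        val = folds[i]
        # Concatenate every fold except the held-out one.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val

# Example: 10 samples, 5 folds -> each split holds out 2 samples.
data = list(range(10))
splits = list(train_val_splits(data, 5))
assert len(splits) == 5
assert all(len(val) == 2 and len(train) == 8 for train, val in splits)
```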
- Add support for hyperparameter tuning based on the `DPConfig` class.
- Add support for more training APIs in trainers.
- Add `mlflow` to manage the pipeline.
- Add `wandb` to monitor the training and evaluation process.
- Add RLHF support by referring to trlx.