Fine-Tuning Llama 13B with HuggingFace Transformers and DeepSpeed

This README describes the steps for instruction fine-tuning the Llama 13B model on the alpaca-gpt-4 dataset using HuggingFace Transformers and DeepSpeed. The script can be run on a single A100 GPU, for instance in Google Colab: open the terminal and complete the steps listed below. Alternatively, you can run it on one or several GPUs of your choice.

Prerequisites

  • Python 3.6 or later.
  • Basic familiarity with Python programming and virtual environments.

Setup

1. Create a Virtual Environment (Optional)

Creating a virtual environment is recommended to avoid conflicts with system-wide packages.

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

2. Clone this code repository

If cloning with HTTPS:

git clone https://github.com/agnedil/fine-tune-with-deepspeed.git

3. Make sure you have access to a GPU

nvidia-smi      # lists the GPUs visible to the driver
nvcc --version  # reports the installed CUDA toolkit version
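
The same check can be scripted before launching a long job. The following is a minimal sketch that uses only the Python standard library; the helper names `check_gpu_tools` and `list_gpus` are hypothetical and not part of this repository. It assumes that `nvidia-smi` is on the PATH when an NVIDIA driver is installed.

```python
import shutil
import subprocess


def check_gpu_tools(tools=("nvidia-smi", "nvcc")):
    """Return {tool: full path or None} for each command-line tool."""
    return {tool: shutil.which(tool) for tool in tools}


def list_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # The driver is missing or not responding; treat this as "no GPU".
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(check_gpu_tools())
    print(list_gpus() or "No GPU detected")
```

An empty list from `list_gpus()` suggests that the runtime has no GPU attached, in which case the DeepSpeed launch in the later steps is unlikely to work.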

4. Ensure access to HuggingFace Hub

If you need access to HuggingFace Hub, provide the access token after running this command:

huggingface-cli login

5. Install Dependencies

Install the required libraries using pip:

pip install accelerate peft bitsandbytes transformers trl datasets deepspeed
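
To confirm that the installation succeeded, the installed versions can be listed from Python. This is a minimal sketch using `importlib.metadata` (Python 3.8 or later); the helper name `installed_versions` is hypothetical, and the package list simply mirrors the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the pip install command of this README.
REQUIRED = (
    "accelerate",
    "peft",
    "bitsandbytes",
    "transformers",
    "trl",
    "datasets",
    "deepspeed",
)


def installed_versions(packages=REQUIRED):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:15s} {ver or 'MISSING'}")
```

Recording these versions is also useful when reporting problems, since behavior can differ between releases of these libraries.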

If you encounter a matrix multiplication error when running the script, you may want to downgrade the transformers package according to this issue:

pip install git+https://github.com/huggingface/[email protected]

6. Run the Fine-Tuning Script on One GPU

The fine-tuning script, script.py, comes with a config file, ds_config.json. To run the script with DeepSpeed, execute the following command in your terminal from the directory that contains both files (or adjust the file paths accordingly):

deepspeed --num_nodes 1 --num_gpus=1 script.py
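
The contents of ds_config.json are not reproduced in this README. For orientation only, the sketch below builds a DeepSpeed config of the kind commonly used to fit a large model on a single GPU. Every value here is an assumption and may differ from the repository's actual file. The "auto" values are resolved only when the config is passed through the HuggingFace Trainer integration, which this sketch assumes script.py does. The helper name `make_ds_config` is hypothetical.

```python
import json


def make_ds_config(zero_stage=3, offload_to_cpu=True):
    """Build an illustrative DeepSpeed config dict (not the repository's file)."""
    config = {
        # bfloat16 is supported on A100-class GPUs.
        "bf16": {"enabled": True},
        # ZeRO partitions optimizer state, gradients and (at stage 3) parameters.
        "zero_optimization": {"stage": zero_stage},
        # "auto" lets the HuggingFace Trainer fill these in from its own arguments.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
    }
    if offload_to_cpu and zero_stage >= 2:
        # Move optimizer state to CPU memory to reduce GPU memory use.
        config["zero_optimization"]["offload_optimizer"] = {"device": "cpu"}
        if zero_stage == 3:
            # Parameter offload is only available with ZeRO stage 3.
            config["zero_optimization"]["offload_param"] = {"device": "cpu"}
    return config


if __name__ == "__main__":
    # Print instead of writing to disk so the existing ds_config.json is untouched.
    print(json.dumps(make_ds_config(), indent=2))
```

CPU offload trades training speed for GPU memory, so it is usually worth disabling when the model already fits on the available GPUs.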

7. Run the Fine-Tuning Script on Multiple GPUs

Adjust ds_config.json as needed, then replace {m} with the number of nodes and {n} with the number of GPUs per node in the command below. For more details on running DeepSpeed, see the References section below.

deepspeed --num_nodes {m} --num_gpus={n} script.py
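
When the launch is scripted, the command above can be assembled programmatically. The sketch below is illustrative: the helper names `build_deepspeed_command` and `total_gpus` are hypothetical, and only the two launcher flags shown in this README are used.

```python
import shlex


def total_gpus(num_nodes, num_gpus):
    """Total number of GPUs (worker processes) across all nodes."""
    return num_nodes * num_gpus


def build_deepspeed_command(num_nodes, num_gpus, script="script.py"):
    """Return the launcher command as an argument list for subprocess.run."""
    if num_nodes < 1 or num_gpus < 1:
        raise ValueError("num_nodes and num_gpus must both be at least 1")
    return [
        "deepspeed",
        f"--num_nodes={num_nodes}",
        f"--num_gpus={num_gpus}",
        script,
    ]


if __name__ == "__main__":
    cmd = build_deepspeed_command(num_nodes=2, num_gpus=4)
    print(shlex.join(cmd))
    print("total GPUs:", total_gpus(2, 4))
```

With m nodes and n GPUs per node, the job runs on m × n GPUs in total, which is the number to keep in mind when choosing per-GPU batch sizes in ds_config.json.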

8. References
