Deep Learning Environment Setups

License: MIT

This repo contains my ways of setting up various project environments. Please suggest more easy-to-use pipelines and DL dev tips.

Chinese Version

TODO

Docker Setups

This setup contains a way of setting up an experimenting environment with Docker. The environment is based on Nvidia NGC Docker PyTorch images.

There are many advantages of using this method:

  1. You don't have to set up nvcc, cuDNN, etc. yourself.
  2. You can fire up an env without messing up the Windows system. The only thing you need is Docker Desktop.
  3. This setup is developed and maintained by Nvidia itself.

The only thing you need is a proper Nvidia GPU driver, plus nvidia-container-toolkit depending on your system. Here is a guide if you need to set up nvidia-container-toolkit on Linux.
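Once the driver and nvidia-container-toolkit are in place, a quick sanity check is to run nvidia-smi in a throwaway container (the CUDA image tag here is only an example; any CUDA base image works):

```shell
# If GPU passthrough is configured correctly, this prints the usual nvidia-smi table.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```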

To use this setup, you have to:

  1. Specify all the required packages in requirements.txt.

  2. Replace the desired base image in ./exp_container/Dockerfile.

  3. Give your image a proper name and tag, for example liux2/app-framework-experiment:exp, and build the Docker image with:

    bash exp_container/build_docker.sh
  4. Change the flags in ./exp_container/env_docker.sh based on your needs. Finally, fire up the container with bash exp_container/env_docker.sh.
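As a sketch of what exp_container/env_docker.sh might run (the flags, mount path, and port here are assumptions; the actual script may differ):

```shell
# --gpus all exposes the GPUs, -v mounts the current dir, -p publishes JupyterLab's port.
docker run --gpus all -it --rm \
  -v "$(pwd)":/workspace \
  -p 8888:8888 \
  liux2/app-framework-experiment:exp
```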

Tips:

  1. The -it flag in the docker run command starts an interactive terminal for you. From there, you can choose to start a JupyterLab environment with:

    bash exp_container/start_jupyter.sh
  2. The -v flag with the path specified mounts your current directory into the container.

  3. After the JupyterLab environment has been set up, you can use the Jupyter kernel inside the container from VS Code if you prefer a local IDE setup.

  4. If you decide to migrate this env to another machine, use

    docker save -o {{backup_file.tar}} {{image_name:tag}}

    to save your image to a tar file with its name and tag preserved. Then use:

    docker load -i {{backup_file.tar}}

    to load it on the target machine.

Docker Compose

An easier alternative is to use Docker Compose. You can set the necessary parameters in the docker-compose file, then run docker compose up (or docker compose up -d for detached mode). To be able to enter the container's terminal, set the following parameters:

stdin_open: true
tty: true

and enter with docker compose exec {service_name} sh. Here {service_name} is the service name you defined; dev is the name used in this repo's docker-compose file.
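A hypothetical docker-compose.yml sketch combining these options (the image name, mount path, and GPU reservation are assumptions to adapt to your setup):

```yaml
services:
  dev:                   # service name referenced by `docker compose exec dev sh`
    image: liux2/app-framework-experiment:exp
    stdin_open: true     # equivalent to docker run -i
    tty: true            # equivalent to docker run -t
    volumes:
      - .:/workspace     # mount the current directory into the container
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```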

Conda Setups

LLM API Setups

This section provides various ways of serving LLM APIs.

FastChat

This setup contains a way of serving an OpenAI-compatible API for your desired LLM based on FastChat.

To use this setup, you have to:

  1. Download the checkpoints from your LLM source repo into LLM_fastchat_api/.
  2. Specify your LLM requirements in requirements.txt.
  3. Change any necessary params in the docker-compose.yml.
  4. Fire up the API with docker compose up and shut it down with docker compose down.

Tips:

  1. You can switch the backend to vLLM for faster inference with this tutorial.
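Once the API is up, any OpenAI-compatible client can talk to it. A minimal smoke test with curl might look like the following (port 8000 and the model name are assumptions; match them to your docker-compose settings):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "messages": [{"role": "user", "content": "Hello"}]}'
```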

Dataset Preparation

Google Drive Downloader

Google Drive is an easy-to-use service for storing and sharing datasets. To download files from Google Drive to your server while exploiting its high bandwidth:

  1. Prepare your dataset as a compressed file, and share the file.

  2. Get the file id by requesting a link; the id is the long hash in the middle of a URL that looks like https://drive.google.com/file/d/a-long-hash/view?usp=sharing.

  3. Put the hash and the file name in scripts/gd-downloader.sh.

  4. Run with:

    bash scripts/gd-downloader.sh
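For reference, the script presumably wraps a downloader such as gdown (an assumption about its internals); a direct equivalent would be:

```shell
pip install -q gdown
# Replace a-long-hash with the id extracted from the sharing link,
# and dataset.tar.gz with your desired output file name.
gdown "https://drive.google.com/uc?id=a-long-hash" -O dataset.tar.gz
```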

Kaggle Dataset Downloader

To download datasets from Kaggle, you need to:

  1. Go to your Kaggle account, create an API token in the API section, and download the JSON file.

  2. Open a terminal and run:

    pip install -q kaggle
    mkdir -p ~/.kaggle
    cp "your/path/to/kaggle.json" ~/.kaggle/
    chmod 600 ~/.kaggle/kaggle.json

    # For competition datasets
    kaggle competitions download -c dataset_name -p download_to_folder
    # For other datasets
    kaggle datasets download -d user/dataset_name -p download_to_folder

    Replace:

    • your/path/to/kaggle.json with the path to your kaggle.json file.
    • download_to_folder with the folder where you'd like to store the downloaded dataset.
    • dataset_name or user/dataset_name with the identifier of the dataset on Kaggle.

Source: https://towardsdatascience.com/a-quicker-way-to-download-kaggle-datasets-in-google-collab-abe90bf8c866

Huggingface Dataset Downloader

The Hugging Face datasets package provides a way of loading datasets easily from the Hub.

Tutorials and use cases can be found on their homepage.

Tips

This section collects tips for software development.

Environmental Variable Setups

You can use dotenv to load environment variables. It protects your keys and passwords from leaking into source control. To install the Python package with pip, use pip install python-dotenv. To use the package:

  1. Prepare a .env file; an example can be

    OPENAI_API_KEY = "sk-xxx"
  2. Example Python script usage:

    import os
    from dotenv import load_dotenv
    env_path = "scripts/secrets.env"
    load_dotenv(dotenv_path=env_path, verbose=True)
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
  3. Example Jupyter Notebook usage:

    import os
    from dotenv import load_dotenv
    %load_ext dotenv
    %dotenv ./scripts/secrets.env
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
