critical-dream

A trippy visual companion to Critical Role episodes.


Objective: To create a visual companion to Critical Role episodes that renders scenes, characters, and environments to accompany the episode audio.

This project will involve several components that need to work together:

  • Caption data ingestion: get captions for Critical Role episodes from YouTube.
  • Image data ingestion: get image data from fandom sites, wikis, and fan-made art.
  • Generative image model training: train or fine-tune an image generation model that can generate character-specific images from a text prompt.
  • Prompt engineering: use an LLM to parse the text captions and write prompts that the image generator uses to depict the scene described within a given time window of each episode.

Environment Setup

conda create -n critical-dream python=3.11 -y
conda activate critical-dream
pip install -r requirements.txt
pip install git+https://github.com/huggingface/diffusers

Export the variables defined in secrets.txt (lines starting with # are skipped):

export $(grep -v '^#' secrets.txt | xargs)
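The one-liner above drops lines that start with `#`, then `xargs` joins the remaining `KEY=value` lines so that `export` sets each one. A rough Python equivalent, useful inside notebooks or scripts, is sketched below. It assumes `secrets.txt` holds one unquoted `KEY=value` pair per line, and `load_secrets` is a hypothetical helper, not part of the repository.

```python
import os
from pathlib import Path


def load_secrets(path: str = "secrets.txt") -> dict[str, str]:
    """Hypothetical helper: read KEY=value lines and export them to os.environ."""
    secrets: dict[str, str] = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Like `grep -v '^#'` plus xargs: skip comment lines and blank lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            # Skip anything that is not a KEY=value pair.
            continue
        secrets[key.strip()] = value.strip()
    # Equivalent of `export KEY=value` for this process and any child processes.
    os.environ.update(secrets)
    return secrets
```

Unlike `xargs`, this sketch does not strip quotes around values.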

Caption data ingestion

Use the youtube_transcript_api package to extract transcripts for a list of YouTube video IDs:

python critical_dream/captions.py data/captions
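As a rough illustration of the storage side of this step, the sketch below writes one video's caption segments to a JSON file. It assumes segments shaped like youtube_transcript_api's raw output (`text`, `start`, and `duration` keys). The `<video_id>.json` layout and the `save_transcript` helper are hypothetical and may not match what `captions.py` actually writes.

```python
import json
from pathlib import Path


def save_transcript(video_id: str, segments: list[dict], output_dir: str) -> Path:
    """Hypothetical helper: write caption segments to <output_dir>/<video_id>.json.

    Fetching is left out so the sketch runs offline; with youtube_transcript_api
    releases before 1.0, the segments could come from
    YouTubeTranscriptApi.get_transcript(video_id).
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the fields used downstream and order segments by start time.
    rows = sorted(
        (
            {"text": s["text"], "start": float(s["start"]), "duration": float(s["duration"])}
            for s in segments
        ),
        key=lambda row: row["start"],
    )
    path = out_dir / f"{video_id}.json"
    path.write_text(json.dumps(rows, indent=2))
    return path
```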

Compose scenes from transcripts

python critical_dream/compose_scenes.py data/captions data/scenes_v12
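The component list above describes generating prompts for the scene described within some time window of each episode. One simple way to form such windows is to bucket caption segments by their start time, as sketched below. This is a hypothetical illustration: `Scene`, `group_into_windows`, and the 60-second default are assumptions, and `compose_scenes.py` may segment scenes quite differently (for example, with an LLM).

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start: float  # window start time, in seconds
    end: float  # window end time (exclusive), in seconds
    text: str  # caption text spoken during the window


def group_into_windows(segments: list[dict], window_seconds: float = 60.0) -> list[Scene]:
    """Hypothetical helper: bucket caption segments into fixed-length windows.

    Each segment goes to the window that contains its start time; windows
    with no captions are omitted.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets: dict[int, list[str]] = {}
    for seg in sorted(segments, key=lambda s: s["start"]):
        index = int(seg["start"] // window_seconds)
        buckets.setdefault(index, []).append(seg["text"].strip())
    return [
        Scene(start=i * window_seconds, end=(i + 1) * window_seconds, text=" ".join(texts))
        for i, texts in sorted(buckets.items())
    ]
```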

Create Hugging Face Dataset

python critical_dream/create_scenes_dataset.py data/scenes_v12 cosmicBboy/critical-dream-scenes-mighty-nein-v3

Generate Scene Images

python critical_dream/generate_scene_images.py \
  --output_dir output/images \
  --dataset_id cosmicBboy/critical-dream-scenes-mighty-nein-v1 \
  --lora_model_id "cosmicBboy/stable-diffusion-xl-base-1.0-lora-dreambooth-critdream-v0.5.2" \
  --debug

Create Aligned Scenes Hugging Face Dataset

python critical_dream/create_aligned_scenes_dataset.py \
  --captions_dir data/captions \
  --scene_dir data/scenes_v10 \
  --dataset_id cosmicBboy/critical-dream-aligned-scenes-mighty-nein-v1
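Presumably, aligning captions with scenes is what lets the companion show the right image at a given moment of the episode audio. If scenes are sorted by start time, a binary search over the start times is enough for that lookup. The `scene_at` helper below is a hypothetical sketch, not the repository's alignment logic.

```python
import bisect
from typing import Optional


def scene_at(scene_starts: list[float], t: float) -> Optional[int]:
    """Hypothetical helper: index of the scene playing at time t (seconds).

    Assumes scene_starts is sorted ascending and each scene lasts until the
    next one starts. Returns None if t falls before the first scene.
    """
    # bisect_right counts how many start times are <= t.
    i = bisect.bisect_right(scene_starts, t) - 1
    return i if i >= 0 else None
```

Each lookup is O(log n) in the number of scenes, so it stays cheap even when called on every playback tick.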

DreamBooth fine-tuning

Get image data

To download example images of each character, run:

python critical_dream/image_data.py dataset/data \
  --multi_instance_data_config config/mighty_nein_instances.yaml \
  --delete_existing

Training

Export the Hugging Face Hub and Weights & Biases credentials:

export HF_HUB_TOKEN="..."
export WANDB_API_KEY="..."

The commands below fine-tune different base models, using either full fine-tuning or LoRA:

Stable Diffusion XL Base 1.0 LoRA fine-tuning

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_PATH="stabilityai/sdxl-vae"
export OUTPUT_DIR="models/model_sd1xl_lora_critdream"
export HUB_MODEL_ID="cosmicBboy/stable-diffusion-xl-base-1.0-lora-dreambooth-critdream"

accelerate launch critical_dream/train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --multi_instance_data_config=config/mighty_nein_instances.yaml \
  --multi_instance_subset=fjord \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --with_prior_preservation \
  --output_dir=$OUTPUT_DIR \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=10 \
  --max_train_steps=10 \
  --validation_prompt="a picture of [critrole-fjord], a half-orc with a top hat" \
  --validation_epochs=25 \
  --checkpointing_steps=500 \
  --hub_model_id=$HUB_MODEL_ID \
  --seed="0" \
  --push_to_hub
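
The `--with_prior_preservation` flag turns on DreamBooth's prior-preservation loss, which also trains on generated class images so the model keeps its general notion of the class while learning the character. In the upstream diffusers DreamBooth scripts, the combined loss for a noise-prediction model $\epsilon_\theta$ has the form below, with $\lambda$ set by `--prior_loss_weight` (default 1.0); this repository's modified script may differ in the details.

```latex
\mathcal{L}
  = \operatorname{MSE}\!\left(\epsilon_\theta(z_t,\, t,\, c_{\text{inst}}),\ \epsilon\right)
  + \lambda\, \operatorname{MSE}\!\left(\epsilon_\theta(z'_{t'},\, t',\, c_{\text{class}}),\ \epsilon'\right)
```

Here $z_t$ is the noised latent of a character image paired with the instance prompt $c_{\text{inst}}$ (for example, one containing `[critrole-fjord]`), and $z'_{t'}$ is the noised latent of a generated class image paired with the class prompt $c_{\text{class}}$.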

Stable Diffusion XL Base 1.0 LoRA fine-tuning from Pretrained LoRA

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export LORA_MODEL_NAME="cosmicBboy/stable-diffusion-xl-base-1.0-lora-dreambooth-critdream-v0.5"
export VAE_PATH="stabilityai/sdxl-vae"
export OUTPUT_DIR="models/model_sd1xl_lora_critdream"
export HUB_MODEL_ID="cosmicBboy/stable-diffusion-xl-base-1.0-lora-dreambooth-critdream-v0.5.1"

accelerate launch critical_dream/train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --pretrained_lora_model_name_or_path=$LORA_MODEL_NAME \
  --data_dir_root=dataset \
  --multi_instance_data_config=config/mighty_nein_instances.yaml \
  --multi_instance_subset=fjord \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=10 \
  --max_train_steps=10 \
  --validation_prompt="a picture of [critrole-fjord], a half-orc with a top hat" \
  --validation_epochs=25 \
  --checkpointing_steps=500 \
  --hub_model_id=$HUB_MODEL_ID \
  --seed="0"
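
The validation prompts refer to Fjord through the instance token `[critrole-fjord]`. When prompting a fine-tuned model with scene descriptions, plain character names presumably need to be swapped for the matching tokens. The sketch below assumes every token follows the `[critrole-<lowercase name>]` pattern seen in these prompts and that names are single words; `instance_token` and `tokenize_characters` are hypothetical helpers.

```python
import re


def instance_token(character: str) -> str:
    """Hypothetical helper: "Fjord" -> "[critrole-fjord]"."""
    return f"[critrole-{character.lower()}]"


def tokenize_characters(description: str, characters: list[str]) -> str:
    """Hypothetical helper: replace plain character names with instance tokens."""
    for name in characters:
        token = instance_token(name)
        # Case-insensitive whole-word match; the lookbehind skips names that
        # already sit inside a "[critrole-...]" token, so repeated calls are safe.
        pattern = re.compile(rf"(?<!critrole-)\b{re.escape(name)}\b", re.IGNORECASE)
        # A function replacement avoids any escape processing of the token text.
        description = pattern.sub(lambda _match: token, description)
    return description
```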

Stable Diffusion 1.4 full fine-tuning

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="data/fjord"
export CLASS_DIR="data/half_orc"
export OUTPUT_DIR="models/model_sd1_fjord"
export HUB_MODEL_ID="cosmicBboy/stable-diffusion-v1-4-dreambooth-critdream-fjord"

accelerate launch critical_dream/train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a picture of [critrole-fjord], a half-orc warlock" \
  --class_prompt="a picture of a half-orc warlock" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=1000 \
  --validation_prompt="a picture of [critrole-fjord], a half-orc with a top hat" \
  --validation_steps=250 \
  --checkpointing_steps=1000 \
  --hub_model_id=$HUB_MODEL_ID \
  --push_to_hub \
  --report_to="wandb"

Stable Diffusion 1.4 LoRA fine-tuning

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="data/fjord"
export CLASS_DIR="data/half_orc"
export OUTPUT_DIR="models/model_sd1_lora_fjord"
export HUB_MODEL_ID="cosmicBboy/stable-diffusion-v1-4-lora-dreambooth-critdream-fjord"

accelerate launch critical_dream/train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a picture of [critrole-fjord], a half-orc" \
  --class_prompt="a picture of a half-orc" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=1500 \
  --validation_prompt="a picture of [critrole-fjord], a half-orc with a top hat" \
  --validation_epochs=25 \
  --checkpointing_steps=1000 \
  --hub_model_id=$HUB_MODEL_ID \
  --push_to_hub \
  --report_to="wandb"
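
The LoRA commands (`train_dreambooth_lora.py` and `train_dreambooth_lora_sdxl.py`) train small low-rank adapters instead of updating the full UNet weights as the full fine-tuning commands do. In the standard LoRA formulation (a general description, not read from this repository's code), a frozen weight matrix $W_0 \in \mathbb{R}^{d \times k}$ is adapted as:

```latex
W = W_0 + \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Only $A$ and $B$ are trained, so an adapted layer adds $r(d + k)$ trainable parameters instead of $dk$. For a $768 \times 768$ projection with $r = 4$, that is $4 \cdot 1536 = 6{,}144$ parameters rather than $589{,}824$.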

Stable Diffusion 2 full fine-tuning

export MODEL_NAME="stabilityai/stable-diffusion-2"
export INSTANCE_DIR="data/fjord"
export CLASS_DIR="data/half_orc"
export OUTPUT_DIR="models/model_sd2_fjord"
export HUB_MODEL_ID="cosmicBboy/stable-diffusion-2-dreambooth-critdream-fjord"

accelerate launch critical_dream/train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a picture of [critrole-fjord], a half-orc" \
  --class_prompt="a picture of a half-orc" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=100 \
  --validation_prompt="a picture of [critrole-fjord], a half-orc with a top hat" \
  --validation_steps=100 \
  --checkpointing_steps=100 \
  --hub_model_id=$HUB_MODEL_ID \
  --seed="0" \
  --push_to_hub \
  --report_to="wandb"

Stable Diffusion 2 LoRA fine-tuning

export MODEL_NAME="stabilityai/stable-diffusion-2"
export INSTANCE_DIR="data/fjord"
export CLASS_DIR="data/half_orc"
export OUTPUT_DIR="models/model_sd2_lora_fjord"
export HUB_MODEL_ID="cosmicBboy/stable-diffusion-2-lora-dreambooth-critdream-fjord"

accelerate launch critical_dream/train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a picture of [critrole-fjord], a half-orc" \
  --class_prompt="a picture of a half-orc" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=2000 \
  --validation_prompt="a picture of [critrole-fjord], a half-orc with a top hat" \
  --validation_epochs=25 \
  --checkpointing_steps=250 \
  --hub_model_id=$HUB_MODEL_ID \
  --push_to_hub \
  --report_to="wandb"

Contributors

cosmicbboy
