
head-segmentation's Introduction


👦 Human Head Semantic Segmentation

🧑‍🎓 Wiktor 🧑‍🎓 Kuba


ci-testing · Open in Colab · Code style: black

💎 Installation with pip

Installation is as simple as running:

pip install git+https://github.com/wiktorlazarski/head-segmentation.git

🔨 How to use

🤔 Inference

import head_segmentation.segmentation_pipeline as seg_pipeline

segmentation_pipeline = seg_pipeline.HumanHeadSegmentationPipeline()

segmentation_map = segmentation_pipeline.predict(image)
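
The pipeline expects image to be an RGB NumPy array. A minimal loading sketch (the file path is a placeholder), mirroring the OpenCV pattern used elsewhere on this page:

import cv2

# OpenCV loads BGR; convert to RGB before passing to the pipeline
image = cv2.imread("portrait.jpg", cv2.IMREAD_COLOR)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)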

🎨 Visualizing

import matplotlib.pyplot as plt

import head_segmentation.visualization as vis

visualizer = vis.VisualizationModule()

figure, _ = visualizer.visualize_prediction(image, segmentation_map)
plt.show()
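
If you would rather keep the visualization as a file than display it, the returned matplotlib figure can be saved directly (the file name is a placeholder):

figure.savefig("prediction.png")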

โš™๏ธ Setup for development

# Clone repo
git clone https://github.com/wiktorlazarski/head-segmentation.git

# Go to repo directory
cd head-segmentation

# (Optional) Create virtual environment
python -m venv venv
source ./venv/bin/activate

# Install project in editable mode (zsh users may need to quote the extras: pip install -e ".[dev]")
pip install -e .[dev]

# (Optional but recommended) Install pre-commit hooks to preserve code format consistency
pre-commit install

๐Ÿ Setup for development with Anaconda or Miniconda

# Clone repo
git clone https://github.com/wiktorlazarski/head-segmentation.git

# Go to repo directory
cd head-segmentation

# Create and activate conda environment
conda env create -f ./conda_env.yml
conda activate head_segmentation

# (Optional but recommended) Install pre-commit hooks to preserve code format consistency
pre-commit install

🔬 Quantitative results

Keep in mind that we trained our model on the CelebA dataset, so it may not perform well on your data if your images come from a different distribution than CelebA.

The table below presents results, computed on full-scale test set images, for the three best models we trained. The model naming convention is as follows: <backbone>_<nn_input_image_resolution>.

| Model          | mobilenetv2_256 | mobilenetv2_512 | resnet34_512 |
|----------------|-----------------|-----------------|--------------|
| head IoU       | 0.967606        | 0.967337        | 0.968457     |
| background IoU | 0.942936        | 0.942160        | 0.944469     |
| mIoU           | 0.955271        | 0.954749        | 0.956463     |

๐Ÿง Qualitative results

(Example qualitative results: input images alongside predicted head segmentation masks.)

If you want to check predictions on some of your images, please feel free to use our Streamlit application.

cd head-segmentation

streamlit run ./scripts/apps/web_checking.py

โฐ Inference time

If you are pressed for time, you can use a GPU to accelerate inference. Visualization also consumes some time, so if you only need the mask you can save the final result directly, as below.

import cv2
import torch
from PIL import Image
import head_segmentation.segmentation_pipeline as seg_pipeline

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

segmentation_pipeline = seg_pipeline.HumanHeadSegmentationPipeline(device=device)

# image is an RGB NumPy array; save_path is the output file path
segmentation_map = segmentation_pipeline.predict(image)

# expand the single-channel mask to 3 channels and zero out the background
segmented_region = image * cv2.cvtColor(segmentation_map, cv2.COLOR_GRAY2RGB)

pil_image = Image.fromarray(segmented_region)
pil_image.save(save_path)

The table below presents inference times measured on a Tesla T4 (for reference only). The first image takes longer because of warm-up.

|     | save figure | just save final result |
|-----|-------------|------------------------|
| CPU | ~2.1 s      | ~0.8 s                 |
| GPU | ~1.4 s      | ~0.15 s                |

🤗 Enjoy the model!

head-segmentation's People

Contributors

9527-csroad · szuumii · wiktorlazarski


head-segmentation's Issues

Some parameter to adjust the masking threshold

Thanks for your wonderful work!! I find it useful when preprocessing captured images. However, I found that the model always masks aggressively in some regions (like the area below the ear) in my dataset. Therefore, I wonder if I have some options (maybe changing some parameters) to make the masking a little looser and refill the lost parts...
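
One possible workaround, sketched under assumptions (there is no official threshold parameter; this assumes the model outputs two-class logits that the pipeline normally argmaxes, based on the pipeline internals quoted later on this page):

import cv2
import numpy as np
import torch
import head_segmentation.segmentation_pipeline as seg_pipeline

class ThresholdedHeadSegmentationPipeline(seg_pipeline.HumanHeadSegmentationPipeline):
    # 'threshold' is a hypothetical knob: lower values keep more pixels
    # as "head", which loosens the mask and refills borderline regions
    def predict(self, image: np.ndarray, threshold: float = 0.5) -> np.ndarray:
        preprocessed_image = self._preprocess_image(image)
        mdl_out = self._model(preprocessed_image).detach().cpu()
        # assumption: output shape is [1, 2, H, W] with class 1 = head
        head_prob = torch.softmax(mdl_out, dim=1)[0, 1].numpy()
        mask = (head_prob >= threshold).astype(np.uint8)
        # resize the mask back to the original image resolution
        return cv2.resize(mask, (image.shape[1], image.shape[0]),
                          interpolation=cv2.INTER_NEAREST)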

How to use GPU for inference

Hello, thanks for open-sourcing this awesome work!
However, I have a question: as guided in the Inference section, with HumanHeadSegmentationPipeline.predict the GPU cannot be used to accelerate the inference process.
Thus, I suggest adding self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') to class HumanHeadSegmentationPipeline, and then modifying predict(self, image) to compute mdl_out = self._model(preprocessed_image.to(self.device)).detach().cpu()
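
A minimal sketch of what that suggestion amounts to, written as a subclass for illustration (it relies on the internal helpers _preprocess_image, _model, and _postprocess_model_output shown later on this page; note that newer versions of the pipeline already accept a device argument, as the README snippet above shows):

import numpy as np
import torch
import head_segmentation.segmentation_pipeline as seg_pipeline

class GPUHeadSegmentationPipeline(seg_pipeline.HumanHeadSegmentationPipeline):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # pick the GPU when available, otherwise fall back to the CPU
        self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
        self._model = self._model.to(self.device)

    def predict(self, image: np.ndarray) -> np.ndarray:
        preprocessed_image = self._preprocess_image(image)
        # run the forward pass on the chosen device, then move results back
        mdl_out = self._model(preprocessed_image.to(self.device)).detach().cpu()
        return self._postprocess_model_output(mdl_out, original_image=image)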

About features to add [Discussion]

Hello all,
I am actively using this repo in my project, and instead of creating something new I want to add more features to this awesome repo. This issue is basically a discussion regarding these features. I would love to hear your opinions, since you guys have a lot more experience.

Feature 1: Allow users to select a model trained on either 256x256 or 512x512 images.

Why: The existing version of this repository resizes images to 512x512 before segmentation. Since the models are trained at this resolution, users are bound to this specific size. However, there's a notable difference in inference times when processing 512x512 versus 256x256 images. My preliminary observations indicate:

  • Models trained on a 256x256 resolution run in 0.04 seconds on a GPU and 0.3 seconds on a CPU.
  • Models trained on a 512x512 resolution take 0.15 seconds on a GPU and 3 seconds on a CPU.

Moreover, many head segmentation tasks primarily work with face-detected cropped regions. This implies that even if the original image has a larger resolution, the focus (head) region remains smaller. Supporting this point, the dataset utilized here, celebAHQ, exclusively features head regions. If a user were to feed a full-body image, the model might struggle to segment the head correctly. I anticipate that in many scenarios, users will pre-crop the head region, and a 256x256 resolution should suffice. Nonetheless, rather than restricting users to a model trained on this resolution, we could provide an option. This way, they can pick the model that aligns best with their balance of performance and accuracy requirements.
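
A rough sketch of how such a choice could surface to users (the checkpoint file names below are hypothetical placeholders; the model_path argument itself already exists, as the timing scripts later on this page show):

import head_segmentation.segmentation_pipeline as seg_pipeline

# hypothetical checkpoints: pick speed or accuracy
fast_pipeline = seg_pipeline.HumanHeadSegmentationPipeline(
    model_path="checkpoints/mobilenetv2_256.ckpt",  # faster, 256x256 input
)
accurate_pipeline = seg_pipeline.HumanHeadSegmentationPipeline(
    model_path="checkpoints/resnet34_512.ckpt",     # slower, 512x512 input
)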

Feature 2: Introduction of Broad Categories like Hair, Face, and Neck

Rationale: In my project, distinguishing between the face, hair, and neck regions is a requirement, and I believe this differentiation would be beneficial for various applications. While numerous face parsers exist, many are crafted for intricate tasks, such as segmenting specific facial features like the left eye, right eyebrow, or lower lip. These highly detailed specifications increase the model's complexity, rendering them more challenging to employ and often slower in performance. For instance, in some scenarios, I need to isolate the neck region to seamlessly integrate heads into a background. In others, I need the hair portion to extract it from background imagery for head swaps. I've yet to come across a repository that's both generic and user-friendly in this manner.

Feature 3: Enhanced Benchmarks and Optimization Efforts

Rationale: I'm keen on exploring avenues to further refine segmentation performance, particularly inference time. I have a VM equipped with a T4 GPU at my disposal, and I'm enthusiastic about training models with diverse encoders, such as EfficientNet, ResNet18, and Xception, as well as performing quantization and pruning on these models and including them in the benchmarks.

Feature 4: Support for PyTorch Lightning v2.0

Rationale: Embracing contemporary advancements is pivotal. My aim is to ensure the repository remains compatible with PyTorch Lightning v2.0. The codebase here is remarkably streamlined. While I'll strive to maintain this clarity, I anticipate requiring feedback on this front.

pytorch_lightning 2.1.0 related error

Version pins were removed from requirements.txt, which causes a default install to pull in version 2.1.0. If I run train.py with this version, I get the following error:

(screenshot of the error traceback)

However, if I downgrade pytorch_lightning to version 1.9.0, this error disappears.

I am using an M1 Mac; the OS version is 13.3.

Some Questions regarding inference time and current setup

Hello @wiktorlazarski,

A couple of days ago I finished the installation and ran the repo on a Linux VM with GPU support. I have been inspecting the code for a while, and I want to say I am learning a lot just by reading it. It is so good that I want my personal project to have a very similar clean and configurable structure. Thanks again for creating this work.

Having said that, I do have some questions, and your insights would be highly appreciated.

Before I delve into them, let me give you a brief overview of my understanding of how the head-segmentation repo operates, and kindly rectify any inaccuracies.

For the model architectures, this repo depends on the segmentation_models repo
(https://github.com/qubvel/segmentation_models.pytorch). It sources pretrained encoder weights, specifically resnet34 or mobilenet_v2, from that library. Subsequently, these encoders are integrated into a standard UNet, turning it into a segmentation model.

The current pipeline uses a fine-tuned resnet34 model; the mobilenet_v2 model weights are lost.
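
For reference, a minimal sketch of how such a model is typically assembled with that library (an illustration of the pattern, not necessarily the exact code this repo uses):

import segmentation_models_pytorch as smp

# UNet with a pretrained resnet34 encoder; 2 classes = background + head
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=3,
    classes=2,
)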

My main focus is on optimizing inference time. To break it down, inference time comprises:

  • Preprocessing duration
  • Transfer time of the image to the GPU
  • Time taken for the model to process the image
  • Time to transfer results back to the CPU
  • Postprocessing duration

My primary interest lies in the third point, although I've also looked into the others for a comprehensive understanding.

------ Let's start with the currently available pipeline ------

Here is my code to check inference time:

from time import time
import cv2
import torch
import head_segmentation.segmentation_pipeline as seg_pipeline
from prettytable import PrettyTable


print("----Loading Test images----")
#img path for one of orignal celebA images (1024x1024)
image_path= "/home/enes/lab/head-segmentation/processed_dataset/test/images/1000.jpg"

image = cv2.imread(str(image_path), cv2.IMREAD_COLOR)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print("test_img shape", image.shape)

image_512 = cv2.resize(image, (512, 512), interpolation=cv2.INTER_AREA)
image_256 = cv2.resize(image, (256, 256), interpolation=cv2.INTER_AREA)
print("resized_test_img (512,512) shape", image_512.shape)
print("resized_test_img (256,256) shape", image_256.shape)

print("----    ----")
print("  ")

print("----Check if GPU is available----")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("device:",device)
print("----    ----")
print("  ")


segmentation_pipeline = seg_pipeline.HumanHeadSegmentationPipeline()
segmentation_pipeline_GPU = seg_pipeline.HumanHeadSegmentationPipeline(device=device)

t0=time()
predicted_segmap = segmentation_pipeline.predict(image)
t1=time()
predicted_segmap = segmentation_pipeline.predict(image_512)
t2=time()
predicted_segmap = segmentation_pipeline.predict(image_256)
t3=time()
predicted_segmap = segmentation_pipeline_GPU.predict(image)
t4=time()
predicted_segmap = segmentation_pipeline_GPU.predict(image_512)
t5=time()
predicted_segmap = segmentation_pipeline_GPU.predict(image_256)
t6=time()



myTable = PrettyTable(["Image Size", "CPU", "GPU", ])

myTable.add_row(["1024", str(round(t1-t0,2))+" sec", str(round(t4-t3,2))+" sec"])
myTable.add_row(["512", str(round(t2-t1,2))+" sec", str(round(t5-t4,2))+" sec"])
myTable.add_row(["256", str(round(t3-t2,2))+" sec", str(round(t6-t5,2))+" sec"])

print(myTable)

And here are the outputs:

(screenshots of the timing results)

Question1: Why do you think the GPU performance drops significantly for images with a resolution of 1024x1024? Could it be due to the fact that the model was originally trained on 512x512 images, making it inefficient for the GPU to optimize larger images?

Question2: Another intriguing observation is the near-stagnant inference time on the CPU, regardless of the considerable reduction in image size. Transitioning from a 1024-sized image to a 256-sized one represents a 16-fold decrease in pixel count. Yet, the inference time improvement is a mere 0.03 seconds.

One of my objectives is to develop a swift CPU-only version for head segmentation. Hence, these results took me by surprise.

As an initial step, I aimed to replicate the aforementioned inference times to ascertain I'm not overlooking any crucial aspects.
For this, I trained the network employing the resnet34 architecture, limiting it to just 3 epochs. The image size, as specified in the config yaml file, remained unchanged at 512x512. Post-training, I loaded the latest checkpoint and retried the experiment described earlier. Below is the relevant code:


from time import time
import cv2
import torch
import head_segmentation.segmentation_pipeline as seg_pipeline
from prettytable import PrettyTable
import numpy as np

class CustomHeadSegmentationPipeline(seg_pipeline.HumanHeadSegmentationPipeline):
    def predict(self, image: np.ndarray, name) -> np.ndarray:
        t0=time()
        preprocessed_image = self._preprocess_image(image)
        t1 = time()
        preprocessed_image = preprocessed_image.to(self.device)
        t2 = time()
        mdl_out = self._model(preprocessed_image)
        t3 = time()
        mdl_out = mdl_out.cpu()
        t4 = time()
        pred_segmap = self._postprocess_model_output(mdl_out, original_image=image)
        t5= time()

        print(" ")
        print("Test details for :", name)
        print(" ")

        print("preprocessing",round(t1-t0,3))
        print("to cpu/gpu",round(t2-t1,3))
        print("model output",round(t3-t2,3))
        print("to cpu",round(t4-t3,3))
        print("postprocess",round(t5-t4,3))
        print("total",round(t5-t0,3))
        print("-------------")

        return pred_segmap


print("----Loading Test images----")
# img path for one of the original CelebA images (1024x1024)
image_path= "/home/enes/lab/head-segmentation/processed_dataset/test/images/1000.jpg"

image = cv2.imread(str(image_path), cv2.IMREAD_COLOR)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print("test_img shape", image.shape)

image_512 = cv2.resize(image, (512, 512), interpolation=cv2.INTER_AREA)
image_256 = cv2.resize(image, (256, 256), interpolation=cv2.INTER_AREA)
print("resized_test_img (512,512) shape", image_512.shape)
print("resized_test_img (256,256) shape", image_256.shape)

print("----    ----")
print("  ")

print("----Check if GPU is available----")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("device:",device)
print("----    ----")
print("  ")


model_path_mobilenet_v2= "/home/enes/lab/head-segmentation/training_runs/2023-10-22/00-16/models/last.ckpt"
model_path_resnet34= "/home/enes/lab/head-segmentation/training_runs/2023-10-22/21-22/models/last.ckpt"

model_path=model_path_resnet34



segmentation_pipeline = CustomHeadSegmentationPipeline(model_path=model_path)
segmentation_pipeline_GPU = CustomHeadSegmentationPipeline(device=device, model_path=model_path)

t0=time()
name="1024 + CPU"
predicted_segmap = segmentation_pipeline.predict(image, name)
t1=time()
name="512 + CPU"
predicted_segmap = segmentation_pipeline.predict(image_512, name)
t2=time()
name="216 + CPU"
predicted_segmap = segmentation_pipeline.predict(image_256,name)
t3=time()
name="1024 + GPU"
predicted_segmap = segmentation_pipeline_GPU.predict(image,name)
t4=time()
name="512 + GPU"
predicted_segmap = segmentation_pipeline_GPU.predict(image_512, name)
t5=time()
name="256 + GPU"
predicted_segmap = segmentation_pipeline_GPU.predict(image_256, name)
t6=time()


print("Inference times for resnet34 --pretrained --depth=3 : ")
myTable = PrettyTable(["Image Size", "CPU", "GPU", ])

myTable.add_row(["1024", str(round(t1-t0,2))+" sec", str(round(t4-t3,2))+" sec"])
myTable.add_row(["512", str(round(t2-t1,2))+" sec", str(round(t5-t4,2))+" sec"])
myTable.add_row(["256", str(round(t3-t2,2))+" sec", str(round(t6-t5,2))+" sec"])

print(myTable)

And here you see the results for an n1-standard CPU + NVIDIA T4 VM:

(screenshots of the detailed timing breakdown)

(These screenshots show that the bottleneck is indeed the model-output part of the process. Total times differ slightly because I obtained these detailed results while running the test on a better CPU, to check whether it would make a big difference.)

Question3: So, in terms of CPU-based inference time, although I am using the same machine, there is a 4x difference between the model I trained and the one installed via pip from the current repo. Can you point out what might differ between the current pipeline's model and the one I trained?

About achieving faster results

Hi Guys, I am using this repo for one of my personal projects and I am quite happy you guys are keeping it updated.

The head-segmentation part of my app is the bottleneck in terms of speed. I am using a CPU because GPU servers are expensive. I started researching the most popular face-parsing algorithms out there, and many of them use relatively complex models to do multiclass segmentation with at least 10 classes (eye, nose, neck, cloth, lips; even the left and right brows are considered different classes in some cases).

From my understanding, if we are doing just head+hair segmentation, then we don't need such a complex model design, and maybe we can alter/reduce the model layers and retrain the network for head+hair segmentation only. In theory, this would result in much faster runtime. Is my way of thinking correct?

@wiktorlazarski this repo uses a pretrained MobileNet, right?

Share mobilenet based model

Hi, thank you for sharing the resnet34-based model.
Can you please also share the mobilenetv2-based model?
And if yes, please share it on Google Drive; I am not so familiar with git lfs.
