
layered-neural-atlases's Introduction

Layered Neural Atlases for Consistent Video Editing

This repository contains an implementation for the SIGGRAPH Asia 2021 paper Layered Neural Atlases for Consistent Video Editing.

The paper introduces the first approach for neural video unwrapping using an end-to-end optimized interpretable and semantic atlas-based representation, which facilitates easy and intuitive editing in the atlas domain.

Installation Requirements

The code is compatible with Python 3.7 and PyTorch 1.6.

You can create an anaconda environment called neural_atlases with the required dependencies by running:

conda create --name neural_atlases python=3.7 
conda activate neural_atlases 
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy scikit-image tqdm opencv -c pytorch
pip install imageio-ffmpeg gdown
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html

Data convention

The code expects 3 folders for each video input, e.g. for a video of 50 frames named "blackswan" (a sketch for verifying this layout is given after the list):

  1. data/blackswan: A folder of video frames containing image files in the following convention: blackswan/00000.jpg,blackswan/00001.jpg,...,blackswan/00049.jpg (as in the DAVIS dataset).
  2. data/blackswan_flow: A folder with forward and backward optical flow files in the following convention: blackswan_flow/00000.jpg_00001.jpg.npy,blackswan_flow/00001.jpg_00000.jpg.npy,...,blackswan_flow/00049.jpg_00048.jpg.npy.
  3. data/blackswan_maskrcnn: A folder with rough masks (created by Mask-RCNN or any other way) containing files in the following convention: blackswan_maskrcnn/00000.jpg,blackswan_maskrcnn/00001.jpg,...,blackswan_maskrcnn/00049.jpg
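
The following is a minimal sketch (not part of the repository) for checking that a video folder follows this layout before training; the function name and the example call are hypothetical.

import os

def check_convention(root, name, num_frames):
    """Report any frame, mask, or flow file that is missing for a video."""
    frames = [os.path.join(root, name, f"{i:05d}.jpg") for i in range(num_frames)]
    masks = [os.path.join(root, f"{name}_maskrcnn", f"{i:05d}.jpg") for i in range(num_frames)]
    flows = []
    for i in range(num_frames - 1):
        # forward flow (i -> i+1) and backward flow (i+1 -> i)
        flows.append(os.path.join(root, f"{name}_flow", f"{i:05d}.jpg_{i + 1:05d}.jpg.npy"))
        flows.append(os.path.join(root, f"{name}_flow", f"{i + 1:05d}.jpg_{i:05d}.jpg.npy"))
    missing = [p for p in frames + masks + flows if not os.path.isfile(p)]
    if missing:
        print(f"{len(missing)} expected files are missing, e.g. {missing[0]}")
    else:
        print("All expected frame, mask and flow files were found.")

check_convention("data", "blackswan", 50)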

To download a few example DAVIS sequences, run:

gdown https://drive.google.com/uc?id=1WipZR9LaANTNJh764ukznXXAANJ5TChe
unzip data.zip

Masks extraction

Given only the video frames folder data/blackswan, it is possible to extract the Mask-RCNN masks (and create the required folder data/blackswan_maskrcnn) by running:

python preprocess_mask_rcnn.py --vid-path data/blackswan --class_name bird

where --class_name determines the COCO class name of the sought foreground object. It is also possible to choose the first instance retrieved by Mask-RCNN by using --class_name anything. This is useful for cases where Mask-RCNN produces correct masks but with the wrong class labels, as in the "libby" video:

python preprocess_mask_rcnn.py --vid-path data/libby --class_name anything
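
For reference, the following is a minimal sketch of how a rough mask could be produced with detectron2's Mask R-CNN model zoo. It only illustrates the general idea behind this preprocessing step; the actual class filtering, resizing, and output naming live in preprocess_mask_rcnn.py, and the file paths here are examples.

import os
import cv2
import numpy as np
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cuda"  # or "cpu" if no GPU is available
predictor = DefaultPredictor(cfg)

image = cv2.imread("data/blackswan/00000.jpg")          # BGR image, as detectron2 expects
instances = predictor(image)["instances"].to("cpu")
if len(instances) > 0:
    # Taking the highest-scoring instance roughly corresponds to --class_name anything;
    # filtering instances.pred_classes by a COCO class id corresponds to a specific class.
    mask = instances.pred_masks[0].numpy().astype(np.uint8) * 255
    os.makedirs("data/blackswan_maskrcnn", exist_ok=True)
    cv2.imwrite("data/blackswan_maskrcnn/00000.jpg", mask)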

Optical flows extraction

Furthermore, the optical flow folder can be extracted using RAFT. To link RAFT into the current project, run:

git submodule update --init
cd thirdparty/RAFT/
./download_models.sh
cd ../..

For extracting the optical flows (and creating the required folder data/blackswan_flow) run:

python preprocess_optical_flow.py --vid-path data/blackswan --max_long_edge 768
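
As a quick sanity check on the extracted flow (not something the pipeline requires), the sketch below loads one forward-flow file and uses it to warp the second frame back towards the first with OpenCV. It assumes the .npy file stores a per-pixel (dx, dy) displacement field; if the saved layout differs, adjust accordingly.

import cv2
import numpy as np

flow = np.load("data/blackswan_flow/00000.jpg_00001.jpg.npy")
if flow.shape[0] == 2:                      # handle a (2, H, W) layout as well
    flow = flow.transpose(1, 2, 0)
h, w = flow.shape[:2]

# Resize the frames to the flow resolution in case --max_long_edge shrank the flow.
frame0 = cv2.resize(cv2.imread("data/blackswan/00000.jpg"), (w, h)).astype(np.float32)
frame1 = cv2.resize(cv2.imread("data/blackswan/00001.jpg"), (w, h)).astype(np.float32)

# Sample frame 1 at the positions the forward flow points to from frame 0.
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
warped = cv2.remap(frame1, map_x, map_y, cv2.INTER_LINEAR)

print("Mean absolute difference after warping:", np.abs(warped - frame0).mean())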

Pretrained models

To download a sample set of our pretrained models together with sample edits, run:

gdown https://drive.google.com/uc?id=10voSCdMGM5HTIYfT0bPW029W9y6Xij4D
unzip pretrained_models.zip

Additional pre-trained atlases are provided here.

Training

For training a model on a video, run:

python train.py config/config.json

where the video frames folder is determined by the config parameter "data_folder". Note that to reduce the training time, you can lower the evaluation frequency controlled by the parameter "evaluate_every" (e.g. by changing it to 10000). The other configurable parameters are documented inside the file train.py.
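
One convenient pattern, sketched below, is to derive a custom config from the provided one instead of editing it in place. Only the keys mentioned above ("data_folder", "evaluate_every") are assumed here, and the output file name is arbitrary.

import json

with open("config/config.json") as f:
    config = json.load(f)

config["data_folder"] = "data/blackswan"   # folder containing the video frames
config["evaluate_every"] = 10000           # evaluate less often to shorten training

with open("config/my_config.json", "w") as f:
    json.dump(config, f, indent=2)

Training then runs with python train.py config/my_config.json.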

Evaluation

The model is evaluated periodically during training. To run evaluation only, on an already trained model, run:

python only_evaluate.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=evaluation_outputs

where trained_model_folder is the path to a folder that contains the config.json and checkpoint files of the trained model.

Editing

To apply editing, run the script only_edit.py. Examples for the supplied pretrained models for "blackswan" and "boat":

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_foreground.png --edit_background_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_background.png
python only_edit.py --trained_model_folder=pretrained_models/checkpoints/boat --video_name=boat --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/boat/edit_boat_foreground.png --edit_background_path=pretrained_models/edit_inputs/boat/edit_boat_backgound.png

where edit_foreground_path and edit_background_path specify the paths to 1000x1000 RGBA images containing the atlas edits.
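
If you want to prepare your own atlas edit rather than reuse the supplied ones, the sketch below creates a blank 1000x1000 RGBA edit with Pillow (installed alongside torchvision). The assumption is that fully transparent pixels leave the atlas unchanged while opaque pixels are composited onto the video; the drawn shape and output file name are only examples.

from PIL import Image, ImageDraw

edit = Image.new("RGBA", (1000, 1000), (0, 0, 0, 0))        # fully transparent canvas
draw = ImageDraw.Draw(edit)
draw.ellipse((400, 400, 600, 600), fill=(255, 0, 0, 255))   # an opaque red disc as a placeholder edit
edit.save("my_foreground_edit.png")

The resulting file can then be passed via --edit_foreground_path (or --edit_background_path for the background atlas).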

To apply an edit that was made on a single frame (e.g. for the pretrained "libby" model), run:

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/libby --video_name=libby --data_folder=data --output_folder=editing_outputs  --use_edit_frame --edit_frame_index=7 --edit_frame_path=pretrained_models/edit_inputs/libby/edit_frame_.png

Citation

If you find our work useful in your research, please consider citing:

@article{kasten2021layered,
  title={Layered neural atlases for consistent video editing},
  author={Kasten, Yoni and Ofri, Dolev and Wang, Oliver and Dekel, Tali},
  journal={ACM Transactions on Graphics (TOG)},
  volume={40},
  number={6},
  pages={1--12},
  year={2021},
  publisher={ACM New York, NY, USA}
}

layered-neural-atlases's People

Contributors

dolev104, rogerb831, ykasten


layered-neural-atlases's Issues

RuntimeError: No such operator detectron2::nms_rotated

Thanks for sharing your code. I have a problem with running preprocess_mask_rcnn.py: I get the following error: RuntimeError: No such operator detectron2::nms_rotated. I followed all your instructions for creating the required environment, and it seems that detectron2 is installed successfully based on this line:
Successfully installed antlr4-python3-runtime-4.8 detectron2-0.4+cu101 future-0.18.2 fvcore-0.1.3.post20210317 google-auth-1.35.0 iopath-0.1.9 omegaconf-2.1.2 portalocker-2.4.0 pycocotools-2.0.4 pydot-1.4.2 tabulate-0.8.9 termcolor-1.1.0 yacs-0.1.8
I can import detectron2 without any problem, but after running from detectron2 import model_zoo, I get the error below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/envs/neural_atlases/lib/python3.7/site-packages/detectron2/model_zoo/__init__.py", line 8, in <module>
    from .model_zoo import get, get_config_file, get_checkpoint_url, get_config
  File "/anaconda3/envs/neural_atlases/lib/python3.7/site-packages/detectron2/model_zoo/model_zoo.py", line 9, in <module>
    from detectron2.modeling import build_model
  File "/anaconda3/envs/neural_atlases/lib/python3.7/site-packages/detectron2/modeling/__init__.py", line 2, in <module>
    from detectron2.layers import ShapeSpec
  File "/anaconda3/envs/neural_atlases/lib/python3.7/site-packages/detectron2/layers/__init__.py", line 5, in <module>
    from .nms import batched_nms, batched_nms_rotated, nms, nms_rotated
  File "/anaconda3/envs/neural_atlases/lib/python3.7/site-packages/detectron2/layers/nms.py", line 16, in <module>
    nms_rotated_func = torch.ops.detectron2.nms_rotated
  File "/.local/lib/python3.7/site-packages/torch/_ops.py", line 61, in __getattr__
    op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator detectron2::nms_rotated

I am wondering why this error happens and how I can solve it.
Thank you.

How to reduce the training time?

I found that the training time is too long. Even with my A100, it takes a whole day and night to train. How can I reduce the training time? Should I just lower the iters_num parameter? Does iters_num really need to be 10,001?
Or, can I reduce the training time by importing a pre-trained model? How long does it take for you to train a single model?

Can't quite train this time

Hi again,

I now have all my flow and maskrcnn files labelled in the correct way, however I'm getting this error when trying to train using !python train.py config/config.json --data_folder=/content/Experiment/data/.
I'm also unsure of whether this should point to my data folder with the original, flow, and maskrcnn frames, or just the folder with the original frames; however, neither way is working.

Model has 264706 params
Model has 133122 params
Model has 416379 params
Model has 402945 params
Traceback (most recent call last):
  File "train.py", line 380, in <module>
    main(json.load(f))
  File "train.py", line 183, in main
    jif_all = get_tuples(number_of_frames, video_frames)
  File "/content/layered-neural-atlases/unwrap_utils.py", line 110, in get_tuples
    return torch.cat(jif_all, dim=1)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, Autograd, Profiler, Tracer, Autocast]

This time I can't quite figure out how to fix it. Thanks so much in advance.

How to implement the Multi Foreground Atlases feature?

Thanks for sharing this amazing code!
I'm trying to implement the Multi Foreground Atlases feature described in Section 4.3 of the arXiv paper.
However, I can't understand this sentence:

Unlike the one foreground object case, to support occlusions between different foreground objects, the sparsity loss is applied directly on the atlas, by applying 𝑙1 regularization on randomly sampled UV coordinates from foreground regions in the atlas.

What does this mean in practice?
Do I need to apply the l1 regularization equation error(y, ŷ) + λ * Σ |w|?

if yes:

  • What is the lambda value used for the lucia results in the paper?
  • Is the error(y, ŷ) term the sparsity loss from the single-layer case (Eq. 14)?
  • Are the |w| terms the UV coordinate values given by the multiple foreground mapping models?

if not:

  • How is the sparsity loss calculated, in practice, in the multi-foreground-object case?
  • Please explain with an equation for easier understanding.

Other questions:

  • Do the losses for each mapping model (like the rigidity_loss and the optical_flow_loss) need to be computed for each foreground mapping model and summed at the end?

  • What are the coefficient values used for the user scribble losses in the equations:

    • l_red = -log(alpha_red) (Eq. 20)
    • l_green = -log(alpha_green) (Eq. 21)
  • What is the β_tv = 100 variable in Section 3.5?

I would be grateful if you can answer these questions. Thanks!

ValueError: number sections must be larger than 0

(neural_atlases) D:\layered-neural-atlases-main>python only_edit.py --trained_model_folder=pretrained_models/checkpoints/libby --video_name=libby --data_folder=data/libby --output_folder=editing_outputs --use_edit_frame --edit_frame_index=7 --edit_frame_path=pretrained_models/edit_inputs/libby/edit_frame_.png

Model has 264706 params
Model has 133122 params
Model has 416379 params
Model has 402945 params
Traceback (most recent call last):
  File "only_edit.py", line 472, in <module>
    main(training_folder, frame_edit, frames_folder, mask_rcnn_folder, frame_edit_file, edit_tex1_file, edit_tex2_file,
  File "only_edit.py", line 378, in main
    edit_im1, edit_im2 = texture_edit_from_frame_edit(edit_frame, frame_number, model_F_mapping1, model_F_mapping2,
  File "only_edit.py", line 204, in texture_edit_from_frame_edit
    maxx2, minx2, maxy2, miny2, edge_size2 = get_mapping_area(model_F_mapping2, model_alpha, mask_frames > -1, larger_dim,
  File "D:\layered-neural-atlases-main\evaluate.py", line 149, in get_mapping_area
    relisa = np.array_split(relis_i.numpy(), np.ceil(relis_i.shape[0] / 100000))
  File "<__array_function__ internals>", line 5, in array_split
  File "D:\Anaconda3\envs\neural_atlases\lib\site-packages\numpy\lib\shape_base.py", line 778, in array_split
    raise ValueError('number sections must be larger than 0.') from None
ValueError: number sections must be larger than 0.

How to set edit_frame_index?

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/libby --video_name=libby --data_folder=data --output_folder=editing_outputs --use_edit_frame --edit_frame_index=7 --edit_frame_path=pretrained_models/edit_inputs/libby/edit_frame_.png

How should edit_frame_index be set? 1, 2, 3, ...?

How to make new edits on atlas?

Hello thanks for your amazing work.

I have run training and editing based on your pretrained model, and everything works fine.

However, after I train a model for 400k iterations on blackswan and use your edited atlas, I cannot get correct results. Should I make new atlas images based on the newly trained model? If so, where can I get the raw (unedited) atlas images from the model?

Thanks in advance.

Train on multi videos

Great work!
I've noticed "training a model on a video" in the README; is it possible to train one model on multiple videos?

apply edit to longer video than trained

Hi there,
thank you for your great contribution to the community.
I wanted to double-check one doubt with you.
If I trained the neural atlas on the default length of 70 frames but the video is 240 frames, can I run only_edit on the full video?
I am noticing that if I try it, it creates a snappy version of the video with 240 frames, but the actual frames I see from the original video are the 70 that the atlas has been trained on, meaning only 70 frames are rendered.
Any suggestion for this?
Thank you.

Trouble with preprocess_optical_flow.py

Hi there! This project is phenomenal; seriously excited to try this and see what others make of it and how people use it.

Update: Managed to fix this! I realised that I just hadn't cloned the Raft project properly.


I'm trying to run this in Colab. It appears to work up until the point where I try to run preprocess_optical_flow.py, which gives me:

/content/drive/MyDrive/VideoEdit/layered-neural-atlases
Traceback (most recent call last):
  File "/content/drive/MyDrive/VideoEdit/layered-neural-atlases/preprocess_optical_flow.py", line 4, in <module>
    from raft_wrapper import RAFTWrapper
  File "/content/drive/MyDrive/VideoEdit/layered-neural-atlases/raft_wrapper.py", line 12, in <module>
    from utils.utils import InputPadder
ModuleNotFoundError: No module named 'utils'
I have imported the models under RAFT/models but don't see a utils file or folder anywhere, other than unwrap_utils.py.

Hoping you might have time to answer my query. Again amazing work!

can this framework do long video fusion?

Hello, author, thank you for your work. Can this framework handle long videos, for example a 5-minute video? Also, how are the edit_inputs images obtained? The paper not only changes the flowers on the clothes but also changes the chairs; are two separate edit images constructed?

How to deal with video with height=768 and width=432

Thanks for your excellent work and source code.

I saw that in your provided config.json, height is 432 and width is 768.
But in my case, the height is 768 and the width is 432, and I only changed the corresponding values in the config file to:
"resx": 768, "resy": 432,
However, the reconstructed video seems strange and does not work well.

I wonder if there is other code I should change when dealing with this portrait video.
Thanks a lot ~

Impact of hyperparameters

Hello, is it possible to control the appearance of only shadows / reflections in complex scenes where both are present by changing the hyperparameters (like depth of networks and so on)?
