Comments (9)
@ani0075, thanks for suggesting this. I will fix it asap.
from palette-image-to-image-diffusion-models.
https://stackoverflow.com/a/62550189/13697228 mentions data length needing to be divisible by batch_size. Changed batch_size to 1 everywhere and same issue.
Here's the log:
22-06-09 23:28:39.190 - INFO: Create the log file in directory experiments/debug_inpainting_celebahq_220609_232838.
22-06-09 23:28:39.259 - INFO: Dataset [InpaintDataset() form data.dataset] is created.
22-06-09 23:28:39.260 - INFO: Dataset for train have 48 samples.
22-06-09 23:28:39.260 - INFO: Dataset for val have 2 samples.
22-06-09 23:28:39.780 - INFO: Network [Network() form models.network] is created.
22-06-09 23:28:39.781 - INFO: Network [Network] weights initialize using [kaiming] method.
22-06-09 23:28:40.080 - WARNING: Config is a str, converts to a dict {'name': 'mae'}
22-06-09 23:28:40.459 - INFO: Metric [mae() form models.metric] is created.
22-06-09 23:28:40.459 - WARNING: Config is a str, converts to a dict {'name': 'mse_loss'}
22-06-09 23:28:40.468 - INFO: Loss [mse_loss() form models.loss] is created.
22-06-09 23:28:45.991 - INFO: Beign loading pretrained model [Network] ...
22-06-09 23:28:45.992 - WARNING: Pretrained model in [experiments/train_inpainting_celebahq_220426_233652/checkpoint/190_Network.pth] is not existed, Skip it
22-06-09 23:28:45.992 - INFO: Beign loading pretrained model [Network_ema] ...
22-06-09 23:28:45.992 - WARNING: Pretrained model in [experiments/train_inpainting_celebahq_220426_233652/checkpoint/190_Network_ema.pth] is not existed, Skip it
22-06-09 23:28:46.007 - INFO: Beign loading training states
22-06-09 23:28:46.007 - WARNING: Training state in [experiments/train_inpainting_celebahq_220426_233652/checkpoint/190.state] is not existed, Skip it
22-06-09 23:28:46.018 - INFO: Model [Palette() form models.model] is created.
22-06-09 23:28:46.019 - INFO: Begin model train.
Feel free to reopen the issue if you have any questions.
@Janspiry if you close the issue, the person that originally opened it can't reopen the issue.
How do you suggest I fix the error "Caught IndexError in DataLoader worker process 0." so that I can actually run the code in this repository? My colleague @hasan-sayeed and I haven't been able to get Palette running at all, despite spending many hours debugging one issue after another.
Sorry for the error, I thought you had already fixed it.
Since the message says Caught IndexError, I suspect that the self.image the dataset is reading may be incorrect.
You can try printing this variable. Also, can you show me the file directory and the contents of train.flist?
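One way to act on this suggestion is to compare the file list against the indices the loader requests. The sketch below is only illustrative: `read_flist` and `out_of_range` are hypothetical helpers, not functions from the repository, and it assumes train.flist holds one image path per line.

```python
def read_flist(path):
    """Return the non-empty lines of a file list.

    Assumption: the flist stores one image path per line.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def out_of_range(indices, num_items):
    """Return the indices that would raise IndexError on a list of num_items entries."""
    return [i for i in indices if not -num_items <= i < num_items]


# Toy check: a list with 48 entries cannot serve indices 48 and 49.
print(out_of_range(range(50), 48))  # → [48, 49]
```

If the list read from the flist is shorter than the sample count reported in the log, any index beyond its length would fail in the same way as `path = self.imgs[index]` in the traceback.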
@Janspiry thanks for the response. Will take another look and post back.
Hi @Janspiry @sgbaird.
I am facing a similar issue when running the test script. Maybe they are related because of the way in which data indexing is implemented.
92%|█████████▏| 12/13 [1:00:07<05:00, 300.59s/it]
Close the Tensorboard SummaryWriter.
Traceback (most recent call last):
  File "/scratch/aniruddha/Palette-Image-to-Image-Diffusion-Models/run.py", line 92, in <module>
    main_worker(0, 1, opt)
  File "/scratch/aniruddha/Palette-Image-to-Image-Diffusion-Models/run.py", line 60, in main_worker
    model.test()
  File "/scratch/aniruddha/Palette-Image-to-Image-Diffusion-Models/models/model.py", line 190, in test
    self.writer.save_images(self.save_current_results())
  File "/scratch/aniruddha/Palette-Image-to-Image-Diffusion-Models/models/model.py", line 87, in save_current_results
    ret_path.append('GT_{}'.format(self.path[idx]))
IndexError: list index out of range
I am running test on 100 images with batch size of 8. As you can see from the logs, there are 13 batches (12 batches with 8 images and the last batch with 4 images). The run fails only on the last batch. The reason is that the line here looks for 8 images (batch size) in the last batch even though there are only 4.
https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models/blob/main/models/model.py#L86
The test script runs fine when I use a multiple of 8 images.
Could you let me know the easiest fix to this? Thanks.
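The batch arithmetic described above can be checked with a short sketch. `batch_sizes` is a hypothetical helper written for illustration; it assumes the loader keeps the final partial batch rather than dropping it.

```python
def batch_sizes(num_samples, batch_size):
    """Return the size of each batch when the last partial batch is kept."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


sizes = batch_sizes(100, 8)
print(len(sizes), sizes[-1])  # → 13 4
```

Any loop that iterates over the configured batch size (8) instead of the actual length of the last batch (4) will index past the end of that batch.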
I was able to solve the problem by getting the number of images in the batch explicitly.
temp_batch_size = len(self.path)
for idx in range(temp_batch_size):
    ret_path.append('GT_{}'.format(self.path[idx]))
    ret_result.append(self.gt_image[idx].detach().float().cpu())
    ret_path.append('Process_{}'.format(self.path[idx]))
    ret_result.append(self.visuals[idx::temp_batch_size].detach().float().cpu())
    ret_path.append('Out_{}'.format(self.path[idx]))
    ret_result.append(self.visuals[idx-temp_batch_size].detach().float().cpu())
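To see why the slicing still works with a smaller batch, here is a toy version using plain lists in place of tensors. It assumes the visuals are stored as consecutive groups, one group per saved step, each with one entry per image in the batch; `collect_results` is a hypothetical stand-in for the loop above, not the repository's method.

```python
def collect_results(paths, visuals):
    """Pair each image path with its per-step results and its final output."""
    n = len(paths)  # actual number of images in this batch
    names, results = [], []
    for idx in range(n):
        names.append('Process_{}'.format(paths[idx]))
        results.append(visuals[idx::n])   # every n-th entry: this image at each step
        names.append('Out_{}'.format(paths[idx]))
        results.append(visuals[idx - n])  # negative index: this image in the last group
    return names, results


paths = ['a.png', 'b.png']
visuals = ['s0_a', 's0_b', 's1_a', 's1_b']  # two steps, two images each
names, results = collect_results(paths, visuals)
print(results)  # → [['s0_a', 's1_a'], 's1_a', ['s0_b', 's1_b'], 's1_b']
```

Because the stride and the negative offset both come from `len(paths)`, the same loop handles full batches and a shorter final batch.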
Setup
Running on Windows Subsystem for Linux 2 (WSL2).
git clone https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models.git
cd Palette-Image-to-Image-Diffusion-Models
conda create -n pip-palette python==3.9.*
conda activate pip-palette
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
Config
Same as #21
Directory Structure
Same as #21
Terminal
(pip-palette) sgbaird@Dell-G7:~/GitHub/Palette-Image-to-Image-Diffusion-Models$ cd /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models ; /usr/bin/env /home/sgbaird/miniconda3/envs/palette/bin/python /home/sgbaird/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/launcher 36177 -- /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py -p train -c config/inpainting_celebahq_dummy.json --debug
export CUDA_VISIBLE_DEVICES=0
/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py:28: UserWarning: You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True
  warnings.warn('You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True')
(pip-palette) sgbaird@Dell-G7:~/GitHub/Palette-Image-to-Image-Diffusion-Models$ cd /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models ; /usr/bin/env /home/sgbaird/miniconda3/envs/pip-palette/bin/python /home/sgbaird/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/launcher 41379 -- /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py -p train -c config/inpainting_celebahq_dummy.json --debug
export CUDA_VISIBLE_DEVICES=0
/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py:28: UserWarning: You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True
  warnings.warn('You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True')
  0%|          | 0/16 [00:00<?, ?it/s]
Close the Tensorboard SummaryWriter.
Error
Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataset.py", line 471, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/data/dataset.py", line 54, in __getitem__
    path = self.imgs[index]
IndexError: list index out of range
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/model.py", line 106, in train_step
    for train_data in tqdm.tqdm(self.phase_loader):
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/core/base_model.py", line 45, in train
    train_log = self.train_step()
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 58, in main_worker
    model.train()
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 92, in <module>
    main_worker(0, 1, opt)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
Sorry to bother you, but did you manage to reproduce this code in the end?