Hi @butterluo, I found nothing wrong with the messages you posted. As far as I know, retrieval evaluation is indeed slow. You can debug with a small part of the test set (e.g., 50 pairs), or print something inside _run_on_single_gpu() to check whether the program is still running.
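The two debugging tips above (evaluate on ~50 pairs, print a heartbeat per batch) can be sketched roughly as below. This is a minimal illustration, not UniVL code: `make_debug_loader` and `run_with_progress` are hypothetical helpers, and a random `TensorDataset` stands in for the real test set.

```python
import time

import torch
from torch.utils.data import DataLoader, Subset, TensorDataset


def make_debug_loader(dataset, num_pairs=50, batch_size=16):
    """Return a DataLoader over only the first `num_pairs` items of `dataset` (hypothetical helper)."""
    indices = list(range(min(num_pairs, len(dataset))))
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=False)


def run_with_progress(loader, step_fn):
    """Call `step_fn` on every batch and print a heartbeat so a hang is easy to spot."""
    outputs = []
    start = time.time()
    for i, batch in enumerate(loader):
        outputs.append(step_fn(batch))
        # flush=True makes the line show up immediately even when stdout is redirected to a log file
        print(f"batch {i + 1}/{len(loader)} done, {time.time() - start:.1f}s elapsed", flush=True)
    return outputs


if __name__ == "__main__":
    # Stand-in for the real test set: 1000 random (text, video) feature pairs.
    fake_test_set = TensorDataset(torch.randn(1000, 8), torch.randn(1000, 8))
    loader = make_debug_loader(fake_test_set, num_pairs=50)
    # Stand-in for the model's per-batch similarity computation.
    sims = run_with_progress(loader, lambda b: b[0] @ b[1].T)
    print(f"{len(sims)} similarity blocks computed")
```

If the heartbeat never prints on the small subset, the hang is probably before or around the first forward pass rather than in the sheer size of the test set.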
I tried that last night, but nothing was printed, and the log stayed stuck on the last line 'NCCL INFO comm 0x7f7eb8003010 rank 1 nranks 2 cudaDev 1 busId b9000 - Init COMPLETE' the whole night.
In your experience, how long does eval_epoch() take to run?
Different test datasets take different amounts of time; on average, less than half an hour.
I have no idea what is causing your problem yet. One option is to temporarily comment out lines 406-441 and call only _run_on_single_gpu().
I also want to make sure that the device_ids in modules = nn.parallel.replicate(model, device_ids) are valid and correspond to valid GPUs. Thanks.
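One way to do the device_ids check suggested above is a small guard run right before `nn.parallel.replicate`. This is a hypothetical sketch (`check_device_ids` is not part of UniVL or PyTorch); it only checks that every id is a distinct logical index the process can actually see.

```python
def check_device_ids(device_ids, visible_count):
    """Validate `device_ids` before handing them to nn.parallel.replicate (hypothetical helper).

    `visible_count` is the number of GPUs this process can see, e.g.
    torch.cuda.device_count(). Valid ids are logical indices 0..visible_count-1.
    """
    ids = list(device_ids)
    if not ids:
        raise ValueError("device_ids is empty")
    out_of_range = [d for d in ids if not 0 <= d < visible_count]
    if out_of_range:
        raise ValueError(
            f"device_ids {out_of_range} out of range: only {visible_count} GPU(s) visible"
        )
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate entries in device_ids {ids}")
    return ids


if __name__ == "__main__":
    print(check_device_ids([0, 1], visible_count=2))
```

In the evaluation script this would be called as something like `check_device_ids(device_ids, torch.cuda.device_count())`; if it passes and the hang persists, the single-GPU path is the next thing to try.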
When I use more than one GPU, I set 'export CUDA_VISIBLE_DEVICES=3,4', and the device_ids passed into 'nn.parallel.replicate(model, device_ids)' is '[0,1]'. Is anything wrong with that?
When I run it on a single GPU, everything works fine.
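For reference, `CUDA_VISIBLE_DEVICES` renumbers the GPUs a process sees: with `CUDA_VISIBLE_DEVICES=3,4`, physical GPU 3 shows up as `cuda:0` and GPU 4 as `cuda:1`, so `device_ids=[0,1]` is what one would expect in that setup. The mapping can be sketched as below; `visible_to_logical` is a hypothetical helper that only handles the plain comma-separated integer form (the variable can also hold GPU UUIDs, which this sketch ignores).

```python
import os


def visible_to_logical(env_value=None):
    """Map physical GPU indices in CUDA_VISIBLE_DEVICES to the logical indices
    PyTorch uses (cuda:0, cuda:1, ...). Hypothetical helper, integer form only.

    Returns None when the variable is unset, i.e. all GPUs are visible and
    logical index == physical index.
    """
    if env_value is None:
        env_value = os.environ.get("CUDA_VISIBLE_DEVICES")
        if env_value is None:
            return None
    physical = [int(x) for x in env_value.split(",") if x.strip()]
    return {phys: logical for logical, phys in enumerate(physical)}


if __name__ == "__main__":
    mapping = visible_to_logical("3,4")
    print(mapping)                  # physical -> logical
    print(list(mapping.values()))   # the device_ids to pass to replicate
```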
I have no idea what causes this bug yet. I tested on P40, P100, and V100 GPUs, and all of them work well. Can you tell me your GPU model and PyTorch version?
Python version: 3.7 (64-bit runtime)
Tesla V100-PCIE-32GB
CUDA runtime version: 10.0.130
torch==1.8.1
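Details like the ones above can be gathered with a short script such as this sketch (`environment_report` is a hypothetical name, not a PyTorch API). PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report suitable for pasting into an issue.

```python
import platform

import torch


def environment_report():
    """Collect the versions usually asked for in bug reports (hypothetical helper)."""
    info = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,  # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "gpus": [],
    }
    if torch.cuda.is_available():
        info["gpus"] = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```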
Feel free to reopen if there is any progress on this issue.
Related Issues (20)
- How to fine-tune with additional layers before UniVL?
- Run Without Distributed
- TypeError: bad operand type for unary -: 'list'
- How to run captioning task on my own video datasets?
- Pre-training acceleration using multi-machine distributed training
- Can you share your HowTo100M.csv file?
- This repo is missing important files
- Unable to run video captioning code
- where to get transcript to generate youcookii_data.pickle
- end-to-end video file captioning process
- feature & data shape
- How can I create my video feature pickle
- video only test for youcook
- How to only input text feature or video feature
- Is there a code for Finetune on CMU-MOSI here?
- Issues about Freezing some additional layers instead of meanP in CLIP4Clip
- Error message (torch.distributed.elastic.multiprocessing.errors.ChildFailedError:)
- Estimate of zero-shot performance
- Zero score (every output is None) on evaluation captioning with pretrained model
- Non-Configurable GPU Count via Arguments