Comments (15)
Not sure whether this is a QUDA issue then. @bjoo thoughts?
from quda.
It seems more likely to be an issue with Chroma rather than QUDA. A low invertQuda / (initQuda - endQuda) ratio means that Chroma takes up too much of the run time.
Maybe you could check whether you set `--enable-openmp` while configuring QDPXX.
from quda.
Thanks for the comments. I did set `-DQDP_USE_OPENMP=ON` when configuring QDPXX with CMake.
from quda.
The `CMakeLists.txt` in the qdpxx/devel branch has a bug: the `QDP_USE_OMP_THREADS` macro is never actually set. Configuring with autoconf should be fine.
If you want CMake to work, you could add `set(QDP_USE_OMP_THREADS ${QDP_USE_OPENMP})` just under this line.
from quda.
Thanks @SaltyChiang and @bjoo for the valuable advice. I took some time to try the change. Adding the `QDP_USE_OMP_THREADS` setting to the qdpxx/devel `CMakeLists.txt` solves the problem: the log now shows `QDP use OpenMP threading` and the efficiency improves. Although `-DCHROMA_ENABLE_OPENMP` is OFF by default, OpenMP still works. I think I should report this issue in the qdpxx project later.
from quda.
Closing this here as not a QUDA issue. @wittscien if you create a follow-up issue for QDPXX or Chroma feel free to link it here.
from quda.
I see, thanks @bjoo.
from quda.
@SaltyChiang A follow-up: one of the machines works fine (a great improvement in efficiency) after adding the `QDP_USE_OMP_THREADS` setting to the qdpxx/devel `CMakeLists.txt`, but there is no improvement in efficiency on another machine (although the log now prints `QDP use OpenMP threading`). I installed everything the same way on both machines and used the same sbatch script. Do you know what might cause this?
from quda.
@wittscien could the issue be related to binding? Are the threads being allocated to their own cores? We have instructions for creating job binding scripts here in case this helps. (Look at how we use the `CPU_REORDER` variable to set `numactl`.)
from quda.
@maddyscientist Looks like setting `numactl` does not improve the efficiency, and `OMP_NUM_THREADS` does not help at all. I use only one V100 card. It seems very weird.
from quda.
@wittscien No idea for now. Maybe you should write a simple program to check whether the machine handles OpenMP normally.
from quda.
Thank you for the comments. Yes, I have exported `OMP_NUM_THREADS`, and the first line of the Chroma output prints `QDP use OpenMP threading. We have x threads` as expected, @bjoo. But it does not actually use them: if I use `top` to show the processes, I see only one thread at 100% CPU usage. I wrote a simple program and it does spawn the specified number of threads, @maddyscientist.
from quda.