Comments (4)
Thank you for the bug report.
This is confirmed using g++ 9 and OpenMPI, so it is not compiler- or MPI-library-specific.
The segfault is triggered by the fetch_points/fetch_all call in your example, but it actually stems from the creation of the internal message structure within the library, triggered by the barrier() function, so it is likely representative of a more deeply rooted bug.
This will be looked into as a matter of urgency and this report updated once a fix is found.
A quick point for the moment: there is a mismatch in the chrono_sampler used. When using the specialisms approach to declaring MUI objects, it is important to match types; the interface and point types here are 3d (3-dimensional, double data type), while the chrono_sampler is 1d. This isn't the underlying problem, though.
from mui.
Thanks @SLongshaw. I will update my chrono_sampler dimensionality. Good luck finding the bug.
Hi @chrisrichardson I think this should now be resolved in both branches.
To summarise, you managed to uncover a situation that doesn't normally arise when using MUI: the sending rank was able to exit before the receiving rank had completed its fetch, and because MUI relies on a non-blocking send / blocking receive design, this meant MPI buffers etc. were lost as the sending rank exited. Coupled designs normally mean this tends not to happen (admittedly not specifically by design).
In reality, though, this represented a bug in the general design of MUI: there should always be a corresponding MPI wait (or equivalent) call (which there was), but there was nothing actually blocking on it to ensure completion before exit.
The solution implemented is a new blocking wait on MPI_Isend completion, but with a built-in timer to avoid deadlock.
The problem here is that the original design was the safest and most general approach but inherently carried this risk. The new design is more "correct" in MPI terms but reduces the generality of the library. For most scenarios this shouldn't make any difference, but for the few where it does, the addition of a timeout and user warning messages is the (inelegant) solution.
This change will remain permanent as long as future test cases don't highlight any unforeseen major issues with the new solution. Should they arise, there is a work-around using the original approach: ensure that the sending side finishes with a call to the barrier() function at a time stamp where it hasn't yet issued a fetch() command and, on the receiving side, send a final empty commit() at that time to unblock both sides, in effect adding a synchronisation barrier.
Please try this change out and feel free to re-open this issue if it is not resolved for you; for now, though, I will close it.
The fix for this problem eventually resulted in a runtime problem in a specific case, where the new blocking wait added significant overhead. The non-blocking wait has therefore been reinstated; however, a final blocking test loop has been added to the destructor of the MPI comm class. This has been tested against the issue raised here, and the new solution still offers a fix while removing the need for a blocking MPI test loop after every send.