Hi, I'd like to store a communicator as dynamic reference, e.g.

Using dynamic reference to communicator about rsmpi HOT 12 CLOSED

tbetcke commented on June 12, 2024

Using dynamic reference to communicator

from rsmpi.

Comments (12)

jedbrown commented on June 12, 2024 1

This part wasn't my design. I haven't thought about this carefully, but surely need this capability to write higher level libraries. Can you make a test and propose an implementation?

from rsmpi.

tbetcke commented on June 12, 2024

Currently going trough the source code. Goal is to access all_reduce_into from a &dyn Communicator trait object.

all_reduce_into is defined in the CommunicatorCollectives trait. This trait derives from Communicator and is implemented for any Communicator type. So ideally we would just cast

(&dyn Communicator as &dyn CollectiveCommunicator).all_reduce_into(...)

However, this does not work because CollectiveCommunicator has template methods and cannot be cast into a trait object.

What seems to work is to implement CollectiveCommunicator for dyn Communicator that is to add in collective.rs the line

imp CollectiveCommunicator for dyn Communicator {}.

We would have to do this for all derived traits from Communicator in order for users to be able to use Communicator as trait object. I don't know if there are any other boundary effects from doing this. But this might be a solution.

from rsmpi.

tbetcke commented on June 12, 2024

Looked more into the above. An issue is the process_at_rank method in the Communicator trait. It requires Self to be sized therefore cannot be called on the trait object.

from rsmpi.

jedbrown commented on June 12, 2024

Surely that can be resolved with a little indirection at most.

from rsmpi.

tbetcke commented on June 12, 2024

Agree. We would need to implement some stuff for dyn Communicator if we want it to have the same functionality as the direct communicator types. For example the process_at_rank method returns a new Process struct that does nothing else than storing a reference to the communicator + the rank information given to process_at_rank.

The actual point to point communication is then implemented as a trait that only needs the rank information and the as_raw method from the communicator. I suspect the same with others.

So yes. With a bit of writing we could probably make everything work for a dyn Communicator object. Would this be desirable? It requires a bit of special code just for dyn Communicator but allows users to pass communicators everywhere simply as trait objects.

from rsmpi.

jedbrown commented on June 12, 2024

So I think process_at_rank is just to avoid if statements in certain code paths. Maybe it should hold an &'a dyn Communicator instead of &'a C where C: 'a + Communicator? Would that be a localized change?

from rsmpi.

Cydhra commented on June 12, 2024

So yes. With a bit of writing we could probably make everything work for a dyn Communicator object. Would this be desirable? It requires a bit of special code just for dyn Communicator but allows users to pass communicators everywhere simply as trait objects.

This is definitely desirable. I am trying to write a higher level library and it is very annoying that I cannot support UserCommunicators easily. Why exactly are those two different types anyway (that is SystemCommunicator and UserCommunicator)? I never worked extensively with MPI in other languages, but if I understand it correctly, all communicators are the same type in standard MPI. Why make this awkward distinction that breaks all design patterns?

Wouldn't it be better to have either one trait that is common to all communicators (like Communicator but one that actually exposes all functions that are present in communicators) or even unify the entire thing into one enum (like MPI usually does)?

from rsmpi.

jedbrown commented on June 12, 2024

I don't like this organization either. I think type-system expression of InterCommunicator is useful because it behaves differently (the caller almost always needs to know whether they're working with an intra- or inter-comm). I think the User versus System distinction could be removed from the type system or converted to an enum. This was not my design and I don't have time to change it myself, but I'd be happy to review a PR.

from rsmpi.

Cydhra commented on June 12, 2024

I may have time to look into this, but I don't understand the distinction of [System/User]Communicator and [System/User]CommunicatorHandle. Both structs contain the same information in each case, and both support construction from raw ffi handles, but I cannot really see what else the -Handle structs do.

The CommunicatorHandle trait is empty, and I can't find traits that are implemented for the handle structs aside from those that handle conversion to ffi handles. They are also absent from the docsrs documentation, despite being public.

from rsmpi.

jedbrown commented on June 12, 2024

If you can simplify while keeping the examples working, I'm all for it.

from rsmpi.

Cydhra commented on June 12, 2024

So I think process_at_rank is just to avoid if statements in certain code paths. Maybe it should hold an &'a dyn Communicator instead of &'a C where C: 'a + Communicator? Would that be a localized change?

No, because the trait AsCommunicator is defined as

pub trait AsCommunicator {
    type Out: Communicator;
    
    fn as_communicator(&self) -> &Self::Out;
}

Setting Out to dyn Communicator doesn't work because that isn't sized.

Also this doesn't solve the issue, because converting a &self reference to a &'a dyn Communicator still requires it to be sized.

from rsmpi.

Cydhra commented on June 12, 2024

I have come up with a solution to the problem of the Process not being Sized. The Process currently looks like this:

#[derive(Copy, Clone)]
pub struct Process<'a, C>
where
    C: 'a + Communicator,
{
    comm: &'a C,
    rank: Rank,
}

We can change this to remove the type parameter (assuming #155 gets merged):

#[derive(Clone)]
pub struct Process<'a> {
    comm: &'a CommunicatorHandle,
    rank: Rank,
}

This then creates a new problem because Process implements AsCommunicator to turn back into the communicator it was derived from. This is for example used to compare process instances. We obviously cannot create a Communicator instance from a reference (that would leave dangling references if the original communicator was dropped).

But we can instead implement Communicator for Process. This is not a problem because Process is bound to the original communicator by its lifetime, so it can never be dangling. This also hides away the CommunicatorHandle properly, and it is a relatively localized change (ignoring the removal of the type parameter).

This has the unfortunate side-effect that we now need a way to get a handle from a Communicator, so we can create the process:

fn by_rank(c: &'a (impl Communicator + ?Sized), r: Rank) -> Option<Self> {
    if r != unsafe { ffi::RSMPI_PROC_NULL } {
        Some(Process {
            comm: c.get_handle(),
            rank: r,
        })
    } else {
        None
    }
}

This necessitates that every Communicator implements a crate-public function get_handle. I currently enforce this by changing trait Communicator to be derived from a sealed trait (hidden in a crate-public module). This way I need not expose the handle to public API and can still access it from every Communicator. I unfortunatelly cannot provide a blanket implementation for this function, since the handles are part of the implementing types.

These changes allow functions like process_at_rank to be called on trait-objects.

The same approach works for struct AnyProcess.

Then there are functions in Communicator left that have type parameters. Many of those (particularly those that use Group) can be changed locally to use trait-objects instead. I am fairly certain this doesn't break anything.

The only functions left are get_attr, which cannot use a trait-object because it uses an associated type on its type-parameter, and all methods that use Buffer objects. It may be possible using trait-objects for those (though this seems iffy to me), but I haven't looked into that yet. Any ideas for those last methods? (using dyn Communicator still works without those methods by the way, you just cannot call methods that have the Self: Sized type boundary).

My experimental implementation of this can be found here. It is branched from #155.

from rsmpi.

Using dynamic reference to communicator about rsmpi HOT 12 CLOSED

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent