
Comments (8)

johanjoensson avatar johanjoensson commented on June 12, 2024 1

I would not recommend using numpy arrays for storing Python bytes: in my experience numpy is clever and trims away null bytes, which is absolutely not what we want. That caused me massive headaches for a long time. However, you can use bytearrays for receiving MPI.BYTE (that way I think you would also avoid weird endianness-related issues). Apart from using bytearrays instead of numpy arrays, your example is very close to how I send/receive states.

On my 1.5-year-old laptop, I get a similar speedup.
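The null-byte trimming mentioned above can be reproduced without any MPI: NumPy's fixed-width bytes dtype (`|S…`) follows C-string semantics and silently strips trailing null bytes, while a bytearray keeps every byte. A minimal sketch:

```python
import numpy as np

# A product state whose byte representation ends in a null byte.
state = b"\x01\x00"

# NumPy's fixed-width bytes dtype follows C-string semantics:
# trailing null bytes are silently stripped on access.
arr = np.array([state], dtype="|S2")
print(arr[0])      # b'\x01' -- the trailing null byte is gone

# A bytearray (or plain bytes) keeps every byte intact.
buf = bytearray(state)
print(bytes(buf))  # b'\x01\x00' -- unchanged
```

This is why receiving MPI.BYTE into a bytearray is the safer choice when states can legitimately end in zero bytes.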

from impuritymodel.

johanjoensson avatar johanjoensson commented on June 12, 2024

In my fork, where I calculate the self-energy of the impurity and do self-consistent DMFT, I have had to add a lot of MPI communication of product states. This change would be incompatible with basically all my changes.

Also, this might cause problems when combining large numbers of spin orbitals with MPI and numpy communication. Python's int has no fixed bit length, but all MPI and numpy types do. For instance, by default numpy uses long (which is system dependent) for Python's int.
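The fixed-bit-length mismatch shows up as soon as a state index exceeds the machine word. A short sketch: without an explicit dtype NumPy falls back to `object` (references to Python ints, not raw machine integers), and forcing a fixed-width dtype fails outright.

```python
import numpy as np

big = 2**70  # a Python int that does not fit in any fixed-width integer type

# Without an explicit dtype, NumPy falls back to dtype=object.
arr = np.array([big])
print(arr.dtype)  # object

# Forcing a fixed-width integer dtype raises an error instead.
try:
    np.array([big], dtype=np.int64)
except OverflowError as e:
    print("OverflowError:", e)
```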

I would recommend using bytes/bytearray for representing states, instead of int.


JohanSchott avatar JohanSchott commented on June 12, 2024

Great with your input @johanjoensson !

Happy to hear about your DMFT work. Cool!

It sounds like your code explicitly uses the fact that bytes/bitarray is the product-state representation, true?
I naively thought that the representation would not matter in most cases.
Curious, can you show a short example where the int-representation would not work?
If it would be a lot of work to change to the int representation, it sounds better to continue using bytes/bitarray, since the potential gains (skipping the conversion back and forth between bytes and bitarray, and dropping the dependency on the bitarray package) are not very big.

I think you raised important points about numpy and MPI.
In the short code script below it seems to me it's possible to send a big python int to a numpy array and then send it another MPI rank. But the numpy dtype becomes "object" so the potential benefit of storing several states in a numpy array (instead of a e.g. a python list) is gone I guess. Do you have a better example where numpy or MPI can't handle big ints (not even with the "object" dtype)? (I'm aware one will get nonsense if have a numpy array of dtype np.int64 and try to fill it will values bigger than 2**64-1.)

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.rank

if rank == 0:
    # Bigger than 2**64-1
    i = 2**80
    print(f"int: {rank = }, {i = }")
    # ndarray with "object" dtype, if input values are big enough to not fit in int64
    i = np.array([i])
    print(f"First ndarray: {rank = }, {i = }")
    # Add one to all "object" elements
    i += 1
    print(f"Second ndarray: {rank = }, {i = }")
    # Slow unbuffered communication
    comm.send(i, dest=1, tag=0)
elif rank == 1:
    i = comm.recv(source=0, tag=0)
else:
    i = 3

print(f"Final: {rank = }, {i = }")


johanjoensson avatar johanjoensson commented on June 12, 2024

Yes, I found it easier to just force the DMFT calculation to use the bytes/bitarray representation.

A Python object will always work, but only with the lower-case mpi4py functions (send, recv, reduce, etc.). These functions first pickle the data and then send the pickled object. This becomes very slow when we need to communicate large (or many) objects.

Instead, mpi4py recommends using the upper-case functions (Send, Recv, Reduce, etc.). These work with any Python buffer type (numpy ndarray, bytearray, etc.) and are much closer to the Fortran/C MPI calls, at least in terms of performance. But they require that your data can be represented by MPI datatypes, and a Python object is unfortunately not an MPI datatype.

This would only become a problem when communicating large numbers of product states, which is rather unusual, I agree. However, if we run on a supercomputer with 100+ (or even 1000+) MPI ranks, it becomes necessary to not store all product states in the basis on every rank. Instead, I distribute the product states over the MPI ranks, which unfortunately means I need to be able to send product states between ranks. Since the size of the basis grows exponentially with the number of spin-orbitals, this quickly leads to massive amounts of data being sent over MPI, and the upper-case functions are quite helpful then.
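The distribution step above can be sketched with a small helper that assigns each rank a contiguous slice of the basis. `local_range` is a hypothetical function for illustration, not taken from the repo; it splits the states as evenly as possible, giving the first `n_states % size` ranks one extra state.

```python
def local_range(n_states: int, size: int, rank: int) -> tuple[int, int]:
    """Hypothetical helper: split n_states as evenly as possible
    over `size` MPI ranks. The first (n_states % size) ranks get one
    extra state. Returns the half-open index range [start, stop)
    owned by `rank`."""
    base, extra = divmod(n_states, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# 10 states over 3 ranks: the ranks own [0, 4), [4, 7) and [7, 10).
print([local_range(10, 3, r) for r in range(3)])
```

Every state then lives on exactly one rank, and neighbouring ranks exchange only the slices they need.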


JohanSchott avatar JohanSchott commented on June 12, 2024

Thanks for reminding me about the upper-case mpi4py functions.

Ok, now I see the advantage of the bytes/bitarray representation over the int representation: one can have a numpy array of bytes elements, but not a numpy array of big ints.

Let's close this issue (without any further action).

Is the script below similar to how you send states as a numpy array of bytes?

Small script:

from mpi4py import MPI
import numpy as np
from bitarray import bitarray
from time import time
import math

comm = MPI.COMM_WORLD
rank = comm.rank

n_spin_orbitals = 200
# One product state
ps = bitarray("1"*n_spin_orbitals).tobytes()
n_bytes = math.ceil(n_spin_orbitals / 8)
dtype = np.dtype(f"|S{n_bytes}")
# Number of product states
n_ps = 10**6

if rank == 0:
    states = tuple(ps for _ in range(n_ps))

# Slow unbuffered communication
comm.Barrier()
t_unbuff = time()
if rank == 0:
    comm.send(states, dest=1, tag=0)
elif rank == 1:
    states = comm.recv(source=0, tag=0)
comm.Barrier()
t_unbuff = time() - t_unbuff

# Fast buffered communication
if rank == 0:
    # First convert to ndarray
    t_convert = time()
    states_numpy = np.array(states, dtype=dtype)
    t_convert = time() - t_convert
    assert states_numpy.dtype == dtype

comm.Barrier()
t_buff = time()
if rank == 0:
    # Fast buffered communication
    comm.Send([states_numpy, MPI.CHAR], dest=1, tag=7)
elif rank == 1:
    # Initialize numpy array
    states_numpy = np.empty(n_ps, dtype=dtype)
    comm.Recv([states_numpy, MPI.CHAR], source=0, tag=7)
comm.Barrier()
t_buff = time() - t_buff

# Convert back to tuple (only on the ranks that hold states_numpy)
if rank in (0, 1):
    t_convert_back = time()
    states = tuple(states_numpy)
    t_convert_back = time() - t_convert_back

if rank == 0:
    print(f"{1000*t_unbuff = :.1f} ms")
    print(f"{1000*t_convert = :.1f} ms")
    print(f"{1000*t_buff = :.1f} ms")
    print(f"{1000*t_convert_back = :.1f} ms")
    print(f"{t_unbuff / t_buff :.1f}x speed-up using buffered communication")

Script output:

1000*t_unbuff = 66.2 ms
1000*t_convert = 112.2 ms
1000*t_buff = 20.5 ms
1000*t_convert_back = 141.6 ms
3.2x speed-up using buffered communication

It looks like the unbuffered communication is about 3x slower than the buffered one.
And converting between a numpy array and a tuple takes more time than the MPI communication itself (on my 11-year-old laptop).


JohanSchott avatar JohanSchott commented on June 12, 2024

I have zero experience with using numpy to store bytes, so thanks for the information.

Do you join all states into one bytearray with something like bytearray(b"".join(states)) ?


johanjoensson avatar johanjoensson commented on June 12, 2024

I can usually get away with setting up a generator that yields the appropriate slices from the states array, something like (bytes(states[i:i+n_bytes]) for i in range(0, len(states), n_bytes)), or more_itertools.chunked if available.

I try to only use bytearray when receiving a bunch of MPI.BYTE, then chunk it into appropriate bytes objects when I need the actual product states.
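The receive-then-chunk pattern can be sketched as follows (the payload here is fabricated for illustration; in the real code the bytearray would be filled by an MPI Recv of MPI.BYTE):

```python
n_bytes = 25  # bytes per product state, e.g. ceil(200 spin-orbitals / 8)

# Pretend this bytearray is the raw MPI.BYTE payload for four states.
received = bytearray(b"\x01" * (4 * n_bytes))

# Chunk the flat byte buffer into one immutable bytes object per state.
states = tuple(
    bytes(received[i:i + n_bytes]) for i in range(0, len(received), n_bytes)
)
print(len(states), all(len(s) == n_bytes for s in states))  # 4 True
```

Converting each chunk to immutable bytes makes the states hashable, so they can be used as dict keys or set members.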

A very large change in my fork is a class for storing the many-body basis. This class is responsible for all the MPI distribution of product states; I use it to switch between vector/matrix representations of multiconfigurational states/operators, to expand the basis by repeated application of the Hamiltonian, etc.


JohanSchott avatar JohanSchott commented on June 12, 2024

Aha, ok.

Interesting idea to use a class to store the product states included in the basis.


