Comments (8)
I would not recommend using numpy arrays for storing Python bytes. In my experience numpy is clever and trims away trailing null bytes, which is absolutely not what we want; that caused me massive headaches for a long time. However, you can use bytearrays for receiving MPI.BYTE (that way I think you also avoid weird endianness-related issues). Apart from using bytearrays instead of numpy arrays, your example is very close to how I send/receive states.
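A small illustration of the trimming issue (my own sketch, not from the original comment): numpy's fixed-width `S` dtypes strip trailing null bytes on element access, while a `bytearray` keeps every byte:

```python
import numpy as np

# A two-byte "state" whose last byte happens to be null
raw = b"\x01\x00"

# numpy's fixed-width bytes dtype silently drops the trailing null byte
arr = np.array([raw], dtype="|S2")
print(arr[0])  # b'\x01' -- one byte shorter than what we stored

# bytearray keeps every byte, so the state width is preserved
buf = bytearray(raw)
print(len(buf), bytes(buf))  # 2 b'\x01\x00'
```

Any product state whose bit pattern ends in a zero byte would silently change width in the numpy version.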
On my 1.5-year-old laptop, I get a similar speedup.
from impuritymodel.
In my fork, where I calculate the self-energy of the impurity and do self-consistent DMFT, I have had to add a lot of MPI communication of product states. This change would be incompatible with basically all my changes.
Also, this might cause problems when combining large numbers of spin-orbitals with MPI and numpy communication. Python's int has no fixed bit length, whereas all MPI and numpy integer types do. For instance, by default numpy maps Python's int to a C long, whose size is system dependent.
I would recommend using bytes/bytearray for representing states, instead of int.
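To sketch the mismatch (my own illustration): a Python int grows without bound, but forcing it into a fixed-width numpy dtype raises, and without an explicit dtype numpy falls back to the `object` dtype:

```python
import numpy as np

big = 2**80  # fine as a Python int, too big for any fixed-width integer dtype

# Without an explicit dtype, numpy falls back to the "object" dtype
arr = np.array([big])
print(arr.dtype)  # object

# Forcing a fixed-width dtype raises instead of silently truncating
try:
    np.array([big], dtype=np.int64)
except OverflowError as exc:
    print("OverflowError:", exc)
```

An `object` array works with the pickle-based mpi4py calls, but not with the buffer-based ones.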
Great to get your input, @johanjoensson!
Happy to hear about your DMFT work. Cool!
It sounds like your code explicitly uses the fact that bytes/bitarray is the product-state representation, true?
I naively thought that the representation would not matter in most cases.
Curious: can you show a short example where the int representation would not work?
If it would be a lot of work to change to the int representation, it sounds better to continue using bytes/bitarray, since the potential gains (skipping the conversion back and forth between bytes and bitarray, and dropping the dependency on the bitarray package) are not very big.
I think you raised important points about numpy and MPI.
In the short script below, it seems to me it is possible to store a big Python int in a numpy array and then send it to another MPI rank. But the numpy dtype becomes "object", so the potential benefit of storing several states in a numpy array (instead of, e.g., a Python list) is gone, I guess. Do you have a better example where numpy or MPI can't handle big ints (not even with the "object" dtype)? (I'm aware one gets an error if one has a numpy array of dtype np.int64 and tries to fill it with values bigger than 2**63 - 1.)
```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.rank
if rank == 0:
    # Bigger than 2**64 - 1
    i = 2**80
    print(f"int: {rank = }, {i = }")
    # ndarray gets the "object" dtype if the input values are too big for int64
    i = np.array([i])
    print(f"First ndarray: {rank = }, {i = }")
    # Add one to all "object" elements
    i += 1
    print(f"Second ndarray: {rank = }, {i = }")
    # Slow unbuffered (pickle-based) communication
    comm.send(i, dest=1, tag=0)
elif rank == 1:
    i = comm.recv(source=0, tag=0)
else:
    i = 3
print(f"Final: {rank = }, {i = }")
```
Yes, I found it easier to just force the DMFT calculation to use the bytes/bitarray representation.
The Python "object" route will always work; however, it is only available with the lower-case mpi4py functions (send, recv, reduce, etc.). These functions start by pickling the data to be sent, and then send the pickled object. This becomes very slow when we need to communicate large (or many) objects.
Instead, mpi4py recommends the upper-case functions (Send, Recv, Reduce, etc.). These work with any Python buffer type (numpy ndarray, bytearray, etc.) and are much closer to the Fortran/C MPI calls, in terms of performance at least. But they require that your data can be represented by MPI datatypes, and a Python object is, unfortunately, not an MPI datatype.
This only becomes a problem when communicating large numbers of product states, which is rather unusual, I agree. However, if we run on a supercomputer with 100+ (or even 1000+) MPI ranks, it becomes necessary not to store all product states of the basis on every rank. Instead I distribute the product states over the MPI ranks, which means I need to be able to send product states between ranks. Since the size of the basis grows exponentially with the number of spin-orbitals, this quickly leads to massive amounts of data being sent over MPI, and the upper-case functions are quite helpful then.
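A rough sketch of that distribution idea (my own illustration with a hypothetical helper name; the actual MPI calls are omitted): split a global list of fixed-width states into per-rank slices, each packed into a contiguous buffer that an upper-case call could ship as MPI.BYTE:

```python
import math

n_spin_orbitals = 200
n_bytes = math.ceil(n_spin_orbitals / 8)  # bytes per product state

# A toy "global basis" of fixed-width product states
basis = [bytes([k]) * n_bytes for k in range(10)]

def partition(states, n_ranks):
    """Assign state k to rank k % n_ranks and pack each slice contiguously."""
    buffers = [bytearray() for _ in range(n_ranks)]
    for k, s in enumerate(states):
        buffers[k % n_ranks] += s
    # Each entry could then be sent with comm.Send([buf, MPI.BYTE], ...)
    return buffers

buffers = partition(basis, n_ranks=4)
# No state is lost: the buffers together hold the whole basis
assert sum(len(b) for b in buffers) == len(basis) * n_bytes
```

The round-robin assignment is just one possible choice; any deterministic mapping from state to rank would do.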
Thanks for reminding me about the upper-case mpi4py functions.
OK, now I see the advantage of the bytes/bitarray representation over the int representation: you can have a numpy array of fixed-width bytes elements, but not a numpy array of big ints.
Let's close this issue (without any further action).
Is the script below similar to how you send states as a numpy array of bytes?
Small script:
```python
from mpi4py import MPI
import numpy as np
from bitarray import bitarray
from time import time
import math

comm = MPI.COMM_WORLD
rank = comm.rank

n_spin_orbitals = 200
# One product state
ps = bitarray("1" * n_spin_orbitals).tobytes()
n_bytes = math.ceil(n_spin_orbitals / 8)
dtype = np.dtype(f"|S{n_bytes}")
# Number of product states
n_ps = 10**6
if rank == 0:
    states = tuple(ps for _ in range(n_ps))

# Slow unbuffered (pickle-based) communication
comm.Barrier()
t_unbuff = time()
if rank == 0:
    comm.send(states, dest=1, tag=0)
elif rank == 1:
    states = comm.recv(source=0, tag=0)
comm.Barrier()
t_unbuff = time() - t_unbuff

# Fast buffered communication
if rank == 0:
    # First convert to ndarray
    t_convert = time()
    states_numpy = np.array(states, dtype=dtype)
    t_convert = time() - t_convert
    assert states_numpy.dtype == dtype
comm.Barrier()
t_buff = time()
if rank == 0:
    comm.Send([states_numpy, MPI.CHAR], dest=1, tag=7)
elif rank == 1:
    # Allocate the receive buffer
    states_numpy = np.empty(n_ps, dtype=dtype)
    comm.Recv([states_numpy, MPI.CHAR], source=0, tag=7)
comm.Barrier()
t_buff = time() - t_buff

# Convert back to tuple (only on the ranks that hold the data)
if rank <= 1:
    t_convert_back = time()
    states = tuple(states_numpy)
    t_convert_back = time() - t_convert_back

if rank == 0:
    print(f"{1000*t_unbuff = :.1f} ms")
    print(f"{1000*t_convert = :.1f} ms")
    print(f"{1000*t_buff = :.1f} ms")
    print(f"{1000*t_convert_back = :.1f} ms")
    print(f"{t_unbuff / t_buff :.1f}x speed-up using buffered communication")
```
Script output:
```
1000*t_unbuff = 66.2 ms
1000*t_convert = 112.2 ms
1000*t_buff = 20.5 ms
1000*t_convert_back = 141.6 ms
3.2x speed-up using buffered communication
```
It looks like the unbuffered communication is about 3x slower than the buffered one.
And converting between a numpy array and a tuple takes more time than the MPI communication itself (on my 11-year-old laptop).
I have zero experience using numpy to store bytes, so thanks for the information.
Do you join all states into one bytearray with something like `bytearray(b"".join(states))`?
I can usually get away with setting up a generator that outputs the appropriate slices from the states array, something like `(bytes(states[i:i+n_bytes]) for i in range(0, len(states), n_bytes))`, or `more_itertools.chunked` if available.
I try to only use `bytearray` when receiving a bunch of `MPI.BYTE`, then chunk it up into appropriate `bytes` objects when I need the actual product states.
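A minimal round trip of that pattern (my own sketch): join the fixed-width states into one contiguous bytearray, then slice them back out one state width at a time:

```python
import math

n_spin_orbitals = 200
n_bytes = math.ceil(n_spin_orbitals / 8)  # bytes per product state

# A few toy fixed-width product states
states = tuple(bytes([k]) * n_bytes for k in range(5))

# Join all states into a single contiguous buffer, as in the question above
buf = bytearray(b"".join(states))

# Slice the individual product states back out; the index runs over the
# buffer length, stepping one state width at a time
recovered = tuple(bytes(buf[i:i + n_bytes]) for i in range(0, len(buf), n_bytes))
assert recovered == states
```

On a real run the joined buffer is what would be handed to `Send`/`Recv` as `MPI.BYTE`, and the slicing happens on the receiving side.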
A very large change in my fork is that I have created a class for storing the many-body basis. This class is responsible for all the MPI distribution of product states; I use it to switch between vector/matrix representations of multiconfigurational states/operators, to expand the basis by repeated application of the Hamiltonian, etc.
Aha, OK.
Interesting with a class that stores the product states included in the basis.