luca-s / mpi-master-slave
Master Slave code in mpi4py
License: MIT License
Hello! I would like to fork your repo for some work I need to do, but I cannot find a license that would allow me to extend it. Would you consider adding an open license so that I can? Also, have you considered publishing this on PyPI? I think it would be very useful. Thanks!
Using your example to illustrate the problem: MPI.File.Open works when it is called in main, but not when it is called in MySlave. The latter is what I need. Could you suggest a fix?
The code is below:
from mpi4py import MPI
from mpi_master_slave import Master, Slave
from mpi_master_slave import WorkQueue
import time


class MyApp(object):
    """
    This is my application that has a lot of work to do so it gives work to do
    to its slaves until all the work is done
    """

    def __init__(self, slaves):
        # when creating the Master we tell it what slaves it can handle
        self.master = Master(slaves)
        # WorkQueue is a convenient class that runs slaves on a task queue
        self.work_queue = WorkQueue(self.master)

    def terminate_slaves(self):
        """
        Call this to make all slaves exit their run loop
        """
        self.master.terminate_slaves()

    def run(self, tasks=10):
        """
        This is the core of my application, keep starting slaves
        as long as there is work to do
        """
        #
        # let's prepare our work queue. This can be built at initialization time
        # but it can also be added later as more work becomes available
        #
        for i in range(tasks):
            # 'data' will be passed to the slave and can be anything
            self.work_queue.add_work(data=('Do task', i))
        #
        # Keep starting slaves as long as there is work to do
        #
        while not self.work_queue.done():
            #
            # give more work to do to each idle slave (if any)
            #
            self.work_queue.do_work()
            #
            # reclaim returned data from completed slaves
            #
            for slave_return_data in self.work_queue.get_completed_work():
                done, message = slave_return_data
                if done:
                    print('Master: slave finished its task and says "%s"' % message)
            # sleep some time: this is a crucial detail discussed below!
            time.sleep(0.03)


class MySlave(Slave):
    """
    A slave process extends Slave class, overrides the 'do_work' method
    and calls 'Slave.run'. The Master will do the rest
    """

    def __init__(self):
        super(MySlave, self).__init__()

    def do_work(self, data):
        rank = MPI.COMM_WORLD.Get_rank()
        name = MPI.Get_processor_name()
        task, task_arg = data
        # This is the call that does not work: the file is opened on
        # COMM_WORLD from inside a single slave's task.
        comm = MPI.COMM_WORLD
        mode = MPI.MODE_RDONLY
        fh = MPI.File.Open(comm, "tmp.txt", mode)
        print("file opening in MySlave for rank {:d}".format(rank))
        fh.Close()
        print(' Slave %s rank %d executing "%s" task_id "%d"' % (name, rank, task, task_arg))
        return (True, 'I completed my task (%d)' % task_arg)


def main():
    name = MPI.Get_processor_name()
    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()
    print('I am %s rank %d (total %d)' % (name, rank, size))
    # Here the same call works: every rank reaches it.
    comm = MPI.COMM_WORLD
    mode = MPI.MODE_RDONLY
    fh = MPI.File.Open(comm, "tmp.txt", mode)
    print("file opening in main for rank {:d}".format(rank))
    fh.Close()
    if rank == 0:  # Master
        app = MyApp(slaves=range(1, size))
        app.run()
        app.terminate_slaves()
    else:  # Any slave
        MySlave().run()
    print('Task completed (rank %d)' % (rank))


if __name__ == "__main__":
    import os
    # make sure the file to open exists before any rank tries to read it
    if not os.path.isfile('tmp.txt'):
        open('tmp.txt', 'a').close()
    main()
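A possible explanation, offered as a sketch rather than a confirmed diagnosis of this report: MPI.File.Open is a collective operation, so every rank in the communicator passed to it has to make the call. In main all ranks reach it, but inside do_work only the slave that received the task does, and the master never does, so a call on MPI.COMM_WORLD would be expected to block. Opening the file on MPI.COMM_SELF, which contains only the calling process, avoids that. The helper name read_file_size_independently is hypothetical and is not part of mpi-master-slave.

```python
def read_file_size_independently(path):
    """Open `path` read-only on the calling rank alone; return its size in bytes."""
    # Imported here so the helper can live in a module that is also
    # loaded on machines without MPI installed.
    from mpi4py import MPI

    # COMM_SELF holds only this process, so the collective File.Open
    # completes without waiting for the master or the other slaves.
    fh = MPI.File.Open(MPI.COMM_SELF, path, MPI.MODE_RDONLY)
    try:
        return fh.Get_size()
    finally:
        fh.Close()
```

In MySlave.do_work this amounts to replacing comm = MPI.COMM_WORLD with comm = MPI.COMM_SELF. A communicator containing only the slaves would probably not help either, since the slaves do not all reach the call at the same time.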
Hi luca-s,
Thanks for the very useful mpi4py tips! I am wondering whether the master can also act as a slave, so that it not only handles I/O but also completes work that it assigns to itself. On a machine with n cores, only n processes would then be launched with mpirun -np n python *.py. This differs slightly from the example you describe, which uses mpirun -np n+1 python *.py.
What are the potential issues with a parallelization scheme in which the master is also a slave?
By the way, could you explain a little more how the n+1 processes in your mpirun -np n+1 python *.py example are allocated to the n cores? Is the additional process placed on one of the n cores, so that one core handles two processes, or are the n+1 processes distributed evenly across the n cores?
Best.
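One way to reason about the n+1 question, as a rough sketch and not a statement about how any particular MPI implementation places processes: placement depends on the launcher's binding settings and on the OS scheduler, but a master that sleeps between polls, like the time.sleep(0.03) loop in the example above, should need very little CPU, so sharing a core with a slave costs little. The snippet below measures this for a stand-in polling loop; the function name and the iteration count are illustrative assumptions.

```python
import time


def measure_polling_loop(iterations=10, sleep_seconds=0.03):
    """Return (wall_seconds, cpu_seconds) for a sleep-based polling loop."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(iterations):
        # Stand-in for the cheap bookkeeping a master does per iteration;
        # the sleep is where the process yields its core to other work.
        time.sleep(sleep_seconds)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


wall, cpu = measure_polling_loop()
print(f"wall time: {wall:.3f} s, CPU time: {cpu:.3f} s")
```

The CPU time should come out as a small fraction of the wall time. By contrast, a single-threaded master that also executes tasks cannot hand out work or collect results while its own task is running, so idle slaves would wait for it. That is the main cost to weigh when launching only n processes.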
Dear Luca-S,
I am writing to ask for your advice on poor processing performance I recently encountered with mpi-master-slave. The task is to process hundreds of thousands of protein sequence files with MAFFT multiple sequence alignment. The script runs on a supercomputer grid, requesting 50 compute nodes with 32 CPUs each. One CPU is allocated per MPI rank (total number of simultaneous tasks: 1600).
With this setup, only about 5000 files are processed in 60 minutes of runtime. Some component must be saturating, since reducing the number of nodes to 20 or 10 gives much the same throughput.
I suspect that the poor performance is a consequence of an inappropriate sleep time in the master process.
(I have only just realized that I changed the sleep from 0.3 s to 0.03 s to make the system more responsive, but it is quite possible that the change made things worse.)
Given the number of workers, what would be a reasonable sleep time for the master process? Depending on the number of proteins in the file being processed, the worker (MAFFT) runtime varies between about 0.4 seconds and several days.
With kind regards,
Balazs Balint
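A back-of-the-envelope check may help decide whether the sleep is the limiting factor. The sketch below uses only the figures reported above (about 5000 files in 60 minutes, sleep of 0.03 s). It assumes, pessimistically and hypothetically, that the master hands out just one task per loop iteration and that an iteration costs nothing but the sleep; the real per-iteration cost of polling 1600 ranks is not known here.

```python
def polling_dispatch_ceiling(sleep_seconds, tasks_per_iteration=1):
    """Upper bound on tasks per second a sleep-based master loop can hand out.

    Assumes each iteration costs only the sleep and dispatches
    `tasks_per_iteration` tasks (both are simplifying assumptions).
    """
    return tasks_per_iteration / sleep_seconds


observed = 5000 / (60 * 60)                # files per second reported above
ceiling = polling_dispatch_ceiling(0.03)   # one task per iteration, sleep 0.03 s

print(f"observed throughput: {observed:.2f} files/s")
print(f"dispatch ceiling   : {ceiling:.1f} files/s")
print(f"ceiling / observed : {ceiling / observed:.0f}x")
```

Under these assumptions the dispatch ceiling is roughly 24 times the observed rate, and even the original 0.3 s sleep would allow more than the observed 1.4 files per second. That suggests the sleep alone is unlikely to explain the plateau. Other candidates worth measuring are the time the master spends in each loop iteration, shared file system I/O, and a few very long MAFFT runs occupying workers.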
Hi Luca-s!
I wanted to ask again whether you are interested in publishing this on PyPI. I can do the work if you'd like; it would be useful to have it there. Let me know your thoughts. Thanks!