Comments (5)
Questions that need to be answered:
- What should we use for the distributed executor?
  - My vote is for mpi4py, which has proven to be quite stable over the years.
- Should we make the distributed package a hard dependency?
  - My vote is "No".
- How should the current implementation in feinsum be changed?
  - Option 1. We could rewrite feinsum.tuning.OpentunerTuner to follow a server-client model, where only rank 0 writes to the database and generates inputs for the search exploration. I am slightly worried about scalability, but since the common case is that evaluating a point in the search space costs ~10 seconds, maybe this is not a big problem.
  - Option 2. Each rank runs its own search-space exploration and, after each run, broadcasts the timing result for the other ranks to add as extra seed configurations. I'm not sure whether opentuner allows us to seed configurations during a run, or what the database update costs are when multiple processes perform the updates.
I have looked at the mpi4py task pool, mpipool, schwimmbad and charm4py task pools. The mpi4py task pool makes the middle two fairly redundant IMO. Between charm4py and mpi4py, mpi4py is easier to build and install, but certain pool executors break/hang on Spectrum MPI. Charm4py pools seemed more stable, but it takes more effort to build charm4py. Most of the basic task execution functionality is similar between the two so it wouldn't be too hard to add both.
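For reference, a minimal sketch of the mpi4py task-pool route mentioned above, using mpi4py.futures.MPIPoolExecutor; `time_configuration` is a hypothetical stand-in for feinsum's actual timing routine, not its real API:

```python
# Sketch only: assumes mpi4py is installed; `time_configuration` is a
# hypothetical placeholder for feinsum's candidate-timing routine.

def time_configuration(params):
    # Hypothetical placeholder: build, run, and time one candidate
    # transformation; here it just returns a dummy cost.
    return sum(params)

def tune_with_pool(candidates):
    # Launch with: mpiexec -n <N> python -m mpi4py.futures script.py
    from mpi4py.futures import MPIPoolExecutor
    with MPIPoolExecutor() as pool:
        # Worker ranks evaluate candidates in parallel; the master
        # rank collects the timings in submission order.
        return list(pool.map(time_configuration, candidates))
```

The same `pool.map` shape would also fit the charm4py pool, which is part of why adding both backends shouldn't be much extra work.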
A third option might be to divide up the search space in some way and then have each rank search within its subspace. This could introduce load balancing problems, however.
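The split itself could be as simple as dealing points out round-robin; a sketch (the function name and arguments are illustrative, not feinsum API):

```python
# Sketch of the subspace-division idea: statically deal the search
# space out round-robin so each rank explores a disjoint slice.

def subspace_for_rank(search_space, rank, size):
    # Round-robin assignment keeps slices the same length (+/- 1),
    # but says nothing about how long each point takes to evaluate,
    # which is where the load-balancing concern comes from.
    return [p for i, p in enumerate(search_space) if i % size == rank]
```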
Keeping it as a soft dependency seems fine for now.
I don't think we need a task pool for any of the options here. Option 1 can be done with simple MPI communication primitives: the server rank receives with MPI_ANY_SOURCE, and all the client ranks know that rank 0 is the server rank. The server rank would run a loop like:

while True:
    Blocking RECV from any source.
    Record the result in the database and send the next iteration point to the rank that just completed.
Let me know if I'm missing something.
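The loop above might look roughly like this in mpi4py; `record_result` and the point iterator are hypothetical stand-ins for feinsum's database and search-exploration logic:

```python
# Sketch of the Option 1 server loop (rank 0). Assumes mpi4py; clients
# send (point, timing) pairs, with (None, None) as the initial work
# request. `record_result` is a hypothetical stand-in for the real
# database write.

def record_result(db, point, timing):
    # Hypothetical: persist one measured configuration.
    db[tuple(point)] = timing

def serve(comm, db, points):
    from mpi4py import MPI  # imported lazily; only needed under MPI
    it = iter(points)
    status = MPI.Status()
    active = comm.Get_size() - 1  # number of client ranks still working
    while active > 0:
        # Blocking receive from any source.
        point, timing = comm.recv(source=MPI.ANY_SOURCE, status=status)
        if point is not None:
            record_result(db, point, timing)
        # Send the next iteration point back to whichever rank just
        # reported; None tells that client the search space is exhausted.
        nxt = next(it, None)
        comm.send(nxt, dest=status.Get_source())
        if nxt is None:
            active -= 1
```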
Option 3 is also interesting; a mild concern there is that it might force us into a sub-optimal search-exploration strategy.
Ray may (or may not) be useful here https://docs.ray.io/en/latest/index.html
Thanks! I looked at the example in https://docs.ray.io/en/latest/tune/index.html and it should blend well with feinsum's description of parameter spaces. Some downsides I see are:
- It seems like a heavyweight library that would pull in lots of dependencies.
- The distributed tuning seems to depend on Kubernetes infrastructure, which again is a bit heavyweight.