Comments (21)
I've been running multicut successfully on a cluster node with 1.5 TB of RAM, but it hits a segmentation fault, presumably from running out of RAM, on arrays larger than ~5k x 5k x 15.
Yes, this script does not scale well to large volumes.
Instead you will need to use the functionality from this repository.
You can find an example with some explanations here:
https://github.com/constantinpape/cluster_tools/blob/master/example/cremi/run_mc.py
Note that there are some important prerequisites to use this:
- You will need a conda environment with these dependencies: https://github.com/constantinpape/cluster_tools/blob/master/environment.yml#L9-L16
- All input data must be stored in n5. For conversion from hdf5 or tiff to n5, have a look at https://github.com/constantinpape/z5/blob/master/src/python/module/z5py/converter.py
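To make the conversion concrete, here is a minimal sketch (not from the repository) of a blockwise hdf5-to-n5 copy. The helper tiles the volume into chunk-aligned blocks so nothing has to be loaded into memory at once; the actual I/O part is commented out, its paths, dataset names, and chunk shape are placeholders, and the converter module linked above provides ready-made functions for this.

```python
from itertools import product

def chunk_slices(shape, chunks):
    """Yield slice tuples that tile `shape` in blocks of size `chunks`."""
    starts = [range(0, sh, ch) for sh, ch in zip(shape, chunks)]
    for begin in product(*starts):
        yield tuple(slice(b, min(b + ch, sh))
                    for b, ch, sh in zip(begin, chunks, shape))

# Blockwise copy from hdf5 to n5 (requires h5py and z5py; all names below
# are placeholders):
#
#   import h5py, z5py
#   chunks = (64, 256, 256)
#   with h5py.File('boundaries.h5', 'r') as f_in:
#       ds_in = f_in['dataset1']
#       f_out = z5py.File('boundaries.n5')  # format inferred from extension
#       ds_out = f_out.create_dataset('dataset1', shape=ds_in.shape,
#                                     chunks=chunks, dtype=ds_in.dtype,
#                                     compression='gzip')
#       for bb in chunk_slices(ds_in.shape, chunks):
#           ds_out[bb] = ds_in[bb]
```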
Also, does your cluster run any scheduling system? For now, I support slurm and lsf, but it is straightforward to extend this to other schedulers by implementing a class like https://github.com/constantinpape/cluster_tools/blob/master/cluster_tools/cluster_tasks.py#L374.
Yes we use slurm.
I do have the cluster_env conda environment built, but it wasn't finding the cluster_tools module so I added this:
export PYTHONPATH="/home/mmadany/miniconda3/envs/cluster_env/bin:/home/mmadany/Multicut/cluster_tools-master:/home/mmadany/Multicut/cluster_tools-master/cluster_tools"
I have configured z5 and converted to n5 files. When I try to run that example script, I get this error:
import os
import json
import luigi
from cluster_tools import MulticutSegmentationWorkflow
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mmadany/Multicut/cluster_tools-master/cluster_tools/__init__.py", line 1, in <module>
from .workflows import MulticutSegmentationWorkflow
File "/home/mmadany/Multicut/cluster_tools-master/cluster_tools/workflows.py", line 5, in <module>
from .watershed import WatershedWorkflow
File "/home/mmadany/Multicut/cluster_tools-master/cluster_tools/watershed/__init__.py", line 1, in <module>
from .watershed_workflow import WatershedWorkflow
File "/home/mmadany/Multicut/cluster_tools-master/cluster_tools/watershed/watershed_workflow.py", line 4, in <module>
from . import watershed as watershed_tasks
File "/home/mmadany/Multicut/cluster_tools-master/cluster_tools/watershed/watershed.py", line 11, in <module>
from nifty.filters import nonMaximumDistanceSuppression
ImportError: cannot import name 'nonMaximumDistanceSuppression' from 'nifty.filters' (/home/mmadany/miniconda3/envs/cluster_env/lib/python3.7/site-packages/nifty/filters/__init__.py)
Yes, sorry, I just implemented nonMaximumDistanceSuppression and it's not in the conda package yet.
Please check out the latest commit 03ec3b8 and try again; I added a check to skip nonMaximumDistanceSuppression if it's not available.
Ok, that runs, and I see it's doing the job configuration within the program. This is what I'm getting:
(cluster_env) [mmadany@comet-ln2 cluster_tools-master]$ python ~/Multicut/runluigi.py
DEBUG: Checking if MulticutSegmentationWorkflow(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, target=slurm, dependency=DummyTask, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/fxmemf.n5, input_key=dataset1, ws_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, ws_key=dataset1, problem_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, node_labels_key=node_labels, output_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multi_luigi_out.h5, output_key=segmentation/multicut, mask_path=, mask_key=, rf_path=, node_label_dict={}, max_jobs_merge=1, skip_ws=True, agglomerate_ws=False, two_pass_ws=False, sanity_checks=False, max_jobs_multicut=1, n_scales=1) is complete
DEBUG: Checking if WriteSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, input_key=dataset1, output_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multi_luigi_out.h5, output_key=segmentation/multicut, assignment_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multi_luigi_out.h5, assignment_key=node_labels, dependency=MulticutWorkflow, identifier=multicut, offset_path=) is complete
INFO: Informed scheduler that task MulticutSegmentationWorkflow_False___config_mc_DummyTask_6d798a14ef has status PENDING
DEBUG: Checking if MulticutWorkflow(tmp_folder=./tmp_mc_A, max_jobs=1, config_dir=./config_mc, target=slurm, dependency=ProblemWorkflow, problem_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, n_scales=1, assignment_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multi_luigi_out.h5, assignment_key=node_labels) is complete
INFO: Informed scheduler that task WriteSlurm_node_labels__oasis_scratch_c___config_mc_4d42f4969f has status PENDING
DEBUG: Checking if SolveGlobalSlurm(tmp_folder=./tmp_mc_A, max_jobs=1, config_dir=./config_mc, problem_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, assignment_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multi_luigi_out.h5, assignment_key=node_labels, scale=1, dependency=ReduceProblemSlurm) is complete
INFO: Informed scheduler that task MulticutWorkflow_node_labels__oasis_scratch_c___config_mc_e52655bb6f has status PENDING
DEBUG: Checking if ReduceProblemSlurm(tmp_folder=./tmp_mc_A, max_jobs=1, config_dir=./config_mc, problem_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, scale=0, dependency=SolveSubproblemsSlurm) is complete
INFO: Informed scheduler that task SolveGlobalSlurm_node_labels__oasis_scratch_c___config_mc_8b8648e259 has status PENDING
DEBUG: Checking if SolveSubproblemsSlurm(tmp_folder=./tmp_mc_A, max_jobs=1, config_dir=./config_mc, problem_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, scale=0, dependency=ProblemWorkflow) is complete
INFO: Informed scheduler that task ReduceProblemSlurm___config_mc_SolveSubproblems_1_182aa76377 has status PENDING
DEBUG: Checking if ProblemWorkflow(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, target=slurm, dependency=DummyTask, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/fxmemf.n5, input_key=dataset1, ws_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, ws_key=dataset1, problem_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, rf_path=, node_label_dict={}, max_jobs_merge=1, compute_costs=True, sanity_checks=False) is complete
INFO: Informed scheduler that task SolveSubproblemsSlurm___config_mc_ProblemWorkflow_1_a1448fd645 has status PENDING
DEBUG: Checking if EdgeCostsWorkflow(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, target=slurm, dependency=EdgeFeaturesWorkflow, features_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, features_key=features, output_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, output_key=s0/costs, node_label_dict={}, rf_path=) is complete
INFO: Informed scheduler that task ProblemWorkflow_True___config_mc_DummyTask_3f92ce107e has status PENDING
DEBUG: Checking if ProbsToCostsSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, input_key=features, output_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, output_key=s0/costs, features_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, features_key=features, dependency=EdgeFeaturesWorkflow, node_label_dict={}) is complete
INFO: Informed scheduler that task EdgeCostsWorkflow___config_mc_EdgeFeaturesWork_features_2d838ae4dc has status PENDING
DEBUG: Checking if EdgeFeaturesWorkflow(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, target=slurm, dependency=GraphWorkflow, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/fxmemf.n5, input_key=dataset1, labels_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, labels_key=dataset1, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, graph_key=s0/graph, output_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, output_key=features, max_jobs_merge=1) is complete
INFO: Informed scheduler that task ProbsToCostsSlurm___config_mc_EdgeFeaturesWork_features_682c0950ab has status PENDING
DEBUG: Checking if MergeEdgeFeaturesSlurm(tmp_folder=./tmp_mc_A, max_jobs=1, config_dir=./config_mc, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, graph_key=s0/graph, output_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, output_key=features, dependency=BlockEdgeFeaturesSlurm) is complete
INFO: Informed scheduler that task EdgeFeaturesWorkflow___config_mc_GraphWorkflow_s0_graph_f1bc78dfbd has status PENDING
DEBUG: Checking if BlockEdgeFeaturesSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/fxmemf.n5, input_key=dataset1, labels_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, labels_key=dataset1, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, output_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, dependency=GraphWorkflow) is complete
INFO: Informed scheduler that task MergeEdgeFeaturesSlurm___config_mc_BlockEdgeFeature_s0_graph_34ddff7acc has status PENDING
DEBUG: Checking if GraphWorkflow(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, target=slurm, dependency=DummyTask, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, input_key=dataset1, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, output_key=s0/graph, n_scales=1) is complete
INFO: Informed scheduler that task BlockEdgeFeaturesSlurm___config_mc_GraphWorkflow__oasis_scratch_c_8bd529565b has status PENDING
DEBUG: Checking if MapEdgeIdsSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, input_key=s0/graph, scale=0, dependency=MergeSubGraphsSlurm) is complete
INFO: Informed scheduler that task GraphWorkflow___config_mc_DummyTask__oasis_scratch_c_cb70462974 has status PENDING
DEBUG: Checking if MergeSubGraphsSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, scale=0, output_key=s0/graph, merge_complete_graph=True, dependency=InitialSubGraphsSlurm) is complete
INFO: Informed scheduler that task MapEdgeIdsSlurm___config_mc_MergeSubGraphsSl__oasis_scratch_c_6c607199dc has status PENDING
DEBUG: Checking if InitialSubGraphsSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, input_key=dataset1, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, dependency=DummyTask) is complete
INFO: Informed scheduler that task MergeSubGraphsSlurm___config_mc_InitialSubGraphs__oasis_scratch_c_8ef59ea786 has status PENDING
DEBUG: Checking if DummyTask() is complete
INFO: Informed scheduler that task InitialSubGraphsSlurm___config_mc_DummyTask__oasis_scratch_c_f2de7aaf60 has status PENDING
INFO: Informed scheduler that task DummyTask__99914b932b has status DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 16
INFO: [pid 21179] Worker Worker(salt=544906811, workers=1, host=comet-ln2.sdsc.edu, username=mmadany, pid=21179) running InitialSubGraphsSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, input_key=dataset1, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, dependency=DummyTask)
sbatch: error: bank_limit plugin: expired user, can't submit job
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
ERROR: [pid 21179] Worker Worker(salt=544906811, workers=1, host=comet-ln2.sdsc.edu, username=mmadany, pid=21179) failed InitialSubGraphsSlurm(tmp_folder=./tmp_mc_A, max_jobs=16, config_dir=./config_mc, input_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/sigv.n5, input_key=dataset1, graph_path=/oasis/scratch/comet/mmadany/temp_project/LHB_FullAuto/multiluigi_temp.n5, dependency=DummyTask)
Traceback (most recent call last):
File "/home/mmadany/miniconda3/envs/cluster_env/lib/python3.7/site-packages/luigi/worker.py", line 199, in run
new_deps = self._run_get_new_deps()
File "/home/mmadany/miniconda3/envs/cluster_env/lib/python3.7/site-packages/luigi/worker.py", line 139, in _run_get_new_deps
task_gen = self.task.run()
File "/home/mmadany/Multicut/cluster_tools2/cluster_tools-master/cluster_tools/cluster_tasks.py", line 93, in run
raise e
File "/home/mmadany/Multicut/cluster_tools2/cluster_tools-master/cluster_tools/cluster_tasks.py", line 79, in run
self.run_impl()
File "/home/mmadany/Multicut/cluster_tools2/cluster_tools-master/cluster_tools/graph/initial_sub_graphs.py", line 76, in run_impl
self.submit_jobs(n_jobs)
File "/home/mmadany/Multicut/cluster_tools2/cluster_tools-master/cluster_tools/cluster_tasks.py", line 443, in submit_jobs
outp = check_output(command).decode().rstrip()
File "/home/mmadany/miniconda3/envs/cluster_env/lib/python3.7/subprocess.py", line 376, in check_output
**kwargs).stdout
File "/home/mmadany/miniconda3/envs/cluster_env/lib/python3.7/subprocess.py", line 468, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sbatch', '-o', './tmp_mc_A/logs/initial_sub_graphs_0.log', '-e', './tmp_mc_A/error_logs/initial_sub_graphs_0.err', '-J', 'initial_sub_graphs_0', './tmp_mc_A/slurm_initial_sub_graphs.sh', '0']' returned non-zero exit status 1.
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task InitialSubGraphsSlurm___config_mc_DummyTask__oasis_scratch_c_f2de7aaf60 has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 16 pending tasks possibly being run by other workers
DEBUG: There are 16 pending tasks unique to this worker
DEBUG: There are 16 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=544906811, workers=1, host=comet-ln2.sdsc.edu, username=mmadany, pid=21179) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 17 tasks of which:
- 1 complete ones were encountered:
- 1 DummyTask()
- 1 failed:
- 1 InitialSubGraphsSlurm(...)
- 15 were left pending, among these:
- 15 had failed dependencies:
- 1 BlockEdgeFeaturesSlurm(...)
- 1 EdgeCostsWorkflow(...)
- 1 EdgeFeaturesWorkflow(...)
- 1 GraphWorkflow(...)
- 1 MapEdgeIdsSlurm(...)
...This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
Looks like this is where the cluster configuration comes in. I need to change my group ID and such. Where do I change that and other sbatch variables?
You can update the slurm config here:
https://github.com/constantinpape/cluster_tools/blob/master/example/cremi/run_mc.py#L69
Just add 'groupname': YOUR_GROUP_NAME.
Also, for debugging, it might be useful to run the command that fails directly and see the error message:
sbatch -o ./tmp_mc_A/logs/initial_sub_graphs_0.log -e ./tmp_mc_A/error_logs/initial_sub_graphs_0.err -J initial_sub_graphs_0 ./tmp_mc_A/slurm_initial_sub_graphs.sh 0
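For instance, a sketch along the lines of the linked run_mc.py: the file name global.config and the shebang path are assumptions based on that example, and YOUR_GROUP_NAME is a placeholder.

```python
import json
import os

config_dir = './config_mc'  # must match the config_dir passed to the workflow
os.makedirs(config_dir, exist_ok=True)

global_config = {
    # path to the python interpreter of the conda env, as in run_mc.py
    'shebang': '#! /home/mmadany/miniconda3/envs/cluster_env/bin/python',
    'block_shape': [50, 512, 512],
    'groupname': 'YOUR_GROUP_NAME',  # slurm account/group to submit under
}
with open(os.path.join(config_dir, 'global.config'), 'w') as f:
    json.dump(global_config, f)
```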
Ok this is what I'm getting now:
> Traceback (most recent call last):
> File "./tmp_mc_A/initial_sub_graphs.py", line 152, in <module>
> initial_sub_graphs(job_id, path)
> File "./tmp_mc_A/initial_sub_graphs.py", line 144, in initial_sub_graphs
> ignore_label)
> File "./tmp_mc_A/initial_sub_graphs.py", line 117, in _graph_block
> increaseRoi=True)
> RuntimeError: Request has wrong type
>
That came from each of the 16 sbatch jobs. It looks like my data type might be off? I'm using the .n5 files, but here's what the .h5 file's data looks like when I get a snippet of the data using h5ls -d.
Boundary predictions, where 1 is the background and 0 is the boundary:
(0,58,2742) 0.890196078431372, 0.866666666666667, 0.815686274509804, 0.717647058823529, 0.725490196078431, 0.592156862745098, 0.392156862745098, 0.192156862745098, 0.0941176470588235, 0.0431372549019608, 0.0235294117647059, 0.0196078431372549, 0.0156862745098039, (0,58,2755) 0.0235294117647059, 0.0392156862745098, 0.0901960784313725, 0.203921568627451, 0.407843137254902, 0.592156862745098, 0.756862745098039, 0.882352941176471, 0.945098039215686, 0.980392156862745, 0.992156862745098, 0.996078431372549, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, (0,58,2779) 1, 0.996078431372549, 0.996078431372549, 0.996078431372549, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Watershed file, uint32 values in sequence with no holes:
(0,3293,1530) 23660, 23660, 23660, 23660, 23660, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23715, 23698, 23698, 23698, 23698, 23698, 23698, 23698, 23698, 23698, 23698, 23698, 23698, 23368,
(0,3293,1568) 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368,
(0,3293,1606) 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23368, 23652, 23652, 23652, 23652, 23652, 23652, 23652, 23652, 23652, 23652, 23652, 23652, 23652, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124,
(0,3293,1644) 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23124, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643,
(0,3293,1682) 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23643, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653,
(0,3293,1720) 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653, 23653
The watershed needs to be stored in uint64.
Sorry for the late reply.
Also, to avoid issues you might get in the feature computation: boundary maps need to be stored either as uint8 or as float32.
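If the volumes are stored in another dtype, a plain numpy cast before (or during) the n5 conversion is enough. A minimal sketch, with toy arrays standing in for the real volumes:

```python
import numpy as np

# toy stand-ins for the real volumes
boundaries = np.random.rand(4, 16, 16)                        # float64 in [0, 1]
watershed = np.arange(4 * 16 * 16, dtype='uint32').reshape(4, 16, 16)

# boundary maps: either keep them as float32 ...
boundaries_f32 = boundaries.astype('float32')
# ... or rescale [0, 1] -> [0, 255] and store as uint8
boundaries_u8 = np.round(boundaries * 255).astype('uint8')

# watershed labels: widen to uint64 (lossless, values are preserved)
watershed_u64 = watershed.astype('uint64')
```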
Ok, I made sure my data is uint8 for the boundaries and uint64 for the ws, but I'm still getting the same error:
sbatch -o ./tmp_mc_A/logs/initial_sub_graphs_0.log -e ./tmp_mc_A/error_logs/initial_sub_graphs_0.err -J initialsub_graphs_0 ./tmp_mc_A/slurm_initial_sub_graphs.sh
cat ./tmp_mc_A/logs/initial_sub_graphs_0.log
Mytype: d your type: m
2019-04-24 21:23:27.502097: start processing job 0
2019-04-24 21:23:27.502127: reading config from ./tmp_mc_A/initial_sub_graphs_job_0.config
2019-04-24 21:23:27.515858: start processing block 0
cat ./tmp_mc_A/error_logs/initial_sub_graphs_0.err
Traceback (most recent call last):
File "./tmp_mc_A/initial_sub_graphs.py", line 152, in
initial_sub_graphs(job_id, path)
File "./tmp_mc_A/initial_sub_graphs.py", line 144, in initial_sub_graphs
ignore_label)
File "./tmp_mc_A/initial_sub_graphs.py", line 117, in _graph_block
increaseRoi=True)
RuntimeError: Request has wrong type
Looks like this error message has occurred in your z5 repo:
Merged #52, the issue should be fixed.
Originally posted by @constantinpape in constantinpape/z5#50 (comment)
Yes, this error message comes from z5 and indicates that some datatypes do not agree.
Are you sure both boundaries and superpixels are stored correctly?
Can you open them with z5 from python?
import z5py
f = z5py.File('/path/to/data.n5')
ds = f['path/in/file']
print(ds.dtype)
If you do this, the dtype should be uint8 (or float32) for the boundaries and uint64 for the superpixels.
Ok, I got the workflow up and running on my data, and it also worked end-to-end on the sample data. I'm just getting memory errors on my merge_graphs workers. I'd like to change the partition to the high-memory compute nodes, where we have 64 cores and 1.45 TB of RAM. I tried this:
global_config.update({'shebang': shebang, 'block_shape': block_shape, 'groupname': 'ddp140', 'partition': 'large-shared', 'mem': '1450G'})
but it seems the workers still get sent to the regular nodes. How can I see/edit exactly where the sbatch jobs are being submitted? I think I can also do '--mem=1G --ntasks=1' and it will split the jobs up along the resources on the node.
> Ok, I got the workflow up and running on my data, and it also worked end-to-end on the sample data.

Glad to hear it!

> I tried this: global_config.update({'shebang': shebang, 'block_shape': block_shape, 'groupname': 'ddp140', 'partition': 'large-shared', 'mem': '1450G'})

The global config does not support arbitrary arguments, only the ones listed here:
https://github.com/constantinpape/cluster_tools/blob/master/cluster_tools/cluster_tasks.py#L217-L224.
Note that I added the partition option just now.
Also, you need to specify the memory limit for the individual tasks, by updating the mem_limit value in the task_config. See also https://github.com/constantinpape/cluster_tools/blob/master/example/cremi/run_mc.py#L92.
Hope this helps.
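A sketch of what such a task config could look like, following the per-task config pattern of the linked example script. The file name merge_edge_features.config, the exact keys, and the units are assumptions here; check the task's default config for what it actually accepts.

```python
import json
import os

config_dir = './config_mc'
os.makedirs(config_dir, exist_ok=True)

# give the single-job merge task more memory and time
# (file name = task name + '.config'; key names and units are assumptions)
task_config = {
    'mem_limit': 256,    # memory limit for the job, assumed to be in GB
    'time_limit': 180,   # job time limit, assumed to be in minutes
}
with open(os.path.join(config_dir, 'merge_edge_features.config'), 'w') as f:
    json.dump(task_config, f)
```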
Ok thanks, that works great.
The furthest I've been able to get is solve_global, but that ran for over 48 hours, which is my job time limit. I've been messing around with the parallel block size for my 7k x 5k x 400 (xyz) test volume (400 x 5k x 7k in the .n5 file format).
Right now I'm trying block_shape = [80, 1024, 1024].
But I was wondering if you could recommend a value here? I'm allocating 180 GB of RAM to the parallel workers, and I can allocate up to 1.45 TB to the single-worker steps. I've done that for 'solve_global', 'solve_subproblems', and 'merge_edge_features' so far, because they were giving 'out of memory' and 'segmentation fault' errors.
> right now I'm trying block_shape = [80, 1024, 1024]

That sounds reasonable.
How many nodes are in the graph (i.e. how many supervoxel ids are there)?
Usually the solve_global step should be quite fast if the problem was reduced by solving the subproblems.
Have you tried running everything on a smaller cutout of the data (say 200 x 1024 x 1024) and checked the results?
One potential issue could be that your boundary maps follow a different convention than what I expect: I assume boundaries to correspond to high values (i.e. 1 means maximal boundary probability for a pixel). If your boundary maps have the opposite convention, you can set invert_inputs to True for probs_to_costs, see https://github.com/constantinpape/cluster_tools/blob/master/cluster_tools/costs/probs_to_costs.py#L51.
If you use the correct boundary convention and the cutout results look decent, there are two options to speed up the final multicut:
- Choose a different solver. This can be done by setting agglomerator to greedy-additive in the config of solve_global, see https://github.com/constantinpape/cluster_tools/blob/master/cluster_tools/multicut/solve_global.py#L43.
- Run with more hierarchy levels by setting n_scales > 1, see https://github.com/constantinpape/cluster_tools/blob/master/cluster_tools/workflows.py#L207.
These two options can also be combined. Note that both can reduce the quality of the resulting segmentation a bit, but from my experience the effect should not be very significant.
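The first option could then look like this. This is a sketch: the file name solve_global.config follows the per-task config pattern of the example script, and 'greedy-additive' is the value from the linked solve_global.py.

```python
import json
import os

config_dir = './config_mc'
os.makedirs(config_dir, exist_ok=True)

# switch the global solver to the faster (slightly less accurate) greedy one
solve_config = {'agglomerator': 'greedy-additive'}
with open(os.path.join(config_dir, 'solve_global.config'), 'w') as f:
    json.dump(solve_config, f)
```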
I have 22 million superpixels.
High boundary chance is 255, uint8. What's the difference between 'time_limit' and 'time_limit_solver'?
This is a cross-section of my current volume (lots of myelin, unfortunately):
After I get that to work and understand what the time / parallelization values should be, I'd like to try volumes that are more around the size of 15k x 15k x 1k, like this:
Does that look like it could handle a greedy solver with higher n_scales?
> I have 22 million superpixels

That should be fine; I have solved problems with about two orders of magnitude more superpixels with this pipeline.

> high boundary chance is 255, uint8

That's good, you don't need to change invert_inputs then.
> What's the difference between 'time_limit' and 'time_limit_solver'?

time_limit is the maximum time a job will run; it is passed as the value of the -t parameter to slurm.
time_limit_solver is a time limit that is passed to the actual multicut solver. I forgot to mention this parameter earlier; setting time_limit_solver might actually fix your problem. You should set it to ~4 hours less than time_limit. (time_limit_solver is soft, which means the solver will not abruptly stop once the time has passed, but only checks for it after completing an internal iteration. Depending on the problem size, an iteration can take quite a while, which is why it's safer to give some leeway compared to time_limit.)
> Does that look like it could handle a greedy solver with higher n_scales?

Yes, this looks feasible.
Ok, looks like time_limit is in minutes (or slurm/sbatch time format) and time_limit_solver is in seconds, correct?
Yes, that's correct.
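In numbers, assuming the 48-hour job limit mentioned above and the recommended ~4 hours of leeway for the solver:

```python
# time_limit is in minutes (it becomes slurm's -t argument),
# time_limit_solver is in seconds.
time_limit = 48 * 60                    # 48 h job limit, in minutes
time_limit_solver = (48 - 4) * 60 * 60  # 44 h solver budget, in seconds

# sanity check: the solver budget fits inside the job limit
assert time_limit_solver < time_limit * 60
```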
Ok, the full pipeline is working and the result looks great. I'll email you a video of what it looks like. Thanks again!
You're very welcome, and thanks for your patience. I am looking forward to seeing the results :).