dawe / schist
An interface for Nested Stochastic Block Model for single cell analysis
Home Page: https://schist.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
Some functions in tools can now be deprecated and removed. One is plug_state, as we don't need it anymore, at least not in its current form.
Hello Dawe!
I am trying to install Schist v0.8.3 on the UGent HPC to provide this software to researchers. We use EasyBuild to build, install and provide software, so I would like to build from source.
For now I am fighting with the dependency graph-tool. I tried to install its latest version, v2.68. The installation works fine, but the check commands fail:
from graph_tool.all import graph_draw
from graph_tool.all import Graph, BlockState
import graph_tool.inference
All of these return the same error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/cascadelake-ampere-ib/software/graph-tool/2.68-foss-2023a/lib/python3.11/site-packages/graph_tool/all.py", line 34, in <module>
from graph_tool.draw import *
File "/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/cascadelake-ampere-ib/software/graph-tool/2.68-foss-2023a/lib/python3.11/site-packages/graph_tool/draw/__init__.py", line 87, in <module>
from .. inference import minimize_blockmodel_dl, BlockState, ModularityState
File "/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/cascadelake-ampere-ib/software/graph-tool/2.68-foss-2023a/lib/python3.11/site-packages/graph_tool/inference/__init__.py", line 331, in <module>
from . blockmodel import *
File "/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/cascadelake-ampere-ib/software/graph-tool/2.68-foss-2023a/lib/python3.11/site-packages/graph_tool/inference/blockmodel.py", line 119, in <module>
@entropy_state_signature
^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/cascadelake-ampere-ib/software/graph-tool/2.68-foss-2023a/lib/python3.11/site-packages/graph_tool/inference/base_states.py", line 110, in entropy_state_signature
warn = "\n".join([" " * (m if j == 0 else m + 4) + l.lstrip() for
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/gent/vo/001/gvo00117/easybuild/RHEL8/cascadelake-ampere-ib/software/graph-tool/2.68-foss-2023a/lib/python3.11/site-packages/graph_tool/inference/base_states.py", line 110, in <listcomp>
warn = "\n".join([" " * (m if j == 0 else m + 4) + l.lstrip() for
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: can't multiply sequence by non-int of type 'generator'
What version of graph-tool is recommended for use with Schist v0.8.3?
Some details about the software I use:
GCC v12.3.0 + OpenMPI v4.1.5 + FlexiBLAS v3.3.1 + FFTW v3.3.10 + ScaLAPACK v2.2.0
python v3.11.3
Boost v1.83.0
Numpy v1.25.1
Scipy v1.11.1
Pandas v2.0.3
joblib v1.2.0
While schist allows the analysis of multimodal data, it does so by passing a list of multiple AnnData objects. It would be nice to have support for MuData.
The documentation is largely outdated; it should be rewritten (possibly from scratch).
Since inference is performed multiple times (as set by the n_init parameter), it would be useful to add an n_jobs option to split independent initialisations across processors when more than one is available.
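A minimal sketch of what such an option could look like, assuming each initialisation is independent and returns a score to minimise. The function fit_once is a hypothetical stand-in for one model fit, not schist code, and the standard-library thread pool is used only to keep the example self-contained; CPU-bound fits would need worker processes (for example through joblib) to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def fit_once(seed):
    """Hypothetical stand-in for one initialisation.

    Returns (score, seed); a real fit would return the model entropy.
    """
    rng = random.Random(seed)
    return rng.uniform(1000.0, 2000.0), seed


def best_of(n_init, n_jobs=1, base_seed=0):
    """Run n_init independent fits on n_jobs workers and keep the best one."""
    seeds = [base_seed + i for i in range(n_init)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(fit_once, seeds))
    # The lowest score wins; ties are broken by the lowest seed.
    return min(results)


best_entropy, best_seed = best_of(n_init=8, n_jobs=4)
```

Because every run gets its own seed, the selected model does not depend on how many workers are used.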
Since all annotation levels are stored, we can remove the need to keep gt.NestedBlockState in adata.uns['schist']. Once annotations are present, a new state can easily be reconstructed from the nsbm_level entries in adata.obs.
This can be done if we make sure that all parameters are stored in the appropriate dictionary.
If this is implemented, there is no need to use the schist.io functions to read/write, as they are there only to dump the state into a separate pickle.
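The reconstruction step could be sketched as follows, assuming the per-level labels have already been pulled out of adata.obs as plain lists, finest level first. The helper name nested_partitions is hypothetical. It produces one partition per level, which to my understanding is the shape expected by the bs argument of graph-tool's NestedBlockState: the first entry maps cells to blocks, and each following entry maps the blocks of the previous level to their parent blocks.

```python
def nested_partitions(levels):
    """Turn per-cell labels at each level into a nested partition.

    levels[l][i] is the label of cell i at level l (0 = finest).
    Returns a list bs where bs[0][i] is the block of cell i and
    bs[l][b] is the parent, at level l, of block b from level l - 1.
    """
    codes = {}
    # Level 0: integer block ids in order of first appearance.
    bs = [[codes.setdefault(label, len(codes)) for label in levels[0]]]
    prev_labels, prev_codes = levels[0], codes
    for labels in levels[1:]:
        codes = {}
        parent = [None] * len(prev_codes)
        for child_label, label in zip(prev_labels, labels):
            child = prev_codes[child_label]
            code = codes.setdefault(label, len(codes))
            if parent[child] is not None and parent[child] != code:
                raise ValueError("labels are not nested")
            parent[child] = code
        bs.append(parent)
        prev_labels, prev_codes = labels, codes
    return bs


# Four cells, three blocks at level 0, two blocks at level 1.
bs = nested_partitions([["a", "a", "b", "c"], ["x", "x", "x", "y"]])
```

The check for non-nested labels is there because a state can only be rebuilt if every fine-level group falls entirely inside one coarser group.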
The current version of graph-tool (2.33) no longer needs any patching, so maybe you should just tell users to upgrade.
I was wondering if plotting functions should have a parameter to set the font size of group labels.
Especially with schist.plotting.alluvial(), group labels of lower levels often overlap.
Hi!!!
If I try to use nested_model() with the collect_marginals and equilibrate options, I get an error:
UnboundLocalError Traceback (most recent call last)
in
----> 1 nested_model(data, collect_marginals=True, equilibrate=True)
~/opt/anaconda3/envs/scanpy/lib/python3.7/site-packages/schist/inference/_nested_model.py in nested_model(adata, max_iterations, epsilon, equilibrate, wait, nbreaks, collect_marginals, niter_collect, hierarchy_length, deg_corr, multiflip, fast_model, n_init, beta_range, steps_anneal, resume, restrict_to, random_seed, key_added, adjacency, neighbors_key, directed, use_weights, prune, return_low, copy, minimize_args, equilibrate_args)
285 mcmc_equilibrate_args=equilibrate_args,
286 niter=steps_anneal,
--> 287 beta_range=beta_range)
288 if collect_marginals and equilibrate:
289 # we here only retain level_0 counts, until I can't figure out
~/opt/anaconda3/envs/scanpy/lib/python3.7/site-packages/graph_tool/inference/mcmc.py in mcmc_anneal(state, beta_range, niter, history, mcmc_equilibrate_args, verbose)
264 else:
265 S = ret[0]
--> 266 attempts += ret[1]
267 nmoves += ret[2]
268
UnboundLocalError: local variable 'attempts' referenced before assignment
I have no problem if I just use nested_model(data).
Thanks!
When scs.tl.label_transfer is called with use_best=True, the unknown label is still present in the transferred annotations, although it won't be used. It would be better to remove it.
Support for saving information about Planted Partition Blocks is limited. If one has both an nsbm and a ppbm object, only the nsbm is pickled.
To install graph-tool via conda the actual commands are:
conda create --name gt -c conda-forge graph-tool
conda activate gt
The command currently given in the instructions will not work.
It might be good to inform users that they can install (without compilation) using Homebrew on macOS, and also on Ubuntu/Debian. The installation instructions are here: https://git.skewed.de/count0/graph-tool/-/wikis/installation-instructions
It may be worthwhile also adding an alternative strategy based on a simple greedy MCMC:
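import numpy
from graph_tool.inference import NestedBlockState

state = NestedBlockState(g)  # g: an existing graph_tool.Graph
delta = 1
# Greedy sweeps (beta=inf) until the entropy change is negligible
while abs(delta) > 1e-6:
    delta = state.multiflip_mcmc_sweep(niter=10, beta=numpy.inf)[0]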
This could be faster in some cases, while still providing good results.
After a small code refactoring in 0.7.12, the docs are broken.
Hi, I'm trying schist with a Seurat converted object, but I encountered the error below. Any suggestions on how to proceed?
adata
AnnData object with n_obs × n_vars = 4015 × 15309
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'RNA_Condition', 'percent.mt', 'S.Score', 'G2M.Score', 'Phase', 'old.ident', 'CC.Difference', 'RNA_snn_h.orig.ident_res.0.6', 'seurat_clusters', 'RNA_snn_f.orig.ident_res.0.6', 'RNA_snn_m.orig.ident_res.0.6', 'RNA_snn_h.orig.ident_res.1.2', 'RNA_snn_f.orig.ident_res.1.2', 'RNA_snn_m.orig.ident_res.1.2', 'RNA_snn_h.orig.ident_res.1.8', 'RNA_snn_f.orig.ident_res.1.8', 'RNA_snn_m.orig.ident_res.1.8', 'SingleR_BlueprintEncodeData_labels', 'SingleRrefined_BlueprintEncodeData_labels', 'SingleR_HumanPrimaryCellAtlasData_labels', 'SingleRrefined_HumanPrimaryCellAtlasData_labels', 'SingleR_MonacoImmuneData_labels', 'SingleRrefined_MonacoImmuneData_labels', 'SingleR_DatabaseImmuneCellExpressionData_labels', 'SingleRrefined_DatabaseImmuneCellExpressionData_labels', 'SingleR_NovershternHematopoieticData_labels', 'SingleRrefined_NovershternHematopoieticData_labels'
var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
uns: 'neighbors'
obsm: 'X_fastmnn.orig.ident', 'X_harmony.orig.ident', 'X_pca', 'X_tsne.fastmnn.orig.ident', 'X_tsne.harmony.orig.ident', 'X_tsne.merge.orig.ident', 'X_umap.fastmnn.orig.ident', 'X_umap.harmony.orig.ident', 'X_umap.merge.orig.ident'
varm: 'FASTMNN.ORIG.IDENT', 'HARMONY.ORIG.IDENT', 'PCs'
obsp: 'distances'
scs.inference.nested_model(adata)
/DATA_NFS/anaconda3/envs/postsc/lib/python3.8/site-packages/schist/inference/_nested_model.py:145: FutureWarning: This location for 'connectivities' is deprecated. It has been moved to .obsp[connectivities], and will not be accesible here in a future version of anndata.
adjacency = adata.uns[neighbors_key]['connectivities']
KeyError Traceback (most recent call last)
Cell In[7], line 1
----> 1 scs.inference.nested_model(a)
File /DATA_NFS/anaconda3/envs/postsc/lib/python3.8/site-packages/schist/inference/_nested_model.py:145, in nested_model(adata, deg_corr, tolerance, n_sweep, beta, samples, collect_marginals, n_jobs, restrict_to, random_seed, key_added, adjacency, neighbors_key, directed, use_weights, save_model, copy, dispatch_backend)
142 adjacency = adata.obsp[conn_key]
143 else:
144 # scanpy<=1.4.6 has sparse matrix here
--> 145 adjacency = adata.uns[neighbors_key]['connectivities']
146 if restrict_to is not None:
147 restrict_key, restrict_categories = restrict_to
File /DATA_NFS/anaconda3/envs/postsc/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:98, in OverloadedDict.getitem(self, key)
96 def getitem(self, key):
97 if key in self.overloaded:
---> 98 return self.overloaded[key].get()
99 else:
100 return self.data[key]
File /DATA_NFS/anaconda3/envs/postsc/lib/python3.8/site-packages/anndata/compat/_overloaded_dict.py:160, in _adjacency_getter(ovld, key, adata)
154 """For overloading:
155
156 >>> mtx = adata.uns["neighbors"]["connectivities"] # doctest: +SKIP
157 >>> mtx = adata.uns["neighbors"]["distances"] # doctest: +SKIP
158 """
159 _access_warn(key, f".obsp[{key}]")
--> 160 return adata.obsp[key]
File /DATA_NFS/anaconda3/envs/postsc/lib/python3.8/site-packages/anndata/_core/aligned_mapping.py:148, in AlignedActualMixin.getitem(self, key)
147 def getitem(self, key: str) -> V:
--> 148 return self._data[key]
KeyError: 'connectivities'
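The object printed above has obsp: 'distances' but no 'connectivities', which is what the final KeyError reports. A small check along these lines could confirm that before calling nested_model. The helper find_connectivities is hypothetical, and the 'connectivities_key' entry is assumed to follow the layout scanpy writes into adata.uns['neighbors']; the mock objects only stand in for a real AnnData.

```python
from types import SimpleNamespace


def find_connectivities(adata, neighbors_key="neighbors"):
    """Return the .obsp key that holds the connectivities, or None if absent."""
    neighbors = adata.uns[neighbors_key] if neighbors_key in adata.uns else {}
    if "connectivities_key" in neighbors:
        key = neighbors["connectivities_key"]
    else:
        key = "connectivities"
    return key if key in adata.obsp else None


# Mock of the converted Seurat object: distances only, no connectivities.
converted = SimpleNamespace(uns={"neighbors": {}}, obsp={"distances": None})
missing = find_connectivities(converted) is None

# If the graph is missing, recomputing it with scanpy should fill both
# matrices (X_pca is listed in obsm above):
#     import scanpy as sc
#     sc.pp.neighbors(adata, use_rep="X_pca")
```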
If I try to specify which level I want to select with scnsbm.tl.select_affinity(), a RuntimeError is raised:
`schist.tl.select_affinity(adata,level=2)
ERROR: Level 2 was not found in your data
RuntimeError Traceback (most recent call last)
in
----> 1 schist.tl.select_affinity(adata,level=2)
~/anaconda3/envs/SCRNA/lib/python3.8/site-packages/schist/tools/_select.py in select_affinity(adata, level, threshold, inverse, key, update_state, filter, copy)
54 if level not in adata.uns[key]['cell_affinity']:
55 logg.error(f'Level {level} was not found in your data')
---> 56 raise
57
58 affinities = adata.uns[key]['cell_affinity'][level]
RuntimeError: No active exception to reraise
`
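The RuntimeError comes from the bare raise on line 56 of the traceback: outside an except block there is no exception to re-raise, so Python reports "No active exception to reraise" instead of the intended message. A minimal sketch of the pattern and one possible fix, using hypothetical function names rather than the actual schist code:

```python
def select_level_buggy(affinities, level):
    if level not in affinities:
        # Bare raise with no exception being handled -> RuntimeError.
        raise
    return affinities[level]


def select_level(affinities, level):
    if level not in affinities:
        # Raise an explicit exception that carries the message.
        raise KeyError(f"Level {level} was not found in your data")
    return affinities[level]


affinities = {0: "level 0 affinities", 1: "level 1 affinities"}

try:
    select_level_buggy(affinities, 2)
except RuntimeError as err:
    buggy_message = str(err)

try:
    select_level(affinities, 2)
except KeyError as err:
    fixed_message = err.args[0]
```

Which exception type to raise is a design choice; KeyError is used here only because the lookup is by key.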
Apparently label transfer, or more precisely the step in which affinities are calculated, can crash with a memory error when there are too many cells, and "many" is not even that high (>25k).
This is probably because schist tries to allocate a numpy array that is too big to handle. Going with sparse implementations could be the way, but I'm afraid the affinity matrix is not really sparse, unless we set a threshold below which everything is treated as 0.
Hello!
I get this error when I apply the scs.tl.label_transfer function (and I made sure the annotation is categorical):
AttributeError Traceback (most recent call last)
in
----> 1 scs.tl.label_transfer(rna_p, rna_u, obs='leiden_rna_u')
/beegfs/scratch/ric.cosr/giansanti.valentina/.conda/envs/dnn_cnv2/lib/python3.9/site-packages/schist/tools/_affinity_tools.py in label_transfer(adata, adata_ref, obs, label_unk, use_best, neighbors_key, adjacency, directed, use_weights, pca_args, use_rep, harmony_args, copy)
461 batch_key='_label_transfer')
462 #
--> 463 adata_merge.obs[obs] = adata_merge.obs[obs].cat.add_categories(label_unk).fillna(label_unk)
464
465 # perform integration using harmony
/beegfs/scratch/ric.cosr/giansanti.valentina/.conda/envs/dnn_cnv2/lib/python3.9/site-packages/pandas/core/generic.py in getattr(self, name)
5459 or name in self._accessors
5460 ):
-> 5461 return object.getattribute(self, name)
5462 else:
5463 if self._info_axis._can_hold_identifiers_and_holds_name(name):
/beegfs/scratch/ric.cosr/giansanti.valentina/.conda/envs/dnn_cnv2/lib/python3.9/site-packages/pandas/core/accessor.py in get(self, obj, cls)
178 # we're accessing the attribute of the class, i.e., Dataset.geo
179 return self._accessor
--> 180 accessor_obj = self._accessor(obj)
181 # Replace the property with the accessor object. Inspired by:
182 # https://www.pydanny.com/cached-property.html
/beegfs/scratch/ric.cosr/giansanti.valentina/.conda/envs/dnn_cnv2/lib/python3.9/site-packages/pandas/core/arrays/categorical.py in init(self, data)
2455
2456 def init(self, data):
-> 2457 self._validate(data)
2458 self._parent = data.values
2459 self._index = data.index
/beegfs/scratch/ric.cosr/giansanti.valentina/.conda/envs/dnn_cnv2/lib/python3.9/site-packages/pandas/core/arrays/categorical.py in _validate(data)
2464 def _validate(data):
2465 if not is_categorical_dtype(data.dtype):
-> 2466 raise AttributeError("Can only use .cat accessor with a 'category' dtype")
2467
2468 def _delegate_property_get(self, name):
AttributeError: Can only use .cat accessor with a 'category' dtype
There are some issues with draw_tree, that is, with matplotlib:
MatplotlibDeprecationWarning:
The modification of the Axes.artists property was deprecated in Matplotlib 3.5 and will be removed two minor releases later. Use Axes.add_artist instead.
self.insert(len(self), value)
I am following the single-cell best practices tutorial. When I ran this code:
import schist as scs
scs.inference.nested_model(adata, samples = 100, random_seed = 5678)
It raised this error.
I am using schist version 0.7.16+2.gb762b76
Thank you so much for assisting me!
schist runs successfully, including the scs.inference.nested_model(adata) function. However, adata.uns['schist']['state'] is not assigned as a result, and scs.plotting.draw_tree(adata) gives the error KeyError: 'state'.
Going back to the code, I can see that nested_model lists adata.uns['schist']['state'] as an output; however, it is never assigned later in the file.
I also saw a previous issue, "Remove state from unstructured data and IO functions #12", where 'state' is removed from adata.uns['schist']. A related bug was fixed in version 0.7.11, which I am using right now. The problem is that pl.draw_tree() still requires the 'state' (please see line 83).
I am not sure how to proceed or if I am totally missing the point. Could the developers confirm that the package is running well for them?
I have recently updated graph-tool from v2.55 to v2.57, and I've noticed weird results when using scs.inference.nested_model. I'm tracking down the issue, but until then the last version working with schist is v2.55. I am going to release a patch for the conda installation ASAP.
I observed a general degradation of performance and longer runtimes after upgrading graph-tool to version 2.40.
The latest graph-tool version introduced some optimizations (possibly OpenMP-based) which I believe collide with the joblib parallelization we use to run multiple models at the same time.
A possible workaround could be to downgrade to gt version 2.37 or to set n_jobs=1.
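If the slowdown really is thread oversubscription (an assumption, not something verified here), a third option might be to cap the OpenMP threads while keeping n_jobs above 1. The sketch below relies on the standard OMP_NUM_THREADS environment variable, which OpenMP runtimes read when they are loaded, so it has to be set before graph_tool is imported. The helper name is hypothetical.

```python
import os


def limit_openmp_threads(n_threads=1):
    """Cap OpenMP threads for libraries imported after this call."""
    os.environ["OMP_NUM_THREADS"] = str(n_threads)


limit_openmp_threads(1)

# Import graph_tool / schist only after the variable is set, e.g.:
#     import schist as scs
#     scs.inference.nested_model(adata, n_jobs=4)
```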