theislab / cellrank Goto Github PK

CellRank: dynamics from multi-view single-cell data

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

bioinformatics cell-fate-determination cell-fate-transitions data-science fuzzy-clustering-analyses genetics machine-learning manifold-learning markov-chains rna-velocity single-cell-genomics single-cell-rna-seq trajectory-generation

cellrank's People

Contributors

Stargazers

Watchers

cellrank's Issues

Rename the MC class

New name: CFLARE - Clustering and Filtering of Left and Right Eigenvectors.

At the moment, the x-axis seems to always be labelled with pseudotime, see line 15 & 16 in our basic tutorial, even though latent_time is specified. Can we change this, to make it more obvious what we really show on the x-axis?

Bug when using keys to compute lineages

When I call mc_fwd.compute_lin_probs(keys=['Ductal', 'Alpha, Beta']), I get the traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-4d1908c3199d> in <module>
----> 1 mc_fwd.compute_lin_probs(keys=['Ductal', 'Alpha, Beta'])
      2 mc_fwd.plot_lin_probs()

~/Projects/cellrank/cellrank/tools/_markov_chain.py in compute_lin_probs(self, keys, check_irred, norm_by_frequ)
    814         else:
    815             logg.debug(f"DEBUG: Combining recurrent classes according to `{keys}`")
--> 816             approx_rcs_ = self._prep_rc_classes(keys)
    817         keys = list(approx_rcs_.cat.categories)
    818 

~/Projects/cellrank/cellrank/tools/_markov_chain.py in _prep_rc_classes(self, keys)
   1259 
   1260         lin_colors = list(np.array(lin_colors)[cat_mask])
-> 1261         approx_rcs_temp.cat.reorder_categories(remaining_cat, inplace=True)
   1262         self._lin_probs = Lineage(
   1263             np.empty((1, len(lin_colors))),

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/accessor.py in f(self, *args, **kwargs)
     91         def _create_delegator_method(name):
     92             def f(self, *args, **kwargs):
---> 93                 return self._delegate_method(name, *args, **kwargs)
     94 
     95             f.__name__ = name

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/arrays/categorical.py in _delegate_method(self, name, *args, **kwargs)
   2623 
   2624         method = getattr(self._parent, name)
-> 2625         res = method(*args, **kwargs)
   2626         if res is not None:
   2627             return Series(res, index=self._index, name=self._name)

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/arrays/categorical.py in reorder_categories(self, new_categories, ordered, inplace)
   1032         if set(self.dtype.categories) != set(new_categories):
   1033             raise ValueError(
-> 1034                 "items in new_categories are not the same as in " "old categories"
   1035             )
   1036         return self.set_categories(new_categories, ordered=ordered, inplace=inplace)

ValueError: items in new_categories are not the same as in old categories

> /Users/marius/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/arrays/categorical.py(1034)reorder_categories()
   1032         if set(self.dtype.categories) != set(new_categories):
   1033             raise ValueError(
-> 1034                 "items in new_categories are not the same as in " "old categories"
   1035             )
   1036         return self.set_categories(new_categories, ordered=ordered, inplace=inplace)

If you fix this, please make sure that lineages still get assigned to the right color by simply running our tutorial.

rpy2 error

I finally got my rpy2 to run and I'm now really keen to get your gene_trends to run using R's mgcv. However, when I run

model = cr.ul.models.GamMGCVModel(adata, n_splines=7, sp=100)
cr.pl.gene_trends(adata, model=model, data_key='Ms',
                  genes=['Gng12', 'Pak3'],
                  time_key='latent_time',
                  weight_threshold=0.05, weight_scale=1, filter_data=True, dpi=25)

I get the error

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-759d7e9f342a> in <module>
      8                   genes=['Gng12', 'Pak3'],
      9                   time_key='latent_time',
---> 10                   weight_threshold=0.05, weight_scale=1, filter_data=True, dpi=25)

~/Projects/cellrank/cellrank/plotting/_gene_trend.py in gene_trends(adata, model, genes, lineages, data_key, final, start_lineage, end_lineage, conf_int, same_plot, hide_cells, perc, lineage_cmap, abs_prob_cmap, cell_color, color, cell_alpha, lineage_alpha, size, lw, show_cbar, margins, sharey, figsize, dpi, ncols, n_jobs, backend, ext, suptitle, save, dirname, plot_kwargs, show_progres_bar, **kwargs)
    271         extractor=lambda modelss: {k: v for m in modelss for k, v in m.items()},
    272         show_progress_bar=show_progres_bar,
--> 273     )(lineages, start_lineage, end_lineage, **kwargs)
    274     logg.info("    Finish", time=start)
    275 

~/Projects/cellrank/cellrank/utils/_parallelize.py in wrapper(*args, **kwargs)
     98                 *((i, cs) if use_ixs else (cs,)), *args, **kwargs, queue=queue
     99             )
--> 100             for i, cs in enumerate(collections)
    101         )
    102 

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
   1002             # remaining jobs.
   1003             self._iterating = False
-> 1004             if self.dispatch_one_batch(iterator):
   1005                 self._iterating = self._original_iterator is not None
   1006 

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    833                 return False
    834             else:
--> 835                 self._dispatch(tasks)
    836                 return True
    837 

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/joblib/parallel.py in _dispatch(self, batch)
    752         with self._lock:
    753             job_idx = len(self._jobs)
--> 754             job = self._backend.apply_async(batch, callback=cb)
    755             # A job can complete so quickly than its callback is
    756             # called before we get here, causing self._jobs to

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    207     def apply_async(self, func, callback=None):
    208         """Schedule a func to be run"""
--> 209         result = ImmediateResult(func)
    210         if callback:
    211             callback(result)

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    588         # Don't delay the application, to avoid keeping the input
    589         # arguments in memory
--> 590         self.results = batch()
    591 
    592     def get(self):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

~/Projects/cellrank/cellrank/plotting/_utils.py in _fit(genes, lineage_names, start_lineages, end_lineages, queue, **kwargs)
    412             model = (
    413                 models[gene][ln]
--> 414                 .prepare(gene, ln, start_lineage=sc, end_lineage=ec, **kwargs)
    415                 .fit()
    416             )

~/Projects/cellrank/cellrank/utils/models/_models.py in fit(self, x, y, w, **kwargs)
    803         pandas2ri.activate()
    804 
--> 805         df = pandas2ri.py2rpy(
    806             pd.DataFrame(np.c_[self.x, self.y][use_ixs, :], columns=["x", "y"])
    807         )

AttributeError: module 'rpy2.robjects.pandas2ri' has no attribute 'py2rpy'

Could this be a problem with my R or rpy2 version?

I'm using r-base version 3.5.1 and rpy2 version 2.9.4.

Docs badge

Can we add a badge for the docs?

Unify plotting

Capitalization of titles, labels, style (rcParams?)

Lineage object mixing

Kernel class condition number check

I realised that our matrices can sometimes be ill-conditioned, see https://blogs.mathworks.com/cleve/2017/07/17/what-is-the-condition-number-of-a-matrix/ for an intro to condition numbers.

I our kernel class, I would add an option compupte_condn in compute_transition_matix, that uses np.linalg.cond to compute the condition number and save it somewhere in the class. In logging level info, I would print it out and if it's greater than 1e-15, I would issue a warning. There is currently no easy way to do this for sparse matrices, so we need to use T.A for this. That's why I would make it optional and set compute_condn to False by default.

`clustermap` raises error

When I call cr.pl.cluster_fates(adata_t, mode='clustermap', cluster_key='clusters_fine', clusters=clusters)

I get

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-138-8017ed65d544> in <module>
      1 clusters = ['Fev+ Beta', 'Fev+ Alpha', 'Fev+ Pyy','Fev+ Delta', 'Fev+ Epsilon']
----> 2 cr.pl.cluster_fates(adata_t, mode='clustermap', cluster_key='clusters_fine', clusters=clusters)

~/Projects/cellrank/cellrank/plotting/_cluster_fates.py in cluster_fates(adata, cluster_key, clusters, lineages, mode, final, basis, show_cbar, ncols, sharey, save, legend_kwargs, figsize, dpi, **kwargs)
    501         fig = plot_violin_no_cluster_key() if cluster_key is None else plot_violin()
    502     elif mode == "heatmap":
--> 503         fig = plot_heatmap()
    504     else:
    505         raise ValueError(

~/Projects/cellrank/cellrank/plotting/_cluster_fates.py in plot_heatmap()
    374                 ),
    375                 figsize=figsize,
--> 376                 **kwargs,
    377             )
    378             g.ax_heatmap.set_xlabel(cluster_key)

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/seaborn/matrix.py in clustermap(data, pivot_kws, method, metric, z_score, standard_scale, figsize, cbar_kws, row_cluster, col_cluster, row_linkage, col_linkage, row_colors, col_colors, mask, **kwargs)
   1299                         row_cluster=row_cluster, col_cluster=col_cluster,
   1300                         row_linkage=row_linkage, col_linkage=col_linkage,
-> 1301                         **kwargs)

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/seaborn/matrix.py in plot(self, metric, method, colorbar_kws, row_cluster, col_cluster, row_linkage, col_linkage, **kws)
   1136             yind = np.arange(self.data2d.shape[0])
   1137 
-> 1138         self.plot_colors(xind, yind, **kws)
   1139         self.plot_matrix(colorbar_kws, xind, yind, **kws)
   1140         return self

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/seaborn/matrix.py in plot_colors(self, xind, yind, **kws)
   1065 
   1066             heatmap(matrix, cmap=cmap, cbar=False, ax=self.ax_row_colors,
-> 1067                     xticklabels=row_color_labels, yticklabels=False, **kws)
   1068 
   1069             # Adjust rotation of labels

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/seaborn/matrix.py in heatmap(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, linewidths, linecolor, cbar, cbar_kws, cbar_ax, square, xticklabels, yticklabels, mask, ax, **kwargs)
    526     if square:
    527         ax.set_aspect("equal")
--> 528     plotter.plot(ax, cbar_ax, kwargs)
    529     return ax
    530 

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/seaborn/matrix.py in plot(self, ax, cax, kws)
    282         # Draw the heatmap
    283         mesh = ax.pcolormesh(self.plot_data, vmin=self.vmin, vmax=self.vmax,
--> 284                              cmap=self.cmap, **kws)
    285 
    286         # Set the axis limits

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1563     def inner(ax, *args, data=None, **kwargs):
   1564         if data is None:
-> 1565             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1566 
   1567         bound = new_sig.bind(ax, *args, **kwargs)

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/axes/_axes.py in pcolormesh(self, alpha, norm, cmap, vmin, vmax, shading, antialiased, *args, **kwargs)
   6105         collection = mcoll.QuadMesh(Nx - 1, Ny - 1, coords,
   6106                                     antialiased=antialiased, shading=shading,
-> 6107                                     **kwargs)
   6108         collection.set_alpha(alpha)
   6109         collection.set_array(C)

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/collections.py in __init__(self, meshWidth, meshHeight, coordinates, antialiased, shading, **kwargs)
   1940     def __init__(self, meshWidth, meshHeight, coordinates,
   1941                  antialiased=True, shading='flat', **kwargs):
-> 1942         Collection.__init__(self, **kwargs)
   1943         self._meshWidth = meshWidth
   1944         self._meshHeight = meshHeight

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/collections.py in __init__(self, edgecolors, facecolors, linewidths, linestyles, capstyle, joinstyle, antialiaseds, offsets, transOffset, norm, cmap, pickradius, hatch, urls, offset_position, zorder, **kwargs)
    162 
    163         self._path_effects = None
--> 164         self.update(kwargs)
    165         self._paths = None
    166 

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/artist.py in update(self, props)
   1004 
   1005         with cbook._setattr_cm(self, eventson=False):
-> 1006             ret = [_update_property(self, k, v) for k, v in props.items()]
   1007 
   1008         if len(ret):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/artist.py in <listcomp>(.0)
   1004 
   1005         with cbook._setattr_cm(self, eventson=False):
-> 1006             ret = [_update_property(self, k, v) for k, v in props.items()]
   1007 
   1008         if len(ret):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/artist.py in _update_property(self, k, v)
   1000                 if not callable(func):
   1001                     raise AttributeError('{!r} object has no property {!r}'
-> 1002                                          .format(type(self).__name__, k))
   1003                 return func(v)
   1004 

AttributeError: 'QuadMesh' object has no property 'dendrogram_ratio'

> /Users/marius/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/artist.py(1002)_update_property()
   1000                 if not callable(func):
   1001                     raise AttributeError('{!r} object has no property {!r}'
-> 1002                                          .format(type(self).__name__, k))
   1003                 return func(v)
   1004

The same code with mode heatmap worked though.

PyPI and Anaconda

Start slowly preparing to do this.

enable `.copy` for mc objects

would be handy to be able to do this, in case I want to work with several mc objects.

Update cellrank docs (api.rst) to reflect changes in the API (root_final deprecated)

Bug in `cellrank.pl.lineages`

When I call cr.pl.lineages(adata, cluster_key='clusters'), I get the traceback

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-941df5202743> in <module>
      1 cr.tl.lineages(adata)
----> 2 cr.pl.lineages(adata, cluster_key='clusters')

~/Projects/cellrank/cellrank/plotting/_utils.py in lineages(adata, lineages, final, cluster_key, mode, time_key, color_map, **kwargs)
     91         time_key=time_key,
     92         color_map=color_map,
---> 93         **kwargs,
     94     )
     95 

~/Projects/cellrank/cellrank/tools/_markov_chain.py in plot_lin_probs(self, cluster_key, mode, time_key, color_map, **kwargs)
    899         if mode == "embedding":
    900             scv.pl.scatter(
--> 901                 self._adata, color=color, title=titles, color_map=color_map, **kwargs
    902             )
    903         elif mode == "time":

~/Projects/scvelo/scvelo/plotting/scatter.py in scatter(adata, x, y, basis, vkey, color, use_raw, layer, color_map, colorbar, palette, size, alpha, linewidth, linecolor, perc, groups, sort_order, components, projection, legend_loc, legend_fontsize, legend_fontweight, xlabel, ylabel, title, fontsize, figsize, xlim, ylim, show_density, show_assignments, show_linear_fit, show_polyfit, rug, add_outline, add_text, add_text_pos, outline_width, outline_color, n_convolve, smooth, rescale_color, dpi, frameon, zorder, ncols, wspace, hspace, show, save, ax, **kwargs)
    100                 for key in mkeys:  # multi_kwargs[key] = key[i] if is multikey (list with more than 1 element) else key
    101                     multi_kwargs[key] = eval('{0}[i * (len({0}) > i)] if is_list({0}) else {0}'.format(key))
--> 102                 ax.append(scatter(adata, ax=pl.subplot(gs), **multi_kwargs, **scatter_kwargs, **kwargs))
    103 
    104         savefig_or_show(dpi=dpi, save=save, show=show)

~/Projects/scvelo/scvelo/plotting/scatter.py in scatter(adata, x, y, basis, vkey, color, use_raw, layer, color_map, colorbar, palette, size, alpha, linewidth, linecolor, perc, groups, sort_order, components, projection, legend_loc, legend_fontsize, legend_fontweight, xlabel, ylabel, title, fontsize, figsize, xlim, ylim, show_density, show_assignments, show_linear_fit, show_polyfit, rug, add_outline, add_text, add_text_pos, outline_width, outline_color, n_convolve, smooth, rescale_color, dpi, frameon, zorder, ncols, wspace, hspace, show, save, ax, **kwargs)
    312                         logg.warn('Invalid color key. Using grey instead.')
    313 
--> 314             smp = ax.scatter(x, y, c=c, alpha=alpha, marker='.', zorder=zorder, **kwargs)
    315 
    316             if add_outline:

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1597     def inner(ax, *args, data=None, **kwargs):
   1598         if data is None:
-> 1599             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1600 
   1601         bound = new_sig.bind(ax, *args, **kwargs)

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
   4498                 )
   4499         collection.set_transform(mtransforms.IdentityTransform())
-> 4500         collection.update(kwargs)
   4501 
   4502         if colors is None:

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/artist.py in update(self, props)
    972 
    973         with cbook._setattr_cm(self, eventson=False):
--> 974             ret = [_update_property(self, k, v) for k, v in props.items()]
    975 
    976         if len(ret):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/artist.py in <listcomp>(.0)
    972 
    973         with cbook._setattr_cm(self, eventson=False):
--> 974             ret = [_update_property(self, k, v) for k, v in props.items()]
    975 
    976         if len(ret):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/matplotlib/artist.py in _update_property(self, k, v)
    968                 if not callable(func):
    969                     raise AttributeError('{!r} object has no property {!r}'
--> 970                                          .format(type(self).__name__, k))
    971                 return func(v)
    972 

AttributeError: 'PathCollection' object has no property 'lineages'

I'm fairly certain that's because the absorption probabilities are saved as a Lineage object in adata.obsm. I think you had changed this, but maybe something went wrong when we copied the repo? To make sure no other bugs were introduced, can you please run the tutorial and check that it works?

Titles in `mode=violin`

When I use cr.pl.cluster_fates with mode='violin', the lineage names don't have the 'to/from' prefix that they usually have (see mc.plot_lin_probs). Would be nice to add this prefix, depending on the direction.

More tests

Especially for functionality (i.e. pipeline) + plotting.

Bug in `plot_main_states` for the backwards process

When calling g_bwd.plot_main_states(n_cells=30) I get:
(Note that n_cells=None works fine)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-64-afed7edbd7ff> in <module>
----> 1 g_bwd.plot_main_states(n_cells=30)

~/Projects/cellrank/cellrank/tools/estimators/_gppca.py in plot_main_states(self, n_cells, lineages, cluster_key, mode, time_key, same_plot, show_dp, title, cmap, **kwargs)
    429                 same_plot=same_plot,
    430                 title=title,
--> 431                 **kwargs,
    432             )
    433 

~/Projects/cellrank/cellrank/tools/estimators/_gppca.py in _plot_states(self, attr, error_msg, n_cells, same_plot, title, **kwargs)
    774                 )
    775         except Exception as e:
--> 776             raise e
    777         finally:
    778             cleanup()

~/Projects/cellrank/cellrank/tools/estimators/_gppca.py in _plot_states(self, attr, error_msg, n_cells, same_plot, title, **kwargs)
    771                     color=keys,
    772                     title=list(_main_states.cat.categories) if title is None else title,
--> 773                     **kwargs,
    774                 )
    775         except Exception as e:

~/Projects/scvelo/scvelo/plotting/scatter.py in scatter(adata, basis, x, y, vkey, color, use_raw, layer, color_map, colorbar, palette, size, alpha, linewidth, linecolor, perc, groups, sort_order, components, projection, legend_loc, legend_loc_lines, legend_fontsize, legend_fontweight, xlabel, ylabel, title, fontsize, figsize, xlim, ylim, add_density, add_assignments, add_linfit, add_polyfit, add_rug, add_text, add_text_pos, add_outline, outline_width, outline_color, n_convolve, smooth, rescale_color, color_gradients, dpi, frameon, zorder, ncols, nrows, wspace, hspace, show, save, ax, **kwargs)
    291                 c = interpret_colorkey(adata, basis, color, perc, use_raw)  # phase portrait: color=basis, layer=color
    292             else:
--> 293                 c = interpret_colorkey(adata, color, layer, perc, use_raw)  # embedding, gene trend etc.
    294 
    295             if c is not None and not isinstance(c, str) and not isinstance(c[0], str):

~/Projects/scvelo/scvelo/plotting/utils.py in interpret_colorkey(adata, c, layer, perc, use_raw)
    442         elif np.any([obs_key in c for obs_key in adata.obs.keys()]):
    443             obs = adata.obs[[key for key in adata.obs.keys() if not isinstance(adata.obs[key][0], str)]]
--> 444             c = obs.astype(np.float32).eval(c)
    445         elif not is_color_like(c):
    446             raise ValueError('color key is invalid! pass valid observation annotation or a gene name')

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5696         else:
   5697             # else, only a single dtype is given
-> 5698             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors)
   5699             return self._constructor(new_data).__finalize__(self)
   5700 

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    580 
    581     def astype(self, dtype, copy: bool = False, errors: str = "raise"):
--> 582         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    583 
    584     def convert(self, **kwargs):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
    440                 applied = b.apply(f, **kwargs)
    441             else:
--> 442                 applied = getattr(b, f)(**kwargs)
    443             result_blocks = _extend_blocks(applied, result_blocks)
    444 

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    605         if self.is_extension:
    606             # TODO: Should we try/except this astype?
--> 607             values = self.values.astype(dtype)
    608         else:
    609             if issubclass(dtype.type, str):

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/arrays/categorical.py in astype(self, dtype, copy)
    500         if is_integer_dtype(dtype) and self.isna().any():
    501             raise ValueError("Cannot convert float NaN to integer")
--> 502         return np.array(self, dtype=dtype, copy=copy)
    503 
    504     @cache_readonly

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/pandas/core/arrays/categorical.py in __array__(self, dtype)
   1284         ret = take_1d(self.categories.values, self._codes)
   1285         if dtype and not is_dtype_equal(dtype, self.categories.dtype):
-> 1286             return np.asarray(ret, dtype)
   1287         if is_extension_array_dtype(ret):
   1288             # When we're a Categorical[ExtensionArray], like Interval,

~/miniconda3/envs/cellrank/lib/python3.6/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

ValueError: could not convert string to float: 'Beta'

> /Users/marius/miniconda3/envs/cellrank/lib/python3.6/site-packages/numpy/core/_asarray.py(85)asarray()
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87

Attribute renaming

To ensure harmonization between scvelo and cellrank, we should rename a couple of our attributes in cellrank. This is in particular relevant for scvelo's latent_time and for the directed PAGA. I suggest the following changes:

In adata, write to final_states and root_states, rather than final_cells and root_cells
write continuous annotations to final_states_cont and root_states_cont. This should be the aggregated probability that a cells belongs to a final/root state, no matter which one. (I guess we called this approx_rcs_probs in MC, so we already have it there).

The above two should be consistent between MC and GPCCA, we can talk about the details tonight. Also, I was thinking about how to make the naming more consistent between MC and GPCCA in general. I would suggest:

rename approx_rcs to metastable_states everywhere, i.e. compute_approx_rcs goes to compute_metastable_states in MC etc.
The MC class directly writes metastable_states to final_states or root_states in AnnData
The GPCCA class writes metastable_states to metastable_states_fwd and metastable_states_bwd in AnnData, and their restriction ('main states') to final_states and root_states in AnnData

PAGA with Pie Charts

Leander and Volker both had similar bugs where their PAGA with pie charts looks something like this:

Any idea what may be happening here?

PAGA in the embedding

Would be nice to be able to plot the PAGA with pie charts in the embedding, similar to how scvelo does it.

Rename startpoints and endpoints

Use 'root cells' and 'final cells' instead.

Generalization of `_merge_approx_rcs`

I would generalize this a bit so that you can pass two series, and optionally, two color arrays associated with the categories. Potentially, you could use this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.combine.html

I'm opening this up because I need such a utility function at the moment and rather than writing a second one, it would be nice to just have one general one. I would also rename the function and the variables inside it to not have any refernece to recurrent classes, to just keep it general.

Add option to manually set approx_rcs

Markov Chain Interface

A lot of functions in MarkovChain are beginning with compute_ or plot_. Also, we have a lot of attributes - maybe they should be read-only properties, since user shouldn't set them (and perhaps we can hide a few). Also, I'd give the functions more descriptive names (e.g. eigen instead of just eig)
For the former, I propose splitting the functions into modules, e.g.

mc_fwd = cr.tl.MarkovChain(...)
mc_fwd.compute.eigen()
...
mc_fwd.plot.eigen_embedding()

We could also have properties module, e.g.

mc_fwd.props.G2M_score

@Marius1311 what do you think?

Clean up types

Do a proper typechecking using mypy

Make test work again

Introduce `final_states_cont` and `root_states_cont`

Additionally to the categorical root_states and final_states, we need continious root_states_cont and final_states_cont - basically, the probability of being a root/final states, irrespective of which one.

For CFLARE, this already exists, I guess it's currently called final_cells_probs and root_cells_probs, so here, it's only a matter of renaming. For GPCCA, we can get this simply by adding up our lineage probabilities corresponding to root/final states (not taking into account the rest in case we did not redistribute, otherwise it won't work).

Once this is done, we can nicely interface with scvelo's latent time and directed paga. We're talking to Volker tomorrow, would be nice to have this ready by then so he can start with the interface on scvelo's side.

Condition number for sparse matrices

Currently, we use np.linalg.cond to get the condition number, which works only on dense matrices.
Implement the approximation for sparse matrices as described in: https://github.com/sparse-condest/condest

Bug in _counts() function in tools/_cluster_fates module

Hi @Marius1311 ,

I saw a bug in the _counts() function of the tools/_cluster_fates module. I wanted to create a PR but I can't push to the repo so I think this is the best way to notify it.

The snippet of code is:

    for name in cluster_names:
        data = adata[adata.obs[cluster_key] == name].obsm[lin_key]
        dim = data.shape[1]
        index = np.random.randint(data.shape[0], size=n_samples)
        l = [
            np.random.choice(np.arange(dim), 1, p=np.atleast_1d(data[ind]))[0]
            if ~np.isnan(data[ind]).any()
            else 0
            for ind in index
        ]
        freq = [l.count(i) for i in np.arange(5)]
        freq = [i if i > 0 else i + 1e-5 for i in freq]
        d[name] = freq

The bug is freq = [l.count(i) for i in np.arange(5)]should be freq = [l.count(i) for i in np.arange(dim)]

Best,

Juan.

Title figure for the docs

We still need a figure for the index of our docs.

Fix tests

Write a custom comparator accounting for slight variations in sizes + do a proper requirements for tests.

`cr.datasets`

The data we use in our tutorials, but also for our figures, will be a bit different from scvelo's data. For reproducibility, it would be nice to have a .datasets module, that allows the user to conveniently download our datasets from the notebooks repo. Could you look into how scvelo does this and potentially adapt this to our needs?

Loading of adata object

Since we rely on the Lineage class, which cannot be reconstructed when loading the adata
object using sc.read, I think we should have our own wrapper that will also optionally create dummy names, such as Lineage 1 if not present, as well as colors

Switch back to sparse calculation for transition matrix

Now that I know how to solve the issue, let's go back to the original implementation.

Plotting gene dynamics with cr.pl.gene_trends

Hi guys! I'm trying to reproduce your CellRank pancreas tutorial, but when I try to plot the gene dynamics over latent time with

model = cr.ul.models.GamMGCVModel(adata, n_splines=7, sp=100)
cr.pl.gene_trends(adata, model=model, data_key='Ms', genes=['Gng12', 'Pak3'], time_key='latent_time', weight_threshold=0.05, weight_scale=1, filter_data=True, dpi=25)

it's stuck in the loading process (I stopped it after several hours). rpy2 is installed and working properly and switching the model with

from sklearn.gaussian_process.kernels import RBF
from sklearn.kernel_ridge import KernelRidge
model = cr.ul.models.SKLearnModel(adata, KernelRidge(kernel=RBF()))

gives me the same issue. Do you guys have any ideas?

Cluster fates cleaner subsetting

For future reference: once new version of anndata releases, we can use proper subsetting here of the Lineage class thanks to scverse/anndata#335.

Improve versioning and project management

Add version, authors, email, etc. Check out versioneer and flit.

Easy interface for `cr.pl.graph`

I often want to quickly visualise a small directed graph and I think your function would be nice for this. It would be great if we could add a way to just pass it a numpy array or sparse matrix, instead of having to save the matrix in an anndata object first. Do you agree, or do you think I should just use networkx straight away?

Saving Paga Pie

When I save a paga with pie charts with a .pdf extension, then the legend keywords are ignored, i.e. they are just always plotted on data:

15.5_backwards_paga_pie.pdf

This is not the case when I save as e.g. .png

undefined 'get_neighs_params'

cellrank/cellrank/tools/_utils.py

Line 527 in 991b764

and n_neighbors <= get_neighs_params(adata)["n_neighbors"]

It's nowhere defined and raises a
NameError: name 'get_neighs_params' is not defined

Bug in `cr.pl.cluster_fates`

Mode violoin does not work becasue of an indexing error. I call cr.pl.cluster_fates(adata, cluster_key='clusters', mode='violin') and I get

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-6b9263a278de> in <module>
      1 mc_fwd.compute_lin_probs(keys=['Ductal', 'Alpha, Beta, Epsilon'])
----> 2 cr.pl.cluster_fates(adata, cluster_key='clusters', mode='violin')

~/Projects/cellrank/cellrank/plotting/_cluster_fates.py in cluster_fates(adata, cluster_key, clusters, lineages, mode, final, show_cbar, ncols, sharey, save, legend_kwargs, figsize, dpi, **kwargs)
    334         )
    335         mask = np.array(mask, dtype=np.bool)
--> 336         data = np.array(adata.obsm[lk][mask, :][:, lin_names])
    337         mean = np.nanmean(data, axis=0)
    338         std = np.nanstd(data, axis=0) / np.sqrt(data.shape[0])

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

> /Users/marius/Projects/cellrank/cellrank/plotting/_cluster_fates.py(336)cluster_fates()
    334         )
    335         mask = np.array(mask, dtype=np.bool)
--> 336         data = np.array(adata.obsm[lk][mask, :][:, lin_names])
    337         mean = np.nanmean(data, axis=0)
    338         std = np.nanstd(data, axis=0) / np.sqrt(data.shape[0])

Profile GPCCA with krylov-schur

I think we should run a profiler over this to identify current bottlenecks - I have a feeling we can further speed this up.

Convenient way to specify roots/ends

In case the user already knows where his roots/endpoints are, it would be nice to have a convenient way to input this to the classe, as e.g. a dict with cell barcodes. @michalk8 , I'm not up to date whether this is already implemented or not?

References formatting

For some reason, our references are not aligned, see https://cellrank.readthedocs.io/en/latest/references.html
It looks a bit odd at the moment.

Having separate functions for root and final

Leander told me it may be more intutitive to have one function find_final and another one find_root instead of the combined root_final - what do you think?

Handeling both legends in PAGA pie

Getting both legends to somehow not overlap with the data is a real challenge. What's missing is the option to position the cellrank legend (final_states) outside of the data, like the scvelo/scanpy legend (clusters).

scvelo legend is set to right, cellrank legend is left. The problem is, it's not really outside the plotting area. best does not give me anything better:

If at least I could turn the clusters legend off, and if I were able to put the cellrank legend to the place there the scvelo legend currently is, that would already help me a lot.

New plotting function for complex eigenvalues and check for imaginary part of eigenvectors

Check for aperiodicity as well.

Lineage color debugging

The assignment of colors to lineages isn't quite working yet. At the moment, colors are mixed around when I use keys to compute lineages. See here to reproduce: https://github.com/theislab/cellrank_notebooks_dev/blob/melanoma/notebooks/analysis_pancreas/E15.5/ml_1_cellrank_testing.ipynb

I guess it's because I don't provide keys in the right order?

Enable heatmap/clustermap for `cluster_fates`

I would be great to add mode='heatmap' and mode='clustermap' to cluster_fates, to get e.g. output like this:

I wrote some dummy code that we can base this on: (I think cluster_fates has most of this already)

def cluster_fates_heatmap(adata, cluster_key='louvain', clusters=None, title=None):
    
    from seaborn import heatmap
    
    if clusters is None:
        clusters = adata.obs[cluster_key].cat.categories
    
    lin_obj = adata.obsm['to_final_cells']
    fev_heat = np.zeros((lin_obj.shape[1], len(clusters)))

    for i, cl in enumerate(clusters):
        mask = adata.obs[cluster_key] == cl
        fev_heat[:, i] = np.mean(lin_obj[mask, :], axis=0)

    fig, ax = plt.subplots()
    heatmap(fev_heat, robust=True, xticklabels=clusters, yticklabels=lin_obj.names, annot=True, fmt='.2f', ax=ax)
    if title is None:
        figure_title = 'Averaged cluster fates'
    else:
        figure_title  = title
        
    ax.set_title(title)

    plt.show()

'connectivities' stored in adata.obsp instead of adata.uns

In the latest version of anndata (and with the upcoming release), the neighbor graph will be stored in adata.obsp['distances'] and adata.obsp['connectivities'], which we should already be accounting for now. Could, for instance, be done with a helper function that allows for both conventions

def get_neighs(adata, mode='distances'):
    return adata.obsp[mode] if hasattr(adata, 'obsp') and mode in adata.obsp.keys() else adata.uns['neighbors'][mode]

`.set_approx_rcs` improvements

It would be nice if .set_approx_rcs would have an option add_to_existing: bool = True, where

True: the new recurrent classes are added to the existing. If a dict is passed, this should be easy, as we can expect to only get indices of recurrent states. If a series is passed, I would ignore Nan's, i.e. not override previously recurrent classes as transient, but only add more recurrent classes
False: current behaviour.

theislab / cellrank Goto Github PK

cellrank's People

Contributors

Stargazers

Watchers

Forkers

cellrank's Issues

Recommend Projects

Recommend Topics

Recommend Org