jcventerinstitute / nsforest
A machine learning method for the discovery of the minimum marker gene combinations for cell type identification from single-cell RNA sequencing
License: MIT License
Hi, I ran NSForest v3.0 with NS_Forest(adata, clusterLabelcolumnHeader = "leiden") in a Jupyter Notebook (on Windows 10), and the program ran for a while. However, after a few minutes I got this message:
File :1
6330403 K07Rik >=1.4837858080863953
^
SyntaxError: invalid syntax
Would you please help me solve this issue?
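A likely cause, assuming the query string is built directly from gene names, is that 6330403K07Rik begins with a digit, which is not a valid Python identifier, so pandas' query parser rejects it. A minimal sketch of the failure with a made-up one-gene table:

```python
import pandas as pd

# Hypothetical one-gene expression table; the gene name starts with a digit.
df = pd.DataFrame({"6330403K07Rik": [0.5, 2.0]})

try:
    # DataFrame.query parses the expression as Python code, so an
    # identifier beginning with a digit is a syntax error.
    df.query("6330403K07Rik >= 1.48")
except SyntaxError:
    print("SyntaxError: gene names starting with a digit break query()")
```

If that is the cause here, renaming the offending genes before running NS_Forest (or, in newer pandas, backtick-quoting such names inside query expressions) should avoid it.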
Hi there,
Thank you so much for creating such an awesome tool! I am quite new to coding, especially in Python, and encountered several errors when running the NSForest function. I am not sure if these problems were specific to my object, but I thought I would post them here in case someone else has the same issues. I have tried to fix some of the errors and have gotten the function to finish, but I'm not 100% sure the output is correct.
#---------------------------------------------------------------------------------------------------------------------------------------
The first error was in line 167 of the source code:
"AttributeError: Can only use .cat accessor with a 'category' dtype"
I changed this:
medianValues = pd.DataFrame(columns=adata.var_names, index=adata.obs[clusterLabelcolumnHeader].cat.categories)
to this:
medianValues = pd.DataFrame(columns=adata.var_names, index=adata.obs[clusterLabelcolumnHeader].unique())
which seemed to work
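For what it's worth, an alternative to editing the source is to convert the cluster column to a categorical dtype before calling NS_Forest, which keeps the original .cat.categories line working. A minimal sketch with a toy stand-in for adata.obs (the column name "leiden" is just an example):

```python
import pandas as pd

# Toy stand-in for adata.obs with a non-categorical cluster column
obs = pd.DataFrame({"leiden": ["0", "1", "0", "2"]})

# Cast to category so the .cat accessor works
obs["leiden"] = obs["leiden"].astype("category")
print(obs["leiden"].cat.categories.tolist())  # ['0', '1', '2']
```

With a real AnnData object this would be adata.obs["leiden"] = adata.obs["leiden"].astype("category") before the call.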
#----------------------------------------------------------------------------------------------------------------------------------------------
The second error was in line 172:
"ValueError: Shape of passed values is (49211, 1), indices imply (49211, 31002)" which I think is because the input to create the pandas data frame was in the wrong format.
When I changed this:
Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs, columns = subset_adata.var_names)
to this:
Subset_dataframe = pd.DataFrame(data = subset_adata.X.toarray(), index = list(subset_adata.obs["cells"].tolist()), columns = subset_adata.var_names)
it seemed to work.
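The shape error is consistent with subset_adata.X being a SciPy sparse matrix, which the DataFrame constructor does not expand into columns; densifying first makes the shapes line up. A small sketch of the .toarray() fix with made-up cell and gene names:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for subset_adata.X (cells x genes), stored sparse
X = csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0]]))
obs_names = ["cell1", "cell2"]
var_names = ["GeneA", "GeneB"]

# Densify before building the DataFrame so it gets one column per gene
df = pd.DataFrame(data=X.toarray(), index=obs_names, columns=var_names)
print(df.shape)  # (2, 2)
```

Note that .toarray() materializes the full dense matrix, so for very large subsets this costs memory proportional to cells × genes.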
#------------------------------------------------------------------------------------------------------------------------------------------------
A similar problem in line 121 of the source code:
When running:
def fbetaTest(x, column, adata, Binary_RankedList, testArray, betaValue = 0.5):
I get this error "ValueError: Shape of passed values is (113957, 1), indices imply (113957, 0)"
But I changed this:
Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs_names, columns = subset_adata.var_names)
to this:
Subset_dataframe = pd.DataFrame(data=subset_adata.X.toarray(), index=subset_adata.obs_names.tolist(), columns=subset_adata.var_names)
and it seemed to work.
#-------------------------------------------------------------------------------------------------------------------------------------
Another error was in line 94 of the source code:
"IndexError: Index dimension must be <= 2"
X = x_train[:, None]
I don't think this code is actually necessary so I commented it out which seemed to fix the problem.
#----------------------------------------------------------------------------------------------------------------------------------------------
Then I also got several errors in the last section. I couldn't quite figure out what the problems were, but I changed the code from this:
#Move binary genes to Results dataframe
clusters2Genes = pd.DataFrame(columns = ['Gene', 'clusterName'])
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
clusters2Genes["Gene"] = Binary_score_store_DF.index
GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())
BinaryFinal = pd.DataFrame(columns = ['clusterName','Binary_Genes'])
BinaryFinal['clusterName'] = GroupedBinarylist.index
BinaryFinal['Binary_Genes'] = GroupedBinarylist.values
to this:
Binary_score_store_DF = pd.read_csv('NS-Forest_v3_Extended_Binary_Markers_Supplmental.csv')
# Move binary genes to Results dataframe
clusters2Genes = pd.DataFrame(columns=['Gene', 'clusterName'])
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
clusters2Genes["Gene"] = Binary_score_store_DF["Unnamed: 0"]
clusters2Genes.to_csv('clusters2Genes.csv')
#GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())
#GroupedBinarylist = clusters2Genes.apply(lambda x: x['Gene'].unique()) #This seemed to work earlier
BinaryFinal = pd.DataFrame(columns=['clusterName', 'Binary_Genes'])
BinaryFinal['clusterName'] = clusters2Genes["clusterName"]
BinaryFinal['Binary_Genes'] = clusters2Genes["Gene"]
BinaryFinal.to_csv('BinaryFinal.csv')
It seems that in this line of code:
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
the column name in the Binary_score_store dataframe was incorrect
And in this line of code:
GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())
the clusters are already grouped together and all the gene names are already unique??
I couldn't get around the last problem, so I ended up just commenting out those lines of code, and now I am not sure if my output files are what they should be. Any advice would be greatly appreciated!
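For reference, the grouping step can be reproduced on a toy table: it collapses each cluster's genes into one array, so skipping it changes the output format from one row per cluster to one row per gene. A sketch with invented cluster and gene names (using the equivalent SeriesGroupBy form of the same operation):

```python
import pandas as pd

clusters2Genes = pd.DataFrame({
    "clusterName": ["c1", "c1", "c2"],
    "Gene": ["GeneA", "GeneB", "GeneA"],
})

# Equivalent to groupby('clusterName').apply(lambda x: x['Gene'].unique())
GroupedBinarylist = clusters2Genes.groupby("clusterName")["Gene"].unique()
print(GroupedBinarylist.index.tolist())      # ['c1', 'c2']
print([list(v) for v in GroupedBinarylist])  # [['GeneA', 'GeneB'], ['GeneA']]
```

So even if every gene name is globally unique, the step still matters: it produces one array of genes per cluster, which is what the BinaryFinal dataframe expects.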
I have been running into an error at the line testArray['y_pred'] = 0. Is this supposed to convert all the values to 0s?
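Assuming testArray is a pandas DataFrame indexed by cell, yes: assigning a scalar broadcasts it, creating (or resetting) a y_pred column of all zeros, and the cells passing the marker query are then flipped to 1 via .loc afterwards. A toy sketch with made-up cell names:

```python
import pandas as pd

# Toy stand-in for testArray, indexed by cell name
testArray = pd.DataFrame({"y_true": [1, 0, 1]}, index=["c1", "c2", "c3"])

testArray["y_pred"] = 0                     # scalar broadcasts: every row gets 0
testArray.loc[["c1", "c3"], "y_pred"] = 1   # cells matching the query become 1
print(testArray["y_pred"].tolist())         # [1, 0, 1]
```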
Just installed via pip successfully:
(base) jespinozlt2-osx:~ jespinoz$ conda activate soothsayer_py3.9_env
(soothsayer_py3.9_env) jespinozlt2-osx:~ jespinoz$ pip install nsforest
Collecting nsforest
Downloading nsforest-3.9.2.5-py3-none-any.whl (7.0 kB)
Requirement already satisfied: scanpy>=1.9.3 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from nsforest) (1.9.3)
Requirement already satisfied: scikit-learn>=0.22 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.0.2)
Requirement already satisfied: anndata>=0.7.4 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.8.0)
Requirement already satisfied: patsy in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.5.2)
Requirement already satisfied: umap-learn>=0.3.10 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.5.2)
Requirement already satisfied: seaborn in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.11.2)
Requirement already satisfied: scipy>=1.4 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.8.0)
Requirement already satisfied: tqdm in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (4.62.3)
Requirement already satisfied: pandas>=1.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.4.0)
Requirement already satisfied: session-info in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.0.0)
Requirement already satisfied: packaging in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (21.3)
Requirement already satisfied: natsort in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (8.1.0)
Requirement already satisfied: networkx>=2.3 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (2.6.3)
Requirement already satisfied: statsmodels>=0.10.0rc2 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.13.1)
Requirement already satisfied: joblib in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.1.0)
Requirement already satisfied: h5py>=3 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (3.7.0)
Requirement already satisfied: numpy>=1.17.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.21.5)
Requirement already satisfied: numba>=0.41.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.55.1)
Requirement already satisfied: matplotlib>=3.4 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (3.5.1)
Requirement already satisfied: fonttools>=4.22.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (4.29.1)
Requirement already satisfied: python-dateutil>=2.7 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (9.0.1)
Requirement already satisfied: pyparsing>=2.2.1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (3.0.7)
Requirement already satisfied: cycler>=0.10 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (1.3.2)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from numba>=0.41.0->scanpy>=1.9.3->nsforest) (0.38.0)
Requirement already satisfied: setuptools in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from numba>=0.41.0->scanpy>=1.9.3->nsforest) (60.7.1)
Requirement already satisfied: pytz>=2020.1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from pandas>=1.0->scanpy>=1.9.3->nsforest) (2021.3)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scikit-learn>=0.22->scanpy>=1.9.3->nsforest) (3.1.0)
Requirement already satisfied: six in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from patsy->scanpy>=1.9.3->nsforest) (1.16.0)
Requirement already satisfied: pynndescent>=0.5 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from umap-learn>=0.3.10->scanpy>=1.9.3->nsforest) (0.5.6)
Requirement already satisfied: stdlib-list in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from session-info->scanpy>=1.9.3->nsforest) (0.8.0)
Installing collected packages: nsforest
Successfully installed nsforest-3.9.2.5
Following this tutorial:
https://jcventerinstitute.github.io/celligrate/tutorials/NS-Forest_tutorial.html
(soothsayer_py3.9_env) jespinozlt2-osx:~ jespinoz$ python
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:55:37)
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import NSForest_v3dot9_1 as nsf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'NSForest_v3dot9_1'
>>> import nsforest as nsf
>>> nsf.__version__
'3.9.2'
Looks like the import command needs to be changed to from nsforest import *
Hello - I've been attempting to use NSForest_v3 to identify marker genes and am running into an index issue. I followed the solutions in the other two posted issues to make the cluster column categorical and to rename my clusters "cluster#". It looks like it starts running and then throws an error after removing median-expressed genes. I pasted the error I'm getting below.
I'm pretty new to python so I apologize if there is a stupid solution.
Thanks!
IndexError Traceback (most recent call last)
/var/folders/xl/ly4bqc356sd34lm07_ggzdw1pfgrjf/T/ipykernel_22678/4072770710.py in
1 # run NSForest on BM_combined
----> 2 adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = 'integrated_snn_res.0.5')
/var/folders/xl/ly4bqc356sd34lm07_ggzdw1pfgrjf/T/ipykernel_22678/3411186380.py in NS_Forest(adata, clusterLabelcolumnHeader, rfTrees, Median_Expression_Level, Genes_to_testing, betaValue)
217 FullpermutationList = permutor(queryInequalities)
218 print(len(FullpermutationList))
--> 219 f1_store = fbetaTest(FullpermutationList, column, adata, Binary_RankedList, testArray, betaValue)
220 f1_store_1D.update(f1_store)
221
/var/folders/xl/ly4bqc356sd34lm07_ggzdw1pfgrjf/T/ipykernel_22678/3411186380.py in fbetaTest(x, column, adata, Binary_RankedList, testArray, betaValue)
120 fbeta_dict = {}
121 subset_adata = adata[:,Binary_RankedList]
--> 122 Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs_names, columns = subset_adata.var_names)
123 Subset_dataframe.columns = Subset_dataframe.columns.str.replace("-", "").str.replace(".", "")
124
~/miniconda3/lib/python3.9/site-packages/anndata/_core/anndata.py in X(self)
623 elif self.is_view:
624 X = as_view(
--> 625 _subset(self._adata_ref.X, (self._oidx, self._vidx)),
626 ElementRef(self, "X"),
627 )
~/miniconda3/lib/python3.9/functools.py in wrapper(*args, **kw)
875 '1 positional argument')
876
--> 877 return dispatch(args[0].__class__)(*args, **kw)
878
879 funcname = getattr(func, '__name__', 'singledispatch function')
~/miniconda3/lib/python3.9/site-packages/anndata/_core/index.py in subset(a, subset_idx)
125 if all(isinstance(x, cabc.Iterable) for x in subset_idx):
126 subset_idx = np.ix_(*subset_idx)
--> 127 return a[subset_idx]
128
129
IndexError: arrays used as indices must be of integer (or boolean) type
Hello,
I tried your code to delineate the minimum marker genes that characterize subsets of Treg cells. I generally work with the Seurat package to analyse my single-cell projects. I would like to generate a UMAP plot using the genes identified with NS-Forest (2.0 and 1.3). To do that, I subset the initial Seurat object, specifying the identified genes as features, and then ran FindNeighbors, FindClusters, and RunUMAP. I get essentially the same plot with the 50 identified genes as with the original 2000 genes.
Have you ever tried to combine Seurat and your tool, rather than Scanpy?
Thank you in advance
Hi,
I got an error when I ran the code in cell 3 of the script 'NS_Forest_v2.ipynb' in Jupyter Notebook. The code in cells 1 and 2 ran smoothly. The following messages were printed on the screen.
RTN1
5.4558917988718
GPM6A
6.5202792613131395
HSPB1
6.68820560318182
PHGDH
4.81677071212406
ANXA5
4.08874724866395
CLU
4.61930439015417
...
D:\Anaconda3\envs\py27\lib\site-packages\ipykernel_launcher.py:98: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
RTN1
8.321140985238856
GPM6A
8.261602044663064
HSPB1
3.103076840680365
PHGDH
2.4805289343488903
ANXA5
2.4805289343488903
...
TypeError Traceback (most recent call last)
in ()
74 max_grouped.df.to_csv('NSForest_v2_maxF-scores.csv')
75
---> 76 NSForest_Results_Table_Fin["f-measureRank"] = NSForest_Results_Table_Fin.groupby(by="clusterName")["f-measure"].rank(ascending=False)
77 topResults = NSForest_Results_Table_Fin["f-measureRank"] < 50
78 NSForest_Results_Table_top = NSForest_Results_Table_Fin[topResults]
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in rank(self, method, ascending, na_option, pct, axis)
1904 return self._cython_transform('rank', numeric_only=False,
1905 ties_method=method, ascending=ascending,
-> 1906 na_option=na_option, pct=pct, axis=axis)
1907
1908 @substitution(name='groupby')
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in _cython_transform(self, how, numeric_only, **kwargs)
1023 try:
1024 result, names = self.grouper.transform(obj.values, how,
-> 1025 **kwargs)
1026 except NotImplementedError:
1027 continue
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in transform(self, values, how, axis, **kwargs)
2628
2629 def transform(self, values, how, axis=0, **kwargs):
-> 2630 return self._cython_operation('transform', values, how, axis, **kwargs)
2631
2632 def _aggregate(self, result, counts, values, comp_ids, agg_func,
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in _cython_operation(self, kind, values, how, axis, min_count, **kwargs)
2588 result = self._transform(
2589 result, values, labels, func, is_numeric, is_datetimelike,
-> 2590 **kwargs)
2591
2592 if is_integer_dtype(result) and not is_datetimelike:
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in _transform(self, result, values, comp_ids, transform_func, is_numeric, is_datetimelike, **kwargs)
2662 comp_ids, is_datetimelike, **kwargs)
2663 else:
-> 2664 transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
2665
2666 return result
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in wrapper(*args, **kwargs)
2477
2478 def wrapper(*args, **kwargs):
-> 2479 return f(afunc, *args, **kwargs)
2480
2481 # need to curry our sub-function
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in (func, a, b, c, d, **kwargs)
2429 kwargs.get('ascending', True),
2430 kwargs.get('pct', False),
-> 2431 kwargs.get('na_option', 'keep')
2432 )
2433 }
TypeError: 'NoneType' object is not callable
What I did:
I followed the Prerequisites in the README.md to install the modules and create the input file. A glance of the input file was shown below.
In [3]: dataFull.head()
Out[3]:
AAK1 AARS ABCD3 ABHD12 ... YWHAH YWHAQ YWHAZ Clusters
AMP358_sc11 5.318264 5.854272 2.480529 6.184276 ... 5.204884 7.599972 6.350880 Cl_1
AMP358_sc18 2.480529 2.480529 2.480529 4.379044 ... 3.193956 7.510237 5.710166 Cl_1
AMP358_sc7 6.228200 2.996529 3.206476 3.890947 ... 2.996529 8.167026 6.861036 Cl_1
AMP358_sc1 3.233557 3.320131 3.534038 4.060293 ... 2.860235 6.999525 6.781777 Cl_1
AMP358_sc11 3.219218 2.480529 2.480529 3.005682 ... 2.480529 7.919291 8.287178 Cl_1
[5 rows x 1255 columns]
Software information:
OS: Windows 10
Python version: 2.7.15
In addition, I tried running the script in CentOS 7 system and got the same error.
Would anyone help solve the issue? Any suggestion is welcome. Thanks!
Dear author,
I tested the latest version and got this error. Is it because the number of genes exceeds a limit?
adata = sc.read_h5ad("***.h5ad")
adata.obs['anno2'] = adata.obs['anno2'].astype('category')
adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = "anno2")
I converted a Seurat object to AnnData using SeuratDisk. It seemed to work.
These are the features of the resulting object:
AnnData object with n_obs × n_vars = 17912 × 3000
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'status', 'shared_assignment', 'assignment', 'axis', 'log10GenesPerUMI', 'mitoRatio', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.4', 'SCT_snn_res.0.6', 'SCT_snn_res.0.8', 'SCT_snn_res.1', 'SCT_snn_res.1.4', 'SCT_snn_res.2', 'seurat_clusters', 'S.Score', 'G2M.Score', 'Phase'
var: 'features'
uns: 'neighbors'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances'
The cluster assignment I want to use is 'SCT_snn_res.0.8'. I changed this to dtype category and then changed the function call to NS_Forest to reflect that I want to use this column.
When I run adata_markers = NS_Forest(adata)
It starts up, but I get this error:
22
0
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: '0'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-7-6cd156db933d> in <module>
----> 1 adata_markers = NS_Forest(adata)
<ipython-input-4-5528acf1c438> in NS_Forest(adata, clusterLabelcolumnHeader, rfTrees, Median_Expression_Level, Genes_to_testing, betaValue)
202
203 #Rerank according to expression level and binary score
--> 204 Positive_RankedList_Complete = negativeOut(RankedList, column, medianValues, Median_Expression_Level)
205 print(Positive_RankedList_Complete)
206
<ipython-input-4-5528acf1c438> in negativeOut(x, column, medianValues, Median_Expression_Level)
48 Positive_RankedList_Complete = []
49 for i in x:
---> 50 if medianValues.loc[column, i] > Median_Expression_Level:
51 print(i)
52 print(medianValues.loc[column, i])
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
887 # AttributeError for IntervalTree get_value
888 return self.obj._get_value(*key, takeable=self._takeable)
--> 889 return self._getitem_tuple(key)
890 else:
891 # we by definition only have the 0th axis
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
1058 def _getitem_tuple(self, tup: Tuple):
1059 with suppress(IndexingError):
-> 1060 return self._getitem_lowerdim(tup)
1061
1062 # no multi-index, so validate all of the indexers
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
805 # We don't need to check for tuples here because those are
806 # caught by the _is_nested_tuple_indexer check above.
--> 807 section = self._getitem_axis(key, axis=i)
808
809 # We should never have a scalar section here, because
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1122 # fall thru to straight lookup
1123 self._validate_key(key, axis)
-> 1124 return self._get_label(key, axis=axis)
1125
1126 def _get_slice_axis(self, slice_obj: slice, axis: int):
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
1071 def _get_label(self, label, axis: int):
1072 # GH#5667 this will fail if the label is not present in the axis.
-> 1073 return self.obj.xs(label, axis=axis)
1074
1075 def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):
/opt/conda/lib/python3.8/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
3737 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
3738 else:
-> 3739 loc = index.get_loc(key)
3740
3741 if isinstance(loc, np.ndarray):
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: '0'
Hello, when I try to install the package using the .yml file from the git clone directory, I get the following error:
Looking for: ['python=3.8', 'pip', 'scanpy']
Transaction
Prefix:
Updating specs:
Package Version Build Channel Size
───────────────────────────────────────────────────────────────────────────────────────
Install:
───────────────────────────────────────────────────────────────────────────────────────
Summary:
Install: 119 packages
Total download: 0 B
───────────────────────────────────────────────────────────────────────────────────────
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: | Ran pip subprocess with arguments:
Requirement already satisfied: numpy in .requirements.txt (line 1)) (1.24.4)
Requirement already satisfied: pandas in vb.requirements.txt (line 2)) (2.0.3)
Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 1.26.0rc1 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9; 1.26.2 Requires-Python >=3.9; 1.26.3 Requires-Python >=3.9; 1.26.4 Requires-Python >=3.9; 2.0.0b1 Requires-Python >=3.9; 2.0.0rc1 Requires-Python >=3.9; 2.1.0 Requires-Python >=3.9; 2.1.0rc0 Requires-Python >=3.9; 2.1.1 Requires-Python >=3.9; 2.1.2 Requires-Python >=3.9; 2.1.3 Requires-Python >=3.9; 2.1.4 Requires-Python >=3.9; 2.2.0 Requires-Python >=3.9; 2.2.0rc0 Requires-Python >=3.9; 2.2.1 Requires-Python >=3.9; 2.2.2 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement itertools (from versions: none)
ERROR: No matching distribution found for itertools
failed
Would be happy for any help.
Thanks
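One likely culprit (not confirmed against the repo's requirements file): itertools is part of the Python standard library, so pip can never find a distribution for it, which is exactly what "No matching distribution found for itertools" means. If itertools appears in requirements.txt, removing that line should let the pip step finish. A quick check that no install is needed:

```python
# itertools ships with CPython itself; there is nothing to pip-install.
import itertools

print(list(itertools.combinations("abc", 2)))
```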
I successfully installed NSForest and am running it as outlined in the readme:
adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = 'Leiden_annotation')
I'm getting the following pandas error when running it, though (copied below). Is there a solution to this?
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
/tmp/ipykernel_151703/3993040988.py in <module>
1 # Run NS_Forest
----> 2 adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = 'Leiden_annotation')
3
4 # Get list of minimal markers
5 Markers = list(itertools.chain.from_iterable(adata_markers['NSForest_Markers']))
/mnt/ibm_lg/spatial-seq/gene_panel_tools/NSForest/NSForest_v3.py in NS_Forest(adata, clusterLabelcolumnHeader, rfTrees, Median_Expression_Level, Genes_to_testing, betaValue)
216 FullpermutationList = permutor(queryInequalities)
217 print(len(FullpermutationList))
--> 218 f1_store = fbetaTest(FullpermutationList, column, adata, Binary_RankedList, testArray, betaValue)
219 f1_store_1D.update(f1_store)
220
/mnt/ibm_lg/spatial-seq/gene_panel_tools/NSForest/NSForest_v3.py in fbetaTest(x, column, adata, Binary_RankedList, testArray, betaValue)
125 testArray['y_pred'] = 0
126 betaQuery = '&'.join(list)
--> 127 Ineq1 = Subset_dataframe.query(betaQuery)
128 testList = Ineq1.index.tolist()
129 testArray.loc[testList, 'y_pred'] = 1
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
4053 kwargs["level"] = kwargs.pop("level", 0) + 1
4054 kwargs["target"] = None
-> 4055 res = self.eval(expr, **kwargs)
4056
4057 try:
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
4184 kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
4185
-> 4186 return _eval(expr, inplace=inplace, **kwargs)
4187
4188 def select_dtypes(self, include=None, exclude=None) -> DataFrame:
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
346 )
347
--> 348 parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
349
350 # construct the engine and evaluate the parsed expression
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in __init__(self, expr, engine, parser, env, level)
804 self.parser = parser
805 self._visitor = PARSERS[parser](self.env, self.engine, self.parser)
--> 806 self.terms = self.parse()
807
808 @property
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in parse(self)
823 Parse an expression.
824 """
--> 825 return self._visitor.visit(self.expr)
826
827 @property
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in visit(self, node, **kwargs)
409 method = "visit_" + type(node).__name__
410 visitor = getattr(self, method)
--> 411 return visitor(node, **kwargs)
412
413 def visit_Module(self, node, **kwargs):
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in visit_Module(self, node, **kwargs)
415 raise SyntaxError("only a single expression is allowed")
416 expr = node.body[0]
--> 417 return self.visit(expr, **kwargs)
418
419 def visit_Expr(self, node, **kwargs):
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in visit(self, node, **kwargs)
409 method = "visit_" + type(node).__name__
410 visitor = getattr(self, method)
--> 411 return visitor(node, **kwargs)
412
413 def visit_Module(self, node, **kwargs):
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in f(self, *args, **kwargs)
261
262 def f(self, *args, **kwargs):
--> 263 raise NotImplementedError(f"'{node_name}' nodes are not implemented")
264
265 return f
NotImplementedError: 'AnnAssign' nodes are not implemented
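Speculatively, pandas raises this when the query string happens to parse as an annotated assignment: its expression visitor has no handler for ast AnnAssign nodes, and a colon in the expression is enough to produce one. Since the betaQuery string is assembled from gene names, a gene name containing a colon would reproduce the error. Toy sketch (the column name and query are invented):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

try:
    # "a: 1" parses as an annotated assignment (an AnnAssign node),
    # which pandas' query/eval machinery does not implement.
    df.query("a: 1")
except NotImplementedError as e:
    print(e)
```

If that is the cause, sanitizing the gene names (e.g. stripping or replacing ":" the same way the code already strips "-" and ".") should avoid it.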
I transferred a Seurat object, so my anndata looks like this:
MU150CDXT_scaled
AnnData object with n_obs × n_vars = 3841 × 2000
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'percent.rb', 'RNA_snn_res.0.5', 'louvain'
var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
I assume NSForest needs just highly variable genes and scaled expression values. I also transferred cells' and features' metadata from Seurat object. I renamed seurat clusters to "louvain" manually.
When I run
MU150CDXT_markers = NS_Forest(MU150CDXT_scaled)
I get this output with an error:
8
0
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 98, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 93, in pandas._libs.index.Int64Engine._check_type
KeyError: '0'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 1, in
File "/Users/yuliai/PycharmProjects/NSforest/NSForest_v3.py", line 202, in NS_Forest
Positive_RankedList_Complete = negativeOut(RankedList, column, medianValues, Median_Expression_Level)
File "/Users/yuliai/PycharmProjects/NSforest/NSForest_v3.py", line 48, in negativeOut
if medianValues.loc[column, i] > Median_Expression_Level:
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 873, in getitem
return self._getitem_tuple(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1044, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 786, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1110, in _getitem_axis
return self._get_label(key, axis=axis)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1059, in _get_label
return self.obj.xs(label, axis=axis)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py", line 3491, in xs
loc = self.index.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: '0'
I guess something is wrong with my 'column' variable. I do not really understand where it comes from.
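One hypothesis for the KeyError: the column variable holds the cluster label as a string ('0'), while medianValues was built with an integer index, so the .loc lookup misses. A toy reproduction with made-up values:

```python
import pandas as pd

# medianValues built from integer cluster labels
medianValues = pd.DataFrame({"GeneA": [3.2]}, index=[0])

try:
    medianValues.loc["0", "GeneA"]  # string label vs. integer index
except KeyError as e:
    print("KeyError:", e)

# Casting the index to str (or making the cluster column categorical
# strings before medianValues is built) makes the lookup line up:
medianValues.index = medianValues.index.astype(str)
print(medianValues.loc["0", "GeneA"])  # 3.2
```

This matches the advice in the other issues to rename clusters to strings like "cluster0" before running NS_Forest.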
Based on my understanding of your algorithm, and looking at the "results" and "topResults" csv output files (which contain lines with the same f-measure value for different orderings of a given set of features), I think that at this line in the permutor function:
els = [list(x) for x in itertools.permutations(binarylist2, i)]
you could use the itertools.combinations function and still explore all the sets of features required. This would provide a significant speedup, as there are far fewer combinations than permutations.
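The suggested change is easy to sanity-check. Assuming the default of six genes tested per cluster (the sizes here are illustrative), unordered combinations up to size 6 give 63 candidate sets versus 1956 ordered permutations, and since the queries are conjunctions their result does not depend on gene order:

```python
import itertools

genes = ["G1", "G2", "G3", "G4", "G5", "G6"]

# Count all non-empty ordered tuples vs. unordered subsets
n_perm = sum(1 for i in range(1, len(genes) + 1)
             for _ in itertools.permutations(genes, i))
n_comb = sum(1 for i in range(1, len(genes) + 1)
             for _ in itertools.combinations(genes, i))

print(n_perm, n_comb)  # 1956 63
```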