jcventerinstitute / nsforest
A machine learning method for the discovery of the minimum marker gene combinations for cell type identification from single-cell RNA sequencing
License: MIT License
Hi, I ran NSForest v3.0 with NS_Forest(adata, clusterLabelcolumnHeader = "leiden") in a Jupyter Notebook (on Windows 10), and the program ran for a while. However, after a few minutes I got this message:
File :1
6330403 K07Rik >=1.4837858080863953
^
SyntaxError: invalid syntax
Would you please help me solve this issue?
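A likely cause, assuming the query string is built directly from gene names, is that 6330403K07Rik begins with a digit, which is not a valid Python identifier, so pandas' query parser rejects it. A minimal sketch of the failure with a made-up one-gene table:

```python
import pandas as pd

# Hypothetical one-gene expression table; the gene name starts with a digit.
df = pd.DataFrame({"6330403K07Rik": [0.5, 2.0]})

try:
    # DataFrame.query parses the expression as Python code, so an
    # identifier beginning with a digit is a syntax error.
    df.query("6330403K07Rik >= 1.48")
except SyntaxError:
    print("SyntaxError: gene names starting with a digit break query()")
```

If that is the cause here, renaming the offending genes before running NS_Forest (or, in newer pandas, backtick-quoting such names inside query expressions) should avoid it.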
Hi there,
Thank you so much for creating such an awesome tool! I am quite new to coding, especially in Python, and encountered several errors when running the NSForest function. I am not sure if these problems were specific to my object, but I thought I would post them here in case someone else has the same issues. I have tried to fix some of the errors and have gotten the function to finish, but I'm not 100% sure the output is correct.
#---------------------------------------------------------------------------------------------------------------------------------------
The first error was in line 167 of the source code:
"AttributeError: Can only use .cat accessor with a 'category' dtype"
I changed this:
medianValues = pd.DataFrame(columns=adata.var_names, index=adata.obs[clusterLabelcolumnHeader].cat.categories)
to this:
medianValues = pd.DataFrame(columns=adata.var_names, index=adata.obs[clusterLabelcolumnHeader].unique())
which seemed to work
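For what it's worth, an alternative to editing the source is to convert the cluster column to a categorical dtype before calling NS_Forest, which keeps the original .cat.categories line working. A minimal sketch with a toy stand-in for adata.obs (the column name "leiden" is just an example):

```python
import pandas as pd

# Toy stand-in for adata.obs with a non-categorical cluster column
obs = pd.DataFrame({"leiden": ["0", "1", "0", "2"]})

# Cast to category so the .cat accessor works
obs["leiden"] = obs["leiden"].astype("category")
print(obs["leiden"].cat.categories.tolist())  # ['0', '1', '2']
```

With a real AnnData object this would be adata.obs["leiden"] = adata.obs["leiden"].astype("category") before the call.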
#----------------------------------------------------------------------------------------------------------------------------------------------
The second error was in line 172:
"ValueError: Shape of passed values is (49211, 1), indices imply (49211, 31002)" which I think is because the input to create the pandas data frame was in the wrong format.
When I changed this:
Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs, columns = subset_adata.var_names)
to this:
Subset_dataframe = pd.DataFrame(data = subset_adata.X.toarray(), index = list(subset_adata.obs["cells"].tolist()), columns = subset_adata.var_names)
it seemed to work.
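The shape error is consistent with subset_adata.X being a SciPy sparse matrix, which the DataFrame constructor does not expand into columns; densifying first makes the shapes line up. A small sketch of the .toarray() fix with made-up cell and gene names:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for subset_adata.X (cells x genes), stored sparse
X = csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0]]))
obs_names = ["cell1", "cell2"]
var_names = ["GeneA", "GeneB"]

# Densify before building the DataFrame so it gets one column per gene
df = pd.DataFrame(data=X.toarray(), index=obs_names, columns=var_names)
print(df.shape)  # (2, 2)
```

Note that .toarray() materializes the full dense matrix, so for very large subsets this costs memory proportional to cells × genes.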
#------------------------------------------------------------------------------------------------------------------------------------------------
A similar problem in line 121 of the source code:
When running:
def fbetaTest(x, column, adata, Binary_RankedList, testArray, betaValue = 0.5):
I get this error "ValueError: Shape of passed values is (113957, 1), indices imply (113957, 0)"
But I changed this:
Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs_names, columns = subset_adata.var_names)
to this:
Subset_dataframe = pd.DataFrame(data=subset_adata.X.toarray(), index=subset_adata.obs_names.tolist(), columns=subset_adata.var_names)
and it seemed to work.
#-------------------------------------------------------------------------------------------------------------------------------------
Another error was in line 94 of the source code:
"IndexError: Index dimension must be <= 2"
X = x_train[:, None]
I don't think this code is actually necessary so I commented it out which seemed to fix the problem.
#----------------------------------------------------------------------------------------------------------------------------------------------
Then I also got several errors in the last section. I couldn't quite figure out what the problems were, but I changed the code from this:
#Move binary genes to Results dataframe
clusters2Genes = pd.DataFrame(columns = ['Gene', 'clusterName'])
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
clusters2Genes["Gene"] = Binary_score_store_DF.index
GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())
BinaryFinal = pd.DataFrame(columns = ['clusterName','Binary_Genes'])
BinaryFinal['clusterName'] = GroupedBinarylist.index
BinaryFinal['Binary_Genes'] = GroupedBinarylist.values
to this:
Binary_score_store_DF = pd.read_csv('NS-Forest_v3_Extended_Binary_Markers_Supplmental.csv')
# Move binary genes to Results dataframe
clusters2Genes = pd.DataFrame(columns=['Gene', 'clusterName'])
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
clusters2Genes["Gene"] = Binary_score_store_DF["Unnamed: 0"]
clusters2Genes.to_csv('clusters2Genes.csv')
#GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())
#GroupedBinarylist = clusters2Genes.apply(lambda x: x['Gene'].unique()) #This seemed to work earlier
BinaryFinal = pd.DataFrame(columns=['clusterName', 'Binary_Genes'])
BinaryFinal['clusterName'] = clusters2Genes["clusterName"]
BinaryFinal['Binary_Genes'] = clusters2Genes["Gene"]
BinaryFinal.to_csv('BinaryFinal.csv')
It seems that in this line of code:
clusters2Genes["clusterName"] = Binary_score_store_DF["clusterName"]
the column name in the Binary_score_store dataframe was incorrect
And in this line of code:
GroupedBinarylist = clusters2Genes.groupby('clusterName').apply(lambda x: x['Gene'].unique())
the clusters are already grouped together and all the gene names are already unique??
I couldn't get around the last problem, so I ended up just commenting out those lines of code, and now I am not sure if my output files are what they should be. Any advice would be greatly appreciated!
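For reference, the grouping step can be reproduced on a toy table: it collapses each cluster's genes into one array, so skipping it changes the output format from one row per cluster to one row per gene. A sketch with invented cluster and gene names (using the equivalent SeriesGroupBy form of the same operation):

```python
import pandas as pd

clusters2Genes = pd.DataFrame({
    "clusterName": ["c1", "c1", "c2"],
    "Gene": ["GeneA", "GeneB", "GeneA"],
})

# Equivalent to groupby('clusterName').apply(lambda x: x['Gene'].unique())
GroupedBinarylist = clusters2Genes.groupby("clusterName")["Gene"].unique()
print(GroupedBinarylist.index.tolist())      # ['c1', 'c2']
print([list(v) for v in GroupedBinarylist])  # [['GeneA', 'GeneB'], ['GeneA']]
```

So even if every gene name is globally unique, the step still matters: it produces one array of genes per cluster, which is what the BinaryFinal dataframe expects.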
I have been running into an error at the line testArray['y_pred'] = 0. Is this supposed to convert all the values to 0s?
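Assuming testArray is a pandas DataFrame indexed by cell, yes: assigning a scalar broadcasts it, creating (or resetting) a y_pred column of all zeros, and the cells passing the marker query are then flipped to 1 via .loc afterwards. A toy sketch with made-up cell names:

```python
import pandas as pd

# Toy stand-in for testArray, indexed by cell name
testArray = pd.DataFrame({"y_true": [1, 0, 1]}, index=["c1", "c2", "c3"])

testArray["y_pred"] = 0                     # scalar broadcasts: every row gets 0
testArray.loc[["c1", "c3"], "y_pred"] = 1   # cells matching the query become 1
print(testArray["y_pred"].tolist())         # [1, 0, 1]
```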
Just installed via pip successfully:
(base) jespinozlt2-osx:~ jespinoz$ conda activate soothsayer_py3.9_env
(soothsayer_py3.9_env) jespinozlt2-osx:~ jespinoz$ pip install nsforest
Collecting nsforest
Downloading nsforest-3.9.2.5-py3-none-any.whl (7.0 kB)
Requirement already satisfied: scanpy>=1.9.3 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from nsforest) (1.9.3)
Requirement already satisfied: scikit-learn>=0.22 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.0.2)
Requirement already satisfied: anndata>=0.7.4 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.8.0)
Requirement already satisfied: patsy in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.5.2)
Requirement already satisfied: umap-learn>=0.3.10 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.5.2)
Requirement already satisfied: seaborn in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.11.2)
Requirement already satisfied: scipy>=1.4 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.8.0)
Requirement already satisfied: tqdm in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (4.62.3)
Requirement already satisfied: pandas>=1.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.4.0)
Requirement already satisfied: session-info in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.0.0)
Requirement already satisfied: packaging in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (21.3)
Requirement already satisfied: natsort in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (8.1.0)
Requirement already satisfied: networkx>=2.3 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (2.6.3)
Requirement already satisfied: statsmodels>=0.10.0rc2 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.13.1)
Requirement already satisfied: joblib in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.1.0)
Requirement already satisfied: h5py>=3 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (3.7.0)
Requirement already satisfied: numpy>=1.17.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (1.21.5)
Requirement already satisfied: numba>=0.41.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (0.55.1)
Requirement already satisfied: matplotlib>=3.4 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scanpy>=1.9.3->nsforest) (3.5.1)
Requirement already satisfied: fonttools>=4.22.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (4.29.1)
Requirement already satisfied: python-dateutil>=2.7 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (9.0.1)
Requirement already satisfied: pyparsing>=2.2.1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (3.0.7)
Requirement already satisfied: cycler>=0.10 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from matplotlib>=3.4->scanpy>=1.9.3->nsforest) (1.3.2)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from numba>=0.41.0->scanpy>=1.9.3->nsforest) (0.38.0)
Requirement already satisfied: setuptools in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from numba>=0.41.0->scanpy>=1.9.3->nsforest) (60.7.1)
Requirement already satisfied: pytz>=2020.1 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from pandas>=1.0->scanpy>=1.9.3->nsforest) (2021.3)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from scikit-learn>=0.22->scanpy>=1.9.3->nsforest) (3.1.0)
Requirement already satisfied: six in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from patsy->scanpy>=1.9.3->nsforest) (1.16.0)
Requirement already satisfied: pynndescent>=0.5 in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from umap-learn>=0.3.10->scanpy>=1.9.3->nsforest) (0.5.6)
Requirement already satisfied: stdlib-list in ./anaconda3/envs/soothsayer_py3.9_env/lib/python3.9/site-packages (from session-info->scanpy>=1.9.3->nsforest) (0.8.0)
Installing collected packages: nsforest
Successfully installed nsforest-3.9.2.5
Following this tutorial:
https://jcventerinstitute.github.io/celligrate/tutorials/NS-Forest_tutorial.html
(soothsayer_py3.9_env) jespinozlt2-osx:~ jespinoz$ python
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:55:37)
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import NSForest_v3dot9_1 as nsf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'NSForest_v3dot9_1'
>>> import nsforest as nsf
>>> nsf.__version__
'3.9.2'
Looks like the import command needs to be changed to from nsforest import *
Hello - I've been attempting to use NSForest_v3 to identify marker genes and am running into an index issue. I followed the solutions in the other two posted issues to make the cluster column categorical and to rename my clusters "cluster#". It looks like it starts running and then throws an error after removing median-expressed genes. I pasted the error I'm getting below.
I'm pretty new to python so I apologize if there is a stupid solution.
Thanks!
IndexError Traceback (most recent call last)
/var/folders/xl/ly4bqc356sd34lm07_ggzdw1pfgrjf/T/ipykernel_22678/4072770710.py in
1 # run NSForest on BM_combined
----> 2 adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = 'integrated_snn_res.0.5')
/var/folders/xl/ly4bqc356sd34lm07_ggzdw1pfgrjf/T/ipykernel_22678/3411186380.py in NS_Forest(adata, clusterLabelcolumnHeader, rfTrees, Median_Expression_Level, Genes_to_testing, betaValue)
217 FullpermutationList = permutor(queryInequalities)
218 print(len(FullpermutationList))
--> 219 f1_store = fbetaTest(FullpermutationList, column, adata, Binary_RankedList, testArray, betaValue)
220 f1_store_1D.update(f1_store)
221
/var/folders/xl/ly4bqc356sd34lm07_ggzdw1pfgrjf/T/ipykernel_22678/3411186380.py in fbetaTest(x, column, adata, Binary_RankedList, testArray, betaValue)
120 fbeta_dict = {}
121 subset_adata = adata[:,Binary_RankedList]
--> 122 Subset_dataframe = pd.DataFrame(data = subset_adata.X, index = subset_adata.obs_names, columns = subset_adata.var_names)
123 Subset_dataframe.columns = Subset_dataframe.columns.str.replace("-", "").str.replace(".", "")
124
~/miniconda3/lib/python3.9/site-packages/anndata/_core/anndata.py in X(self)
623 elif self.is_view:
624 X = as_view(
--> 625 _subset(self._adata_ref.X, (self._oidx, self._vidx)),
626 ElementRef(self, "X"),
627 )
~/miniconda3/lib/python3.9/functools.py in wrapper(*args, **kw)
875 '1 positional argument')
876
--> 877 return dispatch(args[0].__class__)(*args, **kw)
878
879 funcname = getattr(func, '__name__', 'singledispatch function')
~/miniconda3/lib/python3.9/site-packages/anndata/_core/index.py in subset(a, subset_idx)
125 if all(isinstance(x, cabc.Iterable) for x in subset_idx):
126 subset_idx = np.ix_(*subset_idx)
--> 127 return a[subset_idx]
128
129
IndexError: arrays used as indices must be of integer (or boolean) type
Hello,
I tried your code to delineate the minimum marker genes that characterize subsets of Treg cells. I generally work with the Seurat package to analyse my single-cell projects. I would like to generate a UMAP plot using the genes identified with NS-Forest (2.0 and 1.3). To do that, I subset the initial Seurat object, specifying the identified genes as features, and then ran FindNeighbors, FindClusters, and RunUMAP. I get essentially the same plot with the 50 identified genes as with the original 2000 genes.
Have you ever tried to combine Seurat and your tool, rather than Scanpy?
Thank you in advance
Hi,
I got an error when I ran the code in cell 3 of the script 'NS_Forest_v2.ipynb' in Jupyter Notebook. The code in cells 1 and 2 ran smoothly. The following messages were printed on the screen.
RTN1
5.4558917988718
GPM6A
6.5202792613131395
HSPB1
6.68820560318182
PHGDH
4.81677071212406
ANXA5
4.08874724866395
CLU
4.61930439015417
...
D:\Anaconda3\envs\py27\lib\site-packages\ipykernel_launcher.py:98: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
RTN1
8.321140985238856
GPM6A
8.261602044663064
HSPB1
3.103076840680365
PHGDH
2.4805289343488903
ANXA5
2.4805289343488903
...
TypeError Traceback (most recent call last)
in ()
74 max_grouped.df.to_csv('NSForest_v2_maxF-scores.csv')
75
---> 76 NSForest_Results_Table_Fin["f-measureRank"] = NSForest_Results_Table_Fin.groupby(by="clusterName")["f-measure"].rank(ascending=False)
77 topResults = NSForest_Results_Table_Fin["f-measureRank"] < 50
78 NSForest_Results_Table_top = NSForest_Results_Table_Fin[topResults]
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in rank(self, method, ascending, na_option, pct, axis)
1904 return self._cython_transform('rank', numeric_only=False,
1905 ties_method=method, ascending=ascending,
-> 1906 na_option=na_option, pct=pct, axis=axis)
1907
1908 @substitution(name='groupby')
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in _cython_transform(self, how, numeric_only, **kwargs)
1023 try:
1024 result, names = self.grouper.transform(obj.values, how,
-> 1025 **kwargs)
1026 except NotImplementedError:
1027 continue
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in transform(self, values, how, axis, **kwargs)
2628
2629 def transform(self, values, how, axis=0, **kwargs):
-> 2630 return self._cython_operation('transform', values, how, axis, **kwargs)
2631
2632 def _aggregate(self, result, counts, values, comp_ids, agg_func,
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in _cython_operation(self, kind, values, how, axis, min_count, **kwargs)
2588 result = self._transform(
2589 result, values, labels, func, is_numeric, is_datetimelike,
-> 2590 **kwargs)
2591
2592 if is_integer_dtype(result) and not is_datetimelike:
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in _transform(self, result, values, comp_ids, transform_func, is_numeric, is_datetimelike, **kwargs)
2662 comp_ids, is_datetimelike, **kwargs)
2663 else:
-> 2664 transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
2665
2666 return result
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in wrapper(*args, **kwargs)
2477
2478 def wrapper(*args, **kwargs):
-> 2479 return f(afunc, *args, **kwargs)
2480
2481 # need to curry our sub-function
D:\Anaconda3\envs\py27\lib\site-packages\pandas\core\groupby\groupby.pyc in (func, a, b, c, d, **kwargs)
2429 kwargs.get('ascending', True),
2430 kwargs.get('pct', False),
-> 2431 kwargs.get('na_option', 'keep')
2432 )
2433 }
TypeError: 'NoneType' object is not callable
What I did:
I followed the Prerequisites in the README.md to install the modules and create the input file. A glance of the input file was shown below.
In [3]: dataFull.head()
Out[3]:
AAK1 AARS ABCD3 ABHD12 ... YWHAH YWHAQ YWHAZ Clusters
AMP358_sc11 5.318264 5.854272 2.480529 6.184276 ... 5.204884 7.599972 6.350880 Cl_1
AMP358_sc18 2.480529 2.480529 2.480529 4.379044 ... 3.193956 7.510237 5.710166 Cl_1
AMP358_sc7 6.228200 2.996529 3.206476 3.890947 ... 2.996529 8.167026 6.861036 Cl_1
AMP358_sc1 3.233557 3.320131 3.534038 4.060293 ... 2.860235 6.999525 6.781777 Cl_1
AMP358_sc11 3.219218 2.480529 2.480529 3.005682 ... 2.480529 7.919291 8.287178 Cl_1
[5 rows x 1255 columns]
Software information:
OS: Windows 10
Python version: 2.7.15
In addition, I tried running the script in CentOS 7 system and got the same error.
Would anyone help solve the issue? Any suggestion is welcome. Thanks!
Dear author,
I tested the latest version and got this error. Is it because the number of genes exceeds a limit?
adata = sc.read_h5ad("***.h5ad")
adata.obs['anno2'] = adata.obs['anno2'].astype('category')
adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = "anno2")
I converted a Seurat object to AnnData using SeuratDisk. It seemed to work.
These are the features of the resulting object:
AnnData object with n_obs × n_vars = 17912 × 3000
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'status', 'shared_assignment', 'assignment', 'axis', 'log10GenesPerUMI', 'mitoRatio', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.4', 'SCT_snn_res.0.6', 'SCT_snn_res.0.8', 'SCT_snn_res.1', 'SCT_snn_res.1.4', 'SCT_snn_res.2', 'seurat_clusters', 'S.Score', 'G2M.Score', 'Phase'
var: 'features'
uns: 'neighbors'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances'
The cluster assignment I want to use is 'SCT_snn_res.0.8'. I changed this to dtype category and then changed the function call to NS_Forest to reflect that I want to use this column.
When I run adata_markers = NS_Forest(adata)
It starts up, but I get this error:
22
0
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: '0'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-7-6cd156db933d> in <module>
----> 1 adata_markers = NS_Forest(adata)
<ipython-input-4-5528acf1c438> in NS_Forest(adata, clusterLabelcolumnHeader, rfTrees, Median_Expression_Level, Genes_to_testing, betaValue)
202
203 #Rerank according to expression level and binary score
--> 204 Positive_RankedList_Complete = negativeOut(RankedList, column, medianValues, Median_Expression_Level)
205 print(Positive_RankedList_Complete)
206
<ipython-input-4-5528acf1c438> in negativeOut(x, column, medianValues, Median_Expression_Level)
48 Positive_RankedList_Complete = []
49 for i in x:
---> 50 if medianValues.loc[column, i] > Median_Expression_Level:
51 print(i)
52 print(medianValues.loc[column, i])
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
887 # AttributeError for IntervalTree get_value
888 return self.obj._get_value(*key, takeable=self._takeable)
--> 889 return self._getitem_tuple(key)
890 else:
891 # we by definition only have the 0th axis
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
1058 def _getitem_tuple(self, tup: Tuple):
1059 with suppress(IndexingError):
-> 1060 return self._getitem_lowerdim(tup)
1061
1062 # no multi-index, so validate all of the indexers
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
805 # We don't need to check for tuples here because those are
806 # caught by the _is_nested_tuple_indexer check above.
--> 807 section = self._getitem_axis(key, axis=i)
808
809 # We should never have a scalar section here, because
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1122 # fall thru to straight lookup
1123 self._validate_key(key, axis)
-> 1124 return self._get_label(key, axis=axis)
1125
1126 def _get_slice_axis(self, slice_obj: slice, axis: int):
/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
1071 def _get_label(self, label, axis: int):
1072 # GH#5667 this will fail if the label is not present in the axis.
-> 1073 return self.obj.xs(label, axis=axis)
1074
1075 def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):
/opt/conda/lib/python3.8/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
3737 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
3738 else:
-> 3739 loc = index.get_loc(key)
3740
3741 if isinstance(loc, np.ndarray):
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: '0'
Hello, when I try to install the package using the .yml file from the git clone directory, I get the following error:
Looking for: ['python=3.8', 'pip', 'scanpy']
Transaction
Prefix:
Updating specs:
Package Version Build Channel Size
───────────────────────────────────────────────────────────────────────────────────────
Install:
───────────────────────────────────────────────────────────────────────────────────────
Summary:
Install: 119 packages
Total download: 0 B
───────────────────────────────────────────────────────────────────────────────────────
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: | Ran pip subprocess with arguments:
Requirement already satisfied: numpy in .requirements.txt (line 1)) (1.24.4)
Requirement already satisfied: pandas in vb.requirements.txt (line 2)) (2.0.3)
Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 1.26.0rc1 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9; 1.26.2 Requires-Python >=3.9; 1.26.3 Requires-Python >=3.9; 1.26.4 Requires-Python >=3.9; 2.0.0b1 Requires-Python >=3.9; 2.0.0rc1 Requires-Python >=3.9; 2.1.0 Requires-Python >=3.9; 2.1.0rc0 Requires-Python >=3.9; 2.1.1 Requires-Python >=3.9; 2.1.2 Requires-Python >=3.9; 2.1.3 Requires-Python >=3.9; 2.1.4 Requires-Python >=3.9; 2.2.0 Requires-Python >=3.9; 2.2.0rc0 Requires-Python >=3.9; 2.2.1 Requires-Python >=3.9; 2.2.2 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement itertools (from versions: none)
ERROR: No matching distribution found for itertools
failed
Would be happy for any help.
Thanks
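One likely culprit (not confirmed against the repo's requirements file): itertools is part of the Python standard library, so pip can never find a distribution for it, which is exactly what "No matching distribution found for itertools" means. If itertools appears in requirements.txt, removing that line should let the pip step finish. A quick check that no install is needed:

```python
# itertools ships with CPython itself; there is nothing to pip-install.
import itertools

print(list(itertools.combinations("abc", 2)))
```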
I successfully installed NSForest and am running it as outlined in the readme:
adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = 'Leiden_annotation')
I'm getting the following pandas error when running it, though (copied below). Is there a solution to this?
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
/tmp/ipykernel_151703/3993040988.py in <module>
1 # Run NS_Forest
----> 2 adata_markers = NS_Forest(adata, clusterLabelcolumnHeader = 'Leiden_annotation')
3
4 # Get list of minimal markers
5 Markers = list(itertools.chain.from_iterable(adata_markers['NSForest_Markers']))
/mnt/ibm_lg/spatial-seq/gene_panel_tools/NSForest/NSForest_v3.py in NS_Forest(adata, clusterLabelcolumnHeader, rfTrees, Median_Expression_Level, Genes_to_testing, betaValue)
216 FullpermutationList = permutor(queryInequalities)
217 print(len(FullpermutationList))
--> 218 f1_store = fbetaTest(FullpermutationList, column, adata, Binary_RankedList, testArray, betaValue)
219 f1_store_1D.update(f1_store)
220
/mnt/ibm_lg/spatial-seq/gene_panel_tools/NSForest/NSForest_v3.py in fbetaTest(x, column, adata, Binary_RankedList, testArray, betaValue)
125 testArray['y_pred'] = 0
126 betaQuery = '&'.join(list)
--> 127 Ineq1 = Subset_dataframe.query(betaQuery)
128 testList = Ineq1.index.tolist()
129 testArray.loc[testList, 'y_pred'] = 1
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
4053 kwargs["level"] = kwargs.pop("level", 0) + 1
4054 kwargs["target"] = None
-> 4055 res = self.eval(expr, **kwargs)
4056
4057 try:
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
4184 kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
4185
-> 4186 return _eval(expr, inplace=inplace, **kwargs)
4187
4188 def select_dtypes(self, include=None, exclude=None) -> DataFrame:
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
346 )
347
--> 348 parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
349
350 # construct the engine and evaluate the parsed expression
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in __init__(self, expr, engine, parser, env, level)
804 self.parser = parser
805 self._visitor = PARSERS[parser](self.env, self.engine, self.parser)
--> 806 self.terms = self.parse()
807
808 @property
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in parse(self)
823 Parse an expression.
824 """
--> 825 return self._visitor.visit(self.expr)
826
827 @property
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in visit(self, node, **kwargs)
409 method = "visit_" + type(node).__name__
410 visitor = getattr(self, method)
--> 411 return visitor(node, **kwargs)
412
413 def visit_Module(self, node, **kwargs):
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in visit_Module(self, node, **kwargs)
415 raise SyntaxError("only a single expression is allowed")
416 expr = node.body[0]
--> 417 return self.visit(expr, **kwargs)
418
419 def visit_Expr(self, node, **kwargs):
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in visit(self, node, **kwargs)
409 method = "visit_" + type(node).__name__
410 visitor = getattr(self, method)
--> 411 return visitor(node, **kwargs)
412
413 def visit_Module(self, node, **kwargs):
~/miniconda3/envs/jupyter/lib/python3.9/site-packages/pandas/core/computation/expr.py in f(self, *args, **kwargs)
261
262 def f(self, *args, **kwargs):
--> 263 raise NotImplementedError(f"'{node_name}' nodes are not implemented")
264
265 return f
NotImplementedError: 'AnnAssign' nodes are not implemented
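Speculatively, pandas raises this when the query string happens to parse as an annotated assignment: its expression visitor has no handler for ast AnnAssign nodes, and a colon in the expression is enough to produce one. Since the betaQuery string is assembled from gene names, a gene name containing a colon would reproduce the error. Toy sketch (the column name and query are invented):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

try:
    # "a: 1" parses as an annotated assignment (an AnnAssign node),
    # which pandas' query/eval machinery does not implement.
    df.query("a: 1")
except NotImplementedError as e:
    print(e)
```

If that is the cause, sanitizing the gene names (e.g. stripping or replacing ":" the same way the code already strips "-" and ".") should avoid it.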
I transferred a Seurat object, so my anndata looks like this:
MU150CDXT_scaled
AnnData object with n_obs × n_vars = 3841 × 2000
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'percent.rb', 'RNA_snn_res.0.5', 'louvain'
var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
I assume NSForest needs just highly variable genes and scaled expression values. I also transferred cells' and features' metadata from Seurat object. I renamed seurat clusters to "louvain" manually.
When I run
MU150CDXT_markers = NS_Forest(MU150CDXT_scaled)
I get this output with an error:
8
0
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 98, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 93, in pandas._libs.index.Int64Engine._check_type
KeyError: '0'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 1, in
File "/Users/yuliai/PycharmProjects/NSforest/NSForest_v3.py", line 202, in NS_Forest
Positive_RankedList_Complete = negativeOut(RankedList, column, medianValues, Median_Expression_Level)
File "/Users/yuliai/PycharmProjects/NSforest/NSForest_v3.py", line 48, in negativeOut
if medianValues.loc[column, i] > Median_Expression_Level:
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 873, in getitem
return self._getitem_tuple(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1044, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 786, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1110, in _getitem_axis
return self._get_label(key, axis=axis)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py", line 1059, in _get_label
return self.obj.xs(label, axis=axis)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py", line 3491, in xs
loc = self.index.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: '0'
I guess something is wrong with my 'column' variable. I do not really understand where it comes from.
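One hypothesis for the KeyError: the column variable holds the cluster label as a string ('0'), while medianValues was built with an integer index, so the .loc lookup misses. A toy reproduction with made-up values:

```python
import pandas as pd

# medianValues built from integer cluster labels
medianValues = pd.DataFrame({"GeneA": [3.2]}, index=[0])

try:
    medianValues.loc["0", "GeneA"]  # string label vs. integer index
except KeyError as e:
    print("KeyError:", e)

# Casting the index to str (or making the cluster column categorical
# strings before medianValues is built) makes the lookup line up:
medianValues.index = medianValues.index.astype(str)
print(medianValues.loc["0", "GeneA"])  # 3.2
```

This matches the advice in the other issues to rename clusters to strings like "cluster0" before running NS_Forest.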
Based on my understanding of your algorithm, and looking at the "results" and "topResults" csv output files (which contain lines with the same f-measure value for different orderings of a given set of features), I think that at this line in the permutor function:
els = [list(x) for x in itertools.permutations(binarylist2, i)]
you could use the itertools.combinations function and still explore all the sets of features required. This would provide a significant speedup, as there are far fewer combinations than permutations.
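The suggested change is easy to sanity-check. Assuming the default of six genes tested per cluster (the sizes here are illustrative), unordered combinations up to size 6 give 63 candidate sets versus 1956 ordered permutations, and since the queries are conjunctions their result does not depend on gene order:

```python
import itertools

genes = ["G1", "G2", "G3", "G4", "G5", "G6"]

# Count all non-empty ordered tuples vs. unordered subsets
n_perm = sum(1 for i in range(1, len(genes) + 1)
             for _ in itertools.permutations(genes, i))
n_comb = sum(1 for i in range(1, len(genes) + 1)
             for _ in itertools.combinations(genes, i))

print(n_perm, n_comb)  # 1956 63
```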