The cell-type-classification-paper from deweylab

Reproducing CellO Evaluation on Test Datasets

Hello,

I am currently writing my bachelor thesis, which aims to avoid repeatedly retraining CellO, the cell classification tool you developed, by using imputation methods. I am having trouble reproducing the results of the paper with the evaluation code you provided for testing CellO on the Zheng_PBMC, lung cancer, and non-droplet datasets (i.e., obtaining the F1-scores and average precision for these datasets).

I downloaded the complete datasets from https://zenodo.org/record/4289064#.YoOS-FTP2Uk and the repository for running the evaluation from https://github.com/deweylab/cell-type-classification-paper.git. However, I could not work out how the code in the repository relates to the dataset files. The Snakefiles require or reference several input files that I expected to find among the dataset files, but they are either missing or named differently. For example, I found bulk_labels.json in the dataset, but the code only refers to labels.json. The code also refers to experiment_to_study.json and untampered_bulk_primary_cells_with_data, neither of which I could locate.

I was also unsure which Python scripts in the repository to run first. I assumed train_model.py should come first, but as mentioned it requires the labels.json and experiment_to_study.json files, which do not exist in the provided dataset files.

I would therefore be grateful if you could help me with the following questions:

1. Which scripts should be run, and in what order, to reproduce the evaluation on the three datasets mentioned above?
2. In one of the Python scripts in the repository, I found a dictionary that maps the cell labels used by the Zheng_PBMC dataset to Cell Ontology labels. However, I could not find similar mapping dictionaries for the lung cancer and non-droplet datasets. Do these exist? If so, I would be very grateful to receive the cell type mappings for the remaining two datasets.
3. I tried to run CellO on Zheng_PBMC.h5 from a Jupyter notebook, but it did not work because CellO expected an AnnData object (h5ad file). When I ran it from the command line, it worked. Do you have any advice on how to use files in formats other than h5ad from a Jupyter notebook?
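
To make clear what kind of label mapping I am asking about, here is a minimal sketch of the structure I have in mind. The dataset labels used as keys are placeholders I made up for illustration; they are not the actual labels of any of the three datasets, and this is not the mapping used in the paper. Only the Cell Ontology term IDs are real. The helper name `map_labels` is likewise my own.

```python
from collections import Counter

# Keys: made-up placeholder dataset labels (assumption, not from the paper).
# Values: Cell Ontology term IDs.
EXAMPLE_LABEL_TO_CL = {
    "b_cells": "CL:0000236",    # B cell
    "monocytes": "CL:0000576",  # monocyte
    "nk_cells": "CL:0000623",   # natural killer cell
}


def map_labels(labels, mapping):
    """Translate dataset labels to ontology IDs.

    Returns the translated list (None where no mapping exists) and a
    Counter of the labels that could not be mapped.
    """
    mapped = []
    unmapped = Counter()
    for label in labels:
        term = mapping.get(label)
        if term is None:
            unmapped[label] += 1
        mapped.append(term)
    return mapped, unmapped


# Example: one label has no entry in the mapping and is reported as unmapped.
cell_labels = ["b_cells", "nk_cells", "unknown_type", "monocytes"]
mapped, unmapped = map_labels(cell_labels, EXAMPLE_LABEL_TO_CL)
```

With the equivalent dictionaries for the lung cancer and non-droplet datasets, I could translate their labels in the same way before computing the metrics.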

I would be very thankful for a quick response.
