deweylab / cell-type-classification-paper
Code implementing experiments described in "Hierarchical cell type classification using mass, heterogeneous RNA-seq data from human primary cells"
Hello,
I am currently working on my bachelor thesis, which aims to avoid repeatedly retraining CellO, the cell type classification tool you developed, by using imputation methods. I am having trouble reproducing the results of the paper with the evaluation code you provided for testing CellO on the Zheng_PBMC, lung cancer, and non-droplet datasets (i.e., obtaining the F1-scores and average precision for these datasets).
I downloaded the complete datasets from https://zenodo.org/record/4289064#.YoOS-FTP2Uk and the evaluation repository from https://github.com/deweylab/cell-type-classification-paper.git. However, I could not work out how the code in the repository relates to the dataset files. The Snakefiles require or reference several input files.
I expected to find these in the dataset files, but they are either missing or named differently. For example, I found bulk_labels.json in the dataset, but the code only refers to labels.json. The code also refers to experiment_to_study.json and untampered_bulk_primary_cells_with_data, neither of which I could locate.
I was also unsure which Python scripts in the repository to run first. I assumed train_model.py should be run first but, as mentioned, it requires the labels.json and experiment_to_study.json files, which do not exist in the provided dataset files.
I would therefore be grateful if you could help me with the following questions:
1. Which scripts should be run, and in what order, to reproduce the evaluation on the three datasets mentioned above?
2. In one of the Python scripts in the repository, I found a dictionary that maps the cell labels used by the Zheng_PBMC dataset to Cell Ontology labels. However, I could not find similar mapping dictionaries for the lung cancer and non-droplet datasets. Do these exist? If so, I would be very grateful to receive the cell type mappings for the remaining two datasets.
3. I tried to run CellO on Zheng_PBMC.h5 from a Jupyter notebook, but it did not work because CellO expected an AnnData object (h5ad file). When I ran it from the command line, it worked. Do you have any advice on how to use files in formats other than h5ad from a Jupyter notebook?
I would be very thankful for a quick response.