scPopCorn

A Python tool for comparative analysis of multiple single-cell RNA-seq datasets.

1. Installation

$ pip install scpopcorn

2. Input scRNA-seq Data File Format

scPopCorn takes multiple single-cell RNA-seq datasets as input. Each dataset is an expression matrix with one row per gene and one column per cell, in the format shown below. Example data files can be found in the Data folder.

Cell1ID Cell2ID Cell3ID Cell4ID Cell5ID ...
Gene1 12 0 0 0 ...
Gene2 125 0 298 0 ...
Gene3 0 0 0 0 ...
... ... ... ... ... ...
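
To make the layout concrete, here is a minimal sketch of a parser for a matrix shaped like the one above. It assumes whitespace-delimited text and a header row that lists only cell IDs (one field fewer than each gene row). The function name `parse_expression_matrix` is hypothetical and is not part of scpopcorn; the package's own reader is shown in section 3.2.

```python
import io


def parse_expression_matrix(handle):
    """Parse a genes-by-cells count matrix from a text handle.

    Assumption: fields are whitespace-delimited, the first line holds the
    cell IDs, and every other line is a gene name followed by one count
    per cell.
    """
    cell_ids = handle.readline().split()
    gene_ids, counts = [], []
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(cell_ids) + 1:
            raise ValueError(
                f"line {line_no}: expected {len(cell_ids) + 1} fields, "
                f"got {len(fields)}"
            )
        gene_ids.append(fields[0])
        counts.append([float(value) for value in fields[1:]])
    return cell_ids, gene_ids, counts


# Small in-memory example in the same shape as the format above.
example = io.StringIO(
    "Cell1ID Cell2ID Cell3ID Cell4ID\n"
    "Gene1 12 0 0 0\n"
    "Gene2 125 0 298 0\n"
    "Gene3 0 0 0 0\n"
)
cells, genes, matrix = parse_expression_matrix(example)
print(cells)
print(genes)
print(matrix[1])
```

A row whose field count does not match the header raises an error, which is a quick way to catch a file that uses a different delimiter.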

Ground truth labels for the cells in each dataset can also be provided. The format is as follows:

Cell1ID Label1
Cell2ID Label2
Cell3ID Label3
Cell4ID Label4
... ...
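
A two-column label file like the one above can be parsed in a few lines. The sketch below assumes whitespace-delimited fields and single-token labels; the function name `parse_cell_labels` and its `id_col` / `label_col` parameters are hypothetical and are not part of scpopcorn.

```python
import io


def parse_cell_labels(handle, id_col=0, label_col=1):
    """Return a dict mapping cell ID -> label.

    Assumption: one cell per line, whitespace-delimited, with the cell ID
    in column `id_col` and the label in column `label_col`.
    """
    labels = {}
    for line in handle:
        fields = line.split()
        if len(fields) <= max(id_col, label_col):
            continue  # skip blank or incomplete lines
        labels[fields[id_col]] = fields[label_col]
    return labels


# Made-up labels, used only to show the shape of the result.
example = io.StringIO(
    "Cell1ID alpha\n"
    "Cell2ID beta\n"
    "Cell3ID alpha\n"
)
labels = parse_cell_labels(example)
print(labels)
```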

3. How to use

3.1 import scpopcorn package

from scpopcorn import MergeSingleCell
from scpopcorn import SingleCellData

3.2 read in RNA-seq datasets

File1 = "../Data/Human&Mouse_Pancreas/pancreas_human.expressionMatrix.txt"
Test1 = SingleCellData()
Test1.ReadData_SeuratFormat(File1)

File2 = "../Data/Human&Mouse_Pancreas/pancreas_mouse.expressionMatrix.txt"
Test2 = SingleCellData()
Test2.ReadData_SeuratFormat(File2)

3.3 read in ground truth cell labels (this is optional)

File1T = "../Data/Human&Mouse_Pancreas/pancreas_human.CellLabels.txt"
Test1.ReadTurth(File1T, 0, 1)

File2T = "../Data/Human&Mouse_Pancreas/pancreas_mouse.CellLabels.txt"
Test2.ReadTurth(File2T, 0, 1)

3.4 normalize the count data, find highly variable genes, and take the natural logarithm of one plus the counts

Test1.Normalized_per_Cell()
Test1.FindHVG()
Test1.Log1P()

Test2.Normalized_per_Cell()
Test2.FindHVG()
Test2.Log1P()
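
As a rough illustration of what these three preprocessing steps typically involve, the sketch below applies them to a tiny genes-by-cells matrix using only the standard library. It is not scPopCorn's implementation: the target sum of 10,000, the use of plain variance to rank genes, and all function names are assumptions made for this example.

```python
import math
from statistics import pvariance


def normalize_per_cell(matrix, target_sum=10_000.0):
    """Scale each cell (column) so that its counts sum to target_sum."""
    n_cells = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]
    return [
        [
            row[j] * target_sum / totals[j] if totals[j] > 0 else 0.0
            for j in range(n_cells)
        ]
        for row in matrix
    ]


def top_variable_genes(matrix, n_top):
    """Return the row indices of the n_top genes with the largest variance."""
    variances = [pvariance(row) for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:n_top])


def log1p_matrix(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]


counts = [
    [12.0, 0.0, 0.0, 0.0],      # Gene1
    [125.0, 0.0, 298.0, 0.0],   # Gene2
    [0.0, 0.0, 0.0, 0.0],       # Gene3
    [3.0, 5.0, 2.0, 4.0],       # Gene4 (made-up row for the example)
]
normalized = normalize_per_cell(counts)
hvg = top_variable_genes(normalized, n_top=2)
logged = log1p_matrix(normalized)
print(hvg)
```

Real pipelines usually rank genes by a dispersion measure that corrects for mean expression; plain variance is used here only to keep the sketch short.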

3.5 combine datasets and set number of supercells for each dataset

NumSuperCell_Test1 = 50
NumSuperCell_Test2 = 50
MSingle = MergeSingleCell(Test1, Test2)
MSingle.MultiDefineSuperCell(NumSuperCell_Test1, NumSuperCell_Test2)

In this example, we define 50 supercells for each dataset. As a rule of thumb, for a dataset with N cells, choose the number of supercells M so that N/M is between 20 and 30.
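
The rule of thumb above is simple arithmetic and can be wrapped in a small helper. This is a sketch, not part of scpopcorn: the function names and the midpoint ratio of 25 are assumptions chosen for illustration.

```python
import math


def supercell_range(n_cells, min_ratio=20, max_ratio=30):
    """Return (lowest, highest) M with min_ratio <= n_cells / M <= max_ratio."""
    lowest = math.ceil(n_cells / max_ratio)
    highest = math.floor(n_cells / min_ratio)
    if lowest > highest:
        raise ValueError(
            f"no integer M gives a cells-per-supercell ratio in "
            f"[{min_ratio}, {max_ratio}] for {n_cells} cells"
        )
    return lowest, highest


def suggest_num_supercells(n_cells, target_ratio=25):
    """Pick M so that n_cells / M is close to target_ratio (assumed midpoint)."""
    return max(1, round(n_cells / target_ratio))


# Example: a dataset with 1250 cells.
print(supercell_range(1250))
print(suggest_num_supercells(1250))
```

With 1250 cells, any M from 42 to 62 keeps N/M in the recommended band, and the midpoint suggestion is 50, which matches the value used in this example.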

3.6 compute the co-membership graph within each dataset and the similarity matrix across datasets

MSingle.ConstructWithinSimiarlityMat_SuperCellLevel()
MSingle.ConstructBetweenSimiarlityMat_SuperCellLevel()

3.7 run joint partition

Estimate_NumCluster = 10 # initial guess of the number of corresponding clusters; it does not need to be accurate
MSingle.SDP_NKcut(Estimate_NumCluster)

Estimate_NumCluster is an initial guess of the number of sub-populations you want to find; it only needs to be an approximation.

3.8 rounding the results

NumCluster_Min = 3 
NumCluster_Max = 20
CResult = MSingle.NKcut_Rounding(NumCluster_Min, NumCluster_Max)

scPopCorn screens the number of clusters from NumCluster_Min to NumCluster_Max and automatically selects the best number of clusters in [NumCluster_Min, NumCluster_Max].
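
The general pattern of screening a range of cluster counts can be sketched as follows. This README does not describe how scPopCorn scores each candidate, so the `score_for_k` callback, the "higher is better" convention, and the toy score are all assumptions made for illustration only.

```python
def pick_best_num_clusters(score_for_k, k_min, k_max):
    """Evaluate every k in [k_min, k_max] and return the best one.

    `score_for_k` is a hypothetical callback that returns a quality score
    for a given number of clusters; higher is assumed to be better.
    """
    if k_min > k_max:
        raise ValueError("k_min must not exceed k_max")
    scores = {k: score_for_k(k) for k in range(k_min, k_max + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, scores


def toy_score(k):
    """Stand-in score that peaks at k = 7; not a real clustering criterion."""
    return -(k - 7) ** 2


best_k, scores = pick_best_num_clusters(toy_score, 3, 20)
print(best_k)
```

A wider range costs more evaluations, so keeping NumCluster_Min and NumCluster_Max reasonably close to the expected number of sub-populations keeps the screen cheap.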

3.9 evaluate the clustering results using ground truth (this is optional)

MSingle.Evaluation(CResult)

3.10 similarity between cell subpopulations across datasets

MSingle.StatResult()

3.11 UMAP plots using the results generated by scPopCorn

MSingle.Umap_Result()

3.12 scPopCorn for sub-clusters

After viewing the UMAP plot, you may want to jointly partition a sub-cluster further. You can do so as follows:

ClusterID = 0
NumCluster = 3
MSingle.Deep_Partition(ClusterID, NumCluster) # deep partition for cluster 0 into 3 clusters
NumCluster_Min = 3
NumCluster_Max = 5
MSingle.SDP_Deep_Rounding(NumCluster_Min, NumCluster_Max) # find out best number of clusters for the deep partition
MSingle.Merge_Deep_Partition() # merge the new partitions to the original one
MSingle.Umap_Result() # see the new results

3.13 output the results

MSingle.OutputResult("TestOut.txt")

The results are written to the "TestOut.txt" file.

4. Examples and reproducible results

Jupyter notebooks with examples are provided in the Reproduce folder.

Contributors: hxqwyj, wyjhxq
