Giter VIP home page Giter VIP logo

nclusion.jl's Introduction

Nonparametric CLUstering of SIngle cell PopulatiONs (NCLUSION)

Introduction

Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; and, second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences which has been shown to be subject to inflated false discovery rates. In this repository, we present a solution to those challenges by introducing "nonparametric clustering of single-cell populations" (NCLUSION): an infinite mixture model that leverages Bayesian sparse priors to simultaneously identify marker genes while performing clustering on single-cell expression data. NCLUSION uses a scalable variational inference algorithm to perform these analyses on datasets with up to millions of cells. In this package and corresponding documentation we demonstrate that NCLUSION (i) matches the performance of other state-of-the-art clustering techniques with significantly reduced runtime and (ii) provides statistically robust and biologically-relevant transcriptomic signatures for each of the clusters it identifies. Overall, NCLUSION represents a reliable hypothesis generating tool for understanding patterns of expression variation present in single-cell populations.

alt text

The Method

NCLUSION uses a sparse hierarchical Dirichlet process normal mixture model to reduce the number of choices that users need to make while simultaneously performing variable selection to identify top cluster-specific marker genes for downstream analyses. There are three key components to NCLUSION. First, NCLUSION is fit directly on single-cell expression matrices and does not require the data to be embedded within a lower-dimensional space. Second, it assumes that cells can belong to one of infinitely many different clusters. This is captured by placing a Dirichlet process prior over the cluster assignment probabilities for each cell. By always setting the number of possible clusters K = ∞, we remove the need for users to have to iterate through different values until they find the optimal choice. Instead, NCLUSION uses a stick-breaking proccess to naturally allow for more clusters as the amount of variation in a single cell dataset grows. Third, NCLUSION assumes that not all genes are important when assigning cells to a given cluster. To model this, we place a spike and slab prior on the mean expression of each gene within each cluster. This prior shrinks the mean of genes that play an insignificant role in cluster formation towards zero. The learning of the model parameters is done via a variational-EM algorithm that allows the method to scale with the number of cells in the data.

Installation

This package requires that Julia 1.9+ and its dependencies are installed. Instructions for installation can be found here.

To create a new Julia project environment and install the NCLUSION package, follow these steps:

  1. Start a Julia session by entering julia in a bash terminal while in your working directory.
  2. Enter a Julia Pkg REPL by pressing the right square bracket ]. Create a new Julia project environment by typing generate followed by the name of your project (i.e. generate nclusion_demo). This creates a project directory named after the environment you created, and two files: Project.toml and Manifest.toml. These files, which can be found in the project directory, specify the packages that are downloaded in the environment.
  3. Activate the project environment in the Julia Pkg REPL by typing activate followed by the name of the Julia environment (i.e. activate nclusion_demo).
  4. Install NCLUSION in the project environment, by running the following command in the Julia Pkg REPL:
    add Nclusion
    You can exit the Julia Pkg REPL by typing Ctrl + C, and the Julia REPL by entering exit().

To install NCLUSION into an existing Julia project environment, follow these steps:

  1. Activate the project environment and start a Julia REPL by entering julia --project=/path/to/julia/env --thread=auto into a bash terminal, where --project is set equal to the path to your existing Julia project environment.
  2. Install NCLUSION in the activated project environment by entering the following command in the Julia REPL:
    using Pkg;Pkg.add("Nclusion")
    You can exit the Julia Pkg REPL by typing Ctrl + C, and the Julia REPL by entering exit().

Alternatively, you can install NCLUSION directly using git by following these steps:

  1. Enter the following command into a bash terminal:
    git clone [email protected]:microsoft/Nclusion.jl.git
    cd nclusion/
  2. Now start a Julia session by entering the command:
    julia
  3. In the Julia REPL, type ] . This command starts the Pkg REPL. In this Pkg REPL, type the following Julia command:
    activate .
    This activates the Nclusion environment. You can verify that the environment is active if the new Pkg REPL prompt is (Nclusion) pkg>.
  4. Now type the following commands in the in the Pkg REPL:
    instantiate
    precompile
    This completes the installation process. You can exit the Julia Pkg REPL by typing Ctrl + C, and the Julia REPL by entering exit().

Software Package Documentation, Examples, and Tutorials

Documentation for the NCLUSION software package, examples, and tutorials can be found here.

Relevant Citations

C. Nwizu, M. Hughes, M. Ramseier, A. Navia, A. Shalek, N. Fusi, S. Raghavan, P. Winter, A. Amini, and L. Crawford. Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data. bioRxiv. https://doi.org/10.1101/2024.02.11.579839

Questions and Feedback

If you have any questions or concerns regarding NCLUSION or the tutorials, please contact Chibuikem Nwizu, Madeline Hughes, Ava Amini, or Lorin Crawford. Any feedback on the repository, manuscript, and tutorials is appreciated.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

nclusion.jl's People

Contributors

chibuikem709 avatar lorinanthony avatar microsoft-github-operations[bot] avatar microsoftopensource avatar v-mahughes avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

nclusion.jl's Issues

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result to your repository getting automatically archived.🔒

Instructions

Reply with a comment on this issue containing one of the following optin or optout command options below.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

OR

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : This repository will ship as Open Source or go public
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified

Need more help? 🖐️

Action required: self-attest your goal for this repository

It's time to review and renew the intent of this repository

An owner or administrator of this repository has previously indicated that this repository can not be migrated to GitHub inside Microsoft because it is going public, open source, or it is used to collaborate with external parties (customers, partners, suppliers, etc.).

Action

👀 ✍️ In order to keep Microsoft secure, we require repository owners and administrators to review this repository and regularly renew the intent to either opt-in or opt-out of migration to GitHub inside Microsoft which is specifically intended for private or internal projects.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result to your repository getting automatically archived. 🔒

Instructions

❌ Opt-out of migration

If this repository can not be migrated to GitHub inside Microsoft, you can opt-out of migration by replying with a comment on this issue containing one of the following optout command options below.

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : My project will ship as Open Source
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified

✅ Opt-in to migrate

If the circumstances of this repository has changed and you decide that you need to migrate, then you can specify the optin command below. For example, the repository is no longer going public, open source or require external collaboration.

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

Click here for more information about optin and optout command options and examples

Opt-in

@gimsvc optin --date <target_migration_date>

When opting-in to migrate your repository, the --date option is required followed by your specified migration date using the format: mm-dd-yyyy

@gimsvc optin --date 03-15-2023

Opt-out

@gimsvc optout --reason <staging|collaboration|delete|other>

When opting-out of migration, you need to specify the --reason.

  • staging
    • My project will ship as Open Source
  • collaboration
    • Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete
    • This repository will be deleted because it is no longer needed.
  • other
    • Other reasons not specified

Examples:

@gimsvc optout --reason staging

@gimsvc optout --reason collaboration

@gimsvc optout --reason delete

@gimsvc optout --reason other

Need more help? 🖐️

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.