DualSimplex algorithm's R package

About the project

This is an implementation of the Dual Simplex method presented in the following paper:

Non-negative matrix factorization and deconvolution as dual simplex problem
Denis Kleverov, Ekaterina Aladyeva, Alexey Serdyukov, Maxim Artyomov
bioRxiv 2024.04.09.588652; doi: https://doi.org/10.1101/2024.04.09.588652

In essence, this is an NMF algorithm that factorizes a non-negative matrix V into two non-negative matrices W and H.
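
As a minimal illustration of the factorization itself (base R only, not the package's API), the sketch below builds a small non-negative V from hypothetical W and H and checks the shapes. The sizes and variable names are assumptions chosen for the example.

```r
# Toy NMF setup: V (M x N) is the product of W (M x K) and H (K x N).
set.seed(42)
M <- 6   # features (rows of V)
N <- 4   # samples (columns of V)
K <- 2   # components

W <- matrix(runif(M * K), nrow = M, ncol = K)  # non-negative basis
H <- matrix(runif(K * N), nrow = K, ncol = N)  # non-negative coefficients
V <- W %*% H                                   # non-negative by construction

dim(V)        # M x N
all(V >= 0)   # TRUE, since W and H are non-negative
```

An NMF algorithm solves the inverse problem: given only V and K, recover W and H (up to permutation and scaling of the components).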

The key feature is that it operates in a lower dimensional space of a Sinkhorn-transformed original matrix, which aligns both row and column data points of the original matrix via two interrelated geometrical simplex structures.
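
Sinkhorn scaling alternately rescales rows and columns until both have constant sums. The sketch below is a generic base R version of that idea for a strictly positive matrix. The function name `sinkhorn_scale` and the target sums (1 per row, M/N per column) are assumptions for illustration and are not necessarily what the package does internally.

```r
# Generic Sinkhorn-style scaling (illustrative, not the package's code).
# Alternates row and column normalization of a strictly positive matrix.
sinkhorn_scale <- function(V, n_iter = 200, tol = 1e-10) {
  stopifnot(is.matrix(V), all(V > 0))
  M <- nrow(V)
  N <- ncol(V)
  X <- V
  for (i in seq_len(n_iter)) {
    X <- X / rowSums(X)                        # every row sums to 1
    X <- sweep(X, 2, colSums(X) * N / M, "/")  # every column sums to M / N
    if (max(abs(rowSums(X) - 1)) < tol) break  # rows still normalized: converged
  }
  X
}

set.seed(1)
V <- matrix(runif(20, min = 0.1, max = 1), nrow = 5, ncol = 4)
X <- sinkhorn_scale(V)
range(rowSums(X))  # both values close to 1
range(colSums(X))  # both values close to 5 / 4
```

In the package, this step is triggered by `dso$set_data()`, as shown in the usage section.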

Therefore, in this space we only need to search for K solution points, each of dimension K-1 (K is the number of components, i.e. the number of columns of W and rows of H).
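
To see why a low-dimensional space is enough, note that a noise-free V = W H has rank at most K, so K singular vectors capture it completely. The base R sketch below checks this on toy data. All names and sizes are assumptions, and the Sinkhorn step, which places the points on a simplex and so removes one more dimension, is not reproduced here.

```r
# A noise-free product of M x K and K x N matrices has rank at most K.
set.seed(7)
M <- 50
N <- 20
K <- 3
W <- matrix(runif(M * K), M, K)
H <- matrix(runif(K * N), K, N)
V <- W %*% H

s <- svd(V)
# Count singular values that are not numerically zero.
n_nonzero <- sum(s$d > 1e-8 * s$d[1])

# Coordinates of the N columns of V in the top-K left singular basis.
coords <- t(s$u[, 1:K]) %*% V      # K x N
# Mapping the coordinates back recovers V up to floating-point error.
V_back <- s$u[, 1:K] %*% coords
max(abs(V - V_back))
```

With noisy data the remaining singular values are small rather than zero, which is what `dso$plot_svd_history()` in the usage section helps to inspect.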

This method can be applied to:

  • The general NMF problem, where it outperforms commonly used methods
  • Bulk RNAseq deconvolution
  • Single cell clustering

Getting Started

Prerequisites

This is an R package, so you need a working R installation. We tested the code using RStudio and RStudio Server as IDEs. The package relies on Bioconductor and devtools, so install both first:

# in your R environment
install.packages("BiocManager")
install.packages("devtools")

Installation

Install from GitHub:

devtools::install_github("artyomovlab/DualSimplex")

Alternatively, load the package from a local clone of this repository:

devtools::load_all("path_to_code_directory")

(Not working yet.) After publication, installation will be:

install.packages("DualSimplex")

Usage

Check our additional paper repository for more examples of NMF, bulk-RNAseq deconvolution and single cell clustering

Read/Generate the data

library("DualSimplex")
library(dplyr)

N <- 100 # number of samples (e.g. mixtures)
M <- 10000 # number of features (e.g. genes)
K <- 3 # Number of pure components

sim <- create_simulation(n_genes = M,
                         n_samples = N,
                         n_cell_types = K,
                         with_marker_genes = FALSE)
sim <- sim %>% add_noise(noise_deviation = 0.2)

data_raw <- sim$data
true_W <- sim$basis
true_H <- sim$proportions

Create a Solver object

This performs Sinkhorn scaling, SVD projection, and data annotation

dso <- DualSimplexSolver$new()
dso$set_data(data_raw) # run Sinkhorn procedure
dso$project(K) # project to SVD space
dso$plot_projected("zero_distance", "zero_distance", with_solution = TRUE, use_dims = list(2:3)) # visualize the projection
dso$set_display_dims(list(2:3)) # remember the use_dims choice, to call just dso$plot_projected()

(Optional) Filter the data/remove outliers

Do this only if you want to remove points from your dataset.

plane_distance_threshold <- 0.05 # try several values: start with a large threshold, then lower it
zero_distance_threshold <- 1
dso$distance_filter(plane_d_lt = plane_distance_threshold, zero_d_lt = zero_distance_threshold, genes = TRUE)
dso$project(K)
dso$plot_projection_diagnostics() # See the distribution of points distances
dso$plot_svd_history() # observe changes in SVD variance explained

Identify simplex corners in the projected space

Initialize solution

dso$init_solution("random")
dso$plot_projected("zero_distance", "zero_distance")

Run optimization

dso$optim_solution(
    5000,
    optim_config(
        coef_hinge_H = 1,
        coef_hinge_W = 1,
        coef_der_X = 0.001, 
        coef_der_Omega = 0.001
    )
)
dso$plot_projected("zero_distance", "zero_distance")
dso$plot_error_history()

Get solution

solution <- dso$finalize_solution()
result_W <- solution$W
result_H <- solution$H
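
NMF components come back in arbitrary order, so comparing a result with ground truth needs a matching step. The base R helpers below are a sketch of one way to do that. `relative_error` and `match_components` are hypothetical names, not part of DualSimplex, and the greedy correlation matching is a simple stand-in for a proper assignment method.

```r
# Relative Frobenius reconstruction error of V by W %*% H.
relative_error <- function(V, W, H) {
  norm(V - W %*% H, type = "F") / norm(V, type = "F")
}

# Greedily pair each true component (row of `truth`) with the most
# correlated, not yet used, estimated component (row of `est`).
match_components <- function(est, truth) {
  cors <- cor(t(truth), t(est))  # K x K: truth rows vs est rows
  perm <- integer(nrow(truth))
  for (k in seq_len(nrow(truth))) {
    best <- which(cors == max(cors), arr.ind = TRUE)[1, ]
    perm[best["row"]] <- best["col"]
    cors[best["row"], ] <- -Inf  # this truth row is now taken
    cors[, best["col"]] <- -Inf  # this est row is now taken
  }
  perm  # perm[i] is the row of `est` matched to row i of `truth`
}

# Toy check: the estimate is a shuffled, slightly noisy copy of the truth.
set.seed(1)
truth <- matrix(runif(3 * 30), nrow = 3)
est <- truth[c(2, 3, 1), ] + matrix(rnorm(90, sd = 0.01), nrow = 3)
perm <- match_components(est, truth)
```

With the objects from the steps above, one could try `match_components(result_H, true_H)` and `relative_error(data_raw, result_W, result_H)`, assuming both H matrices are K x N and that no samples or genes were removed by filtering.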

Save/Load the results

# Save
dso$save_state("directory_to_save")

# Load
dso <- DualSimplexSolver$from_state("directory_to_save")

Contacts

For developers

Code structure & Guidelines

The following files in the R/ directory represent the successive stages of the DualSimplex pipeline:

0. simulation.R
1. annotation.R
2. filtering.R
3. sinkhorn.R
4. projection.R
5. initialization.R
6. optimization.R
7. post_analysis.R
8. benchmarking.R

Ideally, the main logic functions of a stage shouldn't use functions from another stage, and a downstream stage should only take the objects generated by the previous stage as its input.

Then, either the user or DualSimplexSolver uses the main functions from those files to implement the whole control flow.

This rule of thumb leads to linear code logic and low code coupling, which makes it simple to debug and introduce changes.

Checking your new functions

Please document your code with roxygen2 comments (as is done for the rest of the package).

  • Regenerate NAMESPACE and additional files
devtools::document()
  • Ensure the standard devtools check returns 0 errors
devtools::check()
  • Ensure the package is installable from your repository
devtools::install_github("your_github_nickname/DualSimplex@your_branch_name")

Contributors

denklewer, almdudleer, polezhaevalera
