gdagstn commented on August 21, 2024

Hi Brian,
I'm glad you like the GIFs, and sure, feel free to use them; thanks for the credit.
Should you want to reproduce them, this is the code I used:

library(ggplot2)
library(Seurat)
library(scater)
library(reticulate)
library(colorspace)

geosketch <- import("geosketch")

# Zero-pad numbers (to one digit more than the largest) so that file names sort correctly in the shell

zeropad <- function(numbers){
	width <- nchar(as.character(max(numbers))) + 1
	formatC(numbers, width = width, format = "d", flag = "0")
}

#lake is the .RDS SCEset downloaded from: https://hemberg-lab.github.io/scRNA.seq.datasets/human/brain/
lake <- readRDS("../../publicdata/Hemberg_scRNA_datasets/human_brain/lake.Rds")

lake.seurat <- as.Seurat(lake, data = "logcounts", counts = "normcounts")
lake.seurat <- SCTransform(lake.seurat)
lake.seurat <- RunPCA(lake.seurat)
lake.seurat <- RunUMAP(lake.seurat, dims = 1:20)
Idents(lake.seurat) <- colData(lake)$cell_type1

lake.pca <- Embeddings(lake.seurat, "pca")

sketch.size <- rev(seq(3000,300,by=-100))

filenames <- zeropad(3001 - sketch.size)

#UMAP plots

for(i in seq_along(sketch.size)){
	# geosketch returns 0-based (Python) indices; shift them by 1 for R subsetting
	sketch.indices <- unlist(geosketch$gs(lake.pca[,1:20], as.integer(sketch.size[i]))) + 1L
	lake.red <- lake.seurat[, sketch.indices]
	p <- DimPlot(lake.red) + labs(title = paste0(sketch.size[i], " cells sampled"))
	ggsave(filename = paste0("./lake/umap/", filenames[i], ".png"), plot = p, device = "png")
}

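#cell type barplots

# UMAP coordinates, used here as the sketching space
lake.umap <- Embeddings(lake.seurat, "umap")

for(i in seq_along(sketch.size)){
	# geosketch returns 0-based (Python) indices; shift them by 1 for R subsetting
	sketch.indices <- unlist(geosketch$gs(lake.umap, as.integer(sketch.size[i]))) + 1L
	lake.red <- lake.seurat[, sketch.indices]
	counts <- table(Idents(lake.red))
	png(filename = paste0("./lake/barplot/", filenames[i], ".png"), width = 800, height = 300)
	par(mar = c(6,4,4,2))
	# the y-axis is fixed to the largest cell type in the full dataset so frames are comparable
	barplot(counts, las = 2, col = colorspace::qualitative_hcl(n = length(counts)),
		border = NA, ylab = "# cells", main = paste0(sketch.size[i], " cells sampled"),
		ylim = c(0, max(table(Idents(lake.seurat)))))
	dev.off()
}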
# this is done in the shell using ImageMagick

convert -delay 20 ./lake/umap/*.png -loop 0 ./lake/movie_umap.gif
convert -delay 20 ./lake/barplot/*.png -loop 0 ./lake/movie_barplot.gif

As for the rest of your answers, I appreciate them a lot and will definitely investigate further how different sketch sizes translate into efficient, good-quality integrations. I have access to a fairly large HPC cluster in Singapore, so resources are not a hard constraint; I just want timely results so that I know where and when to adjust my parameters.

Thanks a lot again!

brianhie commented on August 21, 2024

Hi @gdagstn,

First off, those GIFs are super cool. Can I reuse them when giving talks or presenting geosketch, with appropriate credit to you of course?

First question:

I'd recommend sticking with PCA, since UMAP, t-SNE, etc. substantially distort density. A nice property of running geosketch on scaled PCs, which follows from using the same hypercube side length across all dimensions, is that the algorithm automatically ignores the contribution of potentially many of the lower-variance PCs: their range is small relative to the side length, so they rarely separate cells into different hypercubes. So even if you sketch using the top 20 or 100 PCs, the "effective" number of PCs will actually be much smaller (empirically we've seen this to be around 5, or even fewer, on scRNA-seq datasets).
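A minimal R sketch of the "same side length" argument, on simulated data: the PC-like scores below (with hand-picked, decaying standard deviations) are binned into a grid of hypercubes with one fixed side length, and we count how many bins each dimension spans. The side length of 5 is an arbitrary assumption, since geosketch itself searches for a side length that yields the requested sketch size, and this reproduces only the gridding step, not geosketch's sampling.

```r
# Simulated PC-like scores whose standard deviation decays across dimensions (not real data)
set.seed(1)
n.cells <- 5000
pc.sd <- c(20, 10, 6, 4, 2, 1, 0.5, 0.25, 0.1, 0.05)
scores <- sapply(pc.sd, function(s) rnorm(n.cells, sd = s))

# Assign every cell to a hypercube with the SAME side length in every dimension
# (side length chosen by hand here; an assumption for illustration)
side <- 5
shifted <- sweep(scores, 2, apply(scores, 2, min))  # shift each dimension to start at 0
cube.coords <- floor(shifted / side)

# Number of distinct bins each PC is split into
bins.per.pc <- apply(cube.coords, 2, function(x) length(unique(x)))
print(bins.per.pc)

# PCs that fall into a single bin cannot change which hypercube a cell lands in
effective.pcs <- sum(bins.per.pc > 1)
print(effective.pcs)
```

Dimensions whose whole range fits inside one side length collapse to a single bin, which is the sense in which low-variance PCs drop out of the gridding.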

Second question:

Choosing the "right" sketch size is challenging, largely because it's difficult to say what is "right" in a general sense. After reasoning about this, we decided to make the sketch size a user-set parameter, hoping it would be motivated by some downstream application or even by external resource constraints. As you mentioned, even more formally motivated analyses (e.g., the Hausdorff distance or a Chernoff bound on some distributional statistic) have underlying assumptions that may or may not suit specific applications.
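For the Hausdorff distance mentioned above, a minimal base-R sketch of its directed form (from the full data to the sketch) could look like this. `hausdorff_to_sketch` is a hypothetical helper, and random nested subsamples stand in for geosketch here; with the pipeline in the first comment you would instead pass the indices returned by `geosketch$gs`, shifted by 1 because they are 0-based. The brute-force distance matrix is only practical for a few thousand cells.

```r
# Hypothetical helper: directed Hausdorff distance from every row of X to the
# sketch X[idx, ], i.e. the largest distance from any cell to its nearest sketched cell
hausdorff_to_sketch <- function(X, idx) {
	S <- X[idx, , drop = FALSE]
	# squared Euclidean distances via ||x||^2 + ||s||^2 - 2 x.s (cells x sketched cells)
	d2 <- outer(rowSums(X^2), rowSums(S^2), "+") - 2 * X %*% t(S)
	# clamp tiny negative values caused by floating-point error before taking sqrt
	sqrt(max(pmax(apply(d2, 1, min), 0)))
}

# Simulated 5-dimensional "PC scores" (not real data)
set.seed(1)
X <- matrix(rnorm(2000 * 5), ncol = 5)

# Nested random subsamples, so each larger sketch contains the smaller ones
perm <- sample(nrow(X))
sizes <- c(50, 200, 800)
hd <- sapply(sizes, function(n) hausdorff_to_sketch(X, perm[seq_len(n)]))
print(setNames(hd, sizes))
```

Because the subsamples are nested, the distance can only shrink as the sketch grows; plotting it against the sketch size is one way to see where returns diminish.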

All of your empirical approaches to finding a good sketch size are definitely reasonable. If your end goal is integration, another way to think about it is to pick a sketch size that does not degrade the "quality" of the integrative transformation too much (or perhaps even improves it). Again, choosing the right metrics for quantifying integration quality is something of an art, but at least you'll have a way to relate the sketch size parameter to your intended application.
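If you go the integration-quality route, the selection rule can be made explicit. The sketch below is a hypothetical helper, not part of geosketch: `score_fn` is assumed to be a user-supplied function that sketches, integrates and returns a positive quality score where higher is better, and `toy_score` is a made-up curve used only to exercise the helper.

```r
# Hypothetical helper: smallest sketch size whose quality score stays within a
# relative tolerance of the score on the full data (assumes higher = better, score > 0)
choose_sketch_size <- function(sizes, score_fn, full.score, tol = 0.05) {
	sizes <- sort(sizes)
	scores <- vapply(sizes, score_fn, numeric(1))
	ok <- scores >= (1 - tol) * full.score
	if (!any(ok)) return(NA)
	sizes[which(ok)[1]]
}

# Toy saturating curve standing in for a real integration metric (not real data)
toy_score <- function(n) 0.9 * (1 - exp(-n / 500))

best <- choose_sketch_size(seq(300, 3000, by = 100), toy_score, full.score = 0.9)
print(best)  # 1500
```

Scanning sizes from small to large keeps the rule cheap to reason about; in practice each `score_fn` call would be a full integration run, so a coarse grid of sizes is probably enough.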

Another thing to consider is choosing a sketch size that fits within your computational budget. I'm not sure how much compute you have access to relative to the size of the integration you want to run, but you can, for example, set a resource cap (e.g., integration takes no longer than 24 hours) and choose sketch sizes accordingly.
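Likewise, a resource cap can be turned into a sketch size by timing a few small pilot integrations and extrapolating. This hypothetical helper assumes runtime follows a power law in sketch size (hours ≈ a * size^b), which you would want to check on your own pilot runs; the pilot timings below are synthetic numbers generated from such a law, not measurements.

```r
# Hypothetical helper: fit log(hours) ~ log(size) on pilot runs (power-law
# assumption) and return the largest candidate size predicted to fit the budget
max_size_within_budget <- function(pilot.sizes, pilot.hours, candidate.sizes, budget.hours) {
	pilot <- data.frame(size = pilot.sizes, hours = pilot.hours)
	fit <- lm(log(hours) ~ log(size), data = pilot)
	predicted <- exp(predict(fit, newdata = data.frame(size = candidate.sizes)))
	fits <- candidate.sizes[predicted <= budget.hours]
	if (length(fits) == 0) return(NA)
	max(fits)
}

# Synthetic pilot timings from hours = 0.001 * size^1.5 (illustration only)
pilot.sizes <- c(100, 200, 400)
pilot.hours <- 0.001 * pilot.sizes^1.5

max_size_within_budget(pilot.sizes, pilot.hours,
	candidate.sizes = seq(100, 3000, by = 100), budget.hours = 24)  # returns 800
```

Fitting on the log-log scale keeps the model to two parameters, so a handful of pilot runs is enough; if the pilot points clearly bend away from a straight line, the extrapolation should not be trusted.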

Great to hear from you and glad the tool is helpful!
