Hello, I was wondering if you could share some insights on how one w

Setting up differential expression with multiway alignment, about quon-titative-biology/scalign


UCDNJJ commented on June 17, 2024

Hi Vijay,

As a heads up, we plan to extend the "Multiway alignment using all pairs method" tutorial to include an example and better documentation for such an analysis in the near future.

In regards to your analysis:

1. By default, run.decoder is set to FALSE in the multi-way alignment tutorial. Setting this parameter to TRUE will produce the per-condition expression matrices as in the "Unsupervised alignment and projection of HSCs" tutorial. You should then have 4 additional matrices and be able to follow the example paired-DE pipeline.
2. One note: since you are working with multiCCA as input (which we suggest), all the output of scAlign will be in CC space. You will either have to use the stored results from multiCCA to project back into gene expression space, or ask the decoders to reconstruct gene expression values by setting the decoder.data field:

scAlignPancreas = scAlignMulti(scAlignPancreas,
                        options=scAlignOptions(steps=15000,
                                               batch.norm.layer=TRUE,
                                               log.every=5000,
                                               architecture="large",  ## 3 layer neural network
                                               num.dim=64),            ## Number of latent dimensions
                        encoder.data="MultiCCA",
                        decoder.data="scale.data", ## Or some other transformation of gene expression values
                        supervised='none',
                        run.encoder=TRUE,
                        run.decoder=TRUE,          ## Must be TRUE for the decoders to be trained
                        log.results=TRUE,
                        log.dir=file.path('./tmp'),
                        device="GPU")

SCE objects should be convertible to Seurat objects, but I have also found that the as.Seurat() function does not work in all cases. First, I would check that the colnames contain no duplicates. Otherwise, you could try manually converting the SCE object to a Seurat object. The following tutorial may be useful, specifically the "Storing a custom dimensional reduction calculation" section: https://satijalab.org/seurat/v3.0/dim_reduction_vignette.html
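The duplicate-name check suggested above can be sketched in base R. This is only an illustration on a toy matrix: the gene and cell names are hypothetical, and with a real SCE object the matrix would come from its assay data instead.

```r
# Toy expression matrix standing in for assay data pulled from an SCE object.
# The cell names are hypothetical and deliberately contain one duplicate.
expr <- matrix(1:12, nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("cell1", "cell2", "cell1", "cell3")))

# anyDuplicated() returns 0 when all names are unique, otherwise the index
# of the first repeated name.
has_dups <- anyDuplicated(colnames(expr)) > 0

# List which names are repeated, so the source of the clash can be traced.
dup_names <- unique(colnames(expr)[duplicated(colnames(expr))])

# make.unique() appends .1, .2, ... to repeated names so every cell is distinct.
if (has_dups) {
  colnames(expr) <- make.unique(colnames(expr))
}
print(colnames(expr))
```

Renaming with make.unique() is one possible fix. If the duplicates come from merging samples, prefixing each cell name with its sample name before merging may be cleaner.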

from scalign.

vshanka23 commented on June 17, 2024

Hi Nelson,

Thank you very much for a very clear explanation. I do have one more question regarding the methodology you have described above (in points 1 and 2). Because I wished to use scAlign to align all four of my samples, they were used as input for scAlignMulti. The alignment included all possible pairs of samples, resulting in a total of 12 (6 possible pair-wise combinations for the 4 samples x 2 for bidirectional) comparisons. And as such, would there be 12 separate decoded matrices (since the "Unsupervised alignment and projection of HSCs" tutorial example contained two samples - OLD and YOUNG and ended up with 2 separate decoded matrices OLD2YOUNG and YOUNG2OLD)? I do not wish to compare every pair of samples but simply across conditions. I would prefer to compress this to just two comparisons (across conditions and bidirectional)? In other words, how would I go about aggregating the replicates from the same conditions post alignment and comparing across conditions for DE?
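The count of 12 comparisons in the question above can be checked with a few lines of base R. The sample names are placeholders assumed for illustration (two conditions with two replicates each), not names taken from the actual data.

```r
# Hypothetical sample names: two conditions, two replicates each.
samples <- c("Cond1_Rep1", "Cond1_Rep2", "Cond2_Rep1", "Cond2_Rep2")

# All unordered pairs of samples, one pair per row.
pairs <- t(combn(samples, 2))
n_unordered <- nrow(pairs)

# Add the reverse direction of every pair to get the bidirectional comparisons.
directed <- rbind(pairs, pairs[, 2:1])
colnames(directed) <- c("from", "to")
n_directed <- nrow(directed)

# Strip the replicate suffix to find which comparisons cross conditions.
condition_of <- function(x) sub("_Rep.*$", "", x)
crosses_condition <- condition_of(directed[, "from"]) != condition_of(directed[, "to"])
n_cross <- sum(crosses_condition)

cat(n_unordered, "unordered pairs,", n_directed, "directed,", n_cross, "cross-condition\n")
```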

Also, sounds good about the tutorial. I am looking forward to it!

Thanks again.
-Vijay


UCDNJJ commented on June 17, 2024

Right, the scAlign decoders are going to return projections of all the cells to replicate-specific gene expression (4 combined matrices, ALL2Cond1_Rep1, ALL2Cond1_Rep2 ...) instead of condition-specific gene expression (ALL2Cond1, ALL2Cond2).

How you aggregate the replicates from the same condition depends on the degree of batch effect between replicates. Assuming limited or no batch effect, the most straightforward route would be to average, ALL2Cond1_avg = avg(ALL2Cond1_Rep1, ALL2Cond1_Rep2), and then identify DE genes between ALL2Cond1_avg and ALL2Cond2_avg by taking the difference of these matrices, as we do in "Unsupervised alignment and projection of HSCs".
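The averaging route described above can be sketched on toy data in base R. The matrices below are random stand-ins for the decoder projections (genes in rows, cells in columns). Ranking genes by their mean absolute difference is an assumption made for illustration, not necessarily how the HSC tutorial scores genes.

```r
set.seed(1)
genes <- paste0("gene", 1:5)
cells <- paste0("cell", 1:4)

# Helper producing a random genes x cells matrix; 'shift' moves the mean
# so the two conditions differ. Purely illustrative data.
toy_projection <- function(shift = 0) {
  matrix(rnorm(length(genes) * length(cells), mean = shift),
         nrow = length(genes), dimnames = list(genes, cells))
}

# Stand-ins for the four replicate-specific projections of ALL cells.
ALL2Cond1_Rep1 <- toy_projection(0)
ALL2Cond1_Rep2 <- toy_projection(0)
ALL2Cond2_Rep1 <- toy_projection(1)
ALL2Cond2_Rep2 <- toy_projection(1)

# Average replicates within each condition (element-wise, same cells and genes).
ALL2Cond1_avg <- (ALL2Cond1_Rep1 + ALL2Cond1_Rep2) / 2
ALL2Cond2_avg <- (ALL2Cond2_Rep1 + ALL2Cond2_Rep2) / 2

# Per-cell, per-gene difference between the two condition projections.
delta <- ALL2Cond2_avg - ALL2Cond1_avg

# Assumed summary: rank genes by the magnitude of their mean difference.
gene_score <- rowMeans(delta)
ranked_genes <- names(sort(abs(gene_score), decreasing = TRUE))
print(ranked_genes)
```

Element-wise averaging only makes sense because every projection covers the same cells and genes in the same order. That should be verified on real output before averaging.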

Alternatively, I could modify scAlign to allow the user to train decoders which reconstruct combinations of the input data; in your case, combining replicate data and producing Cond1_2_Cond2 and Cond2_2_Cond1. However, any batch effects between replicates present in the input data may be present in the resulting projections from combined decoders.


vshanka23 commented on June 17, 2024

Dear Nelson,

I really appreciate your thoughtful and prompt responses! Thank you!

With regard to the "degree of batch effect", I had assumed that interpolation post alignment would have imparted effects similar to batch correction. Is this inaccurate?

I think having the option to do condition-specific comparisons in scAlignMulti would be very powerful and useful, assuming that the batch effect between replicates is minimal. If such an option can be implemented, with a disclaimer that users should verify that the batch effect is minimal before proceeding, I and several others working with scRNA data would appreciate it very much.

Thank you,
Vijay


gquon commented on June 17, 2024

Vijay,

I think your technical replicates are most useful if replicates from different conditions were in the same batch (in your four-sample case, if A/B and A'/B' were sequenced together). If this is not the case in your experimental design (e.g. if all samples are in their own batch), then generally the technical replicates are only useful if one can assume that there is less noise/batch effect between replicates of the same condition than across conditions. My guess is this is true for scAlign as well as other analysis methods.


vshanka23 commented on June 17, 2024

Hello Gerald,

Indeed, all samples were sequenced together. I am trying to reduce noise due to technical variation in the steps prior to sequencing, rather than in the sequencing itself. Nevertheless, a tSNE plot after alignment did show that all clusters were more or less equally represented by all four samples. My question is whether the following assumption is appropriate: because the tSNE plot showed very good alignment between all four samples, the interpolated data can be used for differential expression across conditions by aggregating the replicates within each condition. If I understand you correctly, the batch-correction-like effect achieved by scAlign works best when the samples were all sequenced together. Is this right?

Thank you for your response!
Vijay


gquon commented on June 17, 2024

Hi Vijay, Great, your experimental design is optimal for doing alignment. My recommendation in this case is to actually merge cells from across replicates (within the same condition), then perform scAlign with "two" samples as input, and perform the differential expression there. The reason is that under the assumption of little to no batch effect across replicates of the same condition, scAlign will benefit from having more cells to train each decoder, so the interpolation step will be more accurate. With the way that the differential expression is calculated in scAlign, I don't think you'd benefit from keeping them separate until after interpolation (it may have been different if you had more samples per condition).


gquon commented on June 17, 2024

(and empirically it is a good sign that the cells across replicates group on a tSNE, and does suggest there is no major batch effect. If you want to be slightly more rigorous about this, you could formally cluster cells in the original expression domain and quantify how many clusters have mixtures of cells across samples).
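A minimal sketch of that more rigorous check, in base R on simulated data: cluster cells in the expression domain, then count how many clusters draw cells from every sample. The sample labels, the use of k-means, and the 10% threshold for calling a cluster "mixed" are all assumptions chosen for illustration.

```r
set.seed(42)

# Hypothetical design: four samples of 50 cells, each containing two
# well-separated cell populations (25 cells each).
sample_id <- rep(c("A", "A_rep", "B", "B_rep"), each = 50)
population_shift <- rep(rep(c(0, 10), each = 25), times = 4)

# Cells x genes matrix; the shift is recycled down columns, so each row
# (cell) receives the offset of its population.
expr <- matrix(rnorm(200 * 5), nrow = 200) + population_shift

# Cluster cells in the expression domain (k-means is an assumed choice).
km <- kmeans(expr, centers = 2, nstart = 10)

# Cross-tabulate clusters against samples and convert to per-cluster fractions.
tab <- table(cluster = km$cluster, sample = sample_id)
frac <- prop.table(tab, margin = 1)

# Call a cluster "mixed" if every sample contributes at least 10% of its cells.
is_mixed <- apply(frac >= 0.10, 1, all)
n_mixed <- sum(is_mixed)
print(tab)
cat(n_mixed, "of", nrow(tab), "clusters contain cells from all samples\n")
```

A cluster dominated by a single sample would show up as a row of `frac` with one value near 1, which would hint at a batch effect or a sample-specific population.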


vshanka23 commented on June 17, 2024

Dear Gerald,

These are fantastic suggestions and clear explanations! Thank you. I am going to try the replicate merging strategy pre-scAlign and see what I get in terms of DE, and perhaps compare it to aggregation post-alignment, just to characterize the differences between the strategies. What would you recommend for the pre-scAlign replicate merging process? Should I just use Seurat v3's merge object function? Or is there a more suitable method?

Thanks!
Vijay


UCDNJJ commented on June 17, 2024

Hi Vijay,

Merging replicates within conditions prior to running scAlign sounds like a great idea. Merging Seurat objects is very easy with the MergeSeurat function (merge() in Seurat v3); I would recommend using it.
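To show what merging replicates amounts to at the matrix level, here is a base R sketch on toy count matrices. It is not Seurat's implementation: keeping only the genes shared by both replicates is a simplifying assumption, and the helper make_counts and all names are hypothetical.

```r
set.seed(7)

# Hypothetical helper that builds a small genes x cells count matrix whose
# cell names carry the replicate name as a prefix.
make_counts <- function(genes, n_cells, prefix) {
  m <- matrix(rpois(length(genes) * n_cells, lambda = 2), nrow = length(genes))
  dimnames(m) <- list(genes, paste0(prefix, "_cell", seq_len(n_cells)))
  m
}

rep1 <- make_counts(c("geneA", "geneB", "geneC"), 3, "Cond1_Rep1")
rep2 <- make_counts(c("geneB", "geneC", "geneD"), 2, "Cond1_Rep2")

# Keep only genes measured in both replicates (a simplifying choice), then
# bind the cells column-wise into one matrix for the condition.
shared_genes <- intersect(rownames(rep1), rownames(rep2))
cond1 <- cbind(rep1[shared_genes, , drop = FALSE],
               rep2[shared_genes, , drop = FALSE])

# Recover each cell's replicate of origin from its name prefix, so the
# replicate label is still available after merging.
replicate_of <- sub("_cell.*$", "", colnames(cond1))
print(table(replicate_of))
```

Prefixing cell names with the replicate name keeps column names unique after the merge and preserves the replicate label for later batch-effect checks.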


gquon commented on June 17, 2024

Vijay, do let us know if you do end up comparing both approaches on your data, we'd love to hear if you found any obvious difference.


vshanka23 commented on June 17, 2024

Hello Gerald and Nelson,

Sounds good. I will let you know what I find. In the meantime, I have one more question about the pipeline. Is there any way to cluster cells based on expression profile after the scAlign procedure, to identify cell types? For example, can I simply take the average of all the interpolated data and run that through unsupervised clustering to identify cell types? Or does the cell type characterization need to be performed prior to scAlign, as seems to have been done in the scAlign tutorials? My ultimate goal is either to incorporate the scAlign procedure into Seurat's cell type characterization and DE pipeline, or to construct an equivalent pipeline using only scAlign.

Thanks.
Vijay


UCDNJJ commented on June 17, 2024

Hi Vijay,

scAlign is in part an unsupervised method, so cell type characterization is not required a priori to run our method.

For a Seurat-based cell type characterization, it would be best to cluster on the cell embeddings produced by scAlign's alignment procedure, which removes sources of bias between datasets. You can then use the resulting cluster labels as input to Seurat's standard cell type characterization and DE pipeline (FindMarkers, etc.).

Otherwise, for a purely scAlign-based approach, you can generate the state variance maps (paired-DE) for your two conditions by running scAlign's decoders, to identify DE genes without the need to cluster beforehand!
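The Seurat-based route described above (cluster on the aligned embeddings, then test for marker genes per cluster) can be sketched in base R on simulated data. Everything here is an assumption for illustration: the embedding is random rather than scAlign output, k-means stands in for Seurat's clustering, and a Wilcoxon rank-sum test stands in for its marker detection.

```r
set.seed(3)
n_cells <- 60
true_type <- rep(c("typeA", "typeB"), each = n_cells / 2)

# Simulated aligned embedding (cells x latent dimensions): the two cell
# types are offset from each other in every dimension.
embedding <- matrix(rnorm(n_cells * 4), nrow = n_cells) +
  ifelse(true_type == "typeA", 0, 8)

# Simulated expression (cells x genes); only gene1 differs between types.
expr <- matrix(rnorm(n_cells * 3), nrow = n_cells,
               dimnames = list(NULL, c("gene1", "gene2", "gene3")))
expr[true_type == "typeB", "gene1"] <- expr[true_type == "typeB", "gene1"] + 5

# Step 1: cluster cells on the embedding, not on raw expression.
km <- kmeans(embedding, centers = 2, nstart = 10)
cluster <- km$cluster

# Step 2: for each gene, compare cluster 1 against all other cells.
in_cluster1 <- cluster == 1
p_values <- apply(expr, 2, function(g) {
  wilcox.test(g[in_cluster1], g[!in_cluster1])$p.value
})

# Step 3: correct for multiple testing and report the strongest marker.
p_adjusted <- p.adjust(p_values, method = "BH")
top_gene <- names(which.min(p_adjusted))
print(round(p_adjusted, 4))
```

With real data, the embedding would come from the aligned object and the cluster labels would be passed on to the DE tool of choice. The key design point is that clustering uses the aligned space while testing uses gene expression.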


vshanka23 commented on June 17, 2024

Dear Nelson,

I agree that I would rather use the aligned data for cell type characterization. To that end, how would one go about clustering the cell embeddings from the scAligned SCE object (is this a matrix inside the SCE object)? Would it be possible to get some sort of mini tutorial on how to accomplish this? Basically, I want to take the aligned SCE object, cluster the cells to characterize cell types, and perform DE for each cell type and across the entire dataset (global). Unfortunately, I am not nearly as familiar with SCE objects as I am with Seurat objects, and I am having difficulty converting this SCE object to Seurat to use their pipeline (the issue in my original post). I will try the custom dimensional reduction storage procedure, but I am unsure how to transfer the multiCCA part to the Seurat object, or how to transfer the decoded data to perform cell clustering and DE.

Any help, insight, or advice you can provide with regard to these procedures will be greatly appreciated!

Thank you.
Vijay


UCDNJJ commented on June 17, 2024

Hi Vijay,

Sorry for the late reply. I think using the cell embedding from scAlign to perform clustering and DE in Seurat is important enough to warrant a new mini tutorial. I'll work on writing that up and let you know when it is available on our github (in a few days most likely)!


UCDNJJ commented on June 17, 2024

Hi Vijay,

I've added a section about clustering and differential expression after running scAlign at the end of the following tutorial:

https://github.com/quon-titative-biology/examples/blob/master/scAlign_multiway_alignment/scAlign_multiway_pancreas.md

Hopefully this provides some guidance for your analysis. Let us know if you encounter any issues with the extended tutorial.
