Hey, how would you recommend handling multiple batches with this normalization scheme? In the past, I've used scTransform, which puts the batch variable into the model itself, so I guess it kind of regresses out the batch effect. That has worked very well for me. The normalization scheme here is simpler and doesn't account for batch effects in the model, so I'm wondering how you recommend dealing with them.
In your paper, in the Cao figure, I noticed you identified batch-specific genes and simply removed them from the dataset. I'm unsure about this: isn't it possible that these genes are biologically relevant? I also noticed that in a comment on your scanpy PR you mention applying this normalization to each batch separately and then concatenating the results. Wouldn't this also be problematic? For instance, say I have two batches containing different cell populations, and one gene is never expressed in the first batch. Its residuals there will always be zero, since the counts never deviate from the fitted mean of zero. In the second batch, say the gene is always expressed. There the residuals should also be centered around zero, since the model mean can fit that expression level. After concatenation, the gene would look uninformative even though it cleanly separates the two populations.
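To make the concern concrete, here's a minimal NumPy sketch. It assumes the normalization is analytic Pearson residuals with a negative binomial null model, mu = (cell depth × gene total) / grand total, and a fixed overdispersion of theta = 100. The function name, the toy data, and the choice to set residuals to 0 where mu = 0 are my own assumptions, not taken from your code, and I've left out clipping.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals under an NB null model (sketch, no clipping)."""
    counts = np.asarray(counts, dtype=float)
    cell_depth = counts.sum(axis=1, keepdims=True)   # total counts per cell
    gene_total = counts.sum(axis=0, keepdims=True)   # total counts per gene
    mu = cell_depth * gene_total / counts.sum()      # expected count per entry
    denom = np.sqrt(mu + mu**2 / theta)              # NB standard deviation
    # Assumption: a gene with no counts at all (mu == 0) gets residual 0
    # instead of the undefined 0/0.
    return np.divide(counts - mu, denom,
                     out=np.zeros_like(counts), where=denom > 0)


# Toy data: 50 background genes plus one marker gene (last column) that is
# absent in batch A and expressed in every population of batch B.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
batch_a = np.hstack([rng.poisson(2.0, size=(n_cells, n_genes)),
                     np.zeros((n_cells, 1), dtype=int)])
batch_b = np.hstack([rng.poisson(2.0, size=(n_cells, n_genes)),
                     rng.poisson(5.0, size=(n_cells, 1))])

# Option 1: normalize each batch separately, then concatenate.
per_batch = np.vstack([pearson_residuals(batch_a), pearson_residuals(batch_b)])
# Option 2: normalize the concatenated data in one go.
joint = pearson_residuals(np.vstack([batch_a, batch_b]))

# Mean residual of the marker gene in each batch under both options.
mean_a_per_batch = per_batch[:n_cells, -1].mean()
mean_b_per_batch = per_batch[n_cells:, -1].mean()
mean_a_joint = joint[:n_cells, -1].mean()
mean_b_joint = joint[n_cells:, -1].mean()

print("per-batch:", mean_a_per_batch, mean_b_per_batch)
print("joint:    ", mean_a_joint, mean_b_joint)
```

If I have this right, the per-batch version leaves the marker gene with a mean residual of about zero in both batches, so the difference between the populations disappears after concatenation. The joint version gives clearly negative residuals in batch A and clearly positive ones in batch B, but then any technical batch effect would be kept as well.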
I'd love to get your feedback regarding this.