Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients
Daniel Spakowicz*1,2,3,4, Shaoke Lou*1, Brian Barron1, Jose L Gomez5, Tianxiao Li1, Qing Liu5, Nicole Grant5, Xiting Yan5, Rebecca Hoyd3, George Weinstock2, Geoffrey L Chupp5, Mark Gerstein1,6,7,8
ABSTRACT Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) can be applied to sputum, but the resulting data can be challenging to interpret because sputum contains a complex and heterogeneous mixture of human cells and exogenous (microbial) material. In this study, we developed a methodology that integrates dimensionality reduction and statistical modeling to address this heterogeneity. The method, called LDA-link, connects microbes to genes using reduced-dimensionality Latent Dirichlet Allocation (LDA) topics. We validated our method with single-cell RNA-seq and microscopy and then applied it to sputum from asthmatic patients to find known and novel relationships between microbes and genes. We expect this method to be broadly useful for making inferences in heterogeneous and noisy RNA-seq datasets.