Researchers interested in RNA expression now have tools to interrogate expression at the single-cell level, providing a window into the fundamental unit of biology. RNA expression of different genes can be captured across cell populations, across spatial locations, and across time; however, extracting accurate and actionable insights from these new data sources requires appropriate statistical analysis. Modeling these data is difficult because of the stochasticity of both biological processes and measurement techniques.
Bayesian probabilistic models are well suited to analyzing these data. As generative models, they account for both the stochastic behavior of biological systems and the measurement error inherent to experimental data. This thesis uses Bayesian models to better understand gene expression in individual cells. I employ three stochastic processes (Dirichlet processes, Gaussian processes, and point processes) as flexible priors for modeling cell behavior in different domains. These models improve on previous methods by quantifying uncertainty, preventing mode collapse, and providing more interpretable parameters.
The following chapters describe the development and application of models for four types of RNA expression data: 1) traditional bulk RNA sequencing (RNA-seq), 2) dissociated cells sequenced by single-cell RNA sequencing (scRNA-seq), 3) in situ single-cell RNA expression, and 4) spatiotemporal gene expression from fluorescence imaging. Each data type requires a statistical model with appropriate assumptions, and in this thesis I propose and evaluate a probabilistic model for each case. I develop a class of deconvolution models to learn cell-type-specific expression from bulk RNA-seq. I use Gaussian processes to perform unsupervised, robust dimension reduction on high-dimensional scRNA-seq data for visualization and regularization, enabling downstream analysis. I show that semi-supervised learning with Gaussian processes is a powerful tool for integrating single-cell data across modalities, yielding new insights from seqFISH+ data combined with traditional RNA-seq. Finally, I use point processes to identify spatial signaling patterns from expression time series. In each case, I demonstrate how my model improves on previous methods and enables new insights into these data. By developing appropriate Bayesian models, this thesis draws new insights from novel experimental RNA expression data, using approaches that can be generalized to future experiments and technologies.