Research Projects

Topic modeling

Christine Kuang

Topic-Sentiment Model with Document-Level Covariates: Text data analysis is becoming increasingly important with the rapid growth of text data. Two methods of text analysis are topical analysis and sentiment analysis. Topical analysis aims to detect the topics covered in a collection of documents. Sentiment analysis aims to detect opinions, feelings, and general sentiments expressed in text. Both have valuable applications in making inferences about social and political cultures, attitudes, and processes. This project proposes a statistical model of text, based on the Structural Topic Model (STM), that detects topic and sentiment simultaneously. The proposed model differs from existing topic-sentiment models in two respects: its data generating process, and its ability to use document-level covariates to improve the estimation of topics and sentiments as well as the resulting inferences.
Evaluation of Text Models: We compare topic-sentiment models to topic models in their ability to capture sentiment information using three metrics: prediction accuracy, stability of features for prediction, and computation time. If topic-sentiment models capture sentiment information better than topic models, then using them for sentiment prediction should yield higher prediction accuracy. We investigate this by comparing the accuracy of the models in predicting the partisanship and tone of political TV ads. Because these three metrics are agnostic to the models compared, we also use them to compare topic models with two other types of text models, Word2vec and Concise Comparative Summaries (CCS), on the same prediction tasks.
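
The three metrics can be illustrated with a minimal sketch. It assumes that "stability of features" is measured as the mean pairwise Jaccard overlap between the feature sets selected on different resamples of the data; this is one common choice and not necessarily the definition used in the project. The function names and toy labels are hypothetical.

```python
import time
from itertools import combinations


def accuracy(y_true, y_pred):
    """Fraction of labels (e.g. partisanship or tone) predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def feature_stability(selected_sets):
    """Mean pairwise Jaccard similarity between selected feature sets.

    Assumption: each element of `selected_sets` holds the features chosen
    by the same model when fit on a different resample of the documents.
    """
    sets = [set(s) for s in selected_sets]
    if len(sets) < 2:
        raise ValueError("need at least two feature sets to compare")
    sims = []
    for a, b in combinations(sets, 2):
        union = a | b
        sims.append(len(a & b) / len(union) if union else 1.0)
    return sum(sims) / len(sims)


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical toy labels: "D"/"R" stand in for ad partisanship.
    acc, seconds = timed(accuracy, ["D", "R", "D", "R"], ["D", "R", "R", "R"])
    stab = feature_stability([{"tax", "jobs"}, {"tax", "war"}])
    print(acc, stab, seconds)
```

Because none of these functions looks inside the model, the same code applies whether the predictions come from a topic model, a topic-sentiment model, Word2vec features, or CCS.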

Causal inference

Rebecca Barter

Acute rejection in kidney transplant patients with HIV: developing strategies for dynamic prediction: Over the past few decades, HIV has evolved from a death sentence to a manageable chronic condition, with highly active antiretroviral therapy (HAART) drastically extending the life expectancy of HIV-positive individuals. As a consequence of prolonged survival, HIV-associated conditions resulting in kidney and liver disease are fueling an increased demand for organ transplantation among HIV-positive patients. While transplantation is proving effective in terms of patient survival, HIV-positive patients exhibit a surprisingly high rate of kidney rejection relative to their HIV-negative counterparts. Together with the Sarwal lab at UCSF, we are developing novel analytic strategies for dynamically predicting and understanding kidney rejection based on 'omics data measured from a range of graft biopsy samples taken over time.

Gene expression study

Karl Kumbier

A fundamental problem in systems biology is to understand interactions between transcription factors (TFs). In this collaboration with Siqi Wu, Antony Joseph, Ann S. Hammonds, William W. Fisher, Richard Weiszmann, Susan E. Celniker, Erwin Frise and Bin Yu, we are working to relate developmental TFs based on Drosophila gene expression images. We combined Nonnegative Matrix Factorization (NMF) with a new stability-based model selection criterion to decompose the expression patterns of all known TFs into a group of data-driven “principal patterns”. Representing the expression patterns in terms of the learned principal patterns is both compact and interpretable. These patterns agree well with known pre-organ and sub-organ regions in early- and later-stage embryos, respectively. Using our principal pattern representations, we construct spatially local TF networks to better understand the interactions that drive the development of different organ systems.
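
The idea of pairing NMF with a stability check can be sketched as follows. This is an illustration under stated assumptions, not the project's actual criterion: NMF is fit with the standard Lee-Seung multiplicative updates for squared error, and stability is taken to be the average best-match cosine similarity between the patterns learned from different random initializations. All function names and the toy matrix are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def recon_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative X (n x m) as W (n x k) times H (k x m).

    Rows of H play the role of the learned patterns; W holds each
    observation's nonnegative coefficients on those patterns.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


def pattern_stability(X, k, n_runs=4):
    """Average best-match cosine similarity of patterns across random restarts."""
    runs = [nmf(X, k, seed=s)[1] for s in range(n_runs)]
    scores = []
    for a in range(n_runs):
        for b in range(a + 1, n_runs):
            best = [max(cosine(p, q) for q in runs[b]) for p in runs[a]]
            scores.append(sum(best) / k)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: each row mixes two non-overlapping "patterns".
    X = [[1, 1, 0, 0], [0, 0, 1, 1], [2, 2, 0, 0], [0, 0, 3, 3], [1, 1, 1, 1]]
    for k in (1, 2, 3):
        print(k, pattern_stability(X, k))
```

Under this assumed criterion, one would prefer a number of patterns k for which the learned patterns are reproducible across restarts, rather than relying on reconstruction error alone.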

Neuroscience: Understanding the Visual Pathway

Reza Abbasi, Yuansi Chen, Adam Bloniarz

The volume and quality of data recorded from the brain are constantly increasing, giving us a better view of mental processes. We collaborate with neuroscience labs, primarily the Gallant lab, to develop methodology for analyzing such data. We focus on understanding human vision by studying the representation of images and videos in the early visual areas. These experiments are great examples of modern statistical work: both the treatment (a video, or sequence of images) and the response (continuous brain scans, or multiple electrodes) are high-dimensional structured objects. We develop principled methods to relate the stimuli and responses for both prediction and interpretation purposes. These include, among others, methods for supervised feature extraction; high-dimensional (and semi-parametric) regression models relating the features to neural activity; and methods to aggregate information across multiple responses. In particular, we analyze several methods based on sparse coding and deep convolutional neural networks for feature extraction from natural images.
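
As one illustration of a high-dimensional regression model relating stimulus features to a neural response, the sketch below fits a lasso by cyclic coordinate descent. The lasso is used here only as a familiar stand-in for sparse regression; the text above does not specify which estimators the group uses, and the function names and toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X: n x p list of rows (stimulus features per time point or trial).
    y: length-n response (e.g. one voxel's or one electrode's activity).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date in place
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


if __name__ == "__main__":
    # Toy example: the response depends only on the first of two features.
    X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
    y = [2, -2, 0, 0]
    print(lasso_cd(X, y, lam=0.1))
```

The L1 penalty sets the coefficients of uninformative features exactly to zero, which is one way a fitted model can serve interpretation as well as prediction when there are far more features than observations.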