Filling in Data Holes to Tackle Tough Computational Problems - PSU Institute for Computational and Data Sciences

Research conducted by:

Hyungsuk Tak, Assistant Professor of Astronomy and Astrophysics and Statistics, and ICDS Co-Hire

Tags:

data science

Research Summary:

Like trying to complete a connect-the-dots picture with a few dots missing, missing data can make computational problems intractable for scientists. Data scientists often handle this with a technique called data augmentation, which adds latent missing data to the model so that the computation becomes simpler. In this study, researchers created a framework for transformation-based data augmentation, called data transforming augmentation (DTA), that can turn heteroscedastic models (in which data points scatter unequally) into homoscedastic ones (in which the scatter is uniform). This could help scientists solve very difficult computational problems more efficiently.
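To make "augmenting latent missing data" concrete, here is a minimal sketch of an EM algorithm for a normal hierarchical model with known but unequal measurement variances V_i. It is an illustrative assumption, not code from the study. Each unobserved mean θ_i is treated as missing data: the E-step fills in its conditional mean and variance, and the M-step then reduces to simple closed-form estimates of μ and A. The model, the function name `em_da`, and the data values are hypothetical.

```python
def em_da(y, V, mu=0.0, A=1.0, n_iter=1000):
    """EM estimates of (mu, A), treating each theta_i as missing data.

    Assumed model (illustrative, not the study's DTA scheme):
        y_i | theta_i ~ N(theta_i, V_i)   with known, unequal V_i
        theta_i       ~ N(mu, A)          with unknown mu and A > 0
    """
    n = len(y)
    for _ in range(n_iter):
        # E-step: posterior mean m_i and variance s_i of each latent theta_i.
        m = [(A * yi + vi * mu) / (A + vi) for yi, vi in zip(y, V)]
        s = [A * vi / (A + vi) for vi in V]
        # M-step: complete-data MLEs with the moments of theta_i filled in.
        mu = sum(m) / n
        A = sum((mi - mu) ** 2 + si for mi, si in zip(m, s)) / n
    return mu, A


if __name__ == "__main__":
    y = [-2.0, 0.5, 3.0, 1.0, 4.5]  # hypothetical observations
    V = [0.5, 1.0, 2.0, 0.25, 1.5]  # hypothetical known variances
    print(em_da(y, V))
```

At convergence the estimates satisfy the score equations of the marginal model y_i ~ N(μ, A + V_i), whose variances A + V_i differ across observations; that unequal scatter is the heteroscedasticity described above. The study's DTA scheme, which transforms such models into homoscedastic ones to speed up iterative algorithms, is not reproduced here.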

How Roar played a role in this research:

"It was computationally too heavy to implement the proposed model on a personal laptop (>24 hours). Roar enabled its implementation in parallel under various simulation settings." - Hyungsuk Tak

Publication Details
Article Title:	Data transforming augmentation for heteroscedastic models
Published In:	Journal of Computational and Graphical Statistics
Abstract:	Data augmentation (DA) turns seemingly intractable computational problems into simple ones by augmenting latent missing data. In addition to computational simplicity, it is now well-established that DA equipped with a deterministic transformation can improve the convergence speed of iterative algorithms such as an EM algorithm or Gibbs sampler. In this article, we outline a framework for the transformation-based DA, which we call data transforming augmentation (DTA), allowing augmented data to be a deterministic function of latent and observed data, and unknown parameters. Under this framework, we investigate a novel DTA scheme that turns heteroscedastic models into homoscedastic ones to take advantage of simpler computations typically available in homoscedastic cases. Applying this DTA scheme to fitting linear mixed models, we demonstrate simpler computations and faster convergence rates of resulting iterative algorithms, compared with those under a non-transformation-based DA scheme. We also fit a Beta-Binomial model using the proposed DTA scheme, which enables sampling approximate marginal posterior distributions that are available only under homoscedasticity.
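The abstract contrasts DTA with a non-transformation-based DA scheme. A Gibbs sampler of that plain kind can be sketched as follows, under assumptions not taken from the paper: y_i | θ_i ~ N(θ_i, V_i) with known, unequal V_i; θ_i | μ ~ N(μ, A) with known A; and a flat prior on μ. The latent θ_i play the role of augmented missing data. The function name `gibbs_da` and all numbers are hypothetical, and the paper's DTA scheme itself is not implemented here.

```python
import math
import random


def gibbs_da(y, V, A, n_iter=20000, burn_in=1000, seed=1):
    """Plain data-augmentation Gibbs sampler for mu (illustrative sketch).

    Assumed model (not the paper's DTA scheme):
        y_i | theta_i ~ N(theta_i, V_i)   with known, unequal V_i
        theta_i | mu  ~ N(mu, A)          with known A
        flat prior on mu
    """
    rng = random.Random(seed)
    n = len(y)
    mu = sum(y) / n  # start the chain at the unweighted mean
    draws = []
    for t in range(burn_in + n_iter):
        # Step 1: draw each latent theta_i from its normal full conditional.
        theta = []
        for yi, vi in zip(y, V):
            prec = 1.0 / vi + 1.0 / A
            cond_mean = (yi / vi + mu / A) / prec
            theta.append(rng.gauss(cond_mean, math.sqrt(1.0 / prec)))
        # Step 2: given the augmented theta, mu | theta ~ N(mean(theta), A / n).
        mu = rng.gauss(sum(theta) / n, math.sqrt(A / n))
        if t >= burn_in:
            draws.append(mu)
    return draws


if __name__ == "__main__":
    y = [1.0, 2.0, 0.5, 3.0, 1.5]  # hypothetical observations
    V = [0.5, 1.0, 2.0, 0.25, 1.5]  # hypothetical known variances
    A = 1.0
    draws = gibbs_da(y, V, A)
    # Exact check: marginally y_i ~ N(mu, A + V_i), so mu | y is normal.
    w = [1.0 / (A + v) for v in V]
    exact_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    print("Gibbs mean:", sum(draws) / len(draws), "exact:", exact_mean)
```

Because marginally y_i ~ N(μ, A + V_i), the exact posterior of μ is normal with mean Σ w_i y_i / Σ w_i and variance 1 / Σ w_i, where w_i = 1/(A + V_i), so the draws can be checked directly. The unequal V_i give every θ_i a different conditional variance; removing that kind of heteroscedastic structure is what the DTA scheme in the abstract is aimed at.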