Deep Learning Approximation to Intractable Likelihood Functions (Faculty/Junior Researcher Collaboration Opportunity)

PI: Hyungsuk Tak (Statistics, Astronomy and Astrophysics)

Graduate students in Statistics can be supported by a half TA-ship that covers tuition. Tuition for students from other departments, as well as the remainder of a postdoc’s salary, may be supported by the PI’s pending NSF grant proposal, if awarded.

Description: We propose a project to investigate the advantages and limitations of using deep neural networks to approximate intractable or computationally expensive likelihood functions, in comparison to Approximate Bayesian Computation (ABC) and variational inference. In statistics, evaluating a likelihood function can be prohibitively expensive when its cost scales poorly with data size, or even infeasible when the function involves intractable integrals. ABC and variational inference have been widely used to approximate such likelihoods or to bypass their direct evaluation [1, 2].

However, both approaches have shortcomings. ABC is impractical for high-dimensional problems because no universally applicable decision rule exists in such cases. For example, identifying appropriate sufficient statistics is difficult unless the model is canonical; even when high-dimensional sufficient statistics are available, tuning the epsilon tolerance used to accept proposals drawn from the prior distribution is nontrivial.
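
To make the tuning difficulty concrete, below is a minimal rejection-ABC sketch on a toy Gaussian model. The simulator, the sample mean as summary statistic, and the tolerance eps are illustrative assumptions, not choices prescribed by the project.

```python
# Minimal rejection-ABC sketch on a toy model (illustrative only).
# The simulator, summary statistic, and tolerance below are assumptions
# for demonstration; the project's target models are far more complex.
import numpy as np

rng = np.random.default_rng(0)
x_obs = rng.normal(loc=2.0, scale=1.0, size=100)  # "observed" data
s_obs = x_obs.mean()                              # summary statistic

def simulate(theta, n=100):
    """Draw a dataset from the model given parameter theta."""
    return rng.normal(loc=theta, scale=1.0, size=n)

eps = 0.05          # tolerance: too large biases the posterior,
                    # too small rejects nearly every proposal
accepted = []
for _ in range(20000):
    theta = rng.normal(0.0, 5.0)                  # draw from the prior
    s_sim = simulate(theta).mean()                # summarize the simulation
    if abs(s_sim - s_obs) < eps:                  # accept if close enough
        accepted.append(theta)

print(f"acceptance rate: {len(accepted) / 20000:.3%}")
print(f"posterior mean estimate: {np.mean(accepted):.3f}")
```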

Variational inference is advantageous in high-dimensional settings because it adopts a density function that is “closest” to the target density with respect to the Kullback–Leibler divergence, selected from a family of easy-to-compute densities. However, designing such a family is a well-known challenge, and there is no guarantee that the family includes the target density. Moreover, variational inference lacks a theoretical guarantee of convergence to the target density as data size increases, since the method only finds the closest density within the chosen family.
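
For comparison, here is a minimal variational inference sketch in PyTorch: a diagonal (mean-field) Gaussian family is fit to a toy unnormalized target by minimizing the Kullback–Leibler divergence via the reparameterization trick. The target density, the family, and the optimizer settings are assumptions chosen for illustration.

```python
# Minimal mean-field variational inference sketch (illustrative only).
# We fit a diagonal Gaussian q to an unnormalized target by minimizing
# KL(q || p) with the reparameterization trick.
import math
import torch

torch.manual_seed(0)

def log_target(z):
    """Unnormalized log density of a toy 2-D correlated Gaussian target."""
    prec = torch.tensor([[2.0, -1.0], [-1.0, 2.0]])
    return -0.5 * torch.einsum("bi,ij,bj->b", z, prec, z)

# Variational parameters of the diagonal Gaussian family.
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(2000):
    eps = torch.randn(256, 2)                    # reparameterization trick
    z = mu + torch.exp(log_sigma) * eps          # samples from q
    log_q = (-0.5 * eps.pow(2) - log_sigma
             - 0.5 * math.log(2 * math.pi)).sum(dim=1)
    kl = (log_q - log_target(z)).mean()          # KL(q || p) up to a constant
    opt.zero_grad()
    kl.backward()
    opt.step()

print("fitted mean:", mu.detach())
print("fitted std:", torch.exp(log_sigma).detach())
```

Because the diagonal family cannot represent the target’s correlation, the fitted density is only the closest member of the chosen family, which is exactly the limitation described above.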

Recent advances in machine learning show promising results in approximating intractable or expensive likelihood functions using deep neural networks [3, 4, 5]. Like ABC, this approach is simulation-based but does not require knowledge of sufficient statistics. Its most attractive feature is the ability to approximate any probability density function without evaluating the target density, provided that simulated datasets can be generated from the probabilistic model. Thus, this neural network-based method avoids the key challenges associated with ABC. Furthermore, while it shares the optimization-based nature of variational inference, it does not require constructing a candidate family of densities and exhibits large-sample convergence.
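
As a concrete instance of an easy-to-simulate model whose density is intractable, the sketch below uses the g-and-k distribution, a standard test case in simulation-based inference; the parameter values and sample sizes are arbitrary illustrations, not the models targeted by this project.

```python
# Sketch of a simulator whose likelihood is intractable (illustrative).
# The g-and-k distribution is a standard test case: sampling is trivial
# via its quantile function, but the density has no closed form.
import numpy as np

def simulate_gk(a, b, g, k, n, rng, c=0.8):
    """Draw n samples from a g-and-k distribution via inverse transform."""
    z = rng.standard_normal(n)                    # z = Phi^{-1}(U)
    tanh_term = (1 - np.exp(-g * z)) / (1 + np.exp(-g * z))
    return a + b * (1 + c * tanh_term) * (1 + z**2) ** k * z

rng = np.random.default_rng(1)
datasets = [simulate_gk(a=3.0, b=1.0, g=2.0, k=0.5, n=500, rng=rng)
            for _ in range(100)]                  # simulation-based training set
print(len(datasets), datasets[0].shape)
```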

The goal of this project is to evaluate the strengths and weaknesses of the deep learning approach in comparison to ABC and variational inference. Specifically, we aim to assess how efficiently and accurately it approximates the target density as model dimensionality or data size increases, and to quantify potential gains in computational speed.
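
One possible accuracy check, sketched below under the assumption that a toy model with a tractable likelihood serves as ground truth: compare an approximate log-likelihood to the exact one over a parameter grid. The stand-in approximation here is a placeholder, not any of the methods under study.

```python
# Sketch of one possible accuracy check (illustrative): on a toy model
# whose likelihood IS tractable, compare an approximation to the exact
# log likelihood on a parameter grid and report the mean absolute error.
import numpy as np
from scipy.stats import norm

x_obs = np.random.default_rng(2).normal(2.0, 1.0, size=100)
grid = np.linspace(0.0, 4.0, 81)                  # grid of theta values
exact = np.array([norm.logpdf(x_obs, loc=t, scale=1.0).sum() for t in grid])

def approx_loglik(theta):
    """Placeholder for a learned approximation (here: a crude Gaussian fit)."""
    return norm.logpdf(x_obs, loc=theta, scale=x_obs.std()).sum()

approx = np.array([approx_loglik(t) for t in grid])
print("mean absolute error:", np.abs(exact - approx).mean())
```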

Expertise and Skill Sets of Interest: The project requires (1) proficiency in Python, (2) experience using PyTorch or TensorFlow to implement convolutional, recurrent, or transformer-based neural network techniques, and (3) knowledge of high-performance computing (ICDS Roar Collab).

Expectations and Tasks: We seek a post-comprehensive-exam graduate student or postdoctoral researcher with training in Statistics or a related field (e.g., Applied Mathematics, Computer Science, or Information Sciences and Technology) to: (1) generate simulated datasets from a model with an intractable or computationally expensive likelihood function, (2) fit suitable neural network models to each simulated dataset to produce one output feature vector per simulation, and (3) conduct a binary neural network classification using the resulting feature vectors as a training set, with binary labels designed to yield an approximate likelihood function.
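
A minimal sketch of how tasks (1)–(3) could fit together, assuming the classifier-based likelihood-ratio formulation common in simulation-based inference: a network trained to separate joint (parameter, feature) pairs from shuffled (marginal) pairs has a logit that approximates the log likelihood-to-evidence ratio. The toy simulator, placeholder feature map, and architecture below are illustrative assumptions, not the project's specification.

```python
# Sketch of the binary-classification step, assuming the standard
# likelihood-ratio trick from simulation-based inference: a classifier
# trained to separate joint (theta, features(x)) pairs from shuffled
# (marginal) pairs has a logit approximating log p(x|theta)/p(x).
import torch
import torch.nn as nn

torch.manual_seed(0)

def simulate(theta, n=100):
    """Toy simulator standing in for an intractable-likelihood model."""
    return theta + torch.randn(theta.shape[0], n)

def features(x):
    """Placeholder feature vector (task 2 would use a trained network)."""
    return torch.stack([x.mean(dim=1), x.std(dim=1)], dim=1)

theta = torch.randn(4096, 1) * 3.0                # draws from the prior
f = features(simulate(theta))                     # per-simulation features

# Joint pairs (label 1) vs. marginal pairs with shuffled theta (label 0).
perm = torch.randperm(theta.shape[0])
inputs = torch.cat([torch.cat([theta, f], dim=1),
                    torch.cat([theta[perm], f], dim=1)])
labels = torch.cat([torch.ones(4096), torch.zeros(4096)])

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(net(inputs).squeeze(1), labels)
    loss.backward()
    opt.step()

# The trained logit of net(theta, features(x)) now approximates the
# log likelihood-to-evidence ratio, usable in place of log p(x|theta).
```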

Level of Effort: 50% RA-ship for one semester during the academic year 2025–26. This project cannot cover tuition, so graduate students will need an additional funding source for tuition costs. Alternatively, the project may be pursued as a primary effort during summer 2026 (100% summer RA-ship), which does not require tuition coverage.

Outcomes: Successfully completing the tasks outlined above will make a significant contribution to Dr. Tak’s research group. The results will provide a foundation for various applications in astronomy, such as (1) modeling multi-band time-series data and (2) estimating black hole masses via reverberation mapping. They may also serve as preliminary results for external funding proposals to agencies such as NASA ROSES or NSF AAG.

Principal Investigator: Dr. Hyungsuk Tak (Statistics, Astronomy & Astrophysics, ICDS)

Relevant ICDS Hub: Data Sciences

Relevant ICDS Centers: Center for Astrostatistics & Astroinformatics

Connection to ICDS: This project addresses computational challenges in likelihood-based inference, particularly as more complex models lead to increasingly intractable likelihood functions and larger datasets make likelihood evaluations more expensive. The project requires intensive multicore computation for training neural networks and substantial storage for simulated datasets. ICDS Roar Collab will be the primary platform for computation and storage. If the project achieves results warranting external funding, ICDS computational resources will be included in future funding proposals.