Reverberation Mapping Time Lag Estimation via Deep Learning Neural Networks
PIs: Hyungsuk Tak (Statistics; Astronomy & Astrophysics) and William Nielsen Brandt (Astronomy & Astrophysics)
Supermassive black holes (SMBHs), among the Universe’s most extreme objects, are fueled by mass accretion and exhibit luminosity variability across the electromagnetic spectrum. This variability enables their growth to be tracked over vast cosmological distances, shedding light on the evolution of the Universe. Black hole mass is fundamental to understanding SMBH-galaxy coevolution, and reverberation mapping provides a reliable mass-measurement method by estimating time lags between multi-band time series of active galactic nuclei; see Figure 1 for an illustration of the time lag.
One of the most popular model-based methods for inferring reverberation mapping time lags is JAVELIN [1, 2]. However, it has not been updated since 2016 and lacks recent advances in modeling and computational techniques. To address this, the main stochastic driver of JAVELIN should be upgraded from the current, simplest Ornstein-Uhlenbeck process to the more general continuous-time autoregressive moving average (CARMA) processes [3], which represent a major advance in modeling complex astronomical time series data. Computational challenges arise because the likelihood function of the advanced model contains intractable integrals. Preliminary work has shown that traditional statistical model-fitting tools, such as Markov chain Monte Carlo methods or nested sampling, are infeasible due to the intractable likelihood function. However, recent advances in machine learning demonstrate promising results in approximating such intractable likelihood functions using deep learning neural networks, e.g., recurrent neural networks or transformer-based neural networks [4, 5, 6]. Thus, the primary goal of this project is to implement these deep learning neural network methods to estimate time lags in reverberation mapping by fitting the advanced physical model to multi-band time series data.
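To make the setup concrete, the sketch below simulates JAVELIN's current driver, a discretized Ornstein-Uhlenbeck (damped random walk) process, and produces a second light curve that is simply a delayed copy of the driver, mimicking the lag to be recovered. All parameter values (timescale `tau`, amplitude `sigma`, a 30-step lag) are illustrative assumptions, not values from the project; a real analysis would use a full transfer function and irregular sampling.

```python
import math
import random

def simulate_ou(n, dt=1.0, tau=100.0, sigma=0.2, mean=0.0, seed=42):
    """Exact discretization of an Ornstein-Uhlenbeck process:
    x_{t+dt} = mean + (x_t - mean) * a + Gaussian noise, with a = exp(-dt/tau)."""
    rng = random.Random(seed)
    a = math.exp(-dt / tau)
    sd = sigma * math.sqrt(tau / 2.0)          # stationary standard deviation
    noise_sd = sd * math.sqrt(1.0 - a * a)     # keeps the process stationary
    x = [rng.gauss(mean, sd)]
    for _ in range(n - 1):
        x.append(mean + (x[-1] - mean) * a + rng.gauss(0.0, noise_sd))
    return x

def lagged_response(driver, lag_steps, scale=1.0):
    # Toy emission-line curve: a shifted, scaled copy of the continuum driver
    return [scale * driver[max(0, t - lag_steps)] for t in range(len(driver))]

continuum = simulate_ou(500)
line = lagged_response(continuum, lag_steps=30)
```

Recovering `lag_steps` from noisy, irregularly sampled versions of `continuum` and `line` is the inference problem; the CARMA generalization replaces `simulate_ou` with a higher-order stochastic process.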
Expertise and Skill Sets of Interest:
The project requires (1) proficiency in Python, (2) experience using PyTorch or TensorFlow to implement recurrent neural networks with LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) cells, or transformer-based neural networks, and (3) experience with high-performance computing systems such as ICDS Roar Collab.
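In practice the recurrent models would be built with `torch.nn.GRU` or `torch.nn.LSTM`; purely to illustrate the recurrence these layers implement, here is a stdlib-only scalar GRU cell with hypothetical, untrained weights. The final hidden state plays the role of the learned feature summarizing a time series.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU update for scalar input x and scalar hidden state h.
    p holds the (hypothetical, untrained) scalar weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])          # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])          # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_tilde                        # gated mixture

def gru_feature(series, p):
    # Run the cell over the sequence; the final state is the feature
    h = 0.0
    for x in series:
        h = gru_step(x, h, p)
    return h

params = {"wz": 0.5, "uz": -0.3, "bz": 0.0,
          "wr": 0.8, "ur": 0.1, "br": 0.0,
          "wh": 1.0, "uh": 0.4, "bh": 0.0}
feature = gru_feature([0.2, -0.1, 0.4, 0.0], params)
```

A production model would use vector-valued states, trained weights, and batched sequences, but the per-step logic is the same.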
Expectations and Tasks:
The project expects a post-comps graduate student or a postdoc with training in Statistics, Astronomy & Astrophysics, or a related field (e.g., Applied Math, Computer Science, IST, Physics) to (1) generate simulated data sets from the continuous-time autoregressive moving average process using one of the publicly available Python (or R) code packages, (2) fit recurrent or transformer-based neural network models to each simulated time series to produce a feature vector per simulation, and (3) conduct binary neural network classification using the resulting feature vectors as a training set with binary labels designed to yield an approximate likelihood function.
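Task (3) resembles the classifier-based "ratio trick" from simulation-based inference: a binary classifier trained to separate two simulated classes implicitly encodes their likelihood ratio, since the Bayes-optimal classifier output d(x) satisfies d/(1 - d) = p1(x)/p0(x). The toy sketch below verifies that identity with two Gaussians standing in for the two simulation classes; the Gaussian setup and all parameter values are illustrative assumptions, not the project's actual model.

```python
import math

def gauss_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def optimal_classifier(x, mu1, mu0, sd=1.0):
    # Bayes-optimal probability that x came from class 1 (equal priors)
    p1, p0 = gauss_pdf(x, mu1, sd), gauss_pdf(x, mu0, sd)
    return p1 / (p1 + p0)

def likelihood_ratio_from_classifier(d):
    # Ratio trick: r(x) = d(x) / (1 - d(x))
    return d / (1.0 - d)

x = 0.7
d = optimal_classifier(x, mu1=1.0, mu0=0.0)
approx = likelihood_ratio_from_classifier(d)
exact = gauss_pdf(x, 1.0, 1.0) / gauss_pdf(x, 0.0, 1.0)
```

In the project, a neural network classifier trained on the feature vectors from task (2) would replace `optimal_classifier`, yielding an approximate likelihood over time-lag parameters.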
Level of Effort:
50% RA-ship for one semester during the academic year 2025–26. This project cannot cover tuition, so graduate students will need another source of funding to cover tuition costs. Alternatively, it can be a primary project over the summer of 2026 (100% summer RA-ship), which does not require tuition.
Outcomes:
Successfully completing the three tasks specified above will significantly contribute to the research of Dr. Tak’s and Dr. Brandt’s groups, and the results will form the basis for a scientific article or a funding proposal to external agencies, such as NASA ROSES or NSF AAG.
Relevant ICDS Hub: Data Sciences
Relevant ICDS Centers: Center for Astrostatistics & Astroinformatics
Connection to ICDS: This interdisciplinary project, at the interface of statistics and astronomy, addresses computational challenges in astronomical data analysis using advanced machine learning techniques. Conventional statistical model-fitting methods, such as Markov chain Monte Carlo and nested sampling, are not feasible for this work. The project also requires intensive multicore computation for training neural networks and substantial storage for numerous simulated datasets. Therefore, ICDS Roar Collab will serve as the primary source of computational and storage resources. If the project makes significant progress that warrants external funding, ICDS computational resources will be included in the corresponding budget proposals.