Click any of the Data Science proposal summaries below for more information and to apply as a Junior Researcher.
Your deadline to apply is June 16.
Normalizing flows for Bayesian Model Comparison: Detecting Extrasolar Planets (Eric Ford)
This project compares the robustness and efficiency of different computational methods for performing Bayesian uncertainty quantification and model comparison to improve the sensitivity and robustness of surveys to discover and characterize low-mass planets. (Learn more and apply)
Geodetic inversion and optimization using physics-based FEM models and AI (Christelle Wauthier)
We will develop and apply AI and computational modeling methods to volcanic processes that will have broader impacts on forecasting. (Learn more and apply)
Forecasting volcanic eruptions using data fusion (Christelle Wauthier)
We will develop and apply data science and AI methods to volcanic hazard processes, with the aim of improving eruption forecasting globally. (Learn more and apply)
Reverberation Mapping Time Lag Estimation via Deep Learning Neural Networks (Hyungsuk Tak)
Supermassive black holes, among the Universe’s most extreme objects, are fueled by mass accretion and exhibit variability in luminosity across the electromagnetic spectrum. This variability enables their growth to be tracked over vast cosmological distances, influencing the evolution of the Universe. (Learn more and apply)
Learning Linear Temporal Logic under Uncertainty for Sustainable Behavioral Change Interventions (Romulo Meira Goes)
This interdisciplinary project combines artificial intelligence and data science to address an important problem in the field of behavioral change: Which behavior patterns explain behavioral change? (Learn more and apply)
Inference on Multivariate Gaussian Processes via Deep Learning Neural Networks for Astronomical Time Series Data Analysis (Hyungsuk Tak, Eric Ford)
This project in astrostatistics and astroinformatics is inherently interdisciplinary, situated at the intersection of statistics and astronomy. It addresses the broader methodological challenge of evaluating costly Gaussian process likelihoods, with substantial implications for astronomical multi-band time series analysis. The project also requires intensive multi-core computing for training neural networks and significant data storage for simulated datasets. (Learn more and apply)
Using Artificial Intelligence (AI) to Understand Neural and Behavioral Variability (Xiao Liu)
We will develop and apply state-of-the-art AI models to understand brain functions. The project also aims to understand artificial neural networks (ANNs) from the perspective of brain science. (Learn more and apply)
Dynamically Adjustable Queue to Optimize the Roar GPU Cluster (Guido Cervone)
The goal of this research is to optimize the queue for the Roar GPU cluster. (Learn more and apply)
Advancing Air Pollution Exposure Assessment with Machine Learning Techniques (Xi Gong)
We will develop and apply data science and ML/AI methods to environmental health science to advance understanding, response, and mitigation of air pollution’s adverse health effects. (Learn more and apply)
Quantum Algorithms and Quantum Enhanced Machine Learning for Transient Simulations in Large Scale Nonlinear Dynamical Systems (Xiantao Li, Yan Li)
By reformulating classical numerical algorithms as quantum workflows and benchmarking them on state-of-the-art IBM, QuEra, and IQM hardware, we leverage the very “advanced computational and data science approaches” ICDS seeks to promote. (Learn more and apply)
Informing the detection of flash drought events by mining and modeling media reports (Antonia Hadjimichael)
The project directly supports ICDS’s mission to advance computational and data science approaches to pressing societal challenges, by demonstrating the value of integrating diverse data sources for the detection of hydroclimatic hazards. (Learn more and apply)
Fire and climate change impacts in a tropical biodiversity hotspot (Rwenzori Mtns, Uganda): remote sensing to understand abrupt ecosystem change (Sarah Ivory)
This project will use remote sensing data (primarily Landsat, MODIS, ASTER) to reconstruct fire burned areas on a remote mountain. (Learn more and apply)
A tale of two [equatorial] mountains: state-space modeling of tropical plant communities from fossil data (Sarah Ivory)
In this project, we seek to use a community modeling approach, state-space modeling, to attribute climate drivers to ecosystem change on two equatorial African mountains in the past using fossil information. (Learn more and apply)
Development of an AI image classifier for detecting vulnerability of African ecosystems under changing climates using ancient data (Sarah Ivory)
In this project, we seek to develop a proof of concept for AI image classification of 4 African pollen taxa common in the fossil record. (Learn more and apply)
Development of a Web-Based Platform for Structured CryoEM Data Collection and Metadata Management (Jean-Paul Armache)
This project aims to develop a secure, user-friendly web-based platform to collect, store, and manage cryoEM data collection parameters in a complementary automated and manual approach. (Learn more and apply)
Reduced order modeling for supersonic and hypersonic aerodynamic flows via probabilistic machine learning (Ashwin Renganathan)
We will develop probabilistic AI/ML methods to reduce, interpret, and learn data. This project will include both large-scale data generation by running finite-volume based multiphysics codes on Roar Collab, as well as developing AI/ML methods on that data with GPU acceleration. (Learn more and apply)
Computational Mapping of Alternative Dispute Resolution Institutions (Cyanne Loyle)
By integrating social science expertise with computational tools, the project will provide foundational data for the social science community in the area of conflict management and peace building. (Learn more and apply)
Develop machine learning models to study cell-type-specific aging using single-cell methylation data in the Uzun Lab (Yasin Uzun)
Our goal is to develop a deep learning-based framework to predict cell-type-specific epigenetic age using single-cell methylation data. (Learn more and apply)
Development of standardized file format to maximize data shareability across disciplines (Jean-Paul Armache)
In this proposal, we intend to establish a standardized file format designed for data sharing in reviews or as project summaries. (Learn more and apply)
Spinal Fatigue Prediction in High-G Environments Using Human Digital Twins (Reuben Kraft)
This project develops a digital twin framework to evaluate spinal fatigue in pilots subjected to high G acceleration. (Learn more and apply)
Resource Request for Junior Researchers (Systems Engineering) Support for Social Sciences Research Computing (Lindsay Wells)
This proposal requests Junior Researchers to assist the ICDS Special Projects Team with the design and development of research computing solutions tailored to the needs of social science researchers. (Learn more and apply)
Integrating marine geochemistry, physical dynamics, and volcanology in the geological record: an Oceanic Anoxic Event 2 case study (Isabel Fendley)
For this project, the junior researcher will develop a computational framework to integrate geochemical models for each of the key proxies (Hg, Os, and Sr). (Learn more and apply)
Improving economic outcomes via AI-powered bank monitoring and risk management (Nonna Sorokina)
By integrating expertise in finance, economics, regulatory policy, and artificial intelligence, the initiative aims to build an AI-powered monitoring framework for banking risk management—particularly vital in today’s volatile interest rate environment. (Learn more and apply)
De-risking the commercialization of advanced nuclear reactors through innovative financing vehicles (Nonna Sorokina)
By developing innovative financial mechanisms—including pooled investment models, securitization strategies, and CDS-like instruments—this research synthesizes technical reactor design considerations with sophisticated computational modeling of risk and return. (Learn more and apply)
Your next-door neighbor, nuclear reactor: real estate and societal readiness (Nonna Sorokina)
By examining real estate dynamics around nuclear power plants and incorporating novel measures of public sentiment and societal readiness, the research brings together expertise from economics, urban planning, nuclear engineering, and computational social science. (Learn more and apply)
Developing Workforce-Informed Digital Twins for Smart Redevelopment Site Classification (Yuqing Hu)
This project addresses that gap by developing a graph-based digital twin framework to classify and prioritize redevelopment sites based on workforce and infrastructure readiness. (Learn more and apply)
Deep Learning Approximation to Intractable Likelihood Functions (Hyungsuk Tak)
We propose a project to investigate the advantages and limitations of using deep neural networks to approximate intractable or computationally expensive likelihood functions, in comparison to Approximate Bayesian Computation (ABC) and variational inference. (Learn more and apply)
Privacy-Preserving Linear Regression and Synthetic Data for Reproducible Social Science Research (Aleksandra Slavkovic)
This project aims to develop a novel method for differentially private (DP) linear regression that enables valid statistical inference and supports synthetic data generation. (Learn more and apply)
Volcanic (LIP) gas fluxes in geological history using geochemical models (Isabel Fendley)
The key goals of this project are a) to finalize the framework for data-model comparison (e.g., evaluate parameter choices, test various metrics for statistically comparing records), b) to optimize the Earth system and Hg cycle code, as well as the parameter sampling in the Bayesian framework, for computational efficiency, and c) to set up and run the model inversion on the Roar Collab Cluster. (Learn more and apply)
Linking Multidimensional Sleep Health to Cognitive Function in Older Adults Using Machine Learning (Sayed Reza)
This project will evaluate the relationship between sleep health and cognitive function in older adults by leveraging wearable device time series data and applying interpretable AI/ML techniques. (Learn more and apply)
Non-Invasive Turkey Body Weight Monitoring and Prediction via Deep Visual Time Series Analysis (Enrico Casella)
This project aims to develop a novel hybrid deep learning model that leverages longitudinal visual data, potentially combined with historical flock-level time series information, to estimate current body weight, predict future body weight trajectories, and ultimately forecast final carcass weight in turkeys. (Learn more and apply)
Identify the causes of the signal-to-noise paradox in the North Atlantic Oscillation (Laifang Li)
The project will use advanced data analysis techniques to address one of the most challenging Earth system predictability issues in the climate community. (Learn more and apply)
Test implementation of a distributed database system for privacy-preserving data analysis (Tim Brick)
The goal of this project is to develop an initial test implementation of a distributed database system for privacy-preserving data analysis in the behavioral sciences. (Learn more and apply)
Web-based Model Visualization / Creation for Structural Equation Models (Tim Brick)
The goal of this project is to develop an initial implementation of a user interface to specify complex models in the extended Structural Equation Modeling (xSEM) framework. (Learn more and apply)
Address the impacts of changing ocean circulation on US hydroclimate (Laifang Li)
This project aims to answer climate system questions by combining global climate model output from the North Atlantic Hosing Model Intercomparison Project with numerical downscaling using regional climate models. (Learn more and apply)
Assessment of Geological CO2 Storage and Geothermal Resources in the Appalachian Basin and Globally (John Wang)
This interdisciplinary research project aims to assess the viability and sustainability of geological CO2 storage and geothermal energy resources in the Appalachian Basin and globally, with a strong focus on applying data science methods to climate and energy challenges. (Learn more and apply)
Classifying Weakly Detected Gamma-ray Transients (James DeLaunay)
This project will consist of finding the optimal way to perform classification on these weakly detected gamma-ray transients, by exploring different AI techniques, inputs, and training data. (Learn more and apply)
Mapping Language Model Failures Through Community Experience: A Study of Multilingual Researchers (Dana Calacci)
This project investigates how English as a Second Language (ESL) graduate students interact with Large Language Models (LLMs) like ChatGPT, focusing on how language proficiency shapes their experience of model failures, biases, and harms. (Learn more and apply)
Protein Misfolding, Mutations and the Emergence of Disease Phenotypes (Hyebin Song)
This project aims to identify and rank proteins containing structural motifs known as “non-covalent lasso entanglements” and assess their association with disease phenotypes. (Learn more and apply)
Topological Data Analysis for the Quantification of Prostate Cancer Heterogeneity (Justin D Silverman)
This project will develop computational tools to quantify the 3D morphology of prostate cancer glands, supplementing and potentially improving upon traditional tumor grading systems. (Learn more and apply)
An improved pipeline to detect astrophysical transients in Atacama Cosmology Telescope time-resolved survey data (Charlotte Ward)
In this project, the junior researcher will build an improved pipeline to extract light curves of low flux sources from multi-epoch imaging from the Atacama Cosmology Telescope. (Learn more and apply)
Analyzing Human and Social Dynamics Through Social Sensing (Xi Gong)
This project aims to expand the current study, which uses social sensing to understand spatial social networks and public perspectives on controversial social topics, while also exploring ways to address the challenges inherent in social sensing research. (Learn more and apply)
Enhancing Road Safety Through Real-Time AI-Powered Drowsiness Detection and Alert System Using EEG Eye-Blink Artifacts (Daniel Otchere)
This proposal seeks to develop an innovative AI-powered system for detecting driver drowsiness through real-time analysis of EEG eye-blink artifacts. (Learn more and apply)
Application of Transformer-Based Machine Learning Models to Whole Organism Computational Phenomics (Keith Cheng)
To enable the first 3-dimensional whole-organism phenotyping that encompasses all cell types and organ systems, we propose to develop and optimize Transformer-based machine learning (ML) models capable of automatically segmenting and labeling regions of interest from high-resolution 3D micro-CT scans at unprecedented resolutions. (Learn more and apply)
Building Digital Twins of Personalized Models for Alzheimer’s Disease Prevention and Treatment (Zi-Kui Liu)
The proposed project aims to develop a Zentropy-Enhanced Neural Network (ZENN) that learns the configurations, total energy, and entropy of brain states using data related to Alzheimer’s disease (AD). (Learn more and apply)
Predicting HIV care loss-to-follow-up using machine learning (Kathryn Risher)
Our project aims to develop an ML model to predict patient loss to follow-up (LTFU) from HIV care, trained on data from people living with HIV (PLHIV) in the Penn State Comprehensive Care Clinic and TriNetX. (Learn more and apply)
Advanced Deblurring of Electron Beam Induced Motion for High-Resolution CryoEM 3D Reconstructions using Electron Event Data (Wen Jiang)
The primary goal of this project is to develop and implement a novel deblurring methodology for cryo-EM data that leverages the high temporal and spatial resolution of electron event recordings. (Learn more and apply)
Building Toolbox to Characterize the NEID Earth Twin Survey Detection Efficiency (Suvrath Mahadevan)
This project aims to develop performant, parallelized statistical tools to characterize the detection efficiency of the NEID Earth Twin Survey as a function of the mass, orbital period and eccentricity of exoplanets orbiting the target stars. (Learn more and apply)
Better Left Unsaid: Preventing Hallucinations by Learning Abstention (Dongwon Lee)
The project aims to explore a few ideas and produce a prototype with preliminary results. The participating junior researchers will have an opportunity to contribute to scientific publications in top AI venues, while the PI aims to use the preliminary findings to pursue an external grant program at the NSF. (Learn more and apply)
Exoplanet Demographics Combining Multiple Detection Methods (Eric Ford)
This project aims to develop simulation-based inference (SBI) tools for characterizing the intrinsic distribution of exoplanets while combining observational constraints from multiple exoplanet detection techniques. (Learn more and apply)
Development of Data-based AI-driven Toolkits for Energy Industry Using Distributed Fiberoptic Sensing (Shimin Liu)
In this project, we aim to develop and optimize a robust data analytics pipeline tailored specifically for high-volume DAS datasets generated by industry in mining and oil and gas fields. (Learn more and apply)
Predict Arctic Sea Ice Variability from Atmospheric River Activities and the Time of Arrival of Ice-free Arctic (Laifang Li)
In this project, we propose to apply deep learning models (e.g., convolutional neural networks, CNNs) to predict Arctic sea ice variability based on the life cycle of atmospheric rivers (ARs). (Learn more and apply)