Development of Data-based AI-driven Toolkits for Energy Industry Using Distributed Fiberoptic Sensing
PI: Shimin Liu (EMS)
Background: The energy industry increasingly relies on real-time subsurface monitoring for applications such as geothermal energy, CO₂ sequestration, and mining safety. These operations require early detection of microseismic events, stress changes, and fluid movement to ensure safety and efficiency. Distributed Acoustic Sensing (DAS) offers a scalable, cost-effective solution by converting existing fiber-optic cables into dense arrays of seismic sensors. DAS enables high-resolution, continuous monitoring over long distances, making it well suited for detecting geohazards and tracking subsurface changes. Originally developed in seismology, DAS now plays a vital role in energy applications: it measures strain via Rayleigh backscattering with meter-scale spatial resolution and high sampling rates, far surpassing the spatial density of traditional geophone systems. Over a given time period, a DAS deployment can produce orders of magnitude more data than a traditional passive seismic experiment, whose geophones provide only sparse, isolated measurements.
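To make the data-volume comparison concrete, the short calculation below estimates raw daily data volume for one DAS deployment and one geophone array. It is only a back-of-the-envelope sketch: the channel counts, sampling rates, and 4-byte sample size are hypothetical values chosen for illustration, not parameters of the deployments in this project.

```python
def daily_volume_gb(n_channels, sample_rate_hz, bytes_per_sample=4):
    """Raw (uncompressed) data volume per day, in gigabytes (10**9 bytes)."""
    seconds_per_day = 24 * 60 * 60
    total_bytes = n_channels * sample_rate_hz * bytes_per_sample * seconds_per_day
    return total_bytes / 1e9


# Hypothetical configurations, assumed purely for illustration.
das_gb = daily_volume_gb(n_channels=5000, sample_rate_hz=1000)        # one fiber route
geophone_gb = daily_volume_gb(n_channels=24 * 3, sample_rate_hz=500)  # 24 three-component stations

ratio = das_gb / geophone_gb
print(f"DAS: {das_gb:.0f} GB/day, geophones: {geophone_gb:.1f} GB/day, ratio: {ratio:.0f}x")
```

Under these assumed settings, the DAS route produces roughly two orders of magnitude more raw data per day than the geophone array.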
Challenges Associated with DAS Datasets: DAS systems, while offering unprecedented spatial resolution and continuous coverage, produce high-dimensional, high-frequency datasets that present significant analytical challenges. Unlike traditional seismic instruments, DAS arrays generate data streams across thousands of sensing points, resulting in complex, noisy, and voluminous time series that are not uniformly distributed in either time or space. In particular, DAS noise tends to be more complex than that of traditional instruments: it combines instrument self-noise, common-mode noise, and, for the many fibers that run along roads, traffic noise, and it grows more complicated at the shorter virtual channel spacing required for high-resolution data collection. This heterogeneity, compounded by environmental noise, anthropogenic interference, and instrument-related distortions, complicates signal interpretation and renders conventional seismic processing methods insufficient. Event detection becomes particularly difficult because signals from diverse sources (blasting, natural microseismicity, and surface activity) overlap and are embedded within fluctuating noise levels. Moreover, sparse network calibrations and under-sampled regions further reduce detection sensitivity and bias catalog statistics, especially in the absence of labeled datasets. As DAS networks continue to produce continuous field data, scalable and intelligent analytical frameworks become essential to distinguish signal from noise, automate event classification and phase picking, and enable meaningful extraction of subsurface insights. Advanced approaches, including semi-supervised and deep learning methods, offer promising avenues to harness the full potential of DAS technology, but they require carefully structured data pipelines tailored to the unique structure and scale of DAS arrays.
Project Description: In this project, we aim to develop and optimize a robust data analytics pipeline tailored specifically to high-volume DAS datasets generated by industry in mining and oil and gas fields. As a preliminary proof of concept, we conducted a field fiber-optic deployment and a conventional seismic survey at one of our industry partner’s sites in Pennsylvania. We collected extensive DAS field data from multiple deployments, in which DAS interrogators recorded strain-rate signals across a large number of channels using high-frequency sampling configurations. Additional datasets are being acquired continuously through ongoing and future blasting campaigns, leveraging multiple DAS fiber routes with varied spatial resolutions and sensor geometries. For comparison, co-located geophone arrays have been deployed along the same fiber routes, providing seismic data for cross-validation and benchmarking. This co-deployment enables direct comparison between geophone and DAS recordings, allowing us to evaluate DAS sensitivity, confirm event detections, and refine phase arrival picks through hybrid interpretation. These datasets capture a wide range of signal types, including blast-induced seismicity, anthropogenic noise, and environmental ground motion, offering both opportunities and challenges in interpretation. Our proposed work will develop data-based, AI-driven toolkits for signal processing, event detection, phase picking, and multi-class event classification using machine learning techniques. Specific tasks will include data denoising, feature extraction, dimensionality reduction, and the application of supervised and semi-supervised learning models (e.g., CNNs, transformer-based models) for robust event identification and phase picking.
By building a scalable, generalizable analysis pipeline, this project will enable consistent and automated interpretation of dense DAS arrays and serve as a foundation for future research in real-time subsurface diagnostics and intelligent seismic monitoring.
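As an illustration of the event-detection and phase-picking tasks described above, the sketch below applies a classical STA/LTA (short-term average over long-term average) energy detector to a synthetic multi-channel record. It is a conventional baseline of the kind the proposed learning models would be benchmarked against, not the project's method. The array shape, sampling rate, window lengths, threshold, and function names are all hypothetical choices made for this example.

```python
import numpy as np


def sta_lta(trace, n_sta, n_lta):
    """STA/LTA energy ratio for one channel.

    Both windows are trailing and end at the same sample; the ratio is
    left at zero until a full long-term window is available.
    """
    energy = np.asarray(trace, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(len(energy))
    end = np.arange(n_lta, len(energy) + 1)  # exclusive window end
    sta = (csum[end] - csum[end - n_sta]) / n_sta
    lta = (csum[end] - csum[end - n_lta]) / n_lta
    ratio[end - 1] = sta / np.maximum(lta, 1e-12)
    return ratio


def detect_event(data, n_sta, n_lta, threshold, min_channels):
    """Pick the first threshold crossing on each channel and declare an
    event when at least `min_channels` channels trigger."""
    picks = {}
    for ch, trace in enumerate(data):
        above = np.flatnonzero(sta_lta(trace, n_sta, n_lta) > threshold)
        if above.size:
            picks[ch] = int(above[0])
    return len(picks) >= min_channels, picks


# Synthetic record (assumed values): 20 channels, 4 s at 1 kHz, unit-variance noise.
rng = np.random.default_rng(0)
fs, n_channels, n_samples = 1000, 20, 4000
data = rng.normal(0.0, 1.0, size=(n_channels, n_samples))

# Inject a 50 Hz burst whose onset is delayed by 5 samples per channel (moveout).
t = np.arange(200) / fs
onsets = 2000 + 5 * np.arange(n_channels)
for ch, onset in enumerate(onsets):
    data[ch, onset:onset + 200] += 10.0 * np.sin(2 * np.pi * 50 * t)

detected, picks = detect_event(data, n_sta=20, n_lta=500, threshold=5.0, min_channels=10)
print("event detected:", detected)
print("pick minus true onset (samples):", [picks[ch] - int(onsets[ch]) for ch in sorted(picks)])
```

Requiring several channels to trigger before declaring an event exploits the spatial density of a DAS array to suppress single-channel noise spikes. A detector of this kind could also supply candidate windows and provisional picks for building the annotated training sets that supervised models need.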
Expected Deliverables: (1) Toolkits for phase picking and event classification; (2) External proposals to oil and gas companies, including Aramco, Chevron, and JKLM Energy; (3) Grant proposal submissions to NSF, ARPA-E, DOE, and DOD; (4) External proposals to mining companies, including CRH Company, Graymont Inc., Rosebud Mining Company, etc.
1. Principal Investigator (PI) who would oversee the proposed project and reporting.
Shimin Liu
2. Any other senior or junior team members who would contribute to the project (e.g., a faculty member in another department who would provide expertise complementary to that of the PI). Indicate which members are willing to serve as mentors for an ICDS Junior Researcher.
Ang Liu, Tieyuan Zhu
3. Departments and Units of PI and team members.
Shimin Liu, Ang Liu – Department of Energy and Mineral Engineering, Tieyuan Zhu – Department of Geosciences
4. The ICDS Hub/Area(s) most relevant to the proposed project (to assist in organizing reviews). Choose from: AI, Computational Sciences, Data Science, Digital Twins, and Quantum.
Data Science and AI
5. State whether any ICDS-affiliated Centers are relevant to the proposed project (this will not be weighed in the evaluation; it only assists the reviewing process).
None
6. The level of effort appropriate for the proposed project (e.g., 2 semesters at 25% RA). Remember that an average of 25% of the ICDS Junior Researcher’s effort is recommended for contributions to another research project, ICDS service and/or engagement in ICDS activities.
Postdoc – 50%
7. Plan for funding tuition (for graduate students) or the remainder of the researcher’s salary (for postdocs or research faculty)
The PI and Co-PI will jointly support 50% of the postdoc's salary from different research projects in order to explore this new multidepartmental collaboration. We request the remaining 50% of the postdoc's support from ICDS.
8. A brief (up to 1 page) description of the proposed project. There should be sufficient detail that potential Junior Researchers can recognize if they would be interested in contributing to this project. They will be required to write a more detailed proposal describing how they would contribute to the project. See Page 3.
9. A list of specific areas of computational and/or data science expertise or skills that the current team is particularly interested in recruiting to support the project. This could be fairly general (e.g., applying machine learning to time series) or very specific (e.g., experience building, training and validating neural networks built using Flux.jl or JAX; experience parallelizing numerical C/C++ code using OpenMP; etc.)
Machine learning for time-series analysis; signal denoising and filtering techniques; geospatial analytics and visualization; time-series database management and annotation tools.
10. Any other requirements or expectations of potential ICDS Rising Researchers (e.g., currently a post-comps graduate student in a related field; regular availability for group meetings Wednesdays 10-11am).
None
11. A list of specific objectives for work supported by this call (e.g., generating preliminary data to inform an upcoming decision or to support a grant proposal, submitting a scientific paper). Potential Rising Researchers will be encouraged to draw from any/all of these objectives for their proposal.
Generate preliminary results to support future research grant proposals and Ph.D. student funding.
Develop and validate a scalable data analytics pipeline for ongoing and future DAS datasets.
Prepare curated datasets and visualizations to inform scientific publications and conference submissions.
Enable future machine learning-based event classification through annotated training data and baseline models.
Establish a foundation for long-term interdisciplinary research involving geophysics, AI, and high-performance computing.
12. At least one medium-to-long-term goal (e.g., a successful proposal to a specific call).
a. Submit a proposal to ARPA-E (DE-FOA-0003467: Seeding Critical Advances for Leading Energy technologies with Untapped Potential (SCALEUP) Ready) on novel hydraulic fracturing for U.S. national energy security.
b. Submit a proposal to the NSF’s Disaster Risk and Resilience initiative, aiming to develop advanced machine learning pipelines for real-time analysis of Distributed Acoustic Sensing (DAS) data, enhancing disaster risk assessment and resilience planning.
c. Engage with Aramco (Houston Technological Innovation Center) on hydraulic fracture diagnostics for their Midland operation. We will develop a fiber-optic DAS data processing framework that enables large-scale, cost-effective deployment for real-time monitoring of fracturing jobs. The engagement will take place through the Subsurface Energy Recovery and Storage Joint Industry Partnership, which the PI co-leads (https://sites.psu.edu/sersjip/).
d. Work with CRH Company at the Silver Spring Mine in Pennsylvania on mining ground vibration monitoring through fiber-optic deployment. We will leverage ICDS resources to support resilient and sustainable mineral extraction for the benefit of the Commonwealth of Pennsylvania, which is the third-largest stone producer in the U.S. CRH Company has allowed us to collect preliminary data from one of its blasts, and we need to build a data-based, AI-driven ground vibration tool that the industry can use to optimize its extractive operations.
13. A short statement (1 sentence to 1 paragraph) explaining the connection of the project to ICDS’s mission.
This proposed project aligns with ICDS’s mission by synthesizing geophysical domain expertise with advanced data science and machine learning approaches to develop scalable analytics for high-dimensional Distributed Acoustic Sensing (DAS) datasets. Through interdisciplinary collaboration, high-performance computing, and intelligent signal processing, the project addresses scientifically and societally important challenges in seismic event detection and infrastructure monitoring, exemplifying ICDS’s vision of enabling transformative, data-driven discovery.
14. A paragraph summarizing team members’ recent and/or planned engagement with ICDS.
The PI’s group is a long-term, active user of ICDS resources for one of its underground mine ventilation projects. The PI would like to pursue a new opportunity in geophysics with Dr. Tieyuan Zhu of the Department of Geosciences, with a view to future industry-oriented research projects. The PI’s group has purchased computing allocations and has a service-level agreement with ICDS for cores and storage. The group mainly uses ICDS resources for COMSOL and other modeling activities.