UNIVERSITY PARK, Pa. — In her 20 years of bee research, Christina Grozinger had not faced a data management problem quite like the one she encountered in 2020. Grozinger, Publius Vergilius Maro Professor of Entomology at Penn State, studies ways to counteract declining bee populations, and her research requires her to acquire, send and analyze data across teams in a precise way.
“There’s tremendous interest in understanding what flowering plants bees preferentially use for collecting nectar to make honey, or pollen to feed to their developing larvae,” said Grozinger, who also is director of Penn State’s Center for Pollinator Research and associate director of the College of Agricultural Sciences’ Institute for Sustainable Agricultural, Food, and Environmental Science (SAFES).
“Researchers, land managers and beekeepers all would like to know what the key plant species are in different regions and times of year,” said Grozinger. “We can also link these plant communities to land use patterns or climate conditions, to predict how bees will perform in different locations. But to do this, we need a substantial amount of data, which needs to be accessible to large collaborative teams.”
Grozinger’s research group conducts DNA analyses of pollen and honey to link those samples back to their floral source. She is working on multiple projects with different teams that use this approach, and she realized there was an opportunity to integrate across these projects, and across spatial, genomic and plant distribution data sets. However, Grozinger said, data for different projects were being stored on separate spreadsheets and databases, which were challenging to share and integrate.
After a colleague told her about a team of computational scientists known as Research Innovations with Scientists and Engineers (RISE) in the Institute for Computational and Data Sciences (ICDS), Grozinger felt hopeful about a solution to her data science challenges.
RISE team members bring a variety of software engineering and computational science skills, ranging from optimizing complex computational codes, to building custom web platforms and data management infrastructure, to data visualization. After an initial consultation, Grozinger partnered with Danying Shao, a research and development engineer, who then held several more consultations with Grozinger and the other researchers on the project.
“I created a database and a web application for the team to manage the metadata throughout a pipeline that includes sample collection, DNA sequencing and downstream analysis, and easily share this across their team,” said Shao.
This solution was a resounding success, said Grozinger. She said that the data management platform that Shao developed will serve as foundational research infrastructure. Already, Grozinger is using this platform for multiple projects, and she expects to continue scaling up its uses.
Connecting and optimizing computer models
Grozinger had heard about the RISE team through Karen Fisher-Vanden, professor of environmental and resource economics and public policy, who learned about RISE as a member of ICDS’s Coordinating Committee, a faculty group that provides feedback on ICDS’s strategic initiatives.
Fisher-Vanden and her research team, the Program on Coupled Human and Earth Systems, had been struggling to integrate individual system models in order to capture important feedbacks between water, power, agricultural and economic systems. After hearing about the RISE team’s services, she said, she felt that help from RISE might be exactly what her team needed to overcome the computational challenges they were facing.
“If there is water scarcity in a specific region, we are not just interested in how that impacts one sector, such as agriculture, but also how it impacts sectors with competing demands for that water, say, the power system and urban areas,” said Fisher-Vanden, who also directs the College of Agricultural Sciences’ Institute for Sustainable Agricultural, Food, and Environmental Science (SAFES). “To study this, we’re coupling computational models that were developed by researchers in different disciplines, which is a huge computational challenge because of how the models differ in spatial and temporal scales.”
For instance, according to Fisher-Vanden, the power system model optimizes at an hourly and spatial grid scale, whereas the economic model optimizes at a yearly and state-level scale. Water scarcity may cause certain power generators to go offline, leading to spikes in electricity prices and potential outages. Consumers of electricity will respond to these price spikes by reducing demand for electricity, which will reduce the need for electricity generation. To capture these feedbacks, the two models must pass information to each other, re-optimize, and iterate until convergence is reached. Writing the code to automate and manage this process in an efficient way posed a challenge to the team.
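The pass-information, re-optimize and iterate loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two functions are simple stand-ins rather than the team's power system or economic models, and the names, parameter values and convergence tolerance are hypothetical. It only shows the general pattern of two models exchanging a price and a demand until the numbers stop changing.

```python
def power_model_price(demand, capacity=100.0, base_price=30.0):
    """Toy stand-in for a power model: price rises as demand nears capacity."""
    return base_price * (1.0 + demand / capacity)


def economic_model_demand(price, base_demand=80.0, ref_price=30.0, elasticity=0.3):
    """Toy stand-in for an economic model: demand falls as price rises."""
    return base_demand * (price / ref_price) ** (-elasticity)


def couple_models(capacity=100.0, initial_demand=80.0, tol=1e-6, max_iter=100):
    """Pass price and demand back and forth until demand stops changing."""
    demand = initial_demand
    for iteration in range(1, max_iter + 1):
        price = power_model_price(demand, capacity=capacity)
        new_demand = economic_model_demand(price)
        if abs(new_demand - demand) < tol:
            return new_demand, price, iteration
        demand = new_demand
    raise RuntimeError("coupled models did not converge")


if __name__ == "__main__":
    # Lower capacity is a hypothetical proxy for generators going offline.
    for capacity in (100.0, 70.0):
        demand, price, steps = couple_models(capacity=capacity)
        print(f"capacity={capacity}: demand={demand:.2f}, price={price:.2f}, steps={steps}")
```

In this sketch, lowering the hypothetical `capacity` value raises the price and lowers the demand at which the loop settles, which is the kind of feedback the paragraph above describes. Real coupled models would also have to reconcile their different spatial and temporal scales, which this example does not attempt.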
Collaborating with ICDS’s RISE team helped the researchers address this computational challenge. External funding from the Program on Coupled Human and Earth Systems supported one RISE team member’s time for several months. Fisher-Vanden’s team partnered with Jeff Nucciarone, a research and development engineer whose expertise is in optimizing and parallelizing computer code. Optimization makes code run faster by eliminating unnecessary steps, and parallelization does so by breaking the code into chunks that can run simultaneously.
“I wrote a parallelizer, which used an interface to manage 52 separate processes that would run at the same time,” he said. “It also included logic to detect common failure modes, so if the code detected failure for any of the 52 processes, it would restart quickly. Improving this step allowed greater automation of the workflow.”
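The general idea of running many processes at once and restarting any that fail can be sketched as follows. This is a minimal illustration under stated assumptions, not Nucciarone's parallelizer: the function names are hypothetical, failure is detected only through a nonzero exit code, and the default of 52 workers simply echoes the number in the quote.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_with_restart(command, max_restarts=3):
    """Run one command as a child process; rerun it if it exits nonzero."""
    for attempt in range(1, max_restarts + 2):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            # Report how many attempts were needed and what the child printed.
            return attempt, result.stdout.strip()
    raise RuntimeError(f"command still failing after {max_restarts} restarts")


def run_all(commands, max_workers=52, max_restarts=3):
    """Run every command concurrently, one worker thread per child process."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: run_with_restart(c, max_restarts), commands))
```

Each worker thread only waits on its own child process, so the heavy computation happens in the separate processes. A production tool would likely check for more specific failure modes than an exit code, as the quote suggests the original did.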
The result reliably and efficiently connects the power system model and the economic model, a first step in the team’s process. Now, Fisher-Vanden’s team is working with RISE to integrate other models into this coupled system, specifically a water balance model and a crop/land-use model. They are also exploring whether machine learning techniques can help identify stress points in the coupled system. This could help inform decision-makers about when and where older power plants should be retired, for example. These types of decisions are typically made on a state-by-state basis, but the impacts often extend beyond state lines. Being able to quantify these impacts could improve future decision-making, said the researchers.
Providing RISE time to agricultural sciences researchers
Fisher-Vanden and Grozinger praised the RISE team’s versatility and their ability to translate information between the worlds of data science and the researchers’ respective domains.
“If you want to use cutting-edge computational tools, you have to know what they are to make that connection with your research,” said Fisher-Vanden. “The RISE team was able to bridge that gap.”
After their positive experience collaborating with RISE, Fisher-Vanden and Grozinger sought to introduce others in the College of Agricultural Sciences to this resource through a joint SAFES-RISE seed grant competition. Through this seed grant program, researchers can apply to be allocated time with RISE team members who can address data science or computational science challenges. The program is modeled on a similar one, established by ICDS and funded by the National Science Foundation, which is designed to enable computational research at the University scale.
“Many faculty members are used to having everything run in our own lab, but for these types of data science challenges, we need help,” said Grozinger. “We have expertise within our own fields of genomics, organismal biology, and ecology — we do not have the training or expertise in computational data science that is needed for constructing these sophisticated systems. The RISE team provides a great system for having access to a team of skilled specialists.”
Researchers in the College of Agricultural Sciences can apply for a SAFES-RISE seed grant through May 31. Researchers in other Penn State colleges or campuses can also apply for RISE time through the ICDS RISE seed grant program, which will be offered each semester through 2023.