Artificial intelligence predicts river water quality with weather data
Posted on May 20, 2021
UNIVERSITY PARK, Pa. — The difficulty and expense of collecting river water samples in remote areas has led to significant — and in some cases, decades-long — gaps in available water chemistry data, according to a Penn State-led team of researchers. The team is using artificial intelligence (AI) to predict water quality and fill the gaps in the data. Their efforts could lead to an improved understanding of how rivers react to human disturbances and climate change.
The researchers developed a model that forecasts dissolved oxygen (DO), a key indicator of water’s capability to support aquatic life, in lightly monitored watersheds across the United States. They published their results in Environmental Science & Technology.
Generally, the amount of oxygen dissolved in rivers and streams reflects their ecosystems, as certain organisms produce oxygen while others consume it. DO also varies based on the season and elevation, and the area’s local weather conditions cause fluctuations, too, according to Li Li, professor of civil and environmental engineering at Penn State.
“People usually think about DO as being driven by stream biological and geochemical processes, like fish breathing in the water or aquatic plants making DO on sunny days,” Li said. “But weather can also be a major driver. Hydrometeorological conditions, including temperature and sunlight, are influencing the life in the water, and this in turn influences the concentration levels of DO.”
Hydrometeorological data, which tracks how water moves between the surface of the Earth and the atmosphere, is recorded far more frequently and with more spatial coverage than water chemistry data, according to Wei Zhi, postdoctoral researcher in the Department of Civil and Environmental Engineering and first author of the paper. The team theorized that a nationwide hydrometeorological database, which would include measurements like air temperature, precipitation and stream flow rate, could be used to forecast DO concentrations in remote areas.
“There is a lot of hydrometeorological data available, and we wanted to see if there was enough correlation, even indirectly, to make a prediction and help fill in the river water chemistry data gaps,” Zhi said.
The model was created through an AI framework known as a Long Short-Term Memory (LSTM) network, an approach used to model natural “storage and release” systems, according to Chaopeng Shen, associate professor of civil and environmental engineering at Penn State.
“Think of it like a box,” Shen said. “It can take in water and store it in a tank at certain rates, while on the other side releasing it at different rates, and each of those rates are determined by the training. We have used it in the past to model soil moisture, rain flow, water temperature and now, DO.”
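To make the "storage and release" picture concrete, here is a minimal sketch of a single LSTM step in plain Python. It is not the team's model: it uses one hidden unit, one scalar input, and hand-picked weights (the `params` values are purely illustrative assumptions), whereas a real network learns its weights from data. The forget gate controls how much of the stored state is kept, the input gate how much new information is stored, and the output gate how much is released.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with a scalar input.

    params maps each gate name to a tuple (w_x, w_h, b).
    Returns the new hidden state h and cell state c.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # fraction of the stored state that is kept
    i = sigmoid(pre("input"))         # fraction of the new candidate that is stored
    o = sigmoid(pre("output"))        # fraction of the state that is released
    g = math.tanh(pre("candidate"))   # candidate value computed from the input
    c = f * c_prev + i * g            # updated "tank" contents
    h = o * math.tanh(c)              # released output
    return h, c

# Hypothetical weights chosen by hand for illustration only.
params = {
    "forget": (0.0, 0.0, 2.0),
    "input": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0]:   # one pulse of input, then nothing
    h, c = lstm_step(x, h, c, params)
    print(round(h, 3), round(c, 3))
```

With these illustrative weights, the cell stores part of the first input and then lets the stored value decay over the following steps, which is the behavior Shen's tank analogy describes.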
The researchers received data from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) hydrology database, which included a recent addition of river water chemistry data from 1980 to 2014 for minimally disturbed watersheds. Of the 505 watersheds included in the “CAMELS-chem” data set, the team found 236 with the needed minimum of ten DO concentration measurements in the 35-year span.
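The screening step described above can be sketched as a simple count per watershed. The record layout, the function name `sites_with_enough_samples`, and the toy data below are assumptions made for illustration; they do not reflect the actual CAMELS-chem file format. Only the threshold of ten measurements comes from the article.

```python
from collections import Counter

MIN_SAMPLES = 10  # minimum number of DO measurements, as stated in the article

def sites_with_enough_samples(records, min_samples=MIN_SAMPLES):
    """Return the sorted site IDs that have at least min_samples DO values.

    records: iterable of (site_id, date_string, do_mg_per_l) tuples.
    Rows whose DO value is None are treated as missing and not counted.
    """
    counts = Counter(site for site, _, do in records if do is not None)
    return sorted(site for site, n in counts.items() if n >= min_samples)

# Made-up toy records: site A has 12 values, B has 4, C has only missing values.
records = [("A", f"1990-01-{d:02d}", 9.0) for d in range(1, 13)]
records += [("B", f"1990-02-{d:02d}", 8.0) for d in range(1, 5)]
records += [("C", f"1990-03-{d:02d}", None) for d in range(1, 20)]

print(sites_with_enough_samples(records))  # ['A']
```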
To train the LSTM network and create a model, they used watershed data from 1980 to 2000, including DO concentrations, daily hydrometeorological measurements and watershed attributes like topography, land cover and vegetation.
According to Zhi, the team then tested the model’s accuracy against the remaining DO data from 2001 to 2014, finding that the model had generally learned the dynamics of DO solubility, including how dissolved oxygen decreases in warmer water and at higher elevations. It also proved to have strong predictive capability in almost three-quarters of test cases.
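The train-then-test procedure relies on splitting the record by time rather than at random, so the model is evaluated only on years it never saw. A minimal sketch of such a split, assuming samples stored as hypothetical `(date, value)` pairs, might look like this. The year boundaries are the ones given in the article; everything else is illustrative.

```python
from datetime import date

START_YEAR = 1980      # first year of the CAMELS-chem record used
TRAIN_END_YEAR = 2000  # training period: 1980 to 2000
TEST_END_YEAR = 2014   # testing period: 2001 to 2014

def split_by_year(samples):
    """Split (date, value) samples into training and testing lists by year."""
    train = [s for s in samples if START_YEAR <= s[0].year <= TRAIN_END_YEAR]
    test = [s for s in samples if TRAIN_END_YEAR < s[0].year <= TEST_END_YEAR]
    return train, test

# Made-up DO samples (mg/L) for one hypothetical watershed.
samples = [
    (date(1985, 6, 1), 8.9),
    (date(2000, 12, 31), 11.2),
    (date(2001, 1, 1), 11.4),
    (date(2014, 7, 15), 7.8),
    (date(2016, 3, 2), 10.1),  # outside the study period, so dropped
]

train, test = split_by_year(samples)
print(len(train), len(test))  # 2 2
```

Keeping the test years strictly later than the training years avoids leaking future conditions into training, which matters when the goal is to fill gaps the model has no measurements for.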
“It is a really strong tool,” Zhi said. “It surprised us to see how well the model learned DO dynamics across many different watershed conditions on a continental scale.”
He added that the model performed best in areas with steadier DO levels and stable water flow conditions, but more data would be needed to improve forecasting capabilities for watersheds with higher DO and streamflow variability.
“If we can collect more samples that capture the high peaks and low troughs of DO levels, we will be able to reflect that in the training process and improve performance in the future,” Zhi said.
Penn State researchers Dapeng Feng, doctoral candidate in environmental engineering, and Wen-Ping Tsai, postdoctoral researcher in the Department of Civil and Environmental Engineering, and University of Nevada, Reno researchers Adrian Harpold, associate professor of mountain ecohydrology, and Gary Sterle, graduate research assistant in hydrological sciences, also contributed to the project.
A seed grant from Penn State’s Institute of Computation and Data Science, the U.S. Department of Energy Subsurface Biogeochemical Research program, and the National Science Foundation supported this research.