

Twitter data from National Park visitors may help provide feedback for officials. This type of data collection would be less expensive and less labor-intensive than face-to-face surveys, according to Penn State researchers. IMAGE: WIKIMEDIA

Mining Twitter data may help National Parks staff gather feedback faster

Posted on August 21, 2020

UNIVERSITY PARK, Pa. — The National Park system has been referred to as one of America’s national treasures. A team of Penn State researchers in the Department of Recreation, Park and Tourism Management and the Social Science Research Institute reports that mining tweets about the parks may open up a rich vein of information that could lead to better service for park visitors while still protecting these national treasures and their wildlife.

In a study of Twitter data from Yellowstone National Park visitors, the researchers found that, in some cases, analyzing visitors’ tweets can be just as accurate as information gathered from on-site visitor surveys. The findings could help park officials gather feedback from thousands of visitors, offering an efficient and timely way to better understand visitor behavior, said Bing Pan, associate professor of Recreation, Park and Tourism Management and an Institute for Computational and Data Sciences (ICDS) associate.

“On one hand, the National Park Service has a mission to preserve these natural and cultural resources, and, on the other hand, it must allow the public to enjoy these resources,” said Pan. “But there can be a conflict between those two goals. When the parks are too crowded, it may reduce enjoyment, or even damage those valuable resources.”

The researchers, who reported their findings at the 2020 Travel and Tourism Research Association Conference, found that the gender breakdown and the shares of certain age groups recorded in face-to-face surveys matched those derived from Twitter data. Not all demographic data matched the surveys, however, Pan added. For example, Twitter tends to attract younger users, which skewed the results for some age groups. Pan said that researchers could apply statistical methods, such as adjusting weights or probabilities, to bring the Twitter results closer to the survey data.

“When we know the differences, we can do some post-hoc weighting to make the results match the real population,” said Pan.
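As a rough illustration of the kind of post-hoc weighting Pan describes, the Python sketch below applies simple post-stratification: each age group’s weight is its share in the reference population divided by its share in the Twitter sample. This is one common weighting approach, not necessarily the one the researchers used. The function names, the age groups, every number, and the “crowding mention rate” metric are hypothetical and were invented for the example; none of them come from the study.

```python
def poststratification_weights(sample_shares, reference_shares):
    """Weight per group = reference share / sample share."""
    weights = {}
    for group, ref_share in reference_shares.items():
        sample_share = sample_shares.get(group, 0.0)
        if sample_share <= 0:
            # A group missing from the sample cannot be re-weighted.
            raise ValueError(f"no sample observations for group {group!r}")
        weights[group] = ref_share / sample_share
    return weights


def weighted_mean(values_by_group, sample_shares, weights):
    """Average a per-group metric after re-weighting the sample."""
    total = sum(sample_shares[g] * weights[g] for g in values_by_group)
    weighted_sum = sum(
        values_by_group[g] * sample_shares[g] * weights[g]
        for g in values_by_group
    )
    return weighted_sum / total


# Hypothetical age distributions (illustrative only, not study data).
twitter_shares = {"18-34": 0.60, "35-54": 0.30, "55+": 0.10}
survey_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

# Hypothetical metric: share of each group's tweets that mention crowding.
crowding_rate = {"18-34": 0.20, "35-54": 0.30, "55+": 0.40}

weights = poststratification_weights(twitter_shares, survey_shares)
unweighted = sum(crowding_rate[g] * twitter_shares[g] for g in crowding_rate)
adjusted = weighted_mean(crowding_rate, twitter_shares, weights)

print("weights:", weights)
print("unweighted rate:", round(unweighted, 3))
print("adjusted rate:", round(adjusted, 3))
```

In this made-up example, younger users are over-represented on Twitter, so their tweets are weighted down and older visitors’ tweets are weighted up before the overall rate is computed.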

If Twitter data can be shown to be both reliable and valid, it would offer several advantages, including availability on a near real-time basis, he added. Because people are always tweeting about their experiences, the parks could gather and analyze feedback almost instantly, whereas organizing and executing a survey project can take weeks or months. Crowd-sourced data such as tweets is a great complementary data source to survey research.

“With Twitter data, first, it’s publicly available and, second, it’s out there right now,” said Pan.

Pan is intrigued by future research prospects, including using Twitter data to conduct surveys on the population of wildlife at the parks and even examining photographs for signs that people may be getting too close to the wildlife, or certain physical attractions.

“There are new ways that software can analyze photographs to determine the distance between the photographer and the object,” said Pan. “This would be great for parks. Let’s say a person takes a picture of a bison or a geyser. We could determine if the person who took the picture was too close and that might prompt officials to put up more caution signs and give out information to warn people about what could happen if they get too close.”

Data in the study was collected through surveys and from Twitter. Park managers gathered survey responses from 647 hikers and visitors on the Mount Washburn and Lone Star Geyser trails in Yellowstone National Park during the summer of 2016. The survey gathered demographic data, including age, gender and race. Geo-tagged Twitter data, which contains location-based information, was collected from January to December 2016. In addition to location data, the information contained the username, image and the time the tweet was posted.

Pan worked with Yun Liang, a doctoral student in Recreation, Park and Tourism Management; Zach Miller, former research assistant professor with the Protected Areas Research Collaborative in the same department and now assistant professor at Utah State University; Junjun Yin, assistant research professor, the Population Research Institute, and ICDS associate; Guangqing Chi, professor of rural sociology, demography, and public health sciences, director of the Computational and Spatial Analysis Core, Social Science Research Institute, and ICDS associate, all of Penn State; Clio Andris, assistant professor of city and regional planning and interactive computing, Georgia Institute of Technology; Jack Jorgensen, of RRC Associates; and Norma Nickerson, research professor, University of Montana.

The research is supported by a seed grant from ICDS and the Social Science Research Institute.

