Statistical association DataFest competition to be held March 25-27
Posted on February 22, 2022
UNIVERSITY PARK, Pa. — Penn State’s Department of Statistics will host its seventh annual American Statistical Association DataFest competition in partnership with Nittany Data Labs (NDL) on the weekend of March 25-27, 2022. During the event, teams work to analyze a large, real-world dataset provided by an external organization.
Students from undergraduate and master’s degree programs throughout the Commonwealth of Pennsylvania are invited to participate in the event. In a new virtual setting for 2021, Penn State DataFest saw 23 teams compete. This year, the event will be held in a hybrid format taking place at both the University Park campus and online simultaneously. The opportunity to experience a real-world data science career situation remains the same.
“This is a great opportunity for students to work with data that’s out of the ordinary, get creative with visualizations, and hone skills that employers are looking for in a low-stakes environment,” said Bob Carey, a data analyst in the Department of Statistics and lead organizer of DataFest at Penn State.
During the 48-hour competition, teams will work to “find and share meaning in a large, rich, and complex dataset,” according to the American Statistical Association. Each team must present a 3-minute video summarizing its findings to a panel of judges. Presentations are judged based on creativity, visualization, use of external data, and communication, but only one team will win Best in Show. The final vote is up to the participants. Donors of datasets in previous years have included the Rocky Mountain Poison and Drug Safety Center, the Canadian Women’s National Rugby Team, Ticketmaster, Indeed, and Expedia. To level the playing field, the dataset and donor are kept secret until the start of the event.
Scattered throughout the weekend is a series of bonus challenges in which individuals and teams alike will compete for prizes. Programming and data visualization workshops will be held with the University Libraries’ Research Informatics and Publish (RePub) group as well as Amazon Web Services (AWS) to sharpen analytical skills. Graduate student mentors will also be available to answer questions throughout.
“The bonus challenges are designed to keep people engaged throughout the event – some of them are data-related and then other ones are just for fun,” said Carey.
The use of a real-world dataset, exposure to corporate sponsors, and info sessions geared toward industry and graduate school make the event unique among data competitions. Penn State Career Services will coach students in resume development and self-promotion to potential employers.
“Students at DataFest can meet one-on-one with sponsors and give them a resume if they’re interested,” said Carey. “The whole event is basically a portfolio tool.”
Translating real-world data rich with depth and information requires collaboration and teamwork, a foundation of DataFest.
“The real-world data sets involved in DataFest are rich and messy,” said Neil Hatfield, assistant research professor of statistics. “Partnered with the competition’s time constraint, the team-nature of statistics and data analysis is fully on showcase. Each member of the team brings different insights and approaches that allows for the team to develop a fuller understanding of what’s going on.”
Interested individuals are encouraged to sign up to be placed into a team or to form a team of three to five members on their own. Food will be provided for those who attend in person. For more information and to register for the competition, visit datafest.psu.edu. Follow @PSUDatafest for the latest updates. Extra precautions will be taken to limit the spread of the SARS-CoV-2 virus.