Protein Misfolding, Mutations and the Emergence of Disease Phenotypes (Faculty/Rising Researcher Collaboration Opportunity)

PI: Hyebin Song (Statistics)

Plan for funding tuition for graduate students, or the remainder of the researcher’s salary for postdoc and research faculty: Existing NSF funding through the National Synthesis Center for Emergence in the Molecular and Cellular Sciences (NCEMS), Award No. 2335029, will be used.

Project Narrative: This project supports a working group within the U.S. National Science Foundation (NSF) National Synthesis Center for Emergence in the Molecular and Cellular Sciences (NCEMS) at Penn State. NCEMS aims to drive multidisciplinary collaboration by synthesizing publicly available research data to address fundamental scientific questions at the intersection of data science and molecular and cellular biology. Specifically, this project aims to identify and rank proteins containing structural motifs known as “non-covalent lasso entanglements” and assess their association with disease phenotypes. Proteins are essential for life, but when they misfold, they often fail to carry out their proper function. This loss of function has the potential to give rise to disease phenotypes. Our project explores whether proteins with entanglements in their native structure, which are more prone to misfolding, are linked to disease. By analyzing and synthesizing existing structural, sequence, and gene-disease association data, we aim to uncover hypothesized relationships between entangled proteins and disease. These insights could reveal previously unrecognized causes of disease and provide a new perspective on their molecular origins. This project will advance our fundamental understanding of the interplay between structure, function, and sequence – and how, in turn, this can lead to the emergence of disease. Through NCEMS’s international collaboration, with working group members spanning 43 institutions across 18 U.S. states and six countries, the Junior Researcher will gain unique exposure to a global scientific network, raising both the research profile and international visibility of Penn State.

Project Objectives:

● Data Preparation: Compile and harmonize protein structural and disease-association datasets (e.g., DisGeNET), ensuring data quality and consistent identifiers.

● Descriptive Analysis: Generate summary statistics and visualizations of protein entanglement distributions relative to disease associations.

● Statistical Association Testing: Conduct rigorous statistical testing to evaluate the relationship between protein entanglements and disease occurrences across disease classes.

● Effect Size Estimation: Quantify the strength of these associations using robust measures such as odds ratios and relative risk, providing clear biological interpretations.

● Hypothesis Generation: Explore potential mechanistic insights underlying significant associations by analyzing protein functions, structural domains, and relevant biological pathways.
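
The association-testing and effect-size objectives above can be illustrated with a minimal sketch. It is not the working group's actual pipeline: the 2x2 counts are made-up numbers, the function names are hypothetical, and Python with only the standard library is used purely for illustration (the project itself calls for R). The sketch assumes each protein is labelled as entangled or not and as disease-associated or not within one disease class, and that one such test is run per disease class before correcting for multiple testing.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: entangled / not entangled. Columns: disease-associated / not.
    Adds 0.5 to every cell (Haldane-Anscombe correction) if any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Share of disease-associated proteins among entangled vs. non-entangled.

    Assumes at least one disease-associated protein in the second row (c > 0).
    """
    return (a / (a + b)) / (c / (c + d))

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment: P(X >= a).

    X follows a hypergeometric distribution with the table margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts for three disease classes (illustrative only):
# (entangled & disease, entangled & no disease,
#  non-entangled & disease, non-entangled & no disease)
tables = {
    "class_A": (30, 70, 20, 180),
    "class_B": (12, 88, 25, 175),
    "class_C": (18, 82, 22, 178),
}

pvals = [fisher_exact_greater(*t) for t in tables.values()]
qvals = benjamini_hochberg(pvals)
for (name, t), p, q in zip(tables.items(), pvals, qvals):
    print(f"{name}: OR={odds_ratio(*t):.2f} RR={relative_risk(*t):.2f} "
          f"p={p:.3g} q={q:.3g}")
```

Ranking disease classes or proteins by adjusted p-value together with effect size is one possible way to build the rank-ordered list; a permutation-based null that preserves protein length or structural coverage may be more appropriate where those factors confound entanglement status.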

Required Expertise/Skills: Wrangling, transforming, integrating, and harmonizing heterogeneous biological datasets; statistical modeling and enrichment analysis (hypothesis testing, effect-size estimation, multiple-testing correction); resampling and permutation frameworks; reproducible R programming; classification of tabular data using machine learning.

Medium to Long-Term Goals: The immediate outcome will be a rank-ordered list of entangled proteins strongly associated with disease phenotypes, serving as foundational data for future inquiry. The medium- to long-term goal is to submit a manuscript detailing the statistical associations and their potential biological implications, contributing to a broader understanding of protein folding diseases.

Interdisciplinary Components: This project synthesizes expertise from computational biology, structural bioinformatics, statistical genetics, and disease ontology. It transcends traditional disciplinary boundaries, aligning with NCEMS’s commitment to community-scale synthesis research and ICDS’s mission of addressing complex scientific and societal issues through interdisciplinary computational approaches.

Mentorship and Team Integration: The graduate student supported by this funding will receive comprehensive mentorship and support from the following interdisciplinary experts:

● Hyebin Song: PI, NCEMS Working Group co-lead, Assistant Professor, Penn State Department of Statistics

● James Stephenson: NCEMS Working Group co-lead, data scientist at the European Bioinformatics Institute, Cambridge, United Kingdom

● Ian Sitarik: NCEMS Staff Scientist, ICDS RISE Engineer

● Maowei Dong: NCEMS Project Manager, Huck Institutes of the Life Sciences

● Justin Petucci: NCEMS Associate Director, ICDS RISE AI/ML Team Lead

Mentorship will include weekly meetings, presentations, methodological training, and regular feedback, fostering advanced analytical and computational skills and international collaboration. An NCEMS Staff Scientist (PhD-level, experienced in open science, team science, and data science) will provide direct mentorship, ensuring both technical and professional development. The junior researcher will gain research experience in a professional environment outside their primary lab, interact with NCEMS Working Groups, and develop transferable data science skills valuable for their thesis and future careers. Participation in NCEMS-sponsored events, such as the Annual Summit, training workshops, and hackathons, will further support professional growth. Authorship will be provided on research papers to which the researcher contributes in accordance with Working Group guidelines.

Funding Request: We are seeking a graduate student at 50% RA for this project who is able to work in person at the NCEMS offices (4th floor of the Benkovic Building).

PI ICDS Engagement: PI Hyebin Song actively collaborates with the lab of ICDS co-hire Ed O’Brien. Her previous collaboration with the ICDS RISE team (Justin Petucci) resulted in a peer-reviewed publication (Journal of Molecular Biology 436 (6), 168459). She plans to participate in ICDS events, including the annual symposium, scientific seminars, and training workshops. Additionally, she uses the ICDS Roar Collab system for computational research.