Development of a Web-Based Platform for Structured CryoEM Data Collection and Metadata Management (Faculty/Junior Researcher Collaboration Opportunity)

PI: Jean-Paul Armache (Biochemistry and Molecular Biology)

Team Members

• Mike Carnegie, Systems Administrator, The Huck Institutes of the Life Sciences. Co-supervises the project and provides networking and data-standards expertise.

• Dr. Sung Hyun Cho, Co-Director, Cryo-Electron Microscopy Core Facility, Huck Institutes of the Life Sciences; Associate Research Professor, Department of Biochemistry and Molecular Biology. Provides domain expertise in cryoEM instrumentation and data quality assurance.

• Dr. Wen Jiang, Professor of Biochemistry and Molecular Biology, Dorothy Foehr Huck and J. Lloyd Huck Chair in Structural Biology. Offers specialized knowledge in cryoEM software design.

Departments and Units

• Eberly College of Science

• Huck Institutes of the Life Sciences

Level of Effort

• Two semesters at 8 hours a week are appropriate to design, develop, and test the website.

Plan for Funding Tuition or Remainder of Salary

• Tuition support will be provided through departmental RA funding.

• Additional salary coverage will be sought through pending instrumentation and infrastructure grants.

Project Description

Cryo-electron microscopy (cryoEM) has fundamentally changed how academic groups and pharmaceutical companies characterize critical systems. Through continuous development it keeps transforming structural biology, but standardized data capture remains a challenge, especially across diverse instrumentation and experimental setups. Collecting, storing, maintaining, and describing the resulting data has become a big-data problem.

This project aims to develop a secure, user-friendly web-based platform to collect, store, and manage cryoEM data collection parameters through complementary automated and manual approaches. Automated procedures will populate key metadata by extracting it from the data collected at the facility. The interface will also allow users to manually enter key metadata such as pixel size, accelerating voltage (kV), detector type, dose rate, defocus range, microscope alignment notes, and sample preparation details.
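As an illustration of what one structured metadata record could look like, here is a minimal Python sketch built from the fields named above. The class name, field names, units (Å for pixel size, e-/pixel/s for dose rate, µm for defocus), and validation rules are all assumptions made for illustration; the actual schema is something the project itself will define.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CollectionMetadata:
    """Hypothetical record for one cryoEM data collection session."""
    pixel_size_angstrom: float               # assumed unit: angstrom per pixel
    voltage_kv: float                        # accelerating voltage in kV
    detector_type: str
    dose_rate: float                         # assumed unit: e-/pixel/s
    defocus_range_um: Tuple[float, float]    # (min, max), assumed unit: micrometres
    alignment_notes: str = ""                # free-text microscope alignment notes
    sample_prep: str = ""                    # free-text sample preparation details

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        problems = []
        if self.pixel_size_angstrom <= 0:
            problems.append("pixel size must be positive")
        if self.voltage_kv <= 0:
            problems.append("accelerating voltage must be positive")
        if not self.detector_type.strip():
            problems.append("detector type is required")
        if self.dose_rate < 0:
            problems.append("dose rate cannot be negative")
        low, high = self.defocus_range_um
        if low > high:
            problems.append("defocus range minimum exceeds maximum")
        return problems


# Example values are made up for illustration only.
record = CollectionMetadata(0.83, 300.0, "ExampleDetector", 8.0, (-2.5, -0.8))
print(record.validate())  # prints []
```

Returning a list of problems rather than raising on the first one is a design choice that suits a web form, where a user would typically want to see every invalid field at once.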

The platform will serve two main purposes:

1. Standardizing metadata collection for reproducibility and data quality assessment.

2. Enabling easier data sharing and integration into downstream computational workflows and data repositories (e.g., EMDB, PDB).

We envision this tool being used by multiple research groups across the university and integrated with existing upstream and downstream cryoEM data pipelines. The project will involve designing the frontend interface, building a secure backend database, and ensuring the application's compliance with institutional data governance standards.

Given the rapid development of large language models (LLMs) in recent years, information accrued and annotated in this way could eventually support improved data collection quality and sample optimization.

Desired Skills or Expertise

We seek Junior Researchers who can contribute to the responsibilities below and who meet any of the listed requirements:

Responsibilities:

  • Develop a full-stack web application using Node.js, Django, or Flask
  • Design and implement backend logic for parsing and extracting metadata from XML files
  • Create and manage a relational database (e.g., PostgreSQL, MySQL) to store the extracted metadata
  • Develop a clean, intuitive UI for viewing/storing cryoEM data collection metadata
  • Implement user authentication via Active Directory (AD), ensuring secure and role-based access to the application
  • Ensure code quality, scalability, and documentation
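
The XML-parsing and database responsibilities above could be prototyped along the following lines. This is only a sketch under stated assumptions: the XML element names (`session`, `pixelSize`, `voltage`, `detector`) and the table layout are hypothetical, since the real facility file format is not described here, and Python's built-in `sqlite3` stands in for the PostgreSQL or MySQL database the project would actually use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout; real microscope output will differ.
SAMPLE_XML = """<session>
  <pixelSize unit="angstrom">0.83</pixelSize>
  <voltage unit="kV">300</voltage>
  <detector>ExampleDetector</detector>
</session>"""


def parse_session(xml_text):
    """Extract a few metadata fields from the assumed XML layout."""
    root = ET.fromstring(xml_text)
    return {
        "pixel_size_angstrom": float(root.findtext("pixelSize")),
        "voltage_kv": float(root.findtext("voltage")),
        "detector_type": root.findtext("detector").strip(),
    }


def store_session(conn, record):
    """Insert one parsed record and return its row id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS collection_metadata (
               id INTEGER PRIMARY KEY,
               pixel_size_angstrom REAL NOT NULL,
               voltage_kv REAL NOT NULL,
               detector_type TEXT NOT NULL
           )"""
    )
    # Named placeholders keep values out of the SQL string itself.
    cursor = conn.execute(
        """INSERT INTO collection_metadata
               (pixel_size_angstrom, voltage_kv, detector_type)
           VALUES (:pixel_size_angstrom, :voltage_kv, :detector_type)""",
        record,
    )
    conn.commit()
    return cursor.lastrowid


# In-memory database, used here for demonstration only.
connection = sqlite3.connect(":memory:")
row_id = store_session(connection, parse_session(SAMPLE_XML))
```

In a Django or Flask application the table would more likely be defined through the framework's ORM, with the parsing step kept as a separate function so it can be tested against sample files from each instrument.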

Requirements:

• Proficiency in Node.js or Django/Flask

• Strong experience working with XML parsing

• Familiarity with relational databases and ORMs

• Experience designing clean and responsive frontend UIs

• Familiarity with libraries/tools such as django-auth-ldap (Django) or passport-ldapauth (Node.js)

• Ability to work independently and meet deadlines

Other Requirements or Expectations

• Preference for graduate students with experience in software development or scientific computing

• Availability for biweekly project meetings

• Willingness to engage with cryoEM users to iterate on design

Specific Objectives for This Call

• Develop a working prototype of the data collection web interface

• Create and test a structured metadata schema suitable for cryoEM experiments

• Conduct user testing with researchers and core facility staff

• Prepare preliminary results and usability data for inclusion in future grant applications

• Submit a conference abstract or workshop presentation on project outcomes

Medium to Long-Term Goals

• Submit an NSF data infrastructure proposal incorporating the tool

• Publish a methods or software note describing the platform

• Integrate with institutional data repositories and cryoEM facility workflows

Connection to ICDS Mission

This project advances ICDS’s mission by leveraging data science and software development to support interdisciplinary research, enhance reproducibility, and facilitate access to structured scientific data.

Engagement with ICDS

Mike Carnegie has participated in previous ICDS events and workshops and intends to engage further by mentoring Junior Researchers and integrating ICDS tools and best practices into their projects. The team is committed to regular engagement through seminars and interdisciplinary collaborations facilitated by ICDS.