Application of Transformer-Based Machine Learning Models to Whole Organism Computational Phenomics
PI: Keith Cheng (College of Medicine)
Level of Effort Requested: Two semesters at 25% Research Assistantship (RA)
Funding Plan: The tuition for the graduate student and remaining salary for any involved researchers will be covered by PI-controlled funds and grants currently secured by the PI’s department.
Project Description:
To enable the first 3-dimensional whole-organism phenotyping that encompasses all cell types and organ systems (whole-organism system connectomics), we propose to develop and optimize Transformer-based machine learning (ML) models capable of automatically segmenting and labeling regions of interest (such as organs, nerves, individual cells, and tumors) from high-resolution 3D micro-CT scans that combine centimeter-scale fields of view with sub-micron resolution, an unprecedented combination.
Transformer-based models are increasingly used for medical image segmentation, often outperforming traditional convolutional neural networks (CNNs). Their use has grown significantly in recent years because they can model long-range dependencies and capture global context. Our datasets, imaged at sub-micron granularity, include complex non-human biological models (Daphnia, zebrafish, axolotl, and octopus) and human cancer samples sourced from the Cheng Lab, which is known for pioneering biological segmentation research.
Our experiments with existing transformer-based segmentation tools show that they do not handle such high-resolution, large-volume data. Existing segmentation models designed for human imaging data have been ineffective because of significant anatomical and morphological differences and because of the extreme resolution requirements and file sizes of our datasets, the largest of which are several terabytes in size. Processing a single very high-resolution medical image typically exceeds GPU memory capacity, making ML inference infeasible without aggressive patching or downsampling, both of which require custom optimization. In addition, transformers typically operate on fixed-size patches (e.g., 16×16 in ViT), which can blur fine-grained features and lead to the loss of the boundary-level accuracy necessary for sub-micron segmentation (e.g., of cell membranes and synapses). Finally, upsampling in decoders (as in TransUNet) can introduce artifacts or smooth out tiny structures.
Addressing these challenges necessitates specialized transformer architectures that will be built, tested and deployed in this project.
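As a rough illustration of the memory and patching constraints described above, the following Python sketch estimates the raw size of a dense volume and runs overlapping sliding-window inference with per-voxel averaging. It is a minimal sketch under stated assumptions, not the project's pipeline: the volume dimensions, patch size, overlap, and the `predict_fn` stand-in (a simple threshold in place of a real transformer) are all hypothetical.

```python
import numpy as np
from itertools import product


def volume_bytes(shape, bytes_per_voxel=2):
    """Raw size in bytes of a dense volume (2 bytes/voxel assumes 16-bit data)."""
    n = 1
    for s in shape:
        n *= s
    return n * bytes_per_voxel


def patch_starts(size, patch, stride):
    """Start offsets along one axis so that patches cover [0, size)."""
    if size <= patch:
        return [0]
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # final patch sits flush with the edge
    return starts


def sliding_window_predict(volume, predict_fn, patch=(32, 32, 32), overlap=0.5):
    """Average per-voxel predictions over overlapping patches.

    Only one patch is passed to predict_fn at a time, so the model never
    needs the whole volume in (GPU) memory. overlap must be in [0, 1).
    """
    strides = [max(1, int(p * (1 - overlap))) for p in patch]
    acc = np.zeros(volume.shape, dtype=np.float32)    # summed predictions
    count = np.zeros(volume.shape, dtype=np.float32)  # patches covering each voxel
    axes = [patch_starts(s, p, st)
            for s, p, st in zip(volume.shape, patch, strides)]
    for z, y, x in product(*axes):
        sl = (slice(z, z + patch[0]),
              slice(y, y + patch[1]),
              slice(x, x + patch[2]))
        acc[sl] += predict_fn(volume[sl])
        count[sl] += 1.0
    return acc / count


# Hypothetical 10,000^3-voxel, 16-bit scan: 2e12 bytes, i.e. 2 TB uncompressed.
size_tb = volume_bytes((10_000, 10_000, 10_000)) / 1e12

# Small synthetic volume and a placeholder "model" (threshold), for demonstration.
rng = np.random.default_rng(0)
vol = rng.random((40, 50, 60), dtype=np.float32)
toy_model = lambda p: (p > 0.5).astype(np.float32)
pred = sliding_window_predict(vol, toy_model)
```

Averaging over overlapping patches is one common way to reduce seams at patch borders; it does not by itself address the boundary-accuracy and upsampling issues noted above, which are the subject of the proposed architectural work.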
The anticipated outputs include i) trained and validated high-performance transformer models, ii) published papers and foundational data to support subsequent research directions, such as temporal modeling of segmented tumor progression and the generation of integrated 2D and 3D anatomical and genetic atlases that can anchor molecular phenotyping, and iii) a maintained database of labeled images. Longer-term goals include i) performing temporal simulations of segmented tumor and disease progression, and ii) constructing statistical atlases linking detailed structural phenotypic data with their genetic, chemical, and disease causes.
Specific Computational/Data Science Expertise Needed:
● Experience developing and optimizing transformer- and CNN-based 3D segmentation models (e.g., UNETR, Vista-3D, MedSAM)
● GPU-based high-performance computing (HPC) optimization techniques
● Handling and processing large-scale 3D imaging data (micro-CT)
● Expertise in synthetic data generation (e.g., NVIDIA MAISI) to augment limited training datasets
Additional Requirements:
● Availability for regular weekly meetings
● Currently a graduate student who has passed comprehensive examinations, preferably from a relevant computational, biological, or engineering discipline
Connection to ICDS’s Mission:
This interdisciplinary project aligns directly with ICDS’s mission by combining domain expertise in biological imaging with cutting-edge computational approaches, advancing scientific understanding in both the computational and life sciences, and providing tools of substantial societal importance for medicine, biological research, and the setting of public policy.
Team Engagement with ICDS:
The team has been actively participating in ICDS events, utilizing its computational resources, and collaborating with affiliated faculty to foster interdisciplinary research and computational advancement.