Predicting HIV care loss-to-follow-up using machine learning
PI: Kathryn Risher (Penn State College of Medicine)
PROJECT TEAM
- Kathryn Risher, PhD, Assistant Professor of Public Health Sciences, Division of Epidemiology, Penn State College of Medicine – Role: MPI
- Fenglong Ma, PhD, Assistant Professor of Information Sciences and Technology, Penn State University – Role: MPI – able to serve as a mentor to an ICDS researcher
- Tonya Crook, MD, Associate Professor of Medicine, Division of Infectious Diseases & Epidemiology, Penn State College of Medicine – Role: Co-I
- Cynthia Whitener, MD, Professor of Medicine, Division of Infectious Diseases & Epidemiology, Penn State College of Medicine – Role: Co-I
LEVEL OF EFFORT APPROPRIATE FOR THE PROJECT: 2 semesters at 25% RA
PLAN FOR FUNDING TUITION: Dr. Ma has several grants that can cover a graduate student’s tuition.
DESCRIPTION OF PROPOSED PROJECT:
Immediate, lifelong initiation of antiretroviral therapy (ART) is recommended for all individuals diagnosed with HIV infection. Early initiation of ART is associated with reduced morbidity and mortality from HIV-related causes, near-normal life expectancy, and reduced morbidity from non-HIV-related causes including cardiovascular disease and non-AIDS-related cancers.
However, maintaining lifelong ART can be difficult for people living with HIV (PLHIV). Once care has been established, it is recommended that PLHIV attend clinic visits and have laboratory measures taken twice annually. PLHIV are disproportionately members of sexual and racial/ethnic minority groups, populations that face systemic barriers to healthcare. Overall, care engagement is complex and multidimensional, with behavioral, structural, cultural, and economic forces affecting whether a person remains in care.
At present, clinicians and case workers rely largely on experience and intuition to decide whom to prioritize for enhanced care retention services, such as those available through Ryan White Part B. A tool that uses all available patient data to predict loss-to-follow-up (LTFU) could provide actionable information for targeting enhanced retention services where resources are finite. A machine learning (ML) tool would give clinicians and case workers quantitative evidence, based on longitudinal patient records, to support decision-making when selecting appropriate interventions for individual patients.
Our project aims to develop an ML model to predict patient LTFU from HIV care, trained on data from PLHIV in the Penn State Comprehensive Care Clinic and TriNetX. Dr. Risher’s research group does not currently have experience applying multimodal ML algorithms to these data.
The ICDS student will develop an ML model based on data from the Penn State Comprehensive Care Clinic. We will analyze records from patients of the Penn State Comprehensive Care Program, which provides HIV care to around 750 rural and semi-urban people living with HIV, including outreach to rural areas in southcentral PA. Patient records and socio-demographic characteristics will be analyzed using a de-identified dataset drawn from the CCOPAT database. CCOPAT is a comprehensive database maintained by the Penn State College of Medicine’s Department of Public Health Sciences that encompasses over 2,400 variables, including intake data (e.g., gender, race/ethnicity, education level) and medical record data (e.g., laboratory test results, treatment and prescription history, visit dates).
Training effective machine learning models, especially deep learning models, requires a large amount of data, and the number of patients in the CCOPAT database is too small for deep learning methods on its own. To address this issue, we propose to extract auxiliary data from the TriNetX research database. TriNetX is a database of clinical patient data extracted from electronic medical records, including demographics, diagnoses, procedures, labs, and medications. It spans a network of many healthcare providers and includes over 230,000 patients living with HIV.
We will develop a predictive model of HIV LTFU trained simultaneously on the Penn State CCOPAT and TriNetX databases, while optimizing the information incorporated from the Penn State CCOPAT database to maximize local applicability.
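As an illustration only, one simple way to keep the much smaller CCOPAT cohort from being swamped during joint training is to weight each record so that each data source contributes a fixed share of the total loss. The sketch below is not the method the project will necessarily adopt; the function names and the `local_share` parameter are hypothetical, and per-record losses are assumed to be computed elsewhere.

```python
def source_weights(n_local, n_aux, local_share=0.5):
    """Per-record weights so the local source carries `local_share` of total weight.

    n_local: number of local (e.g., CCOPAT) records
    n_aux:   number of auxiliary (e.g., TriNetX) records
    local_share: hypothetical tuning parameter in [0, 1]
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    if n_local <= 0 or n_aux <= 0:
        raise ValueError("both sources need at least one record")
    w_local = local_share / n_local
    w_aux = (1.0 - local_share) / n_aux
    return w_local, w_aux


def weighted_loss(losses_local, losses_aux, local_share=0.5):
    """Combine per-record losses from both sources into one training objective."""
    w_local, w_aux = source_weights(len(losses_local), len(losses_aux), local_share)
    return w_local * sum(losses_local) + w_aux * sum(losses_aux)


# Example with the approximate cohort sizes quoted in this proposal.
w_local, w_aux = source_weights(750, 230_000, local_share=0.5)
```

With equal shares, each CCOPAT record receives a far larger weight than each TriNetX record, so the local cohort still shapes the fitted model. The appropriate value of `local_share` would need to be chosen empirically, for example by validation on held-out CCOPAT patients.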
We will define retention in care using the HRSA definition for care engagement (2+ visits per year separated by at least 90 days) to identify person-time during which a patient is “engaged” or “disengaged” from care. We will explore the impact of loosening that definition by considering measures of: 1) viral suppression over time, 2) visit no-show rates, 3) time between visits, and 4) prescription refills.
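The visit-based rule above could be operationalized along the following lines. This is a minimal sketch under stated assumptions: "year" is treated as a calendar year, visit dates are already cleaned to attended visits only, and the function name and parameters are hypothetical rather than part of any existing CCOPAT or TriNetX tooling.

```python
from datetime import date


def is_engaged(visit_dates, year, min_gap_days=90):
    """Return True if the patient has 2+ visits in `year` at least `min_gap_days` apart.

    visit_dates: iterable of datetime.date objects for attended visits
    year: calendar year to evaluate (assumption: calendar-year windows)
    """
    in_year = sorted(d for d in visit_dates if d.year == year)
    if len(in_year) < 2:
        return False
    # The first and last visits form the widest-spaced pair, so checking
    # them is enough to know whether any pair meets the minimum gap.
    return (in_year[-1] - in_year[0]).days >= min_gap_days


# Hypothetical patient with two visits about five months apart.
visits = [date(2022, 1, 10), date(2022, 6, 1)]
engaged_2022 = is_engaged(visits, 2022)
```

Loosened definitions could be explored by varying `min_gap_days` or by replacing the visit-date input with, for example, prescription refill dates.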
We will also align the variables between the CCOPAT and TriNetX databases. Although TriNetX does not record care engagement directly, we can apply the same definitions of retention by analyzing its longitudinal electronic health records. Because the input data contain multiple types of variables, we will explore novel multimodal machine learning approaches to analyze patient data. In particular, we will design fairness-aware algorithms to mitigate systemic bias against racial/ethnic minorities. The primary limitation of the project will be the alignment of variables between the CCOPAT and TriNetX databases. Ideally, the variable set of the TriNetX database will be a subset of the CCOPAT variable set. In the worst case, the two databases will share only a limited set of variables. If so, we will explore data augmentation techniques to generate synthetic data to use in place of the TriNetX data.
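A first check of how well the two variable sets line up might look like the sketch below. It assumes variable names have already been harmonized to a common vocabulary, which is itself a substantial task; the function name and the example variable names are hypothetical and are not taken from either database.

```python
def compare_variable_sets(ccopat_vars, trinetx_vars):
    """Summarize overlap between two collections of harmonized variable names."""
    ccopat, trinetx = set(ccopat_vars), set(trinetx_vars)
    shared = ccopat & trinetx
    return {
        "shared": sorted(shared),
        "ccopat_only": sorted(ccopat - trinetx),
        "trinetx_only": sorted(trinetx - ccopat),
        # Ideal case described above: every TriNetX variable also exists in CCOPAT.
        "trinetx_is_subset": trinetx <= ccopat,
        # Fraction of TriNetX variables usable without imputation or synthesis.
        "share_of_trinetx": len(shared) / len(trinetx) if trinetx else 0.0,
    }


# Hypothetical, illustrative variable names only.
report = compare_variable_sets(
    ["gender", "race_ethnicity", "education_level", "viral_load", "visit_date"],
    ["gender", "race_ethnicity", "viral_load", "visit_date", "procedure_code"],
)
```

A low `share_of_trinetx` value would signal the worst case described above, in which synthetic data generation becomes the fallback.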
The proposed project provides an efficient way to prioritize patients in need of enhanced HIV retention measures to prevent future LTFU. This work will provide data for a publishable manuscript in the short term, and preliminary data for future grant proposals to test and implement this tool in clinics throughout the country.
SPECIFIC AREAS OF COMPUTATIONAL AND/OR DATA SCIENCE EXPERTISE: Machine learning to build prediction models from longitudinal data; familiarity with deep learning and multimodal machine learning algorithms is ideal
OTHER REQUIREMENTS OR EXPECTATIONS OF POTENTIAL ICDS JUNIOR RESEARCHERS: None
SPECIFIC OBJECTIVES FOR WORK SUPPORTED BY THIS CALL:
• Submitting a scientific paper demonstrating the developed ML algorithm and its performance
• Preliminary data to support a grant submission
MEDIUM TO LONG-TERM GOAL: Submit an R01 to NIAID to support the development of a user-friendly tool based on this ML algorithm, and test its efficacy in reducing patient loss to follow-up
CONNECTION OF PROJECT TO ICDS’S MISSION: This work is strongly aligned with the ICDS mission to build multidisciplinary teams that combine domain expertise (Dr. Risher – HIV epidemiology, Dr. Crook and Dr. Whitener – clinical) with advanced computational and data science approaches (Dr. Ma and ICDS Junior Researcher) to conduct consequential research. The project will use novel ML methods to answer a question of great clinical utility: in settings of scarce resources, on whom do we focus HIV retention activities?
TEAM MEMBERS RECENT AND/OR PLANNED ENGAGEMENT WITH ICDS: As a faculty member at the College of Medicine in Hershey, Dr. Risher has had less opportunity than preferred to interact with ICDS. Dr. Risher plans to engage with ICDS by attending the ICDS Symposium in the fall. Additionally, Dr. Risher will reach out to CENSAI to find ways to get more involved. Dr. Ma is a faculty co-hire with ICDS.