Learning Linear Temporal Logic under Uncertainty for Sustainable Behavioral Change Interventions (Faculty/Rising Researcher Collaboration Opportunity) - PSU Institute for Computational and Data Sciences

Learning Linear Temporal Logic under Uncertainty for Sustainable Behavioral Change Interventions

PI: Romulo Meira Goes (Electrical Engineering)

1. Proposal fit. The proposed project is an interdisciplinary project combining the fields of artificial intelligence and data science to address an important problem in the field of behavioral change: which behavior patterns explain behavioral change? For this reason, this project is well aligned with ICDS’s mission of interdisciplinary research using advanced data science approaches to address important societal problems. The results of this project can catalyze the next generation of dynamically tailored health interventions that can: increase access to preventive health care, reduce associated disparities, and reduce public health burden associated with low physical activity. The ICDS Hub/Area (s) most relevant to this project are (1) AI and (2) Data Science. The proposed project is relevant to the Center for Socially Responsible Artificial Intelligence (CSRAI) since the proposed project is related to building data-driven models for interpretable behavioral patterns that lead to sustainable behavior changes.

2. Desired Expertise and Skills. Ideally, the student will have a Boolean and/or Linear Temporal Logic background. Regarding specific areas of computational and/or data science expertise, it is desired: (1) Experience with data cleaning; (2) Experience with optimizers, e.g., Z3, Mixed Integer Programming, Linear Programming; and, (3) Experience with libraries such as Pytorch/Tensorflow for building, training, and validating neural networks.

3. Project Objectives. The goal of this project is to build data-driven algorithms to robustly infer Linear Temporal Logic (LTL) tasks that explicitly account for data uncertainty. The objectives are (1) to engage with team of behavioral researchers that collaborate with Co-PI Lagoa; (2) to process and clean data from a physical activity and weight management study; (3) to generate preliminary models for extracting LTL formulas describing sustainable behavior change; and (4) to generate theoretical advances in learning LTL formulas from data with data uncertainty.

4. Long-term Project Goals. Based on the result of this project, we plan to submit a NSF proposal to the Smart Health and Biomedical Research in the Era of Artificial Intelligence and Advanced Data Science (SCH) call. This project will provide the preliminary results to showcase that wecan robustly learn LTL formulas from data with uncertainty.

5. Working with ICDS. Co-PI Lagoa is currently working with ICDS in two NIH funded projects. The first one is on the implementation of a micro intervention for improving physical activity, where the specific role of ICDS is in the implementation of the algorithms that identify the model of individual response to text messages. The second one is associated with data management, web interface design, and intervention implementation for a project where the aim it control the weight of pregnant women with a history of obesity or being overweight. The proposed project is related to Co-PI Lagoa’s current projects in collaboration with ICDS. Therefore, we plan to continue to leverage Co-PI Lagoa’s current engagement with ICDS. Moreover, we plan to engage with ICDS in the following manner: (1) to use the ROAR infrastructure to build ML models; and (2) to utilize the RISE to improve the scalability of the learning methods using parallel computing and APIs with LLMs.

6. Expected duration and budget. The expected duration of this project is an academic year. We request a student for two semesters at 25% RA. The expected average weekly effort for the student is 10-12 hours/week. If possible, we also request summer support for the student. PI Meira-Goes will use his startup funds to fund the student’s tuition.

7. Proposed activity. Adherence to behavior change is a major clinical and scientific challenge for mitigating risk and promoting health. Even though there have been advances in identifying techniques that initiate behavior change [1], it remains unclear what behavioral patterns lead to systematic and sustained changes in behavior. For example, to internalize a behavior leading to sustainable levels of physical activity, one might not have to achieve activity goals every day. Although large amounts of behavioral data can now be collected and made available, to the best of our knowledge, no tools can leverage this data to uncover interpretable behavioral patterns that lead to sustainable behavior changes. One of the main challenges for learning these behavioral patterns from data is data uncertainty. Humans do not live in a “vacuum”. The data collected not only pertains to the processes of interest, but it is also affected by all aspects of daily life. The current available learning methods are not suitable for uncovering these patterns [2–6]. The goal of this project is to develop learning algorithms to robustly infer Linear Temporal Logic (LTL) tasks that explicitly account for data uncertainty. In particular, given a data set with data uncertainty, our tool will infer LTL formulas that (1) take into account the data uncertainty; (2) have provable guarantees; and (3) provide explainable natural language statements. Linear Temporal Logic. This project will build upon the theory of Linear Temporal Logic (LTL) [7]. Figure 1 shows examples of traces and an LTL formula in a simplified scenario ofengaginginphysicalactivity. A trace represents the temporal data collected from a system. In Fig. 1, we provide two traces with the weekly sequence of daily step goals. Data Sequence of Daily Step Goals M T W T F S S Daily Steps Met Daily Steps Not Met Daily Steps Met Daily Steps Not Met Classification Weekly Step Goals Weekly Step Goals Met Learning Pattern Goals eventually met twice during weekdays Implies →Weekend goals met Weekly Step Goals Not Met Figure 1: Learning temporal behavior in physical activity LTL can capture the behavior of the dynamic evolution of the system through time [7]. For example, the pattern in Fig. 1 can be described an LTL formula. Given an LTL formula φ, we can verify if trace t satisfies this formula, e.g, the top trace in Fig. 1 satisfies the pattern formula. Research Tasks. Wewill carry out the following tasks during this project. • Data Uncertainty: When dealing with physical activity data, we are faced with many uncertainties and perturbations that affect the data collection. Given a week of observed data of daily steps, for example, there could be missing data, sick days, travel, etc. To quantify data uncertainty, we will survey the existing literature on distance metrics to understand (1) what types of uncertainty they model and (2) how to tailor the metric based on domain knowledge from the uncertainty. As an initial starting point, we will use the Hamming metric [8]. The researcher will collaborate with the ICDS RISE team to identify different uncertainty metrics in human behavior datasets. The output of this task will be metrics for specific types of data uncertainty and methods for tailoring them using domain knowledge from the data. • RobustLTLLearning: Given afinite sample of traces S = (P,N) divided into a set of positive examples P and negative examples N. Given S and an uncertainty metric d on S, we will develop learning techniques for automatically inferring a concise and interpretable LTL formula. Atechnical challenge is that many formulas can describe S and metric d, i.e., many formulas are consistent with the data. A natural question is: Which of the formulas better describes the sample and data uncertainty? We will investigate methods to quantify the fitness of the data and the learned formula [9]. Alternatively, we will investigate using Large Language Models (LLMs) to transform human explanations of desired behavior to possible formula candidates. The researcher will collaborate with ICDS RISE team to improve scalability of the algorithms by using parallelization and other techniques.