Federated estimation of causal effects from observational data
PI: Vasant Honavar
Estimation of causal effects from observational (and, when available, experimental) data is a fundamental problem in artificial intelligence. Recent years have seen significant advances in the theoretical foundations of causal inference, including techniques for determining whether a causal effect of interest can be estimated from observations, identifying and controlling for potential confounders, and generalizing causal effects across experimental settings. There have also been advances in methods for estimating causal effects from observational data, including methods that build on representation learning with deep neural networks. However, with few exceptions, these methods assume that the causal effect estimation algorithm has centralized access to the entire observational data set.
This assumption fails in many practical applications, such as causal modeling of health risks and health outcomes from large patient data sets (e.g., electronic health records). Privacy constraints prevent such data, collected by independent entities (e.g., hospitals) at different locations, from being aggregated at a central location. Furthermore, analyses of such data may be subject to access constraints. For example, some sites may be willing to provide access to summary statistics but not individual-level raw data, or may allow only certain pre-approved analyses or computations to be executed on the data.
Two general forms of data fragmentation across sites are of interest: horizontal fragmentation, where each site collects measurements of the same variables for a subset of the individuals in a population of interest; and vertical fragmentation, where each site collects measurements of only a subset of the variables for all individuals in a population of interest. While federated machine learning algorithms for predictive modeling have received significant attention over the past decade, there is limited work on such algorithms for causal effect estimation.
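The two forms of fragmentation described above can be pictured on a toy table whose rows are individuals and whose columns are variables. The sketch below is purely illustrative: the data are random, and the names full, horizontal_sites, and vertical_sites are hypothetical and not part of the project.

```python
import numpy as np

# Hypothetical toy data: 6 individuals (rows) x 4 variables (columns).
rng = np.random.default_rng(0)
full = rng.normal(size=(6, 4))

# Horizontal fragmentation: every site records the same 4 variables,
# but only for its own subset of individuals (a block of rows).
horizontal_sites = [full[:2], full[2:5], full[5:]]

# Vertical fragmentation: every site covers all 6 individuals,
# but records only its own subset of variables (a block of columns).
vertical_sites = [full[:, :1], full[:, 1:3], full[:, 3:]]

# Stacking the pieces would recover the centralized table. This is the
# aggregation step that privacy constraints rule out in practice.
rebuilt_from_rows = np.vstack(horizontal_sites)
rebuilt_from_columns = np.hstack(vertical_sites)

print([site.shape for site in horizontal_sites])
print([site.shape for site in vertical_sites])
```

In the vertical case, stacking columns presumes that records for the same individual can be linked across sites, which is itself an assumption of this sketch.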
Aims
This work has the following aims:
• Develop a general framework for federated causal effect estimation from observational data under a variety of data access and computational constraints imposed by the data sources
• Show how large families of existing causal effect estimation methods can be realized within the proposed framework, in which each participating data source answers restricted statistical queries or queries for gradients of an objective function, and an aggregator combines the answers to estimate the causal effect of interest
• Establish theoretical guarantees about the accuracy of causal effects estimated by the federated variant of each causal effect estimation algorithm relative to those obtained by its centralized counterpart, which has access to the entire data set at a central location
• Theoretically and empirically assess how the performance of the federated methods varies under different conditions
• Apply the resulting methods to causal modeling of health risks from electronic health records data held by multiple healthcare providers (through the OHDSI consortium), working with clinical collaborators
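As one hedged illustration of the statistical-query idea in the aims above, consider horizontally fragmented data and a simple regression-adjustment estimator. This is a minimal sketch under assumed conditions (a linear outcome model and a single measured confounder), not the framework this project proposes. Each site returns only the aggregate matrices X^T X and X^T y, and the aggregator sums them and solves the normal equations. The functions local_summary, aggregate, and simulate_site, and the simulated data, are all hypothetical.

```python
import numpy as np

def local_summary(X, y):
    """Run at a site: return aggregate statistics only, never individual rows."""
    return X.T @ X, X.T @ y

def aggregate(summaries):
    """Run at the aggregator: sum the site summaries and solve the normal equations."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)

def simulate_site(rng, n, effect=2.0):
    """Hypothetical data: confounder c influences both treatment t and outcome y."""
    c = rng.normal(size=n)
    t = (c + rng.normal(size=n) > 0).astype(float)
    y = effect * t + 1.5 * c + rng.normal(size=n)
    X = np.column_stack([np.ones(n), t, c])  # intercept, treatment, confounder
    return X, y

rng = np.random.default_rng(0)
sites = [simulate_site(rng, n) for n in (200, 350, 500)]

# Federated estimate: only the per-site summaries reach the aggregator.
federated_beta = aggregate([local_summary(X, y) for X, y in sites])

# Centralized reference: pool all rows, which a real deployment could not do.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
centralized_beta = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

# The treatment coefficient is the adjusted effect estimate in this linear model.
federated_effect = federated_beta[1]
centralized_effect = centralized_beta[1]
print(federated_effect, centralized_effect)
```

In this special case the federated and centralized estimates agree up to floating-point error, because the summed summaries are sufficient statistics for least squares. Estimators without such additive sufficient statistics would need other query types, such as the gradient queries mentioned above.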
Long-term goal:
Develop robust federated algorithms for causal effect estimation for a broad range of applications in healthcare, education, public policy, and other domains where it is generally neither feasible nor desirable to aggregate data collected by independent entities into a centralized repository.
Connection to ICDS Mission
This project contributes to advances in federated methods for causal effect estimation for a broad range of applications. Both the methods and the potential applications of the methods are of broad interest to ICDS missions in AI and Data Sciences.