The Causal Club is a causality-focused reading group organized by young researchers from the Department of Computer Science of the University of Pisa.
Each talk provides an opportunity to explore and debate key concepts with the authors of recent significant works in the field.
Don't miss the upcoming events! Join the mailing list, follow us on Bluesky, and add the events to your calendar.
16:00-17:00
This work provides a comprehensive review of deep structural causal models (DSCMs), particularly focusing on their ability to answer counterfactual queries using observational data within known causal structures. It delves into the characteristics of DSCMs by analyzing the hypotheses, guarantees, and applications inherent to the underlying deep learning components and structural causal models, fostering a finer understanding of their capabilities and limitations in addressing different counterfactual queries. Furthermore, it highlights the challenges and open questions in the field of deep structural causal modeling. It sets the stage for researchers to identify future research directions and for practitioners to get an overview and find the methods best suited to their needs.
15:00-16:00
Causal discovery can be computationally demanding for large numbers of variables. If we only wish to estimate the causal effects on a small subset of target variables, we might not need to learn the causal graph for all variables, but only a small subgraph that includes the targets and their adjustment sets. In this paper, we focus on identifying causal effects between target variables in a computationally and statistically efficient way. This task combines causal discovery and effect estimation, aligning the discovery objective with the effects to be estimated. We show that definite non-ancestors of the targets are unnecessary to learn causal relations between the targets and to identify efficient adjustment sets. We sequentially identify and prune these definite non-ancestors with our Sequential Non-Ancestor Pruning (SNAP) framework, which can be used either as a preprocessing step to standard causal discovery methods, or as a standalone sound and complete causal discovery algorithm. Our results on synthetic and real data show that both approaches substantially reduce the number of independence tests and the computation time without compromising the quality of causal effect estimations.
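To make the pruning idea concrete, here is a rough sketch of the general pattern, not the authors' SNAP procedure (whose sequential, sound-and-complete version is more involved): variables that low-order conditional independence tests separate from every target are treated as definite non-ancestors and dropped before a standard discovery method is run on the survivors. The ci_test and run_pc names are hypothetical placeholders for a conditional independence test and a constraint-based discovery routine.

    # Illustrative pre-pruning sketch (assumed helper functions; not SNAP itself).
    # ci_test(data, x, y, cond) is assumed to return True when x and y are independent given cond.
    from itertools import combinations

    def prune_candidates(data, variables, targets, ci_test, max_order=1):
        # Keep the targets plus every variable that no low-order test separates from all targets.
        kept = list(targets)
        for v in set(variables) - set(targets):
            separated_from_every_target = all(
                any(ci_test(data, v, t, list(cond))
                    for k in range(max_order + 1)
                    for cond in combinations([w for w in variables if w not in (v, t)], k))
                for t in targets
            )
            if not separated_from_every_target:
                kept.append(v)
        return kept

    # kept = prune_candidates(data, variables, targets, ci_test)
    # graph = run_pc(data[kept])   # standard causal discovery on the reduced variable set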
16:00-17:00
Algorithmic Recourse (AR) aims to provide users with actionable steps to overturn unfavourable decisions made by machine learning predictors. However, these actions often take time to be implemented (e.g., getting a degree can take years), and their effects may vary as the world evolves. Thus, it is natural to ask for recourse that remains valid in a dynamic environment. In this paper, we study the robustness of algorithmic recourse over time by casting the problem through the lens of causality. We demonstrate theoretically and empirically that (even robust) causal AR methods can fail over time except in the – unlikely – case that the world is stationary. Even more critically, counterfactual AR cannot be solved optimally unless the world is fully deterministic. To account for this, we propose a simple yet effective algorithm for temporal AR that explicitly accounts for time under the assumption of having access to an estimator approximating the stochastic process. Our simulations on synthetic and realistic datasets show how considering time produces more resilient solutions to potential trends in the data distribution.
16:00-17:00
Estimating treatment effects over time is crucial in various domains, including precision medicine, epidemiology, economics, and marketing. This paper introduces a novel approach to counterfactual regression over time, with a strong focus on long-term predictions. Unlike existing models such as Causal Transformer, our approach demonstrates the effectiveness of using RNNs for long-term forecasting, enhanced by Contrastive Predictive Coding (CPC) and Information Maximization (InfoMax). Prioritizing efficiency, we avoid computationally expensive transformers. By leveraging CPC, our method captures long-term dependencies in the presence of time-varying confounders. Notably, recent models have overlooked the importance of invertible representations, compromising identification assumptions. To address this, we employ the InfoMax principle, maximizing a lower bound of mutual information between sequence data and its representation. Our method achieves state-of-the-art counterfactual estimation results on both synthetic and real-world data, marking the first integration of Contrastive Predictive Coding in causal inference.
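As a rough illustration of the contrastive ingredient, Contrastive Predictive Coding is typically trained with an InfoNCE-style loss, which maximizes a lower bound on the mutual information between a context representation and future representations. The sketch below is generic and not the paper's architecture; the encoders producing context and future are assumed to exist elsewhere.

    # Generic InfoNCE loss (PyTorch); positive pairs sit on the diagonal of the score matrix.
    import torch
    import torch.nn.functional as F

    def info_nce(context, future, temperature=0.1):
        # context, future: (batch, dim) representations; row i of each forms a positive pair
        context = F.normalize(context, dim=-1)
        future = F.normalize(future, dim=-1)
        scores = context @ future.t() / temperature                # (batch, batch) similarities
        labels = torch.arange(scores.size(0), device=scores.device)
        return F.cross_entropy(scores, labels)                     # -log softmax of the positives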
16:00-17:00
"Sure, I am happy to generate a story for you: Captain Lyra stood at the helm of her trusty ship, the Maelstrom's Fury, gazing out at the endless sea. [...] Lyra's eyes welled up with tears as she realized the bitter truth - she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself." Although this story, generated by a large language model, is captivating, one may wonder -- how would the story have unfolded if the model had chosen "Captain Maeve" as the protagonist instead? We cannot know. State-of-the-art large language models are stateless -- they maintain no internal memory or state. Given a prompt, they generate a sequence of tokens as an output using an autoregressive process. As a consequence, they cannot reason about counterfactual alternatives to tokens they have generated in the past. In this work, our goal is to enhance them with this functionality. To this end, we develop a causal model of token generation that builds upon the Gumbel-Max structural causal model. Our model allows any large language model to perform counterfactual token generation at almost no cost in comparison with vanilla token generation, it is embarrassingly simple to implement, and it does not require any fine-tuning nor prompt engineering. We implement our model on Llama 3 8B-Instruct and Ministral-8B-Instruct and conduct a qualitative and a quantitative analysis of counterfactually generated text. We conclude with a demonstrative application of counterfactual token generation for bias detection, unveiling interesting insights about the model of the world constructed by large language models.
16:00-17:00
We study the differences arising from merging predictors in the causal and anticausal directions using the same data. In particular, we study the asymmetries that arise in a simple model where we merge the predictors using one binary variable as target and two continuous variables as predictors. We use Causal Maximum Entropy (CMAXENT) as the inductive bias to merge the predictors; however, we expect similar differences to hold also when we use other merging methods that take into account asymmetries between cause and effect. We show that if we observe all bivariate distributions, the CMAXENT solution reduces to a logistic regression in the causal direction and Linear Discriminant Analysis (LDA) in the anticausal direction. Furthermore, we study how the decision boundaries of these two solutions differ whenever we observe only some of the bivariate distributions, and we discuss the implications for Out-Of-Variable (OOV) generalisation.
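To see what the two solutions look like in practice, one can simply compare the decision boundaries of logistic regression (the causal-direction solution) and Linear Discriminant Analysis (the anticausal-direction solution) fitted to the same data. The snippet below is only an illustration of that comparison with scikit-learn on simulated data; it does not implement CMAXENT itself.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=1000)                                   # binary target
    X = rng.normal(loc=1.5 * y[:, None], scale=1.0, size=(1000, 2))     # two continuous predictors

    logit = LogisticRegression().fit(X, y)                              # causal-direction solution
    lda = LinearDiscriminantAnalysis().fit(X, y)                        # anticausal-direction solution

    print("logistic boundary:", logit.coef_, logit.intercept_)
    print("LDA boundary:     ", lda.coef_, lda.intercept_)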
15:00-16:00
Causal abstraction (CA) theory establishes formal criteria for relating multiple structural causal models (SCMs) at different levels of granularity by defining maps between them. These maps have significant relevance for real-world challenges such as synthesizing causal evidence from multiple experimental environments, learning causally consistent representations at different resolutions, and linking interventions across multiple SCMs. In this talk, Yorgos will present COTA, a method to learn abstraction maps from observational and interventional data without assuming complete knowledge of the underlying SCMs. In particular, COTA leverages a multi-marginal Optimal Transport (OT) formulation that enforces do-calculus causal constraints, together with a cost function that relies on interventional information. COTA was thoroughly evaluated on both synthetic and real-world problems, demonstrating clear advantages over non-causal, independent, and aggregated OT formulations. Additionally, the method's efficiency as a data augmentation tool was showcased by comparing it to the state-of-the-art CA learning framework, which assumes fully specified SCMs, on a real-world downstream task.
15:00-16:00
Causal discovery from observational data is hindered by restrictive modelling hypotheses that may be unrealistic and hard to verify. In this talk, we will illustrate how score matching estimation enables causal discovery with observational data on broad classes of causal models, with linear and nonlinear mechanisms, even in the presence of latent confounders. We draw the connection between our findings and the general principles that underlie most of the known results on the identifiability of causal models from observational data. Then, we turn our attention to supervised learning-based approaches for causal discovery: a plethora of learning algorithms for the amortized inference of causal graphs have been recently proposed, accompanied by impressive empirical results. Yet, it is unclear when the output of these methods can be trusted, as they do not come with clear identifiability guarantees. We show that amortized causal discovery still needs to obey identifiability theory, but it also differs from classical methods in how the assumptions are formulated, trading more reliance on assumptions on the noise type for fewer hypotheses on the mechanisms.
15:00-16:00
There is a growing interest in providing explanations of machine learning models using human-understandable concepts. This integration of concepts can alleviate the opacity introduced by neural networks, opening new avenues for designing interpretable and trustworthy models. However, a critical problem is ensuring that machine learning models correctly encode human concepts in their representations. Specifically, when can we assign names to machine representations, and how well do their concepts align with ours? I will demonstrate how Causality and Causal Representation Learning can help formalize this problem, offering a clear definition of what can be referred to as human-machine alignment. Throughout the presentation, I will showcase examples of both successes and failures in models designed to integrate concepts into their representations.
15:00-16:00
Causal graphical models encode causal relationships between variables, typically based on a directed acyclic graph (DAG). Structural causal models expand upon this by expressing the causal relationships as explicit functions, which allows for computing the effect of interventions and much more. We show that, in many common parameterizations using linear functions and additive noise, effects accumulate along the causal order. This property enables inferring the causal order from data simply by sorting by a suitable criterion. We introduce the concept of sortability to capture the magnitude of the phenomenon, and show that the accumulating effects can lead to unrealistic values and deterministic relationships. We discuss implications for existing results and future research based on the same model class.
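A toy simulation makes the accumulation visible. The following is only an illustration in the spirit of sorting by marginal variance (one possible criterion), not the paper's experiments: in a linear additive-noise chain with weights larger than one, marginal variances grow along the causal order, so sorting by variance recovers it.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, weight = 10_000, 5, 1.5

    # Chain X1 -> X2 -> ... -> Xd with unit-variance additive noise.
    X = np.zeros((n, d))
    X[:, 0] = rng.normal(size=n)
    for j in range(1, d):
        X[:, j] = weight * X[:, j - 1] + rng.normal(size=n)

    variances = X.var(axis=0)
    print("marginal variances:", np.round(variances, 1))                 # grow along the chain
    print("order recovered by sorting on variance:", np.argsort(variances))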
15:00-16:00
We view causality both as an extension of probability theory and as a study of what happens when one intervenes on a system, and argue in favour of taking Kolmogorov's measure-theoretic axiomatisation of probability as the starting point towards an axiomatisation of causality. To that end, we propose the notion of a causal space, consisting of a probability space along with a collection of transition probability kernels, called causal kernels, that encode the causal information of the space. Our proposed framework is not only rigorously grounded in measure theory, but it also sheds light on long-standing limitations of existing frameworks including, for example, cycles, latent variables and stochastic processes.
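Schematically, and going only by this summary (the paper's notation and indexing may differ), a causal space augments a probability space with a family of causal kernels:

    \[
      \bigl(\Omega, \mathcal{F}, \mathbb{P}, \mathbb{K}\bigr),
      \qquad
      \mathbb{K} = \{\, K_S : \Omega \times \mathcal{F} \to [0,1] \,\}_{S},
    \]

where $(\Omega, \mathcal{F}, \mathbb{P})$ is an ordinary Kolmogorov probability space and each $K_S(\omega, \cdot)$ is a transition probability kernel specifying how probabilities behave once the part of the system indexed by $S$ is intervened on.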
15:00-16:00
Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the predictive task performance. Despite the recent rapid advances in AI explainability, as far as we know, no method yet fulfills these three desiderata. Indeed, mainstream methods for local concept explainability do not yield causal explanations and incur a trade-off between explainability and prediction accuracy. We present DiConStruct, an explanation method that is both concept-based and causal, which produces more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Consequently, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
15:00-16:00
In this talk, I am going to present some results from our latest paper on the problem of sample-efficient exploration in Model-Based Reinforcement Learning (MBRL). While most popular exploration methods in (MB)RL are “reactive” in nature, and thus inherently sample inefficient, we discuss the benefits of an “active” approach, where the agent selects actions to query novel states in a data-efficient way, provided that one can guarantee that regions of high epistemic, and not aleatoric, uncertainty are targeted. In order to ensure this, we consider classes of exploration bonuses based on “Bayesian surprise”, and demonstrate their desirable properties. The rest of the talk will be dedicated to presenting our latest, early-stage project that aims at extending this previous work to develop a “Causal Active Planning” framework for sample-efficient and robust MBRL, where the goal is to expand this class of exploration bonuses to capture epistemic uncertainty around a (Local) Causal Graphical Model of the transition dynamics. By incorporating these measures as causal “intrinsic rewards” within an MBRL algorithm, we can enable the agent to (actively) learn a causal transition dynamics model. If this causal dynamics model is then used for policy training, it should theoretically ensure that the learnt policy is more robust and generalizable (Richens & Everitt, 2024).
15:00-16:00
In this paper, we focus on estimating the causal effect of an intervention over time on a dynamical system. To that end, we formally define causal interventions and their effects over time on discrete-time stochastic processes (DSPs). Then, we show under which conditions the equilibrium states of a DSP, both before and after a causal intervention, can be captured by a structural causal model (SCM). With such an equivalence at hand, we provide an explicit mapping from vector autoregressive models (VARs), broadly applied in econometrics, to linear, but potentially cyclic and/or affected by unmeasured confounders, SCMs. The resulting causal VAR framework allows us to perform causal inference over time from observational time series data. Our experiments on synthetic and real-world datasets show that the proposed framework achieves strong performance in terms of observational forecasting while enabling accurate estimation of the causal effect of interventions on dynamical systems. We demonstrate, through a case study, the potential practical questions that can be addressed using the proposed causal VAR framework.
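As a rough intuition for the VAR-to-SCM mapping (a simplified reading of the abstract, not the paper's exact construction), consider a stable VAR(1),

    \[
      X_t = A X_{t-1} + \varepsilon_t .
    \]

Holding the noise fixed, the equilibrium satisfies

    \[
      X^{\ast} = A X^{\ast} + \varepsilon,
      \qquad
      X^{\ast} = (I - A)^{-1} \varepsilon,
    \]

which is the reduced form of a linear SCM with coefficient matrix $A$: since $A$ need not correspond to an acyclic graph, the SCM may be cyclic, and correlated components of $\varepsilon$ play the role of unmeasured confounders.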
15:00-16:00
Causal opacity denotes the difficulty in understanding the hidden causal structure underlying a deep neural network's (DNN) reasoning. This leads to the inability to rely on and verify state-of-the-art DNN-based systems, especially in high-stakes scenarios. For this reason, causal opacity represents a key open challenge at the intersection of deep learning, interpretability, and causality. This work addresses this gap by introducing Causal Concept Embedding Models (Causal CEMs), a class of interpretable models whose decision-making process is causally transparent by design. The results of our experiments show that Causal CEMs can: (i) match the generalization performance of causally-opaque models, (ii) support the analysis of interventional and counterfactual scenarios, thereby improving the model's causal interpretability and supporting the effective verification of its reliability and fairness, and (iii) enable human-in-the-loop corrections to mispredicted intermediate reasoning steps, boosting not just downstream accuracy after corrections but also the accuracy of the explanation provided for a specific instance.
14:00-15:00
The structural vector autoregressive (SVAR) model is the main tool that macroeconomists use to address empirical questions about the effects of policy interventions. I will review the SVAR model, making comparisons with the structural causal model, and I will address the question of whether it is equipped to warrant causal claims. Specifically, I will present the model identification problem and highlight how it differs from the problem of causal inference.
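For reference, the SVAR($p$) model and its identification problem can be written in standard textbook notation (not specific to the talk) as

    \[
      B_0 y_t = B_1 y_{t-1} + \dots + B_p y_{t-p} + \varepsilon_t ,
    \]

with reduced form $y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + u_t$, where $A_i = B_0^{-1} B_i$ and $u_t = B_0^{-1}\varepsilon_t$. The data identify the $A_i$ and the covariance $\Sigma_u = B_0^{-1}\Sigma_\varepsilon B_0^{-\top}$, but not $B_0$ itself; recovering the structural shocks therefore requires additional restrictions (e.g., zero, sign, or long-run restrictions), and it is at this step that the comparison with causal inference becomes sharpest.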
15:00-16:00
Perception occurs when two individuals interpret the same information differently. Despite being a known phenomenon with implications for bias in decision-making, as individual experience determines interpretation, perception remains largely overlooked in machine learning (ML) research. Modern decision flows, whether partially or fully automated, involve human experts interacting with ML applications. How might we then, e.g., account for two experts who interpret a deferred instance, or an explanation from an ML model, differently? To account for perception, we first need to formulate it. In this work, we define perception under causal reasoning using structural causal models (SCM). Our framework formalizes individual experience as additional causal knowledge that comes with and is used by a human expert (read, decision maker). We present two kinds of causal perception, unfaithful and inconsistent, based on the SCM properties of faithfulness and consistency. Further, we motivate the importance of perception within fairness problems. We illustrate our framework through a series of decision flow examples involving ML applications and human experts.
15:00-16:00
The need for modelling causal knowledge at different levels of granularity arises in several settings. Causal Abstraction provides a framework for formalizing this problem by relating two Structural Causal Models at different levels of detail. Despite increasing interest in applying causal abstraction, e.g., in the interpretability of large machine learning models, the graphical and parametric conditions under which a causal model can abstract another are not known. Furthermore, learning causal abstractions from data is still an open problem. In this work, we tackle both issues for linear causal models with linear abstraction functions. First, we characterize how the low-level coefficients and the abstraction function determine the high-level coefficients and how the high-level model constrains the causal ordering of low-level variables. Then, we apply our theoretical results to learn high-level and low-level causal models and their abstraction function from observational data. In particular, we introduce Abs-LiNGAM, a method that leverages the constraints induced by the learned high-level model and the abstraction function to speed up the recovery of the larger low-level model, under the assumption of non-Gaussian noise terms. In simulated settings, we show the effectiveness of learning causal abstractions from data and the potential of our method in improving the scalability of causal discovery.
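Schematically, and going only by this summary (the precise conditions are the paper's contribution), the linear setting relates a low-level model, a linear abstraction map, and a high-level model of the form

    \[
      X = A X + \varepsilon,
      \qquad
      Y = \tau(X) = T X,
      \qquad
      Y = M Y + \eta ,
    \]

and the characterization concerns which pairs $(A, T)$ admit a consistent high-level coefficient matrix $M$, together with the constraints this places on the causal ordering of the low-level variables.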
15:00-16:00
Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems. This allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first one, we can access both the human and the ML model predictions for the deferred instances. In such a case, we can identify the individual causal effects for deferred instances and aggregates of them. In the second scenario, only human predictions are available for the deferred instances. In this case, we can resort to regression discontinuity design to estimate a local causal effect. We empirically evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.
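As a rough sketch of the second scenario, a sharp regression discontinuity design compares outcomes just below and just above the deferral cutoff; the snippet below is a generic local-linear estimate of the jump at the cutoff, not the paper's estimator (the score, outcome, cutoff, and bandwidth names are placeholders).

    import numpy as np

    def rdd_effect(score, outcome, cutoff, bandwidth):
        # Local linear fit on each side of the cutoff, within the chosen bandwidth.
        left = (score >= cutoff - bandwidth) & (score < cutoff)
        right = (score >= cutoff) & (score <= cutoff + bandwidth)
        fit_left = np.polyfit(score[left], outcome[left], deg=1)
        fit_right = np.polyfit(score[right], outcome[right], deg=1)
        # Difference of the two fits evaluated at the cutoff = local causal effect estimate.
        return np.polyval(fit_right, cutoff) - np.polyval(fit_left, cutoff)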