Counterfactual Explanations as Interventions in Latent Space (CEILS) is a methodology for generating counterfactual explanations that capture, by design, the underlying causal relations in the data, while at the same time providing feasible recommendations to reach the proposed profile.
Riccardo Crupi, Alessandro Castelnovo, Daniele Regoli, Beatriz San Miguel Gonzalez
You can cite this work as:
@article{crupi2022counterfactual,
title={Counterfactual explanations as interventions in latent space},
author={Crupi, Riccardo and Castelnovo, Alessandro and Regoli, Daniele and San Miguel Gonzalez, Beatriz},
journal={Data Mining and Knowledge Discovery},
pages={1--37},
year={2022},
publisher={Springer}
}
To learn more about this research work, please refer to our full paper (ArXiv).
Currently, CEILS has been published and/or presented in:
- 8th Causal Inference Workshop at UAI (causalUAI2021) (Video) by Riccardo Crupi
- Workshop on Explainable AI in Finance @ICAIF 2021 by Beatriz San Miguel
- 14th International Conference on Agents and Artificial Intelligence (ICAART 2022) by Beatriz San Miguel
- Data Mining and Knowledge Discovery Springer 2022
Create a new environment based on Python 3.9 or 3.6 and install the requirements.
Python 3.9:
pip install -r requirements.txt
Python 3.6:
pip install -r requirements_py36.txt
CEILS workflow consists of the following steps:
Two main inputs are needed:
- Data. Prepare your dataset as a `pandas.DataFrame` for the features (X) and a `pandas.Series` for the target variable (Y).
- Causal graph. Define your causal relations in a causal graph (G) using `networkx.DiGraph`.

Moreover, you need to define the feature constraints (immutable, higher, lower) as a Python dictionary, e.g. `constraints_features = {"immutable": ["native-country"], "higher": ["age"]}`
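For instance, the two inputs and the constraints dictionary could be prepared along these lines (a toy dataset with illustrative column names, not one of the repository's experiments):

```python
import pandas as pd
import networkx as nx

# Features (X) as a pandas.DataFrame and target (Y) as a pandas.Series
X = pd.DataFrame({
    "age": [25, 40, 35, 50],
    "education": [1, 3, 2, 3],
    "income": [20, 60, 45, 80],
})
Y = pd.Series([0, 1, 0, 1], name="target")

# Causal graph (G): age -> education -> income, and age -> income
G = nx.DiGraph()
G.add_edges_from([("age", "education"),
                  ("education", "income"),
                  ("age", "income")])

# Feature constraints: here "age" is only allowed to increase
constraints_features = {"immutable": [], "higher": ["age"]}
```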
In the method `create_structural_eqs(X, Y, G)` from `core.build_struct_eq`, the following steps are carried out:
- generation of structural equations (F) mapping U to X (F: U->X)
- computation of residuals (U)
- generation of original ML model to predict the target variable Y using the features dataset (C: X->Y)
- composition of the model in the latent space, integrating the previous components (C_causal(U) = C(F(U)))
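The idea behind structural equations and residuals can be illustrated with a toy two-variable linear SCM (this is a conceptual sketch, not the repository's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SCM consistent with the graph age -> income:
# age is a root node; income depends causally on age plus independent noise
age = rng.normal(40, 10, size=500)
income = 1.5 * age + rng.normal(0, 5, size=500)

# F: fit a structural equation income ~ f(age) (here, ordinary least squares)
A = np.column_stack([age, np.ones_like(age)])
coef, *_ = np.linalg.lstsq(A, income, rcond=None)

# U: residuals = observed value minus the structural-equation prediction.
# Root nodes keep their own value as their latent variable.
u_age = age
u_income = income - A @ coef

# F maps residuals back to features (F: U -> X): income = f(age) + u_income
reconstructed = A @ coef + u_income
```

A classifier C trained on (age, income) can then be composed with F to obtain C_causal(U) = C(F(U)), a model acting directly on the latent space.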
Summary of the main variables and functions involved:
In the method `create_counterfactuals(X, Y, G, F, C_causal, constraints_features, numCF=20)` from `core.counter_causal_generator`, two sets of counterfactual explanations are generated based on:
- CEILS approach: uses the model in the latent space and a general counterfactual generator (Alibi in our current implementation)
- Baseline approach: uses the original model and the library Alibi
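A minimal sketch of why intervening in the latent space differs from a plain feature-space edit (the structural equations, threshold, and numbers below are made up for illustration):

```python
# Toy structural equations F: U -> X
# age is a root node; income = 1.5 * age + u_income
def F(u_age, u_income):
    age = u_age
    income = 1.5 * age + u_income
    return age, income

# A simple classifier C on features, and its latent-space composition
def C(age, income):
    return int(income > 70)

def C_causal(u_age, u_income):
    return C(*F(u_age, u_income))

# Factual individual: age 40, zero income residual -> income 60 -> class 0
factual_class = C_causal(40.0, 0.0)

# CEILS-style counterfactual: intervene on u_age (age 40 -> 50).
# The change propagates through F, so income rises to 75 and the class
# flips; a naive feature-space edit of age alone would leave income at 60.
cf_class = C_causal(50.0, 0.0)
age_cf, income_cf = F(50.0, 0.0)
```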
In the method `calculate_metrics(X, Y, G, categ_features, constraints_features)` from `core.metrics`, a set of metrics is computed to compare the two sets of counterfactual explanations.
The metrics will be printed.
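Two metrics commonly used to compare counterfactual sets are proximity and sparsity; here is a self-contained illustration (the repository's `core.metrics` module may compute a different or larger set):

```python
import numpy as np

def proximity(x, x_cf):
    """Mean absolute change across features (lower = closer to the factual)."""
    return float(np.mean(np.abs(np.asarray(x) - np.asarray(x_cf))))

def sparsity(x, x_cf, tol=1e-9):
    """Number of features actually changed (lower = sparser explanation)."""
    return int(np.sum(np.abs(np.asarray(x) - np.asarray(x_cf)) > tol))

x = [40.0, 60.0]     # factual:        (age, income)
x_cf = [50.0, 75.0]  # counterfactual: (age, income)

prox = proximity(x, x_cf)   # mean of |10| and |15| = 12.5
spar = sparsity(x, x_cf)    # both features changed -> 2
```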
Currently, we have included 3 experiments based on public datasets and 2 experiments with synthetic data:
Experiments are under a specific folder in:
\experiments_run
We recommend checking the `run_experiment.py` file to learn the details and understand the whole CEILS workflow.
The synthetic dataset experiments are the best way to get a first understanding of our solution.