causalteshap is a treatment effect analysis method that uses statistical hypothesis testing on Shapley values to find predictive and prognostic features.
UNDER CONSTRUCTION, PIP DOES NOT WORK YET
| pip | `pip install causalteshap` |
| --- | --- |
causalteshap is built to be intuitive. Its S-learner supports various tree-based models for both classification and regression tasks.
from causalteshap.causalteshap import CausalteShap
from catboost import CatBoostClassifier
X, T, y = ... # your classification dataset with treatment T
causalteshap_object = CausalteShap(
    # meta_learner is an argument of CausalteShap, not of the CatBoost model
    model=CatBoostClassifier(n_estimators=250, verbose=0, use_best_model=True, cat_features=["T"]),
    meta_learner="S",
)
causalteshap_object.fit(X, T, y) # Fit the causalteshap analysis
causalteshap_object.get_analysis_df() # Retrieve the analysis results as a DataFrame
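To try the snippet above without a real dataset, a small synthetic `X, T, y` can be generated. The following is a minimal sketch, not part of the package: the column names `x_prognostic` and `x_predictive`, the coefficients, and the choice of pandas objects are all assumptions made for illustration, and the input format that `CausalteShap.fit` actually expects should be checked against the package itself.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Two covariates (hypothetical names) and a randomly assigned binary treatment.
X = pd.DataFrame({
    "x_prognostic": rng.normal(size=n),  # shifts the outcome regardless of treatment
    "x_predictive": rng.normal(size=n),  # changes how strongly the treatment works
})
T = pd.Series(rng.integers(0, 2, size=n), name="T")

# Latent score: a baseline effect plus a treatment effect that depends on x_predictive.
score = 1.5 * X["x_prognostic"] + T * (1.0 + 2.0 * X["x_predictive"])

# Turn the score into a binary label through a logistic link (classification task).
prob = 1.0 / (1.0 + np.exp(-score.to_numpy()))
y = pd.Series((rng.random(n) < prob).astype(int), name="y")
```

By construction, `x_prognostic` influences `y` in the same way in both treatment arms, while `x_predictive` only matters when `T` is 1.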
- Default mode
- `scikit-learn` compatible
- Supports various models
- Works for S-learners
- Insights into the meta-learner's features
Check out our benchmark results here.
causalteshap uses an introduced noise feature and statistical tests to determine whether a feature is prognostic (i.e. only contributing to the output) or predictive (i.e. explaining the effect of a treatment). First, we train an S-learner on the data. Then we use Shapley values to explain the feature attributions of the S-learner in two cases: one where the treatment is set to 0 ($S_0$) and one where the treatment is set to 1 ($S_1$). Each feature is then tested in two parts:
- If the feature is purely prognostic, then the $S_0$ and $S_1$ distributions should have the same variance and the same mean. This is tested using both the Fligner test and Student's t-test with unequal variance.
- When these distributions are different and the feature is truly prognostic, then $|S(X_{noisy})|$ of a known noise variable $X_{noisy}$ that contains no information should be larger than or equal to $|S_{0}(X)|$. This covers the cases where these differences would be caused by noise. This is tested using the Kolmogorov-Smirnov test.
If a feature passes both parts, i.e. it yields a significant result on the KS test and on either the t-test or the Fligner test (which check whether the mean or the variance differs), we determine the feature to be predictive. If either part fails, the feature is flagged as prognostic.
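The two-part decision rule can be sketched with SciPy. This is an illustration under stated assumptions, not the package's implementation: the function name `classify_feature`, the significance level `alpha=0.05`, the one-sided direction chosen for the KS test, and the synthetic Shapley values are all hypothetical.

```python
import numpy as np
from scipy import stats


def classify_feature(s0, s1, s_noise, alpha=0.05):
    """Label one feature as 'predictive' or 'prognostic' (illustrative only).

    s0, s1  : Shapley values of the feature with the treatment set to 0 and 1.
    s_noise : Shapley values of a noise feature that carries no information.
    """
    # Part 1: do S_0 and S_1 differ in mean (t-test with unequal variance)
    # or in variance (Fligner test)?
    p_mean = stats.ttest_ind(s0, s1, equal_var=False).pvalue
    p_var = stats.fligner(s0, s1).pvalue
    differs = (p_mean < alpha) or (p_var < alpha)

    # Part 2: is |S_0| larger than the noise attribution |S(X_noisy)|?
    # With alternative="less", SciPy's alternative hypothesis is that the CDF
    # of the first sample lies below that of the second, i.e. the values of
    # the first sample tend to be larger.
    p_ks = stats.ks_2samp(np.abs(s0), np.abs(s_noise), alternative="less").pvalue
    above_noise = p_ks < alpha

    return "predictive" if (differs and above_noise) else "prognostic"


# Synthetic Shapley values, made up for this example.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, 500)
s0 = rng.normal(0.2, 0.05, 500)
s1_shifted = rng.normal(0.6, 0.05, 500)  # attribution changes with the treatment

print(classify_feature(s0, s1_shifted, noise))  # S_0 and S_1 differ, above noise
print(classify_feature(s0, s0.copy(), noise))   # S_0 and S_1 are identical
```

Requiring both parts keeps a feature from being flagged as predictive when the difference between $S_0$ and $S_1$ is no larger than what a pure noise feature would produce.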
If you use causalteshap in a scientific publication, we would highly appreciate a citation.
SCIENTIFIC PAPER UNDER REVIEW
👤 Jarne Verhaeghe
This package is available under the MIT license. More information can be found here: https://github.com/predict-idlab/causalteshap/blob/main/LICENSE