Fairness visualizations allow for a first investigation into possible fairness problems in a dataset. In this vignette we showcase some of the pre-built fairness visualization functions. All the methods showcased below can be used together with objects of type BenchmarkResult, ResampleResult and Prediction.
For this example, we use the adult_train dataset. Keep in mind that all datasets from the mlr3fairness package already have the protected attribute set via the col_role "pta", here the "sex" column. We choose a random forest as well as a decision tree model in order to showcase differences in performance.
library(mlr3)
library(mlr3learners)
library(mlr3fairness)

# Subsample the training data to keep runtime low
task = tsk("adult_train")$filter(1:5000)
learner = lrn("classif.ranger", predict_type = "prob")
learner$train(task)

# Predict on (a subset of) the held-out test set
predictions = learner$predict(tsk("adult_test")$filter(1:5000))
Note that it is important to evaluate predictions on held-out data in order to obtain unbiased estimates of fairness and performance metrics. By inspecting the confusion matrix, we can get some first insights.
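For a classification Prediction in mlr3, the confusion matrix is available via the $confusion field; a minimal sketch, using the predictions object created above:

```r
# Cross-tabulation of true vs. predicted classes on the held-out data
predictions$confusion
```

Strongly imbalanced off-diagonal counts here can already hint at systematic errors worth investigating per group.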
We furthermore design a small experiment allowing us to compare a random forest (ranger) and a decision tree (rpart). The result, bmr, is a BenchmarkResult that contains the trained models on each cross-validation split.
design = benchmark_grid(
tasks = tsk("adult_train")$filter(1:5000),
learners = lrns(c("classif.ranger", "classif.rpart"),
predict_type = "prob"),
resamplings = rsmps("cv", folds = 3)
)
bmr = benchmark(design)
#> INFO [18:32:30.110] [mlr3] Running benchmark with 6 resampling iterations
#> INFO [18:32:30.115] [mlr3] Applying learner 'classif.ranger' on task 'adult_train' (iter 1/3)
#> INFO [18:32:31.113] [mlr3] Applying learner 'classif.ranger' on task 'adult_train' (iter 2/3)
#> INFO [18:32:32.268] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 2/3)
#> INFO [18:32:32.356] [mlr3] Applying learner 'classif.ranger' on task 'adult_train' (iter 3/3)
#> INFO [18:32:33.343] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 3/3)
#> INFO [18:32:33.408] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 1/3)
#> INFO [18:32:33.548] [mlr3] Finished benchmark
By inspecting the prediction density plot we can see the predicted probability for a given class split by the protected attribute, in this case "sex". Large differences in densities might hint at strong differences in the target between groups, either directly in the data or as a consequence of the modeling process. Note that plotting densities for a Prediction requires a Task, since information about protected attributes is not contained in the Prediction.
We can either plot the density with a Prediction or use it with a BenchmarkResult / ResampleResult:
In practice, we are most often interested in a trade-off between fairness metrics and a measure of utility such as accuracy. We showcase individual scores obtained in each cross-validation fold as well as the aggregate (mean) in order to additionally provide an indication of the variance of the performance estimates.
An additional comparison can be obtained using compare_metrics. It allows comparing Learners with respect to multiple metrics. Again, we can use it with a Prediction:
or use it with a BenchmarkResult / ResampleResult:
The required metrics to create custom visualizations can also be easily computed using the $score() method.
bmr$score(msr("fairness.tpr"))
#> uhash nr task task_id
#> 1: 7c062021-a64d-4c42-b67a-3bfc22867f0a 1 <TaskClassif[50]> adult_train
#> 2: 7c062021-a64d-4c42-b67a-3bfc22867f0a 1 <TaskClassif[50]> adult_train
#> 3: 7c062021-a64d-4c42-b67a-3bfc22867f0a 1 <TaskClassif[50]> adult_train
#> 4: 401bd12f-8ce9-484c-b38e-11a142416bdb 2 <TaskClassif[50]> adult_train
#> 5: 401bd12f-8ce9-484c-b38e-11a142416bdb 2 <TaskClassif[50]> adult_train
#> 6: 401bd12f-8ce9-484c-b38e-11a142416bdb 2 <TaskClassif[50]> adult_train
#> learner learner_id resampling resampling_id
#> 1: <LearnerClassifRanger[38]> classif.ranger <ResamplingCV[20]> cv
#> 2: <LearnerClassifRanger[38]> classif.ranger <ResamplingCV[20]> cv
#> 3: <LearnerClassifRanger[38]> classif.ranger <ResamplingCV[20]> cv
#> 4: <LearnerClassifRpart[38]> classif.rpart <ResamplingCV[20]> cv
#> 5: <LearnerClassifRpart[38]> classif.rpart <ResamplingCV[20]> cv
#> 6: <LearnerClassifRpart[38]> classif.rpart <ResamplingCV[20]> cv
#> iteration prediction fairness.tpr
#> 1: 1 <PredictionClassif[20]> 0.04645741
#> 2: 2 <PredictionClassif[20]> 0.09633812
#> 3: 3 <PredictionClassif[20]> 0.07391266
#> 4: 1 <PredictionClassif[20]> 0.07199505
#> 5: 2 <PredictionClassif[20]> 0.08552081
#> 6: 3 <PredictionClassif[20]> 0.06530131