Tutorial to the FASI Package in R

Implementing the Fair Adjusted Selective Inference (FASI) procedure

This is an R package for implementing the FASI method as described in the paper “A Fairness-Adjusted Selective Inference Framework ForClassification” by Bradley Rava, Wenguang Sun, Gareth M. James, and Xin Tong.

Introduction

We consider a common classification setting where one wishes to make automated decisions for people from a variety of different protected groups. When controlling the overall false selection rate of a classifier, this error rate can unfairly be concentrated in one protected group instead of being evenly distributed among all of them. To address this, we develop a method called Fair Adjusted Selective Inference (FASI) that can simultaneously control the false selection rate for each protected group and the overall population at any user-specified level. This package is a user friendly way of implimenting the FASI algorithm.

This package will be extensively reworked soon. Please keep an eye out for version 2 when it is released.

What does the package do?

This R package implements the FASI procedure.

How is the package structured?

The package has two main functions, “fasi” and “predict”. They are both described below.

\(\textbf{fasi:}\) The fasi function is used to create a ranking score model model that will be used in the classification step. The inputs for this function are: - observed_data: This is a data set of previous observations with their true classes. The fasi function will automatically split this data according to parameter split_p which the user will define.

\(\textbf{predict:}\) The predict function should be used after the fasi function has been called. Even if you are specifying your own ranking scores! With a fasi_object, the predict function will estimate the r-scores for each observation in your test data set (new observations) and then classify each individual according to the specified thresholds. There are a few options the user can specify for how this is done. - fasi_object: A fitted fasi object that can be obtained from the “fasi” function.

Installing the package

The FASI package is available on github and can be installed through the “devtools” package.

library(devtools)
install_github("bradleyrava/fasi@master")

Once installed, you can load the package and functions into your R session with the following command

library(fasi)

Example

For guidance and reproducibility, this package includes the 2018 census data and compas algorithm data described in the paper. The original unedited versions can be found on ProPublica’s github and at UCI’s machine learning repository.

https://github.com/propublica/compas-analysis/

https://archive.ics.uci.edu/ml/datasets/adult

Let’s load the census data.

z_full <- fasi::adult

## Subset the data so the package runs faster
z <- z_full[sample(1:nrow(z_full), 0.1*nrow(z_full)), ]

For this example, I will use logistic regression for computing the ranking scores.

Using the fasi package is easy. I will first randomly split my data into an observed and testing data set and then call the fasi function.

obs_rows <- sample(1:nrow(z), 0.5*nrow(z))
test_rows <- (1:nrow(z))[-obs_rows]

observed_data <- z[obs_rows,]
test_data <- z[test_rows,]

Now that we have an observed and test data set, I will call the fasi function and specify that I want to use logistic regression.

model_formula <- as.formula("y ~ age")
fasi_object <- fasi::fasi(observed_data = observed_data, model_formula = model_formula, alg = "logit")

The fasi object returns a lot of useful information to us. Perhaps most importantly it gives us the model fit.

fasi_object$model_fit
## 
## Call:  stats::glm(formula = model_formula, family = "binomial", data = train_data_logit)
## 
## Coefficients:
## (Intercept)          age  
##    -2.96174      0.04686  
## 
## Degrees of Freedom: 813 Total (i.e. Null);  812 Residual
## Null Deviance:       925.2 
## Residual Deviance: 863.8     AIC: 867.8

Let’s now use the fasi object to classify the observations in our test data set. For this example, I will use alpha_1=alpha_2=0.1. In this data set, I will also use “sex” as the protected group. Since I did not change this variable name to “a”, I will tell the predict function what the column name is.

fasi_predict <- predict(object = fasi_object, test_data = test_data, alpha_1 = 0.1, alpha_2 = 0.1, ptd_group_var = "sex")
head(fasi_predict$r_scores)
##     r_score1  r_score2
## 1 0.08061420 0.5920000
## 2 0.13353162 0.8039003
## 3 0.27688400 0.5567686
## 4 0.30620155 0.8360656
## 5 0.09752322 0.5836694
## 6 0.01395288 0.8356950
head(fasi_predict$classification)
## [1] 1 0 0 0 1 1

That’s it! You can use the r_scores / classifications directly.

If you wanted to provide your own ranking scores, you would only need to alter the process I described above slightly. For this example I will produce random ranking scores. However, you should strive to estimate better ones if you pick this approach!

## Random ranking scores
calibrate_scores <- rnorm(nrow(observed_data))
test_scores <- rnorm(nrow(test_data))

fasi_object <- fasi::fasi(observed_data = observed_data, alg = "user-provided")
fasi_predict <- predict(object = fasi_object, test_data = test_data, alpha_1 = 0.1, alpha_2 = 0.1, ptd_group_var = "sex",
                        ranking_score_calibrate = calibrate_scores, ranking_score_test = test_scores)

Future work

This package is currently a proof of concept and it can be useful for practitioners looking to quickly impliment the fasi procedure. In version 2, this package will be much faster and it will allow for a cross validation method that will eliminate the need for a training / calibration testing data set. It will also offer more diagnostic / plotting tools.

Please let me know if there is any functionality you would like to see added to version 2.

Further questions or comments?

If you have any questions about this package or notice any bugs, please feel free to email Bradley Rava at