This vignette describes d-prime, a scoring method introduced by Miller (1996).

Dataset

Load the included Go/No Go dataset and inspect its documentation.

data("ds_gng", package = "splithalfr")
?ds_gng

Relevant variables

The columns used in this example are:

condition. 0 = go, 2 = no go
response. Correct (1) or incorrect (0)
rt. Reaction time (seconds)
participant. Participant ID

Counterbalancing

The variables condition and stim were counterbalanced. Below we illustrate this for the first participant.

ds_1 <- subset(ds_gng, participant == 1)
table(ds_1$condition, ds_1$stim)

Scoring the Go/No Go

Scoring function

The scoring function receives the data of a single participant. It converts the proportion of hits (correct responses on go trials) and the proportion of false alarms (incorrect responses on no go trials) to quantiles of the standard normal distribution and returns their difference. Extreme proportions of 0 or 1 are avoided by a log-linear adjustment (Hautus, 1995).

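fn_score <- function(ds) {
  # Go trials (condition 0): correct = hit, incorrect = miss
  n_hit <- sum(ds$condition == 0 & ds$response == 1)
  n_miss <- sum(ds$condition == 0 & ds$response == 0)
  # No go trials (condition 2): correct = correct rejection, incorrect = false alarm
  n_cr <- sum(ds$condition == 2 & ds$response == 1)
  n_fa <- sum(ds$condition == 2 & ds$response == 0)
  # Adjusted proportions; the added constants keep them away from 0 and 1
  p_hit <- (n_hit + 0.5) / ((n_hit + 0.5) + n_miss + 1)
  p_fa <- (n_fa + 0.5) / ((n_fa + 0.5) + n_cr + 1)
  # d-prime: difference between the standard normal quantiles
  qnorm(p_hit) - qnorm(p_fa)
}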
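For comparison, the log-linear rule as it is usually stated adds 0.5 to the hit and false alarm counts and 1 to the number of go and no go trials. The sketch below implements that textbook form on hypothetical counts; the function name dprime_loglinear and the counts are illustrative assumptions, not part of the package. Note that the scoring function above divides by the trial count plus 1.5 rather than plus 1, so its adjusted proportions are slightly smaller than the ones computed here.

```r
# Textbook log-linear correction (assumed form): add 0.5 to each count
# and 1 to each trial total before converting to proportions.
dprime_loglinear <- function(n_hit, n_miss, n_fa, n_cr) {
  p_hit <- (n_hit + 0.5) / (n_hit + n_miss + 1)
  p_fa <- (n_fa + 0.5) / (n_fa + n_cr + 1)
  qnorm(p_hit) - qnorm(p_fa)
}

# Hypothetical counts with a perfect hit rate. Without the correction,
# qnorm(20 / 20) is Inf, so d-prime would be infinite.
d_corrected <- dprime_loglinear(n_hit = 20, n_miss = 0, n_fa = 2, n_cr = 18)
d_uncorrected <- qnorm(20 / 20) - qnorm(2 / 20)
is.finite(c(corrected = d_corrected, uncorrected = d_uncorrected))
```

The correction matters most for participants who make no misses or no false alarms, where the uncorrected score is undefined or infinite.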

Scoring a single participant

Let’s calculate the d-prime score for participant 1.

fn_score(subset(ds_gng, participant == 1))

Scoring all participants

To calculate the d-prime score for each participant, we will use R’s native by function and convert the result to a data frame.

scores <- by(
  ds_gng,
  ds_gng$participant,
  fn_score
)
data.frame(
  participant = names(scores),
  score = as.vector(scores)
)
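To see what by returns and how the conversion to a data frame works, the same pattern can be tried on a small made-up dataset. This is a sketch only: the toy data frame and the proportion-correct scoring function are assumptions chosen to keep the output easy to check by hand.

```r
# Hypothetical data: 3 participants with 4 trials each
toy <- data.frame(
  participant = rep(1:3, each = 4),
  response = c(1, 1, 0, 1,  1, 0, 0, 1,  1, 1, 1, 1)
)

# by() splits the rows by participant and applies the function to each subset
toy_scores <- by(toy, toy$participant, function(ds) mean(ds$response))

# names() holds the grouping values, as.vector() the scores
toy_result <- data.frame(
  participant = names(toy_scores),
  score = as.vector(toy_scores)
)
toy_result
```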

Estimating split-half reliability

Calculating split scores

To calculate split-half scores for each participant, use the function by_split. Its first three arguments are the same as for by. An additional set of arguments allows you to specify how to split the data and how many times. In this vignette we will calculate scores for 1000 permutated splits. Because the trial properties condition and stim were counterbalanced in the Go/No Go design, we will stratify splits by these properties. See the vignette on splitting methods for more ways to split the data.

The by_split function returns a data frame with the following columns:

participant, which identifies participants
replication, which counts replications
score_1 and score_2, which are the scores calculated for each of the split datasets

Calculating the split scores may take a while. By default, by_split uses all available CPU cores, but no progress bar is displayed. Setting ncores = 1 will display a progress bar, but processing will be slower.

split_scores <- by_split(
  ds_gng,
  ds_gng$participant,
  fn_score,
  replications = 1000,
  stratification = paste(ds_gng$condition, ds_gng$stim)
)
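The idea behind a stratified random split can be illustrated in base R. The sketch below is not the package's implementation; it uses hypothetical trials for one participant and shows only the core step of assigning half of the trials within each condition-by-stimulus stratum to each part. The names trials, stratum, and part are illustrative.

```r
set.seed(1)

# Hypothetical trials for one participant: 2 conditions x 2 stimuli, 10 trials each
trials <- expand.grid(trial = 1:10, condition = c(0, 2), stim = c("a", "b"))
trials$response <- rbinom(nrow(trials), 1, 0.8)

# Stratum label per trial, built the same way as the stratification argument
stratum <- paste(trials$condition, trials$stim)

# Within each stratum, randomly assign half of the trials to part 1
# and the other half to part 2
part <- ave(seq_len(nrow(trials)), stratum, FUN = function(i) {
  sample(rep(1:2, length.out = length(i)))
})

# Each part holds the same number of trials from every stratum
split_table <- table(stratum, part)
split_table
```

Scoring each part with the scoring function and repeating the random assignment many times gives one pair of scores per replication, which is the kind of output described for by_split above.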

Calculating reliability coefficients

Next, the output of by_split can be analyzed to estimate reliability. The package provides functions that calculate Spearman-Brown adjusted Pearson correlations (spearman_brown), Flanagan-Rulon (flanagan_rulon), Angoff-Feldt (angoff_feldt), and Intraclass Correlation (short_icc) coefficients. Each of these functions can be passed to split_coefs to calculate the corresponding coefficient per split; the coefficients can then be plotted or averaged via a simple mean. A bias-corrected and accelerated bootstrap confidence interval can be calculated via split_ci. Note that estimating the confidence interval is computationally intensive, so it can take a long time to complete.

# Spearman-Brown adjusted Pearson correlations per replication
coefs <- split_coefs(split_scores, spearman_brown)
# Distribution of coefficients
hist(coefs)
# Mean of coefficients
mean(coefs)
# Confidence interval of coefficients
split_ci(split_scores, spearman_brown)
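For a single replication, the Spearman-Brown adjustment amounts to correlating the two half scores across participants and stepping the correlation up to full test length with 2r / (1 + r). The sketch below applies that textbook formula to hypothetical half scores; it is an illustration only and may differ from the package's spearman_brown function in how it handles edge cases.

```r
# Hypothetical half scores for 6 participants in one replication
score_1 <- c(1.2, 0.8, 2.1, 1.5, 0.3, 1.9)
score_2 <- c(1.0, 1.1, 1.8, 1.7, 0.5, 1.6)

# Pearson correlation between the two halves
r <- cor(score_1, score_2)

# Spearman-Brown step-up to the reliability of the full-length test
sb <- 2 * r / (1 + r)
c(pearson = r, spearman_brown = sb)
```

Because each half contains only half of the trials, the raw correlation underestimates the reliability of the full task; for correlations between 0 and 1 the adjusted value is always the larger of the two.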

Go/No Go - D-prime