Weighted regression-based norming

Automatic post-stratification through weighted regression-based norming in cNORM

Representativeness of the norm sample is essential for the estimation of valid norm scores. To achieve this, random sampling is usually applied. But even if there are no systematic biases in data collection, the resulting sample might deviate from the population composition. The cNORM R package offers functionality to integrate sampling weights into the norming process and thereby to reduce the negative effects of non-representative norm samples on norm score quality. For this purpose, raking (also known as iterative proportional fitting) was integrated into cNORM. It post-stratifies the norm sample with respect to one or more stratification variables (SVs), given the population marginals of those SVs.

Problem of non-representative norm samples

Non-representative norm samples, i.e., norm samples not representing the target population with respect to one or more relevant stratification variables (Kruskal & Mosteller, 1979), can reduce the quality of a test's norm scores. This is especially true for SVs influencing a person’s true latent ability. For example, not considering parents’ educational background in estimating norm scores for an intelligence test for children may result in a general tendency to over- or underestimate norm scores, and, therefore, in an over- or underestimation of a child’s true intelligence (Hernandez et al., 2017). Since norm scores are often used as a criterion for far-reaching decisions, like school placement or the diagnosis of learning disabilities (Gary & Lenhard, 2021; Lenhard, Lenhard & Gary, 2019; Lenhard & Lenhard, 2021), biased norm scores can ultimately lead to disadvantages for the individuals being examined. Therefore, countermeasures such as sample weighting methods are necessary to reduce the non-representativeness of norm samples.

Post-stratification through iterative proportional fitting

Raking, also called iterative proportional fitting, is a post-stratification approach that aims to enhance sample representativeness with respect to two or more stratification variables. For this purpose, a sample weight is computed for every case in the norm sample, based on the ratio between the proportion of the corresponding stratum in the target population and its proportion in the actual norm sample (Lumley, 2011). The procedure can be described as an iterative post-stratification with respect to one variable in each step. For example, let’s assume a target population containing 49% female and 51% male persons, while the norm sample contains 45% female and 55% male subjects. To enhance the representativeness of the norm sample with respect to the SV sex (female/male), every female case would be weighted with \(w_{female}=\frac{49\%}{45\%}=1.09\) and every male case with \(w_{male}=\frac{51\%}{55\%}=0.93\). To stratify a norm sample with respect to two or more variables, for example sex (female/male) and education (low/medium/high), this adjustment is applied repeatedly, to the marginals of one variable at a time. For example, if the weights are first adjusted with respect to sex, they are adjusted with respect to education in the second step. Since the weights no longer exactly reproduce the population marginals for sex after the second step, they are adjusted to sex again in the third step, to education in the fourth step, and so on until the raking weights have converged. Finally, the weighted norm sample represents the target population with respect to the marginal proportions of the SVs used: each case is assigned a weight such that the proportions of the strata in the norm sample align with the composition of the target population.
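The iteration described above can be sketched in a few lines of base R. This is only an illustration of the general raking idea, not the implementation used in cNORM; the helper rake(), the toy data in sample_df and the target proportions for education are assumptions made up for this example.

```r
# Toy sample (hypothetical): roughly 45% female / 55% male
set.seed(1)
n <- 1000
sample_df <- data.frame(
  sex = sample(c("female", "male"), n, replace = TRUE, prob = c(0.45, 0.55)),
  education = sample(c("low", "medium", "high"), n, replace = TRUE,
                     prob = c(0.20, 0.50, 0.30))
)

# Assumed population marginals (illustrative values only)
targets <- list(
  sex = c(female = 0.49, male = 0.51),
  education = c(low = 0.30, medium = 0.45, high = 0.25)
)

# Hypothetical helper: adjust the weights to one variable at a time
# until they no longer change
rake <- function(data, targets, tol = 1e-8, max_iter = 100) {
  w <- rep(1, nrow(data))
  for (iter in seq_len(max_iter)) {
    w_old <- w
    for (v in names(targets)) {
      lev <- as.character(data[[v]])
      # Current weighted proportion of each level of variable v
      current <- tapply(w, lev, sum) / sum(w)
      # Multiply each case by (target proportion / current proportion)
      w <- as.numeric(w * targets[[v]][lev] / current[lev])
    }
    if (max(abs(w - w_old)) < tol) break
  }
  w
}

w <- rake(sample_df, targets)

# Weighted sample proportions now match the assumed population marginals
round(tapply(w, sample_df$sex, sum) / sum(w), 3)
round(tapply(w, sample_df$education, sum) / sum(w), 3)
```

With a single stratification variable, the loop reduces to the one-step ratio from the example above, e.g. 0.49 / 0.45 for female cases.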

Integration of raking weights in regression-based norming in cNORM

The integration of raking weights in cNORM is accomplished in three steps.

Computation and standardization of raking weights
Initial ranking of test raw scores using standardized raking weights with weighted percentile estimation
Regression-based norming with standardized regression weights

Step 1: Computation and standardization of raking weights

Raking weights are computed from the proportions of the SVs in the target population and in the actual norm sample. Afterwards, the resulting raking weights are standardized by dividing every weight by the smallest raking weight, i.e., the smallest weight is set to 1.0, while the ratios between the weights remain the same. Consequently, underrepresented cases in the sample are weighted with a factor larger than 1.0. To compute the weights, please provide a data frame with three columns to specify the population marginals. The first column specifies the stratification variable, the second the factor level of the stratification variable and the third the proportion in the representative population. The function ‘computeWeights()’ is used to retrieve the weights. The original data and the marginals have to be passed as function parameters.
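The standardization step described above is a simple rescaling. The following lines are a minimal sketch with made-up raking weights; the vector raw_weights is hypothetical and not produced by cNORM.

```r
# Hypothetical raking weights for five cases
raw_weights <- c(0.8, 0.8, 1.0, 1.2, 1.2)

# Divide by the smallest weight, so that the minimum becomes 1.0
std_weights <- raw_weights / min(raw_weights)
std_weights

# The ratio between any two weights is unchanged by the rescaling
all.equal(raw_weights[4] / raw_weights[1], std_weights[4] / std_weights[1])
```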

Step 2: Weighted percentile estimation

Secondly, the norm sample is ranked with respect to the raking weights using weighted percentiles. This step is the actual start of the regression-based norming approach, and it is applied automatically in the ‘cnorm()’ function as soon as weights are specified.
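To make the idea of weighted ranking concrete, here is a small base R sketch. It assumes a midpoint definition of the percentile (weight below a score plus half of the weight at that score, divided by the total weight); the estimator used internally by cNORM may differ, and the function weighted_percentile() is a hypothetical helper for this illustration.

```r
# Hypothetical helper: weighted percentile rank of each raw score
weighted_percentile <- function(x, w) {
  key <- as.character(x)
  # Total weight of each distinct raw score, in ascending score order
  w_by_score <- tapply(w, x, sum)
  # Weight strictly below each score
  below <- cumsum(w_by_score) - w_by_score
  # Midpoint percentile for each distinct score
  pct <- as.numeric((below + 0.5 * w_by_score) / sum(w))
  # Map the percentiles back to the individual cases
  pct[match(key, names(w_by_score))]
}

raw <- c(10, 12, 12, 15, 20)
w <- c(1, 1, 1.5, 1.5, 1)

weighted_percentile(raw, rep(1, 5))  # unweighted ranking
weighted_percentile(raw, w)          # cases with larger weights count more
```

With equal weights, the function reduces to the usual midpoint percentile rank, so the effect of the weights can be read off by comparing the two calls.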

Step 3: Regression-based norming with standardized regression weights

Finally, the standardized raking weights are used in the weighted best-subset regression to obtain an adequate norm model. While the former steps can be seen as a kind of data preparation, the computation of the regression-based norm model represents the actual norming process, since the resulting regression model is used for the actual mapping between achieved raw score and assigned norm score. Using the standardized raking weights in the weighted regression is intended to reduce overfitting of the regression model to overrepresented data points. This third step is also applied automatically when using the ‘cnorm()’ function.
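The role of the weights in the regression step can be illustrated with a plain weighted least-squares fit in base R. This is a simplified sketch on simulated data: it fits one fixed set of terms with lm() and does not perform the best-subset selection that cNORM carries out. The variables L (norm score), A (age), raw and w are all simulated for the purpose of the example.

```r
set.seed(42)
n <- 200
A <- runif(n, 3, 10)                    # simulated age
L <- rnorm(n, mean = 50, sd = 10)       # simulated norm score (T scale)
raw <- 5 + 0.8 * L + 6 * A + rnorm(n, sd = 3)

# Hypothetical standardized weights (smallest weight is 1)
w <- sample(c(1, 1.2, 1.5), n, replace = TRUE)

# Same model terms, fitted without and with case weights
fit_unweighted <- lm(raw ~ L + A + I(L * A))
fit_weighted <- lm(raw ~ L + A + I(L * A), weights = w)

# Cases with larger weights contribute more to the weighted loss
coef(fit_weighted)
sum(w * residuals(fit_weighted)^2)
sum(w * residuals(fit_unweighted)^2)
```

The weighted fit minimizes the weighted sum of squared residuals, so underrepresented cases (weights above 1) pull the regression surface more strongly than overrepresented ones.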

Example

In the following, the usage of raking weights in regression-based norming with cNORM is illustrated in detail, based on a non-representative norm sample for the German version of the Peabody Picture Vocabulary Test (PPVT-IV).

library(cNORM)
# Assign data to object norm.data
norm.data <- ppvt
head(norm.data)
#>      age sex migration region raw    group
#> 1 2.5971   1         0   west 120 3.160655
#> 2 2.5993   1         0   west  67 3.160655
#> 3 2.6241   1         0   west  23 3.160655
#> 4 2.8622   1         0  south  50 3.160655
#> 5 2.8764   1         0  south  44 3.160655
#> 6 2.9308   1         0   west  55 3.160655

For the post-stratification, we need population marginals for the relevant stratification variables as a data frame, with each level of each stratification variable in a row. The data frame must contain the names of the SVs (column 1), the single levels (column 2) and the corresponding proportion in the target population (column 3).

# Generate population marginals
marginals <- data.frame(var = c("sex", "sex", "migration", "migration"),
                             level = c(1,2,0,1),
                             prop = c(0.51, 0.49, 0.65, 0.35))
head(marginals)
#>         var level prop
#> 1       sex     1 0.51
#> 2       sex     2 0.49
#> 3 migration     0 0.65
#> 4 migration     1 0.35

To calculate raking weights, cNORM’s ‘computeWeights()’ function is used, with the norm sample data and the population marginals as function parameters.

weights <- computeWeights(data = norm.data, population.margins = marginals)
#> Raking converged normally after 3 iterations.

Calling the ‘cnorm()’ function and passing the raking weights via the function parameter ‘weights’ starts the initial weighted ranking and the actual norming process.

norm.model <- cnorm(raw = norm.data$raw, group = norm.data$group,
                    weights = weights)

The resulting model contains six terms with an RMSE of 3.60335.

summary(norm.model)
#> Final solution: 6 terms
#> R-Square Adj. = 0.990042
#> Final regression model: raw ~ L2 + L1A1 + L1A2 + L2A1 + L2A3 + L4A1
#> Regression function: raw ~ -92.16983481 + (0.0195782225*L2) + (1.327109958*L1A1) + (-0.03851643094*L1A2) + (-0.01236535515*L2A1) + (1.320696396e-05*L2A3) + (3.158314092e-07*L4A1)
#> Raw Score RMSE = 3.60335
#> Post stratification was applied. The weights range from 1 to 1.415 (m = 1.116, sd = 0.182).

Moreover, the percentile plot shows no signs of model violations, such as intersecting percentile curves. The model reaches a high multiple \(R^2\) with only a few terms.

plot(norm.model, "subset")
plot(norm.model, "norm")

Caveats and recommendation for use

We extensively simulated biased distributions and assessed whether our approach can mitigate the effects of unrepresentative samples. cNORM itself already corrects for several types of sampling error, namely when deviations occur in specific age groups or when joint probabilities of stratification variables are unbalanced (while the marginals are preserved). Weighted Continuous Norming also works very well in most, but not all, use cases. Please note the following:

Non-representativeness in most cases leads to a (moderately) increased error of the normed scores. It is, of course, always better to ensure the highest feasible degree of representativeness in the data collection.
The data collection should be as random as possible.
In most, but not all, cases, Weighted Continuous Norming reduces the negative effects of non-representative norm samples. If the mean of the standardized weights exceeds a value of \(m_{weights}=2\), this indicates that weighting should rather not be used.
With cNORM, representativeness need not necessarily be established in every single age group. If the marginals are more or less correct, weighting is unnecessary.
Only use stratification for variables with considerable influence on the dependent variable.
If available, the probabilities of cross-classifications of the stratification variables can be used. You can recode several variables into one and directly specify the corresponding population marginals (especially in combination with the next point).
Avoid too many stratification variables with many fine-grained levels. This leads to high weights in specific subgroups. Rather combine different levels of stratification variables, if the according subgroups do not differ in the outcome variable.
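The recoding of several stratification variables into one cross-classified variable, as mentioned in the list above, could look like the following base R sketch. The data frame df, the variable name sex_migration and the joint proportions are hypothetical values chosen for illustration; real proportions have to come from population statistics.

```r
# Hypothetical sample with two stratification variables
df <- data.frame(
  sex = c(1, 2, 1, 2, 1, 2),
  migration = c(0, 0, 1, 1, 0, 1)
)

# Recode both variables into one, e.g. "1_0" = sex 1 without migration background
df$sex_migration <- paste(df$sex, df$migration, sep = "_")

# Joint population proportions in the three-column marginals layout
# (illustrative values only, they must sum to 1)
marginals_joint <- data.frame(
  var = "sex_migration",
  level = c("1_0", "1_1", "2_0", "2_1"),
  prop = c(0.33, 0.18, 0.32, 0.17)
)

# Every level that occurs in the sample needs a population proportion
all(df$sex_migration %in% marginals_joint$level)
sum(marginals_joint$prop)
```

A data frame of this shape follows the same three-column layout as the marginals in the example and would then take their place in the weight computation, provided the new variable is part of the norm sample data.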