Worked examples for phe_sii function

Function and inputs

phe_sii is an aggregate function that returns the Slope Index of Inequality (SII) for each grouping set in the input data frame, with lower and upper confidence limits at the specified confidence level(s). The user can choose to return the Relative Index of Inequality (RII) as well, via an optional argument.
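As background, the SII is commonly described as the slope of a population-weighted regression of the indicator on each quantile's relative rank. The sketch below illustrates that general idea in base R on hypothetical toy data. It is not the phe_sii implementation, and the names toy, rel_rank, sii_sketch and rii_sketch are made up for illustration. The RII is shown in one common ratio form (fitted value at rank 1 divided by fitted value at rank 0), which is an assumption here.

```r
# Hypothetical toy data: five quantiles with populations and indicator values
toy <- data.frame(
  quantile   = 1:5,
  population = c(1000, 1200, 900, 1100, 800),
  value      = c(70, 72, 74, 76, 78)
)

# Relative rank: midpoint of each quantile's slice of the cumulative population
share <- toy$population / sum(toy$population)
toy$rel_rank <- cumsum(share) - share / 2

# Population-weighted linear regression of the indicator on relative rank
fit <- lm(value ~ rel_rank, data = toy, weights = population)

# Slope: fitted difference between relative rank 1 and relative rank 0
sii_sketch <- unname(coef(fit)["rel_rank"])

# Ratio of fitted values at rank 1 and rank 0 (assumed form of the RII)
rii_sketch <- unname(sum(coef(fit)) / coef(fit)["(Intercept)"])
```

In this toy data the indicator rises with quantile number, so the sketch gives a positive slope and a ratio above 1.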

Each grouping set in the input data should have a row for each quantile, labelled with the quantile number, which contains the associated population, indicator value and 95% confidence limits. The user has the option to provide the standard error instead of the 95% confidence limits, in which case this is used directly rather than being calculated by the function.

The user can also specify the indicator type from “0 - default”, “1 - rate” or “2 - proportion”, where different transformations are applied to the input indicator value and confidence limits in the case of a rate or proportion. Examples are provided below for the three cases.
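To make the role of the indicator type concrete, here is a minimal sketch of deriving a standard error from 95% confidence limits on the default, log (rate) and logit (proportion) scales. The helper se_from_cl is hypothetical and is not part of the package. The formula (interval width divided by 2 x 1.96) is a standard normal-approximation assumption rather than a statement of what phe_sii does internally.

```r
# Sketch only: derive a standard error from 95% confidence limits,
# optionally on a transformed scale. Hypothetical helper, not package code.
se_from_cl <- function(lower_cl, upper_cl, value_type = 0) {
  transform <- switch(as.character(value_type),
    "0" = identity,                      # default: no transformation
    "1" = log,                           # rate: log scale
    "2" = function(p) log(p / (1 - p)),  # proportion: logit scale
    stop("value_type must be 0, 1 or 2")
  )
  # A 95% interval spans 2 * qnorm(0.975) (about 3.92) standard errors
  (transform(upper_cl) - transform(lower_cl)) / (2 * qnorm(0.975))
}

# Illustrative (made-up) confidence limits for each indicator type
se_default <- se_from_cl(78.1, 79.9)                  # symmetric limits, no transform
se_rate    <- se_from_cl(95, 105, value_type = 1)     # rate
se_prop    <- se_from_cl(0.18, 0.22, value_type = 2)  # proportion between 0 and 1
```

Note that the rate and proportion standard errors produced this way are on the transformed scale, not the original scale of the indicator.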

Example 1 - default (normal) distribution

The example below calculates the SII on some life expectancy data. The data are assumed to have symmetric confidence intervals around the indicator values, so the default standard error calculation is used, with no prior transformation.

The relevant fields in the input dataset are specified for the arguments quantile, population and value. value_type is kept equal to 0 (default), confidence is set to c(0.95, 0.998) to return both 95% and 99.8% confidence limits, and the number of repetitions is set to 1,000 so that the function runs faster as a demonstration.

The standard error (se) has been provided here in the input dataset, meaning this will be used directly and lower/upper 95% confidence limits of the indicator values are not needed.

A warning is generated because one of the GeoCodes (E06000053) in the data does not contain a record for every quantile, so no output is provided for this area.

# Load packages: dplyr for %>% and group_by(), PHEindicatormethods for phe_sii()
library(dplyr)
library(PHEindicatormethods)

# Pass data through SII function ---------------------------------------
LE_data_SII <- LE_data %>%
        # Group the input dataframe to create subgroups to calculate the SII for
        group_by(Sex, GeoCode) %>%
        # Run SII function on grouped dataset
        phe_sii(quantile = Decile,
                population = Pop,
                value = LifeExp,
                value_type = 0, # specify default indicator type
                confidence = c(0.95, 0.998), # return 95% and 99.8% confidence limits
                se = SE, # standard error supplied directly
                repetitions = 1000, # use smaller no. of repetitions e.g. for testing
                rii = FALSE,
                type = "full")

## Warning in phe_sii(., quantile = Decile, population = Pop, value = LifeExp, :
## WARNING: some records have been removed due to incomplete or invalid data

# View first 10 rows of results
knitr::kable(head(LE_data_SII, 10))

Sex	GeoCode	sii	sii_lower95_0cl	sii_lower99_8cl	sii_upper95_0cl	sii_upper99_8cl	indicator_type	multiplier	CI_confidence	CI_method
1	E06000001	11.68886	9.357470	7.515727	14.30555	15.88435	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000002	12.54785	10.593443	9.534902	14.65896	15.79386	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000003	10.05084	7.863128	6.083585	12.24109	12.97810	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000004	14.85223	13.098873	12.405061	16.60358	17.30226	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000005	11.68095	9.359647	7.819998	14.17580	15.04620	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000006	12.27526	10.084372	8.792492	14.43717	15.38920	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000007	11.59893	9.964737	8.797193	13.02146	13.94837	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000008	10.72510	8.684302	7.473429	12.78269	13.99949	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000009	13.59387	11.580191	10.575276	15.69956	16.98929	normal	1	95%, 99.8%	simulation 1000 reps
1	E06000010	11.20094	9.542162	8.523556	12.70321	13.29905	normal	1	95%, 99.8%	simulation 1000 reps

Note that some areas are missing quantiles in the dataset, and these are subsequently excluded from the function output with a warning given.
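It can be useful to identify such areas before calling the function. The following is a minimal base R sketch on hypothetical toy data. It assumes every quantile appears in at least one group, so that the full set of quantiles can be inferred from the data itself. The names toy, n_expected, n_found and incomplete are illustrative only.

```r
# Hypothetical toy data: area "B" has no record for quantile 3
toy <- data.frame(
  GeoCode = c("A", "A", "A", "B", "B"),
  Decile  = c(1, 2, 3, 1, 2),
  LifeExp = c(76.1, 78.4, 80.2, 75.8, 79.0)
)

# Number of distinct quantiles expected in a complete group
n_expected <- length(unique(toy$Decile))

# Number of distinct quantiles actually present in each group
n_found <- tapply(toy$Decile, toy$GeoCode, function(q) length(unique(q)))

# Groups that would be dropped for having an incomplete set of quantiles
incomplete <- names(n_found)[n_found < n_expected]
```

With real data, the grouping would need to cover every grouping variable (for example Sex and GeoCode in Example 1), not just the area code.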

Example 2 - rate

The example below calculates both the SII and RII on Directly Standardised Rate (DSR) data. The value_type argument is set to 1 to specify this indicator is a rate; this means a log transformation will be applied to the value, lower_cl and upper_cl fields before calculating the standard error.

As the number of repetitions is not specified, the function uses the default of 100,000. To return the RII, the rii argument is set to TRUE.

Finally, setting reliability_stat = TRUE will run additional sample sets of the SII/RII confidence limits and return a Mean Average Difference (MAD) value for each subgroup. See below for guidance on how to use this.

# Pass data through SII function ---------------------------------------
DSR_data_SII <- DSR_data %>%
        # Group the input dataframe to create subgroups to calculate the SII for
        group_by(Period) %>% 
        # Run SII function on grouped dataset
        phe_sii(quantile = Quintile,
                population = total_pop,
                value = value,
                value_type = 1, # specifies indicator is a rate
                lower_cl = lowercl,
                upper_cl = uppercl,
                rii = TRUE, # returns RII as well as SII (default is FALSE)
                reliability_stat = TRUE) # returns reliability stats (default is FALSE)

# View results
knitr::kable(DSR_data_SII)

Period	sii	rii	sii_lower95_0cl	sii_upper95_0cl	rii_lower95_0cl	rii_upper95_0cl	sii_mad95_0	rii_mad95_0	indicator_type	multiplier	CI_confidence	CI_method
2010	-14.848683	0.8956758	-18.93222	-10.745829	0.8689753	0.9232955	0.0279676	0.0001565	rate	1	95%	simulation 1e+05 reps
2011	-13.528326	0.9039924	-17.62082	-9.479390	0.8768695	0.9316620	0.0160189	0.0001372	rate	1	95%	simulation 1e+05 reps
2012	-11.680045	0.9159389	-15.71852	-7.666307	0.8886199	0.9439328	0.0151905	0.0001136	rate	1	95%	simulation 1e+05 reps
2013	-10.544273	0.9232984	-14.51938	-6.575762	0.8960438	0.9514078	0.0221796	0.0001543	rate	1	95%	simulation 1e+05 reps
2014	-10.591214	0.9226440	-14.54407	-6.605808	0.8954460	0.9509513	0.0287551	0.0001945	rate	1	95%	simulation 1e+05 reps
2015	-9.042367	0.9332670	-12.93949	-5.115865	0.9060193	0.9616269	0.0238989	0.0001738	rate	1	95%	simulation 1e+05 reps

Example 3 - proportion

This example calculates the SII for a prevalence indicator. Proportions must lie between 0 and 1, so the mutate command below divides the percentage values and their confidence limits by 100 before the grouped dataset is passed to the phe_sii function.

The value_type argument is set to 2 to specify the indicator is a proportion, and a logit transformation is applied to the value, lower_cl and upper_cl fields.

The function will again run on the default 100,000 repetitions, and neither the RII nor the MAD values will be returned.

There is the option to specify a numeric multiplier in the arguments, which scales the SII and its lower and upper confidence limits (and the SII MAD, if requested) before they are output. This could be used if an absolute (i.e. positive) slope is desired for an indicator where the “high is bad” polarity would otherwise give negative SII results.

Below, a multiplier of -100 is used, to output absolute prevalence figures that are expressed on a scale between 0 and 100.

# Pass data through SII function ---------------------------------------
prevalence_SII <- prevalence_data %>%
        # Group the input dataframe to create subgroups to calculate the SII for
        group_by(Period, SchoolYear, AreaCode) %>%
        # Format prevalences to be between 0 and 1
        mutate(Rate = Rate/100,
               LCL = LCL/100,
               UCL = UCL/100) %>%
        # Run SII function on grouped dataset
        phe_sii(quantile = Decile,
                population = Measured,
                value = Rate,
                value_type = 2, # specifies indicator is a proportion
                lower_cl = LCL,
                upper_cl = UCL,
                multiplier = -100) # negative multiplier to scale SII outputs

# View first 10 rows of results
knitr::kable(head(prevalence_SII,10))

Period	SchoolYear	AreaCode	sii	sii_lower95_0cl	sii_upper95_0cl	indicator_type	multiplier	CI_confidence	CI_method
607	6	E92000001	10.720032	10.218388	11.220999	proportion	-100	95%	simulation 1e+05 reps
607	R	E92000001	5.818547	5.416673	6.215898	proportion	-100	95%	simulation 1e+05 reps
708	6	E92000001	11.025821	10.650572	11.401599	proportion	-100	95%	simulation 1e+05 reps
708	R	E92000001	6.048249	5.758679	6.339480	proportion	-100	95%	simulation 1e+05 reps
809	6	E92000001	11.528301	11.164969	11.894447	proportion	-100	95%	simulation 1e+05 reps
809	R	E92000001	6.734782	6.458283	7.010078	proportion	-100	95%	simulation 1e+05 reps
910	6	E92000001	12.186004	11.818941	12.555142	proportion	-100	95%	simulation 1e+05 reps
910	R	E92000001	6.959414	6.689175	7.232961	proportion	-100	95%	simulation 1e+05 reps
1011	6	E92000001	12.781047	12.411679	13.148828	proportion	-100	95%	simulation 1e+05 reps
1011	R	E92000001	6.953482	6.692094	7.216067	proportion	-100	95%	simulation 1e+05 reps
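The effect of a negative multiplier on the first row of the Example 3 output can be illustrated with simple arithmetic. The unscaled figures below are back-calculated from that row by dividing by -100, on the assumption that the multiplier is applied as a plain multiplication. They are not values returned by the function. The sketch shows why the lower and upper limits need to swap places when the sign is flipped.

```r
# Unscaled figures back-calculated from the first output row (assumption:
# the multiplier is applied as a simple multiplication)
multiplier <- -100
sii_unscaled    <- -0.10720032
limits_unscaled <- c(lower = -0.11220999, upper = -0.10218388)

# Scale the point estimate
sii_scaled <- sii_unscaled * multiplier

# Multiplying by a negative number reverses the ordering of the limits,
# so take the range to put the smaller value back in the lower position
limits_scaled <- range(limits_unscaled * multiplier)
names(limits_scaled) <- c("lower", "upper")
```

Under that assumption, sii_scaled and limits_scaled match the sii, sii_lower95_0cl and sii_upper95_0cl values in the first row of the table.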

Interpreting the Mean Average Difference (MAD)

If reliability_stat is set to TRUE in the function, a MAD value is returned for each subgroup as a measure of how much the SII (or RII) confidence limits vary.

Note: this option will increase the runtime of the function, as the MAD calculation requires an additional 9 sample sets of the confidence limits to be taken.

A MAD of 0.005 implies that, on rerunning the phe_sii function, the confidence limits can be expected to change by approximately 0.005. The more repetitions the function is run on, the smaller this statistic should be. The acceptable tolerance depends on the precision to which the user wishes to present the confidence limits: ideally, the MAD should be smaller than 0.01 to display them to 1 d.p., smaller than 0.001 to display them to 2 d.p., and so on.
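The rule of thumb above can be written as a small helper. This is a sketch only: displayable_dp is a hypothetical function, not part of the package, and it assumes the pattern generalises so that displaying d decimal places requires a MAD below 10^-(d+1).

```r
# Hypothetical helper: largest number of decimal places d for which
# mad < 10^-(d + 1), floored at zero
displayable_dp <- function(mad) {
  pmax(0, ceiling(-log10(mad)) - 2)
}

# MAD from the text, plus the 2010 sii and rii MAD values from Example 2
dp_text <- displayable_dp(0.005)
dp_sii  <- displayable_dp(0.0279676)
dp_rii  <- displayable_dp(0.0001565)
```

By this rule the MAD of 0.005 supports 1 d.p., the 2010 SII limits in Example 2 would not yet be stable to 1 d.p., and the 2010 RII limits could be shown to 2 d.p.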

Worked examples for phe_sii function

Emma Clegg

2022-08-08
