Introduction to PHEindicatormethods

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function	Type	Description
phe_proportion	Non-aggregate	Performs a calculation on each row of data (unless data is grouped)
phe_rate	Non-aggregate	Performs a calculation on each row of data (unless data is grouped)

phe_mean	Aggregate	Performs a calculation on each grouping set

phe_dsr	Aggregate, standardised	Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio	Aggregate, standardised	Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate	Aggregate, standardised	Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  95 159
#> 2 Area2 2015  47 179
#> 3 Area3 2015  31 122
#> 4 Area4 2015  23 199
#> 5 Area1 2016  57 183
#> 6 Area2 2016  60 185
#> 7 Area3 2016  52 134
#> 8 Area4 2016  27 188

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  95 159 0.5974843 0.51982778 0.6705414        95% proportion of 1
#> 2 Area2 2015  47 179 0.2625698 0.20358208 0.3315343        95% proportion of 1
#> 3 Area3 2015  31 122 0.2540984 0.18517152 0.3380381        95% proportion of 1
#> 4 Area4 2015  23 199 0.1155779 0.07826151 0.1674548        95% proportion of 1
#> 5 Area1 2016  57 183 0.3114754 0.24883615 0.3818668        95% proportion of 1
#> 6 Area2 2016  60 185 0.3243243 0.26103594 0.3947600        95% proportion of 1
#> 7 Area3 2016  52 134 0.3880597 0.30976871 0.4725899        95% proportion of 1
#> 8 Area4 2016  27 188 0.1436170 0.10061629 0.2008903        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  95 159 0.5974843 0.47510063 0.7088216      99.8% proportion of 1
#> 2 Area2 2015  47 179 0.2625698 0.17483885 0.3743512      99.8% proportion of 1
#> 3 Area3 2015  31 122 0.2540984 0.15330117 0.3905969      99.8% proportion of 1
#> 4 Area4 2015  23 199 0.1155779 0.06253705 0.2038243      99.8% proportion of 1
#> 5 Area1 2016  57 183 0.3114754 0.21727082 0.4243798      99.8% proportion of 1
#> 6 Area2 2016  60 185 0.3243243 0.22887615 0.4370187      99.8% proportion of 1
#> 7 Area3 2016  52 134 0.3880597 0.26959798 0.5214149      99.8% proportion of 1
#> 8 Area4 2016  27 188 0.1436170 0.08183712 0.2398520      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  95 159 59.74843 51.982778 67.05414        95% percentage Wilson
#> 2 Area2 2015  47 179 26.25698 20.358208 33.15343        95% percentage Wilson
#> 3 Area3 2015  31 122 25.40984 18.517152 33.80381        95% percentage Wilson
#> 4 Area4 2015  23 199 11.55779  7.826151 16.74548        95% percentage Wilson
#> 5 Area1 2016  57 183 31.14754 24.883615 38.18668        95% percentage Wilson
#> 6 Area2 2016  60 185 32.43243 26.103594 39.47600        95% percentage Wilson
#> 7 Area3 2016  52 134 38.80597 30.976871 47.25899        95% percentage Wilson
#> 8 Area4 2016  27 188 14.36170 10.061629 20.08903        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  95 159 59.74843 47.510063 70.88216      99.8% percentage Wilson
#> 2 Area2 2015  47 179 26.25698 17.483885 37.43512      99.8% percentage Wilson
#> 3 Area3 2015  31 122 25.40984 15.330117 39.05969      99.8% percentage Wilson
#> 4 Area4 2015  23 199 11.55779  6.253705 20.38243      99.8% percentage Wilson
#> 5 Area1 2016  57 183 31.14754 21.727082 42.43798      99.8% percentage Wilson
#> 6 Area2 2016  60 185 32.43243 22.887615 43.70187      99.8% percentage Wilson
#> 7 Area3 2016  52 134 38.80597 26.959798 52.14149      99.8% percentage Wilson
#> 8 Area4 2016  27 188 14.36170  8.183712 23.98520      99.8% percentage Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  95 159 59.74843 47.510063 70.88216
#> 2 Area2 2015  47 179 26.25698 17.483885 37.43512
#> 3 Area3 2015  31 122 25.40984 15.330117 39.05969
#> 4 Area4 2015  23 199 11.55779  6.253705 20.38243
#> 5 Area1 2016  57 183 31.14754 21.727082 42.43798
#> 6 Area2 2016  60 185 32.43243 22.887615 43.70187
#> 7 Area3 2016  52 134 38.80597 26.959798 52.14149
#> 8 Area4 2016  27 188 14.36170  8.183712 23.98520

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value   lowercl  uppercl confidence       statistic
#> 1 Area1 2015  95 159 59748.43 48338.823 73040.09        95% rate per 100000
#> 2 Area2 2015  47 179 26256.98 19290.982 34917.01        95% rate per 100000
#> 3 Area3 2015  31 122 25409.84 17261.591 36068.47        95% rate per 100000
#> 4 Area4 2015  23 199 11557.79  7324.307 17343.12        95% rate per 100000
#> 5 Area1 2016  57 183 31147.54 23589.399 40355.99        95% rate per 100000
#> 6 Area2 2016  60 185 32432.43 24747.978 41747.69        95% rate per 100000
#> 7 Area3 2016  52 134 38805.97 28980.069 50889.90        95% rate per 100000
#> 8 Area4 2016  27 188 14361.70  9462.215 20896.33        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence    statistic method
#> 1 Area1 2015  95 159 59.74843 42.569139 81.23649      99.8% rate per 100  Byars
#> 2 Area2 2015  47 179 26.25698 15.976629 40.39763      99.8% rate per 100  Byars
#> 3 Area3 2015  31 122 25.40984 13.574391 42.94506      99.8% rate per 100  Byars
#> 4 Area4 2015  23 199 11.55779  5.492857 21.13512      99.8% rate per 100  Byars
#> 5 Area1 2016  57 183 31.14754 19.923305 46.13788      99.8% rate per 100  Byars
#> 6 Area2 2016  60 185 32.43243 21.002812 47.58499      99.8% rate per 100  Byars
#> 7 Area3 2016  52 134 38.80597 24.256219 58.50523      99.8% rate per 100  Byars
#> 8 Area4 2016  27 188 14.36170  7.288480 25.14231      99.8% rate per 100  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  95 159 59.74843 42.569139 81.23649
#> 2 Area2 2015  47 179 26.25698 15.976629 40.39763
#> 3 Area3 2015  31 122 25.40984 13.574391 42.94506
#> 4 Area4 2015  23 199 11.55779  5.492857 21.13512
#> 5 Area1 2016  57 183 31.14754 19.923305 46.13788
#> 6 Area2 2016  60 185 32.43243 21.002812 47.58499
#> 7 Area3 2016  52 134 38.80597 24.256219 58.50523
#> 8 Area4 2016  27 188 14.36170  7.288480 25.14231

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 x 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   196   659 0.297   0.264   0.333 95%        proportion of 1 Wilson
#> 2  2016   196   690 0.284   0.252   0.319 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 x 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   196   659 29742.  25724.  34210. 95%        rate per 100000 Byars 
#> 2  2016   196   690 28406.  24568.  32673. 95%        rate per 100000 Byars

Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev value  lowercl  uppercl confidence statistic
#> 1       392           8 23.29316    49 29.52643 68.47357        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence=0.998)
#> # A tibble: 2 x 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confi~1 stati~2 method
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#> 1  2015       196           4  32.2    49  -116.     214. 99.8%   mean    Stude~
#> 2  2016       196           4  15.0    49   -27.8    126. 99.8%   mean    Stude~
#> # ... with abbreviated variable names 1: confidence, 2: statistic

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 x 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       196           4  32.2    49  -116.     214.
#> 2  2016       196           4  15.0    49   -27.8    126.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0 159 13270
#> 2 Area1 2006 Male       5  25 15465
#> 3 Area1 2006 Male      10  21 14248
#> 4 Area1 2006 Male      15   7 18390
#> 5 Area1 2006 Male      20  91 16672
#> 6 Area1 2006 Male      25 173 12172

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

broken down into 19 standardisation categories per grouping set representing the 19 x 5-year age bands from 00-04 to 90+
sorted, within each grouping set, by these 19 age bands in ascending order
to be standardised against the 2013 European Standard Population

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

standard populations provided as a vector - the vector and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order
standard populations provided as a column within the input data frame - the standard populations can be appended to the input data frame by the user prior to execution of the function - if the data is grouped to generate multiple dsrs then the standard populations will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_~1 value lowercl uppercl confi~2 stati~3
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1686   291400  676.    643.    711. 95%     dsr pe~
#>  2 Area1  2006 Male          1482   286337  485.    457.    513. 95%     dsr pe~
#>  3 Area1  2007 Female        2255   268154  842.    805.    880. 95%     dsr pe~
#>  4 Area1  2007 Male          2332   277961  881.    844.    920. 95%     dsr pe~
#>  5 Area1  2008 Female        2346   279399  917.    877.    958. 95%     dsr pe~
#>  6 Area1  2008 Male          1989   287406  736.    702.    772. 95%     dsr pe~
#>  7 Area1  2009 Female        1977   289207  722.    689.    756. 95%     dsr pe~
#>  8 Area1  2009 Male          1901   289622  668.    636.    701. 95%     dsr pe~
#>  9 Area1  2010 Female        2241   298418  806.    771.    843. 95%     dsr pe~
#> 10 Area1  2010 Male          2034   293566  668.    638.    700. 95%     dsr pe~
#> # ... with 30 more rows, 1 more variable: method <chr>, and abbreviated
#> #   variable names 1: total_pop, 2: confidence, 3: statistic
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1686    291400  676.    643.    711.
#>  2 Area1  2006 Male          1482    286337  485.    457.    513.
#>  3 Area1  2007 Female        2255    268154  842.    805.    880.
#>  4 Area1  2007 Male          2332    277961  881.    844.    920.
#>  5 Area1  2008 Female        2346    279399  917.    877.    958.
#>  6 Area1  2008 Male          1989    287406  736.    702.    772.
#>  7 Area1  2009 Female        1977    289207  722.    689.    756.
#>  8 Area1  2009 Male          1901    289622  668.    636.    701.
#>  9 Area1  2010 Female        2241    298418  806.    771.    843.
#> 10 Area1  2010 Male          2034    293566  668.    638.    700.
#> # ... with 30 more rows
#> # i Use `print(n = ...)` to see more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_~1 value lowercl uppercl confi~2 stati~3
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1686   291400  676.    643.    711. 95%     dsr pe~
#>  2 Area1  2006 Male          1482   286337  485.    457.    513. 95%     dsr pe~
#>  3 Area1  2007 Female        2255   268154  842.    805.    880. 95%     dsr pe~
#>  4 Area1  2007 Male          2332   277961  881.    844.    920. 95%     dsr pe~
#>  5 Area1  2008 Female        2346   279399  917.    877.    958. 95%     dsr pe~
#>  6 Area1  2008 Male          1989   287406  736.    702.    772. 95%     dsr pe~
#>  7 Area1  2009 Female        1977   289207  722.    689.    756. 95%     dsr pe~
#>  8 Area1  2009 Male          1901   289622  668.    636.    701. 95%     dsr pe~
#>  9 Area1  2010 Female        2241   298418  806.    771.    843. 95%     dsr pe~
#> 10 Area1  2010 Male          2034   293566  668.    638.    700. 95%     dsr pe~
#> # ... with 30 more rows, 1 more variable: method <chr>, and abbreviated
#> #   variable names 1: total_pop, 2: confidence, 3: statistic
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_~1 value lowercl uppercl confi~2 stati~3
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1686   291400  676.    643.    711. 95%     dsr pe~
#>  2 Area1  2006 Male          1482   286337  485.    457.    513. 95%     dsr pe~
#>  3 Area1  2007 Female        2255   268154  842.    805.    880. 95%     dsr pe~
#>  4 Area1  2007 Male          2332   277961  881.    844.    920. 95%     dsr pe~
#>  5 Area1  2008 Female        2346   279399  917.    877.    958. 95%     dsr pe~
#>  6 Area1  2008 Male          1989   287406  736.    702.    772. 95%     dsr pe~
#>  7 Area1  2009 Female        1977   289207  722.    689.    756. 95%     dsr pe~
#>  8 Area1  2009 Male          1901   289622  668.    636.    701. 95%     dsr pe~
#>  9 Area1  2010 Female        2241   298418  806.    771.    843. 95%     dsr pe~
#> 10 Area1  2010 Male          2034   293566  668.    638.    700. 95%     dsr pe~
#> # ... with 30 more rows, 1 more variable: method <chr>, and abbreviated
#> #   variable names 1: total_pop, 2: confidence, 3: statistic
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_~1 value lowercl uppercl confi~2 stati~3
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1510   236874  696.    660.    734. 95%     dsr pe~
#>  2 Area1  2006 Male           973   218799  461.    432.    492. 95%     dsr pe~
#>  3 Area1  2007 Female        1833   214629  862.    822.    903. 95%     dsr pe~
#>  4 Area1  2007 Male          1827   229541  843.    804.    883. 95%     dsr pe~
#>  5 Area1  2008 Female        1919   208495  959.    916.   1004. 95%     dsr pe~
#>  6 Area1  2008 Male          1699   216630  777.    739.    815. 95%     dsr pe~
#>  7 Area1  2009 Female        1675   232775  715.    681.    751. 95%     dsr pe~
#>  8 Area1  2009 Male          1346   233860  606.    572.    640. 95%     dsr pe~
#>  9 Area1  2010 Female        1848   231491  820.    782.    859. 95%     dsr pe~
#> 10 Area1  2010 Male          1517   240194  637.    605.    671. 95%     dsr pe~
#> # ... with 30 more rows, 1 more variable: method <chr>, and abbreviated
#> #   variable names 1: total_pop, 2: confidence, 3: statistic
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 x 10
#> # Groups:   area, year [20]
#>    area   year total_count total_~1 value lowercl uppercl confi~2 stati~3 method
#>    <chr> <int>       <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#>  1 Area1  2006        3168   577737  560.    539.    581. 95%     dsr pe~ Dobson
#>  2 Area1  2007        4587   546115  841.    816.    867. 95%     dsr pe~ Dobson
#>  3 Area1  2008        4335   566805  832.    806.    859. 95%     dsr pe~ Dobson
#>  4 Area1  2009        3878   578829  681.    659.    704. 95%     dsr pe~ Dobson
#>  5 Area1  2010        4275   591984  730.    707.    754. 95%     dsr pe~ Dobson
#>  6 Area2  2006        4414   555817  823.    797.    849. 95%     dsr pe~ Dobson
#>  7 Area2  2007        4444   552401  828.    801.    854. 95%     dsr pe~ Dobson
#>  8 Area2  2008        3558   556367  640.    618.    663. 95%     dsr pe~ Dobson
#>  9 Area2  2009        3880   568466  671.    648.    694. 95%     dsr pe~ Dobson
#> 10 Area2  2010        4227   533701  768.    743.    793. 95%     dsr pe~ Dobson
#> 11 Area3  2006        4306   609516  730.    707.    753. 95%     dsr pe~ Dobson
#> 12 Area3  2007        3575   565038  677.    653.    701. 95%     dsr pe~ Dobson
#> 13 Area3  2008        3803   582952  664.    642.    687. 95%     dsr pe~ Dobson
#> 14 Area3  2009        3702   572392  712.    688.    737. 95%     dsr pe~ Dobson
#> 15 Area3  2010        4012   563172  722.    698.    746. 95%     dsr pe~ Dobson
#> 16 Area4  2006        3706   577029  692.    669.    716. 95%     dsr pe~ Dobson
#> 17 Area4  2007        4248   580619  706.    684.    730. 95%     dsr pe~ Dobson
#> 18 Area4  2008        3672   534298  765.    740.    791. 95%     dsr pe~ Dobson
#> 19 Area4  2009        3690   588180  637.    616.    659. 95%     dsr pe~ Dobson
#> 20 Area4  2010        3420   558149  625.    603.    648. 95%     dsr pe~ Dobson
#> # ... with abbreviated variable names 1: total_pop, 2: confidence, 3: statistic

Execute calculate_ISRatio and calculate_ISRate

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.
reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple smrs/isrs then the reference data will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 x 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0  1218 110863
#> 2       5   691 123609
#> 3      10   707 125770
#> 4      15   518 122501
#> 5      20   885 108969
#> 6      25   924 121251

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence stati~1
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>  
#>  1 Area1  2006 Female     1686    1957. 0.861   0.821   0.904 95%        indire~
#>  2 Area1  2006 Male       1482    1929. 0.768   0.730   0.808 95%        indire~
#>  3 Area1  2007 Female     2255    1823. 1.24    1.19    1.29  95%        indire~
#>  4 Area1  2007 Male       2332    1875. 1.24    1.19    1.30  95%        indire~
#>  5 Area1  2008 Female     2346    1874. 1.25    1.20    1.30  95%        indire~
#>  6 Area1  2008 Male       1989    1967. 1.01    0.967   1.06  95%        indire~
#>  7 Area1  2009 Female     1977    1957. 1.01    0.966   1.06  95%        indire~
#>  8 Area1  2009 Male       1901    1969. 0.966   0.923   1.01  95%        indire~
#>  9 Area1  2010 Female     2241    1998. 1.12    1.08    1.17  95%        indire~
#> 10 Area1  2010 Male       2034    1987. 1.02    0.980   1.07  95%        indire~
#> # ... with 30 more rows, 1 more variable: method <chr>, and abbreviated
#> #   variable name 1: statistic
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence stati~1
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>  
#>  1 Area1  2006 Female     1686    1957. 0.861   0.821   0.904 95%        indire~
#>  2 Area1  2006 Male       1482    1929. 0.768   0.730   0.808 95%        indire~
#>  3 Area1  2007 Female     2255    1823. 1.24    1.19    1.29  95%        indire~
#>  4 Area1  2007 Male       2332    1875. 1.24    1.19    1.30  95%        indire~
#>  5 Area1  2008 Female     2346    1874. 1.25    1.20    1.30  95%        indire~
#>  6 Area1  2008 Male       1989    1967. 1.01    0.967   1.06  95%        indire~
#>  7 Area1  2009 Female     1977    1957. 1.01    0.966   1.06  95%        indire~
#>  8 Area1  2009 Male       1901    1969. 0.966   0.923   1.01  95%        indire~
#>  9 Area1  2010 Female     2241    1998. 1.12    1.08    1.17  95%        indire~
#> 10 Area1  2010 Male       2034    1987. 1.02    0.980   1.07  95%        indire~
#> # ... with 30 more rows, 1 more variable: method <chr>, and abbreviated
#> #   variable name 1: statistic
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#> # Groups:   year [5]
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15594   15594  1       0.984   1.02 
#> 2  2007    16854   15163. 1.11    1.09    1.13 
#> 3  2008    15368   15108. 1.02    1.00    1.03 
#> 4  2009    15150   15655. 0.968   0.952   0.983
#> 5  2010    15934   15267. 1.04    1.03    1.06

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl confide~1
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>    
#>  1 Area1  2006 Female     1686    1957.     672.  579.    552.    607. 95%      
#>  2 Area1  2006 Male       1482    1929.     672.  516.    490.    543. 95%      
#>  3 Area1  2007 Female     2255    1823.     672.  832.    798.    867. 95%      
#>  4 Area1  2007 Male       2332    1875.     672.  836.    802.    870. 95%      
#>  5 Area1  2008 Female     2346    1874.     672.  841.    808.    876. 95%      
#>  6 Area1  2008 Male       1989    1967.     672.  680.    650.    710. 95%      
#>  7 Area1  2009 Female     1977    1957.     672.  679.    649.    710. 95%      
#>  8 Area1  2009 Male       1901    1969.     672.  649.    620.    679. 95%      
#>  9 Area1  2010 Female     2241    1998.     672.  754.    723.    786. 95%      
#> 10 Area1  2010 Male       2034    1987.     672.  688.    659.    719. 95%      
#> # ... with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> #   abbreviated variable name 1: confidence
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl confide~1
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>    
#>  1 Area1  2006 Female     1686    1957.     672.  579.    552.    607. 95%      
#>  2 Area1  2006 Male       1482    1929.     672.  516.    490.    543. 95%      
#>  3 Area1  2007 Female     2255    1823.     672.  832.    798.    867. 95%      
#>  4 Area1  2007 Male       2332    1875.     672.  836.    802.    870. 95%      
#>  5 Area1  2008 Female     2346    1874.     672.  841.    808.    876. 95%      
#>  6 Area1  2008 Male       1989    1967.     672.  680.    650.    710. 95%      
#>  7 Area1  2009 Female     1977    1957.     672.  679.    649.    710. 95%      
#>  8 Area1  2009 Male       1901    1969.     672.  649.    620.    679. 95%      
#>  9 Area1  2010 Female     2241    1998.     672.  754.    723.    786. 95%      
#> 10 Area1  2010 Male       2034    1987.     672.  688.    659.    719. 95%      
#> # ... with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> #   abbreviated variable name 1: confidence
#> # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")  %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#> # Groups:   year [5]
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15594   15594      672.  672.    662.    683.
#> 2  2007    16854   15163.     672.  747.    736.    758.
#> 3  2008    15368   15108.     672.  684.    673.    695.
#> 4  2009    15150   15655.     672.  650.    640.    661.
#> 5  2010    15934   15267.     672.  702.    691.    712.