Get started

Koen Derks

last modified: 27-10-2021

Welcome to the ‘Get started’ vignette of the jfa package. This vignette provides a simple explanation of the functions in the package and how they facilitate the statistical audit sampling workflow. See the other vignettes for a more detailed explanation of the functionality of the package.

Example data

To concretely illustrate jfa’s functionality, we consider the BuildIt data set that is included in the package (for more info, see ?BuildIt). This data set contains a population of 3500 invoices paid to the fictional ‘BuildIt’ construction company. Each invoice has an identification number (ID), a recorded value (bookValue), and a corresponding audit (true) value (auditValue).

Note: The information in the auditValue column is added for illustrative purposes since it will only be known to the auditor after having inspected a sample of invoices.

First, we load the jfa package and the BuildIt data set. The first 10 invoices from the data set are displayed below.

library(jfa)

data('BuildIt')
head(BuildIt, n = 10)
##       ID bookValue auditValue
## 1  82884    242.61     242.61
## 2  25064    642.99     642.99
## 3  81235    628.53     628.53
## 4  71769    431.87     431.87
## 5  55080    620.88     620.88
## 6  93224    501.76     501.76
## 7  24331    466.01     466.01
## 8  81460    295.20     295.20
## 9  14608    216.48     216.48
## 10 79064    243.43     243.43

For a fully illustrated walkthrough of jfa’s workflow functionality using the BuildIt data set, see Workflow: Classical audit sampling. For a Bayesian version of the illustrated walkthrough, see Workflow: Bayesian audit sampling.

(Optional) Using auditPrior(): The basics

The auditPrior() function can be used to create a prior distribution for the misstatement parameter in a statistical audit sampling model. In an audit sampling context, an advantage of Bayesian inference is that the prior distribution can be used to incorporate existing information into the statistical procedure. Incorporating existing information can potentially yield a decrease in sample size and an increase in efficiency. The type of audit information that can be incorporated depends on the information that is available in the context of the audit. See the vignette Planning: Prior distributions or the accompanying article for a detailed explanation of the types of audit information that jfa is able to incorporate into a prior distribution.

With the prior distribution in hand, Bayesian audit sampling can be performed by providing the object returned by the auditPrior() function as input for the prior argument in subsequent calls to the planning() and evaluation() functions.
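
For example, the following minimal sketch constructs a prior distribution with default settings and supplies it to planning(). Note that the exact defaults and available options of auditPrior() may differ between jfa versions; see ?auditPrior for details.

# Construct a prior distribution with default settings (see ?auditPrior for
# the methods that incorporate specific types of audit information)
prior <- auditPrior(likelihood = 'poisson')

# Plan a Bayesian sample by supplying the prior via the 'prior' argument
planning(materiality = 0.05, expected = 0, conf.level = 0.95, prior = prior)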

Using planning(): The basics

Planning a minimum sample size requires knowledge of the conditions that lead to acceptance of the population (i.e., the sampling objectives). Generally, a sampling objective can be one (or both) of the following:

Hypothesis testing: Testing the hypothesis that the misstatement in the population is lower than the performance materiality.

Estimation: Estimating the misstatement in the population with a minimum precision.

Next to determining the sampling objective(s), it is also important to determine the statistical distribution that links the sample outcomes to the population misstatement (i.e., the likelihood: poisson, binomial, or hypergeometric). All three distributions are standard in an audit sampling context: the binomial and poisson distributions are approximations of the hypergeometric distribution, and poisson is the default in jfa because it is the most conservative of the three.
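
To illustrate, the sketch below compares the minimum sample sizes implied by the three likelihoods for the sampling objective used later in this vignette. The hypergeometric likelihood requires the number of units in the population, assumed here to be passed via the N.units argument (the BuildIt population contains 3500 invoices).

# Minimum sample sizes under the three likelihoods; the poisson likelihood
# yields the largest (most conservative) sample size
planning(materiality = 0.05, expected = 0, likelihood = 'poisson', conf.level = 0.95)
planning(materiality = 0.05, expected = 0, likelihood = 'binomial', conf.level = 0.95)
planning(materiality = 0.05, expected = 0, likelihood = 'hypergeometric',
         N.units = 3500, conf.level = 0.95)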

Lastly, it is advised to specify the expected (or tolerable) errors in the sample. It is strongly recommended to set this value conservatively, to minimize the chance that the observed errors in the sample exceed the expected errors, which would mean that insufficient work has been done in the end.
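
For instance, expecting (and tolerating) one error in the sample instead of zero increases the minimum sample size, as the following sketch shows (using the same materiality and confidence level as in the hypothesis testing example below):

# Planning for one expected error yields a larger minimum sample size than
# planning for zero expected errors
planning(materiality = 0.05, expected = 1, likelihood = 'poisson', conf.level = 0.95)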

In the BuildIt data set the booked amount (monetary value) of each invoice in the population is known, so the auditor may want to make a statement about the amount of misstatement in the population. For illustrative purposes, we will tolerate zero misstatements in the sample.

Hypothesis testing

First, let’s take a look at how you can use the planning() function to calculate the minimum sample size for testing the hypothesis that the misstatement in the population is lower than the performance materiality. In this example the performance materiality is set to 5% of the total population value, meaning that the population may not contain more than 5% misstatement.

Sampling objective: Calculate a minimum sample size such that, when no misstatements are found in the sample, there is a 95% chance that the misstatement in the population is lower than 5% of the population value.

A minimum sample size for this sampling objective can be calculated by specifying the materiality parameter in the planning() function, see the code below. Next, a summary of the statistical results can be obtained using the summary() function. The results show that, given zero tolerable errors, the minimum sample size is 60 units.

stage1 <- planning(materiality = 0.05, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
## 
##  Classical Audit Sample Planning Summary
## 
## Options:
##   Confidence level:              0.95 
##   Materiality:                   0.05 
##   Hypotheses:                    H₀: Θ >= 0.05 vs. H₁: Θ < 0.05 
##   Expected:                      0 
##   Likelihood:                    poisson 
## 
## Results:
##   Minimum sample size:           60 
##   Tolerable errors:              0 
##   Expected most likely error:    0 
##   Expected upper bound:          0.049929 
##   Expected precision:            0.049929 
##   Expected p-value:              < 2.22e-16

Estimation

Next, let’s take a look at how you can use the planning() function to calculate the minimum sample size for estimating the misstatement in the population with a minimum precision. The precision is defined as the difference between the most likely misstatement and the upper confidence bound on the misstatement. For this example, the minimum precision is set to 2% of the population value.

Sampling objective: Calculate a minimum sample size such that, when zero misstatements are found in the sample, there is a 95% chance that the misstatement in the population is at most 2% above the most likely misstatement.

A minimum sample size for this sampling objective can be calculated by specifying the min.precision parameter in the planning() function, see the code below. The results show that, given zero tolerable errors, the minimum sample size is 150 units.

stage1 <- planning(min.precision = 0.02, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
## 
##  Classical Audit Sample Planning Summary
## 
## Options:
##   Confidence level:              0.95 
##   Min. precision:                0.02 
##   Expected:                      0 
##   Likelihood:                    poisson 
## 
## Results:
##   Minimum sample size:           150 
##   Tolerable errors:              0 
##   Expected most likely error:    0 
##   Expected upper bound:          0.019971 
##   Expected precision:            0.019971

Using selection(): The basics

Selecting a sample using the selection() function requires knowledge of units in the population that are eligible for selection (i.e., sampling units). Sampling units can be items or monetary units. Items can be selected from the population using record sampling (also known as attribute sampling or item sampling) with units = 'items'. On the other hand, monetary units can be selected from the population using monetary unit sampling (MUS) with units = 'values'.

Once the sampling units are determined, it should also be decided which method is used to select the units (i.e., the selection method). Sampling units can be selected with a fixed interval sampling (also known as systematic sampling) scheme using method = 'interval' (the default), with a cell sampling scheme using method = 'cell', with random sampling using method = 'random', or with modified sieve sampling using method = 'sieve'. See the vignette Selection: Sampling methodology for a more detailed explanation of the selection methods implemented in jfa.
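
For instance, a cell sampling scheme applied to monetary units (a combination not used in the remainder of this vignette) could be requested as follows:

set.seed(1)
# Select 150 monetary units from the population using a cell sampling scheme
selection(data = BuildIt, size = 150, units = 'values', method = 'cell', values = 'bookValue')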

Record sampling

First, let’s take a look at how the selection() function can be used to perform random record sampling. Random record sampling implies that the sampling units are set to items and the selection method is set to random. The code below selects the 60 planned invoices from the BuildIt data set using such a random record sampling scheme.

set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
summary(stage2)
## 
##  Audit Sample Selection Summary
## 
## Options:
##   Requested sample size:         60 
##   Sampling units:                items 
##   Method:                        random sampling 
## 
## Data:
##   Population size:               3500 
## 
## Results:
##   Selected sampling units:       60 
##   Selected items:                60 
##   Proportion of size:            0.017143

Monetary unit sampling (MUS)

Next, let’s take a look at how the selection() function can be used to perform fixed interval monetary unit sampling. Fixed interval monetary unit sampling implies that the sampling units are set to values and the selection method is set to interval. The code below selects 150 monetary units from the BuildIt data set using such a fixed interval monetary unit sampling scheme.

stage2 <- selection(data = BuildIt, size = 150, units = 'values', method = 'interval', values = 'bookValue')
summary(stage2)
## 
##  Audit Sample Selection Summary
## 
## Options:
##   Requested sample size:         150 
##   Sampling units:                monetary units 
##   Method:                        fixed interval sampling 
##   Starting point:                1 
## 
## Data:
##   Population size:               3500 
##   Population value:              1403221 
##   Selection interval:            9354.8 
## 
## Results:
##   Selected sampling units:       150 
##   Proportion of value:           0.0001069 
##   Selected items:                150 
##   Proportion of size:            0.042857

Extracting the sample

The selected units and corresponding items are stored in the object that is returned by the selection() function. The sample can be extracted from this object by indexing it via $sample, see the code below. After this step it is up to the auditor to annotate the sample.

set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')

sample <- stage2$sample
head(sample, n = 10)
##     row times    ID bookValue auditValue
## 1  1017     1 50755    618.24     618.24
## 2   679     1 20237    669.75     669.75
## 3  2177     1  9517    454.02     454.02
## 4   930     1 85674    257.82     257.82
## 5  1533     1 31051    308.53     308.53
## 6   471     1 84375    824.66     824.66
## 7  2347     1 75616    623.70     623.70
## 8   270     1 82033    352.75     352.75
## 9  1211     1 12877     52.89      52.89
## 10 3379     1 85322    330.24     330.24

Using evaluation(): The basics

After annotating the items in the sample with their audit values, you can perform statistical inference about the misstatement in the population with the evaluation() function. The function accepts an annotated data sample as input, but it can also be used when only summary statistics of a sample (e.g., the sample size and the number of errors) are available. For a more elaborate explanation of the output of this function for each sampling objective, see the package vignettes Evaluation: Testing misstatement and Evaluation: Estimating misstatement.

Summary statistics

First, let’s take a look at how the evaluation() function can be combined with summary statistics from a sample. Suppose that in the previously selected sample of 60 invoices it is found that a single invoice is missing a signature. These summary statistics can be provided to the evaluation() function with x = 1 and n = 60. The function also requires that you specify the sampling objectives using the materiality or min.precision arguments. Again, a performance materiality of 5% applies.

stage4 <- evaluation(materiality = 0.05, method = 'poisson', conf.level = 0.95, x = 1, n = 60)
summary(stage4)
## 
##  Classical Audit Sample Evaluation Summary
## 
## Options:
##   Confidence level:               0.95 
##   Materiality:                    0.05 
##   Hypotheses:                     H₀: Θ >= 0.05 vs. H₁: Θ < 0.05 
##   Method:                         poisson 
## 
## Data:
##   Sample size:                    60 
##   Number of errors:               1 
##   Sum of taints:                  1 
## 
## Results:
##   Most likely error:              0.016667 
##   95 percent confidence interval: [0, 0.079064] 
##   Precision:                      0.062398 
##   p-value:                        0.19915

The results indicate that the most likely error in the population is 1.67%. Moreover, the 95% one-sided confidence interval for the population misstatement ranges from 0% to 7.9% and contains the performance materiality. This implies that we cannot reject the hypothesis that the population misstatement is lower than 5%, which is also indicated by the non-significant p value (p = 0.199).

Data sample

Next, let’s take a look at how the evaluation() function can be combined with a data sample. Returning to our annotated sample from the selection() function, suppose that in the previously selected sample of 60 invoices it is found that a single invoice has a true value that deviates from its booked value.

# Simulate the annotation: the audit value of one invoice deviates from its book value
sample$auditValue    <- sample$bookValue
sample$auditValue[1] <- sample$auditValue[1] - 100

These data can be provided to the evaluation() function using the data, values, values.audit, and times arguments. The method argument determines the method of inference. For example, the code below evaluates the misstatement in the population using the commonly used Stringer bound. You can find more information about which evaluation methods are implemented on the home page.

stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
summary(stage4)
## 
##  Classical Audit Sample Evaluation Summary
## 
## Options:
##   Confidence level:               0.95 
##   Materiality:                    0.05 
##   Method:                         stringer 
## 
## Data:
##   Sample size:                    60 
##   Number of errors:               1 
##   Sum of taints:                  0.1617495 
## 
## Results:
##   Most likely error:              0.0026958 
##   95 percent confidence interval: [0, 0.053222] 
##   Precision:                      0.050526

The results indicate that the most likely error in the population is approximately 0.27%. Moreover, the 95% one-sided confidence interval for the population misstatement ranges from 0% to 5.3% and contains the performance materiality, so the hypothesis that the population misstatement is lower than 5% cannot be rejected. Note that the Stringer method does not provide a p value for hypothesis testing.

Using report(): The basics

With the results from the evaluation() function in hand, a call to the report() function automatically generates a report containing the data, the statistical results and their interpretation, and the conclusion of the sampling procedure with respect to the sampling objectives. The object returned by the evaluation() function can be supplied directly to the report() function, see the code below.

stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')

report(stage4, file = 'report.html', format = 'html_document') # Generates .html report
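
Assuming that pdf output is supported by your rmarkdown and LaTeX installation, a pdf version of the report can be generated in the same way:

report(stage4, file = 'report.pdf', format = 'pdf_document') # Generates .pdf report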