sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere

Eduardo García-Portugués and Thomas Verdebout

2021-09-01, v1.0.1

Just give me a quick example!

Circular data

Suppose that you want to test if a sample of circular data is uniformly distributed. For example, the following circular uniform sample in radians:

set.seed(987202226)
cir_data <- runif(n = 30, min = 0, max = 2 * pi)

Then you can call the main function in the sphunif package, unif_test, specifying the type of test to be performed. For example, the Watson (omnibus) test:

library(sphunif)
#> Loading required package: Rcpp
unif_test(data = cir_data, type = "Watson") # An htest object
#> 
#>  Watson test of circular uniformity
#> 
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity

By default, the calibration of the test statistic is done by Monte Carlo. This can be changed with p_value = "asymp" to use the asymptotic distribution:

unif_test(data = cir_data, type = "Watson", p_value = "MC") # Monte Carlo
#> Loading required package: foreach
#> Loading required package: future
#> 
#>  Watson test of circular uniformity
#> 
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8592
#> alternative hypothesis: any alternative to circular uniformity
unif_test(data = cir_data, type = "Watson", p_value = "asymp") # Asymp. distr.
#> 
#>  Watson test of circular uniformity
#> 
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity

You can perform several tests within a single call to unif_test. Choose the available circular uniformity tests from

avail_cir_tests
#>  [1] "Ajne"           "Bakshaev"       "Bingham"        "Cressie"       
#>  [5] "CCF09"          "FG01"           "Gine_Fn"        "Gine_Gn"       
#>  [9] "Gini"           "Gini_squared"   "Greenwood"      "Hermans_Rasson"
#> [13] "Hodges_Ajne"    "Kuiper"         "Log_gaps"       "Max_uncover"   
#> [17] "Num_uncover"    "PAD"            "PCvM"           "PRt"           
#> [21] "Pycke"          "Pycke_q"        "Range"          "Rao"           
#> [25] "Rayleigh"       "Riesz"          "Rothman"        "Vacancy"       
#> [29] "Watson"         "Watson_1976"

For example, you can use the Projected Anderson–Darling (García-Portugués, Navarro-Esteban, and Cuesta-Albertos (2020), also an omnibus test) test and the Watson test:

# A *list* of htest objects
unif_test(data = cir_data, type = c("PAD", "Watson"))
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 1.137e-08).
#> $PAD
#> 
#>  Projected Anderson-Darling test of circular uniformity
#> 
#> data:  cir_data
#> statistic = 0.54217, p-value = 0.8403
#> alternative hypothesis: any alternative to circular uniformity
#> 
#> 
#> $Watson
#> 
#>  Watson test of circular uniformity
#> 
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity

García-Portugués and Verdebout (2018) gives a review of different tests of uniformity on the circle. Section 5.1 in Pewsey and García-Portugués (2021) also gives an overview of recent advances.

Spherical data

Suppose now that you want to test if a sample of spherical data is uniformly distributed. Consider a non-uniformly-generated1 sample of directions in Cartesian coordinates, such as:

# Sample data on S^2
set.seed(987202226)
theta <- runif(n = 30, min = 0, max = 2 * pi)
phi <- runif(n = 30, min = 0, max = pi)
sph_data <- cbind(cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi))

The available spherical uniformity tests:

avail_sph_tests
#>  [1] "Ajne"        "Bakshaev"    "Bingham"     "CJ12"        "CCF09"      
#>  [6] "Gine_Fn"     "Gine_Gn"     "PAD"         "PCvM"        "PRt"        
#> [11] "Pycke"       "Rayleigh"    "Rayleigh_HD" "Riesz"

See again García-Portugués and Verdebout (2018) for a review of tests of uniformity on the sphere and complementary Section 5.1 in Pewsey and García-Portugués (2021).

The default type = "all" equals type = avail_sph_tests, which might be too much for standard analysis:

unif_test(data = sph_data, type = "all", p_value = "MC", M = 1e3)
#> $Ajne
#> 
#>  Ajne test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 0.057937, p-value = 0.991
#> alternative hypothesis: any non-axial alternative to spherical uniformity
#> 
#> 
#> $Bakshaev
#> 
#>  Bakshaev (2010) test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 1.0215, p-value = 0.622
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $Bingham
#> 
#>  Bingham test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 17.573, p-value = 0.003
#> alternative hypothesis: scatter matrix different from constant
#> 
#> 
#> $CJ12
#> 
#>  Cai and Jiang (2012) test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 19.442, p-value = 0.78
#> alternative hypothesis: unclear consistency
#> 
#> 
#> $CCF09
#> 
#>  Cuesta-Albertos et al. (2009) test of spherical uniformity with k = 50
#> 
#> data:  sph_data
#> statistic = 1.1355, p-value = 0.764
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $Gine_Fn
#> 
#>  Gine's Fn test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 1.5391, p-value = 0.392
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $Gine_Gn
#> 
#>  Gine's Gn test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 1.3074, p-value = 0.003
#> alternative hypothesis: any axial alternative to spherical uniformity
#> 
#> 
#> $PAD
#> 
#>  Projected Anderson-Darling test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 0.91653, p-value = 0.5
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $PCvM
#> 
#>  Projected Cramer-von Mises test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 0.12769, p-value = 0.622
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $PRt
#> 
#>  Projected Rothman test of spherical uniformity with t = 0.333
#> 
#> data:  sph_data
#> statistic = 0.15523, p-value = 0.666
#> alternative hypothesis: any alternative to spherical uniformity if t is irrational
#> 
#> 
#> $Pycke
#> 
#>  Pycke test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 0.042839, p-value = 0.299
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $Rayleigh
#> 
#>  Rayleigh test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 0.13345, p-value = 0.98
#> alternative hypothesis: mean direction different from zero
#> 
#> 
#> $Rayleigh_HD
#> 
#>  HD-standardized Rayleigh test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = -1.1703, p-value = 0.98
#> alternative hypothesis: mean direction different from zero
#> 
#> 
#> $Riesz
#> 
#>  Warning! This is an experimental test not meant to be used
#> 
#> data:  sph_data
#> statistic = 1.0215, p-value = 0.622
#> alternative hypothesis: unclear, experimental test
unif_test(data = sph_data, type = "Rayleigh", p_value = "asymp")
#> 
#>  Rayleigh test of spherical uniformity
#> 
#> data:  sph_data
#> statistic = 0.13345, p-value = 0.9875
#> alternative hypothesis: mean direction different from zero

Higher dimensions

The hyperspherical setting is treated analogously to the spherical setting, and the available tests are exactly the same (avail_sph_tests). Here is an example of testing uniformity with a sample of size 100 on the \(9\)-sphere:

# Sample data on S^9
hyp_data <- r_unif_sph(n = 30, p = 10)

# Test
unif_test(data = hyp_data, type = "Rayleigh", p_value = "asymp")
#> 
#>  Rayleigh test of spherical uniformity
#> 
#> data:  hyp_data
#> statistic = 14.081, p-value = 0.1693
#> alternative hypothesis: mean direction different from zero

A data example: are Venusian craters uniformly distributed?

The following snippet partially reproduces the data application in García-Portugués, Navarro-Esteban, and Cuesta-Albertos (2021) on testing the uniformity of known Venusian craters. Incidentally, it also illustrates how to monitor the progress of a Monte Carlo simulation when p_value = "MC" using progressr.

# Load spherical data
data(venus)
head(venus)
#>        name diameter     theta      phi
#> 1    Janice     10.0 4.5724136 1.523672
#> 2  HuaMulan     24.0 5.8939769 1.514946
#> 3   Tatyana     19.0 3.7070793 1.490511
#> 4 Landowska     33.0 1.2967796 1.476025
#> 5 Ruslanova     44.3 0.2897247 1.465029
#> 6     Sveta     21.0 4.7684140 1.439181
nrow(venus)
#> [1] 967

# Compute Cartesian coordinates from polar coordinates
venus$X <- cbind(cos(venus$theta) * cos(venus$phi),
                 sin(venus$theta) * cos(venus$phi),
                 sin(venus$phi))

# Test uniformity using the Projected Cramér-von Mises and Projected
# Anderson-Darling tests on S^2, both using asymptotic distributions
unif_test(data = venus$X, type = c("PCvM", "PAD"), p_value = "asymp")
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 6.249e-14).
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 4.999e-13).
#> $PCvM
#> 
#>  Projected Cramer-von Mises test of spherical uniformity
#> 
#> data:  venus$X
#> statistic = 0.25844, p-value = 0.1272
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $PAD
#> 
#>  Projected Anderson-Darling test of spherical uniformity
#> 
#> data:  venus$X
#> statistic = 1.5077, p-value = 0.1149
#> alternative hypothesis: any alternative to spherical uniformity

# Define a handler for progressr
require(progress)
#> Loading required package: progress
require(progressr)
#> Loading required package: progressr
handlers(handler_progress(
  format = ":spin [:bar] :percent Total: :elapsedfull End \u2248 :eta",
  clear = FALSE))

# Test uniformity using Monte-Carlo approximated null distributions
with_progress(
  unif_test(data = venus$X, type = c("PCvM", "PAD"),
            p_value = "MC", chunks = 1e2, M = 5e2, cores = 2)
)
#> $PCvM
#> 
#>  Projected Cramer-von Mises test of spherical uniformity
#> 
#> data:  venus$X
#> statistic = 0.25844, p-value = 0.116
#> alternative hypothesis: any alternative to spherical uniformity
#> 
#> 
#> $PAD
#> 
#>  Projected Anderson-Darling test of spherical uniformity
#> 
#> data:  venus$X
#> statistic = 1.5077, p-value = 0.11
#> alternative hypothesis: any alternative to spherical uniformity

Simulation studies done simple

unif_stat allows to compute several statistics to different samples within a single call, thus facilitating Monte Carlo experiments:

# M samples of size n on S^2
samps_sph <- r_unif_sph(n = 30, p = 3, M = 10)

# Apply all statistics to the M samples
unif_stat(data = samps_sph, type = "all")
#>          Ajne  Bakshaev  Bingham     CJ12     CCF09   Gine_Fn   Gine_Gn
#> 1  0.47609328 2.2119954 2.382230 18.36313 1.5714737 2.2702163 0.3658432
#> 2  0.05231638 0.5266412 4.984710 21.26823 0.8961785 0.7372848 0.5280193
#> 3  0.21601031 1.0974739 3.752754 18.42823 1.4807671 1.2776073 0.4135661
#> 4  0.16994745 0.8920866 2.710392 26.76415 1.1073186 1.0173293 0.3375395
#> 5  0.24320463 1.2717785 3.607297 23.14723 1.3761164 1.4079420 0.4351235
#> 6  0.31846184 1.5920190 4.297304 21.63874 1.3718298 1.7129064 0.4390590
#> 7  0.29956640 1.5662143 4.377052 21.03363 1.5190590 1.7113360 0.5130704
#> 8  0.24576739 1.1694726 1.151481 22.37936 1.1051464 1.2430711 0.2600015
#> 9  0.30644656 1.5440129 4.982531 21.16927 1.4369394 1.6971020 0.4713157
#> 10 0.29664792 1.5808092 5.911191 24.17573 1.4809133 1.7620336 0.5754419
#>          PAD       PCvM        PRt        Pycke   Rayleigh Rayleigh_HD
#> 1  1.5396354 0.27649943 0.39262094  0.157845184 6.50306437  1.43012004
#> 2  0.4863028 0.06583015 0.08353385 -0.140634304 0.09011958 -1.18795371
#> 3  0.8766092 0.13718424 0.16400787  0.019631134 1.87776213 -0.45815169
#> 4  0.6948689 0.11151082 0.14774812 -0.103164830 1.75644170 -0.50768055
#> 5  0.9503006 0.15897231 0.21864541 -0.006193974 2.95653601 -0.01774410
#> 6  1.1442056 0.19900238 0.27062129  0.007535607 4.18444451  0.48354745
#> 7  1.1468233 0.19577679 0.27158541  0.055528018 3.88963126  0.36319044
#> 8  0.8602975 0.14618407 0.19759185 -0.066992075 2.93592539 -0.02615835
#> 9  1.1340723 0.19300161 0.25856393  0.037310492 3.76467886  0.31217884
#> 10 1.1687326 0.19760115 0.27044780  0.065156710 3.75199894  0.30700228
#>        Riesz
#> 1  2.2119954
#> 2  0.5266412
#> 3  1.0974739
#> 4  0.8920866
#> 5  1.2717785
#> 6  1.5920190
#> 7  1.5662143
#> 8  1.1694726
#> 9  1.5440129
#> 10 1.5808092

Additionally, unif_stat_MC is an utility for performing the previous simulation through a convenient parallel wrapper:

# Break the simulation in 10 chunks of tasks to be divided between 2 cores
sim <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e4, cores = 2,
                    chunks = 10)

# Critical values for test statistics
sim$crit_val_MC
#>           Ajne Bakshaev   Bingham     CJ12    CCF09  Gine_Fn   Gine_Gn      PAD
#> 0.1  0.4469816 2.186773  9.161085 26.00228 1.651559 2.327829 0.7642920 1.542838
#> 0.05 0.5440321 2.606217 10.817409 26.78061 1.763648 2.701988 0.8673248 1.801016
#> 0.01 0.7533937 3.516652 14.560299 27.96065 1.989172 3.582482 1.1082368 2.389605
#>           PCvM       PRt     Pycke  Rayleigh Rayleigh_HD    Riesz
#> 0.1  0.2733466 0.3787205 0.1484198  6.161694    1.290756 2.186773
#> 0.05 0.3257771 0.4554343 0.2132590  7.659209    1.902114 2.606217
#> 0.01 0.4395814 0.6296545 0.3562978 11.083608    3.300119 3.516652

# Test statistics
head(sim$stats_MC)
#>        Ajne  Bakshaev   Bingham     CJ12    CCF09   Gine_Fn   Gine_Gn       PAD
#> 1 0.1095132 0.7945384  4.802204 18.99292 1.186288 1.0109726 0.5729197 0.6776932
#> 2 0.1330324 1.0832080 11.866309 17.15186 1.173641 1.4474474 0.9153178 0.9097490
#> 3 0.1570914 1.0143378  6.772900 18.85335 1.228233 1.2455681 0.6172027 0.8126579
#> 4 0.1227879 0.6856030  3.188910 19.24447 1.037321 0.8544891 0.3633373 0.5906101
#> 5 0.2680238 1.2598328  2.211225 12.27737 1.367386 1.3410810 0.2689859 0.9149259
#> 6 0.1171362 0.8103943  7.611415 25.07804 1.038336 1.0689460 0.6004014 0.6906607
#>         PCvM        PRt       Pycke  Rayleigh Rayleigh_HD     Riesz
#> 1 0.09931730 0.13195058 -0.04782339 0.8419743  -0.8810103 0.7945384
#> 2 0.13540100 0.16156212  0.03973539 1.0884959  -0.7803683 1.0832080
#> 3 0.12679222 0.16485102 -0.04383765 1.6056949  -0.5692227 1.0143378
#> 4 0.08570037 0.09593621 -0.09811272 0.6702957  -0.9510978 0.6856030
#> 5 0.15747910 0.21016107 -0.03650681 3.2666485   0.1088588 1.2598328
#> 6 0.10129928 0.11489275 -0.08651426 0.7350144  -0.9246765 0.8103943

# Power computation using a pre-built sampler for the alternative
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e4, cores = 2,
                    chunks = 10, r_H1 = r_alt, crit_val = sim$crit_val_MC,
                    alt = "vMF", kappa = 1)
pow$power_MC
#>        Ajne Bakshaev Bingham   CJ12  CCF09 Gine_Fn Gine_Gn    PAD   PCvM    PRt
#> 0.1  0.8262   0.8202  0.1380 0.0906 0.7812  0.8110  0.1358 0.8157 0.8202 0.8242
#> 0.05 0.7230   0.7190  0.0793 0.0449 0.6695  0.7156  0.0786 0.7166 0.7190 0.7224
#> 0.01 0.4813   0.4765  0.0195 0.0096 0.4147  0.4656  0.0187 0.4634 0.4765 0.4706
#>       Pycke Rayleigh Rayleigh_HD  Riesz
#> 0.1  0.7853   0.8268      0.8268 0.8202
#> 0.05 0.6881   0.7264      0.7264 0.7190
#> 0.01 0.4443   0.4791      0.4791 0.4765

# How to use a custom sampler according to ?unif_stat_MC
r_H1 <- function(n, p, M, l = 1) {

  samp <- array(dim = c(n, p, M))
  for (j in 1:M) {

    samp[, , j] <- mvtnorm::rmvnorm(n = n, mean = c(l, rep(0, p - 1)),
                                    sigma = diag(rep(1, p)))
    samp[, , j] <- samp[, , j] / sqrt(rowSums(samp[, , j]^2))

  }
  return(samp)

}
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e2, cores = 2,
                    chunks = 5, r_H1 = r_H1, crit_val = sim$crit_val_MC)
pow$power_MC
#>      Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn  PAD PCvM  PRt Pycke
#> 0.1  0.99     0.99    0.46 0.04  0.99    0.99    0.47 0.99 0.99 0.99  0.99
#> 0.05 0.99     0.99    0.35 0.02  0.99    0.99    0.32 0.99 0.99 0.99  0.99
#> 0.01 0.97     0.97    0.15 0.00  0.93    0.98    0.16 0.98 0.97 0.97  0.95
#>      Rayleigh Rayleigh_HD Riesz
#> 0.1      0.99        0.99  0.99
#> 0.05     0.99        0.99  0.99
#> 0.01     0.97        0.97  0.97

unif_stat_MC can be used for constructing the Monte Carlo calibration necessary for unif_test, either for providing a rejection rule based on $crit_val_MC or for approximating the \(p\)-value by $stats_MC.

# Using precomputed critical values
ht1 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                 p_value = "crit_val", crit_val = sim$crit_val_MC)
ht1
#> 
#>  Rayleigh test of spherical uniformity
#> 
#> data:  samps_sph[, , 1]
#> statistic = 6.5031, p-value = NA
#> alternative hypothesis: mean direction different from zero
ht1$reject
#>   0.1  0.05  0.01 
#> FALSE  TRUE  TRUE

# Using precomputed Monte Carlo statistics
ht2 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                 p_value = "MC", stats_MC = sim$stats_MC)
ht2
#> 
#>  Rayleigh test of spherical uniformity
#> 
#> data:  samps_sph[, , 1]
#> statistic = 6.5031, p-value = 0.9993
#> alternative hypothesis: mean direction different from zero
ht2$reject
#>  0.1 0.05 0.01 
#> TRUE TRUE TRUE

# Faster than
unif_test(data = samps_sph[, , 1], type = "Rayleigh", p_value = "MC")
#> 
#>  Rayleigh test of spherical uniformity
#> 
#> data:  samps_sph[, , 1]
#> statistic = 6.5031, p-value = 0.09281
#> alternative hypothesis: mean direction different from zero

References

García-Portugués, E., P. Navarro-Esteban, and J. A. Cuesta-Albertos. 2020. “On a Projection-Based Class of Uniformity Tests on the Hypersphere.” arXiv:2008.09897. https://arxiv.org/abs/2008.09897.
———. 2021. “A Cramér–von Mises Test of Uniformity on the Hypersphere.” In Statistical Learning and Modeling in Data Analysis, edited by S. Balzano, G. C. Porzio, R. Salvatore, D. Vistocco, and M. Vichi, 107–16. Studies in Classification, Data Analysis and Knowledge Organization. Cham: Springer. https://doi.org/10.1007/978-3-030-69944-4_12.
García-Portugués, E., and T. Verdebout. 2018. “A Review of Uniformity Tests on the Hypersphere.” arXiv:1804.00286. https://arxiv.org/abs/1804.00286.
Pewsey, A., and E. García-Portugués. 2021. “Recent Advances in Directional Statistics.” Test 30 (1): 1–58. https://doi.org/10.1007/s11749-021-00759-x.

  1. Uniformly-distributed polar coordinates do not translate into uniformly-distributed Cartesian coordinates!↩︎