We create some data and replace one column with NA.
data <- matrix(rnorm(120), ncol = 10)
data[, 3] <- NA
print(data)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] -1.27904056 0.07041690 NA -0.6860051 2.5689575 0.01661376 0.2660269
## [2,] 0.01582929 0.10838551 NA 0.1983723 0.3317581 0.23909421 0.5211834
## [3,] 0.17377845 -0.18116283 NA 1.4516706 0.8478808 0.06836127 0.3524147
## [4,] 0.39588479 -1.27501737 NA -1.0191441 -1.7632159 1.85809392 -1.4437067
## [5,] -0.43867090 0.05833588 NA 1.1633967 -0.8750039 -0.25678572 1.1270084
## [6,] 1.38585178 -0.08704935 NA 0.2305356 -0.2538861 -0.70162460 0.6592947
## [7,] -1.98244052 -0.08099831 NA -0.1788013 -0.4358925 0.97635899 -0.4190691
## [8,] -0.77275961 0.68237415 NA 0.8616933 1.4830114 0.65399842 -0.1069551
## [9,] 0.02140573 -2.05046615 NA 1.1876451 -0.4541026 1.68484609 0.7483050
## [10,] -0.67486965 -0.51053473 NA 0.2604653 2.2114452 -1.20954650 0.6502747
## [11,] 1.10818990 -0.38955767 NA 1.7030965 0.3027627 0.65007537 -0.1837478
## [12,] 0.64832275 -0.70074403 NA 0.1546892 0.3395298 0.36944301 -0.4303423
## [,8] [,9] [,10]
## [1,] 0.73444965 0.2391790 -1.2267461
## [2,] 0.41432078 0.6372093 0.3698468
## [3,] -1.54205027 1.3260340 0.1087109
## [4,] -0.09674137 0.5994490 0.4092155
## [5,] -0.74621737 0.5610081 0.6499265
## [6,] -0.34306352 0.7358511 -1.0762464
## [7,] -2.01981490 0.8143593 0.6143677
## [8,] -1.73645315 -0.0437167 -0.1038178
## [9,] 0.39093559 -1.0372843 1.2474161
## [10,] -0.44341900 -1.2213759 -0.6960890
## [11,] -2.37455002 -2.7202644 -1.4806522
## [12,] 0.66754807 -0.3176129 -0.5420826
The covariance, with the implicit use = 'everything', will give us a “cross” of NA in the covariance matrix.
cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.94588099 -0.20278009 NA 0.24411612 -0.41146606 -0.02286725
## [2,] -0.20278009 0.51681678 NA 0.02307173 0.39436669 -0.33743470
## [3,] NA NA NA NA NA NA
## [4,] 0.24411612 0.02307173 NA 0.71587279 0.02772605 -0.08276874
## [5,] -0.41146606 0.39436669 NA 0.02772605 1.59248532 -0.62057830
## [6,] -0.02286725 -0.33743470 NA -0.08276874 -0.62057830 0.79441203
## [7,] -0.01928866 0.08728117 NA 0.29595639 0.25849987 -0.38141327
## [8,] 0.11280894 -0.29355774 NA -0.46236664 0.07513895 -0.03454355
## [9,] -0.28211277 0.25744764 NA -0.37518239 -0.29204861 -0.04857147
## [10,] -0.25959318 -0.22264501 NA 0.03867026 -0.59040715 0.39658572
## [,7] [,8] [,9] [,10]
## [1,] -0.019288657 0.11280894 -0.282112770 -0.25959318
## [2,] 0.087281166 -0.29355774 0.257447645 -0.22264501
## [3,] NA NA NA NA
## [4,] 0.295956389 -0.46236664 -0.375182390 0.03867026
## [5,] 0.258499868 0.07513895 -0.292048606 -0.59040715
## [6,] -0.381413267 -0.03454355 -0.048571469 0.39658572
## [7,] 0.490185384 0.10499681 -0.008502191 0.02972928
## [8,] 0.104996809 1.19148959 0.202521809 0.07257879
## [9,] -0.008502191 0.20252181 1.286861377 0.37308917
## [10,] 0.029729278 0.07257879 0.373089172 0.73348871
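The shape of that NA cross can be checked programmatically. The sketch below rebuilds a matrix of the same form; the seed is an assumption added only for reproducibility, so its numbers differ from the ones printed above. It confirms that the NA entries are exactly row 3 and column 3, and that the remaining block agrees with the covariance computed after dropping the empty column.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible
data <- matrix(rnorm(120), ncol = 10)
data[, 3] <- NA

cv <- cov(data)  # default is use = "everything", so NA propagates

# Build the expected NA pattern: row 3 and column 3 only.
expected <- matrix(FALSE, nrow = 10, ncol = 10)
expected[3, ] <- TRUE
expected[, 3] <- TRUE
all(is.na(cv) == expected)

# Dropping the all-NA column gives a finite 9 x 9 matrix that
# agrees with the non-NA block of the full result.
cv_sub <- cov(data[, -3])
isTRUE(all.equal(cv_sub, cv[-3, -3]))
```

Removing the column before calling cov is the simplest way to avoid the cross when the variable carries no information at all.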
The jackknife covariance does the same thing.
jackknife_cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 9.5376333 -2.0446993 NA 2.4615042 -4.1489494 -0.2305781 -0.19449396
## [2,] -2.0446993 5.2112358 NA 0.2326399 3.9765308 -3.4024666 0.88008509
## [3,] NA NA NA NA NA NA NA
## [4,] 2.4615042 0.2326399 NA 7.2183840 0.2795711 -0.8345848 2.98422692
## [5,] -4.1489494 3.9765308 NA 0.2795711 16.0575603 -6.2574979 2.60654033
## [6,] -0.2305781 -3.4024666 NA -0.8345848 -6.2574979 8.0103213 -3.84591711
## [7,] -0.1944940 0.8800851 NA 2.9842269 2.6065403 -3.8459171 4.94270262
## [8,] 1.1374902 -2.9600405 NA -4.6621969 0.7576511 -0.3483141 1.05871783
## [9,] -2.8446371 2.5959304 NA -3.7830891 -2.9448234 -0.4897623 -0.08573042
## [10,] -2.6175646 -2.2450038 NA 0.3899252 -5.9532721 3.9989061 0.29977022
## [,8] [,9] [,10]
## [1,] 1.1374902 -2.84463710 -2.6175646
## [2,] -2.9600405 2.59593042 -2.2450038
## [3,] NA NA NA
## [4,] -4.6621969 -3.78308910 0.3899252
## [5,] 0.7576511 -2.94482345 -5.9532721
## [6,] -0.3483141 -0.48976231 3.9989061
## [7,] 1.0587178 -0.08573042 0.2997702
## [8,] 12.0141867 2.04209491 0.7318362
## [9,] 2.0420949 12.97585222 3.7619825
## [10,] 0.7318362 3.76198248 7.3960112
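Comparing the two printed matrices, every finite jackknife entry is (N-1)^2/N = 121/12 times the corresponding entry of cov(data), with N = 12 rows. This is the usual jackknife rescaling when each row is treated as one jackknife sample. The sketch below reproduces that relation. The function `jackknife_cov_sketch` is a hypothetical stand-in inferred from the printed numbers, not the package's actual implementation.

```r
# Hypothetical stand-in, inferred from the printed output above.
jackknife_cov_sketch <- function(x) {
  n <- nrow(x)            # number of jackknife samples (rows)
  (n - 1)^2 / n * cov(x)  # rescale the ordinary sample covariance
}

# Entries [1,1] and [1,2] copied from the two printed matrices.
printed_cov  <- c(0.94588099, -0.20278009)
printed_jack <- c(9.5376333, -2.0446993)
printed_jack / printed_cov  # both close to 121 / 12

# NA propagates through the sketch in the same cross pattern.
set.seed(1)  # assumption: seed added only for reproducibility
x <- matrix(rnorm(120), ncol = 10)
x[, 3] <- NA
jk <- jackknife_cov_sketch(x)
```

Because the prefactor is a plain scalar, any NA produced by cov passes through unchanged, which is why both functions show the same cross.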
When we have some NA values in a row, we have a conceptual problem with the jackknife, as the width of the jackknife distribution is linked to the number of measurements.
data <- matrix(rnorm(120), ncol = 10)
data[2, ] <- NA
print(data)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.09235770 -0.2736822 1.07768454 1.3986476 -0.04427985 -0.43386960
## [2,] NA NA NA NA NA NA
## [3,] -0.81576487 0.7775704 0.17109196 -1.4391706 -0.44610521 -0.03114897
## [4,] 2.29712384 -0.9708294 -1.82820072 1.2548585 -0.29008661 -0.04011966
## [5,] 0.39095616 0.5544508 -2.14947293 -0.1399317 0.27295889 1.55526028
## [6,] 0.77340232 -0.5477555 -0.17083258 0.7541595 -0.20869824 0.24868524
## [7,] 0.08800119 -0.6336545 -0.98499901 0.8890336 1.53409005 -1.50295644
## [8,] -0.96631559 1.1562604 0.61086918 -0.6392279 -1.63786811 -1.36196747
## [9,] -0.33965840 0.1433969 -0.02903113 -0.6113492 0.11225663 -0.20504150
## [10,] -1.30450909 1.7679298 -0.15480305 -0.3012219 -0.35047327 -1.58522455
## [11,] 0.93987339 1.7115772 0.15712533 -0.5699128 -0.30008218 -0.09824490
## [12,] 1.17786851 -0.8153058 0.52999370 1.6446007 -0.65509473 1.54280077
## [,7] [,8] [,9] [,10]
## [1,] -1.44594714 1.8779742 -1.5210546 1.3315831
## [2,] NA NA NA NA
## [3,] 2.18019040 0.5985201 -1.3091372 -0.1942886
## [4,] 0.46982988 1.4966075 -0.3836306 -1.5564391
## [5,] 2.44998447 -0.5811074 0.4178415 -1.0270927
## [6,] -1.29552794 -0.7256546 -1.1649551 -1.0798785
## [7,] 0.27222324 -0.5419584 -0.5780117 2.4435094
## [8,] 1.00917973 -1.2529065 0.4527346 0.2751830
## [9,] 0.91578302 -0.8642721 -2.1689747 -0.3336574
## [10,] 0.41343601 -0.5788200 -1.7820646 1.4319158
## [11,] 0.04372569 1.2240819 -1.8080071 -1.3366318
## [12,] 2.89349134 1.0544009 0.8544051 -1.0875780
Here, too, we get the same behavior by default:
cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA NA NA NA
## [2,] NA NA NA NA NA NA NA NA NA NA
## [3,] NA NA NA NA NA NA NA NA NA NA
## [4,] NA NA NA NA NA NA NA NA NA NA
## [5,] NA NA NA NA NA NA NA NA NA NA
## [6,] NA NA NA NA NA NA NA NA NA NA
## [7,] NA NA NA NA NA NA NA NA NA NA
## [8,] NA NA NA NA NA NA NA NA NA NA
## [9,] NA NA NA NA NA NA NA NA NA NA
## [10,] NA NA NA NA NA NA NA NA NA NA
jackknife_cov(data)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA NA NA NA
## [2,] NA NA NA NA NA NA NA NA NA NA
## [3,] NA NA NA NA NA NA NA NA NA NA
## [4,] NA NA NA NA NA NA NA NA NA NA
## [5,] NA NA NA NA NA NA NA NA NA NA
## [6,] NA NA NA NA NA NA NA NA NA NA
## [7,] NA NA NA NA NA NA NA NA NA NA
## [8,] NA NA NA NA NA NA NA NA NA NA
## [9,] NA NA NA NA NA NA NA NA NA NA
## [10,] NA NA NA NA NA NA NA NA NA NA
When we use complete, we get the same thing as just dropping the NA rows.
cov(data, use = 'complete')
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.12361858 -0.6585074 -0.43176457 0.69370787 0.11489624 0.61453819
## [2,] -0.65850740 0.9959507 0.21270732 -0.80342909 -0.27110863 -0.35220366
## [3,] -0.43176457 0.2127073 1.01583588 -0.07165297 -0.35068365 -0.25645484
## [4,] 0.69370787 -0.8034291 -0.07165297 1.04270786 0.19719938 0.24775409
## [5,] 0.11489624 -0.2711086 -0.35068365 0.19719938 0.57253390 -0.04561483
## [6,] 0.61453819 -0.3522037 -0.25645484 0.24775409 -0.04561483 1.13979200
## [7,] -0.09064129 0.1212457 -0.32995292 -0.38544663 -0.18446155 0.72359362
## [8,] 0.65766841 -0.2888639 0.18274364 0.48879303 -0.02105994 0.34411418
## [9,] 0.30959101 -0.3376672 -0.27720183 0.36645680 -0.17901332 0.45312449
## [10,] -0.86329613 0.1165757 0.31168312 0.04913339 0.40888931 -1.03675392
## [,7] [,8] [,9] [,10]
## [1,] -0.09064129 0.65766841 0.30959101 -0.86329613
## [2,] 0.12124571 -0.28886389 -0.33766721 0.11657574
## [3,] -0.32995292 0.18274364 -0.27720183 0.31168312
## [4,] -0.38544663 0.48879303 0.36645680 0.04913339
## [5,] -0.18446155 -0.02105994 -0.17901332 0.40888931
## [6,] 0.72359362 0.34411418 0.45312449 -1.03675392
## [7,] 1.95391033 -0.17561476 0.81297857 -0.53589753
## [8,] -0.17561476 1.22798844 -0.08766992 -0.33810496
## [9,] 0.81297857 -0.08766992 1.07758226 -0.28999355
## [10,] -0.53589753 -0.33810496 -0.28999355 1.75152292
all(cov(data, use = 'complete') == cov(data[complete.cases(data), ]))
## [1] TRUE
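The string 'complete' works because cov matches the use argument partially, and 'complete.obs' is the only option that starts with it. The sketch below spells out the casewise deletion with complete.cases and compares the two results with all.equal, which tolerates floating-point rounding where == demands bit-identical values. The seed is an assumption added for reproducibility, so the numbers differ from the ones printed above.

```r
set.seed(2)  # assumption: seed added only to make the sketch reproducible
data <- matrix(rnorm(120), ncol = 10)
data[2, ] <- NA

# complete.cases() flags the rows without any NA.
keep <- complete.cases(data)
which(!keep)  # index of the dropped row

cov_use  <- cov(data, use = "complete.obs")  # full name of 'complete'
cov_drop <- cov(data[keep, ])                # manual casewise deletion

# Compare with a numerical tolerance instead of exact equality.
isTRUE(all.equal(cov_use, cov_drop))
```

Both routes use the 11 remaining rows, so the normalization of the sample covariance is based on 11 measurements rather than 12.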
With our jackknife function we get a failure, which should not happen!
jackknife_cov(data, na.rm = TRUE)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 135.95785 -79.67939 -52.243513 83.938652 13.902446 74.359121
## [2,] -79.67939 120.51003 25.737586 -97.214919 -32.804144 -42.616642
## [3,] -52.24351 25.73759 122.916142 -8.670009 -42.432722 -31.031036
## [4,] 83.93865 -97.21492 -8.670009 126.167650 23.861125 29.978245
## [5,] 13.90245 -32.80414 -42.432722 23.861125 69.276601 -5.519394
## [6,] 74.35912 -42.61664 -31.031036 29.978245 -5.519394 137.914832
## [7,] -10.96760 14.67073 -39.924304 -46.639043 -22.319847 87.554828
## [8,] 79.57788 -34.95253 22.111981 59.143957 -2.548253 41.637816
## [9,] 37.46051 -40.85773 -33.541421 44.341273 -21.660612 54.828063
## [10,] -104.45883 14.10566 37.713658 5.945140 49.475606 -125.447224
## [,7] [,8] [,9] [,10]
## [1,] -10.96760 79.577878 37.46051 -104.45883
## [2,] 14.67073 -34.952530 -40.85773 14.10566
## [3,] -39.92430 22.111981 -33.54142 37.71366
## [4,] -46.63904 59.143957 44.34127 5.94514
## [5,] -22.31985 -2.548253 -21.66061 49.47561
## [6,] 87.55483 41.637816 54.82806 -125.44722
## [7,] 236.42315 -21.249386 98.37041 -64.84360
## [8,] -21.24939 148.586601 -10.60806 -40.91070
## [9,] 98.37041 -10.608060 130.38745 -35.08922
## [10,] -64.84360 -40.910701 -35.08922 211.93427
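One way to read this output is to compare it entry by entry with cov(data, use = 'complete') printed earlier. The sketch below does this for three diagonal entries copied from the two printed matrices. The ratio is close to 121 in each case. For orientation, the sketch also evaluates two candidate prefactors, (N-1)^2/N with all N = 12 rows and (m-1)^2/m with the m = 11 complete rows. These candidates are assumptions for comparison only; the text does not state which prefactor is intended.

```r
# Diagonal entries [1,1], [2,2], [3,3] copied from the printed matrices.
cov_complete <- c(1.12361858, 0.9959507, 1.01583588)
jack_na_rm   <- c(135.95785, 120.51003, 122.916142)

ratio <- jack_na_rm / cov_complete
ratio  # each close to 121

# Candidate prefactors (assumptions, for comparison only).
n <- 12  # rows in data, including the NA row
m <- 11  # complete rows
candidates <- c(all_rows = (n - 1)^2 / n, complete_rows = (m - 1)^2 / m)
candidates
```

Neither candidate is near 121, so the result is about an order of magnitude larger than either convention would give for these data.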