ndi: Neighborhood Deprivation Indices

Ian D. Buller (GitHub: @idblr)

2022-08-04

Start with the necessary packages for the vignette.

loadedPackages <- c("ggplot2", "ndi", "sf", "tidycensus", "tigris")
invisible(lapply(loadedPackages, library, character.only = TRUE))
options(tigris_use_cache = TRUE)

Set your U.S. Census Bureau access key. Follow this link to obtain one. Specify your access key in the messer() or powell_wiley() functions using the key argument of the get_acs() function from the tidycensus package called within each or by using the census_api_key() function from the tidycensus package before running the messer() or powell_wiley() functions (see an example of the latter below).

tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API

Calculate NDI (Messer)

Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts. This metric is based on Messer et al. (2006) with the following socio-economic status (SES) variables:

Characteristic SES dimension ACS table source Description
OCC Occupation C24030 Percent males in management, science, and arts occupation
CWD Housing B25014 Percent of crowded housing
POV Poverty B17017 Percent of households in poverty
FHH Poverty B25115 Percent of female headed households with dependents
PUB Poverty B19058 Percent of households on public assistance
U30 Poverty B19001 Percent households earning <$30,000 per year
EDU Education B06009 Percent earning less than a high school education
EMP Employment B23001 (2010 only); B23025 (2011 onward) Percent unemployed
messer2010GA <- ndi::messer(state = "GA", year = 2010)

One output from messer() function is tibble containing the identification, geographic name, NDI (Messer) values, and raw census characteristics for each tract.

messer2010GA$ndi
## # A tibble: 1,969 × 14
##    GEOID  state county tract     NDI NDIQu…¹   OCC   CWD   POV   FHH   PUB   U30
##    <chr>  <chr> <chr>  <chr>   <dbl> <fct>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 13001… Geor… Appli… 9501  -0.0075 2-Belo…     0   0     0.1   0.1   0.1   0.3
##  2 13001… Geor… Appli… 9502   0.0458 4-Most…     0   0     0.3   0.1   0.2   0.5
##  3 13001… Geor… Appli… 9503   0.0269 3-Abov…     0   0     0.2   0     0.2   0.4
##  4 13001… Geor… Appli… 9504  -0.0083 2-Belo…     0   0     0.1   0     0.1   0.3
##  5 13001… Geor… Appli… 9505   0.0231 3-Abov…     0   0     0.2   0     0.2   0.4
##  6 13003… Geor… Atkin… 9601   0.0619 4-Most…     0   0.1   0.2   0.2   0.2   0.5
##  7 13003… Geor… Atkin… 9602   0.0593 4-Most…     0   0.1   0.3   0.1   0.2   0.4
##  8 13003… Geor… Atkin… 9603   0.0252 3-Abov…     0   0     0.3   0.1   0.2   0.4
##  9 13005… Geor… Bacon… 9701   0.0061 3-Abov…     0   0     0.2   0     0.2   0.4
## 10 13005… Geor… Bacon… 9702…  0.0121 3-Abov…     0   0     0.2   0.1   0.1   0.5
## # … with 1,959 more rows, 2 more variables: EDU <dbl>, EMP <dbl>, and
## #   abbreviated variable name ¹​NDIQuart
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

A second output from messer() function is the results from the principal component analysis used to compute the NDI (Messer) values.

messer2010GA$pca
## Principal Components Analysis
## Call: psych::principal(r = ndi_vars_pca, nfactors = 1, n.obs = nrow(ndi_vars_pca), 
##     covar = FALSE, scores = TRUE, missing = imp)
## Standardized loadings (pattern matrix) based upon correlation matrix
##       PC1   h2   u2 com
## OCC -0.59 0.35 0.65   1
## CWD  0.47 0.22 0.78   1
## POV  0.87 0.76 0.24   1
## FHH  0.67 0.45 0.55   1
## PUB  0.89 0.79 0.21   1
## U30  0.90 0.81 0.19   1
## EDU  0.79 0.62 0.38   1
## EMP  0.46 0.21 0.79   1
## 
##                 PC1
## SS loadings    4.21
## Proportion Var 0.53
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.11 
##  with the empirical chi square  1221.09  with prob <  2.3e-246 
## 
## Fit based upon off diagonal values = 0.95

A third output from messer() function is a tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Messer) values.

messer2010GA$missing
## # A tibble: 8 × 5
## # Groups:   variable, total, missing [8]
##   variable total missing     n percent
##   <chr>    <int> <lgl>   <int> <chr>  
## 1 CWD       1969 TRUE       14 0.71 % 
## 2 EDU       1969 TRUE       13 0.66 % 
## 3 EMP       1969 TRUE       13 0.66 % 
## 4 FHH       1969 TRUE       14 0.71 % 
## 5 OCC       1969 TRUE       15 0.76 % 
## 6 POV       1969 TRUE       14 0.71 % 
## 7 PUB       1969 TRUE       14 0.71 % 
## 8 U30       1969 TRUE       14 0.71 %

We can visualize the NDI (Messer) values geographically by linking them to spatial information from tigris and plotting with the ggplot2 package suite.

# Obtain the 2010 counties from the "tigris" package
county2010GA <- tigris::counties(state = "GA", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
county2010GA$GEOID <- substring(county2010GA$GEO_ID, 10) 

# Obtain the 2010 census tracts from the "tigris" package
tract2010GA <- tigris::tracts(state = "GA", year = 2010, cb = TRUE)
# Remove first 9 characters from GEOID for compatibility with tigris information
tract2010GA$GEOID <- substring(tract2010GA$GEO_ID, 10) 

# Join the NDI (Messer) values to the census tract geometry
GA2010messer <- merge(tract2010GA, messer2010GA$ndi, by = "GEOID")

# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts 
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer, 
                   ggplot2::aes(fill = NDI),
                   size = 0.05,
                   color = "white") +
   ggplot2::geom_sf(data = county2010GA,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
                   subtitle = "Georgia, U.S.A., census tracts as the referent")

## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
GA2010messer$NDIQuartNA <- factor(replace(as.character(GA2010messer$NDIQuart), 
                                            GA2010messer$NDIQuart == "9-NDI not avail", NA),
                                  c(levels(GA2010messer$NDIQuart)[-5], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer, 
                   ggplot2::aes(fill = NDIQuartNA),
                   size = 0.05,
                   color = "white") +
   ggplot2::geom_sf(data = county2010GA,
                   fill = "transparent", 
                   color = "white",
                   size = 0.2) +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
                   subtitle = "Georgia, U.S.A., census tracts as the referent")

The results above are at the tract-level. The NDI (Messer) values can also be calculated at the county level.

messer2010GA_county <- ndi::messer(geo = "county", state = "GA", year = 2010)

# Join the NDI (Messer) values to the county geometry
GA2010messer_county <- merge(county2010GA, messer2010GA_county$ndi, by = "GEOID")

# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer_county, 
                   ggplot2::aes(fill = NDI),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
                   subtitle = "Georgia, U.S.A., counties as the referent")

## Categorical Index

### Rename "9-NDI not avail" level as NA for plotting
GA2010messer_county$NDIQuartNA <- factor(replace(as.character(GA2010messer_county$NDIQuart), 
                                            GA2010messer_county$NDIQuart == "9-NDI not avail", NA),
                                         c(levels(GA2010messer_county$NDIQuart)[-5], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = GA2010messer_county, 
                   ggplot2::aes(fill = NDIQuartNA),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
                   subtitle = "Georgia, U.S.A., counties as the referent")

Calculate NDI (Powell-Wiley)

Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virginia, Washington, D.C., and West Virginia census tracts. This metric is based on Andrews et al. (2020) and Slotman et al. (2022) with socio-economic status (SES) variables chosen by Roux and Mair (2010):

Characteristic SES dimension ACS table source Description
MedHHInc Wealth and income B19013 Median household income (dollars)
PctRecvIDR Wealth and income B19054 Percent of households receiving dividends, interest, or rental income
PctPubAsst Wealth and income B19058 Percent of households receiving public assistance
MedHomeVal Wealth and income B25077 Median home value (dollars)
PctMgmtBusSciArt Occupation C24060 Percent in a management, business, science, or arts occupation
PctFemHeadKids Occupation B11005 Percent of households that are female headed with any children under 18 years
PctOwnerOcc Housing conditions DP04 Percent of housing units that are owner occupied
PctNoPhone Housing conditions DP04 Percent of households without a telephone
PctNComPlmb Housing conditions DP04 Percent of households without complete plumbing facilities
PctEducHSPlus Education S1501 Percent with a high school degree or higher (population 25 years and over)
PctEducBchPlus Education S1501 Percent with a college degree or higher (population 25 years and over)
PctFamBelowPov Wealth and income S1702 Percent of families with incomes below the poverty level
PctUnempl Occupation S2301 Percent unemployed

More information about the codebook and computation of the NDI (Powell-Wiley) can be found on a GIS Portal for Cancer Research website.

powell_wiley2020DMVW <- ndi::powell_wiley(state = c("DC", "MD", "VA", "WV"), year = 2020)

One output from powell_wiley() function is tibble containing the identification, geographic name, NDI (Powell-Wiley) values, and raw census characteristics for each tract.

powell_wiley2020DMVW$ndi
## # A tibble: 4,425 × 20
##    GEOID       state  county tract   NDI NDIQu…¹ MedHH…² PctRe…³ PctPu…⁴ MedHo…⁵
##    <chr>       <chr>  <chr>  <chr> <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>
##  1 11001000101 Distr… Distr… 1.01  -2.13 1-Leas…  187839    50.9     0.8  699100
##  2 11001000102 Distr… Distr… 1.02  -2.46 1-Leas…  184167    52.2     0.6 1556000
##  3 11001000201 Distr… Distr… 2.01  NA    9-NDI …      NA   NaN     NaN        NA
##  4 11001000202 Distr… Distr… 2.02  -2.30 1-Leas…  164261    49.6     0.9 1309100
##  5 11001000300 Distr… Distr… 3     -2.06 1-Leas…  156483    46       0.6  976500
##  6 11001000400 Distr… Distr… 4     -2.09 1-Leas…  153397    47.8     0   1164200
##  7 11001000501 Distr… Distr… 5.01  -2.11 1-Leas…  119911    44.5     0.8  674600
##  8 11001000502 Distr… Distr… 5.02  -2.21 1-Leas…  153264    46.8     0.5 1012500
##  9 11001000600 Distr… Distr… 6     -2.16 1-Leas…  154266    60.8     7.4 1109800
## 10 11001000702 Distr… Distr… 7.02  -1.20 1-Leas…   71747    22.9     0    289900
## # … with 4,415 more rows, 10 more variables: PctMgmtBusScArti <dbl>,
## #   PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
## #   PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
## #   PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>, and abbreviated
## #   variable names ¹​NDIQuint, ²​MedHHInc, ³​PctRecvIDR, ⁴​PctPubAsst, ⁵​MedHomeVal
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

A second output from powell_wiley() function is the results from the principal component analysis used to compute the NDI (Powell-Wiley) values.

powell_wiley2020DMVW$pca
## $loadings
## 
## Loadings:
##                 PC1    PC2    PC3   
## logMedHHInc     -0.638 -0.364       
## PctNoIDRZ        0.612  0.319       
## PctPubAsstZ      0.379  0.615       
## logMedHomeVal   -0.893              
## PctWorkClassZ    0.974              
## PctFemHeadKidsZ  0.128  0.697 -0.233
## PctNotOwnerOccZ -0.375  0.923       
## PctNoPhoneZ             0.329  0.524
## PctNComPlmbZ           -0.141  0.869
## PctEducLTHSZ     0.642  0.164       
## PctEducLTBchZ    1.020 -0.121       
## PctFamBelowPovZ  0.219  0.698       
## PctUnemplZ              0.596       
## 
##                  PC1   PC2   PC3
## SS loadings    4.340 2.971 1.102
## Proportion Var 0.334 0.229 0.085
## Cumulative Var 0.334 0.562 0.647
## 
## $rotmat
##             [,1]       [,2]       [,3]
## [1,]  0.68516447  0.4403017 0.04517229
## [2,] -0.95446519  1.0806634 0.03196818
## [3,] -0.09078904 -0.2028145 1.02053725
## 
## $rotation
## [1] "promax"
## 
## $Phi
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5250277 0.1649550
## [2,] 0.5250277 1.0000000 0.1923867
## [3,] 0.1649550 0.1923867 1.0000000
## 
## $Structure
##             [,1]        [,2]        [,3]
##  [1,] -0.8432264 -0.71542812 -0.26037289
##  [2,]  0.7750210  0.63477178  0.13357439
##  [3,]  0.7001489  0.81237803  0.17181801
##  [4,] -0.8931597 -0.46211477 -0.19777858
##  [5,]  0.9253217  0.42037509  0.12424330
##  [6,]  0.4559503  0.71962940 -0.07780352
##  [7,]  0.1195748  0.73799969  0.17788881
##  [8,]  0.2552300  0.42753150  0.58693047
##  [9,]  0.1155170  0.05008712  0.84948765
## [10,]  0.7325510  0.50620197  0.16403994
## [11,]  0.9557341  0.41335682  0.13787215
## [12,]  0.5881663  0.81650181  0.18717861
## [13,]  0.3841044  0.63036827  0.09925687
## 
## $communality
##  [1] 0.5468870 0.4774810 0.5220533 0.8012746 0.9576793 0.5566938 0.9968490
##  [8] 0.3829550 0.7774209 0.4398519 1.0560478 0.5359573 0.3616523
## 
## $uniqueness
##     logMedHHInc       PctNoIDRZ     PctPubAsstZ   logMedHomeVal   PctWorkClassZ 
##     0.453113047     0.522518997     0.477946655     0.198725386     0.042320711 
## PctFemHeadKidsZ PctNotOwnerOccZ     PctNoPhoneZ    PctNComPlmbZ    PctEducLTHSZ 
##     0.443306153     0.003150984     0.617045044     0.222579117     0.560148142 
##   PctEducLTBchZ PctFamBelowPovZ      PctUnemplZ 
##    -0.056047809     0.464042693     0.638347726 
## 
## $Vaccounted
##                            [,1]      [,2]       [,3]
## SS loadings           4.5987837 3.2131788 1.10386926
## Proportion Var        0.3537526 0.2471676 0.08491302
## Cumulative Var        0.3537526 0.6009202 0.68583321
## Proportion Explained  0.5157997 0.3603902 0.12381002
## Cumulative Proportion 0.5157997 0.8761900 1.00000000

A third output from powell_wiley() function is a tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Powell-Wiley) values.

powell_wiley2020DMVW$missing
## # A tibble: 13 × 5
## # Groups:   variable, total, missing [13]
##    variable         total missing     n percent
##    <chr>            <int> <lgl>   <int> <chr>  
##  1 MedHHInc          4425 TRUE       73 1.65 % 
##  2 MedHomeVal        4425 TRUE      148 3.34 % 
##  3 PctEducBchPlus    4425 TRUE       47 1.06 % 
##  4 PctEducHSPlus     4425 TRUE       47 1.06 % 
##  5 PctFamBelowPov    4425 TRUE       63 1.42 % 
##  6 PctFemHeadKids    4425 TRUE       60 1.36 % 
##  7 PctMgmtBusScArti  4425 TRUE       57 1.29 % 
##  8 PctNComPlmb       4425 TRUE       60 1.36 % 
##  9 PctNoPhone        4425 TRUE       60 1.36 % 
## 10 PctOwnerOcc       4425 TRUE       60 1.36 % 
## 11 PctPubAsst        4425 TRUE       60 1.36 % 
## 12 PctRecvIDR        4425 TRUE       60 1.36 % 
## 13 PctUnempl         4425 TRUE       57 1.29 %

A fourth output from powell_wiley() function is a character string or numeric value of a standardized Cronbach’s alpha. A value greater than 0.7 is desired.

powell_wiley2020DMVW$cronbach
## [1] 0.931138

We can visualize the NDI (Powell-Wiley) values geographically by linking them to spatial information from tigris and plotting with the ggplot2 package suite.

# Obtain the 2020 census tracts from the "tigris" package
state2020 <- tigris::states(cb = TRUE)
state2020DMVW <- state2020[state2020$STUSPS %in% c("DC", "MD", "VA", "WV"), ]
tract2020D <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
tract2020M <- tigris::tracts(state = "MD", year = 2020, cb = TRUE)
tract2020V <- tigris::tracts(state = "VA", year = 2020, cb = TRUE)
tract2020W <- tigris::tracts(state = "WV", year = 2020, cb = TRUE)
tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W)

# Join the NDI (Powell-Wiley) values to the census tract geometry
DMVW2020powell_wiley <- merge(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID")

# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) 
## Maryland, Virginia, Washington, D.C., and West Virginia census tracts 
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020powell_wiley, 
                   ggplot2::aes(fill = NDI), 
                   color = NA) +
  ggplot2::geom_sf(data = state2020DMVW,
                   fill = "transparent", 
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c(na.value = "grey80") +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates")+
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
                   subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent")

## Categorical Index (Population-weighted quintiles)
### Rename "9-NDI not avail" level as NA for plotting
DMVW2020powell_wiley$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley$NDIQuint), 
                                            DMVW2020powell_wiley$NDIQuint == "9-NDI not avail", NA),
                                     c(levels(DMVW2020powell_wiley$NDIQuint)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020powell_wiley, 
                   ggplot2::aes(fill = NDIQuintNA), 
                   color = NA) +
  ggplot2::geom_sf(data = state2020DMVW,
                   fill = "transparent", 
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates")+
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
                   subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent")

Like the NDI (Messer), we also calculate county-level NDI (Powell-Wiley).

# Obtain the 2020 counties from the "tigris" package
county2010DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE)

# NDI (Powell-Wiley) at the county level (2016-2020)
powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county", state = c("DC", "MD", "VA", "WV"), year = 2020)

# Join the NDI (Powell-Wiley) values to the county geometry
DMVW2020powell_wiley_county <- merge(county2010DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID")

# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia counties
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020powell_wiley_county, 
                   ggplot2::aes(fill = NDI),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
                   subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent")

## Categorical Index

### Rename "9-NDI not avail" level as NA for plotting
DMVW2020powell_wiley_county$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley_county$NDIQuint), 
                                            DMVW2020powell_wiley_county$NDIQuint == "9-NDI not avail", NA),
                                         c(levels(DMVW2020powell_wiley_county$NDIQuint)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DMVW2020powell_wiley_county, 
                   ggplot2::aes(fill = NDIQuint),
                   size = 0.20,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
                   subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent")

Advanced Features

Imputing missing census variables

In the messer() and powell_wiley() functions, missing census characteristics can be imputed using the missing and impute arguments of the pca() and fa() functions in the psych package called within the messer() and powell_wiley() function, respectively. Impute values using the logical imp argument (currently only calls impute = "median" by default, which assigns the median values of each missing census variable for a geography).

powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020) # without imputation
powell_wiley2020DCi <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation

table(is.na(powell_wiley2020DC$ndi$NDI)) # n=2 tracts without NDI (Powell-Wiley) values
table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values

# Obtain the 2020 census tracts from the "tigris" package
tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)

# Join the NDI (Powell-Wiley) values to the census tract geometry
DC2020powell_wiley <- merge(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
DC2020powell_wiley <- merge(DC2020powell_wiley, powell_wiley2020DCi$ndi, by = "GEOID", suffixes = c("_nonimp", "_imp"))

# Visualize the NDI (Powell-Wiley) values (2006-2010 5-year ACS) for Washington, D.C., census tracts
## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDI_nonimp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed",
                   subtitle = "Washington, D.C., census tracts as the referent")

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDI_imp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed",
                   subtitle = "Washington, D.C., census tracts as the referent")

## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
DC2020powell_wiley$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_nonimp), 
                                            DC2020powell_wiley$NDIQuint_nonimp == "9-NDI not avail", NA),
                                         c(levels(DC2020powell_wiley$NDIQuint_nonimp)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDIQuintNA_nonimp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed",
                   subtitle = "Washington, D.C., census tracts as the referent")

### Rename "9-NDI not avail" level as NA for plotting
DC2020powell_wiley$NDIQuintNA_imp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_imp), 
                                            DC2020powell_wiley$NDIQuint_imp == "9-NDI not avail", NA),
                                      c(levels(DC2020powell_wiley$NDIQuint_imp)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDIQuintNA_imp),
                   size = 0.2,
                   color = "white") +
  ggplot2::theme_minimal() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey80") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed",
                   subtitle = "Washington, D.C., census tracts as the referent")

Assign the referent (U.S.-Standardized Metric)

To conduct a contiguous US-standardized index, compute an NDI for all states as in the example below that replicates the nationally standardized NDI (Powell-Wiley) values (2013-2017 ACS-5) found in Slotman et al. (2022) and available from a GIS Portal for Cancer Research website. To replicate the nationally standardized NDI (Powell-Wiley) values (2006-2010 ACS-5) found in Andrews et al. (2020) change the year argument to 2010 (i.e., year = 2010).

us <- tigris::states()
n51 <- c("Commonwealth of the Northern Mariana Islands", "Guam", "American Samoa",
         "Puerto Rico", "United States Virgin Islands")
y51 <- us$STUSPS[!(us$NAME %in% n51)]

start_time <- Sys.time() # record start time
powell_wiley2017US <- ndi::powell_wiley(state = y51, year = 2017)
end_time <- Sys.time() # record end time
time_srr <- end_time - start_time # Calculate run time
powell_wiley2017US
## $ndi
## # A tibble: 73,056 × 20
##    GEOID     state county tract      NDI NDIQu…¹ MedHH…² PctRe…³ PctPu…⁴ MedHo…⁵
##    <chr>     <chr> <chr>  <chr>    <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>
##  1 01001020… Alab… Autau… 201   -2.51e-1 2-Belo…   67826    26.9    12.1  152500
##  2 01001020… Alab… Autau… 202    8.27e-1 4-Abov…   41287    10.9    24.1   96100
##  3 01001020… Alab… Autau… 203    6.63e-1 4-Abov…   46806    12.3    12.9   98900
##  4 01001020… Alab… Autau… 204    1.90e-1 3-Aver…   55895    17.8     5.7  140800
##  5 01001020… Alab… Autau… 205   -5.44e-1 2-Belo…   68143    20.2     8.8  187900
##  6 01001020… Alab… Autau… 206    7.26e-1 4-Abov…   44549    17.8    16     93300
##  7 01001020… Alab… Autau… 207    1.03e+0 5-Most…   41250     8.4    23.2   90800
##  8 01001020… Alab… Autau… 208.… -7.12e-1 2-Belo…   80089    25.7    13.1  299100
##  9 01001020… Alab… Autau… 208.… -3.76e-4 3-Aver…   64439    22.2     9.4  163200
## 10 01001020… Alab… Autau… 209    5.07e-1 4-Abov…   49469    11.6    19    120200
## # … with 73,046 more rows, 10 more variables: PctMgmtBusScArti <dbl>,
## #   PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
## #   PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
## #   PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>, and abbreviated
## #   variable names ¹​NDIQuint, ²​MedHHInc, ³​PctRecvIDR, ⁴​PctPubAsst, ⁵​MedHomeVal
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
## 
## $pca
## $pca$loadings
## 
## Loadings:
##                 PC1    PC2    PC3   
## logMedHHInc     -0.508 -0.482 -0.106
## PctNoIDRZ        0.520  0.446       
## PctPubAsstZ      0.309  0.686       
## logMedHomeVal   -0.902  0.135       
## PctWorkClassZ    0.896              
## PctFemHeadKidsZ  0.194  0.701 -0.145
## PctNotOwnerOccZ -0.418  1.022       
## PctNoPhoneZ             0.208  0.654
## PctNComPlmbZ           -0.114  0.855
## PctEducLTHSZ     0.448  0.420       
## PctEducLTBchZ    0.989              
## PctFamBelowPovZ  0.183  0.772       
## PctUnemplZ       0.125  0.612       
## 
##                  PC1   PC2   PC3
## SS loadings    3.680 3.667 1.203
## Proportion Var 0.283 0.282 0.093
## Cumulative Var 0.283 0.565 0.658
## 
## $pca$rotmat
##             [,1]       [,2]       [,3]
## [1,]  0.55862784  0.5566946 0.05689447
## [2,] -1.07454532  1.0353417 0.11522338
## [3,] -0.02457479 -0.3181005 1.02299745
## 
## $pca$rotation
## [1] "promax"
## 
## $pca$Phi
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5589563 0.2019482
## [2,] 0.5589563 1.0000000 0.2257122
## [3,] 0.2019482 0.2257122 1.0000000
## 
## $pca$Structure
##             [,1]        [,2]       [,3]
##  [1,] -0.7983619 -0.78924511 -0.3169437
##  [2,]  0.7566491  0.72262792  0.1464839
##  [3,]  0.6945047  0.86105916  0.2250651
##  [4,] -0.8412442 -0.38554970 -0.2228492
##  [5,]  0.9209060  0.55006528  0.1613593
##  [6,]  0.5562044  0.77638871  0.0519852
##  [7,]  0.1563371  0.79221347  0.1626307
##  [8,]  0.2466075  0.35502655  0.7010271
##  [9,]  0.1201396  0.08519067  0.8311965
## [10,]  0.6827484  0.67044690  0.1843296
## [11,]  0.9475443  0.48138341  0.1669005
## [12,]  0.6232212  0.88362870  0.2527666
## [13,]  0.4705114  0.68557375  0.1788853
## 
## $pca$communality
##  [1] 0.5009586 0.4719001 0.5669373 0.8372646 0.8066373 0.5499017 1.2207095
##  [8] 0.4716959 0.7435617 0.3773519 0.9823819 0.6309002 0.3905570
## 
## $pca$uniqueness
##     logMedHHInc       PctNoIDRZ     PctPubAsstZ   logMedHomeVal   PctWorkClassZ 
##      0.49904136      0.52809991      0.43306269      0.16273538      0.19336271 
## PctFemHeadKidsZ PctNotOwnerOccZ     PctNoPhoneZ    PctNComPlmbZ    PctEducLTHSZ 
##      0.45009834     -0.22070951      0.52830412      0.25643832      0.62264809 
##   PctEducLTBchZ PctFamBelowPovZ      PctUnemplZ 
##      0.01761813      0.36909983      0.60944297 
## 
## $pca$Vaccounted
##                            [,1]      [,2]       [,3]
## SS loadings           4.0565234 4.0415662 1.21158031
## Proportion Var        0.3120403 0.3108897 0.09319849
## Cumulative Var        0.3120403 0.6229300 0.71612846
## Proportion Explained  0.4357322 0.4341256 0.13014213
## Cumulative Proportion 0.4357322 0.8698579 1.00000000
## 
## 
## $missing
## # A tibble: 13 × 5
## # Groups:   variable, total, missing [13]
##    variable         total missing     n percent
##    <chr>            <int> <lgl>   <int> <chr>  
##  1 MedHHInc         73056 TRUE      995 1.36 % 
##  2 MedHomeVal       73056 TRUE     1948 2.67 % 
##  3 PctEducBchPlus   73056 TRUE      646 0.88 % 
##  4 PctEducHSPlus    73056 TRUE      646 0.88 % 
##  5 PctFamBelowPov   73056 TRUE      867 1.19 % 
##  6 PctFemHeadKids   73056 TRUE      816 1.12 % 
##  7 PctMgmtBusScArti 73056 TRUE      752 1.03 % 
##  8 PctNComPlmb      73056 TRUE      816 1.12 % 
##  9 PctNoPhone       73056 TRUE     1377 1.88 % 
## 10 PctOwnerOcc      73056 TRUE      816 1.12 % 
## 11 PctPubAsst       73056 TRUE      816 1.12 % 
## 12 PctRecvIDR       73056 TRUE      816 1.12 % 
## 13 PctUnempl        73056 TRUE      751 1.03 % 
## 
## $cronbach
## [1] 0.9372104
ggplot2::ggplot(powell_wiley2017US$ndi, ggplot2::aes(x = NDI)) +
  ggplot2::geom_histogram(color = "black", fill = "white") + 
  ggplot2::theme_minimal() +
  ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)",
                   subtitle = "U.S. census tracts as the referent (including Alaska, Hawaii, and Washington, D.C.)")

The process to compute a US-standardized NDI (Powell-Wiley) took about 4.5 minutes to run on an machine with the following features:

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tigris_1.6.1     tidycensus_1.2.2 sf_1.0-7         ndi_0.0.1       
## [5] ggplot2_3.3.6    knitr_1.39      
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.3         sass_0.4.2         tidyr_1.2.0        jsonlite_1.8.0    
##  [5] viridisLite_0.4.0  bslib_0.4.0        sp_1.5-0           highr_0.9         
##  [9] yaml_2.3.5         pillar_1.8.0       lattice_0.20-45    glue_1.6.2        
## [13] uuid_1.1-0         digest_0.6.29      rvest_1.0.2        colorspace_2.0-3  
## [17] htmltools_0.5.3    psych_2.2.5        pkgconfig_2.0.3    s2_1.1.0          
## [21] purrr_0.3.4        scales_1.2.0       tzdb_0.3.0         tibble_3.1.8      
## [25] proxy_0.4-27       generics_0.1.3     farver_2.1.1       ellipsis_0.3.2    
## [29] cachem_1.0.6       withr_2.5.0        cli_3.3.0          mnormt_2.1.0      
## [33] magrittr_2.0.3     crayon_1.5.1       maptools_1.1-4     evaluate_0.15     
## [37] fansi_1.0.3        nlme_3.1-158       MASS_7.3-57        xml2_1.3.3        
## [41] foreign_0.8-82     class_7.3-20       tools_4.2.1        hms_1.1.1         
## [45] lifecycle_1.0.1    stringr_1.4.0      munsell_0.5.0      compiler_4.2.1    
## [49] jquerylib_0.1.4    e1071_1.7-11       rlang_1.0.4        classInt_0.4-7    
## [53] units_0.8-0        grid_4.2.1         rstudioapi_0.13    rappdirs_0.3.3    
## [57] labeling_0.4.2     rmarkdown_2.14     wk_0.6.0           gtable_0.3.0      
## [61] DBI_1.1.3          curl_4.3.2         R6_2.5.1           dplyr_1.0.9       
## [65] rgdal_1.5-30       fastmap_1.1.0      utf8_1.2.2         KernSmooth_2.23-20
## [69] readr_2.1.2        stringi_1.7.8      parallel_4.2.1     Rcpp_1.0.9        
## [73] vctrs_0.4.1        tidyselect_1.1.2   xfun_0.31