ndi: Neighborhood Deprivation Indices


Date repository last updated: August 10, 2022

Overview

The ndi package is a suite of R functions to compute various geospatial neighborhood deprivation indices (NDI) in the United States. Two types of NDI are available in the initial repository: (1) based on Messer et al. (2006) and (2) based on Andrews et al. (2020) and Slotman et al. (2022), who use variables chosen by Roux and Mair (2010). Both are decompositions of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates pulled by the tidycensus package.
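To make the idea of a "decomposition" concrete, the sketch below derives a continuous index and its quartiles from a principal component analysis of simulated tract-level data. It is only an illustration under stated assumptions: the variable names, the simulated values, and the quartile labels are hypothetical, and the code is not the package's internal implementation.

```r
# Illustrative sketch only: simulated data, not the ndi package's internal code.
set.seed(42)

# Simulate 100 hypothetical tracts with four correlated characteristics
n <- 100
latent <- rnorm(n)  # unobserved factor driving all four variables
tracts <- data.frame(
  pct_poverty    = latent + rnorm(n, sd = 0.5),
  pct_unemployed = latent + rnorm(n, sd = 0.5),
  pct_no_hs      = latent + rnorm(n, sd = 0.5),
  pct_crowded    = latent + rnorm(n, sd = 0.5)
)

# Principal component analysis on centered and standardized variables
pca <- stats::prcomp(tracts, center = TRUE, scale. = TRUE)

# Use the first principal component score as a continuous index.
# The sign of a component is arbitrary, so orient it so that higher = more deprived.
index <- pca$x[, 1]
if (cor(index, tracts$pct_poverty) < 0) index <- -index

# Categorize the continuous index into quartiles (labels are hypothetical)
index_quartile <- cut(index,
                      breaks = stats::quantile(index, probs = seq(0, 1, 0.25)),
                      labels = c("Q1", "Q2", "Q3", "Q4"),
                      include.lowest = TRUE)

table(index_quartile)
```

Because the quartile breaks come from the same set of tracts, the categories are relative to that referent area rather than to any external standard.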

Installation

To install the release version from CRAN:

install.packages("ndi")

To install the development version from GitHub:

devtools::install_github("idblr/ndi")

Available functions

Function        Description
messer          Compute NDI based on Messer et al. (2006).
powell_wiley    Compute NDI based on Andrews et al. (2020) and Slotman et al. (2022) with variables chosen by Roux and Mair (2010).

The repository also includes the code to create the project hexsticker.

Author

See also the list of contributors who participated in this package.

Getting Started

Usage

# ------------------ #
# Necessary packages #
# ------------------ #

library(ndi)
library(ggplot2)
library(sf)
library(tidycensus) # a dependency for the "ndi" package
library(tigris) # a dependency for the "ndi" package

# -------- #
# Settings #
# -------- #

## Access Key for census data download
### Obtain one at http://api.census.gov/data/key_signup.html
tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API

# ---------------------- #
# Calculate NDI (Messer) #
# ---------------------- #

# Compute the NDI (Messer) values (2016-2020 5-year ACS) for Washington, D.C. census tracts
messer2020DC <- ndi::messer(state = "DC", year = 2020)

# ------------------------------ #
# Outputs from messer() function #
# ------------------------------ #

# A tibble containing the identification, geographic name, NDI (Messer) values, NDI (Messer) quartiles, and raw census characteristics for each tract
messer2020DC$ndi

# The results from the principal component analysis used to compute the NDI (Messer) values
messer2020DC$pca

# A tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Messer) values
messer2020DC$missing

# -------------------------------------- #
# Visualize the messer() function output #
# -------------------------------------- #

# Obtain the 2020 census tracts from the "tigris" package
tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)

# Join the NDI (Messer) values to the census tract geometry
DC2020messer <- merge(tract2020DC, messer2020DC$ndi, by = "GEOID")

# Visualize the NDI (Messer) values (2016-2020 5-year ACS) for Washington, D.C. census tracts

## Continuous Index
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020messer, 
                   ggplot2::aes(fill = NDI),
                   color = "white") +
  ggplot2::theme_bw() +  
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index\nContinuous (Messer, non-imputed)",
                   subtitle = "Washington, D.C. tracts as the referent")

## Categorical Index (Quartiles)
### Rename "9-NDI not avail" level as NA for plotting
DC2020messer$NDIQuartNA <- factor(replace(as.character(DC2020messer$NDIQuart),
                                          DC2020messer$NDIQuart == "9-NDI not avail",
                                          NA),
                                  c(levels(DC2020messer$NDIQuart)[-5], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020messer, 
                   ggplot2::aes(fill = NDIQuartNA),
                   color = "white") +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey50") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index\nQuartiles (Messer, non-imputed)",
                   subtitle = "Washington, D.C. tracts as the referent")

# ---------------------------- #
# Calculate NDI (Powell-Wiley) #
# ---------------------------- #

# Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C. census tracts
powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020)
powell_wiley2020DCi <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # impute missing values

# ------------------------------------ #
# Outputs from powell_wiley() function #
# ------------------------------------ #

# A tibble containing the identification, geographic name, NDI (Powell-Wiley) value, and raw census characteristics for each tract
powell_wiley2020DC$ndi

# The results from the principal component analysis used to compute the NDI (Powell-Wiley) values
powell_wiley2020DC$pca

# A tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Powell-Wiley) values
powell_wiley2020DC$missing

# -------------------------------------------- #
# Visualize the powell_wiley() function output #
# -------------------------------------------- #

# Obtain the 2020 census tracts from the "tigris" package
tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)

# Join the NDI (Powell-Wiley) values, non-imputed (.x) and imputed (.y), to the census tract geometry
DC2020powell_wiley <- merge(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
DC2020powell_wiley <- merge(DC2020powell_wiley, powell_wiley2020DCi$ndi, by = "GEOID")

# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C. census tracts

## Non-imputed missing tracts (Continuous)
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDI.x),
                   color = "white") +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index\nContinuous (Powell-Wiley, non-imputed)",
                   subtitle = "Washington, D.C. tracts as the referent")

## Non-imputed missing tracts (Categorical quintiles)
### Rename "9-NDI not avail" level as NA for plotting
DC2020powell_wiley$NDIQuintNA.x <- factor(replace(as.character(DC2020powell_wiley$NDIQuint.x),
                                                  DC2020powell_wiley$NDIQuint.x == "9-NDI not avail",
                                                  NA),
                                          c(levels(DC2020powell_wiley$NDIQuint.x)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDIQuintNA.x),
                   color = "white") +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey50") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index\nPopulation-weighted Quintiles (Powell-Wiley, non-imputed)",
                   subtitle = "Washington, D.C. tracts as the referent")

## Imputed missing tracts (Continuous)
ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDI.y),
                   color = "white") +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(fill = "Index (Continuous)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index\nContinuous (Powell-Wiley, imputed)",
                   subtitle = "Washington, D.C. tracts as the referent")

## Imputed missing tracts (Categorical quintiles)
### Rename "9-NDI not avail" level as NA for plotting
DC2020powell_wiley$NDIQuintNA.y <- factor(replace(as.character(DC2020powell_wiley$NDIQuint.y), 
                                                  DC2020powell_wiley$NDIQuint.y == "9-NDI not avail",
                                                  NA), 
                                          c(levels(DC2020powell_wiley$NDIQuint.y)[-6], NA))

ggplot2::ggplot() + 
  ggplot2::geom_sf(data = DC2020powell_wiley, 
                   ggplot2::aes(fill = NDIQuintNA.y),
                   color = "white") +
  ggplot2::theme_bw() + 
  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
                                na.value = "grey50") +
  ggplot2::labs(fill = "Index (Categorical)",
                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
  ggplot2::ggtitle("Neighborhood Deprivation Index\nPopulation-weighted Quintiles (Powell-Wiley, imputed)",
                   subtitle = "Washington, D.C. tracts as the referent")

# --------------------------- #
# Compare the two NDI metrics #
# --------------------------- #

# Merge the two NDI metrics (Messer and Powell-Wiley, imputed)
ndi2020DC <- merge(messer2020DC$ndi, powell_wiley2020DCi$ndi, by = "GEOID", suffixes = c(".messer", ".powell_wiley"))

# Check the correlation of the two NDI metrics (Messer and Powell-Wiley, imputed) as continuous values
cor(ndi2020DC$NDI.messer, ndi2020DC$NDI.powell_wiley, use = "complete.obs") # Pearson's r = 0.975

# Check the similarity of the two NDI metrics as categories (Messer quartiles versus Powell-Wiley quintiles, imputed)
table(ndi2020DC$NDIQuart, ndi2020DC$NDIQuint)
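The plotting steps above repeat the same recode of the "9-NDI not avail" level for each categorical map. One way to avoid the repetition might be a small helper such as the one sketched here. The function name level_to_na() and the toy level labels are hypothetical and are not part of the ndi package.

```r
# Hypothetical helper (not part of the ndi package): convert a sentinel factor
# level to NA while keeping the remaining levels in their original order.
level_to_na <- function(x, sentinel = "9-NDI not avail") {
  stopifnot(is.factor(x))
  keep <- setdiff(levels(x), sentinel)      # preserves the original level order
  factor(as.character(x), levels = keep)    # values not in `keep` become NA
}

# Toy example with made-up level labels
toy <- factor(c("1-low", "2-high", "9-NDI not avail", "1-low"),
              levels = c("1-low", "2-high", "9-NDI not avail"))
toy_na <- level_to_na(toy)

levels(toy_na)      # remaining levels after the sentinel is dropped
sum(is.na(toy_na))  # number of values recoded to NA
```

Under these assumptions, a call such as level_to_na(DC2020messer$NDIQuart) could stand in for the longer factor(replace(...)) expressions, without relying on the position of the sentinel level.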

Funding

This package was developed while the author was a postdoctoral fellow supported by the Cancer Prevention Fellowship Program at the National Cancer Institute.

Acknowledgments

The messer() function functionalizes the code found in Hruska et al. (2022), available on an OSF repository, but with percent with income less than $30K added to the computation based on Messer et al. (2006). The messer() function also allows for the computation of NDI (Messer) for each year between 2010 and 2020 (the years for which the U.S. census characteristics are available to date). No companion code to compute NDI (Powell-Wiley) was included in Andrews et al. (2020) or Slotman et al. (2022), but the package author worked directly with the authors of the latter manuscript to replicate their SAS code in R for the powell_wiley() function. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in Andrews et al. (2020) and Slotman et al. (2022) because the two studies used different statistical platforms (i.e., SPSS and SAS, respectively) that intrinsically calculate the principal component analysis differently from R.
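As a rough illustration of how two principal component routines can disagree numerically yet agree in substance, the sketch below runs two base R functions, prcomp() and princomp(), on the same simulated data. It does not reproduce SPSS or SAS and makes no claim about how those platforms differ from R; the simulated variables are hypothetical.

```r
# Illustrative sketch: two base R PCA routines applied to the same simulated data.
# It only shows that implementations can differ in scaling and sign while still
# producing highly correlated scores.
set.seed(1)
n <- 200
latent <- rnorm(n)
x <- cbind(a = latent + rnorm(n),
           b = latent + rnorm(n),
           c = latent + rnorm(n))

# prcomp(): singular value decomposition; variances use the divisor n - 1
fit_svd <- stats::prcomp(x, center = TRUE, scale. = TRUE)

# princomp(): eigendecomposition of the correlation matrix; variances use the divisor n
fit_eig <- stats::princomp(x, cor = TRUE)

score_svd <- fit_svd$x[, 1]
score_eig <- fit_eig$scores[, 1]

# Are the first-component scores numerically identical?
same_scores <- isTRUE(all.equal(unname(score_svd), unname(score_eig)))

# How strongly are they correlated, ignoring the arbitrary sign?
abs_cor <- abs(cor(score_svd, score_eig))

same_scores
abs_cor
```

In this sketch the scores differ by a constant scale factor (and possibly a sign flip), so they are not equal value for value even though they rank the observations identically.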

When citing this package for publication, please follow:

citation("ndi")

Questions? Feedback?

For questions about the package, please contact the maintainer, Dr. Ian D. Buller, or submit a new issue. Confirmation of the computation, feedback, and feature collaboration are welcome, especially from the authors of the references cited above.