library(ggplot2)
library(maps)
library(glue)
library(dplyr)
library(tidyr)
library(postal)
library(zipcode)The postal::zips_zones_sample dataset provides a quarter of the ~4 million 3-digit origin-destination pairs. (If you want allll of them, you can run fetch_all and put on a pot of coffee.)
What if we wanted to use that to plot how zones increase as the destination gets farther from the origin?
We’ll need a measure of latitude and longitude for that. Luckily, data from the zipcode package relates every zip codes to its latitude and longitude. We can that along with the partial data in postal::zips_zones_sample to match up zips to latitudes and longitudes.
Let’s load in the randomly sampled postal package data.
data(zips_zones_sample)
zips_zones_sample
#> # A tibble: 1,000,000 x 6
#> origin_zip dest_zip zone specific_to_prior… same_ndc has_five_digit_ex…
#> <chr> <chr> <int> <lgl> <lgl> <lgl>
#> 1 003 <NA> NA NA NA NA
#> 2 004 <NA> NA NA NA NA
#> 3 005 012 2 FALSE FALSE FALSE
#> 4 005 027 2 FALSE FALSE FALSE
#> 5 005 028 2 FALSE FALSE FALSE
#> 6 005 030 3 FALSE FALSE FALSE
#> 7 005 042 3 FALSE FALSE FALSE
#> 8 005 044 4 FALSE FALSE FALSE
#> 9 005 051 3 FALSE FALSE FALSE
#> 10 005 053 3 FALSE FALSE FALSE
#> # … with 999,990 more rowsLater we’ll select away all the other details and focus just on origin_zip, dest_zip, and zone. As we know, the postal::zip_zones data displays all origin zips as 3 digit prefixes and most destination zips as 3 digits as well.
However, the zipcode::zipcode data displays zips in the usual way, as 5 digit.
data(zipcode)
zipcode %>%
as_tibble()
#> # A tibble: 44,336 x 5
#> zip city state latitude longitude
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 00210 Portsmouth NH 43.0 -71.0
#> 2 00211 Portsmouth NH 43.0 -71.0
#> 3 00212 Portsmouth NH 43.0 -71.0
#> 4 00213 Portsmouth NH 43.0 -71.0
#> 5 00214 Portsmouth NH 43.0 -71.0
#> 6 00215 Portsmouth NH 43.0 -71.0
#> 7 00501 Holtsville NY 40.9 -72.6
#> 8 00544 Holtsville NY 40.9 -72.6
#> 9 00601 Adjuntas PR 18.2 -66.7
#> 10 00602 Aguada PR 18.4 -67.2
#> # … with 44,326 more rowsSo we’ll want to trim them in order to be able to join on our data, the majority of which are 3 digits.
zips <-
zipcode %>%
as_tibble() %>%
mutate(
zip_trim = substr(zip, 1, 3)
)
zips
#> # A tibble: 44,336 x 6
#> zip city state latitude longitude zip_trim
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 00210 Portsmouth NH 43.0 -71.0 002
#> 2 00211 Portsmouth NH 43.0 -71.0 002
#> 3 00212 Portsmouth NH 43.0 -71.0 002
#> 4 00213 Portsmouth NH 43.0 -71.0 002
#> 5 00214 Portsmouth NH 43.0 -71.0 002
#> 6 00215 Portsmouth NH 43.0 -71.0 002
#> 7 00501 Holtsville NY 40.9 -72.6 005
#> 8 00544 Holtsville NY 40.9 -72.6 005
#> 9 00601 Adjuntas PR 18.2 -66.7 006
#> 10 00602 Aguada PR 18.4 -67.2 006
#> # … with 44,326 more rowsLet’s get a tibble of all possible USPS zips, both origin prefixes and destinations, including our 5 digit destinations.
(usps_zips <-
tibble(
zip =
unique(zips_zones_sample$origin_zip) %>%
c(unique(zips_zones_sample$dest_zip))
) %>%
distinct())
#> # A tibble: 91,929 x 1
#> zip
#> <chr>
#> 1 003
#> 2 004
#> 3 005
#> 4 006
#> 5 007
#> 6 008
#> 7 009
#> 8 010
#> 9 011
#> 10 012
#> # … with 91,919 more rowsNow we can join the zipcode trimmed zips on our usps_zips to get a corresponding lat and long for each 3-digit origin and destination zip.
(zips_lat_long <-
zips %>%
distinct(zip_trim, .keep_all = TRUE) %>%
left_join(usps_zips, by = c("zip_trim" = "zip")) %>%
select(zip_trim, latitude, longitude))
#> # A tibble: 955 x 3
#> zip_trim latitude longitude
#> <chr> <dbl> <dbl>
#> 1 002 43.0 -71.0
#> 2 005 40.9 -72.6
#> 3 006 18.2 -66.7
#> 4 007 18.2 -66.1
#> 5 008 18.3 -65.0
#> 6 009 18.5 -66.1
#> 7 010 42.1 -72.6
#> 8 011 42.2 -72.6
#> 9 012 42.5 -73.3
#> 10 013 42.6 -72.6
#> # … with 945 more rowsNow we have a mapping between each possible 3 digit zip and its latitude and longitude. But for every row in our zips_zones_sample dataset, we have two lats and two longs: one for the origin and one for the destination.
So we’ll want to take our zips_lat_long mapping and use that to attach latitude and longitude to each origin and each destination in zips_zones_sample.
(zips_zones_lat_long <-
zips_zones_sample %>%
select(origin_zip, dest_zip, zone) %>%
left_join(zips_lat_long, by = c("origin_zip" = "zip_trim")) %>%
rename(
lat_origin = latitude,
long_origin = longitude) %>%
left_join(zips_lat_long, by = c("dest_zip" = "zip_trim")) %>%
rename(
lat_dest = latitude,
long_dest = longitude) %>%
drop_na(zone))
#> # A tibble: 999,978 x 7
#> origin_zip dest_zip zone lat_origin long_origin lat_dest long_dest
#> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 005 012 2 40.9 -72.6 42.5 -73.3
#> 2 005 027 2 40.9 -72.6 41.8 -71.1
#> 3 005 028 2 40.9 -72.6 41.5 -71.3
#> 4 005 030 3 40.9 -72.6 43.0 -71.2
#> 5 005 042 3 40.9 -72.6 44.1 -70.2
#> 6 005 044 4 40.9 -72.6 44.8 -68.8
#> 7 005 051 3 40.9 -72.6 43.2 -72.5
#> 8 005 053 3 40.9 -72.6 42.8 -72.6
#> 9 005 054 3 40.9 -72.6 44.5 -73.2
#> 10 005 059 3 40.9 -72.6 44.7 -71.7
#> # … with 999,968 more rowsWe could use the ggmap package to get a map of the US like get_googlemap("us", zoom = 4) but to go simpler, we’ll use ggplot2’s built-in map_data function.
Now, using a single origin zip (041), we can plot all of the destination zips’ zones relative to that origin. We’ll also filter out any outliers.
filtered <-
zips_zones_lat_long %>%
filter(origin_zip == origin_prefix) %>%
left_join(us, by = c("lat_dest" = "lat", "long_dest" = "long")) %>%
filter(as.numeric(dest_zip) > 10 &
long_dest < -50 &
long_dest > -120)Remember – always confusingly, to me, longitude is on the x and latitude is on the y (!)
The map should look fragmented as our zip_zones data only covers a quarter of the possible origin-destination pairs. Still, the pattern is clear: zones increase as your destination gets farther away.
ggplot() +
geom_polygon(data = us, aes(x = long, y = lat, group = group), fill = "white", color = "black") +
geom_density_2d(data = filtered,
aes(long_dest, lat_dest, colour = factor(zone)),
alpha = 1) +
labs(x = "Longitude", y = "Latitude", colour = "Zone") +
ggtitle("Shipping Zones from Portland, Maine",
subtitle = glue("Origin zone prefix: {origin_prefix}")) +
scale_colour_brewer(type = "seq", palette = "BrBG") +
theme_classic(base_family = "Arial Narrow") +
coord_quickmap()