Package Overview

Short Description of Package

The package fetches data from three disparate data sources and allows user to perform analyses on them. It offers two core components:

  1. A robust data retrieval and preparation infrastructure for wildfire, climate, and air quality data
  2. A simple, informative, and interactive visualizations of the aforementioned datasets for California counties from 2011 to 2015.

Namely, the data preparation tools are convenient for scientists interested in retrieving and tidying data at-scale from reputable sources. The interactive dashboard, built via Shiny, is a user-friendly way to investigate wildfire data and how it may or may not link to climate and air quality data.

Use Cases

Use Case 1

A “casual observer” is interested in learning more about wildfires, given high media coverage on the topic. They use the package which contains pre-built datasets and run the Shiny app. The Home page gives an introduction on the tool. The “Examples” tab offers a slow transition into analysis as well as primes questions that the user may follow up on in subsequent tabs. The “Explore” tab allows the user to plot data they are interested in within three formats:

  1. A map format
  2. Density plots and boxplots that highlight differences in metrics before and after the discovery date of top-ranked fires of the dataset
  3. Time series plots that show the timeline of metrics before and after top-ranked fires in the dataset

The user can use inputs such as dropdown menus and date ranges to select what data are presented. The app is launched simply via wildviz::wildvizApp().

Additionally, those who are more statistically and scientifically inclined can also use the app for discussion with non-technical personnel and for preliminary analyses. The app is written in a way that makes it easily tailored and scaled to other data.

Use Case 2

Researchers and scientists (such as Fire Scientists or Fire Prevention Engineers) studying the relationships and impact between wildfires, climate, and air quality has a need to query and compile the data for their analysis. They require data from reputable sources that is cleaned and prepared in a standard format. They also need the flexibility to query and filter the data for specific states or counties in the US and for specific date ranges. A sample workflow may look like:

  1. The scientist queries the datasets for wildfire, climate, and/or air quality using the functions provided through the package. They are:
  2. The scientist combines the datasets by the join key columns: counties and date
  3. The scientist visualizes the dataset to gain insights or builds time series or machine learrning models around the dataset to make predictions.

Additional Material

Required Set-up

Wildfires

The create_wildfire() function relies on the SQLite db available on Kaggle: https://www.kaggle.com/rtatman/188-million-us-wildfires. Unfortunately, Kaggle API does not support R at this time. Download the '188-million-us-wildfires.zip' file, and provide the path to the function as db_name, i.e.

db_name = 'data-raw/FPA_FOD_20170508.sqlite'

Climate

The climate functions use the NOAA API to identify the weather monitors within a U.S. county, you will need to get an access token from NOAA to use this function. Visit NOAA's token request page https://www.ncdc.noaa.gov/cdo-web/token to request a token by email. You then need to set that API code in your R session, i.e.

options(noaakey = "your key")

replacing “your key” with the API key you've requested from NOAA).

Note: You may need to reload the package so that the API email and key is set by default.

AQI

The AQI functions use the AQS API to fetch the data from the API endpoints. Use the following service to register as a user:

  1. A verification email will be sent to the email account specified. To register using the email address create and request this link (Replace 'myemail@@example.com' in the example with your email address.): https://aqs.epa.gov/data/api/signup?email=myemail@example.com
  2. You then need to set the email and the key in the .Renviron file as follows, i.e.
aqs_api_email=myemail@example.com
aqs_api_key=testkey1234

Note: You may need to reload the package so that the API email and key is set by default.

Example Function Usage

Wildfires

fires_df <- create_wildfire(db_name = 'data-raw/FPA_FOD_20170508.sqlite', 
                            state_abbrev = c('CA', 'NY'),
                            cols=c('FIRE_NAME', 'DISCOVERY_DATE', 'CONT_DATE', 'STAT_CAUSE_DESCR', 
                                   'FIRE_SIZE', 'FIRE_SIZE_CLASS', 'LATITUDE', 'LONGITUDE', 
                                   'STATE', 'FIPS_CODE', 'FIPS_NAME'),
                            year_min = 1992, 
                            year_max = 2015)

Climate

climate <- daily_climate_counties(fips_list = c('06001', '06003'), 
                                  var_list = c('prcp', 'snow', 'snwd', 'tmax', 'tmin'), 
                                  date_min = '2015-01-01', 
                                  date_max = '2015-12-31', 
                                  coverage = 0.90)

AQI

aqs_api_email = Sys.getenv("aqs_api_email")
aqs_api_key = Sys.getenv("aqs_api_key")

state_codes <- get_state_code(aqs_api_email = aqs_api_email, 
                              aqs_api_key = aqs_api_key, 
                              state_names = c('California'))
counties <- get_counties(aqs_api_email = aqs_api_email, 
                         aqs_api_key = aqs_api_key, 
                         state_codes = state_codes)
aqi <- daily_aqi(aqs_api_email = aqs_api_email, 
                 aqs_api_key = aqs_api_key, 
                 fips_list = counties$fips, 
                 year_min = 2015, 
                 year_max = 2015)