This vignette explores the mts_monitor data model used throughout the AirMonitor package to store and work with monitoring data.
The AirMonitor package is designed to provide a compact, full-featured suite of utilities for working with PM2.5 data. A uniform data model provides consistent data access across monitoring data available from different agencies. The core data model in this package is defined by the mts_monitor object, which is used to store data associated with groups of individual monitors.
To work efficiently with the package, it is important to understand the structure of this data object and which functions operate on it. Package functions that begin with monitor_ expect objects of class mts_monitor as their first argument. (‘mts_’ stands for ‘Multiple Time Series’.)
The AirMonitor package uses the mts data model defined in MazamaTimeSeries.
In this data model, each unique time series is referred to as a “device-deployment” – a timeseries collected by a particular device at a specific location. Multiple device-deployments are stored in memory as a monitor object – an R list with two dataframes:
monitor$meta – rows = unique device-deployments; cols = device/location metadata
monitor$data – rows = UTC times; cols = device-deployments (plus an additional datetime column)
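To make this layout concrete, here is a minimal base-R sketch of a monitor-like object. It is a toy built by hand and does not show how AirMonitor itself creates objects. The deviceDeploymentIDs, locations and values are hypothetical, and a real meta dataframe carries many more columns.

```r
# Toy monitor-like object; IDs, locations, and values are made up.
ids <- c("a1_site1", "b2_site2")

# meta: one row per device-deployment
meta <- data.frame(
  deviceDeploymentID = ids,
  locationName       = c("Site 1", "Site 2"),
  longitude          = c(-120.1, -119.9),
  latitude           = c(48.4, 48.5),
  timezone           = "America/Los_Angeles",
  stringsAsFactors   = FALSE
)

# data: one row per UTC hour, one column per device-deployment
data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"),
                 by = "hour", length.out = 3),
  a1_site1 = c(12.0, 15.5, NA),
  b2_site2 = c(8.2, 9.1, 10.4)
)

# Bundle both dataframes in a list and tag it with the same classes
# that AirMonitor objects report
monitor <- structure(list(meta = meta, data = data),
                     class = c("mts_monitor", "mts", "list"))

str(monitor, max.level = 1)
```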
A key feature of this data model is the use of the deviceDeploymentID as a “foreign key” that allows data columns to be mapped onto the associated spatial and device metadata in a meta row. The following will always be true:
identical(names(monitor$data), c('datetime', monitor$meta$deviceDeploymentID))
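Because both dataframes share deviceDeploymentID, looking up the metadata for any data column is a simple key match. The base-R sketch below demonstrates this on a hypothetical two-monitor toy object. The IDs and location names are invented for illustration.

```r
# Hypothetical toy meta and data sharing deviceDeploymentID as the key
meta <- data.frame(
  deviceDeploymentID = c("a1_site1", "b2_site2"),
  locationName       = c("Site 1", "Site 2"),
  stringsAsFactors   = FALSE
)
data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"),
                 by = "hour", length.out = 2),
  a1_site1 = c(12.0, 15.5),
  b2_site2 = c(8.2, 9.1)
)

# The invariant stated in the text
stopifnot(identical(names(data), c("datetime", meta$deviceDeploymentID)))

# Map a data column back to its meta row via the key
id  <- names(data)[3]
loc <- meta$locationName[match(id, meta$deviceDeploymentID)]
loc   # "Site 2"
```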
Each column of monitor$data represents a timeseries associated with a particular device-deployment, while each row represents a synoptic snapshot of all measurements made at a particular time. In this manner, software can create both timeseries plots and maps from a single monitor object in memory.
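The sketch below illustrates both views on a hypothetical toy object. First, a single column is pulled out as one timeseries. Second, a single row is paired with the meta coordinates to give map-ready points. Plotting is left out so that the example needs only base R.

```r
# Hypothetical toy object
meta <- data.frame(
  deviceDeploymentID = c("a1_site1", "b2_site2"),
  longitude          = c(-120.1, -119.9),
  latitude           = c(48.4, 48.5),
  stringsAsFactors   = FALSE
)
data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"),
                 by = "hour", length.out = 3),
  a1_site1 = c(12.0, 15.5, 20.1),
  b2_site2 = c(8.2, 9.1, 10.4)
)

# A column is one device-deployment's timeseries (input for a timeseries plot)
ts_b2 <- data[, c("datetime", "b2_site2")]

# A row is a synoptic snapshot; because data columns follow meta row order,
# the values line up with meta coordinates (input for a map)
snapshot <- data.frame(
  deviceDeploymentID = meta$deviceDeploymentID,
  longitude          = meta$longitude,
  latitude           = meta$latitude,
  value              = unlist(data[2, -1], use.names = FALSE),
  stringsAsFactors   = FALSE
)
snapshot
```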
The data dataframe contains all hourly measurements, organized with rows (the ‘unlimited’ dimension) as unique timesteps and columns as unique device-deployments. The very first column is always named datetime and contains the POSIXct datetime in Coordinated Universal Time (UTC). This time axis is guaranteed to be a regular hourly axis with no gaps.
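The time-axis guarantee can be checked mechanically. The helper isRegularHourlyUTC() is hypothetical and was written for this illustration; it is not an AirMonitor function. It tests that a datetime vector is POSIXct, tagged UTC, free of NA, and advances in exact 3600-second steps.

```r
# Hypothetical helper: TRUE if 'datetime' is a gap-free hourly UTC axis
isRegularHourlyUTC <- function(datetime) {
  inherits(datetime, "POSIXct") &&
    identical(attr(datetime, "tzone"), "UTC") &&
    !anyNA(datetime) &&
    all(diff(as.numeric(datetime)) == 3600)
}

good  <- seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"),
             by = "hour", length.out = 24)
gappy <- good[-5]   # drop one hour to create a gap

isRegularHourlyUTC(good)    # TRUE
isRegularHourlyUTC(gappy)   # FALSE
```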
The meta dataframe contains all metadata associated with device-deployments and is organized with rows as unique device-deployments and columns containing both location and device metadata. The following columns are guaranteed to exist in the meta dataframe:
deviceDeploymentID – unique ID associated with a time series
deviceID – unique device ID
deviceType – (optional) device type
deviceDescription – (optional) human-readable device description
deviceExtra – (optional) additional human-readable device information
pollutant – pollutant name from AirMonitor::pollutantNames
units – one of "PPM|PPB|UG/M3"
dataIngestSource – (optional) source of data
dataIngestURL – (optional) URL used to access data
dataIngestUnitID – (optional) instrument identifier used at dataIngestSource
dataIngestExtra – (optional) human-readable data ingest information
dataIngestDescription – (optional) human-readable data ingest instructions
locationID – unique location ID from MazamaLocationUtils::location_createID()
locationName – human-readable location name
longitude – longitude
latitude – latitude
elevation – (optional) elevation
countryCode – ISO 3166-1 alpha-2 country code
stateCode – ISO 3166-2 alpha-2 state code
countyName – US county name
timezone – Olson time zone
houseNumber – (optional)
street – (optional)
city – (optional)
zip – (optional)
AQSID – (optional) EPA AQS unique identifier
It is important to note that the deviceDeploymentID acts as a unique key that connects data with meta. The following will always be true:
rownames(mts_monitor$meta) == mts_monitor$meta$deviceDeploymentID
colnames(mts_monitor$data) == c('datetime', mts_monitor$meta$deviceDeploymentID)
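When you build or modify objects by hand, it can help to assert these invariants explicitly. The helper validateMonitor() is hypothetical and is not part of AirMonitor. It checks only the key relationship between data and meta, plus a few basic properties; it does not check the full list of meta columns.

```r
# Hypothetical validator (not part of AirMonitor) for the key invariants
validateMonitor <- function(monitor) {
  stopifnot(
    is.list(monitor),
    all(c("meta", "data") %in% names(monitor)),
    "deviceDeploymentID" %in% names(monitor$meta),
    !anyDuplicated(monitor$meta$deviceDeploymentID),
    identical(names(monitor$data),
              c("datetime", monitor$meta$deviceDeploymentID)),
    inherits(monitor$data$datetime, "POSIXct")
  )
  invisible(TRUE)
}

# Toy object with hypothetical IDs
monitor <- list(
  meta = data.frame(deviceDeploymentID = c("a1_site1", "b2_site2"),
                    stringsAsFactors = FALSE),
  data = data.frame(
    datetime = seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"),
                   by = "hour", length.out = 2),
    a1_site1 = c(12.0, 15.5),
    b2_site2 = c(8.2, 9.1)
  )
)
validateMonitor(monitor)   # passes silently

# Reordering meta without reordering data breaks the key invariant
broken <- monitor
broken$meta <- broken$meta[2:1, , drop = FALSE]
ok <- tryCatch(validateMonitor(broken), error = function(e) FALSE)
ok   # FALSE
```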
Example 1: Exploring mts_monitor objects
We will use the built-in “NW_Megafires” dataset and various monitor_filter~() functions to subset an mts_monitor object, which we then examine.
suppressPackageStartupMessages(library(AirMonitor))
# Recipe to create Washington fires in August of 2015:
monitor <-
  # Start with NW Megafires
  NW_Megafires %>%
  # Filter to only include Washington state
  monitor_filter(stateCode == "WA") %>%
  # Filter to only include August
  monitor_filterDate(20150801, 20150831)
# 'mts_monitor' objects can be identified by their class
class(monitor)
## [1] "mts_monitor" "mts" "list"
# They always have two elements called 'meta' and 'data'
names(monitor)
## [1] "meta" "data"
# Examine the 'meta' dataframe
dim(monitor$meta)
## [1] 67 26
names(monitor$meta)
## [1] "deviceDeploymentID" "deviceID" "deviceType"
## [4] "deviceDescription" "deviceExtra" "pollutant"
## [7] "units" "dataIngestSource" "dataIngestURL"
## [10] "dataIngestUnitID" "dataIngestExtra" "dataIngestDescription"
## [13] "locationID" "locationName" "longitude"
## [16] "latitude" "elevation" "countryCode"
## [19] "stateCode" "countyName" "timezone"
## [22] "houseNumber" "street" "city"
## [25] "zip" "AQSID"
# Examine the 'data' dataframe
dim(monitor$data)
## [1] 720 68
# This should always be true
identical(names(monitor$data), c('datetime', monitor$meta$deviceDeploymentID))
## [1] TRUE
Example 2: Basic manipulation of mts_monitor objects
The AirMonitor package has numerous functions that work with mts_monitor objects, all of which begin with monitor_. If you need to do something that the package functions do not provide, you can manipulate mts_monitor objects directly as long as you retain the structure of the data model.
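Retaining the structure mostly means keeping meta rows and data columns in lockstep. Below is a minimal base-R sketch of a hand-rolled subset. The helper selectIDs() is hypothetical and only illustrates the kind of bookkeeping that the package's own monitor_ functions take care of for you.

```r
# Hypothetical helper: keep only the requested device-deployments,
# subsetting meta rows and data columns together so the key invariant holds
selectIDs <- function(monitor, ids) {
  keep <- monitor$meta$deviceDeploymentID %in% ids
  monitor$meta <- monitor$meta[keep, , drop = FALSE]
  monitor$data <- monitor$data[, c("datetime", monitor$meta$deviceDeploymentID),
                               drop = FALSE]
  monitor
}

# Toy object with three hypothetical device-deployments
ids <- c("a1_site1", "b2_site2", "c3_site3")
monitor <- list(
  meta = data.frame(deviceDeploymentID = ids, stringsAsFactors = FALSE),
  data = data.frame(
    datetime = seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"),
                   by = "hour", length.out = 2),
    a1_site1 = c(1, 2), b2_site2 = c(3, 4), c3_site3 = c(5, 6)
  )
)

sub <- selectIDs(monitor, c("c3_site3", "a1_site1"))
names(sub$data)   # "datetime" "a1_site1" "c3_site3"
```

This sketch keeps the original meta order rather than the order in which the IDs were requested. Either choice preserves the invariant, provided that meta and data are reordered together.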
Functions that accept and return mts_monitor objects include:
monitor_aqi()
monitor_collapse()
monitor_combine()
monitor_dailyStatistic()
monitor_dailyThreshold()
monitor_dropEmpty()
monitor_filter() (aka monitor_filterMeta())
monitor_filterByDistance()
monitor_filterDate()
monitor_filterDateTime()
monitor_mutate()
monitor_nowcast()
monitor_replaceValues()
monitor_select() (aka monitor_reorder())
monitor_trimDate()
These functions can be used with the magrittr package %>% pipe, as in the following example:
# Calculate daily means for the Methow Valley from monitors in Twisp and Winthrop
TwispID <- "450d08fb5a3e4ea0_530470009"
WinthropID <- "40ffdacb421a5ee6_530470010"
# Recipe to calculate Methow Valley August means:
Methow_Valley_AugustMeans <-
  # Start with NW Megafires
  NW_Megafires %>%
  # Select monitors from Twisp and Winthrop
  monitor_select(c(TwispID, WinthropID)) %>%
  # Average them together hour-by-hour
  monitor_collapse(deviceID = 'MethowValley') %>%
  # Restrict data to August
  monitor_filterDate(20150801, 20150901) %>%
  # Calculate daily mean, requiring at least 18 valid hours per day
  monitor_dailyStatistic(mean, minHours = 18) %>%
  # Round data to one decimal place
  monitor_mutate(round, 1)
# Look at the first week
Methow_Valley_AugustMeans$data[1:7,]
## datetime bbdd6c928df114fb_MethowValley
## 1 2015-08-01 20.3
## 2 2015-08-02 30.7
## 3 2015-08-03 12.1
## 4 2015-08-04 9.0
## 5 2015-08-05 3.7
## 6 2015-08-06 3.2
## 7 2015-08-07 11.0
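The recipe above relies on two ideas: averaging monitors hour by hour, and computing a daily statistic only when enough valid hours are present. The base-R sketch below imitates both on made-up data. It is a simplification under stated assumptions, not the package's implementation. The sketch ignores missing values when averaging, and it groups hours by UTC date, whereas AirMonitor computes daily statistics on local-time days. The helper dailyMean() is hypothetical.

```r
# Made-up hourly values for two monitors over two UTC days
datetime <- seq(as.POSIXct("2015-08-01 00:00", tz = "UTC"),
                by = "hour", length.out = 48)
siteA <- c(rep(10, 24), rep(30, 24))
siteB <- c(rep(20, 24), rep(NA, 24))   # siteB offline on day 2
siteA[25:34] <- NA                     # siteA also misses 10 hours on day 2

# Hour-by-hour average across monitors, skipping NAs; rowMeans() returns
# NaN when every value in a row is missing, so convert those back to NA
collapsed <- rowMeans(cbind(siteA, siteB), na.rm = TRUE)
collapsed[is.nan(collapsed)] <- NA

# Hypothetical helper: daily mean only where at least minHours are valid
dailyMean <- function(x, day, minHours = 18) {
  sapply(split(x, day), function(v) {
    if (sum(!is.na(v)) >= minHours) mean(v, na.rm = TRUE) else NA_real_
  })
}

day <- format(datetime, "%Y-%m-%d", tz = "UTC")
dailyMean(collapsed, day, minHours = 18)
# 2015-08-01: 15; 2015-08-02: NA (only 14 valid hours)
```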
Example 3: Advanced manipulation of mts_monitor objects
The following code demonstrates user manipulation of the data from an mts_monitor object outside the scope of the provided monitor_~() functions.
# Spokane area AQSIDs all begin with "53063"
Spokane <- NW_Megafires %>%
  monitor_filter(stringr::str_detect(AQSID, "^53063")) %>%
  monitor_filterDate(20150801, 20150808) %>%
  monitor_dropEmpty()
# Show the daily means
Spokane %>%
  monitor_dailyStatistic(mean) %>%
  monitor_getData()
## # A tibble: 7 × 4
## datetime `5b3acb7aa679dc14_… abde4337eb9064e4_5… fa8288b1da3b2a87_…
## <dttm> <dbl> <dbl> <dbl>
## 1 2015-08-01 00:00:00 13.3 14.0 18.2
## 2 2015-08-02 00:00:00 34.4 39.0 47.1
## 3 2015-08-03 00:00:00 31.8 35.2 37.1
## 4 2015-08-04 00:00:00 7.22 7.08 7.31
## 5 2015-08-05 00:00:00 9.15 10.2 5.82
## 6 2015-08-06 00:00:00 4.47 7.48 3.74
## 7 2015-08-07 00:00:00 7.52 5.35 4.50
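# Use a custom function to apply a unit conversion
# NOTE: 28350 is the number of milligrams per ounce, so this factor converts
# mg/m3 to oz/ft3; a true ug/m3 -> oz/ft3 conversion would divide by ~2.835e7
Spokane %>%
  monitor_mutate(function(x) { return( (x / 28350) * (.3048)^3 ) }) %>%
  monitor_dailyStatistic(mean) %>%
  monitor_getData()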
## # A tibble: 7 × 4
## datetime `5b3acb7aa679dc14_… abde4337eb9064e4_5… fa8288b1da3b2a87_…
## <dttm> <dbl> <dbl> <dbl>
## 1 2015-08-01 00:00:00 0.0000133 0.0000140 0.0000182
## 2 2015-08-02 00:00:00 0.0000344 0.0000389 0.0000471
## 3 2015-08-03 00:00:00 0.0000318 0.0000352 0.0000371
## 4 2015-08-04 00:00:00 0.00000721 0.00000707 0.00000730
## 5 2015-08-05 00:00:00 0.00000914 0.0000102 0.00000581
## 6 2015-08-06 00:00:00 0.00000447 0.00000747 0.00000373
## 7 2015-08-07 00:00:00 0.00000752 0.00000535 0.00000449
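To check the conversion factor, work it out from the exact definitions. One avoirdupois ounce is 28.349523125 g, and one foot is 0.3048 m. The sketch below computes the ug/m3 to oz/ft3 factor and compares it with the factor used in the example. The function name ug_m3_to_oz_ft3() is hypothetical.

```r
# Exact definitions: 1 oz = 28.349523125 g; 1 ft = 0.3048 m
ugPerOunce <- 28.349523125 * 1e6   # micrograms per ounce
m3PerFt3   <- 0.3048^3             # cubic metres per cubic foot

# Hypothetical helper: ug/m3 -> oz/ft3
ug_m3_to_oz_ft3 <- function(x) (x / ugPerOunce) * m3PerFt3

ug_m3_to_oz_ft3(1)                   # roughly 1e-9 oz/ft3 per ug/m3

# Factor used in the example (divides by 28350, i.e. mg per ounce)
exampleFactor <- (1 / 28350) * 0.3048^3
exampleFactor / ug_m3_to_oz_ft3(1)   # roughly 1000
```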
# Pull out the time series data to calculate correlations
Spokane %>%
  monitor_getData() %>%
  dplyr::select(-1) %>%
  cor(use = "complete.obs")
## 5b3acb7aa679dc14_530639997
## 5b3acb7aa679dc14_530639997 1.0000000
## abde4337eb9064e4_530639996 0.8811545
## fa8288b1da3b2a87_530630047 0.9258439
## abde4337eb9064e4_530639996
## 5b3acb7aa679dc14_530639997 0.8811545
## abde4337eb9064e4_530639996 1.0000000
## fa8288b1da3b2a87_530630047 0.8835324
## fa8288b1da3b2a87_530630047
## 5b3acb7aa679dc14_530639997 0.9258439
## abde4337eb9064e4_530639996 0.8835324
## fa8288b1da3b2a87_530630047 1.0000000
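The use = "complete.obs" argument matters when monitors have gaps. It drops every row (hour) that has a missing value in any column before computing correlations. In contrast, "pairwise.complete.obs" drops missing values separately for each pair of columns. The small made-up example below shows how the two settings can disagree.

```r
# Made-up values with gaps in columns a and c
x <- data.frame(a = c(1, 2, 3, 4, NA),
                b = c(2, 4, 6, 8, 10),
                c = c(1, 3, 2, NA, 5))

cc <- cor(x, use = "complete.obs")           # rows 1-3 only
pc <- cor(x, use = "pairwise.complete.obs")  # b vs c uses rows 1, 2, 3, 5

cc["b", "c"]   # 0.5
pc["b", "c"]   # about 0.886
```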
This introduction to the mts_monitor data model should be enough to get you started. Much more documentation and many more examples are available in the package documentation.
Best of luck exploring and understanding PM2.5 air quality data!