The Assessment, Total Maximum Daily Load (TMDL) Tracking and Implementation System (ATTAINS) is the U.S. Environmental Protection Agency (EPA) database used to track information provided by states about water quality assessments conducted under the Clean Water Act. The assessments are conducted every two years to evaluate whether the nation’s water bodies meet water quality standards. States are required to take Actions (TMDLs or other efforts) on water bodies that do not meet standards. Public information in ATTAINS is made available through webservices and provided as JSON files. rATTAINS facilitates access to this data with functions that return raw JSON or formatted “tidy” data for each of the ATTAINS webservice endpoints. More information about Clean Water Act assessment and reporting is available through the EPA. For alternative methods of accessing the same data, see the “How’s My Waterway” webpage for interactive data exploration or the ArcGIS MapService for spatial data.
The EPA provides two summary service endpoints that summarize assessed uses by organization identifier or by hydrologic unit code (HUC). For example, to return a summary of assessed uses for the state of Tennessee:
library(rATTAINS)
x <- state_summary(organization_id = "TDECWR",
                   reporting_cycle = "2016")
x
#> # A tibble: 22 x 13
#> organization_identifier organization_name organization_type_~ reporting_cycle
#> <chr> <chr> <chr> <chr>
#> 1 TDECWR Tennessee State 2016
#> 2 TDECWR Tennessee State 2016
#> 3 TDECWR Tennessee State 2016
#> 4 TDECWR Tennessee State 2016
#> 5 TDECWR Tennessee State 2016
#> 6 TDECWR Tennessee State 2016
#> 7 TDECWR Tennessee State 2016
#> 8 TDECWR Tennessee State 2016
#> 9 TDECWR Tennessee State 2016
#> 10 TDECWR Tennessee State 2016
#> # ... with 12 more rows, and 9 more variables: combined_cycles <chr>,
#> # water_type_code <chr>, units_code <chr>, use_name <chr>,
#> # fully_supporting <chr>, fully_supporting_count <chr>, not_assessed <chr>,
#> # not_assessed_count <chr>, parameters <list>
The resulting tibble includes the water type, the designated use, and a summary of how much of each assessed use meets criteria (by count, area, distance, etc.) or is not assessed. Each row also has a variable called “parameters”, a nested tibble that provides further information about the use assessment by the parameters assessed:
x$parameters[[1]]
#> # A tibble: 9 x 7
#> parameter_group cause cause_count meeting_criteria meeting_criteria~
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 NUTRIENTS 1289.~ 5 NA NA
#> 2 METALS (OTHER THAN MERC~ 2254 4 NA NA
#> 3 FLOW ALTERATION(S) 494 1 NA NA
#> 4 TEMPERATURE 20459 1 NA NA
#> 5 AMMONIA 56.10~ 1 NA NA
#> 6 PH/ACIDITY/CAUSTIC COND~ 56.10~ 1 NA NA
#> 7 SEDIMENT 3772.~ 7 NA NA
#> 8 SALINITY/TOTAL DISSOLVE~ 56.10~ 1 NA NA
#> 9 ORGANIC ENRICHMENT/OXYG~ 5269.~ 5 NA NA
#> # ... with 2 more variables: insufficent_information <dbl>,
#> # insufficient_information_count <dbl>
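Because the parameters column holds one nested tibble per row, it can be expanded into a single long table. The sketch below is one way to do this, not a function provided by rATTAINS. It assumes the tidyr and dplyr packages are installed, that the webservice is reachable, and that the column names are those printed above.

```r
library(rATTAINS)
library(tidyr)
library(dplyr)

# Same query as above: Tennessee (TDECWR), 2016 reporting cycle
x <- state_summary(organization_id = "TDECWR",
                   reporting_cycle = "2016")

# Expand the "parameters" list-column: one row per use and parameter group,
# with the use-level columns repeated on each row
params <- tidyr::unnest(x, cols = parameters)

# Keep a few columns of interest (names taken from the printed output above)
params <- dplyr::select(params, use_name, parameter_group, cause, cause_count)
params
```

By default, unnest() drops rows whose nested tibble is empty; pass keep_empty = TRUE to retain them.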
The HUC12 service operates similarly but provides data summarized by area, specifically HUC12 units. For example:
x <- huc12_summary("020700100204")
x
#> $huc_summary
#> # A tibble: 1 x 14
#> huc12 assessment_unit~ total_catchment~ total_huc_area_~ assessed_catchm~
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 020700100204 20 46.2 46.2 44.1
#> # ... with 9 more variables: assessed_catchment_area_percent <dbl>,
#> # assessed_good_catchment_area_sq_mi <dbl>,
#> # assessed_good_catchment_area_percent <dbl>,
#> # assessed_unknown_catchment_area_sq_mi <dbl>,
#> # assessed_unknown_catchment_area_percent <dbl>,
#> # contain_impaired_waters_catchment_area_sq_mi <dbl>,
#> # contain_impaired_waters_catchment_area_percent <dbl>, ...
#>
#> $au_summary
#> # A tibble: 20 x 1
#> assessment_unit_id
#> <chr>
#> 1 MD-ANATF-02140205
#> 2 MD-02140205-Northwest_Branch
#> 3 MD-02140205
#> 4 DCTFD01R_00
#> 5 MD-ANATF
#> 6 DCTFS01R_00
#> 7 DCTNA01R_00
#> 8 DCTTX27R_00
#> 9 DCTFC01R_00
#> 10 MD-02140205-Mainstem
#> 11 MD-02140205-Mainstem2
#> 12 MD-02140205-Northeast_Northwest_Branches
#> 13 DCTWB00R_02
#> 14 DCTWB00R_01
#> 15 DCANA00E_02
#> 16 DCTHR01R_00
#> 17 DCTPB01R_00
#> 18 DCTDU01R_00
#> 19 DCANA00E_01
#> 20 DCAKL00L_00
#>
#> $ir_summary
#> # A tibble: 3 x 4
#> epa_ir_category_name catchment_size_sq_mi catchment_size_pe~ assessment_unit_~
#> <chr> <dbl> <dbl> <dbl>
#> 1 1 1.77 3.83 2
#> 2 4A 25.3 54.8 11
#> 3 5 37.9 81.9 7
#>
#> $use_summary
#> # A tibble: 6 x 5
#> use_group_name use_attainment catchment_size_~ catchment_size_~
#> <chr> <chr> <dbl> <dbl>
#> 1 ECOLOGICAL_USE Not Supporting 19.5 42.1
#> 2 FISHCONSUMPTION_USE Fully Supporting 1.77 3.83
#> 3 FISHCONSUMPTION_USE Insufficient Information 1.91 4.14
#> 4 FISHCONSUMPTION_USE Not Supporting 22.8 49.3
#> 5 OTHER_USE Fully Supporting 1.91 4.13
#> 6 RECREATION_USE Not Supporting 24.5 53.0
#> # ... with 1 more variable: assessment_unit_count <dbl>
#>
#> $param_summary
#> # A tibble: 17 x 4
#> parameter_group_name catchment_size_s~ catchment_size_p~ assessment_unit_~
#> <chr> <dbl> <dbl> <dbl>
#> 1 ALGAL GROWTH 22.8 49.3 2
#> 2 CHLORINE 10.7 23.2 1
#> 3 HABITAT ALTERATIONS 25.3 54.7 3
#> 4 HYDROLOGIC ALTERATION 36.5 79.0 6
#> 5 METALS (OTHER THAN MER~ 22.8 49.3 9
#> 6 NUTRIENTS 42.4 91.7 4
#> 7 OIL AND GREASE 22.8 49.3 3
#> 8 ORGANIC ENRICHMENT/OXY~ 42.4 91.7 8
#> 9 PATHOGENS 44.1 95.4 15
#> 10 PESTICIDES 26.4 57.1 11
#> 11 PH/ACIDITY/CAUSTIC CON~ 1.72 3.71 1
#> 12 POLYCHLORINATED BIPHEN~ 26.4 57.1 12
#> 13 SALINITY/TOTAL DISSOLV~ 19.5 42.1 1
#> 14 SEDIMENT 3.88 8.39 1
#> 15 TOXIC ORGANICS 22.8 49.3 8
#> 16 TRASH 42.4 91.7 4
#> 17 TURBIDITY 44.1 95.4 15
#>
#> $res_plan_summary
#> # A tibble: 1 x 4
#> summary_type_name catchment_size_sq_mi catchment_size_percent assessment_unit~
#> <chr> <dbl> <dbl> <dbl>
#> 1 TMDL 26.4 57.1 15
#>
#> $vision_plan_summary
#> # A tibble: 1 x 4
#> summary_type_name catchment_size_sq_mi catchment_size_percent assessment_unit~
#> <chr> <dbl> <dbl> <dbl>
#> 1 TMDL 26.4 57.1 15
huc12_summary() returns a list of tibbles with different summaries of information. Using the example above, x$huc_summary provides a summary of the HUC area and of the area and percentage of the catchment assessed as good, unknown, or impaired. x$au_summary provides a tibble with the unique identifiers of the assessment units (distinct sections of waterbodies) within the queried HUC12. x$ir_summary summarizes the catchment area classified under each Integrated Report category. x$use_summary summarizes use attainment within the HUC12, and x$param_summary provides the same information for parameter groups. x$res_plan_summary and x$vision_plan_summary summarize how much of the watershed is covered by particular types of restoration plans or vision plans, such as TMDLs.
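Since huc12_summary() returns a named list, each summary can be pulled out and handled like any other tibble. The sketch below ranks parameter groups by the share of the catchment they affect. It assumes dplyr is installed, that the webservice is reachable, and that the truncated column catchment_size_p~ in param_summary is named catchment_size_percent, as it is in res_plan_summary; check names(x$param_summary) if it differs.

```r
library(rATTAINS)
library(dplyr)

x <- huc12_summary("020700100204")

# Names of the summary tibbles in the returned list
names(x)

# Parameter groups ordered from the largest to the smallest share of the
# HUC12 catchment (assumed column name: catchment_size_percent)
top_params <- dplyr::arrange(x$param_summary,
                             dplyr::desc(catchment_size_percent))
head(top_params, 5)
```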
Each function has a number of allowable arguments and associated values. To explore which values you might want to query, use the Domain Value service, which lists the allowable options and is mapped to the domain_values() function. When used without any arguments, it returns the full list of possible “domains.” These are typically searchable parameters used across the functions in rATTAINS. Note that the domain names returned by this service do not match the argument names used in rATTAINS one to one, although it is usually easy to tell which domain corresponds to which argument.
For example, if I want to find the possible organization identifiers to query by:
x <- domain_values(domain_name = "OrgStateCode")
x
#> # A tibble: 146 x 4
#> domain name code context
#> <chr> <chr> <chr> <chr>
#> 1 OrgStateCode AK AK EPA
#> 2 OrgStateCode FL FL 21FL303D
#> 3 OrgStateCode PA PA EPA
#> 4 OrgStateCode CC CC TEST_ORG_C
#> 5 OrgStateCode AZ AZ TEST_TRIBE_B
#> 6 OrgStateCode MS MS 21MSWQ
#> 7 OrgStateCode CT CT CT_DEP01
#> 8 OrgStateCode ND ND 21NDHDWQ
#> 9 OrgStateCode MN MN REDLAKE
#> 10 OrgStateCode NM NM PUEBLO_POJOAQUE
#> # ... with 136 more rows
The function returns the state codes (name and code) along with the associated organization identifiers in the context variable; these identifiers are the values accepted by the organization_id argument. Similarly, if I want to look up the possible Use Names used by the Texas Commission on Environmental Quality:
x <- domain_values(domain_name = "UseName", context = "TCEQMAIN")
x
#> # A tibble: 17 x 4
#> domain name code context
#> <chr> <chr> <chr> <chr>
#> 1 UseName Recreation Use Recreation Use TCEQMA~
#> 2 UseName Fish Consumption Use Fish Consumption~ TCEQMA~
#> 3 UseName INTERMEDIATE AQUATIC LIFE INTERMEDIATE AQU~ TCEQMA~
#> 4 UseName OVERALL USE SUPPORT OVERALL USE SUPP~ TCEQMA~
#> 5 UseName Aquatic Life Use Aquatic Life Use TCEQMA~
#> 6 UseName Oyster Waters Use Oyster Waters Use TCEQMA~
#> 7 UseName FISH CONSUMPTION FISH CONSUMPTION TCEQMA~
#> 8 UseName OYSTER AQUATIC LIFE OYSTER AQUATIC L~ TCEQMA~
#> 9 UseName NON-CONTACT RECREATION NON-CONTACT RECR~ TCEQMA~
#> 10 UseName CONTACT RECREATION USE CONTACT RECREATI~ TCEQMA~
#> 11 UseName DOMESTIC WATER SUPPLY - PUBLIC WATER SUPPLY DOMESTIC WATER S~ TCEQMA~
#> 12 UseName Public Water Supply Use Public Water Sup~ TCEQMA~
#> 13 UseName General Use General Use TCEQMA~
#> 14 UseName PRIMARY RECREATION/SWIMMING PRIMARY RECREATI~ TCEQMA~
#> 15 UseName CONTACT RECREATION CONTACT RECREATI~ TCEQMA~
#> 16 UseName NONCONTACT RECREATION USE NONCONTACT RECRE~ TCEQMA~
#> 17 UseName Recreational Beaches Recreational Bea~ TCEQMA~
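The two lookups can be combined to find which organization_id values apply to a given state. This sketch assumes dplyr is installed, that the webservice is reachable, and that the code column holds the two-letter state code, as in the OrgStateCode output above; "TN" is used only as an illustration.

```r
library(rATTAINS)
library(dplyr)

# Full list of available domains (called without arguments)
all_domains <- domain_values()
all_domains

# Organization identifiers registered under the "TN" state code;
# the context column holds the values to pass as organization_id
orgs <- domain_values(domain_name = "OrgStateCode")
tn_orgs <- dplyr::filter(orgs, code == "TN")
tn_orgs$context
```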
assessment_units()
: provides information about assessment units matching the specified arguments.
assessments()
: provides information about assessment decisions matching the specified arguments.
actions()
: provides information about finalized Actions (such as TMDLs, 4B Actions, or similar) matching the specified arguments.
plans()
: is similar to actions() but provides information about finalized Actions and assessment units by HUC8.
surveys()
: provides information about statistical surveys of water quality assessment results conducted by organizations.
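These functions are called the same way as the summary functions. The sketch below is illustrative only: the argument names organization_id, reporting_cycle, and huc are assumptions based on the examples in this document, so check ?assessments and ?plans for your installed version. The HUC8 02070010 is simply the first eight digits of the HUC12 queried earlier. A statewide assessments() query can return a large amount of data and may take some time.

```r
library(rATTAINS)

# Assessment decisions for Tennessee's 2016 reporting cycle
# (argument names assumed to mirror state_summary())
tn_assessments <- assessments(organization_id = "TDECWR",
                              reporting_cycle = "2016")

# Finalized plans for the HUC8 containing HUC12 020700100204
# (argument name "huc" is an assumption; see ?plans)
huc8_plans <- plans(huc = "02070010")

# Inspect the top-level structure of each result
str(tn_assessments, max.level = 1)
str(huc8_plans, max.level = 1)
```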
By default, all the functions in rATTAINS return one or more “tidy” dataframes. These dataframes are created by attempting to flatten the nested JSON data returned by the webservice. This requires some opinionated decisions about what constitutes flat data and at which level the data should be flattened. We recognize that the dataframe output might not meet every user's needs. If you would prefer to parse the JSON data yourself, use the tidy = FALSE argument to return an unparsed JSON string. A number of R packages are available to parse and flatten JSON data to prepare it for analysis.
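A minimal sketch of the tidy = FALSE route, assuming the jsonlite package is installed and the webservice is reachable. Setting simplifyVector = FALSE keeps the parsed result as nested R lists that mirror the JSON, which leaves all flattening decisions to you.

```r
library(rATTAINS)
library(jsonlite)

# Request the raw JSON string instead of a tidy tibble
raw <- domain_values(domain_name = "UseName",
                     context = "TCEQMAIN",
                     tidy = FALSE)

# Parse into nested R lists without simplifying to vectors or data frames
parsed <- jsonlite::fromJSON(raw, simplifyVector = FALSE)

# Look at the first two levels of the structure
str(parsed, max.level = 2)
```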
To reduce unneeded calls to the server, downloaded JSON files can be individually cached based on the input arguments. This means that if you repeat a call using the same function and combination of argument values, the function reads the cached file instead. You will want to periodically clean these files to ensure your results are up to date (although, given the two-year assessment cycle, the data are not expected to change frequently). rATTAINS uses the hoardr package and its methods to access the file cache. See rATTAINS_caching for additional information; an example of accessing the file path and deleting files is shown below.
## set package option
rATTAINS_options(cache_downloads = TRUE)
x <- domain_values(domain_name = "UseName", context = "TCEQMAIN")
x
## This returns the file path where the files are cached
dv_cache$cache_path_get()
## get a list of cached files
dv_cache$list()
## delete one file
#dv_cache$delete(dv_cache$list()[[1]])
## delete all files in the directory
dv_cache$delete_all()
The caching objects for each function are as follows:
actions()
: actions_cache
assessments()
: assessments_cache
assessment_units()
: au_cache
domain_values()
: dv_cache
plans()
: plans_cache
state_summary()
: state_cache
surveys()
: surveys_cache
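To clear every rATTAINS cache at once, the cache objects listed above can be collected in a list and emptied together. This sketch assumes each object exposes the same hoardr methods used with dv_cache above (cache_path_get(), list(), delete_all()); it only deletes cached files and makes no webservice calls.

```r
library(rATTAINS)

# Cache objects listed above, keyed by the function that uses them
caches <- list(actions = actions_cache,
               assessments = assessments_cache,
               assessment_units = au_cache,
               domain_values = dv_cache,
               plans = plans_cache,
               state_summary = state_cache,
               surveys = surveys_cache)

# Show where each cache lives on disk
lapply(caches, function(cache) cache$cache_path_get())

# Empty every cache directory
invisible(lapply(caches, function(cache) cache$delete_all()))
```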
The U.S. EPA is the data provider for this public information. rATTAINS and the author are not affiliated with the EPA. Questions about the package functionality should be directed to the package author. Questions about the webservice or underlying data should be directed to the U.S. EPA. Please do not abuse the webservice using this package.