Syntax changes

Martin Westgate & Dax Kellie

2022-01-24

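Version 1.4.0 brings in three major changes in how galah works: non-standard evaluation (NSE) of arguments in the style of dplyr, new function names, and piping with galah_call().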

Below we discuss each of these changes in turn. Please note, however, that these changes are by no means set in stone - it is absolutely possible to change syntax in future versions of galah if alternative names are easier to use and understand. We would appreciate any feedback from users about what works or what doesn’t work. It is our goal to create a package that is as easy and intuitive for users as possible!

NSE and comparison to dplyr

galah_ functions now evaluate their arguments in the same way as dplyr functions. To see what we mean, let’s look at how dplyr::filter() works. Notice that dplyr::filter() and galah_filter() both take logical expressions built with the == operator:

library(dplyr)

mtcars %>% 
  filter(mpg == 21)
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

galah_call() %>%
  galah_filter(year == 2021) %>% 
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 1161557

As another example, notice how galah_group_by() + atlas_counts() works very similarly to dplyr::group_by() + dplyr::count():

mtcars %>% 
  group_by(vs) %>% 
  count()
## # A tibble: 2 × 2
## # Groups:   vs [2]
##      vs     n
##   <dbl> <int>
## 1     0    18
## 2     1    14

galah_call() %>%
  galah_group_by(biome) %>%
  atlas_counts()
## # A tibble: 2 × 2
##   biome          count
##   <chr>          <int>
## 1 TERRESTRIAL 93590939
## 2 MARINE       3519480

We made this move towards tidy evaluation to make it possible to build queries to the Atlas of Living Australia with pipes. In practice, this means that data queries can be filtered just as you might filter a data.frame with the tidyverse suite of functions.
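To make the idea of non-standard evaluation more concrete, here is a minimal base R sketch of how a function can capture an expression such as year == 2021 without evaluating it, and then split it into parts. The function parse_filter() is hypothetical and written only for illustration; it is not part of galah and does not describe how galah_filter() is implemented.

```r
# Hypothetical helper (not a galah function): capture a filter expression
# unevaluated and split it into field, operator and value.
parse_filter <- function(expr) {
  e <- substitute(expr)  # the expression as typed by the caller, not its value
  list(
    field    = as.character(e[[2]]),           # left-hand side, e.g. "year"
    operator = as.character(e[[1]]),           # comparison operator, e.g. "=="
    value    = eval(e[[3]], parent.frame())    # right-hand side, evaluated in the caller's environment
  )
}

# `year` does not need to exist as an object: it is treated as a field name
str(parse_filter(year == 2021))
```

Because the left-hand side is never evaluated, the caller can write a bare field name instead of a quoted string, which is the same convenience that dplyr::filter() offers for column names.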

Function naming

Prior to version 1.4.0, galah naming conventions had two major problems:

To address these concerns (and other smaller points), we have rewritten our function names to increase clarity (see table below). Deprecated function names now return a warning when used, suggesting that users switch to the new syntax.

galah 1.3.1 and earlier    galah 1.4.0
(none)                     galah_call
select_taxa                galah_identify
select_filters             galah_filter
select_columns             galah_select
select_locations           galah_geolocate
(none)                     galah_group_by
(none)                     galah_down_to
ala_counts                 atlas_counts
ala_occurrences            atlas_occurrences
ala_species                atlas_species
ala_media                  atlas_media
ala_taxonomy               atlas_taxonomy
ala_citation               atlas_citation
select_taxa                search_taxa, search_identifiers
search_fields              search_fields
(none)                     show_all_fields
find_profiles              show_all_profiles
find_ranks                 show_all_ranks
find_atlases               show_all_atlases
find_reasons               show_all_reasons
find_cached_files          show_all_cached_files
find_field_values          search_field_values
find_profile_attributes    search_profile_attributes
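
The deprecation behaviour described above, where an old name still works but warns and points to its replacement, can be sketched with base R's .Deprecated(). The two functions below are hypothetical stand-ins written for this example; they are not galah functions, and galah's own warning text and mechanism may differ.

```r
# Hypothetical new-style function (stand-in for a renamed function)
toy_counts_new <- function(x) {
  length(x)
}

# Hypothetical old-style name: warns, then forwards to the new function
toy_counts_old <- function(x) {
  .Deprecated("toy_counts_new")  # base R: signals a deprecation warning
  toy_counts_new(x)
}

# Still returns a result, but with a warning suggesting the new name
toy_counts_old(1:3)
```

Forwarding to the new function keeps existing scripts running while nudging users toward the new syntax.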

Piping with galah_call()

Perhaps the largest change in galah 1.4.0 is the implementation of piping using galah_call().

Beginning a query with galah_call() (be sure to add the parentheses!) tells galah that you will be using pipes to construct your query. Follow this with your preferred pipe (|> from base or %>% from magrittr). You can then narrow your query line-by-line using galah_ functions. Finally, end with an atlas_ function to identify what type of data you want from your query.
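
The pattern described above (start an empty query, refine it step by step, then finish with a function that acts on it) can be illustrated with a toy query builder. This is a sketch under stated assumptions: every function here (query_start(), add_taxa(), add_filter(), add_group_by(), describe_query()) is hypothetical, none of them exist in galah, and the final step only describes the query rather than contacting an atlas. It uses the base pipe, which assumes R 4.1 or later.

```r
# Hypothetical query-builder sketch; none of these functions exist in galah.
query_start <- function() {
  structure(list(taxa = NULL, filters = character(), group_by = character()),
            class = "toy_query")
}

# Each step takes the query as its first argument and returns an updated copy,
# which is what makes it usable with a pipe.
add_taxa <- function(query, taxa) {
  query$taxa <- taxa
  query
}

add_filter <- function(query, ...) {
  exprs <- as.list(substitute(list(...)))[-1]  # capture expressions unevaluated
  new <- vapply(exprs, function(e) paste(deparse(e), collapse = ""), character(1))
  query$filters <- c(query$filters, new)
  query
}

add_group_by <- function(query, ...) {
  fields <- as.list(substitute(list(...)))[-1]  # bare field names
  query$group_by <- c(query$group_by, vapply(fields, as.character, character(1)))
  query
}

# Final step: a real atlas_ function would send a request; this one only
# summarises what has been built.
describe_query <- function(query) {
  paste0("taxa: ", query$taxa,
         "; filters: ", paste(query$filters, collapse = " AND "),
         "; grouped by: ", paste(query$group_by, collapse = ", "))
}

result <- query_start() |>
  add_taxa("perameles") |>
  add_filter(year == 2021) |>
  add_group_by(species) |>
  describe_query()

print(result)
```

The key design point is that every intermediate function both accepts and returns the query object, so steps can be added, removed or reordered without changing the final call.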

Unlike the old function names, which will be removed in future versions, we intend to continue supporting un-piped syntax, although piping only works with the new function names. If you’re new to piping, here’s a comparison against code from previous versions of galah.

Previously, if you wanted to look up the number of records of each bandicoot species every year from 2010 to 2021, you’d have had to do something like this:

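library(purrr)
library(dplyr)

# Look up the bandicoot species, then build a filter for 2010 to 2021
taxa <- ala_species(taxa = select_taxa("perameles"))$species
years <- select_filters(year = 2010:2021)

# Loop over species, counting records per year for each one
taxa %>%
  map_dfr( ~ ala_counts(
    taxa = select_taxa(list(species = .x)),
    filters = years,
    group_by = "year"))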

This was not very easy, because you had to call multiple ala_ functions and loop over species. With piping, you can now do it like this:

galah_call() %>%
  galah_identify("perameles") %>%
  galah_filter(year >= 2010, year <= 2021) %>%
  galah_group_by(species, year) %>%
  atlas_counts()

As a second example, if you wanted to download occurrence records of bandicoots from 2021, and to include information on which records had zero coordinates, you would previously have had to do this:

ala_occurrences(taxa = select_taxa("perameles"),
                filters = select_filters(year = 2021),
                columns = select_columns(group = "basic", "ZERO_COORDINATE"))

Now with piping:

galah_call() %>%
  galah_identify("perameles") %>%
  galah_filter(year == 2021) %>%
  galah_select(group = "basic", ZERO_COORDINATE) %>%
  atlas_occurrences()