The noctua package aims to make it easier to work with data stored in AWS Athena. It attempts to provide three levels of interaction with AWS Athena:

Low-level API: direct interaction with the AWS Athena backend through the AWS SDK paws. This ranges from configuring AWS Athena Work Groups to assuming different roles within AWS when connecting to AWS Athena.
DBI: noctua provides a DBI interface to AWS Athena, so users can interact with AWS Athena using the familiar functions and methods they have used for other databases from R.
dplyr: as dplyr is becoming more popular, noctua aims to give dplyr a seamless interface into AWS Athena.

Installing noctua:

As noctua utilises the R AWS SDK paws, installing noctua is straightforward:
# CRAN version
install.packages("noctua")
# Dev version
remotes::install_github("dyfanjones/noctua")

To help users who wish to run noctua in Docker, a simple Dockerfile has been created here.
To set up Docker, please refer to link.
For demo purposes we will use the example Dockerfile and run it locally:
# build docker image
docker build . -t noctua
# start container with aws credentials passed from local
docker run \
  -e AWS_ACCESS_KEY_ID="$(aws configure get aws_access_key_id)" \
  -e AWS_SECRET_ACCESS_KEY="$(aws configure get aws_secret_access_key)" \
  -e AWS_SESSION_TOKEN="$(aws configure get aws_session_token)" \
  -e AWS_DEFAULT_REGION="$(aws configure get region)" \
  -it noctua
NOTE: readr isn't required for noctua; however, it has been included in the Dockerfile to improve performance when querying AWS Athena.
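
Before connecting from inside the container, it can help to confirm that the variables passed by the docker run command above actually arrived in the R session. The following is a minimal sketch in base R. The helper name missing_aws_env() is hypothetical and not part of noctua, and AWS_SESSION_TOKEN is left out of the default check on the assumption that it is only set for temporary credentials.

```r
# Hypothetical helper (not part of noctua): report which of the AWS
# environment variables passed via `docker run -e ...` are unset or empty.
missing_aws_env <- function(vars = c("AWS_ACCESS_KEY_ID",
                                     "AWS_SECRET_ACCESS_KEY",
                                     "AWS_DEFAULT_REGION")) {
  # Sys.getenv() returns "" for variables that are not set
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_aws_env()
if (length(missing) > 0) {
  message("Unset AWS variables: ", paste(missing, collapse = ", "))
} else {
  message("All required AWS variables are set.")
}
```

If the helper reports any names, the corresponding aws configure get call on the host probably returned nothing, so it is worth checking the local AWS profile before starting the container again.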
library(DBI)
library(noctua)
con <- dbConnect(athena())
# list all current work groups in AWS Athena
list_work_groups(con)
# Create a new work group
create_work_group(con, "demo_work_group",
                  description = "This is a demo work group",
                  tags = tag_options(key = "demo_work_group", value = "demo_01"))

library(DBI)
con <- dbConnect(noctua::athena())
# Get metadata
dbGetInfo(con)
# $profile_name
# [1] "default"
#
# $s3_staging
# [1] ######## NOTE: Please don't share your S3 bucket to the public
#
# $dbms.name
# [1] "default"
#
# $work_group
# [1] "primary"
#
# $poll_interval
# NULL
#
# $encryption_option
# NULL
#
# $kms_key
# NULL
#
# $expiration
# NULL
#
# $region_name
# [1] "eu-west-1"
#
# $paws
# [1] "0.1.6"
#
# $noctua
# [1] "1.5.1"
# Create a table in AWS Athena
dbWriteTable(con, "iris", iris)
dbGetQuery(con, "select * from iris limit 10")
# Info: (Data scanned: 860 Bytes)
#     sepal_length sepal_width petal_length petal_width species
#  1:          5.1         3.5          1.4         0.2  setosa
#  2:          4.9         3.0          1.4         0.2  setosa
#  3:          4.7         3.2          1.3         0.2  setosa
#  4:          4.6         3.1          1.5         0.2  setosa
#  5:          5.0         3.6          1.4         0.2  setosa
#  6:          5.4         3.9          1.7         0.4  setosa
#  7:          4.6         3.4          1.4         0.3  setosa
#  8:          5.0         3.4          1.5         0.2  setosa
#  9:          4.4         2.9          1.4         0.2  setosa
# 10:          4.9         3.1          1.5         0.1  setosa

library(dplyr)
athena_iris <- tbl(con, "iris")
athena_iris %>%
  select(species, sepal_length, sepal_width) %>%
  head(10) %>%
  collect()
# Info: (Data scanned: 860 Bytes)
# # A tibble: 10 x 3
#    species sepal_length sepal_width
#    <chr>          <dbl>       <dbl>
#  1 setosa           5.1         3.5
#  2 setosa           4.9         3
#  3 setosa           4.7         3.2
#  4 setosa           4.6         3.1
#  5 setosa           5           3.6
#  6 setosa           5.4         3.9
#  7 setosa           4.6         3.4
#  8 setosa           5           3.4
#  9 setosa           4.4         2.9
# 10 setosa           4.9         3.1