epi2me2r is designed to take CSV output from Oxford Nanopore’s EPI2ME and make it easy to import into R for downstream analysis. The raw data are imported either into a phyloseq or metagenomeSeq object, or into count tables and taxonomy for use with other packages. Currently, raw data from WIMP and Antimicrobial Resistance (AMR) analyses can be used as inputs.
There are three main types of functions in epi2me2r:
Functions that import raw data into phyloseq or metagenomeSeq objects for downstream analysis:
raw_amr_to_phyloseq(): phyloseq object with count table, taxonomy, and metadata
raw_amr_to_metagenomeseq(): metagenomeSeq object with count table, taxonomy, and metadata
raw_wimp_to_phyloseq(): phyloseq object with count table, taxonomy, and metadata
raw_wimp_to_metagenomeseq(): metagenomeSeq object with count table, taxonomy, and metadata
Functions that read in raw files and generate taxonomy as separate steps:
read_in_amr_files
read_in_wimp_files
generate_amr_taxonomy: uses data from read_in_amr_files or generated in the same format
generate_wimp_taxonomy: uses data from read_in_wimp_files or generated in the same format
A function that combines AMR and WIMP results, amr_read_taxonomy(). This function reads in both AMR and WIMP raw data and adds the taxonomic information to each AMR gene where available. It uses the paths to both the raw AMR directory containing the AMR CSVs and the WIMP directory containing the raw WIMP CSVs. The output is a taxonomic classification of all AMR genes that have both classifications available.
The development version of epi2me2r can currently be downloaded and installed from GitHub:
if (!require(remotes, quietly = TRUE)) install.packages('remotes')
remotes::install_github("mweinroth/epi2me2r")
This will install epi2me2r and its CRAN dependencies data.table and taxonomizr. However, it will not install the dependencies Biobase, phyloseq, and metagenomeSeq, which are available through Bioconductor. To install those dependencies, run the following code. It first installs the BiocManager package, then calls BiocManager::install() without any arguments to install the latest version of Bioconductor, and finally calls BiocManager::install() again to install the Bioconductor dependencies of epi2me2r. See Bioconductor’s installation instructions for more details.
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install()
BiocManager::install(c("Biobase", "phyloseq", "metagenomeSeq"))
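Before loading the package, it can help to confirm that the dependencies named above are actually installed. The following is a minimal sketch using only base R; the helper name missing_packages is hypothetical and not part of epi2me2r. Note that requireNamespace() loads each package's namespace as a side effect of checking for it.

```r
# Hypothetical helper (not part of epi2me2r): return the packages in `pkgs`
# that cannot be loaded from the current R library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# CRAN and Bioconductor dependencies listed in this README
deps <- c("data.table", "taxonomizr", "Biobase", "phyloseq", "metagenomeSeq")
still_missing <- missing_packages(deps)

if (length(still_missing) > 0) {
  message("Not yet installed: ", paste(still_missing, collapse = ", "))
} else {
  message("All epi2me2r dependencies are installed.")
}
```

If anything is reported as missing, rerun the matching install step above for that package.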
To run epi2me2r, call:
library(epi2me2r)
To use epi2me2r, you will need your raw data and a metadata file.
Raw data files are downloaded from the EPI2ME report in either the WIMP or the AMR CARD tab (each sample will have two different files if you conducted both an AMR and a WIMP analysis). Raw data are downloaded as CSV files, with a separate file for each run. If you have barcodes for your samples, multiple samples will be contained in one CSV file; if not, each sample will have its own file. DO NOT change the names of the files you have downloaded, as the file name contains the type of analysis and the run number (e.g., arma_288715.csv [AMR] or 226094_1777.csv [WIMP]).
Place all raw data files of the same analysis type in a directory that contains only those files.
You will use this directory location when you import your samples.
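Since the directory location is what you supply at import time, a quick look at its contents beforehand can catch stray or renamed files. The sketch below uses only base R. The helper name check_raw_dir is hypothetical, and the file-name patterns are assumptions inferred from the two example names above (arma_288715.csv and 226094_1777.csv), not rules documented by EPI2ME.

```r
# Hypothetical helper (not part of epi2me2r): list files in `dir` whose names
# do not match the expected pattern for the given analysis type.
# The patterns are assumptions based on the example file names in this README.
check_raw_dir <- function(dir, type = c("amr", "wimp")) {
  type <- match.arg(type)
  pattern <- switch(type,
    amr  = "^arma_[0-9]+\\.csv$",   # e.g., arma_288715.csv
    wimp = "^[0-9]+_[0-9]+\\.csv$"  # e.g., 226094_1777.csv
  )
  files <- list.files(dir)
  files[!grepl(pattern, files)]
}

# Demonstration on a temporary directory containing one stray file
demo_dir <- file.path(tempdir(), "amr_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("arma_288715.csv", "notes.txt")))

unexpected <- check_raw_dir(demo_dir, type = "amr")
print(unexpected)
```

An empty result means every file in the directory matches the assumed naming pattern for that analysis type.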
The second file you will need is a metadata file describing your samples, such as sample names. This file may also contain other important information about your samples, such as treatments. If you are running both a WIMP and an AMR analysis, this file has four required columns that must be named as follows:
arma_filename: the original AMR file name without the .csv extension
arma_barcode: the barcodes of each sample (note: if you did not barcode any of your samples, enter none in all of the cells). In the AMR workflow, missing barcodes are coded as none
wimp_filename: the original WIMP file name without the .csv extension
wimp_barcode: the barcodes of each sample (note: if you did not barcode any of your samples, enter NA in all of the cells). In the WIMP workflow, missing barcodes are coded as NA
additional information: after these four required columns, you may include any additional metadata that is important, such as treatment type, sample numbers, etc.
A quick note about barcodes: As discussed in the Issues section below, earlier versions of the EPI2ME workflow listed ARMA barcodes as "barcode" and a two-digit number, while WIMP barcodes were listed as "BC" and a two-digit number. While this is no longer the case (both are written as barcode followed by the number), if you are using older output, see the section below on changing the barcodes to a compatible style.
An example CSV is available here.
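As an illustration of the layout described above, the following sketch builds a small metadata table in R, checks that the four required column names are present, and writes it out as a CSV. The sample values and the helper name check_metadata_columns are made up for illustration; only the four column names come from this README.

```r
# Required column names, as listed in this README
required_cols <- c("arma_filename", "arma_barcode",
                   "wimp_filename", "wimp_barcode")

# Hypothetical helper (not part of epi2me2r): return the required columns
# that are absent from a metadata data frame.
check_metadata_columns <- function(metadata) {
  setdiff(required_cols, names(metadata))
}

# Made-up example: two barcoded samples from one AMR run and one WIMP run
metadata <- data.frame(
  arma_filename = c("arma_288715", "arma_288715"),
  arma_barcode  = c("barcode01", "barcode02"),
  wimp_filename = c("226094_1777", "226094_1777"),
  wimp_barcode  = c("barcode01", "barcode02"),
  treatment     = c("control", "treated"),
  stringsAsFactors = FALSE
)

missing_cols <- check_metadata_columns(metadata)
if (length(missing_cols) > 0) {
  stop("Metadata is missing: ", paste(missing_cols, collapse = ", "))
}

# Write the table as a CSV without a row-name column
metadata_file <- file.path(tempdir(), "metadata.csv")
write.csv(metadata, metadata_file, row.names = FALSE)
```

Passing row.names = FALSE keeps write.csv() from adding an unnamed first column, so the required columns stay in the first four positions.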
See vignette for usage.
The current ARMA and WIMP outputs have mostly compatible barcode nomenclature (the only difference being that ARMA files without barcodes are entered as none and WIMP files as NA). However, in early EPI2ME versions, ARMA barcodes were listed as "barcode" and a two-digit number, while WIMP barcodes were listed as "BC" and a two-digit number. While this is no longer the case (both are formatted as "barcode" followed by the number), if you are using older output, you will need to replace the "BC" in the WIMP files with "barcode". You can do this by opening the file in Excel or another text editor and using Ctrl+H to search and replace all "BC" with "barcode", or you can do it in R.
The following example code reads in a CSV file, replaces the string "BC" with "barcode" in the barcode column, then overwrites the original file.
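# Read the WIMP CSV downloaded from EPI2ME
fake.data <- read.csv("226094_1777.csv")
# Replace a leading "BC" with "barcode" (e.g., "BC01" becomes "barcode01")
fake.data$barcode <- sub("^BC", "barcode", fake.data$barcode)
# Overwrite the original file without adding a row-name column
write.csv(fake.data, "226094_1777.csv", row.names = FALSE)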
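To see what this kind of replacement does without touching any files, the same idea can be tried on a small in-memory vector. This is a sketch with made-up barcode values; anchoring the pattern with ^ is a choice made here so that only a leading "BC" is rewritten.

```r
# Made-up barcode values in the older WIMP style, plus one unbarcoded sample
old_barcodes <- c("BC01", "BC02", "BC12", NA)

# Replace only a leading "BC"; NA values are left as NA
new_barcodes <- sub("^BC", "barcode", old_barcodes)

print(new_barcodes)
```

Values that already start with "barcode" are left unchanged by this pattern, so it should be safe to run on files that mix old and new styles.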
If you have a question or comment, please open a GitHub issue on the epi2me2r repository.