NACHO Analysis

A NAnostring quality Control dasHbOard

Mickaël Canouil, Ph.D., Gerard A. Bouland and Roderick C. Slieker, Ph.D.

May 31, 2022

1 Installation

# Install NACHO from CRAN:
install.packages("NACHO")

# Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("mcanouil/NACHO")

2 Overview

NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data.
NanoString nCounter is a messenger-RNA/micro-RNA (mRNA/miRNA) expression assay that works with fluorescent barcodes.
Each barcode is assigned to an mRNA/miRNA and can be counted after binding to its target.
As a result, each count of a specific barcode represents the presence of its target mRNA/miRNA.

NACHO can load, visualise and normalise the exported NanoString nCounter data, and helps the user perform quality control.
It does so by visualising quality control metrics, expression of control genes, principal components and sample-specific size factors in an interactive web application.
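As a rough illustration of the principal-component view, the sketch below runs a PCA on simulated log2 counts with base R's `prcomp()`, treating samples as observations. The simulated matrix and the `+ 1` pseudo-count are assumptions made for illustration; NACHO's own computation may differ.

```r
# Simulated counts: 50 genes (rows) x 6 samples (columns), illustrative only
set.seed(42)
counts <- matrix(
  rpois(50 * 6, lambda = 100),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("sample", 1:6))
)
# Log-transform with a pseudo-count of 1 (an assumption, not NACHO's setting)
log_counts <- log2(counts + 1)
# Samples are the observations, so transpose before calling prcomp()
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)
# Coordinates of each sample on the first two principal components
pc_scores <- pca$x[, 1:2]
print(pc_scores)
```

Samples that sit far from the others on these components are the ones a quality-control review would typically inspect first.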

RCC files are summarised and visualised with two functions: load_rcc() and visualise().

NACHO also includes a function, normalise(), which (re)calculates sample-specific size factors and normalises the data.
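To make the idea of a sample-specific size factor concrete, the sketch below uses a common positive-control scheme for nCounter data: take the geometric mean of the positive-control counts in each sample, then divide the mean of these geometric means by each sample's value. The control counts are made up, and this is not necessarily the exact computation that normalise() performs.

```r
# Hypothetical positive-control counts: 3 probes (rows) x 4 samples (columns)
pos_ctrl <- matrix(
  c(100, 200, 400,
    50, 100, 200,
    200, 400, 800,
    100, 200, 400),
  nrow = 3,
  dimnames = list(paste0("POS_", c("A", "B", "C")), paste0("S", 1:4))
)
# Geometric mean of the positive controls within each sample
geo_mean <- function(x) exp(mean(log(x)))
sample_geo <- apply(pos_ctrl, 2, geo_mean)
# Size factor: mean of the per-sample geometric means / each sample's value
size_factor <- mean(sample_geo) / sample_geo
print(size_factor)
# Scaling each sample (column) by its factor puts all samples on a common level
normalised <- sweep(pos_ctrl, 2, size_factor, "*")
```

Here sample S2 has the weakest controls and therefore the largest factor (2.25), while S3 has the strongest controls and the smallest (0.5625).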

In addition (since v0.6.0) NACHO includes two (three) additional functions:

For more details, see vignette("NACHO") and vignette("NACHO-analysis").

Canouil M, Bouland GA, Bonnefond A, Froguel P, Hart L, Slieker R (2019). “NACHO: an R package for quality control of NanoString nCounter data.” Bioinformatics. ISSN 1367-4803, doi:10.1093/bioinformatics/btz647.

@Article{,
  title = {{NACHO}: an {R} package for quality control of {NanoString} {nCounter} data},
  author = {Mickaël Canouil and Gerard A. Bouland and Amélie Bonnefond and Philippe Froguel and Leen Hart and Roderick Slieker},
  journal = {Bioinformatics},
  address = {Oxford, England},
  year = {2019},
  month = {aug},
  issn = {1367-4803},
  doi = {10.1093/bioinformatics/btz647},
}

3 Analyse NanoString data

3.1 Load packages

library(NACHO)
library(data.table) # provides dcast() and as.data.table(), used below
library(GEOquery, quietly = TRUE, warn.conflicts = FALSE)
## 
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:NACHO':
## 
##     normalize
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which.max, which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)

3.2 Download GSE70970 from GEO (or use your own data)

data_directory <- file.path(tempdir(), "GSE70970", "Data")

# Download data
gse <- getGEO("GSE70970")
## Found 1 file(s)
## GSE70970_series_matrix.txt.gz
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
# Unzip data
untar(
  tarfile = file.path(tempdir(), "GSE70970", "GSE70970_RAW.tar"),
  exdir = data_directory
)
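# Get phenotypes and add file names as sample IDs
# (this assumes list.files() returns the RCC files in the same order as the rows of targets)
targets <- pData(phenoData(gse[[1]]))
targets$IDFILE <- list.files(data_directory)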
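Assigning the output of list.files() directly relies on the files being listed in the same order as the rows of targets. A more defensive sketch matches each GEO sample accession (the geo_accession column of the phenotype table) to the one file whose name contains it. The helper match_file() and the file names below are hypothetical, and the sketch assumes each RCC file name contains its GSM accession.

```r
# Hypothetical inputs: accessions from pData() and file names from list.files()
geo_accession <- c("GSM0000002", "GSM0000001", "GSM0000003")
rcc_files <- c(
  "GSM0000001_sampleA.RCC.gz",
  "GSM0000002_sampleB.RCC.gz",
  "GSM0000003_sampleC.RCC.gz"
)
# Hypothetical helper: return the single file whose name contains the accession
match_file <- function(accession, files) {
  hit <- files[grepl(accession, files, fixed = TRUE)]
  if (length(hit) != 1L) {
    stop("Expected exactly one file for ", accession, ", found ", length(hit))
  }
  hit
}
idfile <- vapply(geo_accession, match_file, character(1), files = rcc_files)
print(idfile)
```

With real data, this would correspond to something like `targets$IDFILE <- vapply(targets$geo_accession, match_file, character(1), files = list.files(data_directory))`, provided the naming assumption holds; the helper stops with an error instead of silently mislabelling samples.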

3.3 Import RCC files

GSE70970 <- load_rcc(data_directory, targets, id_colname = "IDFILE")
## [NACHO] Importing RCC files.
## Error in load_rcc(data_directory, targets, id_colname = "IDFILE"): [NACHO] Multiple Nanostring file/software versions detected.
##   Please provide a set of files with the same version.
##   - FileVersion: '1.6', '1.6'
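##   - SoftwareVersion: '2.1.2.3', '2.1.1.0005'

At the time of rendering, load_rcc() stopped here: the GSE70970 RCC files come from two nCounter software versions ('2.1.2.3' and '2.1.1.0005'), and load_rcc() requires a single version. The following steps assume that GSE70970 was loaded successfully, for example from a subset of files sharing one software version.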
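To find which files trigger this error, one option is to read the FileVersion and SoftwareVersion fields from the header of each RCC file and group the files by version. The helper read_rcc_versions() below is hypothetical (it is not part of NACHO) and assumes header lines of the form `Field,value`, matching the field names reported in the error message; it writes two small fake files so that it runs on its own.

```r
# Hypothetical helper (not part of NACHO): extract version fields from an RCC file
read_rcc_versions <- function(path) {
  # readLines() can typically read gzip-compressed files (.RCC.gz) directly
  lines <- readLines(path, warn = FALSE)
  get_field <- function(field) {
    pattern <- paste0("^", field, ",")
    hit <- grep(pattern, lines, value = TRUE)
    if (length(hit) == 0L) NA_character_ else sub(pattern, "", hit[1])
  }
  c(FileVersion = get_field("FileVersion"), SoftwareVersion = get_field("SoftwareVersion"))
}

# Two fake RCC headers written to temporary files, for illustration only
f1 <- tempfile(fileext = ".RCC")
f2 <- tempfile(fileext = ".RCC")
writeLines(c("<Header>", "FileVersion,1.6", "SoftwareVersion,2.1.2.3", "</Header>"), f1)
writeLines(c("<Header>", "FileVersion,1.6", "SoftwareVersion,2.1.1.0005", "</Header>"), f2)

# One row per file, one column per version field
versions <- t(vapply(c(f1, f2), read_rcc_versions, character(2)))
# File paths grouped by software version
by_version <- split(rownames(versions), versions[, "SoftwareVersion"])
print(by_version)
```

Pointing load_rcc() at the files of a single group (for example, copied to their own folder) would then give it a set of files with one version.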

3.4 Perform the analyses using limma

library(limma)
## 
## Attaching package: 'limma'
## The following object is masked from 'package:BiocGenerics':
## 
##     plotMA

3.4.1 Get the phenotypes

# Keep one row per sample with its phenotypes, turning "NA" strings into missing values
selected_pheno <- GSE70970[["nacho"]][
  j = lapply(unique(.SD), function(x) ifelse(x == "NA", NA, x)),
  .SDcols = c("IDFILE", "age:ch1", "gender:ch1", "chemo:ch1", "disease.event:ch1")
]
# Drop samples with any missing phenotype
selected_pheno <- na.exclude(selected_pheno)
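The `ifelse(x == "NA", NA, x)` step matters because na.exclude() only drops real missing values, not the literal string "NA". A base-R sketch of the same two steps on a made-up phenotype table (the column names are simplified stand-ins for the GEO characteristics):

```r
# Made-up phenotype table in which missing values are stored as the string "NA"
pheno <- data.frame(
  IDFILE = c("s1.RCC", "s2.RCC", "s3.RCC"),
  age = c("61", "NA", "55"),
  disease_event = c("1", "0", "NA"),
  stringsAsFactors = FALSE
)
# Replace literal "NA" strings with real missing values, column by column
pheno[] <- lapply(pheno, function(x) ifelse(x == "NA", NA, x))
# Keep only complete samples: na.exclude() drops rows containing NA
pheno_complete <- na.exclude(pheno)
print(pheno_complete)
```

Only sample s1 survives, since s2 lacks an age and s3 lacks a disease event.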

3.4.2 Get the normalised counts

# Normalised counts of endogenous genes as a genes x samples matrix
expr_counts <- GSE70970[["nacho"]][
  i = grepl("Endogenous", CodeClass),
  j = as.matrix(
    dcast(.SD, Name ~ IDFILE, value.var = "Count_Norm"),
    rownames = "Name"
  ),
  .SDcols = c("IDFILE", "Name", "Count_Norm")
]

Alternatively, the "Accession" number can be used as the row identifier instead of "Name":

GSE70970[["nacho"]][
  i = grepl("Endogenous", CodeClass),
  j = as.matrix(
    dcast(.SD, Accession ~ IDFILE, value.var = "Count_Norm"),
    rownames = "Accession"
  ),
  .SDcols = c("IDFILE", "Accession", "Count_Norm")
]
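Both calls above reshape long-format counts (one row per gene and sample) into a wide matrix with genes as rows and samples as columns. For readers less familiar with dcast(), the sketch below reproduces that reshape in base R with tapply() on a made-up table; it is an equivalent illustration, not what NACHO uses internally.

```r
# Made-up long-format counts: one row per (gene, sample) pair
long_counts <- data.frame(
  Name = c("GENE_A", "GENE_B", "GENE_A", "GENE_B"),
  IDFILE = c("s1", "s1", "s2", "s2"),
  Count_Norm = c(10, 20, 30, 40)
)
# Genes as rows, samples as columns (the same shape as dcast(Name ~ IDFILE))
wide <- tapply(
  long_counts$Count_Norm,
  list(long_counts$Name, long_counts$IDFILE),
  sum # each (gene, sample) pair occurs once, so sum() returns that single value
)
print(wide)
```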

3.4.3 Select phenotypes and counts

  1. Make sure the count matrix and phenotypes contain the same samples
# intersect() keeps the order of selected_pheno, so columns and rows stay aligned
samples_kept <- intersect(selected_pheno[["IDFILE"]], colnames(expr_counts))
expr_counts <- expr_counts[, samples_kept]
selected_pheno <- selected_pheno[IDFILE %in% samples_kept]
  2. Build the numeric design matrix
design <- model.matrix(~ `disease.event:ch1`, selected_pheno)
  3. Fit the linear models and compute moderated statistics with limma
eBayes(lmFit(expr_counts, design))
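lmFit() pairs the i-th column of the count matrix with the i-th row of the design matrix and does not check sample names, so an explicit order check is a cheap safeguard. The base-R sketch below uses made-up data and reorders the phenotypes with match(), a variant of the `%in%` filter used above, before asserting the alignment and building the design matrix.

```r
# Made-up normalised counts (genes x samples) and phenotypes in a different order
expr <- matrix(
  c(5, 6, 7, 8, 1, 2, 3, 4, 9, 10, 11, 12),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("s2", "s1", "s3"))
)
pheno <- data.frame(
  IDFILE = c("s1", "s2", "s4"),
  disease_event = c("0", "1", "1"),
  stringsAsFactors = FALSE
)
# Keep the shared samples, in phenotype order
samples_kept <- intersect(pheno$IDFILE, colnames(expr))
expr <- expr[, samples_kept, drop = FALSE]
pheno <- pheno[match(samples_kept, pheno$IDFILE), , drop = FALSE]
# Fail early if matrix columns and phenotype rows are not in the same order
stopifnot(identical(colnames(expr), pheno$IDFILE))
design <- model.matrix(~ disease_event, data = pheno)
print(design)
```

The aligned `expr` and `design` could then be passed to limma::lmFit() and limma::eBayes(), and limma::topTable() can list the top-ranked genes for the disease_event coefficient.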

3.5 Perform the analyses using lm (or any other model)

GSE70970[["nacho"]][
  # Endogenous genes only; turn "NA" strings into missing values
  i = grepl("Endogenous", CodeClass),
  j = lapply(unique(.SD), function(x) ifelse(x == "NA", NA, x)),
  .SDcols = c(
    "IDFILE", "Name", "Accession", "Count", "Count_Norm",
    "age:ch1", "gender:ch1", "chemo:ch1", "disease.event:ch1"
  )
][
  # Restrict to the first 10 genes for illustration
  Name %in% head(unique(Name), 10)
][
  # Fit one linear model per gene and return its coefficient table
  j = as.data.table(
    coef(summary(lm(
      formula = Count_Norm ~ `disease.event:ch1`,
      data = na.exclude(.SD)
    ))),
    keep.rownames = "term"
  ),
  by = c("Name", "Accession")
]
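The same per-gene regression can be written without data.table by splitting a long table by gene and fitting lm() on each piece. The data below are simulated, and the column names are simplified stand-ins for Count_Norm and disease.event:ch1.

```r
# Simulated long-format data: 3 genes x 8 samples
set.seed(1)
long <- data.frame(
  Name = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 8),
  disease_event = rep(c("0", "1"), times = 12),
  Count_Norm = rnorm(24, mean = 100, sd = 10),
  stringsAsFactors = FALSE
)
# Fit Count_Norm ~ disease_event separately for each gene
fits <- lapply(split(long, long$Name), function(d) {
  coefs <- coef(summary(lm(Count_Norm ~ disease_event, data = d)))
  data.frame(
    Name = d$Name[1], term = rownames(coefs), coefs,
    stringsAsFactors = FALSE, check.names = FALSE
  )
})
# Stack the per-gene coefficient tables into one data frame
results <- do.call(rbind, fits)
rownames(results) <- NULL
print(results)
```

Any other model (for example glm()) can be swapped into the function body in the same way.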