Overview

Transcription factors and microRNAs are important for regulating the gene expression in normal physiology and pathological conditions. Many bioinformatic tools were built to predict and identify transcription factors and microRNA targets and their role in development of diseases including cancers. The availability of public access high-throughput data allowed for data-driven discoveries and validations of these predictions. Here, we build on that kind of tools and integrative analyses to provide a tool to access, manage and visualize data from open source databases. cRegulome provides a programmatic access to the regulome (microRNA and transcription factor) correlations with target genes in cancer. The package obtains a local instance of Cistrome Cancer and miRCancerdb databases and provides objects and methods to interact with and visualize the correlation data.

Getting started

To get started with cRegulome, we show a very quick example. We first start by downloading a small test database file, make a simple query and convert the output to a cRegulome object to print and visualize.

# load required libraries
library(cRegulome)
library(RSQLite)
library(ggplot2)
# download the db file when using it for the first time
destfile = paste(tempdir(), 'cRegulome.db.gz', sep = '/')
if(!file.exists(destfile)) {
    get_db(test = TRUE)
}

# connect to the db file
db_file = paste(tempdir(), 'cRegulome.db', sep = '/')
conn <- dbConnect(SQLite(), db_file)
# alternative to downloading the database file
wget https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/9537385/cRegulome.db.gz
gunzip cRegulome.db.gz
# enter a custom query with different arguments
dat <- get_mir(conn,
               mir = 'hsa-let-7g',
               study = 'STES',
               min_abs_cor = .3,
               max_num = 5)

# make a cmicroRNA object   
ob <- cmicroRNA(dat)
# print object
ob
## A cmicroRNA object: microRNA-gene correlations in Cancer
##  Contains:
##  1 Cancer study/ies: STES 
##  1 microRNA/s: hsa-let-7g 
##  5 features: ARHGEF15 LSM11 STARD13 CXorf36 IL13
# plot object
cor_plot(ob)

Package Description

Data sources

The two main sources of data used by this package are Cistrome Cancer and miRCancerdb databases. Cistrome Cancer is based on an integrative analysis of The Cancer Genome Atlas (TCGA) and public ChIP-seq data. It provides calculated correlations of (n = 320) transcription factors and their target genes in (n = 29) cancer study. In addition, Cistrome Cancer provides the transcription factors regulatory potential to target and non-target genes. miRCancerdb uses TCGA data and TargetScan annotations to correlate known microRNAs (n = 750) and target and non-target genes in (n = 25) cancer studies.

Database file

cRegulome obtains a pre-build SQLite database file of the Cistrome Cancer and miRCancerdb databases. The details of this build is provided at cRegulomedb in addition to the scripts used to pull, format and deposit the data at an on-line repository. Briefly, the SQLite database consist of 4 tables cor_mir and cor_tf for correlation values and targets_mir and targets_tf for microRNA miRBase ID and transcription factors symbols to genes mappings. Two indices were created to facilitate the database search using the miRBase IDs and transcription factors symbols. The database file can be downloaded using the function get_db.

To show the details of the database file, the following code connects to the database and show the names of tables and fields in each of them.

# table names
tabs <- dbListTables(conn)
print(tabs)
## [1] "cor_mir"     "cor_tf"      "targets_mir" "targets_tf"
# fields/columns in the tables
for(i in seq_along(tabs)) {
  print(dbListFields(conn, tabs[i]))
}
## [1] "mirna_base" "feature"    "STES"      
## [1] "tf"      "feature" "STES"   
## [1] "mirna_base" "feature"   
## [1] "tf"      "feature" "study"

Database query

To query the database using cRegulome, we provide two main functions; get_mir and get_tf for querying microRNA and transcription factors correlations respectively. Users need to provide the proper IDs for microRNA, transcription factor symbols and/or TCGA study identifiers. microRNAs are referred to by the official miRBase IDs, transcription factors by their corresponding official gene symbols that contains them and TCGA studies with their common identifiers. In either cases, the output of calling the these functions is a tidy data frame of 4 columns; mirna_base/ tf, feature, cor and study These correspond to the miRBase IDs or transcription factors symbol, gene symbol, correlation value and the TCGA study identifier.

Here we show an example of such a query. Then, we illustrate how this query is executed on the database using basic RSQLite and dbplyr which is what the get_* functions are doing.

# query the db for two microRNAs
dat_mir <- get_mir(conn,
                   mir = c('hsa-let-7g', 'hsa-let-7i'),
                   study = 'STES')

# query the db for two transcription factors
dat_tf <- get_tf(conn,
                 tf = c('LEF1', 'MYB'),
                 study = 'STES')

# show first 6 line of each of the data.frames
head(dat_mir); head(dat_tf)
##   mirna_base feature   cor study
## 1 hsa-let-7g    ABL2  0.11  STES
## 2 hsa-let-7g  ACTR10 -0.16  STES
## 3 hsa-let-7g  ACVR1C -0.13  STES
## 4 hsa-let-7g ADAMTS1  0.27  STES
## 5 hsa-let-7g ADAMTS5  0.20  STES
## 6 hsa-let-7g   ADCY9  0.18  STES
##     tf feature  cor study
## 1 LEF1     A2M 0.55  STES
## 2 LEF1  A4GALT 0.40  STES
## 3 LEF1   ABCA1 0.44  STES
## 4 LEF1   ABCA6 0.54  STES
## 5 LEF1   ABCA8 0.41  STES
## 6 LEF1   ABCA9 0.48  STES

Objects

Two S3 objects are provided by cRegulome to store and dispatch methods on the correlation data. cmicroRNA and cTF for microRNA and transcription factors respectively. The structure of these objects is very similar. Basically, as all S3 objects, it’s a list of 4 items; microRNA or TF for the regulome element, features for the gene hits, studies for the TCGA studies and finally corr is either a data.frame when the object has data.from a single TCGA study or a named list of data.frames when it has data from multiple studies. Each of these data.frames has the regulome element (microRNAs or transcription factors) in columns and features/genes in rows.
To construct these objects, users need to call a constructor function with the corresponding names on the data.frame output form get_*. The reverse is possible by calling the function cor_tidy on the object to get back the tidy data.frame.

# explore the cmicroRNA object
ob_mir <- cmicroRNA(dat_mir)
class(ob_mir)
## [1] "cmicroRNA"
str(ob_mir)
## List of 4
##  $ microRNA: chr [1:2] "hsa-let-7g" "hsa-let-7i"
##  $ features: chr [1:612] "ABL2" "ACTR10" "ACVR1C" "ADAMTS1" ...
##  $ studies : chr "STES"
##  $ corr    :List of 1
##   ..$ STES:'data.frame': 612 obs. of  2 variables:
##   .. ..$ hsa-let-7g: num [1:612] NA NA NA NA 0.11 NA NA NA -0.16 NA ...
##   .. ..$ hsa-let-7i: num [1:612] -0.2 0.12 0.14 -0.1 0.27 -0.12 0.13 0.1 NA 0.23 ...
##  - attr(*, "class")= chr "cmicroRNA"
# explore the cTF object
ob_tf <- cTF(dat_tf)
class(ob_tf)
## [1] "cTF"
str(ob_tf)
## List of 4
##  $ TF      : chr [1:2] "LEF1" "MYB"
##  $ features: chr [1:1953] "A2M" "A4GALT" "ABCA1" "ABCA6" ...
##  $ studies : chr "STES"
##  $ corr    :List of 1
##   ..$ STES:'data.frame': 1953 obs. of  2 variables:
##   .. ..$ LEF1: num [1:1953] 0.55 0.4 0.44 0.54 0.41 0.48 0.56 0.43 0.54 0.45 ...
##   .. ..$ MYB : num [1:1953] NA NA NA NA NA NA NA NA NA NA ...
##  - attr(*, "class")= chr "cTF"

Methods

cRegulome provides S3 methods to interact a visualize the correlations data in the cmicroRNA and cTF objects. Table 1 provides an over view of these functions. These methods dispatch directly on the objects and could be customized and manipulated in the same way as their generics.

# cmicroRNA object methods
methods(class = 'cmicroRNA')
## [1] cor_hist         cor_igraph       cor_joy          cor_plot        
## [5] cor_tidy         cor_upset        cor_venn_diagram print           
## see '?methods' for accessing help and source code
# cTF object methods
methods(class = 'cTF')
## [1] cor_hist         cor_igraph       cor_joy          cor_plot        
## [5] cor_tidy         cor_upset        cor_venn_diagram print           
## see '?methods' for accessing help and source code
# tidy method
head(cor_tidy(ob_mir))
##    mirna_base feature   cor study
## 5  hsa-let-7g    ABL2  0.11  STES
## 9  hsa-let-7g  ACTR10 -0.16  STES
## 11 hsa-let-7g  ACVR1C -0.13  STES
## 13 hsa-let-7g ADAMTS1  0.27  STES
## 16 hsa-let-7g ADAMTS5  0.20  STES
## 17 hsa-let-7g   ADCY9  0.18  STES
# cor_hist method
cor_hist(ob_mir,
     breaks = 100,
     main = '', xlab = 'Correlation')

dev.off()
## null device 
##           1
# cor_joy method
cor_joy(ob_mir) +
    labs(x = 'Correlation', y = '')

dev.off()
## null device 
##           1
# cor_venn_diagram method
cor_venn_diagram(ob_mir, cat.default.pos = 'text')

dev.off()
## null device 
##           1
# cor_upset method
cor_upset(ob_mir)

dev.off()
## null device 
##           1

Contributions

Comments, issues and contributions are welcomed at: https://github.com/MahShaaban/cRegulome

Citations

Please cite:

citation('cRegulome')