2.A: Enrichr & rbioapi

Moosa Rezwani

2022-08-06


1 Introduction

Enrichr is a gene-set enrichment analysis tool developed in the Ma’ayan Lab.

2 Gene set library concept in Enrichr

Quoting directly from Enrichr’s help page:

A gene set library is a set of related gene sets or enrichment terms. Each enrichment term in Enrichr’s results pages is organized by its gene set library. These libraries have been constructed from many sources such as published studies and major biological and biomedical online databases. Others have been created for and only available through Enrichr. For example, the ChEA 2015 library is a set of functional terms representing transcription factors profiled by ChIP-seq in mammalian cells. Each term is associated with a collection of putative targets inferred from the peaks identified in each ChIP-seq study.

(source: https://maayanlab.cloud/Enrichr/help#background)

To get a list of available Enrichr libraries, use:

enrichr_libs <- rba_enrichr_libs()

In the returned data frame, the names of the available Enrichr libraries are in the “libraryName” column.
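As a small offline illustration of working with that column, the sketch below searches library names with a pattern. The data frame `libs_mock` is a hand-made stand-in for the output of rba_enrichr_libs(): it mimics only the “libraryName” column and contains only library names that appear in this vignette. The case-insensitive matching is an assumption of the sketch, not a statement about how rbioapi matches names internally.

```r
# Hand-made stand-in for the data frame returned by rba_enrichr_libs().
# Only the "libraryName" column is mimicked; the real output has more
# rows and columns.
libs_mock <- data.frame(
  libraryName = c("MSigDB_Computational",
                  "MSigDB_Oncogenic_Signatures",
                  "MSigDB_Hallmark_2020",
                  "Table_Mining_of_CRISPR_Studies"),
  stringsAsFactors = FALSE
)

# Case-insensitive search for library names containing "msig"
msig_libs <- grep("msig", libs_mock$libraryName,
                  ignore.case = TRUE, value = TRUE)
print(msig_libs)

# Check that an exact library name exists before requesting it
has_hallmark <- "MSigDB_Hallmark_2020" %in% libs_mock$libraryName
print(has_hallmark)
```

With a live session, replacing `libs_mock` by `enrichr_libs` should give the same kind of result on the full library list.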


3 Enrichment analysis using Enrichr

To perform enrichment analysis on your gene-set with Enrichr using rbioapi, you can take one of two approaches. We will begin with the simpler one.

3.1 Approach 1: Using the Wrapper function

Fill in the arguments of rba_enrichr() according to the function’s manual: supply your gene-set as a character vector and select the libraries.

# 1 Create a character vector of gene identifiers (mostly gene symbols here)
genes <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42","CDK1","KIF23","PLK1",
           "RAC2","RACGAP1","RHOA","RHOB", "PHF14", "RBM3", "MSL1")

# 2.a Do enrichment analysis on your genes using "MSigDB_Hallmark_2020" library
enrichr_msig_hallmark <- rba_enrichr(gene_list = genes,
                                     gene_set_library = "MSigDB_Hallmark_2020")
# 2.b Maybe you want to perform enrichment analysis using every library that contains the word "msig":
enrichr_msig <- rba_enrichr(gene_list = genes,
                            gene_set_library = "msig",
                            regex_library_name = TRUE)
# 2.c Or maybe you want to perform enrichment analysis using every library available at Enrichr:
# enrichr_all <- rba_enrichr(gene_list = genes,
#                            gene_set_library = "all")

Note that when only one Enrichr library is selected, a data frame with the enrichment analysis results will be returned:

str(enrichr_msig_hallmark)
#> 'data.frame':    18 obs. of  9 variables:
#>  $ Term                : chr  "Mitotic Spindle" "G2-M Checkpoint" "E2F Targets" "Apoptosis" ...
#>  $ Overlap             : chr  "5/199" "4/200" "4/200" "3/161" ...
#>  $ P.value             : num  2.57e-07 1.22e-05 1.22e-05 2.17e-04 2.74e-03 ...
#>  $ Adjusted.P.value    : num  4.62e-06 7.29e-05 7.29e-05 9.76e-04 9.87e-03 ...
#>  $ Old.P.value         : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Old.Adjusted.P.value: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Odds.Ratio          : num  51 36.7 36.7 31.4 29.7 ...
#>  $ Combined.Score      : num  774 416 416 265 175 ...
#>  $ Genes               : chr  "CDC42;RACGAP1;PLK1;CDK1;KIF23" "RACGAP1;PLK1;CDK1;KIF23" "RACGAP1;PLK1;CDK1;BRCA1" "CDK2;BRCA1;RHOB" ...
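A result data frame like this one can be post-processed with base R. The sketch below is offline: `hallmark_mock` is rebuilt by hand from the first four rows of the str() output above, with the other columns omitted. The cutoff of 1e-4 is arbitrary, and the new column names `n_hits` and `term_size` are hypothetical names chosen for this illustration, based on reading “Overlap” as the number of input genes found in a term over the size of that term.

```r
# First four rows copied from the str() output above (other columns omitted)
hallmark_mock <- data.frame(
  Term = c("Mitotic Spindle", "G2-M Checkpoint", "E2F Targets", "Apoptosis"),
  Overlap = c("5/199", "4/200", "4/200", "3/161"),
  Adjusted.P.value = c(4.62e-06, 7.29e-05, 7.29e-05, 9.76e-04),
  Genes = c("CDC42;RACGAP1;PLK1;CDK1;KIF23", "RACGAP1;PLK1;CDK1;KIF23",
            "RACGAP1;PLK1;CDK1;BRCA1", "CDK2;BRCA1;RHOB"),
  stringsAsFactors = FALSE
)

# Keep terms below an (arbitrary) adjusted p-value cutoff
significant <- hallmark_mock[hallmark_mock$Adjusted.P.value < 1e-4, ]

# Split "Overlap" ("hits/term size") into two integer columns
overlap_parts <- strsplit(significant$Overlap, "/", fixed = TRUE)
significant$n_hits <- as.integer(vapply(overlap_parts, `[`, character(1), 1))
significant$term_size <- as.integer(vapply(overlap_parts, `[`, character(1), 2))

# "Genes" is a semicolon-separated string; turn it into a named list
genes_per_term <- setNames(strsplit(significant$Genes, ";", fixed = TRUE),
                           significant$Term)

print(significant[, c("Term", "n_hits", "term_size", "Adjusted.P.value")])
print(genes_per_term[["Mitotic Spindle"]])
```

The same steps should apply unchanged to `enrichr_msig_hallmark` from a live session.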

But when multiple libraries have been selected, the function’s output will be a list where each element is a data frame corresponding to one of the selected libraries:

str(enrichr_msig, 1)
#> List of 3
#>  $ MSigDB_Computational       :'data.frame': 195 obs. of  9 variables:
#>  $ MSigDB_Oncogenic_Signatures:'data.frame': 26 obs. of  9 variables:
#>  $ MSigDB_Hallmark_2020       :'data.frame': 18 obs. of  9 variables:
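When the result is a list of data frames, it is often convenient to stack them into one table that records which library each row came from. The sketch below shows one way to do this in base R. It runs offline on `results_mock`, a hand-made list: the two MSigDB_Hallmark_2020 rows are taken from the output shown earlier, while the MSigDB_Computational row is invented purely for illustration. The column name `Library` is also a choice made for this sketch.

```r
# Hand-made stand-in for a multi-library result (a named list of data frames)
results_mock <- list(
  MSigDB_Hallmark_2020 = data.frame(
    Term = c("Mitotic Spindle", "G2-M Checkpoint"),
    Adjusted.P.value = c(4.62e-06, 7.29e-05),
    stringsAsFactors = FALSE
  ),
  MSigDB_Computational = data.frame(
    Term = "hypothetical term",   # invented row, not a real Enrichr result
    Adjusted.P.value = 1e-05,     # invented value
    stringsAsFactors = FALSE
  )
)

# Tag each data frame with the name of its library
tagged <- lapply(names(results_mock), function(lib) {
  df <- results_mock[[lib]]
  df$Library <- lib
  df
})

# Stack the data frames and sort by adjusted p-value
combined <- do.call(rbind, tagged)
combined <- combined[order(combined$Adjusted.P.value), ]
rownames(combined) <- NULL
print(combined)
```

In a live session, `enrichr_msig` would take the place of `results_mock`, since all of its elements share the same nine columns.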

3.2 Approach 2: Going step-by-step

As the heading of Approach 1 indicates, rba_enrichr() is a wrapper function. It executes the following sequence of functions for you:

# 1 Get a list of available Enrichr libraries
libs <- rba_enrichr_libs(store_in_options = TRUE)

# 2 Submit your gene-set to Enrichr
list_id <- rba_enrichr_add_list(gene_list = genes)

# 3 Perform enrichment analysis with your uploaded gene-set
enriched <- rba_enrichr_enrich(user_list_id = list_id$userListId,
                               gene_set_library = "Table_Mining_of_CRISPR_Studies")

## As always, use str() to see what you have:
str(enriched, 1)
#> 'data.frame':    46 obs. of  9 variables:
#>  $ Term                : chr  "Reaction polymerase chain quantitative time, PMC6627898 (Table1)" "Anticancer corresponding targets potential non, PMC5981615 (Table1)" "Names alphabetical order corresponding listed, PMC6783930 (Table1)" "Supplemental, PMC6879830 (Tablesupplemental)" ...
#>  $ Overlap             : chr  "4/6" "3/25" "3/31" "10/2992" ...
#>  $ P.value             : num  3.07e-12 7.77e-07 1.51e-06 8.05e-06 6.26e-05 ...
#>  $ Adjusted.P.value    : num  1.41e-10 1.79e-05 2.32e-05 9.26e-05 4.80e-04 ...
#>  $ Old.P.value         : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Old.Adjusted.P.value: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Odds.Ratio          : num  3633.3 226.9 178.2 11.4 219.5 ...
#>  $ Combined.Score      : num  96318 3191 2388 134 2124 ...
#>  $ Genes               : chr  "CDC42;CDK2;CDK1;RHOA" "PLK1;CDK2;CDK1" "CDK2;CDK1;BRCA1" "CDC42;RBM3;RACGAP1;PLK1;CDK2;CDK1;PHF14;KIF23;BRCA1;MSL1" ...
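One practical use of the step-by-step approach is to upload the gene-set once and reuse the returned list ID for several libraries. The sketch below illustrates that pattern. `enrich_many()` is a hypothetical helper written for this example and is not part of rbioapi; it uses only the `user_list_id` and `gene_set_library` arguments shown above. To keep the block runnable without network access, it is demonstrated with `fake_enrich()`, an offline stand-in that returns placeholder data, and a made-up list ID.

```r
# Hypothetical helper: run one already-uploaded list against several libraries.
# `enrich_fun` defaults to rbioapi's rba_enrichr_enrich(); because R evaluates
# default arguments lazily, rbioapi is only needed when no replacement
# function is supplied.
enrich_many <- function(user_list_id, libraries,
                        enrich_fun = rbioapi::rba_enrichr_enrich) {
  results <- lapply(libraries, function(lib) {
    enrich_fun(user_list_id = user_list_id, gene_set_library = lib)
  })
  setNames(results, libraries)
}

# Offline stand-in for the API call; it returns placeholder data only
fake_enrich <- function(user_list_id, gene_set_library) {
  data.frame(Term = paste("placeholder term from", gene_set_library),
             stringsAsFactors = FALSE)
}

demo <- enrich_many(user_list_id = 123456,   # made-up ID
                    libraries = c("MSigDB_Hallmark_2020",
                                  "Table_Mining_of_CRISPR_Studies"),
                    enrich_fun = fake_enrich)
str(demo, 1)

# With a real upload (requires network access), the call might look like:
# enriched_list <- enrich_many(list_id$userListId,
#                              c("MSigDB_Hallmark_2020",
#                                "Table_Mining_of_CRISPR_Studies"))
```

Passing the API function as an argument keeps the looping logic testable offline; for everyday use, the rba_enrichr() wrapper from Approach 1 already covers the multi-library case.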

Please note: other services supported by rbioapi also provide over-representation analysis tools. See the vignette article Do with rbioapi: Over-Representation (Enrichment) Analysis in R (link to the documentation site) for an in-depth review.

4 See also in Functions’ manuals

Some rbioapi Enrichr functions were not covered in this vignette; be sure to check their manuals.


5 How to Cite?

To cite Enrichr (Please see https://maayanlab.cloud/Enrichr/help#terms):

To cite rbioapi: (Free access link to the article)


6 Session info

#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rbioapi_0.7.7
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29     R6_2.5.1          jsonlite_1.8.0    magrittr_2.0.3   
#>  [5] evaluate_0.15     httr_1.4.3        stringi_1.7.8     cachem_1.0.6     
#>  [9] rlang_1.0.4       cli_3.3.0         curl_4.3.2        rstudioapi_0.13  
#> [13] jquerylib_0.1.4   DT_0.23           bslib_0.4.0       rmarkdown_2.14   
#> [17] tools_4.2.1       stringr_1.4.0     htmlwidgets_1.5.4 crosstalk_1.2.0  
#> [21] xfun_0.31         yaml_2.3.5        fastmap_1.1.0     compiler_4.2.1   
#> [25] htmltools_0.5.3   knitr_1.39        sass_0.4.2