RAINBOWR: Reliable Association INference By Optimizing Weights with R

Kosuke Hamazaki (hamazaki@ut-biomet.org)

Hiroyoshi Iwata

2022-01-05

NOTE!!!!

The paper for RAINBOWR has been published in PLOS Computational Biology (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007663). If you use this RAINBOWR in your paper, please cite RAINBOWR as follows:
The stable version for RAINBOWR package is now available at the CRAN (Comprehensive R Archive Network).
Please check the change in RAINBOWR with the version update from NEWS.md.

In this repository, the R package RAINBOWR is available. Here, we describe how to install and how to use RAINBOWR.


What is RAINBOWR

RAINBOWR(Reliable Association INference By Optimizing Weights with R) is a package to perform several types of GWAS as follows.

RAINBOWR also offers some functions to solve the linear mixed effects model.

By utilizing these functions, you can estimate the genomic heritability and perform genomic prediction (GP).

Finally, RAINBOWR offers other useful functions.

Installation

The stable version of RAINBOWR is now available at the CRAN (Comprehensive R Archive Network). The latest version of RAINBOWR is also available at the KosukeHamazaki/RAINBOWR repository in the GitHub, so please run the following code in the R console.

#### Stable version of RAINBOWR ####
install.packages("RAINBOWR")  


#### Latest version of RAINBOWR ####
### If you have not installed yet, ...
install.packages("devtools")  

### Install RAINBOWR from GitHub
devtools::install_github("KosukeHamazaki/RAINBOWR")

If you get some errors via installation, please check if the following packages are correctly installed. (We removed a dependency on rgl package!)

In RAINBOWR, since part of the code is written in Rcpp (C++ in R), please check if you can use C++ in R. For Windows users, you should install Rtools.

If you have some questions about installation, please contact us by e-mail ().

Usage

First, import RAINBOWR package and load example datasets. These example datasets consist of marker genotype (scored with {-1, 0, 1}, 1,536 SNP chip (Zhao et al., 2010; PLoS One 5(5): e10780)), map with physical position, and phenotypic data (Zhao et al., 2011; Nature Communications 2:467). Both datasets can be downloaded from Rice Diversity homepage (http://www.ricediversity.org/data/).

### Import RAINBOWR
require(RAINBOWR)
#>  要求されたパッケージ RAINBOWR をロード中です

### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno

### View each dataset
See(Rice_geno_score)
#>                  L1        L2        L3        L4        L5        L6
#> class     <integer> <integer> <integer> <integer> <integer> <integer>
#> id1000223         1         1        -1        -1        -1        -1
#> id1000556        -1        -1         1         1        -1         1
#> id1000673        -1         1        -1         1        -1         1
#> id1000830        -1         1         1         1         1         1
#> id1000955        -1         1         1         1        -1         1
#> id1001073        -1        -1        -1         1         1        -1
#> [1] "class: data.frame"
#> [1] "dimension: 1311 x 395"
See(Rice_geno_map)
#>              marker       chr       pos
#> class      <factor> <integer> <integer>
#> id1000223 id1000223         1    420422
#> id1000556 id1000556         1    655693
#> id1000673 id1000673         1    740153
#> id1000830 id1000830         1    913806
#> id1000955 id1000955         1   1041748
#> id1001073 id1001073         1   1172387
#> [1] "class: data.frame"
#> [1] "dimension: 1311 x 3"
See(Rice_pheno)
#>       Flowering.time.at.Arkansas Flowering.time.at.Faridpur
#> class                  <numeric>                  <integer>
#> L1                   75.08333333                         64
#> L3                          89.5                         66
#> L4                          94.5                         67
#> L5                          87.5                         70
#> L6                   89.08333333                         73
#> L7                           105                       <NA>
#>       Flowering.time.at.Aberdeen FT.ratio.of.Arkansas.Aberdeen
#> class                  <integer>                     <numeric>
#> L1                            81                   0.926954733
#> L3                            83                   1.078313253
#> L4                            93                   1.016129032
#> L5                           108                   0.810185185
#> L6                           101                   0.882013201
#> L7                           158                   0.664556962
#>       FT.ratio.of.Faridpur.Aberdeen Culm.habit
#> class                     <numeric>  <numeric>
#> L1                      0.790123457          4
#> L3                      0.795180723        7.5
#> L4                      0.720430108          6
#> L5                      0.648148148        3.5
#> L6                      0.722772277          6
#> L7                             <NA>          3
#> [1] "class: data.frame"
#> [1] "dimension: 413 x 36"

You can check the original data format by See function. Then, select one trait (here, Flowering.time.at.Arkansas) for example.

### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- Rice_pheno[, trait.name, drop = FALSE]

For GWAS, first you can remove SNPs whose MAF <= 0.05 by MAF.cut function.

### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map

Next, we estimate additive genomic relationship matrix (GRM) by using calcGRM function.

### Estimate genomic relationship matrix (GRM) 
K.A <- calcGRM(genoMat = x)

Next, we modify these data into the GWAS format of RAINBOWR by modify.data function.

### Modify data
modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                               return.ZETA = TRUE, return.GWAS.format = TRUE)
#> Warning in modify.data(pheno.mat = y, geno.mat = x, map = map, return.ZETA =
#> TRUE, : The following lines have phenotypes but no genotypes: L14, L31, L33,
#> L68, L86, L97, L98, L102, L111, L136, L173, L175, L185, L193, L212, L223, L226,
#> L305, L358, L361, L648, L30, L36, L104, L229, L295, L319, L90, L253, L246
pheno.GWAS <- modify.data.res$pheno.GWAS
geno.GWAS <- modify.data.res$geno.GWAS
ZETA <- modify.data.res$ZETA

### View each data for RAINBOWR
See(pheno.GWAS)
#>       Sample_names Flowering.time.at.Arkansas
#> class  <character>                  <numeric>
#> L1              L1                   75.08333
#> L3              L3                   89.50000
#> L4              L4                   94.50000
#> L5              L5                   87.50000
#> L6              L6                   89.08333
#> L7              L7                  105.00000
#> [1] "class: data.frame"
#> [1] "dimension: 383 x 2"
See(geno.GWAS)
#>          marker     chrom       pos        L1        L3        L4
#> class  <factor> <integer> <integer> <integer> <integer> <integer>
#> 1     id1000223         1    420422         1        -1        -1
#> 2     id1000556         1    655693        -1         1         1
#> 3     id1000673         1    740153        -1        -1         1
#> 4     id1000830         1    913806        -1         1         1
#> 5     id1000955         1   1041748        -1         1         1
#> 6     id1001073         1   1172387        -1        -1         1
#> [1] "class: data.frame"
#> [1] "dimension: 1264 x 386"
str(ZETA)
#> List of 1
#>  $ A:List of 2
#>   ..$ Z: num [1:383, 1:383] 1 0 0 0 0 0 0 0 0 0 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...
#>   .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...
#>   ..$ K: num [1:383, 1:383] 1.0112 -0.45 -0.417 -0.0454 -0.4051 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...
#>   .. .. ..$ : chr [1:383] "L1" "L3" "L4" "L5" ...

ZETA is a list of genomic relationship matrix (GRM) and its design matrix.

Finally, we can perform GWAS using these data. First, we perform single-SNP GWAS by RGWAS.normal function as follows.

### Perform single-SNP GWAS
normal.res <- RGWAS.normal(pheno = pheno.GWAS, geno = geno.GWAS,
                           plot.qq = FALSE, plot.Manhattan = FALSE,
                           ZETA = ZETA, n.PC = 4, P3D = TRUE, count = FALSE)
#> [1] "GWAS for trait: Flowering.time.at.Arkansas"
#> [1] "Variance components estimated. Testing markers."
#> Time difference of 2.162821 secs
See(normal.res$D)  ### Column 4 contains -log10(p) values for markers
#>          marker     chrom       pos Flowering.time.at.Arkansas
#> class  <factor> <integer> <integer>                  <numeric>
#> 1     id1000223         1    420422                  0.4947885
#> 2     id1000556         1    655693                  0.3805267
#> 3     id1000673         1    740153                  0.3443146
#> 4     id1000830         1    913806                  0.1364734
#> 5     id1000955         1   1041748                  1.0212223
#> 6     id1001073         1   1172387                  0.5772126
#> [1] "class: data.frame"
#> [1] "dimension: 1264 x 4"

Next, we perform SNP-set GWAS by RGWAS.multisnp function.

### Perform SNP-set GWAS (by regarding 11 SNPs as one SNP-set, first 300 SNPs)
SNP_set.res <- RGWAS.multisnp(pheno = pheno.GWAS, geno = geno.GWAS[1:300, ], ZETA = ZETA,
                              plot.qq = FALSE, plot.Manhattan = FALSE, count = FALSE,
                              n.PC = 4, test.method = "LR", kernel.method = "linear", gene.set = NULL,
                              test.effect = "additive", window.size.half = 5, window.slide = 11)
#> [1] "GWAS for trait: Flowering.time.at.Arkansas"
#> Time difference of 2.227231 secs

See(SNP_set.res$D)  ### Column 4 contains -log10(p) values for markers
#>          marker     chrom       pos Flowering.time.at.Arkansas
#> class  <factor> <integer> <integer>                  <numeric>
#> 1     id1000223         1    420422                          0
#> 12    id1002158         1   2723270                          0
#> 23    id1004109         1   5067948                          0
#> 34    id1005263         1   6972700                          0
#> 45    id1007975         1  11107052                          0
#> 56    id1009557         1  14413616                          0
#> [1] "class: data.frame"
#> [1] "dimension: 28 x 4"

You can perform SNP-set GWAS with sliding window by setting window.slide = 1. And you can also perform gene-set (or haplotype-based) GWAS by assigning the following data set to gene.set argument.

ex.)

gene (or haplotype block) marker
gene_1 id1000556
gene_1 id1000673
gene_2 id1000830
gene_2 id1000955
gene_2 id1001516

Help

If you have some help before performing GWAS with RAINBOWR, please see the help for each function by ?function_name. You can also check how to determine each argument by

RGWAS.menu()

RGWAS.menu function asks some questions, and by answering these question, the function tells you how to determine which function use and how to set arguments.

References

Kennedy, B.W., Quinton, M. and van Arendonk, J.A. (1992) Estimation of effects of single genes on quantitative traits. J Anim Sci. 70(7): 2000-2012.

Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.

Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Kang, H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42(4): 348-354.

Zhang, Z. et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42(4): 355-360.

Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.

Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.

Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.

Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.