VariantScan: a machine leaning tool for genetic variant association testing

Xinghu Qin -- Berijing Institute of Genomics, Chinese Academy of Sciences & China National Center for Bioinformation

25-06-2022

Instruction

This package provides a set of tool for performing association tests to identify (e)QTLs in genome-wide association studies (GWAS,MWAS,EWAS,PWAS). It integrates three methods, Linear Model, Local Polynomial Fitting (Nonlinear Model) and Generalized Additive Model (GAM) to carry out genome-wide scanning. These methods can be also applied to case-control studies, where the ROC is used to assess the model performance.

This vignette demonstrates the utility of "VariantScan" for association testing in genomics. In this vignette, we take the simulated genotypes of 640 individuals as an example. We compared two methods, linear regression and local polynomial regression for identifying QTLs.

Install VariantScan

#Install from CRAN
#install.packages("VariantScan")
## or you can get the latest version of HierDpart from github
#library(devtools)

#install_github("xinghuq/VariantScan")
library("ggplot2")
library("VariantScan")

Load example data

# example genepop file
f <- system.file('extdata',package='VariantScan')
infile <- file.path(f, "sim1.csv")
## read genotype file
geno=read.csv(infile)

# traits
traitq=geno[,14]
genotype=geno[,-c(1:14)]

# get PCs as covariates

PCs=prcomp(genotype)
summary(PCs$x[,1:2])
#>       PC1                PC2          
#>  Min.   :-4.63463   Min.   :-4.21373  
#>  1st Qu.:-1.65826   1st Qu.:-1.25936  
#>  Median :-0.02212   Median : 0.04811  
#>  Mean   : 0.00000   Mean   : 0.00000  
#>  3rd Qu.: 1.61079   3rd Qu.: 1.19832  
#>  Max.   : 4.51837   Max.   : 4.64150

Doing Vscan using local polynomial regression fitting without specifying covariates

loessW=VScan(x=genotype,y=(traitq),methods ="loess")

Doing Vscan using local polynomial regression fitting using PCs as covariates


loessWcv=VScan(x=genotype,y=(traitq),U=PCs$x[,1:2],methods ="loess")

try linear model

lmW=VScan(x=genotype,y=(traitq),methods ="lm")

lmWcv=VScan(x=genotype,y=(traitq),U=PCs$x[,1:2],methods ="lm")

Visualizing the association signatures

Plot Manhattan plot

Linear model

## 
Loci<-rep("Neutral", 1000)
Loci[c(201,211,221,231,241,251,261,271,281,291)]<-"QTL"
Selected_Loci<-Loci[-which(Loci=="Neutral")]


g1=ggplot() +
  geom_point(aes(x=which(Loci=="Neutral"), y=-log10(lmWcv$p_norm$p.value[-which(Loci!="Neutral")])), col = "gray83") +
  geom_point(aes(x=which(Loci!="Neutral"), y=-log10(lmWcv$p_norm$p.value[-which(Loci=="Neutral")]), colour = Selected_Loci)) +
  xlab("SNPs") + ylab("-log10(p-value)") +ylim(c(0,35))+theme_bw()

g1

Fig.1 Manhattan plot for linear model

Local polynomial regression model


## Manhattan plot


g2=ggplot() +
  geom_point(aes(x=which(Loci=="Neutral"), y=-log10(loessWcv$p_norm$p.value[-which(Loci!="Neutral")])), col = "gray83") +
  geom_point(aes(x=which(Loci!="Neutral"), y=-log10(loessWcv$p_norm$p.value[-which(Loci=="Neutral")]), colour = Selected_Loci)) +
  xlab("SNPs") + ylab("-log10(p-value)") +ylim(c(0,35))+theme_bw()

g2

Fig.2 Manhattan plot for local polynomial regression model

References

W. S. Cleveland, E. Grosse and W. M. Shyu (1992) Local regression models. Chapter 8 of Statistical Models in S eds J.M. Chambers and T.J. Hastie, Wadsworth & Brooks/Cole.