Implementation

The following list shows the data and functions currently available.

Name	Description
ANALYSIS
AE3	AE model using nuclear family trios
bt	Bradley-Terry model for contingency table
ccsize	Power and sample size for case-cohort design
cs	Credibel set
fbsize	Sample size for family-based linkage and association design
gc.em	Gene counting for haplotype analysis
gcontrol	genomic control
gcontrol2	genomic control based on p values
gcp	Permutation tests using GENECOUNTING
gc.lambda	Estionmation of the genomic control inflation statistic (lambda)
genecounting	Gene counting for haplotype analysis
gif	Kinship coefficient and genetic index of familiality
grmMCMC	Mixed modeling with genetic relationship matrices
gsmr	Mendelian randomization analysis
hap	Haplotype reconstruction
hap.em	Gene counting for haplotype analysis
hap.score	Score statistics for association of traits with haplotypes
htr	Haplotype trend regression
hwe	Hardy-Weinberg equilibrium test for a multiallelic marker
hwe.cc	A likelihood ratio test of population Hardy-Weinberg equilibrium
hwe.hardy	Hardy-Weinberg equilibrium test using MCMC
invnormal	Inverse normal transformation
kin.morgan	kinship matrix for simple pedigree
LD22	LD statistics for two diallelic markers
LDkl	LD statistics for two multiallelic markers
lambda1000	A standardized estimate of the genomic inflation scaling to
	a study of 1,000 cases and 1,000 controls
log10p	log10(p) for a standard normal deviate
log10pvalue	log10(p) for a P value including its scientific format
logp	log(p) for a normal deviate
masize	Sample size calculation for mediation analysis
mia	multiple imputation analysis for hap
mtdt	Transmission/disequilibrium test of a multiallelic marker
mtdt2	Transmission/disequilibrium test of a multiallelic marker
	by Bradley-Terry model
mvmeta	Multivariate meta-analysis based on generalized least squares
pbsize	Power for population-based association design
pbsize2	Power for case-control association design
pfc	Probability of familial clustering of disease
pfc.sim	Probability of familial clustering of disease
pgc	Preparing weight for GENECOUNTING
print.hap.score	Print a hap.score object
s2k	Statistics for 2 by K table
sentinels	Sentinel identification from GWAS summary statistics
tscc	Power calculation for two-stage case-control design

GRAPHICS
asplot	Regional association plot
ESplot	Effect-size plot
circos.cnvplot	circos plot of CNVs
circos.cis.vs.trans.plot	circos plot of cis/trans classification
circos.mhtplot	circos Manhattan plot with gene annotation
cnvplot	genomewide plot of CNVs
METAL_forestplot	forest plot as R/meta’s forest for METAL outputs
makeRLEplot	make relative log expression plot
mhtplot	Manhattan plot
mhtplot2	Manhattan plot with annotations
pqtl2dplot	2D pQTL plot
pqtl2dplotly	2D pQTL plotly
pqtl3dplotly	3D pQTL plotly
mhtplot.trunc	truncated Manhattan plot
miamiplot	Miami plot
pedtodot	Converting pedigree(s) to dot file(s)
plot.hap.score	Plot haplotype frequencies versus haplotype score statistics
qqfun	Quantile-comparison plots
qqunif	Q-Q plot for uniformly distributed random variable

UTILITIES
SNP	Functions for single nucleotide polymorphisms (SNPs)
BFDP	Bayesian false-discovery probability
FPRP	False-positive report probability
ab	Test/Power calculation for mediating effect
b2r	Obtain correlation coefficients and their variance-covariances
chow.test	Chow’s test for heterogeneity in two regressions
chr_pos_a1_a2	Form SNPID from chromosome, posistion and alleles
cis.vs.trans.classification	a cis/trans classifier
comp.score	score statistics for testing genetic linkage of quantitative trait
GRM functions	ReadGRM, ReadGRMBin, ReadGRMPLINK,
	ReadGRMPCA, WriteGRM,
	WriteGRMBin, WriteGRMSAS
	handle genomic relationship matrix involving other software
get_b_se	Get b and se from AF, n, and z
get_pve_se	Get pve and its standard error from n, z
get_sdy	Get sd(y) from AF, n, b, se
h2G	A utility function for heritability
h2GE	A utility function for heritability involving gene-environment interaction
h2l	A utility function for converting observed heritability to its counterpart
	under liability threshold model
h2_mzdz	Heritability estimation according to twin correlations
klem	Haplotype frequency estimation based on a genotype table
	of two multiallelic markers
makeped	A function to prepare pedigrees in post-MAKEPED format
metap	Meta-analysis of p values
metareg	Fixed and random effects model for meta-analysis
muvar	Means and variances under 1- and 2- locus (diallelic) QTL model
pvalue	P value for a normal deviate
read.ms.output	A utility function to read ms output
revStrand	Allele on the reverse strand
runshinygap	Start shinygap
snptest_sample	A utility to generate SNPTEST sample file
twinan90	Classic twin models
whscore	Whittemore-Halpern scores for allele-sharing
weighted.median	Weighted median with interpolation

Assuming proper installation, you will be able to obtain the list by typing library(help=gap) or view the list within a web browser via help.start(). See Appendix on how to obtain a full list of functions.

This file can be viewed with command vignette("gap", package="gap").

You can cut and paste examples at end of each function’s documentation.

Both genecounting and hap are able to handle SNPs and multiallelic markers, with the former be flexible enough to include features such as X-linked data and the later being able to handle large number of SNPs. But they are unable to recode allele labels automatically, so functions gc.em and hap.em are in haplo.em format and used by a modified function hap.score in association testing.

It is notable that multilocus data are handled differently from that in hwde and elegant definitions of basic genetic data can be found in the genetics package.

Incidentally, I found my C mixed-radixed sorting routine⁶ is much faster than R’s internal function.

With exceptions such as function pfc which is very computer-intensive, most functions in the package can easily be adapted for analysis of large datasets involving either SNPs or multiallelic markers. Some are utility functions, e.g. muvar and whscore, which will be part of the other analysis routines in the future.

The benefit with R compared to standalone programs is that for users, all functions have unified format. For developers, it is able to incorporate their C/C++ programs more easily and avoid repetitive work such as preparing own routines for matrix algebra and linear models. Further advantage can be taken from packages in Bioconductor, which are designed and written to deal with large number of genes.

Independent programs

To facilitate comparisons and individual preferences, The source codes for 2LD, EHPLUS, GENECOUNTING, HAP, now hosted at GitHub, have enjoyed great popularity ahead of the genomewide association studies (GWAS) therefore are likely to be more familiar than their R couunterparts in gap. However, you need to follow their instructions to compile for a particular computer system.

I have included ms code (which is required by read.ms.output and .xls files to accompany read.ms.output and FPRP and BFDP functions as with a classic twin example for ACE model in OpenMx. The package is now available from CRAN.

For these models it is actually simpler to use facilities as in package mets, which is now in suggests.

A final category is twinan90, which is now dropped from the package function list due to difficulty to keep up with the requirements by the R environment but nevertheless you will still be able to compile and use otherwise.

Demos

This has been a template for self-contained examples:

library(gap)
demo(gap)

See examples of haplotype analysis there – additional examples are given below.

Examples

I would like to highlight pedtodot, pbsize, fbsize and ccsize functions used for pedigree drawing and power/sample calculations in a genome-wide asssociatoin study as reported.⁷

Pedigree drawing

I have included the original file for the R News as well as put examples in separate vignettes. They can be accessed via vignette("rnews",package="gap.examples") and vignette("pedtodot", package="gap.examples"), respectively.

Kinship calculation

Next, I will provide an example for kinship calculation using kin.morgan. It is recommended that individuals in a pedigree are ordered so that parents always precede their children. In this regard, package pedigree can be used, and package kinship2 can be used to produce pedigree diagram as with kinship matrix.

Pedigree diagram

The pedigree diagram is as follows,

library(gap)
#> Loading required package: gap.datasets
#> gap version 1.2.3-6
# pedigree diagram
data(lukas, package="gap.datasets")
library(kinship2)
#> Loading required package: Matrix
#> Loading required package: quadprog
ped <- with(lukas,pedigree(id,father,mother,sex))
plot(ped,cex=0.4)

A pedigree diagram

Kinship calculation

We then turn to the kinship calculation.

# unordered individuals
library(gap)
gk1 <- kin.morgan(lukas)
write.table(gk1$kin.matrix,"gap_1.txt",quote=FALSE)

library(kinship2)
kk1 <- kinship(lukas[,1],lukas[,2],lukas[,3])
write.table(kk1,"kinship_1.txt",quote=FALSE)

d <- gk1$kin.matrix-kk1
sum(abs(d))
#> [1] 2.443634

# order individuals so that parents precede their children
library(pedigree)
#> Loading required package: HaploSim
#> Loading required package: reshape
#> 
#> Attaching package: 'reshape'
#> The following object is masked from 'package:Matrix':
#> 
#>     expand
op <- orderPed(lukas)
olukas <- lukas[order(op),]
gk2 <- kin.morgan(olukas)

write.table(olukas,"olukas.csv",quote=FALSE)
write.table(gk2$kin.matrix,"gap_2.txt",quote=FALSE)

kk2 <- kinship(olukas[,1],olukas[,2],olukas[,3])
write.table(kk2,"kinship_2.txt",quote=FALSE)

z <- gk2$kin.matrix-kk2
sum(abs(z))
#> [1] 0

We see that in the second case, the result agrees with kinship2.

Study design

Family-based design

It now has an experimental work via Shiny from inst/shinygap.

The example involving family-based design is as follows,

options(width=150)
library(gap)
models <- matrix(c(
         4.0, 0.01,
         4.0, 0.10,
         4.0, 0.50, 
         4.0, 0.80,
         2.0, 0.01,
         2.0, 0.10,
         2.0, 0.50,
         2.0, 0.80,
         1.5, 0.01,    
         1.5, 0.10,
         1.5, 0.50,
         1.5, 0.80), ncol=2, byrow=TRUE)
outfile <- "fbsize.txt"
cat("gamma","p","Y","N_asp","P_A","H1","N_tdt","H2","N_asp/tdt",
    "L_o","L_s\n",file=outfile,sep="\t")
for(i in 1:12) {
    g <- models[i,1]
    p <- models[i,2]
    z <- fbsize(g,p)
    cat(z$gamma,z$p,z$y,z$n1,z$pA,z$h1,z$n2,z$h2,z$n3,
        z$lambdao,z$lambdas,file=outfile,append=TRUE,sep="\t")
    cat("\n",file=outfile,append=TRUE)
}
table1 <- read.table(outfile,header=TRUE,sep="\t")
nc <- c(4,7,9)
table1[,nc] <- ceiling(table1[,nc])
dc <- c(3,5,6,8,10,11)
table1[,dc] <- round(table1[,dc],2)
unlink(outfile)
# APOE-4, Scott WK, Pericak-Vance, MA & Haines JL
# Genetic analysis of complex diseases 1327
g <- 4.5
p <- 0.15
cat("\nAlzheimer's:\n\n")
#> 
#> Alzheimer's:
data.frame(fbsize(g,p))
#>   gamma    p         y       n1        pA        h1      n2        h2       n3  lambdao  lambdas
#> 1   4.5 0.15 0.6256916 162.6246 0.8181818 0.4598361 108.994 0.6207625 39.97688 1.671594 1.784353
table1
#>    gamma    p    Y   N_asp  P_A   H1 N_tdt   H2 N_asp.tdt  L_o  L_s
#> 1    4.0 0.01 0.52    6402 0.80 0.05  1201 0.11       257 1.08 1.09
#> 2    4.0 0.10 0.60     277 0.80 0.35   165 0.54        53 1.48 1.54
#> 3    4.0 0.50 0.58     446 0.80 0.50   113 0.42        67 1.36 1.39
#> 4    4.0 0.80 0.53    3024 0.80 0.24   244 0.16       177 1.12 1.13
#> 5    2.0 0.01 0.50  445964 0.67 0.03  6371 0.04      2155 1.01 1.01
#> 6    2.0 0.10 0.52    8087 0.67 0.25   761 0.32       290 1.07 1.08
#> 7    2.0 0.50 0.53    3753 0.67 0.50   373 0.47       197 1.11 1.11
#> 8    2.0 0.80 0.51   17909 0.67 0.27   701 0.22       431 1.05 1.05
#> 9    1.5 0.01 0.50 6944779 0.60 0.02 21138 0.03      8508 1.00 1.00
#> 10   1.5 0.10 0.51  101926 0.60 0.21  2427 0.25      1030 1.02 1.02
#> 11   1.5 0.50 0.51   27048 0.60 0.50  1039 0.49       530 1.04 1.04
#> 12   1.5 0.80 0.51  101926 0.60 0.29  1820 0.25      1030 1.02 1.02

Population-based design

The example involving population-based design is as follows,

library(gap)
kp <- c(0.01,0.05,0.10,0.2)
models <- matrix(c(
          4.0, 0.01,
          4.0, 0.10,
          4.0, 0.50, 
          4.0, 0.80,
          2.0, 0.01,
          2.0, 0.10,
          2.0, 0.50,
          2.0, 0.80,
          1.5, 0.01,    
          1.5, 0.10,
          1.5, 0.50,
          1.5, 0.80), ncol=2, byrow=TRUE)
outfile <- "pbsize.txt"
cat("gamma","p","p1","p5","p10","p20\n",sep="\t",file=outfile)
for(i in 1:dim(models)[1])
{
   g <- models[i,1]
   p <- models[i,2]
   n <- vector()
   for(k in kp) n <- c(n,ceiling(pbsize(k,g,p)))
   cat(models[i,1:2],n,sep="\t",file=outfile,append=TRUE)
   cat("\n",file=outfile,append=TRUE)
} 
table5 <- read.table(outfile,header=TRUE,sep="\t")
table5
#>    gamma    p      p1     p5    p10   p20
#> 1    4.0 0.01   46681   8959   4244  1887
#> 2    4.0 0.10    8180   1570    744   331
#> 3    4.0 0.50   10891   2091    991   441
#> 4    4.0 0.80   31473   6041   2862  1272
#> 5    2.0 0.01  403970  77530  36725 16323
#> 6    2.0 0.10   52709  10116   4792  2130
#> 7    2.0 0.50   35285   6772   3208  1426
#> 8    2.0 0.80   79391  15237   7218  3208
#> 9    1.5 0.01 1599920 307056 145448 64644
#> 10   1.5 0.10  192105  36869  17465  7762
#> 11   1.5 0.50   98013  18811   8911  3961
#> 12   1.5 0.80  192105  36869  17465  7762

Case-cohort design

For case-cohort design, we obtain results for ARIC and EPIC studies.

library(gap)
# ARIC study
outfile <- "aric.txt"
n <- 15792
pD <- 0.03
p1 <- 0.25
alpha <- 0.05
theta <- c(1.35,1.40,1.45)
beta <- 0.2
s_nb <- c(1463,722,468)
cat("n","pD","p1","hr","q","power","ssize\n",file=outfile,sep="\t")
for(i in 1:3)
{
  q <- s_nb[i]/n
  power <- ccsize(n,q,pD,p1,log(theta[i]),alpha,beta,TRUE)
  ssize <- ccsize(n,q,pD,p1,log(theta[i]),alpha,beta)
  cat(n,"\t",pD,"\t",p1,"\t",theta[i],"\t",q,"\t",
      signif(power,3),"\t",ssize,"\n",
      file=outfile,append=TRUE)
}
read.table(outfile,header=TRUE,sep="\t")
#>       n   pD   p1   hr          q power ssize
#> 1 15792 0.03 0.25 1.35 0.09264184   0.8  1463
#> 2 15792 0.03 0.25 1.40 0.04571935   0.8   722
#> 3 15792 0.03 0.25 1.45 0.02963526   0.8   468
unlink(outfile)
# EPIC study
outfile <- "epic.txt"
n <- 25000
alpha <- 0.00000005
beta <- 0.2
s_pD <- c(0.3,0.2,0.1,0.05)
s_p1 <- seq(0.1,0.5,by=0.1)
s_hr <- seq(1.1,1.4,by=0.1)
cat("n","pD","p1","hr","alpha","ssize\n",file=outfile,sep="\t")
# direct calculation
for(pD in s_pD)
{
   for(p1 in s_p1)
   {
      for(hr in s_hr)
      {
         ssize <- ccsize(n,q,pD,p1,log(hr),alpha,beta)
         if (ssize>0) cat(n,"\t",pD,"\t",p1,"\t",hr,"\t",alpha,"\t",
                          ssize,"\n",
                          file=outfile,append=TRUE)
      }
   }
}
read.table(outfile,header=TRUE,sep="\t")
#>        n  pD  p1  hr alpha ssize
#> 1  25000 0.3 0.1 1.3 5e-08 14391
#> 2  25000 0.3 0.1 1.4 5e-08  5732
#> 3  25000 0.3 0.2 1.2 5e-08 21529
#> 4  25000 0.3 0.2 1.3 5e-08  5099
#> 5  25000 0.3 0.2 1.4 5e-08  2613
#> 6  25000 0.3 0.3 1.2 5e-08 11095
#> 7  25000 0.3 0.3 1.3 5e-08  3490
#> 8  25000 0.3 0.3 1.4 5e-08  1882
#> 9  25000 0.3 0.4 1.2 5e-08  8596
#> 10 25000 0.3 0.4 1.3 5e-08  2934
#> 11 25000 0.3 0.4 1.4 5e-08  1611
#> 12 25000 0.3 0.5 1.2 5e-08  7995
#> 13 25000 0.3 0.5 1.3 5e-08  2786
#> 14 25000 0.3 0.5 1.4 5e-08  1538
#> 15 25000 0.2 0.1 1.4 5e-08  9277
#> 16 25000 0.2 0.2 1.3 5e-08  7725
#> 17 25000 0.2 0.2 1.4 5e-08  3164
#> 18 25000 0.2 0.3 1.3 5e-08  4548
#> 19 25000 0.2 0.3 1.4 5e-08  2152
#> 20 25000 0.2 0.4 1.2 5e-08 20131
#> 21 25000 0.2 0.4 1.3 5e-08  3648
#> 22 25000 0.2 0.4 1.4 5e-08  1805
#> 23 25000 0.2 0.5 1.2 5e-08 17120
#> 24 25000 0.2 0.5 1.3 5e-08  3422
#> 25 25000 0.2 0.5 1.4 5e-08  1713
#> 26 25000 0.1 0.2 1.4 5e-08  8615
#> 27 25000 0.1 0.3 1.4 5e-08  3776
#> 28 25000 0.1 0.4 1.3 5e-08 13479
#> 29 25000 0.1 0.4 1.4 5e-08  2824
#> 30 25000 0.1 0.5 1.3 5e-08 10837
#> 31 25000 0.1 0.5 1.4 5e-08  2606
unlink(outfile)

Graphics

I now include some figures from the documentation that may be of interest.

Genome-wide association

The following code is used to obtain a Q-Q plot via qqunif function,

library(gap)
u_obs <- runif(1000)
r <- qqunif(u_obs,pch=21,bg="blue",bty="n")
u_exp <- r$y
hits <- u_exp >= 2.30103
points(r$x[hits],u_exp[hits],pch=21,bg="green")
legend("topleft",sprintf("GC.lambda=%.4f",gc.lambda(u_obs)))

A Q-Q plot

Based on a chicken genome scan data, the code below generates a Manhattan plot, demonstrating the use of gaps to separate chromosomes.

ord <- with(w4,order(chr,pos))
w4 <- w4[ord,]
oldpar <- par()
par(cex=0.6)
colors <- c(rep(c("blue","red"),15),"red")
mhtplot(w4,control=mht.control(colors=colors,gap=1000),pch=19,srt=0)
axis(2,cex.axis=2)
suggestiveline <- -log10(3.60036E-05)
genomewideline <- -log10(1.8E-06)
abline(h=suggestiveline, col="blue")
abline(h=genomewideline, col="green")
abline(h=0)

A genome-wide association study on chickens

The code below obtains a Manhattan plot with gene annotation,

data <- with(mhtdata,cbind(chr,pos,p))
glist <- c("IRS1","SPRY2","FTO","GRIK3","SNED1","HTR1A","MARCH3","WISP3",
           "PPP1R3B","RP1L1","FDFT1","SLC39A14","GFRA1","MC4R")
hdata <- subset(mhtdata,gene%in%glist)[c("chr","pos","p","gene")]
color <- rep(c("lightgray","gray"),11)
glen <- length(glist)
hcolor <- rep("red",glen)  
par(las=2, xpd=TRUE, cex.axis=1.8, cex=0.4)
ops <- mht.control(colors=color,yline=1.5,xline=3)
hops <- hmht.control(data=hdata,colors=hcolor)
mhtplot(data,ops,hops,pch=19)
axis(2,pos=2,at=1:16,cex.axis=0.5)

A Manhattan plot with gene annotation

All these look familiar, so revised form of the function called mhtplot2 was created for additional features such as centering the chromosome ticks, allowing for more sophisticated coloring schemes, using prespecified fonts, etc. Please refer to the function’s documentation example.

We could also go further with a circos Manhattan plot,

circos.mhtplot(mhtdata, glist)
#> Note: 11 points are out of plotting region in sector 'chr16', track '3'.

A circos Manhattan plot

We now experiment with Miami plot,

mhtdata <- within(mhtdata,{pr=p})
miamiplot(mhtdata,chr="chr",bp="pos",p="p",pr="pr",snp="rsn")

A Miami plot

We now illustrate with a truncated Manhattan plot,

par(oma=c(0,0,0,0), mar=c(5,6.5,1,1))
mhtplot.trunc(IL.12B, chr="Chromosome", bp="Position", z="Z",
   snp="MarkerName",
   suggestiveline=FALSE, genomewideline=-log10(5e-10),
   cex.mtext=1.2, cex.text=0.7,
   annotatelog10P=-log10(5e-10), annotateTop = FALSE,
   highlight=with(genes,gene),
   mtext.line=3, y.brk1=115, y.brk2=200, trunc.yaxis=TRUE, delta=0.01,
   cex.axis=1.2, cex=0.5, font=2, font.axis=1,
   y.ax.space=20,
   col = c("blue4", "skyblue")
)

Association of IL-12B

The code below obtains a regional association plot with the asplot function,

asplot(CDKNlocus, CDKNmap, CDKNgenes, best.pval=5.4e-8, sf=c(3,6))
#> - CDKN2A 
#> - CDKN2B
title("CDKN2A/CDKN2B Region")

A regional association plot

The function predates the currently popular locuszoom software but leaves the option open for generating such plots on the fly and locally.

Note that all these can serve as templates to customize features of your own.

Copy number variation

A plot of copy number variation (CNV) is shown here,

cnvplot(gap.datasets::cnv)

A CNV plot

Effect size plot

The code below obtains an effect size plot via the ESplot function.

library(gap)
rs12075 <- data.frame(id=c("CCL2","CCL7","CCL8","CCL11","CCL13","CXCL6","Monocytes"),
                      b=c(0.1694,-0.0899,-0.0973,0.0749,0.189,0.0816,0.0338387),
                      se=c(0.0113,0.013,0.0116,0.0114,0.0114,0.0115,0.00713386))
ESplot(rs12075)

rs12075 and traits

Forest plot

It is possible to draw many forest plots given a list of variants; an example is given (only the first one here) here

data(OPG,package="gap.datasets")
METAL_forestplot(OPGtbl[2:6,],OPGall,OPGrsid,width=8.75,height=5)
#> Joining, by = "MarkerName"
#> Joining, by = "MarkerName"
#> Loading required namespace: meta

Forest plots

Significance level for z ~ Normal(0,1)

When a Normally distributed association statistic z is very large, its corresponding p value is very small. A genomewide significance is declared at 0.05/1000000=5e-8 with Bonferroni correction assuming 1 million SNPs are tested. This short note describes how to get -log10(p), which can be used in a Q-Q plot and software such as DEPICT. The solution here is generic since z is also the square root of a chi-squared statistic, for instance.

log(p) and log10(p)

First thing first, we therefore have the R function z <- function(p) qnorm(p/2,lower.tail=FALSE).

First thing first, here are the answers for log(p) and log10(p) given z,

# log(p) for a standard normal deviate z based on log()
logp <- function(z) log(2)+pnorm(-abs(z), lower.tail=TRUE, log.p=TRUE)

# log10(p) for a standard normal deviate z based on log()
log10p <- function(z) log(2, base=10)+pnorm(-abs(z), lower.tail=TRUE, log.p=TRUE)/log(10)

Note logp() will be used for functions such as qnorm() as in R/gap function cs() whereas log10p() is more appropriate for Manhattan plot and used in R/gap sentinels().

The schematic diagram is shown below.

Normal(0,1) distribution

Rationale

We start with z=1.96 whose corresponding p value is approximately 0.05.

2*pnorm(-1.96,lower.tail=TRUE)

giving an acceptable value 0.04999579, so we proceed to get log10(p)

log10(2)+log10(pnorm(-abs(z),lower.tail=TRUE))

leading to the expression above from the fact that log10(X)=log(X)/log(10) since log(), being the natural log function, ln() – so log(exp(1)) = 1, in R, works far better on the numerator of the second term. The use of -abs() just makes sure we are working on the lower tail of the standard Normal distribution from which our p value is calculated.

Benchmark

Now we have a stress test,

z <- 20000
-log10p(z)

giving -log10(p) = 86858901.

p, log(p), log10(p) and the multiple precision arithmetic

We would be curious about the p value itself as well, which is furnished with the Rmpfr package

require(Rmpfr)
2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE)
mpfr(log(2),100) + pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=TRUE)

giving p = 1.660579603192917090365313727164e-86858901 and -log(p) = -200000010.1292789076808554854177, respectively. To carry on we have -log10(p) = -log(p)/log(10)=86858901.

To make -log10(p) usable in R we obtain it directly through

as.numeric(-log10(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE)))

which actually yields exactly the same 86858901.

If we go very far to have z=50000. then -log10(p)=542868107 but we have less luck with Rmpfr.

One may wonder the P value in this case, which is 6.6666145952e-542868108 or simply 6.67e-542868108.

The magic function for doing this is defined as follows,

pvalue <- function (z, decimals = 2)
{
    lp <- -log10p(z)
    exponent <- ceiling(lp)
    base <- 10^-(lp - exponent)
    paste0(round(base, decimals), "e", -exponent)
}

and it is more appropriate to express p values in scientific format so they can be handled as follows,

log10pvalue <- function(p=NULL,base=NULL,exponent=NULL)
{
  if(!is.null(p))
  {
    p <- format(p,scientific=TRUE)
    p2 <- strsplit(p,"e")
    base <- as.numeric(lapply(p2,"[",1))
    exponent <- as.numeric(lapply(p2,"[",2))
  } else if(is.null(base) | is.null(exponent)) stop("base and exponent should both be specified")
  log10(base)+exponent
}

used as log10pvalue(p) when p<=1e-323, or log10pvalue(base=1,exponent=-323) otherwise.

One can also derive logpvalue for natural base (e) similarly.

We end with a quick look-up table

require(gap)
v <- data.frame()
for (z in c(5,10,30,40,50,100,500,1000,2000,3000,5000))
{
  vi <- c(z,pvalue(z),logp(z),log10p(z))
  v <- rbind(v,vi)
}
names(v) <- c("z","P","log(P)","log10(P)")
knitr::kable(v,caption="Table 1. z,P,log(P) and log10(P)")

Table 1. z,P,log(P) and log10(P)
z	P	log(P)	log10(P)
5	5.73e-7	-14.3718512134288	-6.24161567672667
10	1.52e-23	-52.5381379699525	-22.8170234098221
30	9.81e-198	-453.628096775783	-197.008179265997
40	7.31e-350	-803.915294833194	-349.135976463682
50	2.16e-545	-1254.13821395886	-544.665305866333
100	2.69e-2174	-5004.83106151365	-2173.57051287337
500	2.47e-54290	-125006.440403451	-54289.6072695865
1000	4.58e-217151	-500007.133547632	-217150.339011999
2000	4.34e-868593	-2000007.82669406	-868592.362896546
3000	1.8e-1954329	-4500008.23215903	-1954328.74374587
5000	1.51e-5428685	-12500008.7429846	-5428684.82082061

Application

The mhtplot.trunc() function in R/gap accepts three types of arguments:

1. P values of association statistics, which could be very small.
log10p. log10(P).
1. normal statistics that could be very large.

In all three cases, a log10(P) counterpart is obtained internally and to accommodate extreme value, the y-axis allows for truncation leaving out a given range to highlight the largest.

A plot for the GIANT data is shown here, https://jinghuazhao.github.io/Omics-analysis/BMI/.

Fine-mapping

The function cs obtains credible set.

# zcat METAL/4E.BP1-1.tbl.gz | \
# awk 'NR==1 || ($1==4 && $2 >= 187158034 - 1e6 && $2 < 187158034 + 1e6)' >  4E.BP1.z
tbl <- within(read.delim("4E.BP1.z"),{logp <- logp(Effect/StdErr)})
z <- cs(tbl)
l <- cs(tbl,log_p="logp")

Linear regression

Several functions related to linear regression are detailed here.

Some preparations

Let \(\mbox{x} = SNP\ dosage\). Note that \(\mbox{Var}(\mbox{x})=2f(1-f)\), \(f=MAF\) or \(1-MAF\) by symmetry.

Our linear regression model is \(\mbox{y}=a + b\mbox{x} + e\). We have \(\mbox{Var}(\mbox{y}) = b^2\mbox{Var}(\mbox{x}) + \mbox{Var}(e)\). Moreover, \(\mbox{Var}(b)=\mbox{Var}(e)(\mbox{x}'\mbox{x})^{-1}=\mbox{Var}(e)/S_\mbox{xx}\), we have \(\mbox{Var}(e) = \mbox{Var}(b)S_\mbox{xx} = N \mbox{Var}(b) \mbox{Var}(\mbox{x})\). Consequently, let \(z = {b}/{SE(b)}\), we have

\[\begin{eqnarray*} \mbox{Var}(\mbox{y}) &=& \mbox{Var}(\mbox{x})(b^2+N\mbox{Var}(b)) \hspace{100cm} \cr &=& \mbox{Var}(\mbox{x})\mbox{Var}(\mbox{b})(z^2+N) \cr &=& 2f(1-f)(z^2+N)\mbox{Var}(b) \end{eqnarray*}\]

Moreover, the mean and the variance of the multiple correlation coefficient or the coefficient of determination (\(R^2\)) are known⁸ to be \({1}/{(N-1)}\) and \({2(N-2)}/{\left[(N-1)^2(N+1)\right]}\), respectively.

We also need some established results of a ratio (R/S)¹, i.e., the mean

\[ \begin{align} E(R/S) \approx \frac{\mu_R}{\mu_S}-\frac{\mbox{Cov}(R,S)}{\mu_S^2}+\frac{\sigma_S^2\mu_R}{\mu_S^3} \hspace{100cm} \end{align} \]

and more importantly the variance

\[ \begin{align} \mbox{Var}(R/S) \approx \frac{\mu_R^2}{\mu_S^2} \left[ \frac{\sigma_R^2}{\mu_R^2} -2\frac{\mbox{Cov}(R,S)}{\mu_R\;\mu_S} +\frac{\sigma_S^2}{\mu_S^2} \right] \hspace{100cm} \end{align} \]

where \(\mu_R\), \(\mu_S\), \(\sigma_R^2\), \(\sigma_S^2\) are the means and the variances for R and S, respectively.

Finally, we need some facts about \(\chi_1^2\), \(\chi^2\) distribution of one degree of freedom. For \(z \sim N(0,1)\), \(z^2\sim \chi_1^2\), whose mean and variance are 1 and 2, respectively.

We now have the following results.

The proportion of variants explained (PVE)

From above we have

\[ \begin{align} \mbox{PVE}_{\mbox{linear regression}} & = \frac{\mbox{Var}(b\mbox{x})}{\mbox{Var}(\mbox{y})} \hspace{100cm} \\ & = \frac{\mbox{Var}(\mbox{x})b^2}{\mbox{Var}(\mbox{x})(b^2+N\mbox{Var}(b))} \\ & = \frac{\mbox{z}^2}{\mbox{z}^2+N} \end{align} \]

On the other hand, for a simple linear regression \(R^2\equiv r^2\) where \(r\) is the Pearson correlation coefficient, which is readily from the \(t\)-statistic of the regression slope, i.e., \(r={t}/{\sqrt{t^2+N-2}}\). so assuming \(t \equiv \ z \sim \chi_1^2\)

\[ \begin{align} \mbox{PVE}_{t-\mbox{statistic}} & = \frac{\chi^2}{\chi^2+N-2} \hspace{100cm} \end{align} \]

To obtain coherent estimates of the asymptotic means and variances of both forms we resort to variance of a ratio (R/S). All the required elements are listed in a table below.

Characteristics	Linear regression	\(t\)-statistic
\(\mu_R\)	1	1
\(\sigma_R^2\)	2	2
\(\mu_S\)	\(N+1\)	\(N-1\)
\(\sigma_S^2\)	2	2
\(\mbox{Cov}(R,S)\)	2	2

then we have the means and the variances for PVE.

Characteristics	Linear regression	\(t\)-statistic
mean	\(\frac{1}{N+1}\left[1-\frac{2}{N+1}+\frac{2}{(N+1)^2}\right]\)	\(\frac{1}{N-1}\left[1-\frac{2}{N-1}+\frac{2}{(N-1)^2}\right]\)
variance	\(\frac{2}{(N+1)^2}\left[1-\frac{1}{N+1}\right]^2\)	\(\frac{2}{(N-1)^2}\left[1-\frac{1}{N-1}\right]^2\)

Finally, our approximation of PVE for a protein with \(T\) independent pQTLs from the meta-analysis

Characteristics	Linear regression	\(t\)-statistic
estimate	\(\sum_{i=1}^T{\frac{\chi_i^2}{\chi_i^2+N_i}}\)	\(\sum_{i=1}^T{\frac{\chi_i^2}{\chi_i^2+N_i-2}}\)
variance	\(\sum_{i=1}^T\frac{2}{(N_i+1)^2}\)	\(\sum_{i=1}^T\frac{2}{(N_i-1)^2}\)

The effect size and standard error estimates

When \(\mbox{Var}(\mbox{y})=1\), as in cis eQTLGen data, we have \(b\) and its standard error (se) as follows,

\[ \begin{align} b & = z/d \hspace{100cm} \\ se & = 1/d \end{align} \]

where \(d = \sqrt{2f(1-f)(z^2+N)}\).

Now three functions are in place.

get_b_se

A record of the eQTLGen data is shown below

        SNP    Pvalue SNPChr  SNPPos AssessedAllele OtherAllele Zscore
 rs1003563 2.308e-06     12 6424577              A           G 4.7245
            Gene GeneSymbol GeneChr GenePos NrCohorts NrSamples         FDR
 ENSG00000111321       LTBR      12 6492472        34     23991 0.006278872
 BonferroniP hg19_chr hg19_pos AlleleA AlleleB allA_total allAB_total
           1       12  6424577       A       G       2574        8483
 allB_total AlleleB_all
       7859   0.6396966

from which we obtain the effect size and its standard error as follows,

get_b_se(0.6396966,23991,4.7245)
#>               b          se
#> [1,] 0.04490488 0.009504684

get_pve_se

This function obtains proportion of explained variation (PVE) from n, z; its standard error is based on variance of the ratio (correction=TRUE) or \(r^2\).

get_sdy

We continue with the eQTLGen example above,

get_sdy(0.6396966,23991,0.04490488,0.009504684)
#> [1] 1

and indeed the eQTLGen data were standardized.

Polygenic modeling

In line with the recent surge of interest in the polygenic models, a separate vignette is available through vignette("h2",package="gap.examples") demonstrating aspect of the models on heritability.

Brief documentations are available for functions h2G, h2GE and h2l.

Mendelian randomization

The function was originally developed to rework on data generated from GSMR, although it could be any harmonised data.

knitr::kable(mr,caption="Table 1. LIF.R and CAD/FEV1")

Table 1. LIF.R and CAD/FEV1
SNP	b.LIF.R	SE.LIF.R	b.FEV1	SE.FEV1	b.CAD	SE.CAD
rs188743906	0.6804	0.1104	0.00177	0.01660	NA	NA
rs2289779	-0.0788	0.0134	0.00104	0.00261	-0.007543	0.0092258
rs117804300	-0.2281	0.0390	-0.00392	0.00855	0.109372	0.0362219
rs7033492	-0.0968	0.0147	-0.00585	0.00269	0.022793	0.0119903
rs10793962	0.2098	0.0212	0.00378	0.00536	-0.014567	0.0138196
rs635634	-0.2885	0.0153	-0.02040	0.00334	0.077157	0.0117123
rs176690	-0.0973	0.0142	0.00293	0.00306	-0.000007	0.0107781
rs147278971	-0.2336	0.0378	-0.01240	0.00792	0.079873	0.0397491
rs11562629	0.1155	0.0181	0.00960	0.00378	-0.010040	0.0151460

res <- gsmr(mr, "LIF.R", c("CAD","FEV1"),other_plots=TRUE)

Mendelian randomization

f <- "INF1_CAD-FEV1.csv"
write.table(with(res,r), file=f, quote=FALSE, col.names=FALSE, row.names=FALSE, sep=",")
top <- function(r)
       sapply(c("IVW","EGGER","WM","PWM"), function(x) as.numeric(gap::pvalue(r[[paste0("b",x)]]/r[[paste0("seb",x)]])))
r <- read.csv(f,as.is=TRUE)
p <- top(r)
knitr::kable(data.frame(r,p),caption="Table 2. LIFR variant rs635634 and CAD/FEV1",digits=3)

Table 2. LIFR variant rs635634 and CAD/FEV1
IV	bIVW	sebIVW	CochQ	CochQp	bEGGER	sebEGGER	intEGGER	seintEGGER	bWM	sebWM	bPWM	sebPWM	IVW	EGGER	WM	PWM
LIF.R.CAD	-0.187	0.050	2.116	0.953	-0.184	0.060	0.001	0.010	-0.247	0.045	-0.248	0.044	0	0.002	0	0
LIF.R.FEV1	0.045	0.012	0.482	1.000	0.049	0.015	0.001	0.002	0.062	0.012	0.055	0.014	0	0.001	0	0

unlink(f)

The nearest reference for the implemented approaches is Ligthart et al.⁹

We would be of interest to contrast their effect sizes as well,

mr_names <- names(mr)
LIF.R <- cbind(mr[grepl("SNP|LIF.R",mr_names)],trait="LIF.R"); names(LIF.R) <- c("SNP","b","se","trait")
FEV1 <- cbind(mr[grepl("SNP|FEV1",mr_names)],trait="FEV1"); names(FEV1) <- c("SNP","b","se","trait")
CAD <- cbind(mr[grepl("SNP|CAD",mr_names)],trait="CAD"); names(CAD) <- c("SNP","b","se","trait")
mr2 <- within(rbind(LIF.R,FEV1,CAD),{y=b})
library(ggplot2)
p <- ggplot(mr2,aes(y = SNP, x = y))+
theme_bw()+
geom_point()+
facet_wrap(~trait,ncol=3,scales="free_x")+
geom_segment(aes(x = b-1.96*se, xend = b+1.96*se, yend = SNP))+
geom_vline(lty=2, aes(xintercept=0), colour = 'red')+
xlab("Effect size")+
ylab("")
p
#> Warning: Removed 1 rows containing missing values (geom_point).
#> Warning: Removed 1 rows containing missing values (geom_segment).

Combined forest plots for LIF.R, FEV1 and CAD

Miscellaneous functions

chr_pos_a1_a2 and inv_chr_pos_a1_a2

They are functions to handle SNPid.

require(gap)
s <- chr_pos_a1_a2(1,c(123,321),letters[1:2],letters[2:1])
inv_chr_pos_a1_a2(s)
#>               chr pos a1 a2
#> chr1:123_A_B chr1 123  A  B
#> chr1:321_A_B chr1 321  A  B
inv_chr_pos_a1_a2("chr1:123-A_B",seps=c(":","-","_"))
#>               chr pos a1 a2
#> chr1:123-A_B chr1 123  A  B

gc.lambda

The definition is as follows,

gc.lambda <- function(p) {
  p <- p[!is.na(p)]
  n <- length(p)

  obs <- qchisq(p,1,lower.tail=FALSE)
  exp <- qchisq(1:n/n,1,lower.tail=FALSE)

  lambda <- median(obs)/median(exp)
  return(lambda)
}

# A simplified version is as follows,
# obs <- median(chisq)
# exp <- qchisq(0.5, 1) # 0.4549364
# lambda <- obs/exp
# see also estlambda from GenABEL and qq.chisq from snpStats

# A related function

lambda1000 <- function(lambda, ncases, ncontrols)
  1 + (lambda - 1) * (1 / ncases + 1 / ncontrols)/( 1 / 1000 + 1 / 1000)

invnormal

The function is widely used in various consortium analyses and defined as follows,

invnormal <- function(x) qnorm((rank(x,na.last="keep")-0.5)/sum(!is.na(x)))

An example use on data from Poisson distribution is as follows,

set.seed(12345)
Ni <- rpois(50, lambda = 4); table(factor(Ni, 0:max(Ni)))
#> 
#>  0  1  2  3  4  5  6  7  8  9 
#>  2  4  6 11  8 10  4  2  2  1
y <- invnormal(Ni)
sd(y)
#> [1] 0.9755074
mean(y)
#> [1] 0.002650512
Ni <- 1:50
y <- invnormal(Ni)
mean(y)
#> [1] -1.810184e-17
sd(y)
#> [1] 0.9973999

revStrand

This functions obtains allele(s) on the opposite strand.

alleles <- c("a","c","G","t")
revStrand(alleles)
#> [1] "t" "g" "C" "a"

snptest_sample

This is a function to output sample file for SNPTEST.

d <- data.frame(ID_1=1,ID_2=1,missing=0,PC1=1,PC2=2,D1=1,P1=10)
snptest_sample(d,C=paste0("PC",1:2),D=paste0("D",1:1),P=paste0("P",1:1))

The commands above generates a file named ``snptest.sample.

Known bugs

Unaware of any bug. However, better memory management is expected. There are apparent issues with the Shiny interfaces for power/sample size calculation which produce less outputs for certain configurations than expected.

Summary

I believe by now the package should have given you a flavour of initiatives I have made so far in relation to how the project was envisaged. More importantly, it is clear that availability of the package will serve as a platform on which future work can be accumulated and collaboration can be built.

Bibliography

Guo, S. W. & Lange, K. Genetic mapping of complex traits: Promises, problems, and prospects. Theor Popul Biol 57, 1–11 (2000).

Zhao, J. H. & Tan, Q. Integrated analysis of genetic data with R. Hum Genomics 2, 258–65 (2006).

Zhao, J. H. & Tan, Q. Genetic dissection of complex traits in silico: Approaches, problems and soluations. Current Bioinformatics 1, 359–369 (2006).

Zhao, J. H. Mixed-effects Cox models of alcohol dependence in extended families. BMC Genet 6, S127 (2005).

Zhao, J. H. Pedigree-drawing with R and graphviz. Bioinformatics 22, 1013–4 (2006).

Zhao, J. H. & Sham, P. C. Generic number systems and haplotype analysis. Comp Meth Prog Biomed 70, 1–9 (2003).

Zhao, J. H. gap: Genetic analysis package. Journal of Statistical Software 23, 1–18 (2007).

Kotz, S., Read, C. B., Balakrishnan, N., Vidakovic, B. & Johnson, N. L. Encyclopedia of statistical sciences. (John Wiley & Sons, Inc., 2006).

Ligthart, S. et al. Genome analyses of >200,000 individuals identify 58 loci for chronic inflammation and highlight pathways that link inflammation and complex disorders. Am J Hum Genet 103, 691–706 (2018).

10.

Chow, G. C. Tests of equality between sets of coefficients in two linear regression. Econometrica 28, 591–605 (1960).

11.

Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).

12.

Elston, R. C. On the correlation between correlations. Biometrika 62, 133–140 (1975).

13.

Gholamic, K. & Thomas, A. A linear time algorithm for calculation of multiple pairwise kinship coefficients and genetic index of familiality. Comp Biomed Res 27, 342–350 (1994).

14.

Hartung, J., Knapp, G. & Sinha, B. K. Statistical meta-analysis with applications. (Wiley, 2008).

15.

Hirotsu, C., Aoki, S., Inada, T. & Kitao, Y. An exact test for the association between the disease and alleles at highly polymorphic loci with particular interest in the haplotype analysis. Biometrics 57, 769–778 (2001).

16.

Miller, M. B. Genomic scanning and the transmission/disequilibrium test: Analysis of error rates. Genet Epidemiol 14, 851–856 (1997).

17.

Risch, N. & Merikangas, K. Reply to Scott el al. Science 275, 1329–1330. (1997).

18.

Sham, P. C. Transmission/disequilibrium tests for multiallelic loci. Am J Hum Genet 61, 774–778 (1997).

19.

Sham, P. C. Statistics in human genetics. (Edward Arnold, 1998).

20.

Skol, A. D., Scott, L. J., Abecasis, G. R. & Boehnke, M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 38, 209–13 (2006).

21.

Spielman, R. S. & Ewens, W. J. The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 59, 983–9 (1996).

22.

Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. & Rothman, N. Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. J Natl Cancer Inst 96, 434–42 (2004).

23.

Wakefield, J. A bayesian measure of the probability of false discovery in genetic epidemiology studies. Am J Hum Genet 81, 208–226 (2007).

24.

Wang, K. A likelihood approach for quantitative-trait-locus mapping with selected pedigrees. Biometrics 61, 465–473 (2005).

25.

Williams, C. J., Christian, J. C. & Norton, J. A. Jr. TWINAN90: A FORTRAN programfor conducting ANOVA-based and likelihood-based analyses of twin data. Comp Meth Prog Biomed 38, 167–76 (1992).

26.

Zaykin, D. V. et al. Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered 53, 79–91 (2002).

27.

Zhao, J. H., Sham, P. C. & Curtis, D. A program for the Monte Carlo evaluation of significance of the extended transmission/disequilibrium test. Am J Hum Genet 64, 1484–1485 (1999).

28.

Zhao, J. H., Lissarrague, S., Essioux, L. & Sham, P. C. GENECOUNTING: Haplotype analysis with missing genotypes. Bioinformatics 18, 1694–5 (2002).

29.

Zhao, J. H. 2LD. GENECOUNTING and HAP: Computer programs for linkage disequilibrium analysis. Bioinformatics 20, 1325–6 (2004).

Appendix

Assuming that you have already loaded the package via library(gap), you can use lsf.str("package:gap") and data(package="gap") to generate a list of functions and a list of datasets, respectvely. If this looks odd to you, you might try search() within R to examine what is available in your environment before issuing the lsf.str command.

#>  [1] ".GlobalEnv"           "package:ggplot2"      "package:lattice"      "package:pedigree"     "package:reshape"      "package:HaploSim"    
#>  [7] "package:kinship2"     "package:quadprog"     "package:Matrix"       "package:gap"          "package:gap.datasets" "package:stats"       
#> [13] "package:graphics"     "package:grDevices"    "package:utils"        "package:datasets"     "package:methods"      "Autoloads"           
#> [19] "package:base"
#> AE3 : function (model, random, data, seed = 1234, n.sim = 50000, verbose = TRUE)  
#> BFDP : function (a, b, pi1, W, logscale = FALSE)  
#> Cox.T : function (parms, case, control, k)  
#> Cox.est : function (case, ctl, k0, initial)  
#> DevH0dominant : function (parms, case, control, k)  
#> DevH0dominant.est : function (case, ctl, k0, initial)  
#> DevH0recessive : function (parms, case, control, k)  
#> DevH0recessive.est : function (case, ctl, k0, initial)  
#> DevHaGdominant : function (parms, case, control, k)  
#> DevHaGdominant.est : function (case, ctl, k0, initial)  
#> DevHaGrecessive : function (parms, case, control, k)  
#> DevHaGrecessive.est : function (case, ctl, k0, initial)  
#> ESplot : function (ESdat, alpha = 0.05)  
#> FPRP : function (a, b, pi0, ORlist, logscale = FALSE)  
#> HapDesign : function (HaploEM)  
#> HapFreqSE : function (HaploEM)  
#> KCC : function (model, GRR, p1, K)  
#> LD22 : function (h, n)  
#> LDkl : function (n1 = 2, n2 = 2, h, n, optrho = 2, verbose = FALSE)  
#> MCMCgrm : function (model, prior, data, GRM, eps = 0, n.thin = 10, n.burnin = 3000, n.iter = 13000, ...)  
#> METAL_forestplot : function (tbl, all, rsid, package = "meta", split = FALSE, ...)  
#> PARn : function (p, RRlist)  
#> ReadGRM : function (prefix = 51)  
#> ReadGRMBin : function (prefix, AllN = FALSE, size = 4)  
#> ReadGRMPCA : function (prefix)  
#> ReadGRMPLINK : function (prefix, diag = 1)  
#> VR : function (v1, vv1, v2, vv2, c12)  
#> WriteGRM : function (prefix = 51, id, N, GRM)  
#> WriteGRMBin : function (prefix, grm, N, id, size = 4)  
#> WriteGRMSAS : function (grmlist, outfile = "gwas")  
#> a2g : function (a1, a2)  
#> ab : function (type = "power", n = 25000, a = 0.15, sa = 0.01, b = log(1.19), sb = 0.01, alpha = 0.05, fold = 1)  
#> allDuplicated : function (x)  
#> allele.recode : function (a1, a2, miss.val = NA)  
#> asplot : function (locus, map, genes, flanking = 1000, best.pval = NULL, sf = c(4, 4), logpmax = 10, pch = 21)  
#> b2r : function (b, s, rho, n)  
#> bt : function (x)  
#> ccsize : function (n, q, pD, p1, theta, alpha, beta = 0.2, power = FALSE, verbose = FALSE)  
#> chow.test : function (y1, x1, y2, x2, x = NULL)  
#> chr_pos_a1_a2 : function (chr, pos, a1, a2, prefix = "chr", seps = c(":", "_", "_"), uppercase = TRUE)  
#> circos.cis.vs.trans.plot : function (hits, panel, id, radius = 1e+06)  
#> circos.cnvplot : function (data)  
#> circos.mhtplot : function (data, glist)  
#> cis.vs.trans.classification : function (hits, panel, id, radius = 1e+06)  
#> cnvplot : function (data)  
#> comp.score : function (ibddata = "ibd_dist.out", phenotype = "pheno.dat", mean = 0, var = 1, h2 = 0.3)  
#> cov.invlogit : function (logit.p1, logit.p2, cov.logit)  
#> cs : function (tbl, b = "Effect", se = "StdErr", log_p = NULL, cutoff = 0.95)  
#> fbsize : function (gamma, p, alpha = c(1e-04, 1e-08, 1e-08), beta = 0.2, debug = 0, error = 0)  
#> g2a : function (g)  
#> g2a.c : function (g)  
#> gc.control : function (xdata = FALSE, convll = 1, handle.miss = 0, eps = 1e-06, tol = 1e-08, maxit = 50, pl = 0.001, assignment = "assign.dat", verbose = T)  
#> gc.em : function (data, locus.label = NA, converge.eps = 1e-06, maxiter = 500, handle.miss = 0, miss.val = 0, control = gc.control())  
#> gc.lambda : function (p)  
#> gcode : function (a1, a2)  
#> gcontrol : function (data, zeta = 1000, kappa = 4, tau2 = 1, epsilon = 0.01, ngib = 500, burn = 50, idum = 2348)  
#> gcontrol2 : function (p, col = palette()[4], lcol = palette()[2], ...)  
#> gcp : function (y, cc, g, handle.miss = 1, miss.val = 0, n.sim = 0, locus.label = NULL, quietly = FALSE)  
#> genecounting : function (data, weight = NULL, loci = NULL, control = gc.control())  
#> geno.recode : function (geno, miss.val = 0)  
#> getPTE : function (b1, b2, rho, sdx1 = 1, sdx2 = 1)  
#> get_b_se : function (f, n, z)  
#> get_pve_se : function (n, z, correction = TRUE)  
#> get_sdy : function (f, n, b, se, method = "mean", ...)  
#> getb1star : function (b1, b2, rho, sdx1 = 1, sdx2 = 1)  
#> gif : function (data, gifset)  
#> grec2g : function (h, n, t)  
#> grid2d : function (chrlen, plot = TRUE, cex = 0.6)  
#> gsmr : function (data, X, Y, alpha = 0.05, other_plots = FALSE)  
#> h2.jags : function (y, x, G, eps = 1e-04, sigma.p = 0, sigma.r = 1, parms = c("b", "p", "r", "h2"), ...)  
#> h2G : function (V, VCOV, verbose = TRUE)  
#> h2GE : function (V, VCOV, verbose = TRUE)  
#> h2_mzdz : function (mzDat = NULL, dzDat = NULL, rmz = NULL, rdz = NULL, nmz = NULL, ndz = NULL, selV = NULL)  
#> h2l : function (K = 0.05, P = 0.5, h2, se, verbose = TRUE)  
#> hap : function (id, data, nloci, loci = rep(2, nloci), names = paste("loci", 1:nloci, sep = ""), control = hap.control())  
#> hap.control : function (mb = 0, pr = 0, po = 0.001, to = 0.001, th = 1, maxit = 100, n = 0, ss = 0, rs = 0, rp = 0, ro = 0, rv = 0, sd = 0, mm = 0, mi = 0, 
#>     mc = 50, ds = 0.1, de = 0, q = 0, hapfile = "hap.out", assignfile = "assign.out")  
#> hap.em : function (id, data, locus.label = NA, converge.eps = 1e-06, maxiter = 500, miss.val = 0)  
#> hap.score : function (y, geno, trait.type = "gaussian", offset = NA, x.adj = NA, skip.haplo = 0.005, locus.label = NA, miss.val = 0, n.sim = 0, method = "gc", 
#>     id = NA, handle.miss = 0, mloci = NA, sexid = NA)  
#> hmht.control : function (data = NULL, colors = NULL, yoffset = 0.25, cex = 1.5, boxed = FALSE)  
#> htr : function (y, x, n.sim = 0)  
#> hwe : function (data, data.type = "allele", yates.correct = FALSE, miss.val = 0)  
#> hwe.cc : function (model, case, ctrl, k0, initial1, initial2)  
#> hwe.hardy : function (a, alleles = 3, seed = 3000, sample = c(1000, 1000, 5000))  
#> hwe.jags : function (k, n, delta = rep(1/k, k), lambda = 0, lambdamu = -1, lambdasd = 1, parms = c("p", "f", "q", "theta", "lambda"), ...)  
#> inv_chr_pos_a1_a2 : function (chr_pos_a1_a2, prefix = "chr", seps = c(":", "_", "_"))  
#> invlogit : function (x = 0)  
#> invnormal : function (x)  
#> ixy : function (x)  
#> k : function (r, N, adjust = TRUE)  
#> kin.morgan : function (ped, verbose = FALSE)  
#> klem : function (obs, k = 2, l = 2)  
#> lambda1000 : function (lambda, ncases, ncontrols)  
#> log10p : function (z)  
#> log10pvalue : function (p = NULL, base = NULL, exponent = NULL)  
#> logit : function (p = 0.5)  
#> logp : function (z)  
#> m2plem : function (a1, a2)  
#> makeRLEplot : function (E, log2.data = TRUE, groups = NULL, col.group = NULL, showTitle = FALSE, title = "Relative log expression (RLE) plot", ...)  
#> makeped : function (pifile = "pedfile.pre", pofile = "pedfile.ped", auto.select = 1, with.loop = 0, loop.file = NA, auto.proband = 1, proband.file = NA)  
#> masize : function (model, opts, alpha = 0.025, gamma = 0.2)  
#> metap : function (data, N, verbose = "Y", prefixp = "p", prefixn = "n")  
#> metareg : function (data, N, verbose = "Y", prefixb = "b", prefixse = "se")  
#> mht.control : function (type = "p", usepos = FALSE, logscale = TRUE, base = 10, cutoffs = NULL, colors = NULL, labels = NULL, srt = 45, gap = NULL, cex = 0.4, 
#>     yline = 3, xline = 3)  
#> mhtplot : function (data, control = mht.control(), hcontrol = hmht.control(), ...)  
#> mhtplot.trunc : function (x, chr = "CHR", bp = "BP", p = NULL, log10p = NULL, z = NULL, snp = "SNP", col = c("gray10", "gray60"), chrlabs = NULL, suggestiveline = -log10(1e-05), 
#>     genomewideline = -log10(5e-08), highlight = NULL, annotatelog10P = NULL, annotateTop = FALSE, cex.mtext = 1.5, cex.text = 0.7, mtext.line = 2, 
#>     y.ax.space = 5, y.brk1, y.brk2, trunc.yaxis = TRUE, cex.axis = 1.2, delta = 0.05, ...)  
#> mhtplot2 : function (data, control = mht.control(), hcontrol = hmht.control(), ...)  
#> mia : function (hapfile = "hap.out", assfile = "assign.out", miafile = "mia.out", so = 0, ns = 0, mi = 0, allsnps = 0, sas = 0)  
#> miamiplot : function (x, chr = "CHR", bp = "BP", p = "P", pr = "PR", snp = "SNP", col = c("midnightblue", "chartreuse4"), col2 = c("royalblue1", "seagreen1"), 
#>     ymax = NULL, highlight = NULL, highlight.add = NULL, pch = 19, cex = 0.75, cex.lab = 1, xlab = "Chromosome", ylab = "-log10(P) [y>0]; log10(P) [y<0]", 
#>     lcols = c("red", "black"), lwds = c(5, 2), ltys = c(1, 2), main = "", ...)  
#> micombine : function (est, std.err, confidence = 0.95)  
#> mr.boot : function (bXG, sebXG, bYG, sebYG, w, n.boot = 1000, method = "median")  
#> mtdt : function (x, n.sim = 0)  
#> mtdt2 : function (x, verbose = TRUE, n.sim = NULL, ...)  
#> muvar : function (n.loci = 1, y1 = c(0, 1, 1), y12 = c(1, 1, 1, 1, 1, 0, 0, 0, 0), p1 = 0.99, p2 = 0.9)  
#> mvmeta : function (b, V)  
#> pbsize : function (kp, gamma = 4.5, p = 0.15, alpha = 5e-08, beta = 0.2)  
#> pbsize2 : function (N, fc = 0.5, alpha = 0.05, gamma = 4.5, p = 0.15, kp = 0.1, model = "additive")  
#> pedtodot : function (pedfile, makeped = FALSE, sink = TRUE, page = "B5", url = "https://jinghuazhao.github.io/", height = 0.5, width = 0.75, rotate = 0, 
#>     dir = "none")  
#> pfc : function (famdata, enum = 0)  
#> pfc.sim : function (famdata, n.sim = 1e+06, n.loop = 1)  
#> pgc : function (data, handle.miss = 1, is.genotype = 0, with.id = 0)  
#> plem2m : function (a)  
#> plot.hap.score : function (x, ...)  
#> pqtl2dplot : function (d, chrlen = gap::hg19, snp_name = "SNP", snp_chr = "Chr", snp_pos = "bp", gene_chr = "p.chr", gene_start = "p.start", gene_end = "p.end", 
#>     protein = "p.target.short", gene = "p.gene", lp = "log10p", cis = "cis", plot = TRUE, cex = 0.6)  
#> pqtl2dplotly : function (d, chrlen = gap::hg19)  
#> pqtl3dplotly : function (d, chrlen = gap::hg19, zmax = 300)  
#> print.hap.score : function (x, ...)  
#> pvalue : function (z, decimals = 2)  
#> qqfun : function (x, distribution = "norm", ylab = deparse(substitute(x)), xlab = paste(distribution, "quantiles"), main = NULL, las = par("las"), envelope = 0.95, 
#>     labels = FALSE, col = palette()[4], lcol = palette()[2], xlim = NULL, ylim = NULL, lwd = 1, pch = 1, bg = palette()[4], cex = 0.4, line = c("quartiles", 
#>         "robust", "none"), ...)  
#> qqunif : function (u, type = "unif", logscale = TRUE, base = 10, col = palette()[4], lcol = palette()[2], ci = FALSE, alpha = 0.05, ...)  
#> read.ms.output : function (msout, is.file = TRUE, xpose = TRUE, verbose = TRUE, outfile = NULL, outfileonly = FALSE)  
#> revStrand : function (allele)  
#> revhap : function (loci, hapid)  
#> revhap.i : function (loci, hapid)  
#> runshinygap : function (...)  
#> s2k : function (y1, y2)  
#> se.exp : function (p, se.p)  
#> se.invlogit : function (logit.p, se.logit)  
#> sentinels : function (p, pid, st, debug = FALSE, flanking = 1e+06, chr = "Chrom", pos = "End", b = "Effect", se = "StdErr", log_p = NULL, snp = "MarkerName", 
#>     sep = ",")  
#> snp.ES : function (beta, SE, N)  
#> snp.HWE : function (g)  
#> snp.PAR : function (RR, MAF, unit = 2)  
#> snptest_sample : function (data, sample_file = "snptest.sample", ID_1 = "ID_1", ID_2 = "ID_2", missing = "missing", C = NULL, D = NULL, P = NULL)  
#> solve_skol : function (rootfun, target, lo, hi, e)  
#> textbox : function (label, name = NULL, gp = NULL, vp = NULL)  
#> toETDT : function (a)  
#> tscc : function (model, GRR, p1, n1, n2, M, alpha.genome, pi.samples, pi.markers, K)  
#> ungcode : function (g)  
#> weighted.median : function (x, w)  
#> whscore : function (allele, type)  
#> x2 : function (p1, p2, n1, n2)  
#> xy : function (x)  
#> z : function (p1, p2, n1, n2, r)
#>      Package LibPath                                                      Item   Title                       
#> [1,] "gap"   "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "hg18" "Internal functions for gap"
#> [2,] "gap"   "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "hg19" "Internal functions for gap"
#> [3,] "gap"   "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "hg38" "Internal functions for gap"
#> [4,] "gap"   "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "mr"   "Internal functions for gap"

Notes by Howard Seltman from Carnegie Mellon University: Pittsburgh, PA, USA.↩︎

Genetic Analysis Package

Jing Hua Zhao, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK, https://jinghuazhao.github.io/.

2022-05-10

Introduction

Implementation

Independent programs

Demos

Examples

Pedigree drawing

Kinship calculation

Pedigree diagram

Kinship calculation

Study design

Family-based design

Population-based design

Case-cohort design

Graphics

Genome-wide association

Copy number variation

Effect size plot

Forest plot

Significance level for z ~ Normal(0,1)

log(p) and log10(p)

Rationale

Benchmark

p, log(p), log10(p) and the multiple precision arithmetic

Application

Fine-mapping

Linear regression

Some preparations

The proportion of variants explained (PVE)

The effect size and standard error estimates

get_b_se

get_pve_se

get_sdy

Polygenic modeling

Mendelian randomization

Miscellaneous functions

chr_pos_a1_a2 and inv_chr_pos_a1_a2

gc.lambda

invnormal

revStrand

snptest_sample

Known bugs

Summary

Bibliography

Appendix