This package was initiated to integrate some C/Fortran/SAS programs I have written or used over the years. As such, it would rather be a long-term project, but an immediate benefit would be something complementary to other packages currently available from CRAN, e.g. genetics, hwde, etc. I hope eventually this will be part of a bigger effort to fulfill most of the requirements foreseen by many,1 within the portable environment of R for data management, analysis, graphics and object-oriented programming. My view has been outlined more formally2,3 in relation to other package systems and also on package kinship.4,5
The number of functions are quite limited and experimental, but I already feel the enormous advantage by shifting to R and would like sooner rather than later to share my work with others. I will not claim this work as exclusively done by me, but would like to invite others to join me and enlarge the collections and improve them.
With my recent work on genomewide association studies (GWASs) especially protein GWASs, I have added many functions such as METAL_forestplot
which handles data from software METAL and sentinels
which extracts sentinels from GWAS summary statistics in a way that is very appealing to us compared to some other established software. At the meantime, the size of the package surpasses the limit as imposed by CRAN, thus the old good feature of S
as with R
that value both code and data alike has to suffer slightly in that gap.datasets and gap.examples are spun off as two separate packages; they do deserve a glimpse however for some general ideas.
Several other updates are worth mentioning:
use of markdown from the noweb/Sweave format which allows for Citation Style Language (CSL) (https://citationstyles.org/) to format the bibliography.
multiple figures generated from a code section to be embedded.
Experimental implementation of a shiny App (runshinygap
()).
roxygen2 style documentation in companion with R code to allow for easier generation of .Rd files through devtools::document().
It is likely these experiences will be shared with/useful for other colleagues in converting their work into R.
The following list shows the data and functions currently available.
Name | Description |
---|---|
ANALYSIS | |
AE3 | AE model using nuclear family trios |
bt | Bradley-Terry model for contingency table |
ccsize | Power and sample size for case-cohort design |
cs | Credibel set |
fbsize | Sample size for family-based linkage and association design |
gc.em | Gene counting for haplotype analysis |
gcontrol | genomic control |
gcontrol2 | genomic control based on p values |
gcp | Permutation tests using GENECOUNTING |
gc.lambda | Estionmation of the genomic control inflation statistic (lambda) |
genecounting | Gene counting for haplotype analysis |
gif | Kinship coefficient and genetic index of familiality |
grmMCMC | Mixed modeling with genetic relationship matrices |
gsmr | Mendelian randomization analysis |
hap | Haplotype reconstruction |
hap.em | Gene counting for haplotype analysis |
hap.score | Score statistics for association of traits with haplotypes |
htr | Haplotype trend regression |
hwe | Hardy-Weinberg equilibrium test for a multiallelic marker |
hwe.cc | A likelihood ratio test of population Hardy-Weinberg equilibrium |
hwe.hardy | Hardy-Weinberg equilibrium test using MCMC |
invnormal | Inverse normal transformation |
kin.morgan | kinship matrix for simple pedigree |
LD22 | LD statistics for two diallelic markers |
LDkl | LD statistics for two multiallelic markers |
lambda1000 | A standardized estimate of the genomic inflation scaling to |
a study of 1,000 cases and 1,000 controls | |
log10p | log10(p) for a standard normal deviate |
log10pvalue | log10(p) for a P value including its scientific format |
logp | log(p) for a normal deviate |
masize | Sample size calculation for mediation analysis |
mia | multiple imputation analysis for hap |
mtdt | Transmission/disequilibrium test of a multiallelic marker |
mtdt2 | Transmission/disequilibrium test of a multiallelic marker |
by Bradley-Terry model | |
mvmeta | Multivariate meta-analysis based on generalized least squares |
pbsize | Power for population-based association design |
pbsize2 | Power for case-control association design |
pfc | Probability of familial clustering of disease |
pfc.sim | Probability of familial clustering of disease |
pgc | Preparing weight for GENECOUNTING |
print.hap.score | Print a hap.score object |
s2k | Statistics for 2 by K table |
sentinels | Sentinel identification from GWAS summary statistics |
tscc | Power calculation for two-stage case-control design |
GRAPHICS | |
asplot | Regional association plot |
ESplot | Effect-size plot |
circos.cnvplot | circos plot of CNVs |
circos.cis.vs.trans.plot | circos plot of cis/trans classification |
circos.mhtplot | circos Manhattan plot with gene annotation |
cnvplot | genomewide plot of CNVs |
METAL_forestplot | forest plot as R/meta’s forest for METAL outputs |
makeRLEplot | make relative log expression plot |
mhtplot | Manhattan plot |
mhtplot2 | Manhattan plot with annotations |
pqtl2dplot | 2D pQTL plot |
pqtl2dplotly | 2D pQTL plotly |
pqtl3dplotly | 3D pQTL plotly |
mhtplot.trunc | truncated Manhattan plot |
miamiplot | Miami plot |
pedtodot | Converting pedigree(s) to dot file(s) |
plot.hap.score | Plot haplotype frequencies versus haplotype score statistics |
qqfun | Quantile-comparison plots |
qqunif | Q-Q plot for uniformly distributed random variable |
UTILITIES | |
SNP | Functions for single nucleotide polymorphisms (SNPs) |
BFDP | Bayesian false-discovery probability |
FPRP | False-positive report probability |
ab | Test/Power calculation for mediating effect |
b2r | Obtain correlation coefficients and their variance-covariances |
chow.test | Chow’s test for heterogeneity in two regressions |
chr_pos_a1_a2 | Form SNPID from chromosome, posistion and alleles |
cis.vs.trans.classification | a cis/trans classifier |
comp.score | score statistics for testing genetic linkage of quantitative trait |
GRM functions | ReadGRM, ReadGRMBin, ReadGRMPLINK, |
ReadGRMPCA, WriteGRM, | |
WriteGRMBin, WriteGRMSAS | |
handle genomic relationship matrix involving other software | |
get_b_se | Get b and se from AF, n, and z |
get_pve_se | Get pve and its standard error from n, z |
get_sdy | Get sd(y) from AF, n, b, se |
h2G | A utility function for heritability |
h2GE | A utility function for heritability involving gene-environment interaction |
h2l | A utility function for converting observed heritability to its counterpart |
under liability threshold model | |
h2_mzdz | Heritability estimation according to twin correlations |
klem | Haplotype frequency estimation based on a genotype table |
of two multiallelic markers | |
makeped | A function to prepare pedigrees in post-MAKEPED format |
metap | Meta-analysis of p values |
metareg | Fixed and random effects model for meta-analysis |
muvar | Means and variances under 1- and 2- locus (diallelic) QTL model |
pvalue | P value for a normal deviate |
read.ms.output | A utility function to read ms output |
revStrand | Allele on the reverse strand |
runshinygap | Start shinygap |
snptest_sample | A utility to generate SNPTEST sample file |
twinan90 | Classic twin models |
whscore | Whittemore-Halpern scores for allele-sharing |
weighted.median | Weighted median with interpolation |
Assuming proper installation, you will be able to obtain the list by typing library(help=gap)
or view the list within a web browser via help.start()
. See Appendix on how to obtain a full list of functions.
This file can be viewed with command vignette("gap", package="gap")
.
You can cut and paste examples at end of each function’s documentation.
Both genecounting
and hap
are able to handle SNPs and multiallelic markers, with the former be flexible enough to include features such as X-linked data and the later being able to handle large number of SNPs. But they are unable to recode allele labels automatically, so functions gc.em
and hap.em
are in haplo.em
format and used by a modified function hap.score
in association testing.
It is notable that multilocus data are handled differently from that in hwde and elegant definitions of basic genetic data can be found in the genetics package.
Incidentally, I found my C mixed-radixed sorting routine6 is much faster than R’s internal function.
With exceptions such as function pfc
which is very computer-intensive, most functions in the package can easily be adapted for analysis of large datasets involving either SNPs or multiallelic markers. Some are utility functions, e.g. muvar
and whscore
, which will be part of the other analysis routines in the future.
The benefit with R compared to standalone programs is that for users, all functions have unified format. For developers, it is able to incorporate their C/C++ programs more easily and avoid repetitive work such as preparing own routines for matrix algebra and linear models. Further advantage can be taken from packages in Bioconductor, which are designed and written to deal with large number of genes.
To facilitate comparisons and individual preferences, The source codes for 2LD, EHPLUS, GENECOUNTING, HAP, now hosted at GitHub, have enjoyed great popularity ahead of the genomewide association studies (GWAS) therefore are likely to be more familiar than their R couunterparts in gap
. However, you need to follow their instructions to compile for a particular computer system.
I have included ms
code (which is required by read.ms.output
and .xls files to accompany read.ms.output
and FPRP
and BFDP
functions as with a classic twin example for ACE model in OpenMx. The package is now available from CRAN.
For these models it is actually simpler to use facilities as in package mets, which is now in suggests.
A final category is twinan90
, which is now dropped from the package function list due to difficulty to keep up with the requirements by the R
environment but nevertheless you will still be able to compile and use otherwise.
This has been a template for self-contained examples:
library(gap)
demo(gap)
See examples of haplotype analysis there – additional examples are given below.
I would like to highlight pedtodot
, pbsize
, fbsize
and ccsize
functions used for pedigree drawing and power/sample calculations in a genome-wide asssociatoin study as reported.7
I have included the original file for the R News as well as put examples in separate vignettes. They can be accessed via vignette("rnews",package="gap.examples")
and vignette("pedtodot", package="gap.examples")
, respectively.
Next, I will provide an example for kinship calculation using kin.morgan
. It is recommended that individuals in a pedigree are ordered so that parents always precede their children. In this regard, package pedigree can be used, and package kinship2 can be used to produce pedigree diagram as with kinship matrix.
The pedigree diagram is as follows,
library(gap)
#> Loading required package: gap.datasets
#> gap version 1.2.3-6
# pedigree diagram
data(lukas, package="gap.datasets")
library(kinship2)
#> Loading required package: Matrix
#> Loading required package: quadprog
<- with(lukas,pedigree(id,father,mother,sex))
ped plot(ped,cex=0.4)
We then turn to the kinship calculation.
# unordered individuals
library(gap)
<- kin.morgan(lukas)
gk1 write.table(gk1$kin.matrix,"gap_1.txt",quote=FALSE)
library(kinship2)
<- kinship(lukas[,1],lukas[,2],lukas[,3])
kk1 write.table(kk1,"kinship_1.txt",quote=FALSE)
<- gk1$kin.matrix-kk1
d sum(abs(d))
#> [1] 2.443634
# order individuals so that parents precede their children
library(pedigree)
#> Loading required package: HaploSim
#> Loading required package: reshape
#>
#> Attaching package: 'reshape'
#> The following object is masked from 'package:Matrix':
#>
#> expand
<- orderPed(lukas)
op <- lukas[order(op),]
olukas <- kin.morgan(olukas)
gk2
write.table(olukas,"olukas.csv",quote=FALSE)
write.table(gk2$kin.matrix,"gap_2.txt",quote=FALSE)
<- kinship(olukas[,1],olukas[,2],olukas[,3])
kk2 write.table(kk2,"kinship_2.txt",quote=FALSE)
<- gk2$kin.matrix-kk2
z sum(abs(z))
#> [1] 0
We see that in the second case, the result agrees with kinship2.
It now has an experimental work via Shiny from inst/shinygap
.
The example involving family-based design is as follows,
options(width=150)
library(gap)
<- matrix(c(
models 4.0, 0.01,
4.0, 0.10,
4.0, 0.50,
4.0, 0.80,
2.0, 0.01,
2.0, 0.10,
2.0, 0.50,
2.0, 0.80,
1.5, 0.01,
1.5, 0.10,
1.5, 0.50,
1.5, 0.80), ncol=2, byrow=TRUE)
<- "fbsize.txt"
outfile cat("gamma","p","Y","N_asp","P_A","H1","N_tdt","H2","N_asp/tdt",
"L_o","L_s\n",file=outfile,sep="\t")
for(i in 1:12) {
<- models[i,1]
g <- models[i,2]
p <- fbsize(g,p)
z cat(z$gamma,z$p,z$y,z$n1,z$pA,z$h1,z$n2,z$h2,z$n3,
$lambdao,z$lambdas,file=outfile,append=TRUE,sep="\t")
zcat("\n",file=outfile,append=TRUE)
}<- read.table(outfile,header=TRUE,sep="\t")
table1 <- c(4,7,9)
nc <- ceiling(table1[,nc])
table1[,nc] <- c(3,5,6,8,10,11)
dc <- round(table1[,dc],2)
table1[,dc] unlink(outfile)
# APOE-4, Scott WK, Pericak-Vance, MA & Haines JL
# Genetic analysis of complex diseases 1327
<- 4.5
g <- 0.15
p cat("\nAlzheimer's:\n\n")
#>
#> Alzheimer's:
data.frame(fbsize(g,p))
#> gamma p y n1 pA h1 n2 h2 n3 lambdao lambdas
#> 1 4.5 0.15 0.6256916 162.6246 0.8181818 0.4598361 108.994 0.6207625 39.97688 1.671594 1.784353
table1#> gamma p Y N_asp P_A H1 N_tdt H2 N_asp.tdt L_o L_s
#> 1 4.0 0.01 0.52 6402 0.80 0.05 1201 0.11 257 1.08 1.09
#> 2 4.0 0.10 0.60 277 0.80 0.35 165 0.54 53 1.48 1.54
#> 3 4.0 0.50 0.58 446 0.80 0.50 113 0.42 67 1.36 1.39
#> 4 4.0 0.80 0.53 3024 0.80 0.24 244 0.16 177 1.12 1.13
#> 5 2.0 0.01 0.50 445964 0.67 0.03 6371 0.04 2155 1.01 1.01
#> 6 2.0 0.10 0.52 8087 0.67 0.25 761 0.32 290 1.07 1.08
#> 7 2.0 0.50 0.53 3753 0.67 0.50 373 0.47 197 1.11 1.11
#> 8 2.0 0.80 0.51 17909 0.67 0.27 701 0.22 431 1.05 1.05
#> 9 1.5 0.01 0.50 6944779 0.60 0.02 21138 0.03 8508 1.00 1.00
#> 10 1.5 0.10 0.51 101926 0.60 0.21 2427 0.25 1030 1.02 1.02
#> 11 1.5 0.50 0.51 27048 0.60 0.50 1039 0.49 530 1.04 1.04
#> 12 1.5 0.80 0.51 101926 0.60 0.29 1820 0.25 1030 1.02 1.02
The example involving population-based design is as follows,
library(gap)
<- c(0.01,0.05,0.10,0.2)
kp <- matrix(c(
models 4.0, 0.01,
4.0, 0.10,
4.0, 0.50,
4.0, 0.80,
2.0, 0.01,
2.0, 0.10,
2.0, 0.50,
2.0, 0.80,
1.5, 0.01,
1.5, 0.10,
1.5, 0.50,
1.5, 0.80), ncol=2, byrow=TRUE)
<- "pbsize.txt"
outfile cat("gamma","p","p1","p5","p10","p20\n",sep="\t",file=outfile)
for(i in 1:dim(models)[1])
{<- models[i,1]
g <- models[i,2]
p <- vector()
n for(k in kp) n <- c(n,ceiling(pbsize(k,g,p)))
cat(models[i,1:2],n,sep="\t",file=outfile,append=TRUE)
cat("\n",file=outfile,append=TRUE)
} <- read.table(outfile,header=TRUE,sep="\t")
table5
table5#> gamma p p1 p5 p10 p20
#> 1 4.0 0.01 46681 8959 4244 1887
#> 2 4.0 0.10 8180 1570 744 331
#> 3 4.0 0.50 10891 2091 991 441
#> 4 4.0 0.80 31473 6041 2862 1272
#> 5 2.0 0.01 403970 77530 36725 16323
#> 6 2.0 0.10 52709 10116 4792 2130
#> 7 2.0 0.50 35285 6772 3208 1426
#> 8 2.0 0.80 79391 15237 7218 3208
#> 9 1.5 0.01 1599920 307056 145448 64644
#> 10 1.5 0.10 192105 36869 17465 7762
#> 11 1.5 0.50 98013 18811 8911 3961
#> 12 1.5 0.80 192105 36869 17465 7762
For case-cohort design, we obtain results for ARIC and EPIC studies.
library(gap)
# ARIC study
<- "aric.txt"
outfile <- 15792
n <- 0.03
pD <- 0.25
p1 <- 0.05
alpha <- c(1.35,1.40,1.45)
theta <- 0.2
beta <- c(1463,722,468)
s_nb cat("n","pD","p1","hr","q","power","ssize\n",file=outfile,sep="\t")
for(i in 1:3)
{<- s_nb[i]/n
q <- ccsize(n,q,pD,p1,log(theta[i]),alpha,beta,TRUE)
power <- ccsize(n,q,pD,p1,log(theta[i]),alpha,beta)
ssize cat(n,"\t",pD,"\t",p1,"\t",theta[i],"\t",q,"\t",
signif(power,3),"\t",ssize,"\n",
file=outfile,append=TRUE)
}read.table(outfile,header=TRUE,sep="\t")
#> n pD p1 hr q power ssize
#> 1 15792 0.03 0.25 1.35 0.09264184 0.8 1463
#> 2 15792 0.03 0.25 1.40 0.04571935 0.8 722
#> 3 15792 0.03 0.25 1.45 0.02963526 0.8 468
unlink(outfile)
# EPIC study
<- "epic.txt"
outfile <- 25000
n <- 0.00000005
alpha <- 0.2
beta <- c(0.3,0.2,0.1,0.05)
s_pD <- seq(0.1,0.5,by=0.1)
s_p1 <- seq(1.1,1.4,by=0.1)
s_hr cat("n","pD","p1","hr","alpha","ssize\n",file=outfile,sep="\t")
# direct calculation
for(pD in s_pD)
{for(p1 in s_p1)
{for(hr in s_hr)
{<- ccsize(n,q,pD,p1,log(hr),alpha,beta)
ssize if (ssize>0) cat(n,"\t",pD,"\t",p1,"\t",hr,"\t",alpha,"\t",
"\n",
ssize,file=outfile,append=TRUE)
}
}
}read.table(outfile,header=TRUE,sep="\t")
#> n pD p1 hr alpha ssize
#> 1 25000 0.3 0.1 1.3 5e-08 14391
#> 2 25000 0.3 0.1 1.4 5e-08 5732
#> 3 25000 0.3 0.2 1.2 5e-08 21529
#> 4 25000 0.3 0.2 1.3 5e-08 5099
#> 5 25000 0.3 0.2 1.4 5e-08 2613
#> 6 25000 0.3 0.3 1.2 5e-08 11095
#> 7 25000 0.3 0.3 1.3 5e-08 3490
#> 8 25000 0.3 0.3 1.4 5e-08 1882
#> 9 25000 0.3 0.4 1.2 5e-08 8596
#> 10 25000 0.3 0.4 1.3 5e-08 2934
#> 11 25000 0.3 0.4 1.4 5e-08 1611
#> 12 25000 0.3 0.5 1.2 5e-08 7995
#> 13 25000 0.3 0.5 1.3 5e-08 2786
#> 14 25000 0.3 0.5 1.4 5e-08 1538
#> 15 25000 0.2 0.1 1.4 5e-08 9277
#> 16 25000 0.2 0.2 1.3 5e-08 7725
#> 17 25000 0.2 0.2 1.4 5e-08 3164
#> 18 25000 0.2 0.3 1.3 5e-08 4548
#> 19 25000 0.2 0.3 1.4 5e-08 2152
#> 20 25000 0.2 0.4 1.2 5e-08 20131
#> 21 25000 0.2 0.4 1.3 5e-08 3648
#> 22 25000 0.2 0.4 1.4 5e-08 1805
#> 23 25000 0.2 0.5 1.2 5e-08 17120
#> 24 25000 0.2 0.5 1.3 5e-08 3422
#> 25 25000 0.2 0.5 1.4 5e-08 1713
#> 26 25000 0.1 0.2 1.4 5e-08 8615
#> 27 25000 0.1 0.3 1.4 5e-08 3776
#> 28 25000 0.1 0.4 1.3 5e-08 13479
#> 29 25000 0.1 0.4 1.4 5e-08 2824
#> 30 25000 0.1 0.5 1.3 5e-08 10837
#> 31 25000 0.1 0.5 1.4 5e-08 2606
unlink(outfile)
I now include some figures from the documentation that may be of interest.
The following code is used to obtain a Q-Q plot via qqunif
function,
library(gap)
<- runif(1000)
u_obs <- qqunif(u_obs,pch=21,bg="blue",bty="n")
r <- r$y
u_exp <- u_exp >= 2.30103
hits points(r$x[hits],u_exp[hits],pch=21,bg="green")
legend("topleft",sprintf("GC.lambda=%.4f",gc.lambda(u_obs)))
Based on a chicken genome scan data, the code below generates a Manhattan plot, demonstrating the use of gaps to separate chromosomes.
<- with(w4,order(chr,pos))
ord <- w4[ord,]
w4 <- par()
oldpar par(cex=0.6)
<- c(rep(c("blue","red"),15),"red")
colors mhtplot(w4,control=mht.control(colors=colors,gap=1000),pch=19,srt=0)
axis(2,cex.axis=2)
<- -log10(3.60036E-05)
suggestiveline <- -log10(1.8E-06)
genomewideline abline(h=suggestiveline, col="blue")
abline(h=genomewideline, col="green")
abline(h=0)
The code below obtains a Manhattan plot with gene annotation,
<- with(mhtdata,cbind(chr,pos,p))
data <- c("IRS1","SPRY2","FTO","GRIK3","SNED1","HTR1A","MARCH3","WISP3",
glist "PPP1R3B","RP1L1","FDFT1","SLC39A14","GFRA1","MC4R")
<- subset(mhtdata,gene%in%glist)[c("chr","pos","p","gene")]
hdata <- rep(c("lightgray","gray"),11)
color <- length(glist)
glen <- rep("red",glen)
hcolor par(las=2, xpd=TRUE, cex.axis=1.8, cex=0.4)
<- mht.control(colors=color,yline=1.5,xline=3)
ops <- hmht.control(data=hdata,colors=hcolor)
hops mhtplot(data,ops,hops,pch=19)
axis(2,pos=2,at=1:16,cex.axis=0.5)
All these look familiar, so revised form of the function called mhtplot2
was created for additional features such as centering the chromosome ticks, allowing for more sophisticated coloring schemes, using prespecified fonts, etc. Please refer to the function’s documentation example.
We could also go further with a circos Manhattan plot,
circos.mhtplot(mhtdata, glist)
#> Note: 11 points are out of plotting region in sector 'chr16', track '3'.
We now experiment with Miami plot,
<- within(mhtdata,{pr=p})
mhtdata miamiplot(mhtdata,chr="chr",bp="pos",p="p",pr="pr",snp="rsn")
We now illustrate with a truncated Manhattan plot,
par(oma=c(0,0,0,0), mar=c(5,6.5,1,1))
mhtplot.trunc(IL.12B, chr="Chromosome", bp="Position", z="Z",
snp="MarkerName",
suggestiveline=FALSE, genomewideline=-log10(5e-10),
cex.mtext=1.2, cex.text=0.7,
annotatelog10P=-log10(5e-10), annotateTop = FALSE,
highlight=with(genes,gene),
mtext.line=3, y.brk1=115, y.brk2=200, trunc.yaxis=TRUE, delta=0.01,
cex.axis=1.2, cex=0.5, font=2, font.axis=1,
y.ax.space=20,
col = c("blue4", "skyblue")
)
The code below obtains a regional association plot with the asplot
function,
asplot(CDKNlocus, CDKNmap, CDKNgenes, best.pval=5.4e-8, sf=c(3,6))
#> - CDKN2A
#> - CDKN2B
title("CDKN2A/CDKN2B Region")
The function predates the currently popular locuszoom software but leaves the option open for generating such plots on the fly and locally.
Note that all these can serve as templates to customize features of your own.
A plot of copy number variation (CNV) is shown here,
cnvplot(gap.datasets::cnv)
The code below obtains an effect size plot via the ESplot
function.
library(gap)
<- data.frame(id=c("CCL2","CCL7","CCL8","CCL11","CCL13","CXCL6","Monocytes"),
rs12075 b=c(0.1694,-0.0899,-0.0973,0.0749,0.189,0.0816,0.0338387),
se=c(0.0113,0.013,0.0116,0.0114,0.0114,0.0115,0.00713386))
ESplot(rs12075)
It is possible to draw many forest plots given a list of variants; an example is given (only the first one here) here
data(OPG,package="gap.datasets")
METAL_forestplot(OPGtbl[2:6,],OPGall,OPGrsid,width=8.75,height=5)
#> Joining, by = "MarkerName"
#> Joining, by = "MarkerName"
#> Loading required namespace: meta
When a Normally distributed association statistic z is very large, its corresponding p value is very small. A genomewide significance is declared at 0.05/1000000=5e-8 with Bonferroni correction assuming 1 million SNPs are tested. This short note describes how to get -log10(p), which can be used in a Q-Q plot and software such as DEPICT. The solution here is generic since z is also the square root of a chi-squared statistic, for instance.
First thing first, we therefore have the R function z <- function(p) qnorm(p/2,lower.tail=FALSE)
.
First thing first, here are the answers for log(p) and log10(p) given z,
# log(p) for a standard normal deviate z based on log()
<- function(z) log(2)+pnorm(-abs(z), lower.tail=TRUE, log.p=TRUE)
logp
# log10(p) for a standard normal deviate z based on log()
<- function(z) log(2, base=10)+pnorm(-abs(z), lower.tail=TRUE, log.p=TRUE)/log(10) log10p
Note logp()
will be used for functions such as qnorm()
as in R/gap function cs()
whereas log10p()
is more appropriate for Manhattan plot and used in R/gap sentinels()
.
The schematic diagram is shown below.
We start with z=1.96 whose corresponding p value is approximately 0.05.
2*pnorm(-1.96,lower.tail=TRUE)
giving an acceptable value 0.04999579, so we proceed to get log10(p)
log10(2)+log10(pnorm(-abs(z),lower.tail=TRUE))
leading to the expression above from the fact that log10(X)=log(X)/log(10) since log(), being the natural log function, ln() – so log(exp(1)) = 1, in R, works far better on the numerator of the second term. The use of -abs() just makes sure we are working on the lower tail of the standard Normal distribution from which our p value is calculated.
Now we have a stress test,
<- 20000
z -log10p(z)
giving -log10(p) = 86858901.
We would be curious about the p value itself as well, which is furnished with the Rmpfr package
require(Rmpfr)
2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=FALSE)
mpfr(log(2),100) + pnorm(mpfr(-abs(z),100),lower.tail=TRUE,log.p=TRUE)
giving p = 1.660579603192917090365313727164e-86858901 and -log(p) = -200000010.1292789076808554854177, respectively. To carry on we have -log10(p) = -log(p)/log(10)=86858901.
To make -log10(p) usable in R we obtain it directly through
as.numeric(-log10(2*pnorm(mpfr(-abs(z),100),lower.tail=TRUE)))
which actually yields exactly the same 86858901.
If we go very far to have z=50000. then -log10(p)=542868107 but we have less luck with Rmpfr.
One may wonder the P value in this case, which is 6.6666145952e-542868108 or simply 6.67e-542868108.
The magic function for doing this is defined as follows,
<- function (z, decimals = 2)
pvalue
{<- -log10p(z)
lp <- ceiling(lp)
exponent <- 10^-(lp - exponent)
base paste0(round(base, decimals), "e", -exponent)
}
and it is more appropriate to express p values in scientific format so they can be handled as follows,
<- function(p=NULL,base=NULL,exponent=NULL)
log10pvalue
{if(!is.null(p))
{<- format(p,scientific=TRUE)
p <- strsplit(p,"e")
p2 <- as.numeric(lapply(p2,"[",1))
base <- as.numeric(lapply(p2,"[",2))
exponent else if(is.null(base) | is.null(exponent)) stop("base and exponent should both be specified")
} log10(base)+exponent
}
used as log10pvalue(p)
when p<=1e-323, or log10pvalue(base=1,exponent=-323) otherwise.
One can also derive logpvalue for natural base (e) similarly.
We end with a quick look-up table
require(gap)
<- data.frame()
v for (z in c(5,10,30,40,50,100,500,1000,2000,3000,5000))
{<- c(z,pvalue(z),logp(z),log10p(z))
vi <- rbind(v,vi)
v
}names(v) <- c("z","P","log(P)","log10(P)")
::kable(v,caption="Table 1. z,P,log(P) and log10(P)") knitr
z | P | log(P) | log10(P) |
---|---|---|---|
5 | 5.73e-7 | -14.3718512134288 | -6.24161567672667 |
10 | 1.52e-23 | -52.5381379699525 | -22.8170234098221 |
30 | 9.81e-198 | -453.628096775783 | -197.008179265997 |
40 | 7.31e-350 | -803.915294833194 | -349.135976463682 |
50 | 2.16e-545 | -1254.13821395886 | -544.665305866333 |
100 | 2.69e-2174 | -5004.83106151365 | -2173.57051287337 |
500 | 2.47e-54290 | -125006.440403451 | -54289.6072695865 |
1000 | 4.58e-217151 | -500007.133547632 | -217150.339011999 |
2000 | 4.34e-868593 | -2000007.82669406 | -868592.362896546 |
3000 | 1.8e-1954329 | -4500008.23215903 | -1954328.74374587 |
5000 | 1.51e-5428685 | -12500008.7429846 | -5428684.82082061 |
The mhtplot.trunc()
function in R/gap accepts three types of arguments:
In all three cases, a log10(P) counterpart is obtained internally and to accommodate extreme value, the y-axis allows for truncation leaving out a given range to highlight the largest.
A plot for the GIANT data is shown here, https://jinghuazhao.github.io/Omics-analysis/BMI/.
The function cs
obtains credible set.
# zcat METAL/4E.BP1-1.tbl.gz | \
# awk 'NR==1 || ($1==4 && $2 >= 187158034 - 1e6 && $2 < 187158034 + 1e6)' > 4E.BP1.z
<- within(read.delim("4E.BP1.z"),{logp <- logp(Effect/StdErr)})
tbl <- cs(tbl)
z <- cs(tbl,log_p="logp") l
Several functions related to linear regression are detailed here.
Let \(\mbox{x} = SNP\ dosage\). Note that \(\mbox{Var}(\mbox{x})=2f(1-f)\), \(f=MAF\) or \(1-MAF\) by symmetry.
Our linear regression model is \(\mbox{y}=a + b\mbox{x} + e\). We have \(\mbox{Var}(\mbox{y}) = b^2\mbox{Var}(\mbox{x}) + \mbox{Var}(e)\). Moreover, \(\mbox{Var}(b)=\mbox{Var}(e)(\mbox{x}'\mbox{x})^{-1}=\mbox{Var}(e)/S_\mbox{xx}\), we have \(\mbox{Var}(e) = \mbox{Var}(b)S_\mbox{xx} = N \mbox{Var}(b) \mbox{Var}(\mbox{x})\). Consequently, let \(z = {b}/{SE(b)}\), we have
\[\begin{eqnarray*} \mbox{Var}(\mbox{y}) &=& \mbox{Var}(\mbox{x})(b^2+N\mbox{Var}(b)) \hspace{100cm} \cr &=& \mbox{Var}(\mbox{x})\mbox{Var}(\mbox{b})(z^2+N) \cr &=& 2f(1-f)(z^2+N)\mbox{Var}(b) \end{eqnarray*}\]
Moreover, the mean and the variance of the multiple correlation coefficient or the coefficient of determination (\(R^2\)) are known8 to be \({1}/{(N-1)}\) and \({2(N-2)}/{\left[(N-1)^2(N+1)\right]}\), respectively.
We also need some established results of a ratio (R/S)1, i.e., the mean
\[ \begin{align} E(R/S) \approx \frac{\mu_R}{\mu_S}-\frac{\mbox{Cov}(R,S)}{\mu_S^2}+\frac{\sigma_S^2\mu_R}{\mu_S^3} \hspace{100cm} \end{align} \]
and more importantly the variance
\[ \begin{align} \mbox{Var}(R/S) \approx \frac{\mu_R^2}{\mu_S^2} \left[ \frac{\sigma_R^2}{\mu_R^2} -2\frac{\mbox{Cov}(R,S)}{\mu_R\;\mu_S} +\frac{\sigma_S^2}{\mu_S^2} \right] \hspace{100cm} \end{align} \]
where \(\mu_R\), \(\mu_S\), \(\sigma_R^2\), \(\sigma_S^2\) are the means and the variances for R and S, respectively.
Finally, we need some facts about \(\chi_1^2\), \(\chi^2\) distribution of one degree of freedom. For \(z \sim N(0,1)\), \(z^2\sim \chi_1^2\), whose mean and variance are 1 and 2, respectively.
We now have the following results.
From above we have
\[ \begin{align} \mbox{PVE}_{\mbox{linear regression}} & = \frac{\mbox{Var}(b\mbox{x})}{\mbox{Var}(\mbox{y})} \hspace{100cm} \\ & = \frac{\mbox{Var}(\mbox{x})b^2}{\mbox{Var}(\mbox{x})(b^2+N\mbox{Var}(b))} \\ & = \frac{\mbox{z}^2}{\mbox{z}^2+N} \end{align} \]
On the other hand, for a simple linear regression \(R^2\equiv r^2\) where \(r\) is the Pearson correlation coefficient, which is readily from the \(t\)-statistic of the regression slope, i.e., \(r={t}/{\sqrt{t^2+N-2}}\). so assuming \(t \equiv \ z \sim \chi_1^2\)
\[ \begin{align} \mbox{PVE}_{t-\mbox{statistic}} & = \frac{\chi^2}{\chi^2+N-2} \hspace{100cm} \end{align} \]
To obtain coherent estimates of the asymptotic means and variances of both forms we resort to variance of a ratio (R/S). All the required elements are listed in a table below.
Characteristics | Linear regression | \(t\)-statistic |
---|---|---|
\(\mu_R\) | 1 | 1 |
\(\sigma_R^2\) | 2 | 2 |
\(\mu_S\) | \(N+1\) | \(N-1\) |
\(\sigma_S^2\) | 2 | 2 |
\(\mbox{Cov}(R,S)\) | 2 | 2 |
then we have the means and the variances for PVE.
Characteristics | Linear regression | \(t\)-statistic |
---|---|---|
mean | \(\frac{1}{N+1}\left[1-\frac{2}{N+1}+\frac{2}{(N+1)^2}\right]\) | \(\frac{1}{N-1}\left[1-\frac{2}{N-1}+\frac{2}{(N-1)^2}\right]\) |
variance | \(\frac{2}{(N+1)^2}\left[1-\frac{1}{N+1}\right]^2\) | \(\frac{2}{(N-1)^2}\left[1-\frac{1}{N-1}\right]^2\) |
Finally, our approximation of PVE for a protein with \(T\) independent pQTLs from the meta-analysis
Characteristics | Linear regression | \(t\)-statistic |
---|---|---|
estimate | \(\sum_{i=1}^T{\frac{\chi_i^2}{\chi_i^2+N_i}}\) | \(\sum_{i=1}^T{\frac{\chi_i^2}{\chi_i^2+N_i-2}}\) |
variance | \(\sum_{i=1}^T\frac{2}{(N_i+1)^2}\) | \(\sum_{i=1}^T\frac{2}{(N_i-1)^2}\) |
When \(\mbox{Var}(\mbox{y})=1\), as in cis eQTLGen data, we have \(b\) and its standard error (se) as follows,
\[ \begin{align} b & = z/d \hspace{100cm} \\ se & = 1/d \end{align} \]
where \(d = \sqrt{2f(1-f)(z^2+N)}\).
Now three functions are in place.
A record of the eQTLGen data is shown below
SNP Pvalue SNPChr SNPPos AssessedAllele OtherAllele Zscore
rs1003563 2.308e-06 12 6424577 A G 4.7245
Gene GeneSymbol GeneChr GenePos NrCohorts NrSamples FDR
ENSG00000111321 LTBR 12 6492472 34 23991 0.006278872
BonferroniP hg19_chr hg19_pos AlleleA AlleleB allA_total allAB_total
1 12 6424577 A G 2574 8483
allB_total AlleleB_all
7859 0.6396966
from which we obtain the effect size and its standard error as follows,
get_b_se(0.6396966,23991,4.7245)
#> b se
#> [1,] 0.04490488 0.009504684
This function obtains proportion of explained variation (PVE) from n, z; its standard error is based on variance of the ratio (correction=TRUE) or \(r^2\).
We continue with the eQTLGen example above,
get_sdy(0.6396966,23991,0.04490488,0.009504684)
#> [1] 1
and indeed the eQTLGen data were standardized.
In line with the recent surge of interest in the polygenic models, a separate vignette is available through vignette("h2",package="gap.examples")
demonstrating aspect of the models on heritability.
Brief documentations are available for functions h2G
, h2GE
and h2l
.
The function was originally developed to rework on data generated from GSMR, although it could be any harmonised data.
::kable(mr,caption="Table 1. LIF.R and CAD/FEV1") knitr
SNP | b.LIF.R | SE.LIF.R | b.FEV1 | SE.FEV1 | b.CAD | SE.CAD |
---|---|---|---|---|---|---|
rs188743906 | 0.6804 | 0.1104 | 0.00177 | 0.01660 | NA | NA |
rs2289779 | -0.0788 | 0.0134 | 0.00104 | 0.00261 | -0.007543 | 0.0092258 |
rs117804300 | -0.2281 | 0.0390 | -0.00392 | 0.00855 | 0.109372 | 0.0362219 |
rs7033492 | -0.0968 | 0.0147 | -0.00585 | 0.00269 | 0.022793 | 0.0119903 |
rs10793962 | 0.2098 | 0.0212 | 0.00378 | 0.00536 | -0.014567 | 0.0138196 |
rs635634 | -0.2885 | 0.0153 | -0.02040 | 0.00334 | 0.077157 | 0.0117123 |
rs176690 | -0.0973 | 0.0142 | 0.00293 | 0.00306 | -0.000007 | 0.0107781 |
rs147278971 | -0.2336 | 0.0378 | -0.01240 | 0.00792 | 0.079873 | 0.0397491 |
rs11562629 | 0.1155 | 0.0181 | 0.00960 | 0.00378 | -0.010040 | 0.0151460 |
<- gsmr(mr, "LIF.R", c("CAD","FEV1"),other_plots=TRUE) res
<- "INF1_CAD-FEV1.csv"
f write.table(with(res,r), file=f, quote=FALSE, col.names=FALSE, row.names=FALSE, sep=",")
<- function(r)
top sapply(c("IVW","EGGER","WM","PWM"), function(x) as.numeric(gap::pvalue(r[[paste0("b",x)]]/r[[paste0("seb",x)]])))
<- read.csv(f,as.is=TRUE)
r <- top(r)
p ::kable(data.frame(r,p),caption="Table 2. LIFR variant rs635634 and CAD/FEV1",digits=3) knitr
IV | bIVW | sebIVW | CochQ | CochQp | bEGGER | sebEGGER | intEGGER | seintEGGER | bWM | sebWM | bPWM | sebPWM | IVW | EGGER | WM | PWM |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LIF.R.CAD | -0.187 | 0.050 | 2.116 | 0.953 | -0.184 | 0.060 | 0.001 | 0.010 | -0.247 | 0.045 | -0.248 | 0.044 | 0 | 0.002 | 0 | 0 |
LIF.R.FEV1 | 0.045 | 0.012 | 0.482 | 1.000 | 0.049 | 0.015 | 0.001 | 0.002 | 0.062 | 0.012 | 0.055 | 0.014 | 0 | 0.001 | 0 | 0 |
unlink(f)
The nearest reference for the implemented approaches is Ligthart et al.9
We would be of interest to contrast their effect sizes as well,
<- names(mr)
mr_names <- cbind(mr[grepl("SNP|LIF.R",mr_names)],trait="LIF.R"); names(LIF.R) <- c("SNP","b","se","trait")
LIF.R <- cbind(mr[grepl("SNP|FEV1",mr_names)],trait="FEV1"); names(FEV1) <- c("SNP","b","se","trait")
FEV1 <- cbind(mr[grepl("SNP|CAD",mr_names)],trait="CAD"); names(CAD) <- c("SNP","b","se","trait")
CAD <- within(rbind(LIF.R,FEV1,CAD),{y=b})
mr2 library(ggplot2)
<- ggplot(mr2,aes(y = SNP, x = y))+
p theme_bw()+
geom_point()+
facet_wrap(~trait,ncol=3,scales="free_x")+
geom_segment(aes(x = b-1.96*se, xend = b+1.96*se, yend = SNP))+
geom_vline(lty=2, aes(xintercept=0), colour = 'red')+
xlab("Effect size")+
ylab("")
p#> Warning: Removed 1 rows containing missing values (geom_point).
#> Warning: Removed 1 rows containing missing values (geom_segment).
They are functions to handle SNPid.
require(gap)
<- chr_pos_a1_a2(1,c(123,321),letters[1:2],letters[2:1])
s inv_chr_pos_a1_a2(s)
#> chr pos a1 a2
#> chr1:123_A_B chr1 123 A B
#> chr1:321_A_B chr1 321 A B
inv_chr_pos_a1_a2("chr1:123-A_B",seps=c(":","-","_"))
#> chr pos a1 a2
#> chr1:123-A_B chr1 123 A B
The definition is as follows,
<- function(p) {
gc.lambda <- p[!is.na(p)]
p <- length(p)
n
<- qchisq(p,1,lower.tail=FALSE)
obs <- qchisq(1:n/n,1,lower.tail=FALSE)
exp
<- median(obs)/median(exp)
lambda return(lambda)
}
# A simplified version is as follows,
# obs <- median(chisq)
# exp <- qchisq(0.5, 1) # 0.4549364
# lambda <- obs/exp
# see also estlambda from GenABEL and qq.chisq from snpStats
# A related function
<- function(lambda, ncases, ncontrols)
lambda1000 1 + (lambda - 1) * (1 / ncases + 1 / ncontrols)/( 1 / 1000 + 1 / 1000)
The function is widely used in various consortium analyses and defined as follows,
<- function(x) qnorm((rank(x,na.last="keep")-0.5)/sum(!is.na(x))) invnormal
An example use on data from Poisson distribution is as follows,
set.seed(12345)
<- rpois(50, lambda = 4); table(factor(Ni, 0:max(Ni)))
Ni #>
#> 0 1 2 3 4 5 6 7 8 9
#> 2 4 6 11 8 10 4 2 2 1
<- invnormal(Ni)
y sd(y)
#> [1] 0.9755074
mean(y)
#> [1] 0.002650512
<- 1:50
Ni <- invnormal(Ni)
y mean(y)
#> [1] -1.810184e-17
sd(y)
#> [1] 0.9973999
This functions obtains allele(s) on the opposite strand.
<- c("a","c","G","t")
alleles revStrand(alleles)
#> [1] "t" "g" "C" "a"
This is a function to output sample file for SNPTEST.
<- data.frame(ID_1=1,ID_2=1,missing=0,PC1=1,PC2=2,D1=1,P1=10)
d snptest_sample(d,C=paste0("PC",1:2),D=paste0("D",1:1),P=paste0("P",1:1))
The commands above generates a file named ``snptest.sample.
Unaware of any bug. However, better memory management is expected. There are apparent issues with the Shiny interfaces for power/sample size calculation which produce less outputs for certain configurations than expected.
I believe by now the package should have given you a flavour of initiatives I have made so far in relation to how the project was envisaged. More importantly, it is clear that availability of the package will serve as a platform on which future work can be accumulated and collaboration can be built.
Assuming that you have already loaded the package via library(gap)
, you can use lsf.str("package:gap")
and data(package="gap")
to generate a list of functions and a list of datasets, respectvely. If this looks odd to you, you might try search()
within R
to examine what is available in your environment before issuing the lsf.str
command.
#> [1] ".GlobalEnv" "package:ggplot2" "package:lattice" "package:pedigree" "package:reshape" "package:HaploSim"
#> [7] "package:kinship2" "package:quadprog" "package:Matrix" "package:gap" "package:gap.datasets" "package:stats"
#> [13] "package:graphics" "package:grDevices" "package:utils" "package:datasets" "package:methods" "Autoloads"
#> [19] "package:base"
#> AE3 : function (model, random, data, seed = 1234, n.sim = 50000, verbose = TRUE)
#> BFDP : function (a, b, pi1, W, logscale = FALSE)
#> Cox.T : function (parms, case, control, k)
#> Cox.est : function (case, ctl, k0, initial)
#> DevH0dominant : function (parms, case, control, k)
#> DevH0dominant.est : function (case, ctl, k0, initial)
#> DevH0recessive : function (parms, case, control, k)
#> DevH0recessive.est : function (case, ctl, k0, initial)
#> DevHaGdominant : function (parms, case, control, k)
#> DevHaGdominant.est : function (case, ctl, k0, initial)
#> DevHaGrecessive : function (parms, case, control, k)
#> DevHaGrecessive.est : function (case, ctl, k0, initial)
#> ESplot : function (ESdat, alpha = 0.05)
#> FPRP : function (a, b, pi0, ORlist, logscale = FALSE)
#> HapDesign : function (HaploEM)
#> HapFreqSE : function (HaploEM)
#> KCC : function (model, GRR, p1, K)
#> LD22 : function (h, n)
#> LDkl : function (n1 = 2, n2 = 2, h, n, optrho = 2, verbose = FALSE)
#> MCMCgrm : function (model, prior, data, GRM, eps = 0, n.thin = 10, n.burnin = 3000, n.iter = 13000, ...)
#> METAL_forestplot : function (tbl, all, rsid, package = "meta", split = FALSE, ...)
#> PARn : function (p, RRlist)
#> ReadGRM : function (prefix = 51)
#> ReadGRMBin : function (prefix, AllN = FALSE, size = 4)
#> ReadGRMPCA : function (prefix)
#> ReadGRMPLINK : function (prefix, diag = 1)
#> VR : function (v1, vv1, v2, vv2, c12)
#> WriteGRM : function (prefix = 51, id, N, GRM)
#> WriteGRMBin : function (prefix, grm, N, id, size = 4)
#> WriteGRMSAS : function (grmlist, outfile = "gwas")
#> a2g : function (a1, a2)
#> ab : function (type = "power", n = 25000, a = 0.15, sa = 0.01, b = log(1.19), sb = 0.01, alpha = 0.05, fold = 1)
#> allDuplicated : function (x)
#> allele.recode : function (a1, a2, miss.val = NA)
#> asplot : function (locus, map, genes, flanking = 1000, best.pval = NULL, sf = c(4, 4), logpmax = 10, pch = 21)
#> b2r : function (b, s, rho, n)
#> bt : function (x)
#> ccsize : function (n, q, pD, p1, theta, alpha, beta = 0.2, power = FALSE, verbose = FALSE)
#> chow.test : function (y1, x1, y2, x2, x = NULL)
#> chr_pos_a1_a2 : function (chr, pos, a1, a2, prefix = "chr", seps = c(":", "_", "_"), uppercase = TRUE)
#> circos.cis.vs.trans.plot : function (hits, panel, id, radius = 1e+06)
#> circos.cnvplot : function (data)
#> circos.mhtplot : function (data, glist)
#> cis.vs.trans.classification : function (hits, panel, id, radius = 1e+06)
#> cnvplot : function (data)
#> comp.score : function (ibddata = "ibd_dist.out", phenotype = "pheno.dat", mean = 0, var = 1, h2 = 0.3)
#> cov.invlogit : function (logit.p1, logit.p2, cov.logit)
#> cs : function (tbl, b = "Effect", se = "StdErr", log_p = NULL, cutoff = 0.95)
#> fbsize : function (gamma, p, alpha = c(1e-04, 1e-08, 1e-08), beta = 0.2, debug = 0, error = 0)
#> g2a : function (g)
#> g2a.c : function (g)
#> gc.control : function (xdata = FALSE, convll = 1, handle.miss = 0, eps = 1e-06, tol = 1e-08, maxit = 50, pl = 0.001, assignment = "assign.dat", verbose = T)
#> gc.em : function (data, locus.label = NA, converge.eps = 1e-06, maxiter = 500, handle.miss = 0, miss.val = 0, control = gc.control())
#> gc.lambda : function (p)
#> gcode : function (a1, a2)
#> gcontrol : function (data, zeta = 1000, kappa = 4, tau2 = 1, epsilon = 0.01, ngib = 500, burn = 50, idum = 2348)
#> gcontrol2 : function (p, col = palette()[4], lcol = palette()[2], ...)
#> gcp : function (y, cc, g, handle.miss = 1, miss.val = 0, n.sim = 0, locus.label = NULL, quietly = FALSE)
#> genecounting : function (data, weight = NULL, loci = NULL, control = gc.control())
#> geno.recode : function (geno, miss.val = 0)
#> getPTE : function (b1, b2, rho, sdx1 = 1, sdx2 = 1)
#> get_b_se : function (f, n, z)
#> get_pve_se : function (n, z, correction = TRUE)
#> get_sdy : function (f, n, b, se, method = "mean", ...)
#> getb1star : function (b1, b2, rho, sdx1 = 1, sdx2 = 1)
#> gif : function (data, gifset)
#> grec2g : function (h, n, t)
#> grid2d : function (chrlen, plot = TRUE, cex = 0.6)
#> gsmr : function (data, X, Y, alpha = 0.05, other_plots = FALSE)
#> h2.jags : function (y, x, G, eps = 1e-04, sigma.p = 0, sigma.r = 1, parms = c("b", "p", "r", "h2"), ...)
#> h2G : function (V, VCOV, verbose = TRUE)
#> h2GE : function (V, VCOV, verbose = TRUE)
#> h2_mzdz : function (mzDat = NULL, dzDat = NULL, rmz = NULL, rdz = NULL, nmz = NULL, ndz = NULL, selV = NULL)
#> h2l : function (K = 0.05, P = 0.5, h2, se, verbose = TRUE)
#> hap : function (id, data, nloci, loci = rep(2, nloci), names = paste("loci", 1:nloci, sep = ""), control = hap.control())
#> hap.control : function (mb = 0, pr = 0, po = 0.001, to = 0.001, th = 1, maxit = 100, n = 0, ss = 0, rs = 0, rp = 0, ro = 0, rv = 0, sd = 0, mm = 0, mi = 0,
#> mc = 50, ds = 0.1, de = 0, q = 0, hapfile = "hap.out", assignfile = "assign.out")
#> hap.em : function (id, data, locus.label = NA, converge.eps = 1e-06, maxiter = 500, miss.val = 0)
#> hap.score : function (y, geno, trait.type = "gaussian", offset = NA, x.adj = NA, skip.haplo = 0.005, locus.label = NA, miss.val = 0, n.sim = 0, method = "gc",
#> id = NA, handle.miss = 0, mloci = NA, sexid = NA)
#> hmht.control : function (data = NULL, colors = NULL, yoffset = 0.25, cex = 1.5, boxed = FALSE)
#> htr : function (y, x, n.sim = 0)
#> hwe : function (data, data.type = "allele", yates.correct = FALSE, miss.val = 0)
#> hwe.cc : function (model, case, ctrl, k0, initial1, initial2)
#> hwe.hardy : function (a, alleles = 3, seed = 3000, sample = c(1000, 1000, 5000))
#> hwe.jags : function (k, n, delta = rep(1/k, k), lambda = 0, lambdamu = -1, lambdasd = 1, parms = c("p", "f", "q", "theta", "lambda"), ...)
#> inv_chr_pos_a1_a2 : function (chr_pos_a1_a2, prefix = "chr", seps = c(":", "_", "_"))
#> invlogit : function (x = 0)
#> invnormal : function (x)
#> ixy : function (x)
#> k : function (r, N, adjust = TRUE)
#> kin.morgan : function (ped, verbose = FALSE)
#> klem : function (obs, k = 2, l = 2)
#> lambda1000 : function (lambda, ncases, ncontrols)
#> log10p : function (z)
#> log10pvalue : function (p = NULL, base = NULL, exponent = NULL)
#> logit : function (p = 0.5)
#> logp : function (z)
#> m2plem : function (a1, a2)
#> makeRLEplot : function (E, log2.data = TRUE, groups = NULL, col.group = NULL, showTitle = FALSE, title = "Relative log expression (RLE) plot", ...)
#> makeped : function (pifile = "pedfile.pre", pofile = "pedfile.ped", auto.select = 1, with.loop = 0, loop.file = NA, auto.proband = 1, proband.file = NA)
#> masize : function (model, opts, alpha = 0.025, gamma = 0.2)
#> metap : function (data, N, verbose = "Y", prefixp = "p", prefixn = "n")
#> metareg : function (data, N, verbose = "Y", prefixb = "b", prefixse = "se")
#> mht.control : function (type = "p", usepos = FALSE, logscale = TRUE, base = 10, cutoffs = NULL, colors = NULL, labels = NULL, srt = 45, gap = NULL, cex = 0.4,
#> yline = 3, xline = 3)
#> mhtplot : function (data, control = mht.control(), hcontrol = hmht.control(), ...)
#> mhtplot.trunc : function (x, chr = "CHR", bp = "BP", p = NULL, log10p = NULL, z = NULL, snp = "SNP", col = c("gray10", "gray60"), chrlabs = NULL, suggestiveline = -log10(1e-05),
#> genomewideline = -log10(5e-08), highlight = NULL, annotatelog10P = NULL, annotateTop = FALSE, cex.mtext = 1.5, cex.text = 0.7, mtext.line = 2,
#> y.ax.space = 5, y.brk1, y.brk2, trunc.yaxis = TRUE, cex.axis = 1.2, delta = 0.05, ...)
#> mhtplot2 : function (data, control = mht.control(), hcontrol = hmht.control(), ...)
#> mia : function (hapfile = "hap.out", assfile = "assign.out", miafile = "mia.out", so = 0, ns = 0, mi = 0, allsnps = 0, sas = 0)
#> miamiplot : function (x, chr = "CHR", bp = "BP", p = "P", pr = "PR", snp = "SNP", col = c("midnightblue", "chartreuse4"), col2 = c("royalblue1", "seagreen1"),
#> ymax = NULL, highlight = NULL, highlight.add = NULL, pch = 19, cex = 0.75, cex.lab = 1, xlab = "Chromosome", ylab = "-log10(P) [y>0]; log10(P) [y<0]",
#> lcols = c("red", "black"), lwds = c(5, 2), ltys = c(1, 2), main = "", ...)
#> micombine : function (est, std.err, confidence = 0.95)
#> mr.boot : function (bXG, sebXG, bYG, sebYG, w, n.boot = 1000, method = "median")
#> mtdt : function (x, n.sim = 0)
#> mtdt2 : function (x, verbose = TRUE, n.sim = NULL, ...)
#> muvar : function (n.loci = 1, y1 = c(0, 1, 1), y12 = c(1, 1, 1, 1, 1, 0, 0, 0, 0), p1 = 0.99, p2 = 0.9)
#> mvmeta : function (b, V)
#> pbsize : function (kp, gamma = 4.5, p = 0.15, alpha = 5e-08, beta = 0.2)
#> pbsize2 : function (N, fc = 0.5, alpha = 0.05, gamma = 4.5, p = 0.15, kp = 0.1, model = "additive")
#> pedtodot : function (pedfile, makeped = FALSE, sink = TRUE, page = "B5", url = "https://jinghuazhao.github.io/", height = 0.5, width = 0.75, rotate = 0,
#> dir = "none")
#> pfc : function (famdata, enum = 0)
#> pfc.sim : function (famdata, n.sim = 1e+06, n.loop = 1)
#> pgc : function (data, handle.miss = 1, is.genotype = 0, with.id = 0)
#> plem2m : function (a)
#> plot.hap.score : function (x, ...)
#> pqtl2dplot : function (d, chrlen = gap::hg19, snp_name = "SNP", snp_chr = "Chr", snp_pos = "bp", gene_chr = "p.chr", gene_start = "p.start", gene_end = "p.end",
#> protein = "p.target.short", gene = "p.gene", lp = "log10p", cis = "cis", plot = TRUE, cex = 0.6)
#> pqtl2dplotly : function (d, chrlen = gap::hg19)
#> pqtl3dplotly : function (d, chrlen = gap::hg19, zmax = 300)
#> print.hap.score : function (x, ...)
#> pvalue : function (z, decimals = 2)
#> qqfun : function (x, distribution = "norm", ylab = deparse(substitute(x)), xlab = paste(distribution, "quantiles"), main = NULL, las = par("las"), envelope = 0.95,
#> labels = FALSE, col = palette()[4], lcol = palette()[2], xlim = NULL, ylim = NULL, lwd = 1, pch = 1, bg = palette()[4], cex = 0.4, line = c("quartiles",
#> "robust", "none"), ...)
#> qqunif : function (u, type = "unif", logscale = TRUE, base = 10, col = palette()[4], lcol = palette()[2], ci = FALSE, alpha = 0.05, ...)
#> read.ms.output : function (msout, is.file = TRUE, xpose = TRUE, verbose = TRUE, outfile = NULL, outfileonly = FALSE)
#> revStrand : function (allele)
#> revhap : function (loci, hapid)
#> revhap.i : function (loci, hapid)
#> runshinygap : function (...)
#> s2k : function (y1, y2)
#> se.exp : function (p, se.p)
#> se.invlogit : function (logit.p, se.logit)
#> sentinels : function (p, pid, st, debug = FALSE, flanking = 1e+06, chr = "Chrom", pos = "End", b = "Effect", se = "StdErr", log_p = NULL, snp = "MarkerName",
#> sep = ",")
#> snp.ES : function (beta, SE, N)
#> snp.HWE : function (g)
#> snp.PAR : function (RR, MAF, unit = 2)
#> snptest_sample : function (data, sample_file = "snptest.sample", ID_1 = "ID_1", ID_2 = "ID_2", missing = "missing", C = NULL, D = NULL, P = NULL)
#> solve_skol : function (rootfun, target, lo, hi, e)
#> textbox : function (label, name = NULL, gp = NULL, vp = NULL)
#> toETDT : function (a)
#> tscc : function (model, GRR, p1, n1, n2, M, alpha.genome, pi.samples, pi.markers, K)
#> ungcode : function (g)
#> weighted.median : function (x, w)
#> whscore : function (allele, type)
#> x2 : function (p1, p2, n1, n2)
#> xy : function (x)
#> z : function (p1, p2, n1, n2, r)
#> Package LibPath Item Title
#> [1,] "gap" "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "hg18" "Internal functions for gap"
#> [2,] "gap" "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "hg19" "Internal functions for gap"
#> [3,] "gap" "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "hg38" "Internal functions for gap"
#> [4,] "gap" "/rds/user/jhz22/hpc-work/work/RtmpehWav8/Rinstacff724b054c" "mr" "Internal functions for gap"
Notes by Howard Seltman from Carnegie Mellon University: Pittsburgh, PA, USA.↩︎