SimPhe

This vignette introduces how to use SimPhe, an R package dedicated to simulate multiple phenotyes based on genotyping data. The main feature of this package is the possibility to take genetic epistatic effects, not only additive-additive interaction but also additive-dominance and dominance-dominance, heritability and multiple phenotypes into account. Moreover, we provide a variety of convenient functions, including suggestion for set the variation of random variable according to user specified genetic effects and flexible support for input formats, as well as output formats. It also supports the input of plink formats.

Sample implementation

# load package
library(SimPhe)

First, we show how easy to get phenotype(s) by using sim.phe. Before running sim.phe, we need specify the parameter file and genotype file for simulation. After installing SimPhe, these two toy files exist in package folder. To get the path, type

# get file path of simulation parameters
# (two shared SNP pairs and one independent SNP pair for each phenotype)
fpar.path <- system.file("extdata", "simupars.txt", package="SimPhe")

# get file path of genotype file: rows are individuals and columns are SNPs
fgeno.path <- system.file("extdata", "10SNP.txt", package="SimPhe")

Then simulate the phenotypes as designed in the parameter file after loading package:

# simulate phenotype(s)
phe <- sim.phe(sim.pars = fpar.path, 
               fgeno = fgeno.path, 
               ftype = "snp.head", 
               seed = 123, 
               fwrite = FALSE)

In the parameter file, we conduct two phenotypes contrbituted by two common SNP pairs with epistatic effects and one independent SNP pair with epistatic effects, the example of simulation parameters, same as the example file, is also included in SimPhe. Users could touch it by typing genepars.

genepars

## $P1mean
##   mean
## 1   10
## 
## $P1main
##     SNP additive dominance
## 1 SNP01      4.0       2.0
## 2 SNP02      2.5       1.5
## 3 SNP03      5.5       1.8
## 4 SNP04     -2.5       1.0
## 5 SNP05      8.5       9.8
## 6 SNP06      8.7       8.4
## 
## $P1epistasis
##    SNPA  SNPB additive_additive additive_dominance dominance_additive
## 1 SNP01 SNP02              10.0                2.0                2.8
## 2 SNP03 SNP04               8.0                3.0                3.5
## 3 SNP05 SNP06              20.5               20.6               20.4
##   dominance_dominance
## 1                 1.8
## 2                 2.5
## 3                10.2
## 
## $P1heritability
##   heritability
## 1          0.6
## 
## $P2mean
##   mean
## 1   20
## 
## $P2main
##     SNP additive dominance
## 1 SNP01      8.0       2.0
## 2 SNP02      4.5       1.8
## 3 SNP03      6.5       1.5
## 4 SNP04     -3.5      -1.2
## 5 SNP07     10.4       8.6
## 6 SNP08     12.3      11.9
## 
## $P2epistasis
##    SNPA  SNPB additive_additive additive_dominance dominance_additive
## 1 SNP01 SNP02              15.0                4.0                3.0
## 2 SNP03 SNP04              12.0                5.0                3.5
## 3 SNP07 SNP08              21.5               20.9               20.5
##   dominance_dominance
## 1                 2.0
## 2                 2.5
## 3                20.3

Phenotype 1 has been set with certain heritability but phenotype 2 has not. We will show whether the heritability of the simulated phenotype 1 is same as the set. SimPhe includes the coefficients and the allele frequencies for simulating phenotype 1: gene.coef and allele.freq, which is extracted from the simulation parameters.

# regression coefficients for simulating phenotype 1
gene.coefficients

## $epi.par1
##    SNPA  SNPB additiveA dominanceA additiveB dominanceB additive_additive
## 1 SNP01 SNP02         4          2       2.5        1.5                10
##   additive_dominance dominance_additive dominance_dominance
## 1                  2                2.8                 1.8
## 
## $epi.par2
##    SNPA  SNPB additiveA dominanceA additiveB dominanceB additive_additive
## 1 SNP03 SNP04       5.5        1.8      -2.5          1                 8
##   additive_dominance dominance_additive dominance_dominance
## 1                  3                3.5                 2.5
## 
## $epi.par3
##    SNPA  SNPB additiveA dominanceA additiveB dominanceB additive_additive
## 1 SNP05 SNP06       8.5        9.8       8.7        8.4              20.5
##   additive_dominance dominance_additive dominance_dominance
## 1               20.6               20.4                10.2

# allele frequencies of SNPs for simulating phenotype 1
allele.freq

##     SNP major.frequency minor.frequency
## 1 SNP05       0.6447673       0.3552327
## 2 SNP01       0.7238366       0.2761634
## 3 SNP03       0.7394188       0.2605812
## 4 SNP06       0.7334176       0.2665824
## 5 SNP02       0.6433986       0.3566014
## 6 SNP04       0.6581386       0.3418614

# calculate genetic variance
genevar <- calc.gene.var(gene.coefficients, allele.freq)

# the variance of simulated phenotype 1
phe1var <- var(phe[, "p1"])

# heritability of simulated phenotype 1
simuht <- genevar / phe1var
simuht

## [1] 0.6297038

The result is not the exactly 0.6 due to the (pseudo)random numbers generated in R. To get phenotype 2 with a specific heritability, for example, 0.45, we could proceed as:

# get regression coefficients settings for phenotype 2
genecoef <- get.gene.coef(main.pars = specify.pars(genetic.pars = genepars,
                                                   effect.type  = "main",
                                                   phe.index    = 2),
                          epi.pars  = specify.pars(genetic.pars = genepars,
                                                   effect.type  = "epistasis",
                                                   phe.index    = 2))
# get genotype data
genotype <- read.geno(fname = fgeno.path, ftype = "snp.head")

# get allele frequencies of the SNPs set for phenotype 2
freq2 <- get.freq(geno = genotype,
                  epi.pars  = specify.pars(genetic.pars = genepars,
                                           effect.type  = "epistasis",
                                           phe.index    = 2))

# get noise variation
exp.noise.var <- get.noise.var(gene.coef = genecoef,
                               freq = freq2,
                               heritability = 0.45)

Then when simulating a phenotype, just give this value as argument noise.var to function sim.phe. It will generate a phenotype which has a heritability close to 0.45.

As we mentioned earlier in this article, buiding the correlation by setting the shared interactive SNP pairs cannot be controlled. We can take a look at the correlation between the simulated phenotype 1 and phenotype 2:

# correlation test
cor.test(phe[, "p1"], phe[, "p2"])

## 
##  Pearson's product-moment correlation
## 
## data:  phe[, "p1"] and phe[, "p2"]
## t = 2.2112, df = 98, p-value = 0.02935
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.0225377 0.3973936
## sample estimates:
##       cor 
## 0.2179908

According to the result of correlation test, the two simulated phenotypes are significantly correlated but the correlation coefficient is small and the amount cannot easily be predicted. To get a certain amount or higher correlation, we can conduct correlation by applying the correlation matrix to two independent variables. For two phenotypes, if we set different SNP pairs for each, we assume these two phenotypes are independent. Here we give another parameter file to but same genotype file to the arguments in sim.phe, fgenetic.pars and fgeno:

# get file path of simulation parameters
# (different SNP pairs for each phenotype)
fpar.path <- system.file("extdata", "sep_simupars.txt", package="SimPhe")

# simulate phenotype(s)
indphe <- sim.phe(sim.pars = fpar.path,
                  fgeno = fgeno.path,
                  ftype = "snp.head",
                  seed = 123,
                  fwrite = FALSE)

We can test the correlation between initial phenotypes with seperated SNP pairs settings:

# test original correlation
cor.test(indphe[, "p1"], indphe[, "p2"])

## 
##  Pearson's product-moment correlation
## 
## data:  indphe[, "p1"] and indphe[, "p2"]
## t = -0.094853, df = 98, p-value = 0.9246
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2056123  0.1871892
## sample estimates:
##          cor 
## -0.009581143

Apparently, these two phenotypes are not related. We could continue our work on coverting them to be correlated. First, conduct a correlation matrix:

# correlation matrix
corm <- matrix(c(1, 0.6, 0.6, 1), ncol = 2)
corm

##      [,1] [,2]
## [1,]  1.0  0.6
## [2,]  0.6  1.0

Before applying correlation matrix to simulated phenotypes, we would like to know what the data looks like:

apply(indphe, 2, mean)

##       p1       p2 
## 20.18028 33.24045

apply(indphe, 2, sd)

##       p1       p2 
## 20.05408 17.92715

Then we can build correlation between the two initial phenotypes:

# build correlation
corphe <- build.cor.phe(indphe, corMtr = corm)

Now let’s test the correlation between the two new phenotypes and see if there is anything difference:

# check mean and standard deviation of new data set
apply(corphe, 2, mean)

##       p1       p2 
## 20.18028 33.24045

apply(corphe, 2, sd)

##       p1       p2 
## 20.05408 17.84452

# test correlation
cor.test(corphe[, "p1"], corphe[, "p2"])

## 
##  Pearson's product-moment correlation
## 
## data:  corphe[, "p1"] and corphe[, "p2"]
## t = 7.3301, df = 98, p-value = 6.615e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4514244 0.7086646
## sample estimates:
##       cor 
## 0.5950781

Obviously, there is no significant difference on means and standard deviations between the initial phenotypes and the new phenotypes.

SimPhe

Beibei Jiang

2018-09-13

Installation

Sample implementation

References