Demonstration of package heritEWAS

DNA methylation can be thought of as a type of “mark” on DNA that can affect gene expression. Most methylation marks are erased soon after conception but methylation is known to be effectively inherited from parents to their offspring in rare cases. The heritEWAS package provides functions to identify these heritable DNA methylation marks using the method of (Joo et al., 2018). This approach looks for Mendelian patterns of inheritance within families and is based on the relationship structure of each family and on methylation data for a subset of the family members (but no genotype data). This method can handle large, multi-generational families and it is optimised for methylation data at thousands or millions of methylation sites, such as the data generated by epigenome-wide association studies (EWAS) of related individuals.

You can install the released version of heritEWAS from CRAN with:

install.packages("heritEWAS")
library(heritEWAS)

To use this package, you will need two sets of data (described in more detail in the help page for the function genotype_combinations):

  1. A data frame containing the pedigree data, i.e. the relationship structure of the families. Each row of the data frame corresponds to a person, and the columns correspond to each person’s individual identifier (indiv) and the identifiers of his or her mother (mother) and father (father), as well as a family identifier (family) and a binary flag (typed) which is 1 for people who have methylation data available. No family should contain a pedigree loop, such as one caused by inbreeding.

  2. A matrix of the M-values, with rows corresponding to methylation sites (i.e. CpG probes) and columns corresponding to people. The column names should match the individual identifiers of people in the pedigree data with typed = 1.

The pedigree data should look something like the following (where extra variables like aff and age can be included but they will be ignored):

head(ped)
#>   family  indiv mother father sex aff age typed
#> 1    aey aey001   <NA>   <NA>   M   0  28     0
#> 2    aey aey002 aey005 aey006   F   0  41     1
#> 3    aey aey003 aey002 aey001   F   0  70     1
#> 4    aey aey004 aey002 aey001   F   0  12     0
#> 5    aey aey005   <NA>   <NA>   F   0  18     1
#> 6    aey aey006 aey008 aey009   M   0  13     1
unique(ped$family)
#>  [1] "aey" "awz" "beu" "ety" "ibz" "kph" "ljd" "mps" "msn" "msp" "nmc" "nqc"
#> [13] "opq" "qor" "rpb" "rvn" "thv" "wkt" "yiu" "yvf"

And the M-values matrix should look something like:

# Colnames are the individual IDs of the pedigree data
M_values[1:5, 1:5]
#>                aey002     aey003     aey005     aey006     aey007
#> cg18584561 -3.8530708 -4.8507816 -3.9702189 -4.5818866 -4.2766219
#> cg01074083 -2.3250118 -3.3565441 -0.7297221  1.3038464 -0.1940888
#> cg27639199 -4.0406730 -4.0770266 -3.6542961 -3.8844059 -3.6013861
#> cg26773954 -0.7413546 -1.1655354  0.9955952 -1.0797262 -0.8736215
#> cg04417708 -0.2233104 -0.2838067  1.7158682 -0.3687787 -0.4975771

The main goal of the package heritEWAS is to calculate a statistic \(\Delta l\) for each methylation site. This statistic measures the strength of evidence that the site’s M-values follow a Mendelian pattern of inheritance within the families, with larger values of \(\Delta l\) corresponding to more heritable methylation sites. This statistic is the difference in maximised log-likelihoods of two statistical models, and can be interpreted as a difference in the Bayesian information criteria of the two models; see (Joo et al. 2018).

The most time-consuming part of the calculation of \(\Delta l\) is the same for all methylation sites, so the heritEWAS package calculates this part once, stores the output, then re-uses this calculation for each methylation site.
This part of the calculation is performed by the function genotype_combinations():

typed_genos <- genotype_combinations(ped)
#> Family 1 (aey): Number of genotype combinations to check: 104 
#> Family 2 (awz): Number of genotype combinations to check: 16 
#> Family 3 (beu): Number of genotype combinations to check: 96 
#> Family 4 (ety): Number of genotype combinations to check: 66 
#> Family 5 (ibz): Number of genotype combinations to check: 192 
#> Family 6 (kph): Number of genotype combinations to check: 64 
#> Family 7 (ljd): Number of genotype combinations to check: 16 
#> Family 8 (mps): Number of genotype combinations to check: 68 
#> Family 9 (msn): Number of genotype combinations to check: 66 
#> Family 10 (msp): Number of genotype combinations to check: 132 
#> Family 11 (nmc): Number of genotype combinations to check: 7 
#> Family 12 (nqc): Number of genotype combinations to check: 64 
#> Family 13 (opq): Number of genotype combinations to check: 256 
#> Family 14 (qor): Number of genotype combinations to check: 33 
#> Family 15 (rpb): Number of genotype combinations to check: 68 
#> Family 16 (rvn): Number of genotype combinations to check: 16 
#> Family 17 (thv): Number of genotype combinations to check: 16 
#> Family 18 (wkt): Number of genotype combinations to check: 104 
#> Family 19 (yiu): Number of genotype combinations to check: 68 
#> Family 20 (yvf): Number of genotype combinations to check: 192

The results are stored in a named list of data frames, with one data frame per family. Each data frame gives the probability of each possible combination of genotypes for those family members with methylation data (i.e. those with typed = 1). The possible genotypes for each person are 0 and 1, corresponding to non-carriers and carriers (respectively) of a hypothetical genetic variant that controls methylation at a given methylation site under one of the two statistical models used to define \(\Delta l\). Impossible genotype combinations (those with a probability of 0) are excluded from the output of genotype_combinations().

str(typed_genos)
#> List of 20
#>  $ aey:'data.frame': 36 obs. of  9 variables:
#>   ..$ p     : num [1:36] 0.93458 0.00935 0.00467 0.00467 0.00234 ...
#>   ..$ aey002: num [1:36] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ aey003: num [1:36] 0 1 0 0 0 0 1 1 0 0 ...
#>   ..$ aey005: num [1:36] 0 0 1 1 1 1 1 1 0 0 ...
#>   ..$ aey006: num [1:36] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ aey007: num [1:36] 0 0 0 1 0 1 0 1 0 0 ...
#>   ..$ aey008: num [1:36] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ aey009: num [1:36] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ aey012: num [1:36] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ awz:'data.frame': 13 obs. of  5 variables:
#>   ..$ p     : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00952 ...
#>   ..$ awz002: int [1:13] 0 0 0 0 0 1 1 1 1 1 ...
#>   ..$ awz007: int [1:13] 0 0 0 0 1 0 0 0 0 1 ...
#>   ..$ awz011: int [1:13] 0 0 1 1 0 0 0 1 1 0 ...
#>   ..$ awz013: int [1:13] 0 1 0 1 0 0 1 0 1 0 ...
#>  $ beu:'data.frame': 35 obs. of  8 variables:
#>   ..$ p     : num [1:35] 0.93925 0.00935 0.00467 0.00234 0.00234 ...
#>   ..$ beu001: num [1:35] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ beu002: num [1:35] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ beu003: num [1:35] 0 0 0 0 0 0 0 0 0 1 ...
#>   ..$ beu004: num [1:35] 0 0 0 0 0 0 0 0 1 0 ...
#>   ..$ beu006: num [1:35] 0 0 0 1 1 1 1 0 0 0 ...
#>   ..$ beu007: num [1:35] 0 0 1 0 0 1 1 0 0 0 ...
#>   ..$ beu010: num [1:35] 0 1 0 0 1 0 1 0 0 0 ...
#>  $ ety:'data.frame': 51 obs. of  8 variables:
#>   ..$ p     : num [1:51] 0.940421 0.004673 0.001168 0.000876 0.000292 ...
#>   ..$ ety003: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ety004: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ety005: num [1:51] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ ety006: num [1:51] 0 0 0 1 1 1 1 0 0 1 ...
#>   ..$ ety007: num [1:51] 0 0 1 0 0 1 1 0 1 0 ...
#>   ..$ ety011: num [1:51] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ety013: num [1:51] 0 1 0 0 1 0 1 0 0 0 ...
#>  $ ibz:'data.frame': 99 obs. of  9 variables:
#>   ..$ p     : num [1:99] 0.943925 0.009346 0.009346 0.000876 0.000292 ...
#>   ..$ ibz003: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz004: num [1:99] 0 1 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz006: num [1:99] 0 0 0 1 1 1 1 1 1 1 ...
#>   ..$ ibz007: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz008: num [1:99] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ ibz009: num [1:99] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ ibz011: num [1:99] 0 0 0 0 0 1 1 0 0 1 ...
#>   ..$ ibz016: num [1:99] 0 0 1 0 1 0 1 0 1 0 ...
#>  $ kph:'data.frame': 64 obs. of  7 variables:
#>   ..$ p     : num [1:64] 0.94911 0.00506 0.00506 0.00506 0.00149 ...
#>   ..$ kph001: int [1:64] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ kph003: int [1:64] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ kph004: int [1:64] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ kph005: int [1:64] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ kph009: int [1:64] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ kph011: int [1:64] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ ljd:'data.frame': 13 obs. of  5 variables:
#>   ..$ p     : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00238 ...
#>   ..$ ljd004: int [1:13] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ ljd005: int [1:13] 0 0 0 0 1 1 1 1 0 1 ...
#>   ..$ ljd009: int [1:13] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ ljd010: int [1:13] 0 1 0 1 0 1 0 1 0 0 ...
#>  $ mps:'data.frame': 68 obs. of  8 variables:
#>   ..$ p     : num [1:68] 0.936186 0.009492 0.000438 0.000146 0.001606 ...
#>   ..$ mps001: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ mps003: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ mps005: num [1:68] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ mps006: num [1:68] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ mps008: num [1:68] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ mps011: num [1:68] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ mps016: num [1:68] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ msn:'data.frame': 50 obs. of  8 variables:
#>   ..$ p     : num [1:50] 0.936332 0.00993 0.000876 0.000876 0.000292 ...
#>   ..$ msn001: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msn003: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msn005: num [1:50] 0 0 0 0 0 0 1 1 1 1 ...
#>   ..$ msn007: num [1:50] 0 0 1 1 1 1 0 0 1 1 ...
#>   ..$ msn010: num [1:50] 0 1 0 0 1 1 0 1 0 0 ...
#>   ..$ msn011: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msn013: num [1:50] 0 0 0 1 0 1 0 0 0 1 ...
#>  $ msp:'data.frame': 132 obs. of  9 variables:
#>   ..$ p     : num [1:132] 0.931732 0.000365 0.000365 0.000365 0.001825 ...
#>   ..$ msp001: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp004: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp008: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp009: num [1:132] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ msp010: num [1:132] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ msp011: num [1:132] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ msp013: int [1:132] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ msp014: int [1:132] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ nmc:'data.frame': 6 obs. of  5 variables:
#>   ..$ p     : num [1:6] 0.94286 0.00952 0.00952 0.00952 0.00952 ...
#>   ..$ nmc001: num [1:6] 0 0 1 0 1 0
#>   ..$ nmc002: num [1:6] 0 1 1 0 0 0
#>   ..$ nmc003: num [1:6] 0 0 0 1 1 0
#>   ..$ nmc010: num [1:6] 0 0 0 0 0 1
#>  $ nqc:'data.frame': 49 obs. of  7 variables:
#>   ..$ p     : num [1:49] 0.94217 0.00292 0.00292 0.00292 0.00292 ...
#>   ..$ nqc004: int [1:49] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ nqc005: int [1:49] 0 0 0 0 0 0 0 0 0 1 ...
#>   ..$ nqc008: int [1:49] 0 0 0 0 0 0 0 0 1 0 ...
#>   ..$ nqc009: int [1:49] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ nqc012: int [1:49] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ nqc013: int [1:49] 0 1 0 1 0 1 0 1 0 0 ...
#>  $ opq:'data.frame': 151 obs. of  9 variables:
#>   ..$ p     : num [1:151] 0.93925 0.00117 0.00117 0.00117 0.00117 ...
#>   ..$ opq002: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq004: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq005: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq008: int [1:151] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ opq011: int [1:151] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ opq013: int [1:151] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ opq014: int [1:151] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ opq015: int [1:151] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ qor:'data.frame': 26 obs. of  7 variables:
#>   ..$ p     : num [1:26] 0.92775 0.00573 0.00573 0.00573 0.00573 ...
#>   ..$ qor002: num [1:26] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ qor005: num [1:26] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ qor010: num [1:26] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ qor011: num [1:26] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ qor014: num [1:26] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ qor017: num [1:26] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ rpb:'data.frame': 44 obs. of  8 variables:
#>   ..$ p     : num [1:44] 0.945238 0.000595 0.000595 0.000595 0.000595 ...
#>   ..$ rpb001: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ rpb003: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ rpb004: num [1:44] 0 0 0 0 0 1 1 1 1 1 ...
#>   ..$ rpb006: num [1:44] 0 1 1 1 1 0 1 1 1 1 ...
#>   ..$ rpb007: num [1:44] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ rpb009: int [1:44] 0 0 0 1 1 0 0 0 1 1 ...
#>   ..$ rpb014: int [1:44] 0 0 1 0 1 0 0 1 0 1 ...
#>  $ rvn:'data.frame': 13 obs. of  5 variables:
#>   ..$ p     : num [1:13] 0.95714 0.00476 0.00476 0.00476 0.00952 ...
#>   ..$ rvn005: int [1:13] 0 0 0 0 0 1 1 1 1 1 ...
#>   ..$ rvn007: int [1:13] 0 0 0 0 1 0 0 0 0 1 ...
#>   ..$ rvn009: int [1:13] 0 0 1 1 0 0 0 1 1 0 ...
#>   ..$ rvn013: int [1:13] 0 1 0 1 0 0 1 0 1 0 ...
#>  $ thv:'data.frame': 16 obs. of  5 variables:
#>   ..$ p     : num [1:16] 0.95595 0.00357 0.00357 0.00357 0.00357 ...
#>   ..$ thv005: int [1:16] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ thv006: int [1:16] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ thv011: int [1:16] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ thv012: int [1:16] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ wkt:'data.frame': 45 obs. of  9 variables:
#>   ..$ p     : num [1:45] 0.942857 0.004762 0.004762 0.000595 0.000595 ...
#>   ..$ wkt004: num [1:45] 0 1 1 1 1 1 1 1 1 1 ...
#>   ..$ wkt007: num [1:45] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ wkt008: num [1:45] 0 0 0 1 1 1 1 1 1 1 ...
#>   ..$ wkt011: num [1:45] 0 0 1 0 0 0 0 0 0 0 ...
#>   ..$ wkt012: num [1:45] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ wkt013: num [1:45] 0 0 0 0 0 0 0 1 1 1 ...
#>   ..$ wkt014: num [1:45] 0 0 0 0 0 1 1 0 0 1 ...
#>   ..$ wkt015: num [1:45] 0 0 0 0 1 0 1 0 1 0 ...
#>  $ yiu:'data.frame': 47 obs. of  8 variables:
#>   ..$ p     : num [1:47] 0.93575 0.00117 0.00117 0.00117 0.00584 ...
#>   ..$ yiu001: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yiu003: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yiu005: num [1:47] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ yiu008: num [1:47] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ yiu011: num [1:47] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yiu014: int [1:47] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ yiu018: int [1:47] 0 1 0 1 0 1 0 1 0 1 ...
#>  $ yvf:'data.frame': 63 obs. of  9 variables:
#>   ..$ p     : num [1:63] 0.93224 0.00234 0.00234 0.00234 0.00234 ...
#>   ..$ yvf001: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf004: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf006: num [1:63] 0 0 0 0 0 0 0 0 1 1 ...
#>   ..$ yvf007: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf008: num [1:63] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ yvf011: int [1:63] 0 0 0 0 1 1 1 1 0 0 ...
#>   ..$ yvf013: int [1:63] 0 0 1 1 0 0 1 1 0 0 ...
#>   ..$ yvf014: int [1:63] 0 1 0 1 0 1 0 1 0 1 ...

Given the genotype probabilities, we can use the package’s main function ML_estimates() to compute \(\Delta l\) for each site. The output is a data frame with rows corresponding to methylation sites and columns giving details about certain fitted models (see the help page of the function ML_estimates for more details). In particular, the column delta.l gives the statistic \(\Delta l\), which measures how heritable each methylation site is.

MLEs <- ML_estimates(typed_genos, M_values, ncores = 2)
#> Loading data
#> Computing log-likelihood for the standard mixture model
#> Computing log-likelihood for the Mendelian model
head(MLEs)
#>     methylation.site   delta.l    ll.null    ll.mix ll.mendel    mu.null
#> 3         cg27639199 118.09775 -271.71287 -264.2143 -146.1166 -2.6183850
#> 1         cg18584561  90.00927 -237.56529 -231.8907 -141.8814 -3.2970715
#> 5         cg04417708  54.40022 -249.14781 -241.2967 -186.8965  0.5123842
#> 486       cg20592836  49.01353 -287.01184 -275.2788 -226.2653 -2.2974237
#> 380       cg06536614  47.62643   48.11356  103.9804  151.6069 -2.8489881
#> 6         cg05865327  26.85557 -262.41616 -257.5835 -230.7279  0.2555829
#>       sd.null mu0.mendel sd0.mendel mu1.mendel sd1.mendel    mu0.mix    sd0.mix
#> 3   2.0214565 -3.7764376 0.21607159  0.4513136  1.3477016 -2.5365336 2.06441478
#> 1   1.5481184 -4.2069460 0.25838259 -0.8795158  0.7369130 -3.2561007 1.56614986
#> 5   1.6947389 -0.4606702 0.68962203  3.0981023  0.1964459  0.3379929 1.60045489
#> 486 2.2780993 -1.1515825 2.30146074 -4.0355995  0.1204863 -2.1302684 2.31615821
#> 380 0.1661564 -2.9096112 0.02112367 -2.6747390  0.2544657 -2.9061301 0.05273047
#> 6   1.8798420 -0.5085749 1.44145564  2.7821332  0.1835955  0.1361195 1.83725157
#>       mu1.mix    sd1.mix
#> 3   -3.811011 0.09404426
#> 1   -4.292643 0.01360576
#> 5    3.190368 0.06669716
#> 486 -4.047780 0.04947619
#> 380 -2.453558 0.14302755
#> 6    2.828646 0.12484766

References

Joo JE, Dowty JG, Milne RL, Wong EM, Dugue PA, English D, Hopper JL, Goldgar DE, Giles GG, Southey MC, kConFab. Heritable DNA methylation marks associated with susceptibility to breast cancer. Nat Commun. 2018 Feb 28;9(1):867.