Introduction to APIS

Ronan Griot \(^{1,2}\), François Allal \(^{3}\), Marc Vandeputte \(^{2,3}\)

\(^{1}\)SYSAAF, Station LPGP/INRA, Campus de Beaulieu, Rennes, France
\(^{2}\)GABI, INRA, AgroParisTech, Université Paris-Saclay, France
\(^{3}\)MARBEC, Univ. Montpellier, Ifremer, CNRS, IRD, Palavas-les-Flots, France

2020-12-02

Description

This package includes all the functions needed to perform parentage assignment with APIS. Parentage assignment is widely used for farmed and natural populations. Because most likelihood-based software relies on simulation, the estimation of the simulation parameters is key to assignment reliability. Among those parameters, the proportion of missing parents is one of the most important. To avoid having to estimate the proportion of missing parents, we developed APIS (Auto-Adaptive Parentage Inference Software), which is based on observed average Mendelian transmission probabilities.

Install and load the package

library(APIS)

Format your data

APIS requires character matrices as inputs. Each matrix has individuals as rows and markers as columns. Individual labels are set as row names and marker labels as column names. Each cell holds the genotype at one marker, coded “All1/All2”: for example “A/A”, “A/B” or “B/B” for bi-allelic markers, and “NA/NA” for a missing value. Multi-allelic markers use the same generic “All1/All2” coding.

data("APIS_offspring")
data("APIS_sire")
data("APIS_dam")

head(APIS_offspring[,1:10])
##             marker_1 marker_2 marker_3 marker_4 marker_5 marker_6 marker_7
## offspring_1 "A/B"    "A/A"    "A/B"    "A/A"    "B/B"    "A/B"    "A/A"   
## offspring_2 "A/B"    "A/A"    "A/B"    "B/B"    "A/B"    "A/A"    "A/A"   
## offspring_3 "A/B"    "A/B"    "A/B"    "B/B"    "B/B"    "A/B"    "A/B"   
## offspring_4 "A/A"    "B/B"    "B/B"    "A/B"    "B/B"    "A/B"    "A/B"   
## offspring_5 "A/B"    "A/A"    "A/B"    "A/A"    "B/B"    "A/B"    "A/A"   
## offspring_6 "A/B"    "A/B"    "B/B"    "A/A"    "B/B"    "B/B"    "A/A"   
##             marker_8 marker_9 marker_10
## offspring_1 "A/B"    "A/B"    "A/A"    
## offspring_2 "B/B"    "A/B"    "A/A"    
## offspring_3 "A/B"    "A/B"    "B/B"    
## offspring_4 "B/B"    "A/B"    "B/B"    
## offspring_5 "A/A"    "A/A"    "A/A"    
## offspring_6 "B/B"    "A/B"    "A/B"
rownames(APIS_offspring[1:6,])
## [1] "offspring_1" "offspring_2" "offspring_3" "offspring_4" "offspring_5"
## [6] "offspring_6"
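
If your raw data store the two alleles of each marker separately, the “All1/All2” coding described above can be produced with base R. The sketch below is only an illustration: the objects `allele1` and `allele2`, the individual labels and the toy values are hypothetical and are not part of the package.

```r
# Hypothetical toy data: first and second allele of 3 individuals at 2 markers
ids     <- c("ind_1", "ind_2", "ind_3")
markers <- c("marker_1", "marker_2")
allele1 <- matrix(c("A", "A", "B",
                    "A", NA,  "B"),
                  nrow = 3, dimnames = list(ids, markers))
allele2 <- matrix(c("A", "B", "B",
                    "B", NA,  "B"),
                  nrow = 3, dimnames = list(ids, markers))

# paste() converts NA to the string "NA", so a missing genotype becomes "NA/NA"
genotypes <- matrix(paste(allele1, allele2, sep = "/"),
                    nrow = nrow(allele1),
                    dimnames = dimnames(allele1))

genotypes
```

The result is a character matrix with individuals as row names and markers as column names, which is the layout shown for `APIS_offspring`.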

Prepare the inputs

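The main APIS function requires 4 inputs: the offspring genotype matrix, the sire genotype matrix, the dam genotype matrix, and the accepted error rate.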

head(APIS_offspring[,1:10])
##             marker_1 marker_2 marker_3 marker_4 marker_5 marker_6 marker_7
## offspring_1 "A/B"    "A/A"    "A/B"    "A/A"    "B/B"    "A/B"    "A/A"   
## offspring_2 "A/B"    "A/A"    "A/B"    "B/B"    "A/B"    "A/A"    "A/A"   
## offspring_3 "A/B"    "A/B"    "A/B"    "B/B"    "B/B"    "A/B"    "A/B"   
## offspring_4 "A/A"    "B/B"    "B/B"    "A/B"    "B/B"    "A/B"    "A/B"   
## offspring_5 "A/B"    "A/A"    "A/B"    "A/A"    "B/B"    "A/B"    "A/A"   
## offspring_6 "A/B"    "A/B"    "B/B"    "A/A"    "B/B"    "B/B"    "A/A"   
##             marker_8 marker_9 marker_10
## offspring_1 "A/B"    "A/B"    "A/A"    
## offspring_2 "B/B"    "A/B"    "A/A"    
## offspring_3 "A/B"    "A/B"    "B/B"    
## offspring_4 "B/B"    "A/B"    "B/B"    
## offspring_5 "A/A"    "A/A"    "A/A"    
## offspring_6 "B/B"    "A/B"    "A/B"
head(APIS_sire[,1:10])
##        marker_1 marker_2 marker_3 marker_4 marker_5 marker_6 marker_7
## sire_1 "A/A"    "A/B"    "A/B"    "B/B"    "B/B"    "B/B"    "A/B"   
## sire_2 "A/B"    "A/A"    "A/B"    "A/B"    "B/B"    "B/B"    "A/B"   
## sire_3 "B/B"    "A/B"    "A/B"    "A/B"    "A/B"    "A/A"    "A/B"   
## sire_4 "A/A"    "A/B"    "A/B"    "B/B"    "B/B"    "A/B"    "A/B"   
## sire_5 "A/B"    "A/B"    "B/B"    "A/B"    "B/B"    "A/A"    "A/B"   
## sire_6 "B/B"    "A/A"    "B/B"    "A/A"    "B/B"    "A/B"    "A/A"   
##        marker_8 marker_9 marker_10
## sire_1 "A/B"    "A/B"    "A/A"    
## sire_2 "A/B"    "B/B"    "A/B"    
## sire_3 "B/B"    "A/B"    "A/B"    
## sire_4 "B/B"    "A/B"    "B/B"    
## sire_5 "A/B"    "A/A"    "B/B"    
## sire_6 "A/A"    "A/B"    "A/B"
head(APIS_dam[,1:10])
##       marker_1 marker_2 marker_3 marker_4 marker_5 marker_6 marker_7
## dam_1 "A/B"    "A/A"    "B/B"    "A/A"    "B/B"    "A/B"    "A/A"   
## dam_2 "A/A"    "A/A"    "A/B"    "A/B"    "A/B"    "A/B"    "A/A"   
## dam_3 "A/B"    "A/A"    "A/B"    "A/B"    "A/B"    "A/A"    "A/A"   
## dam_4 "A/B"    "A/B"    "A/B"    "B/B"    "B/B"    "A/B"    "A/A"   
## dam_5 "A/A"    "A/A"    "A/B"    "A/B"    "A/B"    "A/A"    "A/A"   
## dam_6 "A/B"    "A/A"    "A/B"    "A/B"    "B/B"    "A/B"    "A/B"   
##       marker_8 marker_9 marker_10
## dam_1 "A/B"    "A/B"    "A/A"    
## dam_2 "B/B"    "B/B"    "A/B"    
## dam_3 "B/B"    "A/B"    "A/B"    
## dam_4 "A/B"    "B/B"    "B/B"    
## dam_5 "A/A"    "A/B"    "A/A"    
## dam_6 "A/B"    "A/B"    "A/B"
error <- 0.05  # accept up to 5% of errors in the results
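
Before running the assignment, it can be useful to check that the inputs follow the format described above. The helper below is a hypothetical sketch, not a function of the package. It assumes that the three matrices must contain the same markers in the same order, which is a reasonable guess rather than a documented requirement.

```r
# Hypothetical helper (not part of APIS): basic sanity checks on the inputs
check_inputs <- function(off, sire, dam, error) {
  mats <- list(offspring = off, sire = sire, dam = dam)
  for (name in names(mats)) {
    m <- mats[[name]]
    if (!is.matrix(m) || !is.character(m)) {
      stop(name, " genotypes must be a character matrix")
    }
    # Every cell must look like "All1/All2" (missing values are "NA/NA")
    if (!all(grepl("^[^/]+/[^/]+$", m))) {
      stop(name, " genotypes must be coded \"All1/All2\"")
    }
  }
  # Assumption: the three matrices describe the same markers in the same order
  if (!identical(colnames(off), colnames(sire)) ||
      !identical(colnames(off), colnames(dam))) {
    stop("marker names differ between offspring, sire and dam matrices")
  }
  if (!is.numeric(error) || error < 0 || error > 1) {
    stop("error must be a proportion between 0 and 1")
  }
  invisible(TRUE)
}

# Toy matrices with two markers, used only to exercise the helper
toy <- function(ids) {
  matrix("A/B", nrow = length(ids), ncol = 2,
         dimnames = list(ids, c("marker_1", "marker_2")))
}
check_inputs(toy("off_1"), toy("sire_1"), toy("dam_1"), error = 0.05)
```

With the package data, the same call would be `check_inputs(APIS_offspring, APIS_sire, APIS_dam, error)`.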

Running the assignment

The main function for parentage assignment with APIS is the “APIS()” function. Use it as shown below, keeping the default parameters for the exclusion threshold and the preselection of parents to maximize reliability.

result <- APIS(off.genotype = APIS_offspring,
               sire.genotype = APIS_sire,
               dam.genotype = APIS_dam,
               error = error)

Analyse the results

APIS gives you 3 different outputs: a pedigree, a log, and distribution graphs.

| Pedigree header | Description |
|---|---|
| off | Offspring ID |
| sire | Sire ID |
| dam | Dam ID |

| Log header | Description |
|---|---|
| offspring | Offspring ID |
| mrk_genotype | Number of markers genotyped |
| sire1 | ID of the most likely sire |
| dam1 | ID of the most likely dam |
| mismatch1 | Number of mismatches for the most likely parent pair (sire1, dam1) |
| mendel1 | Average Mendelian transmission probability of the most likely parent pair (sire1, dam1) |
| sire2 | ID of the second most likely sire |
| dam2 | ID of the second most likely dam |
| mismatch2 | Number of mismatches for the second most likely parent pair (sire2, dam2) |
| mendel2 | Average Mendelian transmission probability of the second most likely parent pair (sire2, dam2) |
| delta_Pmendel12 | mendel1 - mendel2 |
| sire3 | ID of the third most likely sire |
| dam3 | ID of the third most likely dam |
| mismatch3 | Number of mismatches for the third most likely parent pair (sire3, dam3) |
| mendel3 | Average Mendelian transmission probability of the third most likely parent pair (sire3, dam3) |
| delta_Pmendel23 | mendel2 - mendel3 |
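
To make the mismatch and Mendelian probability columns more concrete, here is a minimal sketch of how such quantities can be computed for one offspring and one candidate parent pair under plain Mendelian segregation. The function `mendel_prob()` and the toy genotypes are hypothetical. The actual computation in APIS may differ, for example in how missing genotypes or genotyping errors are handled.

```r
# Probability that parents with genotypes `sire` and `dam` (coded "All1/All2")
# produce the offspring genotype `off` at a single marker
mendel_prob <- function(off, sire, dam) {
  # Assumption of this sketch: markers with a missing genotype are skipped
  if ("NA/NA" %in% c(off, sire, dam)) return(NA_real_)
  o <- strsplit(off,  "/", fixed = TRUE)[[1]]
  s <- strsplit(sire, "/", fixed = TRUE)[[1]]
  d <- strsplit(dam,  "/", fixed = TRUE)[[1]]
  # The four equally likely combinations of one sire allele and one dam allele
  gametes <- expand.grid(s = s, d = d, stringsAsFactors = FALSE)
  # The offspring genotype is unordered, so both allele orders count as a match
  hits <- (gametes$s == o[1] & gametes$d == o[2]) |
          (gametes$s == o[2] & gametes$d == o[1])
  mean(hits)
}

# Hypothetical genotypes of one trio at three markers
off  <- c("A/B", "A/A", "B/B")
sire <- c("A/B", "A/B", "A/A")
dam  <- c("A/B", "A/A", "A/B")

probs <- unname(mapply(mendel_prob, off, sire, dam))

mean(probs, na.rm = TRUE)      # average Mendelian transmission probability
sum(probs == 0, na.rm = TRUE)  # markers incompatible with this parent pair
```

In this toy trio the third marker is a mismatch, because a “B/B” offspring cannot inherit a “B” allele from an “A/A” sire.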

According to the graphs, you can change the threshold to improve your assignment.

To set the threshold on Mendelian probabilities, use:

new.result <- personalThreshold(APIS.result = result,
                                method = 'Pmendel',
                                threshold = 0.7)

To set the threshold on mismatches, use:

new.result <- personalThreshold(APIS.result = result,
                                method = 'exclusion',
                                threshold = 1)
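
As a rough illustration of what a mismatch-based rule amounts to, the sketch below applies a threshold to a toy log table. It is a simplified reading, not the code of `personalThreshold()`, which may apply additional conditions (for example on the second most likely parent pair). The toy values are hypothetical. Only the column names are taken from the log and pedigree descriptions above.

```r
# Hypothetical toy log using column names from the log description
toy_log <- data.frame(
  offspring = c("off_1", "off_2", "off_3"),
  sire1     = c("sire_1", "sire_2", "sire_3"),
  dam1      = c("dam_1", "dam_2", "dam_3"),
  mismatch1 = c(0, 1, 4),
  stringsAsFactors = FALSE
)

threshold <- 1

# Keep the most likely pair only when its number of mismatches is acceptable
keep <- toy_log$mismatch1 <= threshold

toy_pedigree <- data.frame(
  off  = toy_log$offspring,
  sire = ifelse(keep, toy_log$sire1, NA),
  dam  = ifelse(keep, toy_log$dam1, NA),
  stringsAsFactors = FALSE
)

toy_pedigree
```

Here the third offspring is left unassigned because its best parent pair shows 4 mismatches, above the threshold of 1.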

Examples

Full data

This example uses the 100-marker data sets provided by the package.



When I look at the mismatch distributions, I prefer to use exclusion and allow 1 mismatch (Figure 2).


new.result <- personalThreshold(APIS.result = result,
                                method = 'exclusion',
                                threshold = 1,
                                verbose = FALSE)

Degraded data

This example uses 35 markers from the example set provided by the package.

You can subset the first 35, 42 or 50 markers to reach a power of 0.90, 0.95 or 0.99, respectively.
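
Selecting the first markers is plain column indexing on the genotype matrices. The sketch below uses a hypothetical toy matrix (`toy_geno`) in place of the package data so that it runs on its own.

```r
# Hypothetical toy matrix standing in for a genotype matrix with 100 markers
toy_geno <- matrix("A/B", nrow = 4, ncol = 100,
                   dimnames = list(paste0("ind_", 1:4),
                                   paste0("marker_", 1:100)))

n_markers <- 35

# drop = FALSE keeps the result a matrix even for a single row or column
toy_subset <- toy_geno[, seq_len(n_markers), drop = FALSE]

dim(toy_subset)
```

With the package data, the same indexing would be applied to `APIS_offspring`, `APIS_sire` and `APIS_dam` so that the three matrices keep the same markers.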


In this situation, the theoretical assignment power is low and there are missing parents. The distribution graphs do not give you more information about a new threshold value.

Thus, the best option is to keep the APIS results.

Other parameters of the APIS function

The APIS function can handle 2 other parameters:

Use the default values to get the most accurate results (except for the number of cores and verbose). Specifying a value decreases computation time but can also decrease assignment reliability.

Acknowledgments

This work was partially funded through the GeneSea project (n° R FEA 4700 16 FA 100 0005) by the French Government and the European Union (EMFF, European Maritime and Fisheries Fund), under the “Appels à projets Innovants” call managed by the FranceAgrimer Office. The doctoral scholarship of Ronan Griot was partially supported by the ANRT (doctoral scholarship n° 2017/0731) and SYSAAF.

Appendix

print(sessionInfo(), locale=FALSE)
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] APIS_1.0.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1        knitr_1.23        magrittr_1.5     
##  [4] tidyselect_0.2.5  munsell_0.5.0     doParallel_1.0.14
##  [7] colorspace_1.4-1  R6_2.4.0          rlang_0.4.4      
## [10] foreach_1.4.4     dplyr_0.8.3       stringr_1.4.0    
## [13] tools_3.6.1       parallel_3.6.1    grid_3.6.1       
## [16] gtable_0.3.0      xfun_0.8          htmltools_0.3.6  
## [19] iterators_1.0.10  assertthat_0.2.1  yaml_2.2.0       
## [22] lazyeval_0.2.2    digest_0.6.20     tibble_2.1.3     
## [25] crayon_1.3.4      gridExtra_2.3     purrr_0.3.3      
## [28] ggplot2_3.2.1     codetools_0.2-16  glue_1.3.1       
## [31] evaluate_0.14     rmarkdown_1.14    labeling_0.3     
## [34] stringi_1.4.3     compiler_3.6.1    pillar_1.4.2     
## [37] scales_1.0.0      pkgconfig_2.0.2