Genomic analysis of model organisms frequently requires databases based on human data or comparisons to patient-derived resources, which means gene names must be harmonized into the same gene space. The babelgene R package helps to simplify the conversion process. It provides gene orthologs/homologs for frequently studied model organisms, integrated from multiple source databases.
You can install the babelgene R package from CRAN.
install.packages("babelgene")
Load babelgene.
library(babelgene)
The main functionality is accessed via the orthologs() function, which takes one or more genes and outputs a data frame of predicted ortholog/homolog pairs. The output data frame contains gene symbols and IDs for human and the specified species. The support column lists the source databases that predict each pair, and support_n gives their count. Several examples are provided below.
Get mouse equivalents for a set of human genes.
orthologs(genes = c("TP53", "EGFR", "IL6", "TGFB1", "CD4"), species = "mouse")
#> human_symbol human_entrez human_ensembl taxon_id symbol entrez
#> 1 CD4 920 ENSG00000010610 10090 Cd4 12504
#> 2 EGFR 1956 ENSG00000146648 10090 Egfr 13649
#> 3 IL6 3569 ENSG00000136244 10090 Il6 16193
#> 4 TGFB1 7040 ENSG00000105329 10090 Tgfb1 21803
#> 5 TP53 7157 ENSG00000141510 10090 Trp53 22059
#> ensembl
#> 1 ENSMUSG00000023274
#> 2 ENSMUSG00000020122
#> 3 ENSMUSG00000025746
#> 4 ENSMUSG00000002603
#> 5 ENSMUSG00000059552
#> support
#> 1 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 2 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 3 Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoMCL|Panther|PhylomeDB|Treefam
#> 4 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 5 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> support_n
#> 1 12
#> 2 12
#> 3 10
#> 4 12
#> 5 12
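The support_n column can be used to keep only well-supported pairs. The following is a minimal sketch using base R subsetting on the columns shown in the output above; the cutoff of 11 databases and the variable names are arbitrary choices for illustration, not a recommendation from the package.

```r
library(babelgene)

# Query the same human genes as in the example above.
res <- orthologs(genes = c("TP53", "EGFR", "IL6", "TGFB1", "CD4"), species = "mouse")

# Keep pairs predicted by at least `min_dbs` source databases.
# The threshold is an illustrative assumption.
min_dbs <- 11
well_supported <- res[res$support_n >= min_dbs, c("human_symbol", "symbol", "support_n")]
well_supported
```

Which rows remain depends on the data version bundled with the installed package, so the result may differ from the output printed above.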
Input genes are assumed to be human by default. To start from genes of the specified non-human species instead, set the human parameter to FALSE.
orthologs(genes = "Pu", species = "fruit fly", human = FALSE)
#> human_symbol human_entrez human_ensembl taxon_id symbol entrez ensembl
#> 1 GCH1 2643 ENSG00000131979 7227 Pu 37415 FBgn0003162
#> support
#> 1 EggNOG|Ensembl|HomoloGene|Inparanoid|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> support_n
#> 1 10
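A common use of human = FALSE is translating a list of model-organism genes into human symbols. The sketch below builds a named lookup vector from the result. The mouse symbols are taken from the earlier mouse output, and the variable names are illustrative. A gene can have more than one predicted human ortholog, so keeping only the first row per gene is a simplification made here, not package behavior.

```r
library(babelgene)

# Mouse symbols to translate (illustrative input).
mouse_genes <- c("Trp53", "Egfr", "Il6")
res <- orthologs(genes = mouse_genes, species = "mouse", human = FALSE)

# Keep the first row per mouse symbol (a simplification for this sketch).
res <- res[!duplicated(res$symbol), ]

# Named vector: names are mouse symbols, values are human symbols.
mouse_to_human <- setNames(as.character(res$human_symbol), res$symbol)

# Look up the inputs in their original order;
# genes without a predicted ortholog come back as NA.
unname(mouse_to_human[mouse_genes])
```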
It is possible to search by NCBI Entrez or Ensembl IDs instead of gene symbols.
orthologs(genes = "ENSG00000111640", species = "mouse", human = TRUE)
#> human_symbol human_entrez human_ensembl taxon_id symbol entrez
#> 1 GAPDH 2597 ENSG00000111640 10090 Gapdh 14433
#> ensembl
#> 1 ENSMUSG00000057666
#> support support_n
#> 1 Ensembl|HGNC|HomoloGene|NCBI|OMA|OrthoDB|OrthoMCL|Panther|Treefam 9
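The same lookup can be sketched with an NCBI Entrez ID. The example passes the human GAPDH Entrez ID from the output above as a character string; whether numeric input is also accepted is not stated here, so using the string form is an assumption made for safety.

```r
library(babelgene)

# "2597" is the human Entrez ID for GAPDH, as shown in the output above.
res <- orthologs(genes = "2597", species = "mouse", human = TRUE)

# Show only the symbol and Entrez columns for both species.
res[, c("human_symbol", "human_entrez", "symbol", "entrez")]
```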
The orthologs() function requires the species parameter to be set. You can list all the species that can be queried with the species() function.
species()
#> taxon_id scientific_name
#> 1 28377 Anolis carolinensis
#> 2 9913 Bos taurus
#> 3 6239 Caenorhabditis elegans
#> 4 9615 Canis lupus familiaris
#> 5 7955 Danio rerio
#> 6 7227 Drosophila melanogaster
#> 7 9796 Equus caballus
#> 8 9685 Felis catus
#> 9 9031 Gallus gallus
#> 10 9544 Macaca mulatta
#> 11 13616 Monodelphis domestica
#> 12 10090 Mus musculus
#> 13 9258 Ornithorhynchus anatinus
#> 14 9598 Pan troglodytes
#> 15 10116 Rattus norvegicus
#> 16 4932 Saccharomyces cerevisiae
#> 17 284812 Schizosaccharomyces pombe 972h-
#> 18 9823 Sus scrofa
#> 19 8364 Xenopus tropicalis
#> common_name
#> 1 Carolina anole, green anole
#> 2 bovine, cattle, cow, dairy cow, domestic cattle, domestic cow, ox, oxen
#> 3 <NA>
#> 4 dog, dogs
#> 5 leopard danio, zebra danio, zebra fish, zebrafish
#> 6 fruit fly
#> 7 domestic horse, equine, horse
#> 8 cat, cats, domestic cat
#> 9 bantam, chicken, chickens, Gallus domesticus
#> 10 rhesus macaque, rhesus macaques, Rhesus monkey, rhesus monkeys
#> 11 gray short-tailed opossum
#> 12 house mouse, mouse
#> 13 duck-billed platypus, duckbill platypus, platypus
#> 14 chimpanzee
#> 15 brown rat, Norway rat, rat, rats
#> 16 baker's yeast, brewer's yeast, S. cerevisiae
#> 17 <NA>
#> 18 pig, pigs, swine, wild boar
#> 19 tropical clawed frog, western clawed frog
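Because species() returns an ordinary data frame, it can be searched with base R. The sketch below looks up a species by part of its common name; the search term and variable names are only examples.

```r
library(babelgene)

sp <- species()

# Case-insensitive match against the common_name column.
# grepl() returns FALSE for missing (NA) common names, so those rows are skipped.
hits <- sp[grepl("zebrafish", sp$common_name, ignore.case = TRUE), ]
hits[, c("taxon_id", "scientific_name")]
```

Searching the scientific_name column instead works the same way and also covers the species listed without a common name.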
The package is based on the data provided by the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) at the European Bioinformatics Institute. The HGNC Comparison of Orthology Predictions (HCOP) integrates the orthology assertions predicted for human genes by eggNOG, Ensembl Compara, HGNC, HomoloGene, Inparanoid, NCBI Gene Orthology, OMA, OrthoDB, OrthoMCL, Panther, PhylomeDB, TreeFam and ZFIN.
The name babelgene is derived from the Babel Fish, a fictional species of fish that could translate for humans.