This vignette illustrates the most useful functions of yatah.
For this example, we use data from Zeller et al. (2014): the abundances of the bacteria present in 199 stool samples.
library(yatah)
library(dplyr)
abundances <- as_tibble(yatah::abundances)
print(abundances, n_extra = 2)
#> # A tibble: 1,585 x 200
#> lineages `CCIS00146684ST… `CCIS00281083ST… `CCIS02124300ST… `CCIS02379307ST…
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 k__Bact… 100. 99.8 96.3 99.9
#> 2 k__Viru… 0.00697 0.128 3.70 0.00471
#> 3 k__Bact… 66.2 24.6 74.2 45.1
#> 4 k__Bact… 19.1 74.4 11.9 53.0
#> 5 k__Bact… 12.1 0.0428 7.22 0.0788
#> 6 k__Bact… 1.86 0.428 0.765 0.361
#> 7 k__Bact… 0.758 0.388 2.28 0.985
#> 8 k__Viru… 0.00697 0.128 3.70 0.00471
#> 9 k__Bact… 0.00155 0.00415 0 0
#> 10 k__Bact… 62.4 21.7 62.3 44.0
#> # … with 1,575 more rows, and 195 more variables: `CCIS02856720ST-4-0` <dbl>,
#> # `CCIS03473770ST-4-0` <dbl>, …
taxonomy <- select(abundances, lineages)
taxonomy
#> # A tibble: 1,585 x 1
#> lineages
#> <chr>
#> 1 k__Bacteria
#> 2 k__Viruses
#> 3 k__Bacteria|p__Firmicutes
#> 4 k__Bacteria|p__Bacteroidetes
#> 5 k__Bacteria|p__Actinobacteria
#> 6 k__Bacteria|p__Verrucomicrobia
#> 7 k__Bacteria|p__Proteobacteria
#> 8 k__Viruses|p__Viruses_noname
#> 9 k__Bacteria|p__Candidatus_Saccharibacteria
#> 10 k__Bacteria|p__Firmicutes|c__Clostridia
#> # … with 1,575 more rows
Here, the table lists all the taxa present, at every rank. As we are only interested in the genera that belong to the Gammaproteobacteria class, we filter() the lineages with is_clade() and is_rank(). The genus name is then extracted with last_clade().
gammap_genus <-
taxonomy %>%
filter(is_clade(lineages, "Gammaproteobacteria"),
is_rank(lineages, "genus")) %>%
mutate(genus = last_clade(lineages))
gammap_genus
#> # A tibble: 26 x 2
#> lineages genus
#> <chr> <chr>
#> 1 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Escherichia
#> 2 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Haemophilus
#> 3 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Enterobacteriaceae_…
#> 4 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Pseudomonas
#> 5 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Enterobacter
#> 6 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Aggregatibacter
#> 7 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Hafnia
#> 8 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Actinobacillus
#> 9 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Sinobacteraceae_unc…
#> 10 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o_… Citrobacter
#> # … with 16 more rows
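To make the filtering step more concrete, here is a minimal base R sketch of what such helpers could look like on lineage strings of the form k__...|p__...|c__.... The sketch_* functions and the toy lineages are hypothetical and written only for illustration; they are not yatah's implementation, and they assume the single-letter rank prefixes seen in the data above.

```r
# Hypothetical helpers in base R, for illustration only
# (not yatah's implementation).
lineages <- c(
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria",
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas",
  "k__Bacteria|p__Firmicutes|c__Clostridia"
)

# Split each lineage into its clades, e.g. "k__Bacteria", "p__Proteobacteria".
split_lineage <- function(x) strsplit(x, "|", fixed = TRUE)

# Letter prefix assumed for each rank in this lineage format.
rank_prefix <- c(kingdom = "k", phylum = "p", class = "c",
                 order = "o", family = "f", genus = "g", species = "s")

# Name of the deepest clade, without its rank prefix.
sketch_last_clade <- function(x) {
  vapply(split_lineage(x),
         function(cl) sub("^[a-z]__", "", cl[length(cl)]),
         character(1))
}

# TRUE when the deepest clade of the lineage sits at the requested rank.
sketch_is_rank <- function(x, rank) {
  prefix <- paste0(rank_prefix[[rank]], "__")
  vapply(split_lineage(x),
         function(cl) startsWith(cl[length(cl)], prefix),
         logical(1))
}

# TRUE when the named clade appears anywhere in the lineage.
sketch_is_clade <- function(x, clade) {
  vapply(split_lineage(x),
         function(cl) clade %in% sub("^[a-z]__", "", cl),
         logical(1))
}

# Keep only the genera that belong to Gammaproteobacteria.
keep <- sketch_is_clade(lineages, "Gammaproteobacteria") &
  sketch_is_rank(lineages, "genus")
sketch_last_clade(lineages[keep])
```

Combining the two logical vectors with & mirrors passing both conditions to filter() above.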
It is often useful to have a taxonomic table, with one column per rank. taxtable() builds one from a vector of lineages.
gammaprot_table <-
gammap_genus %>%
pull(lineages) %>%
taxtable()
as_tibble(gammaprot_table)
#> # A tibble: 26 x 6
#> kingdom phylum class order family genus
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Bacteria Proteobact… Gammaproteob… Enterobact… Enterobacte… Escherichia
#> 2 Bacteria Proteobact… Gammaproteob… Pasteurell… Pasteurella… Haemophilus
#> 3 Bacteria Proteobact… Gammaproteob… Enterobact… Enterobacte… Enterobacteriace…
#> 4 Bacteria Proteobact… Gammaproteob… Pseudomona… Pseudomonad… Pseudomonas
#> 5 Bacteria Proteobact… Gammaproteob… Enterobact… Enterobacte… Enterobacter
#> 6 Bacteria Proteobact… Gammaproteob… Pasteurell… Pasteurella… Aggregatibacter
#> 7 Bacteria Proteobact… Gammaproteob… Enterobact… Enterobacte… Hafnia
#> 8 Bacteria Proteobact… Gammaproteob… Pasteurell… Pasteurella… Actinobacillus
#> 9 Bacteria Proteobact… Gammaproteob… Xanthomona… Sinobactera… Sinobacteraceae_…
#> 10 Bacteria Proteobact… Gammaproteob… Enterobact… Enterobacte… Citrobacter
#> # … with 16 more rows
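The idea behind such a table can be sketched in base R: split each lineage on "|", drop the rank prefixes, and bind the pieces into one row per lineage. The toy lineages and the name sketch_table are assumptions made for this example, and the sketch only handles lineages that all stop at the same rank; it is not how taxtable() is implemented.

```r
# Toy lineages, all described down to the genus (an assumption of this sketch).
lineages <- c(
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas",
  "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Moraxellaceae|g__Acinetobacter"
)
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")

# Split on "|" and drop the rank prefixes ("k__", "p__", ...).
clades <- lapply(strsplit(lineages, "|", fixed = TRUE),
                 function(cl) sub("^[a-z]__", "", cl))

# Guard: every lineage must have exactly one clade per rank.
stopifnot(all(lengths(clades) == length(ranks)))

# One row per lineage, one column per rank.
sketch_table <- as.data.frame(do.call(rbind, clades), stringsAsFactors = FALSE)
names(sketch_table) <- ranks
sketch_table
```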
To build a tree, pass a taxonomic table to taxtree(). By default, it collapses ranks that have only one subrank.
gammaprot_tree <- taxtree(gammaprot_table)
gammaprot_tree
#>
#> Phylogenetic tree with 26 tips and 7 internal nodes.
#>
#> Tip labels:
#> Escherichia, Enterobacteriaceae_noname, Enterobacter, Hafnia, Citrobacter, Pantoea, ...
#> Node labels:
#> Gammaproteobacteria, Enterobacteriaceae, Pasteurellaceae, Pseudomonadales, Moraxellaceae, Xanthomonadales, ...
#>
#> Rooted; includes branch lengths.
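To see which clades such a collapse would affect, one can count the distinct direct children of every clade in a taxonomic table. The base R sketch below does this on a toy table; count_children() is a hypothetical helper, it matches clades by name only, and it does not build a tree or reproduce taxtree() itself.

```r
# Toy taxonomic table (an assumption of this sketch).
tax <- data.frame(
  kingdom = rep("Bacteria", 3),
  phylum  = rep("Proteobacteria", 3),
  class   = rep("Gammaproteobacteria", 3),
  order   = c("Pseudomonadales", "Pseudomonadales", "Pasteurellales"),
  family  = c("Pseudomonadaceae", "Moraxellaceae", "Pasteurellaceae"),
  genus   = c("Pseudomonas", "Acinetobacter", "Haemophilus"),
  stringsAsFactors = FALSE
)

# For every clade above the last rank, count its distinct direct children.
# Clades are matched by name only, so names are assumed unique within a rank.
count_children <- function(tax) {
  out <- vector("list", ncol(tax) - 1)
  for (i in seq_len(ncol(tax) - 1)) {
    pairs <- unique(tax[, c(i, i + 1)])  # distinct (parent, child) pairs
    n <- table(pairs[[1]])               # number of children per parent
    out[[i]] <- data.frame(rank = names(tax)[i],
                           clade = names(n),
                           children = as.integer(n),
                           stringsAsFactors = FALSE)
  }
  do.call(rbind, out)
}

children <- count_children(tax)

# Clades with a single child are the candidates for collapsing;
# the others would remain as internal nodes.
kept <- children[children$children > 1, ]
kept
```

In this toy table only Gammaproteobacteria and Pseudomonadales have more than one child, which is in line with the output above, where the node labels include Pseudomonadales and several families but not the single-child ranks above the class.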
Instead of a classical plot(), we use ggtree (Yu et al. 2017) to display the tree.
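A minimal sketch of such a plot is given below. It assumes the ggtree package (distributed through Bioconductor) is installed, and it uses a small hand-written Newick tree, toy_tree, as a stand-in for gammaprot_tree so that the example runs on its own; the branch lengths are arbitrary.

```r
library(ggtree)  # Bioconductor package

# Toy tree with labelled internal nodes, standing in for gammaprot_tree
# (an assumption of this sketch; branch lengths are arbitrary).
toy_tree <- ape::read.tree(
  text = "((Pseudomonas:2,(Acinetobacter:1,Moraxella:1)Moraxellaceae:1)Pseudomonadales:1,Haemophilus:3)Gammaproteobacteria;"
)

# Draw the tree, then add the tip (genus) and internal node (clade) labels.
p <- ggtree(toy_tree) +
  geom_tiplab() +
  geom_nodelab()
p
```

With the objects from this vignette, the same layers would be applied to gammaprot_tree. Long tip labels may be clipped at the right edge of the plot, in which case the x axis needs to be widened.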
Yu, Guangchuang, David Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1): 28–36. https://doi.org/10.1111/2041-210X.12628.
Zeller, Georg, Julien Tap, Anita Y Voigt, Shinichi Sunagawa, Jens Roat Kultima, Paul I Costea, Aurélien Amiot, et al. 2014. “Potential of Fecal Microbiota for Early-Stage Detection of Colorectal Cancer.” Molecular Systems Biology 10 (11). EMBO Press: 766.