ontology_index
An ontology_index can be obtained by loading a pre-existing one (for example by calling data(hpo)), by reading an ontology encoded in OBO format into R using the function get_ontology, or by calling the function ontology_index explicitly. An ontology_index is a named list of properties for each term, where each property is represented by a list or vector. Each of these property lists is named by term, facilitating simple lookups of properties by term name. All valid ontology_index objects contain id, name, parents, children and ancestors properties for each term. Additional properties can be added to an ontology_index, although they are not required by functions in the package. For details on how to use an ontology_index, see the 'Introduction to ontologyX' vignette.
The function get_ontology can read ontologies encoded in OBO format into R as ontology_index objects. By default, the properties id, name, obsolete, parents, children and ancestors are populated.
To call the function:
ontology <- get_ontology(file)
The properties parents, children and ancestors are determined by a given set of relations between terms, specified by the propagate_relationships argument ("is_a" by default). Thus the parents of a term are the set of terms to which it is related by any type of relation contained in propagate_relationships; the children are those terms related by the inverse relations; and the ancestors are those obtained by propagating the propagate_relationships relations (note: the resulting set includes the term itself).
ontology <- get_ontology(file, propagate_relationships=c("is_a", "part_of"))
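The way children and ancestors follow from the parents can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy parents list and the helper get_ancestors are hypothetical names used only for illustration, and the order of the resulting ancestors may differ from what get_ontology returns.

```r
# Toy parent lists, named by term ID (hypothetical data)
parents <- list(
  animal = character(0),
  mammal = "animal",
  cat    = "mammal",
  fish   = "animal"
)

# Children: invert the parent relation
children <- lapply(setNames(nm = names(parents)), function(term) {
  names(parents)[vapply(parents, function(p) term %in% p, logical(1))]
})

# Ancestors: follow parent links transitively, starting from the term
# itself, so the result always includes the term
get_ancestors <- function(term) {
  found <- term
  frontier <- parents[[term]]
  while (length(frontier) > 0) {
    found <- union(found, frontier)
    frontier <- setdiff(unlist(parents[frontier], use.names = FALSE), found)
  }
  found
}
ancestors <- lapply(setNames(nm = names(parents)), get_ancestors)
```

Adding further relation types to propagate_relationships corresponds, in this picture, to adding more edges to the parents list before children and ancestors are derived.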
The relations given in the propagate_relationships argument should be named as they are labelled in the OBO file. To see a complete list of relations used in an OBO file, pass the file's path to the function get_relation_names. For example, for the Gene Ontology:
get_relation_names("go.obo")
## [1] "is_a"                 "regulates"            "part_of"
## [4] "has_part"             "happens_during"       "negatively_regulates"
## [7] "positively_regulates" "occurs_in"            "ends_during"
Additional information is often present in the original file, for example definitions, labelled by the def tag in OBO format. get_ontology decides which properties to export based on the extract_tags argument. By default extract_tags="minimal", resulting in only the properties id, name, obsolete, parents, children and ancestors being exported. It is possible to include all properties given in the file by setting extract_tags="everything". The names of the properties included in the returned ontology_index are then the same as the names of the tags in OBO format.
ontology <- get_ontology(file, extract_tags="everything")
All properties are stored in the returned ontology_index as lists, except for the following, which are coerced to character or logical vectors as appropriate: "id", "name", "def", "comment", "obsolete", "created_by", "creation_date".
Further properties can be mapped to vectors if required, by modifying the returned ontology_index as a list, e.g.
ontology$property <- simplify2array(ontology$property)
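The effect of simplify2array on a list-valued property can be seen on a small stand-alone example. The term IDs and values below are hypothetical; the sketch only relies on base R behaviour, namely that a list whose elements all have length one is collapsed to a named vector, while a list with elements of differing lengths is returned unchanged.

```r
# Hypothetical list-valued property with exactly one value per term
property <- list("T:1" = "alpha", "T:2" = "beta", "T:3" = "gamma")

# Every element has length 1, so the result is a named character vector
as_vector <- simplify2array(property)
is.character(as_vector)
names(as_vector)

# Hypothetical property with a varying number of values per term
ragged <- list("T:1" = c("a", "b"), "T:2" = "c")

# Lengths differ, so simplify2array hands back the list as it was
is.list(simplify2array(ragged))
```

It is therefore worth checking that a tag occurs exactly once per term before relying on the converted property being a vector.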
Modifying an existing ontology_index to add term properties is the same as adding to a list or data.frame. In the example below, we add the number of children for each term:
ontology$number_of_children <- sapply(ontology$children, length)
In the same manner, a valid ontology_index can be built up from scratch as a list, provided that the standard properties are included for use with functions in ontologyIndex.
In order to read in ontologies in OWL syntax, it is recommended to first convert to OBO format, for example using the ROBOT command line tool https://github.com/ontodev/robot.
ontology_index explicitly
The function ontology_index can be used to create an object with class ontology_index. This could be useful, for instance, if the user wished to convert a directed acyclic graph (DAG) with edges representing sub/super-class relationships into an ontology_index. It is similar to the function data.frame: it accepts a variable number of arguments corresponding to properties for ontological terms, each of which must be a vector or list of the same length (except the version argument, which can be any object and should contain any information about the version of the ontology). The only mandatory argument is parents, which should be a list of character vectors giving the IDs of the 'parents'/'superclasses' of each term. The term IDs can be supplied either as the names attribute of parents or as a separate id argument of the same length as parents. The human-readable term names can be passed as the name argument (defaults to the same as id). As usual, the children and ancestors properties are derived from the parents. Warnings are generated if any IDs given in the parents argument are not in the id argument.
A simple invocation:
animal_superclasses <- list(animal=character(0), mammal="animal", cat="mammal", fish="animal")
animal_ontology <- ontology_index(parents=animal_superclasses)
unclass(animal_ontology)
## $id
##   animal   mammal      cat     fish
## "animal" "mammal"    "cat"   "fish"
##
## $name
##   animal   mammal      cat     fish
## "animal" "mammal"    "cat"   "fish"
##
## $parents
## $parents$animal
## character(0)
##
## $parents$mammal
## [1] "animal"
##
## $parents$cat
## [1] "mammal"
##
## $parents$fish
## [1] "animal"
##
##
## $children
## $children$animal
## [1] "mammal" "fish"
##
## $children$mammal
## [1] "cat"
##
## $children$cat
## character(0)
##
## $children$fish
## character(0)
##
##
## $ancestors
## $ancestors$animal
## [1] "animal"
##
## $ancestors$mammal
## [1] "animal" "mammal"
##
## $ancestors$cat
## [1] "animal" "mammal" "cat"
##
## $ancestors$fish
## [1] "animal" "fish"
##
##
## $obsolete
## animal mammal    cat   fish
##  FALSE  FALSE  FALSE  FALSE
For more details, see the help page for the function, ?ontology_index.
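As a further sketch, the term IDs and human-readable names can be supplied separately from the parents. The IDs ("T:1" and so on) and names below are made up for illustration, and the call assumes that the ontologyIndex package is installed and that the argument names id and name match those documented in ?ontology_index.

```r
library(ontologyIndex)

# Hypothetical term IDs and human-readable names
ids <- c("T:1", "T:2", "T:3")
term_names <- c("root", "left", "right")

# Unnamed parents list, in the same order and of the same length as ids
term_parents <- list(character(0), "T:1", "T:1")

small_ontology <- ontology_index(
  parents = term_parents,
  id = ids,
  name = term_names
)

# Properties can then be looked up by term ID
small_ontology$name[["T:2"]]
small_ontology$children[["T:1"]]
```

Keeping the IDs in a separate vector can be convenient when the graph comes from an edge table rather than from an already named list.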