Creating an ontology_index

Daniel Greene

2021-02-03

An ontology_index can be obtained in three ways: by loading a pre-existing one (for example by calling data(hpo)), by reading an ontology encoded in OBO format into R with the function get_ontology, or by calling the function ontology_index explicitly. An ontology_index is a named list of properties for each term, where each property is represented by a list or vector. Each of these property lists is named by term ID, which makes it simple to look up the properties of a given term. All valid ontology_index objects contain id, name, parents, children and ancestors properties for each term. Additional properties can be added to an ontology_index, although they are not required by functions in the package. For details on how to use an ontology_index, see the 'Introduction to ontologyX' vignette.
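As a rough illustration of the structure described above, the sketch below loads the hpo data set shipped with the package and looks up a few properties by term ID. The two IDs used are assumed to be present in the bundled version of the ontology.

```r
library(ontologyIndex)

# Load the pre-built Human Phenotype Ontology index shipped with the package
data(hpo)

class(hpo)   # the object has class "ontology_index"
names(hpo)   # the properties stored for each term

# Properties are named by term ID, so lookups are ordinary list/vector indexing.
# The IDs below are assumed to exist in the bundled version of the ontology.
hpo$name[["HP:0000001"]]
hpo$parents[["HP:0000118"]]
```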

Reading in an OBO file

The function get_ontology can read ontologies encoded in OBO format into R as ontology_index objects. By default, the properties id, name, obsolete, parents, children and ancestors are populated.

To call the function:

ontology <- get_ontology(file)

The properties parents, children and ancestors are determined by a given set of relations between terms, specified by the propagate_relationships argument ("is_a" by default). The parents of a term are the set of terms to which it is related by any of the relations in propagate_relationships; the children are the terms related by the inverse relations; and the ancestors are obtained by propagating the propagate_relationships relations transitively (note: the resulting set includes the term itself).

ontology <- get_ontology(file, propagate_relationships=c("is_a", "part_of"))
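To see the effect of propagate_relationships, the sketch below writes a small, made-up OBO file to a temporary path and reads it twice. The TOY: identifiers and term names are hypothetical and exist only for this illustration; the file is assumed to be simple enough for get_ontology to parse in the same way as a real OBO file.

```r
library(ontologyIndex)

# Hypothetical three-term OBO file, written to a temporary path for illustration
obo_lines <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: TOY:0001",
  "name: organ",
  "",
  "[Term]",
  "id: TOY:0002",
  "name: heart",
  "is_a: TOY:0001",
  "",
  "[Term]",
  "id: TOY:0003",
  "name: heart valve",
  "relationship: part_of TOY:0002"
)
toy_file <- tempfile(fileext = ".obo")
writeLines(obo_lines, toy_file)

# Default: only "is_a" links are used to derive parents, children and ancestors
is_a_only <- get_ontology(toy_file)

# Also treat "part_of" links as parent/child relations
with_part_of <- get_ontology(toy_file, propagate_relationships = c("is_a", "part_of"))

is_a_only$parents[["TOY:0003"]]       # no parent via is_a alone
with_part_of$parents[["TOY:0003"]]    # parent reached via part_of
with_part_of$ancestors[["TOY:0003"]]  # includes the term itself
```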

The relations given in the propagate_relationships argument should be named as they are labelled in the OBO file. In order to see a complete list of relations used in an OBO file, pass the file's path to the function get_relation_names. E.g. for the gene ontology:

get_relation_names("go.obo")
## [1] "is_a"                 "regulates"            "part_of"             
## [4] "has_part"             "happens_during"       "negatively_regulates"
## [7] "positively_regulates" "occurs_in"            "ends_during"

Additional information is often present in the original file - for example definitions, labelled by the def tag in OBO format. get_ontology decides which properties to export based on the extract_tags argument. By default extract_tags="minimal", resulting in only the properties id, name, obsolete, parents, children and ancestors being exported. It is possible to include all properties given in the file by setting extract_tags="everything". The names of the properties included in the returned ontology_index are then the same as the names of the tags in OBO format.

ontology <- get_ontology(file, extract_tags="everything")

All properties are stored in the returned ontology_index as lists, except for the following, which are coerced to character or logical vectors as appropriate: "id", "name", "def", "comment", "obsolete", "created_by", "creation_date".
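The difference between the two extract_tags settings can be sketched on a small, made-up OBO file. The TOY: identifiers and definitions below are hypothetical, and the file is assumed to be parsed like any other OBO file.

```r
library(ontologyIndex)

# Hypothetical two-term OBO file that carries a "def" tag for each term
obo_lines <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: TOY:0001",
  "name: organ",
  "def: \"A hypothetical root term.\" []",
  "",
  "[Term]",
  "id: TOY:0002",
  "name: heart",
  "def: \"A hypothetical child term.\" []",
  "is_a: TOY:0001"
)
toy_file <- tempfile(fileext = ".obo")
writeLines(obo_lines, toy_file)

minimal    <- get_ontology(toy_file)                              # default: extract_tags = "minimal"
everything <- get_ontology(toy_file, extract_tags = "everything")

# Properties that are only present when all tags are extracted
setdiff(names(everything), names(minimal))

# "def" is one of the tags stored as a character vector rather than a list
is.character(everything$def)
```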

Further properties can be converted to vectors if required, by modifying the returned ontology_index as you would a list, e.g.

ontology$property <- simplify2array(ontology$property)
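A small worked version of this conversion is sketched below. The habitat property is hypothetical and holds exactly one value per term, which is the case in which simplify2array collapses a list into a vector.

```r
library(ontologyIndex)

toy <- ontology_index(
  parents = list(animal = character(0), mammal = "animal", cat = "mammal", fish = "animal")
)

# Hypothetical list-valued property with exactly one value per term
toy$habitat <- list(animal = "any", mammal = "land", cat = "land", fish = "water")
is.list(toy$habitat)

# Collapse the list into a character vector that is still named by term ID
toy$habitat <- simplify2array(toy$habitat)
is.character(toy$habitat)
toy$habitat[["fish"]]
```

If the elements of the list had differing lengths, simplify2array would return the list unchanged, so this only helps for properties with one value per term.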

Adding term properties

Modifying an existing ontology_index to add term properties is the same as adding to a list or data.frame. In the example below, we add the number of children for each term:

ontology$number_of_children <- sapply(ontology$children, length)
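The same idea on a small ontology built in memory, as a sketch: lengths() is used here as a base R equivalent of sapply(x, length), and the derived property is then used to pick out the leaf terms.

```r
library(ontologyIndex)

toy <- ontology_index(
  parents = list(animal = character(0), mammal = "animal", cat = "mammal", fish = "animal")
)

# One count per term, named by term ID
toy$number_of_children <- lengths(toy$children)
toy$number_of_children[["animal"]]

# Terms with no children are the leaves of the ontology
leaves <- names(which(toy$number_of_children == 0))
leaves
```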

In the same manner, a valid ontology_index can be built up from scratch as a list, provided that the standard properties are included so that it can be used with the functions in ontologyIndex.

Converting from OWL to OBO format

To read in ontologies in OWL syntax, it is recommended to first convert them to OBO format, for example using the ROBOT command line tool (https://github.com/ontodev/robot).
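One possible way to drive the conversion from R is sketched below. It assumes that the robot executable is installed and on the PATH, and that its convert command is called with --input and --output; the helper robot_convert_args and the file names are hypothetical placeholders.

```r
# Hypothetical helper: build the argument vector for `robot convert`
robot_convert_args <- function(owl_file, obo_file) {
  c("convert", "--input", owl_file, "--output", obo_file)
}

owl_file <- "ontology.owl"  # placeholder path to an existing OWL file
obo_file <- "ontology.obo"  # placeholder path for the converted output

# Only attempt the conversion if ROBOT is installed and the input file exists
if (nzchar(Sys.which("robot")) && file.exists(owl_file)) {
  status <- system2("robot", robot_convert_args(owl_file, obo_file))
  if (status == 0) {
    ontology <- ontologyIndex::get_ontology(obo_file)
  }
}
```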

Creating an ontology_index explicitly

The function ontology_index can be used to create an object with class ontology_index. This could be useful, for instance, if the user wished to convert a directed acyclic graph (DAG) with edges representing sub/super-class relationships into an ontology_index. It is similar to the function data.frame: it accepts a variable number of arguments corresponding to properties of the ontological terms, each of which must be a vector or list of the same length (except the version argument, which can be any object and should contain any information about the version of the ontology).

The only mandatory argument is parents, which should be a list of character vectors giving the IDs of the 'parents'/'superclasses' of each term. The term IDs can be supplied either as the names attribute of parents or as a separate id argument of the same length as parents. The human-readable term names can be passed as the name argument (which defaults to the same as id). As usual, the children and ancestors properties are derived from the parents. Warnings are generated if any IDs given in the parents argument are not in the id argument.

A simple invocation:

animal_superclasses <- list(animal=character(0), mammal="animal", cat="mammal", fish="animal")
animal_ontology <- ontology_index(parents=animal_superclasses)
unclass(animal_ontology)
## $id
##   animal   mammal      cat     fish 
## "animal" "mammal"    "cat"   "fish" 
## 
## $name
##   animal   mammal      cat     fish 
## "animal" "mammal"    "cat"   "fish" 
## 
## $parents
## $parents$animal
## character(0)
## 
## $parents$mammal
## [1] "animal"
## 
## $parents$cat
## [1] "mammal"
## 
## $parents$fish
## [1] "animal"
## 
## 
## $children
## $children$animal
## [1] "mammal" "fish"  
## 
## $children$mammal
## [1] "cat"
## 
## $children$cat
## character(0)
## 
## $children$fish
## character(0)
## 
## 
## $ancestors
## $ancestors$animal
## [1] "animal"
## 
## $ancestors$mammal
## [1] "animal" "mammal"
## 
## $ancestors$cat
## [1] "animal" "mammal" "cat"   
## 
## $ancestors$fish
## [1] "animal" "fish"  
## 
## 
## $obsolete
## animal mammal    cat   fish 
##  FALSE  FALSE  FALSE  FALSE
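The DAG use case mentioned above can be sketched as follows. The edge table and its column names are hypothetical; the only requirement assumed here is that every term, including root terms, gets an entry in the parents list.

```r
library(ontologyIndex)

# Hypothetical edge table: one row per subclass -> superclass edge
edges <- data.frame(
  child  = c("mammal", "cat", "fish"),
  parent = c("animal", "mammal", "animal"),
  stringsAsFactors = FALSE
)

# Every term needs an entry; terms without a parent get character(0)
ids <- union(edges$parent, edges$child)
parents <- lapply(setNames(nm = ids), function(id) edges$parent[edges$child == id])

dag_ontology <- ontology_index(parents = parents)

# Properties are then looked up by term ID
dag_ontology$children[["animal"]]
dag_ontology$ancestors[["cat"]]
```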

For more details, see the help page for the function, ?ontology_index.