The core of this rule-based mediation strategy for model annotation is the telomere ontology, which serves as the core ontology for the Proctor et al. model. Each syntactic ontology is mapped separately to this ontology. Just as the syntactic ontologies feed input data into the telomere ontology, they can also provide an output route, which lets the system translate from any syntactic ontology to any other. This bi-directional flow of information is what allows new knowledge to be returned to the originator of a query. Here we summarise each of the syntactic ontologies built for these use cases, together with the telomere ontology itself. There are as many syntactic ontologies as there are data formats; data sources that share a format also share a syntactic ontology.
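The hub-and-spoke arrangement described above can be illustrated with a minimal sketch. It is not the OWL and rule machinery used in this work: plain dictionaries stand in for ontology individuals, and the two formats (`tabular`, `xmlish`) and all field names are hypothetical, chosen only to show how a pair of mappings per format allows translation between any two formats via the core.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, str]

# Hypothetical mappings between two made-up syntactic formats and a
# format-neutral core record. None of these field names come from the
# real data sources.
def tabular_to_core(rec: Record) -> Record:
    return {"protein": rec["interactor_a"], "binds": rec["interactor_b"]}

def core_to_tabular(core: Record) -> Record:
    return {"interactor_a": core["protein"], "interactor_b": core["binds"]}

def xmlish_to_core(rec: Record) -> Record:
    return {"protein": rec["species_id"], "binds": rec["partner_id"]}

def core_to_xmlish(core: Record) -> Record:
    return {"species_id": core["protein"], "partner_id": core["binds"]}

# One (input, output) pair of mappings per syntactic format.
MAPPINGS: Dict[str, Tuple[Callable[[Record], Record], Callable[[Record], Record]]] = {
    "tabular": (tabular_to_core, core_to_tabular),
    "xmlish": (xmlish_to_core, core_to_xmlish),
}

def translate(rec: Record, source: str, target: str) -> Record:
    """Translate a record between formats by passing through the core."""
    to_core, _ = MAPPINGS[source]
    _, from_core = MAPPINGS[target]
    return from_core(to_core(rec))

if __name__ == "__main__":
    row = {"interactor_a": "P1", "interactor_b": "P2"}
    print(translate(row, "tabular", "xmlish"))
```

Because every format is mapped only to the core, adding a new format requires one new pair of mappings rather than a separate mapping to each existing format.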
The data sources used were BioGRID [13], Pathway Commons (http://www.pathwaycommons.org), and UniProtKB [14].
Table 1 provides an overview of the main types of information retrieved from each of the data sources.
One marker in Table 1 identifies a data type that can always be found in the associated data source; for example, a downloaded BioGRID interaction file will always include interactions and interaction types.
However, some data types are not always available from a given data source.
Such partial associations are shown with a second, distinct marker.
Table 1 describes the information provided by the data sources in the context of the use cases only.
For instance, even though interaction data may be present within UniProtKB entries, as yet no mapping rules have been written and therefore that column is left blank.
The partial-association marker in BioGRID's entity identification column in Table 1 reflects the lack of a UniProtKB identifier for some interactors.
Specifically, for the use cases, the BioGRID entity representing 'rad9' does not have a cross-reference to UniProtKB.
Localisation information is also not available from the BioGRID input data. For Pathway Commons, all data types are theoretically available because the BioPAX format models them.
However, the actual instance data returned from Pathway Commons does not contain information on either localisation or interaction type.
Retrieved UniProtKB information consists of entity localisation and identification, though localisation information is not always present.
The SBML syntactic ontology is used as an output rather than as an input for these use cases; however, SBML models may provide any of the described data types.
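As a rough illustration, the availability described in the prose above can be written down as a small lookup structure. This sketch records only the cells that the text states or directly implies; the data type labels are assumptions and may not match the column headings of Table 1, and combinations the text does not mention are simply left out.

```python
from enum import Enum
from typing import Dict, List, Tuple

class Availability(Enum):
    ALWAYS = "always"
    PARTIAL = "partial"
    ABSENT = "absent"

# Only (source, data type) pairs described in the surrounding prose.
# Data type labels are assumed, not copied from Table 1.
AVAILABILITY: Dict[Tuple[str, str], Availability] = {
    ("BioGRID", "interaction"): Availability.ALWAYS,
    ("BioGRID", "interaction type"): Availability.ALWAYS,
    ("BioGRID", "entity identification"): Availability.PARTIAL,
    ("BioGRID", "localisation"): Availability.ABSENT,
    ("Pathway Commons", "interaction type"): Availability.ABSENT,
    ("Pathway Commons", "localisation"): Availability.ABSENT,
    ("UniProtKB", "entity identification"): Availability.ALWAYS,
    ("UniProtKB", "localisation"): Availability.PARTIAL,
    # Present in UniProtKB entries, but no mapping rules written yet.
    ("UniProtKB", "interaction"): Availability.ABSENT,
}

def sources_for(data_type: str, include_partial: bool = True) -> List[str]:
    """Return the sources recorded as able to supply a data type, sorted by name."""
    wanted = {Availability.ALWAYS}
    if include_partial:
        wanted.add(Availability.PARTIAL)
    return sorted(
        source
        for (source, dtype), availability in AVAILABILITY.items()
        if dtype == data_type and availability in wanted
    )

if __name__ == "__main__":
    print(sources_for("localisation"))
    print(sources_for("localisation", include_partial=False))
```

Separating full from partial availability makes it easy to see which data types depend on a single source that may not always deliver them.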
An existing SBML syntactic ontology, MFO, allows both input of user queries and output of rule-based mediation responses [15]. It serves as the input point for all data sources in SBML format. Syntactic ontologies have been deliberately created as direct translations of non-OWL data formats into OWL: the purpose of a syntactic ontology is to act as a literal, syntactic description of the data source in OWL. Integration and the majority of the inference occur in the core ontology, so all of the semantic modelling is performed there.
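To make the idea of a literal, syntactic translation concrete, the following sketch walks an XML document and emits one triple per structural fact, with no semantic interpretation. It is an illustrative assumption about what such a translation might look like, not the method used to build the syntactic ontologies in this work; the `has_` naming convention and the sample XML are hypothetical, and the output is a list of plain tuples rather than real OWL.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

def xml_to_triples(xml_text: str) -> List[Triple]:
    """Mirror XML structure as triples: element names become classes,
    attributes become data properties, nesting becomes object properties."""
    root = ET.fromstring(xml_text)
    triples: List[Triple] = []
    counters: Dict[str, int] = {}

    def visit(elem: ET.Element, parent_id: Optional[str]) -> None:
        # Give each element a unique individual name such as "protein_1".
        counters[elem.tag] = counters.get(elem.tag, 0) + 1
        individual = f"{elem.tag}_{counters[elem.tag]}"
        triples.append((individual, "rdf:type", elem.tag))
        for name, value in elem.attrib.items():
            triples.append((individual, f"has_{name}", value))
        if parent_id is not None:
            triples.append((parent_id, f"has_{elem.tag}", individual))
        for child in elem:
            visit(child, individual)

    visit(root, None)
    return triples

if __name__ == "__main__":
    sample = '<entry id="e1"><protein name="rad9"/></entry>'
    for triple in xml_to_triples(sample):
        print(triple)
```

The output says nothing about what an `entry` or a `protein` means; attaching that meaning is left to the mapping into the core ontology.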
Of the four data sources required for the use cases described in the submission, one already had a syntactic ontology created by the authors in previous work, another did not need one to be explicitly generated because it was already in OWL-DL, and the other two needed to be written. Those latter two syntactic ontologies were generated using the XMLTab plugin for Protégé 3.4 RC1 (http://protegewiki.stanford.edu/index.php/XML_Tab). This plugin has both advantages and disadvantages, but overall it was a good choice for the initial creation of the new syntactic ontologies. Once generated, the OWL files can be changed at any time, as needed.
The particular advantages of using XMLTab include:
However, XMLTab is not the perfect choice. Some things in particular would be useful in whatever application is used when this work is scaled for larger data integration tasks:
The use of existing tools to implement rule-based mediation increases its usability for other researchers and decreases the development time. Therefore, wherever possible, existing tools were used.