Rule-Based Mediation for Model Annotation: Background and Comparison with Previous Work


This page provides a more thorough grounding in the background of this work than space constraints allowed in the main paper. Please see the main paper for more information about the rule-based mediation procedure itself.

Annotation of Systems Biology Models

Aids to model annotation exist, but they rely extensively on the expert knowledge of the modeller to identify appropriate additions. Saint, for example, allows modellers to select appropriate MIRIAM annotations and to create new SBML reactions from suggested interactions via a lightweight web interface [1]. Taverna workflows have also been used to annotate models from multiple data sources [2]. Tools such as semanticSBML [3] create MIRIAM annotations but do not create any other model elements, while libAnnotationSBML connects to and serves data from a large number of web services but has a simpler user interface [4]. In BioNetGen [5], which runs within Virtual Cell, a modeller describes the molecules and modules of interest as well as the rules governing their interactions, and an SBML model containing the appropriate species and transitions is then generated. However, this system requires some manual intervention, does not add MIRIAM annotations, and does not integrate any formats beyond SBML and BioPAX, accepting only BioPAX-formatted data as input. There is a need for computational approaches that automate the integration of multiple data sources to support the model annotation process.

Data Integration in the Life Sciences

Previous work with ontology mapping and other semantic data integration methodologies in the life sciences includes the TAMBIS ontology-based query system [6]; the mapping of the Gene Ontology to the Unified Medical Language System (UMLS) [7]; the creation of RDF-based databases with S3DB [8]; the integration of Entrez Gene/HomoloGene with BioPAX via the Entrez Knowledge Model (EKoM) [9]; the integration of data in SBML and BioPAX formats using the SBPAX linker ontology and the Systems Biology Linker (SyBiL) software [10]; and the database integration system OntoFusion [11]. See below for further discussion of these systems.

Comparison with Previous Work

Rule-based mediation, which combines ontology mapping with a semantically and biologically rich core ontology, is most closely related to the BGLaV system described in the main paper, but it also has similarities with other semantic data integration systems created for the life sciences. Older integration techniques such as TAMBIS use only a core ontology, mapping both the semantics and the syntax of the underlying data sources into that ontology [6]. TAMBIS also assumes only one data source per format type. In contrast, rule-based mediation separates the resolution of syntactic heterogeneity from that of semantic heterogeneity by using syntactic ontologies in addition to a core ontology. Moreover, multiple data sources sharing a common format can be parsed into a single syntactic ontology. Lomax et al. aligned the Gene Ontology to UMLS [7], linking two mature ontologies so that data annotated with one ontology can also be associated with the other. However, this is a one-off mapping between two specific ontologies and is not generally applicable to more complex data integration needs. The RDF triple store S3DB [8] allows users to add whatever triples they require, but it provides no specific support for OWL or for the semantic benefits that such a language provides.

The integration of Entrez Gene/HomoloGene with BioPAX via EKoM [9] has a purpose and structure similar to those of rule-based mediation. Sahoo et al. needed to retrieve large amounts of data and store them in a semantically meaningful way. Whereas rule-based mediation uses SQWRL, the authors used the query language SPARQL. Their purpose, identifying genes of interest in pathway and interaction data, is similar to this work's goal of annotating biological model species and reactions, but their methodology differs. They aligned two ontologies (EKoM and BioPAX), each corresponding to one data source of interest, and then populated those ontologies directly from the data sources; the large amount of integrated data could then be queried. However, the Sahoo et al. method has a number of limitations: adding data in new formats would require modifying the main ontology, and perhaps even the stored queries used to retrieve information from it; SPARQL operates on RDF and does not understand OWL semantics, which limits its functionality; and the authors do not describe exporting results in any of the formats used to import the data.

SBPAX is a bridging ontology between SBML and BioPAX created to integrate the information available in both formats [10]. In this approach, data can either be converted from one format to another, or merged together and then saved in a target format. Only concepts common to both formats are created within SBPAX. Although SBPAX provides a useful approach for converting and merging model and pathway data, it is not a generic solution: while it could theoretically be extended to additional data formats, currently only SBML and BioPAX are supported. In contrast, rule-based mediation was designed from the outset to incorporate multiple formats. Further, the SBPAX bridging ontology, which performs a function similar to that of a core ontology, describes only SBML and BioPAX concepts, without describing a more general biological domain of interest. Adding new data formats to the SBPAX approach would require modifications to the bridging ontology which, due to its tight coupling with SBML and BioPAX concepts, may be both complex and time-consuming. Nonetheless, the efforts of Ruebenacker et al. to reconcile the two formats share some similarities with aspects of rule-based mediation.

OntoFusion uses the local-as-view method to build virtual local schemas, one per data format, that describe those formats using semantically identical concepts from the shared domain (or core) ontology [11]. Although this allows automated alignment of different data types, because their virtual schemas share a common core ontology, the method relies on query translation rather than data materialisation, and query translation can be costly. Further, input data can only be described using the core ontology, which may lack the expressivity needed to describe all of the information in each format of interest.

References

  1. Lister, A. L., Pocock, M., Taschuk, M., Wipat, A., November 2009. Saint: a lightweight integration environment for model annotation. Bioinformatics 25 (22), 3026-3027. http://dx.doi.org/10.1093/bioinformatics/btp523
  2. Li, P., Oinn, T., Soiland, S., Kell, D. B., January 2008. Automated manipulation of systems biology models using libSBML within Taverna workflows. Bioinformatics 24 (2), 287-289. http://dx.doi.org/10.1093/bioinformatics/btm578
  3. Krause, F., Uhlendorf, J., Lubitz, T., Schulz, M., Klipp, E., Liebermeister, W., November 2009. Annotation and merging of SBML models with semanticSBML. Bioinformatics, btp642. http://dx.doi.org/10.1093/bioinformatics/btp642
  4. Swainston, N., Mendes, P., September 2009. libAnnotationSBML: a library for exploiting SBML annotations. Bioinformatics 25 (17), 2292-2293. http://dx.doi.org/10.1093/bioinformatics/btp392
  5. Blinov, M. L., Ruebenacker, O., Moraru, I. I., September 2008. Complexity and modularity of intracellular networks: a systematic approach for modelling and simulation. IET Systems Biology 2 (5), 363-368. http://dx.doi.org/10.1049/iet-syb:20080092
  6. Stevens, R., Baker, P., Bechhofer, S., Ng, G., Jacoby, A., Paton, N. W., Goble, C. A., Brass, A., February 2000. TAMBIS: transparent access to multiple bioinformatics information sources. Bioinformatics 16 (2), 184-185. http://dx.doi.org/10.1093/bioinformatics/16.2.184
  7. Lomax, J., McCray, A. T., 2004. Mapping the Gene Ontology into the Unified Medical Language System. Comparative and Functional Genomics 5 (4), 354-361. http://dx.doi.org/10.1002/cfg.407
  8. Deus, H. F., Stanislaus, R., Veiga, D. F., Behrens, C., Wistuba, I. I., Minna, J. D., Garner, H. R., Swisher, S. G., Roth, J. A., Correa, A. M., Broom, B., Coombes, K., Chang, A., Vogel, L. H., Almeida, J. S., August 2008. A semantic web management model for integrative biomedical informatics. PLoS ONE 3 (8), e2946. http://dx.doi.org/10.1371/journal.pone.0002946
  9. Sahoo, S. S., Bodenreider, O., Rutter, J. L., Skinner, K. J., Sheth, A. P., February 2008. An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence. Journal of Biomedical Informatics. http://dx.doi.org/10.1016/j.jbi.2008.02.006
  10. Ruebenacker, O., Moraru, I. I., Schaff, J. C., Blinov, M. L., 2009. Integrating BioPAX pathway knowledge with SBML models. IET Systems Biology 3 (5), 317-328.
  11. Alonso-Calvo, R., Maojo, V., Billhardt, H., Martin-Sanchez, F., García-Remesal, M., Pérez-Rey, D., February 2007. An agent- and ontology-based system for integrating public gene, protein, and disease databases. Journal of Biomedical Informatics 40 (1), 17-29. http://dx.doi.org/10.1016/j.jbi.2006.02.014


Copyright © 2010 CISBAN