Mondo Ontology

Ontology Mapping Data Integration

Mondo Case Study Infobox

  • Author: Nicolas Matentzoglu (@matentzn)
  • Last updated: 2025-02-15
  • Mapping Type: Mapping Type
  • Status of this case study: Status

High-level summary

The goal of Mondo is to create a unified disease ontology by integrating multiple disease vocabularies and classifications, ensuring semantic interoperability across biomedical databases. The Mondo mappings, in particular, connect disease concepts in the Mondo ontology to other related resources, such as medical terminologies.

Domain

Biomedical and clinical informatics, focusing on disease classification and standardization.

Purpose of the mapping

Multiple user groups with distinct interests rely on and have a stake in the Mondo mappings. Here is a representative selection of user stories:

  • Clinical Researcher:
    • Map disease terms from multiple coding systems (like OMIM, Orphanet, and ICD-10) to MONDO to merge and analyze patient data from different cohorts in a consistent and interoperable way.
    • Accurately count the number of distinct disease entities across disparate resources (e.g., how many rare diseases are there?).
  • Bioinformatician / data scientist: Normalize disease annotations across datasets that use different ontologies to build machine learning models on unified disease labels.
  • Health Data Integrator: Map local diagnosis codes to MONDO terms using available mappings to enable semantic search and cross-database queries across clinical records and research databases.
  • Ontology/Database curator: Review mapping candidates generated by automated tools for their inclusion into a resource.
  • Knowledge Graph Developer: Connect information provided by siloed disease resources on the same disease but using different identifiers.
  • Clinical Decision Support Developer: Bridge between clinical terminologies like ICD-10 and research vocabularies like Orphanet to surface relevant rare disease information for clinicians at the point of care.
  • Pharmaceutical Scientist: Connect disease indications across clinical trials, literature, and omics databases to identify new therapeutic opportunities.

Most of these user stories share a single key requirement: the ability to accurately match and merge data about the same disease when different identifiers were used to refer to it.

However, not all use cases require the same level of rigour. The data scientist, for example, might be more interested in complete mappings that account for all diseases in their data set than in potential granularity mismatches in the mapping. A match between a term "diabetes mellitus" in one database, without further information about the type, and "type 2 diabetes mellitus" in another might be acceptable if the intention is to leverage this connection with machine learning models that are resistant to noise. In other use cases, such as that of the clinical researcher who seeks to determine how many rare diseases there are in order to inform budget negotiations for a government proposal, accuracy is paramount.

As mappings are always context-dependent (or rather, use-case-dependent), detailed metadata on confidence, mapping justification, and mapping precision (exact or broad) is needed to determine which mappings are useful for which use case.
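
As a rough illustration of how such metadata lets one mapping set serve several use cases, the sketch below filters mappings by predicate and confidence before using them to translate external codes into Mondo identifiers. The `Mapping` class, the identifiers, the confidence scores, and the thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    subject_id: str    # Mondo term (placeholder identifiers below)
    predicate_id: str  # e.g. skos:exactMatch
    object_id: str     # term in an external resource
    confidence: float


# Made-up records, not real Mondo mappings.
MAPPINGS = [
    Mapping("MONDO:A", "skos:exactMatch", "EXT1:001", 0.95),
    Mapping("MONDO:A", "skos:exactMatch", "EXT2:007", 0.70),
    Mapping("MONDO:B", "skos:broadMatch", "EXT1:002", 0.90),
]


def select(mappings, predicates, min_confidence):
    """Keep mappings with an allowed predicate and sufficient confidence."""
    return [
        m for m in mappings
        if m.predicate_id in predicates and m.confidence >= min_confidence
    ]


def to_mondo_lookup(mappings):
    """Build a lookup from external identifier to Mondo identifier."""
    return {m.object_id: m.subject_id for m in mappings}


def count_distinct_diseases(codes, lookup):
    """Count distinct Mondo terms reached from a list of external codes."""
    return len({lookup[c] for c in codes if c in lookup})


# Counting use case: only high-confidence exact matches (assumed threshold).
strict = to_mondo_lookup(select(MAPPINGS, {"skos:exactMatch"}, 0.9))

# Noise-tolerant use case: accept broader predicates and lower confidence.
lenient = to_mondo_lookup(
    select(MAPPINGS, {"skos:exactMatch", "skos:broadMatch", "skos:narrowMatch"}, 0.5)
)

codes = ["EXT1:001", "EXT2:007", "EXT1:002"]
strict_count = count_distinct_diseases(codes, strict)
lenient_count = count_distinct_diseases(codes, lenient)
```

With these made-up records, the strict filter resolves only one of the three codes, while the lenient filter resolves all three at the cost of admitting a granularity mismatch.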

Other use cases for the mappings include:

  • Harmonizing disease concepts across multiple biomedical and clinical terminologies (e.g., OMIM, Orphanet, DOID, NCIt).
  • Enabling consistent and computable disease annotations across research datasets, biobanks, and electronic health records.
  • Facilitating disease data integration for translational research and multi-omics analysis.
  • Supporting rare disease diagnosis and research through unified identifiers and curated mappings.
  • Powering disease-centric knowledge graphs with interoperable and ontology-backed disease representations.
  • Enhancing semantic search and reasoning in biomedical databases and AI applications.
  • Enabling cross-resource disease linking in tools like ClinGen, Monarch Initiative, and GA4GH Beacon.
  • Supporting clinical decision support systems, differential diagnosis, and patient stratification.
  • Assisting drug discovery and repurposing workflows through standardized disease classifications.
  • Enabling FAIR (Findable, Accessible, Interoperable, Reusable) principles in disease-related data management.
  • Bridging the gap between rare disease vocabularies and common disease terminologies.

Type of mapped resources

  • Biomedical ontologies (e.g., DOID, NCIt).
  • Terminologies and classifications (e.g., Orphanet (ORDO), ICD-10, ICD-11).
  • Disease-related data from various resources (e.g., NORD, GARD, OMIM).

Tools used for creating the mapping

  • Lexical matching tools (e.g., OAK lexmatch)
  • Custom scripts (e.g., Python), see Mondo Ingest
  • Manual curation (Domain expert review and ontology alignment)
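
To give a feel for what a lexical matching step does, here is a deliberately simplified sketch. It is not the algorithm used by OAK lexmatch or by Mondo Ingest: the normalization rule (lowercasing, tokenizing, and treating "disease" as "disorder") and the function names are assumptions made for illustration. The labels are taken from the sample rows shown later on this page.

```python
import re
from collections import defaultdict

# Assumed synonym rule, for illustration only.
SYNONYMS = {"disease": "disorder"}


def normalize(label):
    """Lowercase, split into alphanumeric tokens, apply the synonym rule."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


def lexical_candidates(source_labels, target_labels):
    """Return (source_id, target_id, match_string) for equal normalized labels."""
    index = defaultdict(list)
    for target_id, label in target_labels.items():
        index[normalize(label)].append(target_id)

    candidates = []
    for source_id, label in source_labels.items():
        key = normalize(label)
        for target_id in index.get(key, []):
            candidates.append((source_id, target_id, key))
    return candidates


mondo = {
    "MONDO:0005641": "aleutian mink disease",
    "MONDO:0005676": "borna disease",
}
doid = {
    "DOID:2934": "aleutian mink disease",
    "DOID:5154": "borna disease",
}

candidates = lexical_candidates(mondo, doid)
```

Candidates produced this way are unreviewed suggestions; the manual curation step decides which of them are kept.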

Type of mapping relations

  • Exact match (skos:exactMatch): A term in an external source is conceptually identical to a term in Mondo.
  • Broad match (skos:broadMatch): A term in an external source is conceptually narrower than a term in Mondo.
  • Narrow match (skos:narrowMatch): A term in an external source is conceptually broader than a term in Mondo.
  • Related match (skos:relatedMatch): A term in an external source is conceptually related, but neither identical, nor broader, nor narrower to a term in Mondo.
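
Which of the two terms is the broader one depends on which term is treated as the subject of the mapping. In SKOS, `A skos:broadMatch B` states that B is the broader concept, `skos:narrowMatch` is its inverse, and `skos:exactMatch` and `skos:relatedMatch` are symmetric. The sketch below shows how a mapping could be flipped when subject and object are swapped; the `invert` helper and the identifiers are hypothetical.

```python
# Predicate to use when subject and object of a mapping are swapped.
INVERSE_PREDICATE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
    "skos:relatedMatch": "skos:relatedMatch",
}


def invert(subject_id, predicate_id, object_id):
    """Swap subject and object, adjusting the predicate accordingly."""
    return (object_id, INVERSE_PREDICATE[predicate_id], subject_id)


# Placeholder identifiers: the external term is narrower than the Mondo term.
external_view = ("EXT:001", "skos:broadMatch", "MONDO:A")
mondo_view = invert(*external_view)
```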

Examples (samples) of different types of mapping implementations

Result of an initial (unreviewed) matching process

An ontology curator gets presented mappings such as the following one for review. The mapping provides a detailed justification for a lexical matching process, which enables the curator to quickly accept or dismiss this automatically generated mapping.

subject_id | subject_label | object_id | predicate_id | object_label | mapping_justification | mapping_tool | confidence | subject_match_field | object_match_field | match_string
MONDO:0005641 | aleutian mink disease | DOID:2934 | skos:exactMatch | aleutian mink disease | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | aleutian mink disorder
MONDO:0005676 | borna disease | DOID:5154 | skos:exactMatch | borna disease | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | borna disorder
MONDO:0007744 | cholesterol-ester transfer protein deficiency | DOID:0111368 | skos:exactMatch | cholesterol-ester transfer protein deficiency | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | cholesterol-ester transfer protein deficiency
MONDO:0007988 | autosomal dominant primary microcephaly | DOID:0061100 | skos:exactMatch | autosomal dominant primary microcephaly | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | autosomal dominant primary microcephaly
MONDO:0009297 | familial renal glucosuria | DOID:0070613 | skos:exactMatch | familial renal glucosuria | semapv:LexicalMatching | oaklib | 0.9411764705882353 | skos:exactMatch | skos:exactMatch | mesh:d006030

Mondo uses the SSSOM format to represent its mappings. For a complete list of mappings see here.
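
A curator or developer may want to load such a file programmatically. The sketch below reads SSSOM-style TSV with Python's standard `csv` module, skipping the `#`-prefixed metadata lines that SSSOM TSV files can carry at the top, and separates rows matched on labels from rows matched on other fields. Two of the sample rows above are embedded as test data; the helper names and the triage rule are assumptions for illustration, not part of Mondo's tooling.

```python
import csv
import io

HEADER = [
    "subject_id", "subject_label", "object_id", "predicate_id", "object_label",
    "mapping_justification", "mapping_tool", "confidence",
    "subject_match_field", "object_match_field", "match_string",
]
ROWS = [
    ["MONDO:0005641", "aleutian mink disease", "DOID:2934", "skos:exactMatch",
     "aleutian mink disease", "semapv:LexicalMatching", "oaklib",
     "0.8497788951776651", "rdfs:label", "rdfs:label", "aleutian mink disorder"],
    ["MONDO:0009297", "familial renal glucosuria", "DOID:0070613", "skos:exactMatch",
     "familial renal glucosuria", "semapv:LexicalMatching", "oaklib",
     "0.9411764705882353", "skos:exactMatch", "skos:exactMatch", "mesh:d006030"],
]
# Build tab-separated text; the first line stands in for a metadata header.
SAMPLE_TSV = "# metadata header omitted\n" + "\n".join(
    "\t".join(fields) for fields in [HEADER] + ROWS
) + "\n"


def read_sssom(text):
    """Parse SSSOM-style TSV text into a list of dicts, skipping '#' lines."""
    lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    rows = list(csv.DictReader(lines, delimiter="\t"))
    for row in rows:
        row["confidence"] = float(row["confidence"])
    return rows


def matched_on_label(row):
    """True if both sides of the lexical match came from rdfs:label."""
    return (
        row["subject_match_field"] == "rdfs:label"
        and row["object_match_field"] == "rdfs:label"
    )


rows = read_sssom(SAMPLE_TSV)
label_matches = [r["subject_id"] for r in rows if matched_on_label(r)]
other_matches = [r["subject_id"] for r in rows if not matched_on_label(r)]
```

For real files, a dedicated SSSOM library is likely a better choice than hand-written parsing, since it also handles the metadata block and prefix maps.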