Mondo Ontology
Ontology Mapping Data Integration
Mondo Case Study Infobox
- Author: Nicolas Matentzoglu (@matentzn)
- Last updated: 2025-02-15
- Mapping Type:
- Status of this case study:
High-level summary
The goal of Mondo is to create a unified disease ontology by integrating multiple disease vocabularies and classifications, ensuring semantic interoperability across biomedical databases. The Mondo mappings, in particular, connect disease concepts in the Mondo ontology to other related resources, such as medical terminologies.
Domain
Biomedical and clinical informatics, focusing on disease classification and standardization.
Purpose of the mapping
Multiple user groups with distinct interests rely on and have a stake in the Mondo mappings. Here is a representative selection of user stories:
- Clinical Researcher:
  - Map disease terms from multiple coding systems (like OMIM, Orphanet, and ICD-10) to MONDO to merge and analyze patient data from different cohorts in a consistent and interoperable way.
  - Accurately count the number of distinct disease entities across disparate resources (e.g., how many rare diseases are there?).
- Bioinformatician / data scientist: Normalize disease annotations across datasets that use different ontologies to build machine learning models on unified disease labels.
- Health Data Integrator: Map local diagnosis codes to MONDO terms using available mappings to enable semantic search and cross-database queries across clinical records and research databases.
- Ontology/Database curator: Review mapping candidates generated by automated tools for their inclusion into a resource.
- Knowledge Graph Developer: Connect information that siloed disease resources provide about the same disease under different identifiers.
- Clinical Decision Support Developer: Bridge between clinical terminologies like ICD-10 and research vocabularies like Orphanet to surface relevant rare disease information for clinicians at the point of care.
- Pharmaceutical Scientist: Connect disease indications across clinical trials, literature, and omics databases to identify new therapeutic opportunities.
Most of these user stories share a single key requirement: the ability to accurately match and merge data about the same disease when different identifiers were used to refer to it.
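The merge requirement can be made concrete with a small sketch. Assuming mappings are available as (subject_id, predicate_id, object_id) triples, identifiers linked by skos:exactMatch can be grouped with a union-find (disjoint-set) structure, and the number of resulting groups is the number of distinct diseases. The class and function names are hypothetical, and treating only exact matches as identity is a simplifying assumption, not a description of how Mondo itself merges terms.

```python
class DisjointSet:
    """Minimal union-find over identifier strings (CURIEs)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen identifiers as their own singleton set.
        self.parent.setdefault(x, x)
        # Path halving: point nodes at their grandparent while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def count_distinct_diseases(identifiers, mappings):
    """Count disease entities after merging identifiers linked by exact matches.

    identifiers: CURIEs observed in the data being integrated.
    mappings: iterable of (subject_id, predicate_id, object_id) triples.
    Broad, narrow, and related matches are ignored because they do not
    assert that two identifiers denote the same disease.
    """
    identifiers = list(identifiers)
    sets = DisjointSet()
    for curie in identifiers:
        sets.find(curie)
    for subject_id, predicate_id, object_id in mappings:
        if predicate_id == "skos:exactMatch":
            sets.union(subject_id, object_id)
    return len({sets.find(curie) for curie in identifiers})


# Two Mondo terms and their exact DOID matches from the sample table further below.
ids = ["MONDO:0005641", "DOID:2934", "MONDO:0005676", "DOID:5154"]
exact = [
    ("MONDO:0005641", "skos:exactMatch", "DOID:2934"),
    ("MONDO:0005676", "skos:exactMatch", "DOID:5154"),
]
print(count_distinct_diseases(ids, exact))  # 2
```

Ignoring non-exact predicates is what keeps the count honest: a broad match between two identifiers must not collapse them into one disease.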
However, not all use cases require the same level of rigour. A data scientist, for example, may be more interested in complete mappings that account for all diseases in their data set, and less concerned about potential granularity mismatches. A match between the term "diabetes mellitus" in one database (with no further information about the type) and "type 2 diabetes mellitus" in another may be acceptable if the connection is used in machine learning models that are robust to noise. In other use cases, accuracy is paramount: a clinical researcher who needs to determine how many rare diseases there are, for instance to inform budget negotiations for a government proposal, cannot afford such mismatches.
Because the usefulness of a mapping always depends on context (more precisely, on the use case), detailed metadata on confidence, mapping justification, and mapping precision (e.g., exact or broad) is needed to determine which mappings are suitable for which use case.
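One way to act on this metadata is to define a filter per use case. The sketch below assumes mappings are loaded as records with SSSOM-style fields (predicate_id, mapping_justification, confidence); the profile names and thresholds are hypothetical and would have to be chosen per application. semapv:ManualMappingCuration is the SEMAPV justification used for manually curated mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    subject_id: str
    predicate_id: str
    object_id: str
    mapping_justification: str
    confidence: float


# Hypothetical use-case profiles; the names and thresholds are illustrative only.
PROFILES = {
    # Counting rare diseases: only curated, exact, high-confidence mappings.
    "rare_disease_count": {
        "predicates": {"skos:exactMatch"},
        "min_confidence": 0.95,
        "require_manual_curation": True,
    },
    # ML label normalization: tolerate granularity mismatches and some noise.
    "ml_normalization": {
        "predicates": {"skos:exactMatch", "skos:broadMatch", "skos:narrowMatch"},
        "min_confidence": 0.5,
        "require_manual_curation": False,
    },
}


def select_mappings(mappings, profile_name):
    """Return the mappings that satisfy the named use-case profile."""
    profile = PROFILES[profile_name]
    selected = []
    for m in mappings:
        if m.predicate_id not in profile["predicates"]:
            continue
        if m.confidence < profile["min_confidence"]:
            continue
        if (profile["require_manual_curation"]
                and m.mapping_justification != "semapv:ManualMappingCuration"):
            continue
        selected.append(m)
    return selected
```

With this design the same mapping set can serve both the data scientist and the clinical researcher; only the profile changes.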
Other use cases for the mappings include:
- Harmonizing disease concepts across multiple biomedical and clinical terminologies (e.g., OMIM, Orphanet, DOID, NCIt).
- Enabling consistent and computable disease annotations across research datasets, biobanks, and electronic health records.
- Facilitating disease data integration for translational research and multi-omics analysis.
- Supporting rare disease diagnosis and research through unified identifiers and curated mappings.
- Powering disease-centric knowledge graphs with interoperable and ontology-backed disease representations.
- Enhancing semantic search and reasoning in biomedical databases and AI applications.
- Enabling cross-resource disease linking in tools like ClinGen, Monarch Initiative, and GA4GH Beacon.
- Supporting clinical decision support systems, differential diagnosis, and patient stratification.
- Assisting drug discovery and repurposing workflows through standardized disease classifications.
- Enabling FAIR (Findable, Accessible, Interoperable, Reusable) principles in disease-related data management.
- Bridging the gap between rare disease vocabularies and common disease terminologies.
Type of mapped resources
- Biomedical ontologies (e.g., DOID, NCIT).
- Terminologies and classifications (e.g., Orphanet (ORDO), ICD-10, ICD-11).
- Disease-related data from various resources (e.g., NORD, GARD, OMIM).
Links to existing mappings
Tools used for creating the mapping
- Lexical matching tools (e.g., OAK lexmatch)
- Custom scripts (e.g., Python); see Mondo Ingest
- Manual curation (domain expert review and ontology alignment)
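The core idea behind lexical matching can be sketched in a few lines: normalize labels and propose a candidate skos:exactMatch wherever the normalized labels coincide. This is a deliberately simplified stand-in, not the actual algorithm of OAK lexmatch (which also considers synonyms and other fields and assigns confidence scores, as the sample table below shows); the normalization rule and function names are assumptions. Candidates produced this way still need curator review.

```python
import re
from collections import defaultdict


def normalize(label):
    """Hypothetical normalization: lowercase and collapse non-alphanumerics to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def lexical_match(mondo_labels, external_labels):
    """Propose candidate exact matches between terms with the same normalized label.

    mondo_labels, external_labels: dicts mapping CURIE -> label.
    Returns SSSOM-style dicts for curator review.
    """
    # Index external terms by normalized label so each lookup is a dict access.
    index = defaultdict(list)
    for curie, label in external_labels.items():
        index[normalize(label)].append(curie)

    candidates = []
    for curie, label in mondo_labels.items():
        key = normalize(label)
        for external_curie in index.get(key, []):
            candidates.append({
                "subject_id": curie,
                "subject_label": label,
                "predicate_id": "skos:exactMatch",
                "object_id": external_curie,
                "object_label": external_labels[external_curie],
                "mapping_justification": "semapv:LexicalMatching",
                "match_string": key,
            })
    return candidates


mondo = {"MONDO:0007744": "cholesterol-ester transfer protein deficiency"}
doid = {"DOID:0111368": "cholesterol-ester transfer protein deficiency"}
for row in lexical_match(mondo, doid):
    print(row["subject_id"], row["predicate_id"], row["object_id"], row["match_string"])
```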
Type of mapping relations
- Exact match (skos:exactMatch): a term in an external source is conceptually identical to a term in Mondo.
- Broad match (skos:broadMatch): the object term is conceptually broader than the subject term. When the Mondo term is the subject, as in the sample mappings below, the external term is broader than the Mondo term.
- Narrow match (skos:narrowMatch): the object term is conceptually narrower than the subject term. When the Mondo term is the subject, the external term is narrower than the Mondo term.
- Related match (skos:relatedMatch): a term in an external source is conceptually related to a term in Mondo, but neither identical to, broader than, nor narrower than it.
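Because broad and narrow matches are directional, swapping subject and object (for example, to express a mapping from the external source's point of view) must also swap skos:broadMatch and skos:narrowMatch, which SKOS defines as inverses of each other; exact and related matches are symmetric and stay unchanged. A minimal sketch over SSSOM-style dicts, with hypothetical names:

```python
# SKOS declares broadMatch and narrowMatch as inverses; exactMatch and
# relatedMatch are symmetric, so they stay the same when inverted.
INVERSE_PREDICATE = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}

# Paired SSSOM columns that must trade places when subject and object swap.
SWAPPED_COLUMNS = [
    ("subject_id", "object_id"),
    ("subject_label", "object_label"),
    ("subject_match_field", "object_match_field"),
]


def invert_mapping(mapping):
    """Return a copy of the mapping with subject and object swapped."""
    inverted = dict(mapping)
    for left, right in SWAPPED_COLUMNS:
        if left in mapping and right in mapping:
            inverted[left], inverted[right] = mapping[right], mapping[left]
    inverted["predicate_id"] = INVERSE_PREDICATE[mapping["predicate_id"]]
    return inverted
```

Inverting twice returns the original mapping, which is a cheap sanity check for any mapping pipeline that changes direction.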
Examples of different types of mapping implementations
Result of an initial (unreviewed) matching process
An ontology curator is presented with mappings such as the following for review. Each mapping includes a detailed justification from the lexical matching process, which enables the curator to quickly accept or dismiss the automatically generated mapping.
subject_id | subject_label | object_id | predicate_id | object_label | mapping_justification | mapping_tool | confidence | subject_match_field | object_match_field | match_string |
---|---|---|---|---|---|---|---|---|---|---|
MONDO:0005641 | aleutian mink disease | DOID:2934 | skos:exactMatch | aleutian mink disease | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | aleutian mink disorder |
MONDO:0005676 | borna disease | DOID:5154 | skos:exactMatch | borna disease | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | borna disorder |
MONDO:0007744 | cholesterol-ester transfer protein deficiency | DOID:0111368 | skos:exactMatch | cholesterol-ester transfer protein deficiency | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | cholesterol-ester transfer protein deficiency |
MONDO:0007988 | autosomal dominant primary microcephaly | DOID:0061100 | skos:exactMatch | autosomal dominant primary microcephaly | semapv:LexicalMatching | oaklib | 0.8497788951776651 | rdfs:label | rdfs:label | autosomal dominant primary microcephaly |
MONDO:0009297 | familial renal glucosuria | DOID:0070613 | skos:exactMatch | familial renal glucosuria | semapv:LexicalMatching | oaklib | 0.9411764705882353 | skos:exactMatch | skos:exactMatch | mesh:d006030 |
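Curators can speed up review by sorting candidates by the kind of evidence behind them. In the table above, most rows match on rdfs:label in both terms, while the last row records skos:exactMatch as the match field on both sides with a MeSH identifier as the match string, which we read as both terms sharing an existing cross-reference. The triage rules below are a hypothetical sketch built on that reading, not part of Mondo's workflow.

```python
def triage(row):
    """Assign a review queue to a candidate mapping (hypothetical rules)."""
    label_match = (row["subject_match_field"] == "rdfs:label"
                   and row["object_match_field"] == "rdfs:label")
    if label_match and row["subject_label"].lower() == row["object_label"].lower():
        return "identical-labels"
    if label_match:
        return "similar-labels"
    if (row["subject_match_field"] == "skos:exactMatch"
            and row["object_match_field"] == "skos:exactMatch"):
        return "shared-cross-reference"
    return "full-review"


rows = [
    {"subject_id": "MONDO:0005641", "subject_label": "aleutian mink disease",
     "object_label": "aleutian mink disease",
     "subject_match_field": "rdfs:label", "object_match_field": "rdfs:label"},
    {"subject_id": "MONDO:0009297", "subject_label": "familial renal glucosuria",
     "object_label": "familial renal glucosuria",
     "subject_match_field": "skos:exactMatch", "object_match_field": "skos:exactMatch"},
]
for row in rows:
    print(row["subject_id"], triage(row))
```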
Mondo uses the SSSOM (Simple Standard for Sharing Ontological Mappings) format to represent its mappings. For a complete list of mappings, see here.
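As a rough illustration of the format: an SSSOM/TSV file is a tab-separated table, optionally preceded by a YAML metadata block (including the curie_map) in which every line is prefixed with #. The stdlib-only reader below is a minimal sketch under that assumption; it keeps the metadata as raw text rather than parsing the YAML, and the function name is hypothetical. For production use, the sssom-py library provides a full parser and validator.

```python
import csv
import io
from collections import Counter


def read_sssom_tsv(text):
    """Split an SSSOM/TSV document into its metadata lines and mapping rows."""
    metadata_lines, table_lines = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            # Drop only the '#' prefix so YAML indentation is preserved.
            metadata_lines.append(line[1:])
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return metadata_lines, list(reader)


example = (
    "#curie_map:\n"
    "#  DOID: http://purl.obolibrary.org/obo/DOID_\n"
    "#  MONDO: http://purl.obolibrary.org/obo/MONDO_\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "MONDO:0005641\tskos:exactMatch\tDOID:2934\tsemapv:LexicalMatching\n"
    "MONDO:0005676\tskos:exactMatch\tDOID:5154\tsemapv:LexicalMatching\n"
)
metadata, rows = read_sssom_tsv(example)
print(Counter(row["predicate_id"] for row in rows))
```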