What is a mapping?

The word "mapping" is pretty overloaded in practice: for some people, it simply means "a correspondence of one term to another equivalent or near equivalent term." But even here, there is little understanding to what a "term" is in this sentence, or what "almost equivalent" means - and, there are many different kinds of mappings used in practice that are not "equivalent" at all. In its very essence, an individual mapping maps one information entity, i.e. a representation of a real world entity, to another information entity - how, and what these strings could be, will be the subject of the following section.

In the following, we consider an information entity a sequence of characters which has a well defined relationship to some thing in the real world, for example: - an ontology id like HP:0004934 corresponds to the concept of "Vascular calcification" in the real world. Note that HP:0004934 is annotated with the rdfs:label "Vascular calcification". The label itself is not necessarily a term - it could change, for example to "Abnormal calcification of the vasculature", and still retain the same meaning. - "Vascular calcification" may be a term in my controlled vocabulary which I understand to correspond to that respective disease (not all controlled vocabularies have IDs for their terms). This happens for example in clinical data models that do not use formal identifiers to refer to the values of slots in their data model, like "MARRIED" in /datamodel/marital_status. - Examples of terms: - IDs of classes in an ontology - elements of a clinical value set - codes of clinical terminologies such as Z63.1 - TLDR: terms correspond to things in the world and that correspondence is not subject to change. Labels can change without changing the meaning of a term.

An attempt at a practical categorisation

In our experience, there are roughly four kinds of mappings:

string-string: Relating one string, or label, to another string, or label. Understanding such mappings is fundamental to understanding all the other kinds of mappings.
string-term: Relating a specific string or "label" to their corresponding term in a terminology or ontology. We usually refer to these as synonyms, but there may be other words used in this case.
term-term: Relating a term, for example a class in an ontology, to another term. This is what most people in the ontology domain would understand when thy hear "ontology mappings".
complex mappings: Relating two sets of terms. These are the rarest and most complicated kinds of mappings, as they related for example two phenotypic profiles (sets of phenotypes) with each other. We will discuss some more examples in a future lesson.

In some ways, these four kinds of mappings can be very different. We do believe, however, that there are enough important commonalities such as common features, widely overlapping use cases and overlapping toolkits to consider them together. In the following, we will discuss these in more detail, including important features of mappings and useful tools.

Important features of mappings

Mappings have historically been neglected as second-class citizens in the medical terminology and ontology worlds - the metadata is insufficient to allow for precise analyses and clinical decision support, they are frequently stale and out of date, etc. The question "Where can I find the canonical mappings between X and Y"? is often shrugged off and developers are pointed to aggregators such as OxO or UMLS which combine manually curated mappings with automated ones causing "mapping hairballs".

There are many important metadata elements to consider, but the ones that are by far the most important to consider one way or another are:

Precision: Is the mapping exact, broad or merely closely related?
Confidence: Do I trust the mapping? Was is done manually by an expert in my domain, or by an algorithm?
Source version: Which version of the term (or its corresponding ontology) was mapped? Is there a newer mapping which has a more suitable match for my term?

Whenever you handle mappings (either create, or reuse), make sure you are keenly aware of at least these three metrics, and capture them. You may even want to consider using a proper mapping model like the Simple Shared Standard for Ontology Mappings (SSSOM) which will make your mappings FAIR and reusable.

String-string mappings

String-string mappings are mappings that relate two strings. The task of matching two strings is ubiquitous for example in database search fields (where a user search string needs to be mapped to some strings in a database). Most, if not all effective ontology matching techniques will employ some form of string-string matching. For example, to match simple variations of labels such as "abnormal heart" and "heart abnormality", various techniques such as Stemming and bag of words can be employed effectively. Other techniques such as edit-distance or Levenshtein can be used to quantify the similarity of two strings, which can provide useful insights into mapping candidates.

String-term mappings / synonyms

String-term mappings relate a specific string or "label" to their corresponding term in a terminology or ontology. Here, we refer to these as "synonyms", but there may be other cases for string-term mappings beyond synonymy.

There are a lot of use cases for synonyms so we will name just a few here that are relevant to typical workflows of Semantic Engineers in the life sciences.

Thesauri are reference tools for finding synonyms of terms. Modern ontologies often include very rich thesauri, with some ontologies like Mondo capturing more than 70,000 exact and 35,000 related synonyms. They can provide a huge boost to traditional NLP pipelines by providing synonyms that can be used for both Named Entity Recognition and Entity Resolution. Some insight on how, for example, Uberon was used to boost text mining can be found here.

Term-term mappings / ontology mappings

Term-term mappings relate a term, for example a class in an ontology, to another term, usually from another ontology or database. The term-term case of mappings is what most people in the ontology domain would understand when they hear "ontology mappings". This is also what most people understand when they here "Entity Resolution" in the database world - the task of determining whether, in essence, two rows in a database correspond to the same thing (as an example of a tool doing ER see deepmatcher, or py-entitymatcher). For a list standard entity matching toolkit outside the ontology sphere see here.

Some examples of domain-specific mapping of importance to the biomedical domain

Phenotype ontology mappings

Mapping phenotypes across species holds great promise for leveraging the knowledge generated by Model Organism Database communities (MODs) for understanding human disease. There is a lot of work happening at the moment (2021) to provide standard mappings between species specific phenotype ontologies to drive translational research (example). Tools such as Exomiser leverage such mappings to perform clinical diagnostic tasks such as variant prioritisation. Another app you can try out that leverages cross-species mappings is the Monarch Initiatives Phenotype Profile Search.

Disease ontology mappings

Medical terminology and ontology mapping is a huge deal in medical informatics (example). Mondo is a particularly rich source of well provenanced disease ontology mappings.

How should you map your data to ontologies?

There are no one size fits all strategies for mapping your data to ontologies. There are many research areas that have something to give in this process. Here, we outline some ideas on how to think about the problem.

Case 1: Mapping internal controlled vocabularies

Case 2: Mappings from free text

Examples: - Monarch Text Annotator

Case 3: Mappings between public controlled vocabularies and ontologies

How to solve the problem of mapping hairballs

String-term mappings

Overview of automated approaches - Simple matches (string, string pre-pro, fuzzy string) - Graph-based matches (incl. semantic similarity) - NLP/Machine Learning

Practical: - Try to get the same mappings as before using techniques - Exact - Simple preprocessing - Levenshtein - Jaccard similarity - Embedding similarity (?)