Skip to content

Simple Standard for Sharing Ontological Mappings (SSSOM)

SSSOM banner

The Simple Standard for Sharing Ontological Mappings (SSSOM) is a community-driven standard designed to facilitate the exchange and integration of semantic entity mappings. As data interoperability becomes increasingly crucial across various domains, SSSOM provides a standardized format to share mappings, enabling researchers and developers to more easily connect and utilize diverse datasets. By establishing a common framework, SSSOM enhances the consistency, quality, and discoverability of mappings, thereby supporting more effective data integration and analysis.

  • Standardization: SSSOM provides a unified format for representing semantic, or ontological, mappings, making it easier for different systems and organizations to exchange mapping data consistently.
  • Interoperability: By using SSSOM, data from diverse sources can be integrated more seamlessly, allowing for improved data analysis and research across various fields, including biology, healthcare, and information technology.

Beyond defining the standard itself, the SSSOM Core Team and the SSSOM community also develop reference tools and software libraries for working with the standard.

SSSOM at a glance: Model and Exchange Format

Basic model

The data model of SSSOM is centered around two fundamental concepts: mappings and mapping sets.

A SSSOM mapping is a statement that there is a correspondence between two semantic entities. It comprises two components:

  1. The core mapping (or raw mapping), which is a triple <subject, predicate, object> that represents the correspondence itself between a subject entity, for example a class in an ontology, and an object entity, for example an identifier in some database, via a semantic mapping predicate, for example skos:exactMatch.
  2. Metadata that provide supplementary pieces of information about the core mapping. This notably includes information about the provenance of the statement (for example, who authored the statement), the confidence with which the mappings holds, and its justification (a reason that supports the fidelity of the mapping between the subject and the object, such as expert review, or exact lexical matching on the entities' primary names).

A SSSOM mapping set is a collection of SSSOM mappings. Mapping sets can also be associated with metadata, such as license statements, or a description.

Example

While the SSSOM model is quite general and mappings can be shared in different formats, the most common format is the SSSOM/TSV format. Here is a tabular representation of some example mappings for illustration purposes:

subject_id subject_label predicate_id object_id object_label mapping_justification author_id confidence comment
KF_FOOD:F001 apple skos:exactMatch FOODON:00002473 apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 0.95 "We could map to FOODON:03310788 instead to cover sliced apples, but only 'whole' apple types exist."
KF_FOOD:F002 gala skos:exactMatch FOODON:00003348 Gala apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 1.0
KF_FOOD:F003 pink skos:exactMatch FOODON:00004186 Pink apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 0.9 "We could map to FOODON:00004187 instead which more specifically refers to 'raw' Pink apples. Decided against to be consistent with other mapping choices."
KF_FOOD:F004 braeburn skos:broadMatch FOODON:00002473 apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 1.0

In the TSV format, mapping set metadata is included at the top of the file, before the mappings themselves, in yaml-like key-value pairs:

curie_map:
  FOODON: http://purl.obolibrary.org/obo/FOODON_
  KF_FOOD: https://kewl-foodie.inc/food/
  orcid: https://orcid.org/
mapping_set_id: https://w3id.org/sssom/tutorial/example1.sssom.tsv
mapping_set_description: Manually curated alignment of KEWL FOODIE INC internal food and nutrition database with Food Ontology (FOODON). Intended to be used for ontological analysis and grouping of KEWL FOODIE INC related data.
license: https://creativecommons.org/licenses/by/4.0/
mapping_date: 2022-05-02

See here for concrete examples.

Quick reference for mapping metadata

For mapping set metadata please see here.

Column/Field Description Value Examples Required
subject_id The ID of the subject of the mapping. entity reference (e.g. CURIE in TSV) HP:0009894 Optional
subject_label The label of subject of the mapping. string Thickened ears Recommended
subject_category The conceptual category to which the subject belongs to. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers - when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. string UBERON:0001062 Optional
predicate_id The ID of the predicate or relation that relates the subject and object of this match. entity reference (e.g. CURIE in TSV) owl:sameAs Required
predicate_label The label of the predicate/relation of the mapping. string has cross-reference Optional
predicate_modifier A modifier for negating the predicate. See https://github.com/mapping-commons/sssom/issues/40 for discussion Not Not Optional
object_id The ID of the object of the mapping. entity reference (e.g. CURIE in TSV) HP:0009894 Optional
object_label The label of object of the mapping. string Thickened ears Recommended
object_category The conceptual category to which the subject belongs to. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers - when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. string UBERON:0001062 Optional
mapping_justification A mapping justification is an action (or the written representation of that action) of showing a mapping to be right or reasonable. entity reference (e.g. CURIE in TSV) semapv:LexicalMatching Required
author_id Identifies the persons or groups responsible for asserting the mappings. Recommended to be a list of ORCIDs or otherwise identifying URIs. entity reference (e.g. CURIE in TSV) orcid:0000-0002-7356-1779|orcid:0000-0002-6601-2165 Optional
author_label A string identifying the author of this mapping. In the spirit of provenance, consider using author_id instead. string Nicolas Matentzoglu|Chris Mungall Optional
reviewer_id Identifies the persons or groups that reviewed and confirmed the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. entity reference (e.g. CURIE in TSV) orcid:0000-0002-7356-1779|orcid:0000-0002-6601-2165 Optional
reviewer_label A string identifying the reviewer of this mapping. In the spirit of provenance, consider using reviewer_id instead. string Nicolas Matentzoglu|Chris Mungall Optional
creator_id Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. entity reference (e.g. CURIE in TSV) orcid:0000-0002-7356-1779|orcid:0000-0002-6601-2165 Optional
creator_label A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. string Nicolas Matentzoglu|Chris Mungall Optional
license A url to the license of the mapping. In absence of a license we assume no license. uri https://creativecommons.org/licenses/by/4.0/ Optional
subject_type The type of entity that is being mapped. owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property owl:Class Optional
subject_source URI of vocabulary or identifier source for the subject. entity reference (e.g. CURIE in TSV) obo:mondo.owl Optional
subject_source_version Version IRI or version string of the source of the subject term. string http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl Optional
object_type The type of entity that is being mapped. owl class, owl object property, owl data property, owl annotation property, owl named individual, skos concept, rdfs resource, rdfs class, rdfs literal, rdfs datatype, rdf property owl:Class Optional
object_source URI of vocabulary or identifier source for the object. entity reference (e.g. CURIE in TSV) obo:mondo.owl Optional
object_source_version Version IRI or version string of the source of the object term. string http://purl.obolibrary.org/obo/mondo/releases/2021-01-30/mondo.owl Optional
mapping_provider URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. uri https://www.ohdsi.org/ Optional
mapping_source The mapping set this mapping was originally defined in. mapping_source is used for example when merging multiple mapping sets or deriving one mapping set from another. entity reference (e.g. CURIE in TSV) MONDO_MAPPINGS:mondo_exactmatch_ncit.sssom.tsv Optional
mapping_cardinality A string indicating whether this mapping is from a 1:1 (the subject_id maps to a single object_id), 1:n (the subject maps to more than one object_id), n:1, 1:0, 0:1 or n:n group. Note that this is a convenience field that should be derivable from the mapping set. 1:1, 1:n, n:1, 1:0, 0:1, n:n 1:1 Optional
mapping_tool A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. string https://github.com/AgreementMakerLight/AML-Project Optional
mapping_tool_version Version string that denotes the version of the mapping tool used. string v3.2 Optional
mapping_date The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. date 2021-01-01 Optional
publication_date The date the mapping was published. This is different from the date the mapping was asserted. date 2021-01-01 Optional
confidence A score between 0 and 1 to denote the confidence or probability that the match is correct, where 1 denotes total confidence. double 0.95 Optional
curation_rule A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule is captured as a resource rather than a string, which enables higher levels of transparency and sharing across mapping sets. The URI representation of the curation rule is expected to be a resolvable identifier which provides details about the nature of the curation rule. entity reference (e.g. CURIE in TSV) DISEASE_MAPPING_COMMONS_RULES:MPR2 Optional
curation_rule_text A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule should be captured as a resource (entity reference) rather than a string (see curation_rule element), which enables higher levels of transparency and sharing across mapping sets. The textual representation of curation rule is intended to be used in cases where (1) the creation of a resource is not practical from the perspective of the mapping_provider and (2) as an additional piece of metadata to augment the curation_rule element with a human readable text. string The two phenotypes inhere in homologous structures and exhibit the same phenotypic quality. Optional
subject_match_field A list of properties (term annotations on the subject) that was used for the match. entity reference (e.g. CURIE in TSV) rdfs:label Optional
object_match_field A list of properties (term annotations on the object) that was used for the match. entity reference (e.g. CURIE in TSV) rdfs:label Optional
match_string String that is shared by subj/obj. It is recommended to indicate the fields for the match using the object and subject_match_field slots. string gala Optional
subject_preprocessing Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. entity reference (e.g. CURIE in TSV) semapv:Stemming Optional
object_preprocessing Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. entity reference (e.g. CURIE in TSV) semapv:Stemming Optional
similarity_score A score between 0 and 1 to denote the similarity between two entities, where 1 denotes equivalence, and 0 denotes disjointness. The score is meant to be used in conjunction with the similarity_measure field, to document, for example, the lexical or semantic match of a matching algorithm. double 0.95 Optional
similarity_measure The measure used for computing a similarity score. This field is meant to be used in conjunction with the similarity_score field, to document, for example, the lexical or semantic match of a matching algorithm. To make processing this field as unambiguous as possible, we recommend using wikidata CURIEs, but the type of this field is deliberately unspecified. string https://www.wikidata.org/entity/Q865360 Optional
see_also A URL specific for the mapping instance. E.g. for kboom we have a per-mapping image that shows surrounding axioms that drive probability. Could also be a github issue URL that discussed a complicated alignment string https://github.com/mapping-commons/mh_mapping_initiative/pull/41 Optional
issue_tracker_item The issue tracker item discussing this mapping. entity reference (e.g. CURIE in TSV) SSSOM_GITHUB_ISSUE:166 Optional
other Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. NOTE. This field is not recommended for general use, and should be used sparingly. See https://github.com/mapping-commons/sssom/blob/master/examples/schema/extension-slots.sssom.tsv for an alternative approach based on extension slots. string N/A Optional
comment Free text field containing either curator notes or text generated by tool providing additional informative information. string This mapping is weird in that the hierarchical position of the two terms is very different. Optional

General

Publications

Related software

  • sssom-py (reference implementation of the standard, a toolkit and API for processing mappings, written in Python)
  • SSSOM-Java (an implementation of the SSSOM standard for the Java language)

The SSSOM Core Team

Contact

The preferred way to contact the SSSOM team is through the issue tracker (for problems with SSSOM) or the GitHub discussion forums (for general questions).

You can find any of the members of the SSSOM core team on GitHub. Their GitHub profiles usually also provide email addresses.

You can also reach us in the OBO Foundry Slack, in the #sssom channel.

Steering committee

The Steering committee is a self-appointed group of SSSOM contributors, whose aim is to drive the evolution of the standard and coordinate community contributions.

Documentation/specification editors

Contributors

Acknowledgements