
SSSOM Official Data Model Documentation


Datamodel for Simple Standard for Sharing Ontological Mappings (SSSOM)

Schema PURL: https://w3id.org/sssom/schema/

Introduction

While the SSSOM model is quite general and mappings can be shared in different formats, the most common format is the SSSOM/TSV format. Here is a tabular representation of some example mappings for illustration purposes:

subject_id subject_label predicate_id object_id object_label mapping_justification author_id confidence comment
KF_FOOD:F001 apple skos:exactMatch FOODON:00002473 apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 0.95 "We could map to FOODON:03310788 instead to cover sliced apples, but only 'whole' apple types exist."
KF_FOOD:F002 gala skos:exactMatch FOODON:00003348 Gala apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 1.0
KF_FOOD:F003 pink skos:exactMatch FOODON:00004186 Pink apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 0.9 "We could map to FOODON:00004187 instead which more specifically refers to 'raw' Pink apples. Decided against to be consistent with other mapping choices."
KF_FOOD:F004 braeburn skos:broadMatch FOODON:00002473 apple (whole) semapv:ManualMappingCuration orcid:0000-0002-7356-1779 1.0

In the TSV format, mapping set metadata is included at the top of the file, before the mappings themselves, as YAML-like key-value pairs:

Example header (YAML format)

curie_map:
  FOODON: http://purl.obolibrary.org/obo/FOODON_
  KF_FOOD: https://kewl-foodie.inc/food/
  orcid: https://orcid.org/
mapping_set_id: https://w3id.org/sssom/tutorial/example1.sssom.tsv
mapping_set_description: >
  Manually curated alignment of KEWL FOODIE INC internal food and 
  nutrition database with Food Ontology (FOODON). Intended to be 
  used for ontological analysis and grouping of KEWL FOODIE INC 
  related data.
license: https://creativecommons.org/licenses/by/4.0/
mapping_date: 2022-05-02
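To illustrate how the curie_map in the header relates to the identifiers in the mapping table, here is a minimal sketch of CURIE expansion. The helper name `expand_curie` is hypothetical and not part of any SSSOM tooling; the prefix map is copied from the example header above.

```python
# Prefix map copied from the example header above.
CURIE_MAP = {
    "FOODON": "http://purl.obolibrary.org/obo/FOODON_",
    "KF_FOOD": "https://kewl-foodie.inc/food/",
    "orcid": "https://orcid.org/",
}


def expand_curie(curie: str, curie_map: dict[str, str]) -> str:
    """Expand 'PREFIX:local_id' into a full URI (hypothetical helper)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise KeyError(f"no expansion declared for the prefix of {curie!r}")
    return curie_map[prefix] + local_id


print(expand_curie("FOODON:00002473", CURIE_MAP))
# http://purl.obolibrary.org/obo/FOODON_00002473
```

Prefixes such as skos and semapv appear in the example table but are not declared in this example header, so this sketch raises a KeyError for them; a fuller implementation would need an additional source of prefix declarations.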

See here for concrete examples.
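As a rough sketch of how a file with a metadata header followed by a mapping table might be read, the following uses only the Python standard library. It assumes, beyond what is stated on this page, that each metadata line is prefixed with `#` and that columns are tab-separated. The function name `split_sssom_tsv` and the abbreviated example content are illustrative only.

```python
import csv
import io
import textwrap

# Abbreviated in-memory example based on the table and header above.
# ASSUMPTION: metadata lines start with '#'; columns are tab-separated.
EXAMPLE = (
    "# mapping_set_id: https://w3id.org/sssom/tutorial/example1.sssom.tsv\n"
    "# license: https://creativecommons.org/licenses/by/4.0/\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "KF_FOOD:F001\tskos:exactMatch\tFOODON:00002473\tsemapv:ManualMappingCuration\t0.95\n"
    "KF_FOOD:F004\tskos:broadMatch\tFOODON:00002473\tsemapv:ManualMappingCuration\t1.0\n"
)


def split_sssom_tsv(text):
    """Return (metadata_text, rows) for an SSSOM/TSV document held in a string."""
    lines = text.splitlines()
    n_meta = 0
    while n_meta < len(lines) and lines[n_meta].startswith("#"):
        n_meta += 1
    # Drop the '#' marker, then remove any indentation common to all lines.
    metadata_text = textwrap.dedent("\n".join(line[1:] for line in lines[:n_meta]))
    body = io.StringIO("\n".join(lines[n_meta:]))
    rows = list(csv.DictReader(body, delimiter="\t"))
    return metadata_text, rows


metadata_text, rows = split_sssom_tsv(EXAMPLE)
print(metadata_text)
print(rows[0]["subject_id"], rows[0]["predicate_id"], rows[0]["object_id"])
```

The metadata is returned as plain text so that it can be handed to whichever YAML parser is available; all cell values are returned as strings, so numeric slots such as confidence still need converting.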

Mapping metadata elements

Mapping: Represents an individual mapping between a pair of entities

Column/Field Description Required
subject_id The ID of the subject of the mapping. Optional
subject_label The label of subject of the mapping. Recommended
subject_category The conceptual category to which the subject belongs. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc.) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers: when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. Optional
predicate_id The ID of the predicate or relation that relates the subject and object of this match. Required
predicate_label The label of the predicate/relation of the mapping. Optional
predicate_modifier A modifier for negating the predicate. See https://github.com/mapping-commons/sssom/issues/40 for discussion Optional
object_id The ID of the object of the mapping. Optional
object_label The label of object of the mapping. Recommended
object_category The conceptual category to which the object belongs. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc.) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers: when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. Optional
mapping_justification A mapping justification is an action (or the written representation of that action) of showing a mapping to be right or reasonable. Required
author_id Identifies the persons or groups responsible for asserting the mappings. Recommended to be a list of ORCIDs or otherwise identifying URIs. Optional
author_label A string identifying the author of this mapping. In the spirit of provenance, consider using author_id instead. Optional
reviewer_id Identifies the persons or groups that reviewed and confirmed the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. Optional
reviewer_label A string identifying the reviewer of this mapping. In the spirit of provenance, consider using reviewer_id instead. Optional
creator_id Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. Optional
creator_label A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. Optional
license A URL to the license of the mapping. In the absence of a license, no license is assumed. Optional
subject_type The type of entity that is being mapped. Optional
subject_source URI of vocabulary or identifier source for the subject. Optional
subject_source_version Version IRI or version string of the source of the subject term. Optional
object_type The type of entity that is being mapped. Optional
object_source URI of vocabulary or identifier source for the object. Optional
object_source_version Version IRI or version string of the source of the object term. Optional
predicate_type The type of the predicate used to map the subject and object entities. Optional
mapping_provider URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. Optional
mapping_source The mapping set this mapping was originally defined in. mapping_source is used for example when merging multiple mapping sets or deriving one mapping set from another. Optional
mapping_cardinality A string indicating whether this mapping is from a 1:1 (the subject_id maps to a single object_id), 1:n (the subject maps to more than one object_id), n:1, 1:0, 0:1 or n:n group. Note that this is a convenience field that should be derivable from the mapping set. Optional
mapping_tool A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. Optional
mapping_tool_version Version string that denotes the version of the mapping tool used. Optional
mapping_date The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. Optional
publication_date The date the mapping was published. This is different from the date the mapping was asserted. Optional
confidence A score between 0 and 1 to denote the confidence or probability that the match is correct, where 1 denotes total confidence. Optional
curation_rule A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule is captured as a resource rather than a string, which enables higher levels of transparency and sharing across mapping sets. The URI representation of the curation rule is expected to be a resolvable identifier which provides details about the nature of the curation rule. Optional
curation_rule_text A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule should be captured as a resource (entity reference) rather than a string (see curation_rule element), which enables higher levels of transparency and sharing across mapping sets. The textual representation of curation rule is intended to be used in cases where (1) the creation of a resource is not practical from the perspective of the mapping_provider and (2) as an additional piece of metadata to augment the curation_rule element with a human readable text. Optional
subject_match_field A list of properties (term annotations on the subject) that were used for the match. Optional
object_match_field A list of properties (term annotations on the object) that were used for the match. Optional
match_string String that is shared by the subject and the object. It is recommended to indicate the fields used for the match in the subject_match_field and object_match_field slots. Optional
subject_preprocessing Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. Optional
object_preprocessing Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. Optional
similarity_score A score between 0 and 1 to denote the similarity between two entities, where 1 denotes equivalence, and 0 denotes disjointness. The score is meant to be used in conjunction with the similarity_measure field, to document, for example, the lexical or semantic match of a matching algorithm. Optional
similarity_measure The measure used for computing a similarity score. This field is meant to be used in conjunction with the similarity_score field, to document, for example, the lexical or semantic match of a matching algorithm. To make processing this field as unambiguous as possible, we recommend using wikidata CURIEs, but the type of this field is deliberately unspecified. Optional
see_also A URL specific to the mapping instance. For example, for kboom there is a per-mapping image that shows the surrounding axioms that drive the probability. It could also be a GitHub issue URL that discusses a complicated alignment. Optional
issue_tracker_item The issue tracker item discussing this mapping. Optional
other Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. NOTE. This field is not recommended for general use, and should be used sparingly. See https://github.com/mapping-commons/sssom/blob/master/examples/schema/extension-slots.sssom.tsv for an alternative approach based on extension slots. Optional
comment Free text field containing either curator notes or text generated by a tool that provides additional information. Optional
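The requirement levels and value ranges in the table above can be checked mechanically. The following is a minimal sketch, not a substitute for a schema-based validator: it checks only the two slots marked Required (predicate_id and mapping_justification) and the stated 0 to 1 range of confidence and similarity_score. The function name `validate_mapping` is hypothetical, and rows are assumed to be dictionaries keyed by slot name.

```python
# Slots marked "Required" in the Mapping table above.
REQUIRED_SLOTS = ("predicate_id", "mapping_justification")
# Slots described as "a score between 0 and 1".
UNIT_INTERVAL_SLOTS = ("confidence", "similarity_score")


def validate_mapping(row: dict) -> list[str]:
    """Return a list of human-readable problems found in one mapping row."""
    problems = []
    for slot in REQUIRED_SLOTS:
        if not row.get(slot):
            problems.append(f"missing required slot: {slot}")
    for slot in UNIT_INTERVAL_SLOTS:
        value = row.get(slot)
        if value in (None, ""):
            continue  # optional slot, nothing to check
        try:
            score = float(value)
        except (TypeError, ValueError):
            problems.append(f"{slot} is not a number: {value!r}")
            continue
        if not 0.0 <= score <= 1.0:
            problems.append(f"{slot} outside [0, 1]: {score}")
    return problems


good = {
    "subject_id": "KF_FOOD:F001",
    "predicate_id": "skos:exactMatch",
    "object_id": "FOODON:00002473",
    "mapping_justification": "semapv:ManualMappingCuration",
    "confidence": "0.95",
}
print(validate_mapping(good))  # []
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a mapping set in a single pass.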

Mapping set metadata elements

MappingSet: Represents a set of mappings

Column/Field Description Required
curie_map A dictionary that contains prefixes as keys and their URI expansions as values. Optional
mappings Contains a list of mapping objects. Recommended
mapping_set_id A globally unique identifier for the mapping set (not each individual mapping). Should be IRI, ideally resolvable. Required
mapping_set_version A version string for the mapping. Optional
mapping_set_source A mapping set or set of mapping sets that was used to derive the mapping set. Optional
mapping_set_title The display name of a mapping set. Optional
mapping_set_description A description of the mapping set. Optional
creator_id Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. Optional
creator_label A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. Optional
license A URL to the license of the mapping. In the absence of a license, no license is assumed. Optional
subject_type The type of entity that is being mapped. Optional
subject_source URI of vocabulary or identifier source for the subject. Optional
subject_source_version Version IRI or version string of the source of the subject term. Optional
object_type The type of entity that is being mapped. Optional
object_source URI of vocabulary or identifier source for the object. Optional
object_source_version Version IRI or version string of the source of the object term. Optional
predicate_type The type of the predicate used to map the subject and object entities. Optional
mapping_provider URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. Optional
mapping_tool A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. Optional
mapping_tool_version Version string that denotes the version of the mapping tool used. Optional
mapping_date The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. Optional
publication_date The date the mapping was published. This is different from the date the mapping was asserted. Optional
subject_match_field A list of properties (term annotations on the subject) that were used for the match. Optional
object_match_field A list of properties (term annotations on the object) that were used for the match. Optional
subject_preprocessing Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. Optional
object_preprocessing Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. Optional
see_also A URL specific to the mapping instance. For example, for kboom there is a per-mapping image that shows the surrounding axioms that drive the probability. It could also be a GitHub issue URL that discusses a complicated alignment. Optional
issue_tracker A URL location of the issue tracker for this entity. Optional
other Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. NOTE. This field is not recommended for general use, and should be used sparingly. See https://github.com/mapping-commons/sssom/blob/master/examples/schema/extension-slots.sssom.tsv for an alternative approach based on extension slots. Optional
comment Free text field containing either curator notes or text generated by a tool that provides additional information. Optional
extension_definitions A list that defines the extension slots used in the mapping set. Optional
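The mapping_cardinality slot of a Mapping is described as a convenience field that should be derivable from the mapping set. One possible derivation is sketched below. It rests on assumptions not stated on this page: cardinality is computed over subject_id and object_id only (ignoring the predicate), and the 1:0 and 0:1 cases are not handled. The function name `derive_cardinalities` is hypothetical.

```python
from collections import defaultdict


def derive_cardinalities(mappings):
    """Return one 'x:y' cardinality string per mapping, in input order."""
    objects_of = defaultdict(set)   # subject_id -> distinct object_ids
    subjects_of = defaultdict(set)  # object_id -> distinct subject_ids
    for m in mappings:
        objects_of[m["subject_id"]].add(m["object_id"])
        subjects_of[m["object_id"]].add(m["subject_id"])

    result = []
    for m in mappings:
        # Left side: how many subjects map to this object.
        left = "1" if len(subjects_of[m["object_id"]]) == 1 else "n"
        # Right side: how many objects this subject maps to.
        right = "1" if len(objects_of[m["subject_id"]]) == 1 else "n"
        result.append(f"{left}:{right}")
    return result


# Subject/object pairs from the example table in the introduction.
example = [
    {"subject_id": "KF_FOOD:F001", "object_id": "FOODON:00002473"},
    {"subject_id": "KF_FOOD:F002", "object_id": "FOODON:00003348"},
    {"subject_id": "KF_FOOD:F003", "object_id": "FOODON:00004186"},
    {"subject_id": "KF_FOOD:F004", "object_id": "FOODON:00002473"},
]
print(derive_cardinalities(example))
# ['n:1', '1:1', '1:1', 'n:1']
```

In the example, F001 and F004 both map to FOODON:00002473, so under these assumptions both rows fall into the n:1 group.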

Index (all classes, enums and elements)

Columns/Slots/Fields

Slot Description
author_id Identifies the persons or groups responsible for asserting the mappings
author_label A string identifying the author of this mapping
comment Free text field containing either curator notes or text generated by tool pro...
confidence A score between 0 and 1 to denote the confidence or probability that the matc...
creator_id Identifies the persons or groups responsible for the creation of the mapping
creator_label A string identifying the creator of this mapping
curation_rule A curation rule is a (potentially) complex condition executed by an agent tha...
curation_rule_text A curation rule is a (potentially) complex condition executed by an agent tha...
curie_map A dictionary that contains prefixes as keys and their URI expansions as value...
documentation A URL to the documentation of this mapping commons
extension_definitions A list that defines the extension slots used in the mapping set
homepage A URL to a homepage of this mapping commons
imports A list of registries that should be imported into this one
issue_tracker A URL location of the issue tracker for this entity
issue_tracker_item The issue tracker item discussing this mapping
last_updated The date this reference was last updated
license A URL to the license of the mapping
local_name The local name assigned to file that corresponds to the downloaded mapping se...
mapping_cardinality A string indicating whether this mapping is from a 1:1 (the subject_id maps t...
mapping_date The date the mapping was asserted
mapping_justification A mapping justification is an action (or the written representation of that a...
mapping_provider URL pointing to the source that provided the mapping, for example an ontology...
mapping_registry_description The description of a mapping registry
mapping_registry_id The unique identifier of a mapping registry
mapping_registry_title The title of a mapping registry
mapping_set_description A description of the mapping set
mapping_set_group Set by the owners of the mapping registry
mapping_set_id A globally unique identifier for the mapping set (not each individual mapping...
mapping_set_references A list of mapping set references
mapping_set_source A mapping set or set of mapping sets that was used to derive the mapping set
mapping_set_title The display name of a mapping set
mapping_set_version A version string for the mapping
mapping_source The mapping set this mapping was originally defined in
mapping_tool A reference to the tool or algorithm that was used to generate the mapping
mapping_tool_version Version string that denotes the version of the mapping tool used
mappings Contains a list of mapping objects
match_string String that is shared by subj/obj
mirror_from A URL location from which to obtain a resource, such as a mapping set
object_category The conceptual category to which the object belongs
object_id The ID of the object of the mapping
object_label The label of object of the mapping
object_match_field A list of properties (term annotations on the object) that was used for the m...
object_preprocessing Method of preprocessing applied to the fields of the object
object_source URI of vocabulary or identifier source for the object
object_source_version Version IRI or version string of the source of the object term
object_type The type of entity that is being mapped
other Pipe separated list of key value pairs for properties not part of the SSSOM s...
predicate_id The ID of the predicate or relation that relates the subject and object of th...
predicate_label The label of the predicate/relation of the mapping
predicate_modifier A modifier for negating the predicate
predicate_type The type of the predicate used to map the subject and object entities
prefix_name
prefix_url
propagated Indicates whether a slot can be propagated from a mapping down to individual ...
property The property associated with the extension slot
publication_date The date the mapping was published
registry_confidence This value is set by the registry that indexes the mapping set
reviewer_id Identifies the persons or groups that reviewed and confirmed the mapping
reviewer_label A string identifying the reviewer of this mapping
see_also A URL specific for the mapping instance
similarity_measure The measure used for computing a similarity score
similarity_score A score between 0 and 1 to denote the similarity between two entities, where ...
slot_name The name of the extension slot
subject_category The conceptual category to which the subject belongs
subject_id The ID of the subject of the mapping
subject_label The label of subject of the mapping
subject_match_field A list of properties (term annotations on the subject) that was used for the ...
subject_preprocessing Method of preprocessing applied to the fields of the subject
subject_source URI of vocabulary or identifier source for the subject
subject_source_version Version IRI or version string of the source of the subject term
subject_type The type of entity that is being mapped
type_hint Expected type of the values of the extension slot

Classes

Class Description
ExtensionDefinition A definition of an extension (non-standard) slot.
Mapping Represents an individual mapping between a pair of entities.
MappingRegistry A registry for managing mapping sets. It holds a set of mapping set references, and can import other registries.
MappingSet Represents a set of mappings.
MappingSetReference A reference to a mapping set. It allows mapping set metadata to be augmented from the perspective of the registry, for example by providing a confidence, a local filename or a grouping.
NoTermFound sssom:NoTermFound can be used in place of a subject_id or object_id when the corresponding entity could not be found. It SHOULD be used in conjunction with a corresponding subject_source or object_source to signify where the term was not found.
Prefix
Propagatable Metamodel extension class to describe slots whose value can be propagated down from the MappingSet class to the Mapping class.
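To make the idea behind Propagatable more concrete, the sketch below copies selected mapping-set-level values down to individual mappings. This page does not list which slots are propagatable, so the slot names passed in are purely illustrative choices from slots that appear in both the MappingSet and Mapping tables. The rule that a value already present on a mapping is kept is likewise an assumption, and the function name `propagate` and the second mapping's date are hypothetical.

```python
def propagate(set_metadata, mappings, propagatable):
    """Return copies of the mappings with set-level values filled in.

    ASSUMPTION: a value already present on a mapping takes precedence.
    """
    result = []
    for mapping in mappings:
        merged = dict(mapping)  # do not mutate the caller's data
        for slot in propagatable:
            if slot in set_metadata and not merged.get(slot):
                merged[slot] = set_metadata[slot]
        result.append(merged)
    return result


set_metadata = {
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "mapping_date": "2022-05-02",
}
mappings = [
    {"subject_id": "KF_FOOD:F001"},
    # Hypothetical mapping-level date, used only to show precedence.
    {"subject_id": "KF_FOOD:F002", "mapping_date": "2022-06-01"},
]
for row in propagate(set_metadata, mappings, ("license", "mapping_date")):
    print(row)
```

Under these assumptions the first mapping receives both set-level values, while the second keeps its own mapping_date and receives only the license.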

Enumerations

Enumeration Description
EntityTypeEnum
MappingCardinalityEnum
PredicateModifierEnum

Types

Type Description
Boolean A binary (true or false) value
Curie a compact URI
Date a date (year, month and day) in an idealized calendar
DateOrDatetime Either a date or a datetime
Datetime The combination of a date and time
Decimal A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double A real number that conforms to the xsd:double specification
EntityReference A reference to an entity involved in the mapping
Float A real number that conforms to the xsd:float specification
Integer An integer
Jsonpath A string encoding a JSON Path
Jsonpointer A string encoding a JSON Pointer
Ncname Prefix part of CURIE
Nodeidentifier A URI, CURIE or BNODE that represents a node in a model
Objectidentifier A URI or CURIE that represents an object in the model
Sparqlpath A string encoding a SPARQL Property Path
String A character string
Time A time object represents a (local) time of day, independent of any particular...
Uri a complete URI
Uriorcurie a URI or a CURIE