The SSSOM data model

The SSSOM data model (hereafter “the model”) defines the data structure to represent and manipulate SSSOM concepts. The model is formally described as a LinkML schema, from which the documentation is derived.

This section provides an overview of the model and supplementary information that may not be found in the schema (and its derived documentation) itself. Of note, the schema, not this section, is always the authoritative source of truth for all questions pertaining to the model.

Overview

The model consists of a handful of classes, the most important of them being the Mapping class and the MappingSet class. Any SSSOM implementation MUST support those two classes and all their slots; support for the other classes is OPTIONAL.

The Mapping class represents an individual mapping. Fundamental slots in that class are:

  • subject_id and object_id, referring to the entities being mapped to each other;
  • predicate_id, referring to the relationship between the mapped entities;
  • mapping_justification, which should provide the justification for the mapping.

Those slots are mandatory (including the mapping_justification slot: the SSSOM standard posits that there can be no mapping without some form of justification) and an implementation MUST NOT allow the creation of a mapping object that does not have a value for any one of them.
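As an illustration of the rule above, here is a minimal Python sketch of a mapping object that cannot be created without its four mandatory slots. The class layout, the other_slots dictionary, and the EX: prefix are assumptions made for the example and are not part of the model. The sketch also ignores the exception for literal mappings described later in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Mapping:
    """Hypothetical minimal mapping: only the four mandatory slots are typed."""

    subject_id: str
    predicate_id: str
    object_id: str
    mapping_justification: str
    # Any other slot ("further details") is kept in a plain dictionary here.
    other_slots: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Refuse to create a mapping with a missing or empty mandatory slot.
        for name in ("subject_id", "predicate_id", "object_id",
                     "mapping_justification"):
            if not getattr(self, name):
                raise ValueError(f"missing mandatory slot: {name}")


# A complete mapping is accepted; omitting the justification would raise.
mapping = Mapping("EX:1", "skos:exactMatch", "EX:2",
                  "semapv:ManualMappingCuration")
```

Rejecting incomplete objects in the constructor means that no other part of such an implementation ever has to deal with a mapping that lacks a justification.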

Other slots are intended to provide further details about a mapping. Those “further details” are sometimes referred to as “mapping metadata”, though the SSSOM standard makes no formal distinction between “data” and “metadata” – there are only “data about a mapping”.

The MappingSet class represents, well, a set of individual mappings, which are contained in the mappings slot (a list of Mapping instances). Other slots in that class are intended either to provide further details about the set itself (sometimes referred to as “mapping set metadata”, with the same caveat as above regarding the data/metadata distinction), or to provide common details for all the mappings in the set (see the Propagation of mapping set slots section further below for details).

Of note, within a set, a mapping may not necessarily be uniquely identified by the combination of its four mandatory slots (subject_id, predicate_id, object_id, and mapping_justification). A set may very well contain several mappings with the same subject, predicate, object, and justification, but that differ on some of the other, complementary slots.

Identifiers

Throughout the model, identifiers of external resources are represented using the custom type EntityReference (based on the LinkML type uriorcurie), which accepts both full-length IRIs and CURIEs as possible identifier formats. (Note, however, that serialisation formats may mandate the use of one identifier format over the other; for example, the SSSOM/TSV format requires the systematic use of CURIEs, whereas the OWL/RDF format conversely requires the systematic use of IRIs.)

Whenever the CURIE syntax is used in a mapping set (whether this is by choice of the SSSOM producer, or because it is mandated by the serialisation format), all CURIEs MUST be unambiguously resolvable into corresponding full-length IRIs without requiring any external resources. This means that any prefix name used MUST be properly declared in the set’s curie_map slot, which is a dictionary associating a prefix name to an IRI prefix.

By exception, prefix names listed in the table found in the IRI prefixes section are considered “built-in”. As such, they MAY be omitted from the curie_map. If they are not omitted, they MUST point to the same IRI prefixes as in the aforementioned table.
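The resolution rules above can be sketched as follows. This is only an illustration: the function names are hypothetical, and BUILTIN_PREFIXES is a small stand-in for the table of the IRI prefixes section, which is not reproduced here.

```python
from typing import Dict

# Illustrative subset standing in for the built-in prefix table (assumption:
# the authoritative table is the one in the "IRI prefixes" section).
BUILTIN_PREFIXES: Dict[str, str] = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def effective_prefix_map(curie_map: Dict[str, str]) -> Dict[str, str]:
    """Merge a set's curie_map with the built-in prefixes."""
    for name, iri_prefix in curie_map.items():
        builtin = BUILTIN_PREFIXES.get(name)
        # A redeclared built-in prefix must point to the same IRI prefix.
        if builtin is not None and builtin != iri_prefix:
            raise ValueError(f"built-in prefix {name!r} redefined")
    return {**BUILTIN_PREFIXES, **curie_map}


def expand_curie(curie: str, prefix_map: Dict[str, str]) -> str:
    """Resolve a CURIE into a full-length IRI using declared prefixes only."""
    prefix, separator, local = curie.partition(":")
    if not separator or prefix not in prefix_map:
        raise ValueError(f"undeclared prefix in {curie!r}")
    return prefix_map[prefix] + local


# "EX" is a made-up prefix declared by the set; "skos" is taken as built-in.
prefixes = effective_prefix_map({"EX": "http://example.org/"})
iri = expand_curie("EX:123", prefixes)
```

Because the lookup only ever consults the merged map, no external resource is needed to resolve a CURIE, and an undeclared prefix is reported as an error rather than guessed.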

Propagation of mapping set slots

As mentioned briefly above, there are two different types of slots in the MappingSet class:

  • slots that provide information about the set itself;
  • slots that provide information about all the mappings in the set.

The latter are called “propagatable slots”. In the LinkML model, they are marked with a propagated annotation whose value is set to true.

For convenience, here is the current list of propagatable slots:

  • mapping_date,
  • mapping_provider,
  • mapping_tool,
  • mapping_tool_version,
  • object_match_field,
  • object_preprocessing,
  • object_source,
  • object_source_version,
  • object_type,
  • subject_match_field,
  • subject_preprocessing,
  • subject_source,
  • subject_source_version,
  • subject_type,
  • predicate_type.

When a mapping set object has a value in one of its propagatable slots, this MUST be interpreted as if all mappings within the set had that same value in their corresponding slot. For example, if a set has the value foo in its mapping_tool slot, all the mappings in that set MUST be treated as if they had the value foo in their mapping_tool slot.

This mechanism is intended as a convenience, so that a slot which has the same value for all mappings in a set can be specified only once at the level of the set rather than for each individual mapping.

Slots that are not in the above list (“non-propagatable slots”) describe the mapping set itself, not the mappings it contains, even if the slot also exists on the Mapping class. For example, the creator_id slot, when used in the MappingSet class, is intended to refer to the creators of the set, not the creators of the individual mappings (which may be different, and which are listed in the creator_id slot of every mapping).
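The propagation mechanism can be sketched in Python as follows, with mapping sets and mappings represented as plain dictionaries (an assumption made for brevity). The text above does not say what should happen when a mapping already has its own value in a propagatable slot; this sketch assumes the mapping's own value is left untouched.

```python
from typing import Any, Dict, List

PROPAGATABLE_SLOTS = (
    "mapping_date", "mapping_provider", "mapping_tool", "mapping_tool_version",
    "object_match_field", "object_preprocessing", "object_source",
    "object_source_version", "object_type", "subject_match_field",
    "subject_preprocessing", "subject_source", "subject_source_version",
    "subject_type", "predicate_type",
)


def propagate(mapping_set: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return copies of the mappings with set-level propagatable slots applied."""
    result = []
    for mapping in mapping_set.get("mappings", []):
        expanded = dict(mapping)  # do not modify the original mapping
        for slot in PROPAGATABLE_SLOTS:
            # Assumption: a value already present on the mapping is kept.
            if slot in mapping_set and slot not in expanded:
                expanded[slot] = mapping_set[slot]
        result.append(expanded)
    return result


example_set = {
    "mapping_tool": "foo",
    "creator_id": ["EX:alice"],  # non-propagatable: describes the set only
    "mappings": [
        {"subject_id": "EX:1"},
        {"subject_id": "EX:2", "mapping_tool": "bar"},
    ],
}
expanded_mappings = propagate(example_set)
```

Here the first mapping is treated as if it had the value foo in its mapping_tool slot, while creator_id stays on the set and is never copied to the mappings.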

Allowed and common mapping predicates

Implementations MUST accept any arbitrary predicate in the predicate_id slot.

The following mapping predicates are considered common, and implementations MAY encourage users to use them:

Predicate | Description
---------- | -----------
owl:sameAs | The subject and the object are instances (OWL individuals), and the two instances are the same.
owl:equivalentClass | The subject and the object are OWL classes, and the two classes are the same.
owl:equivalentProperty | The subject and the object are OWL object, data, or annotation properties, and the two properties are the same.
rdfs:subClassOf | The subject and the object are OWL classes, and the subject is a subclass of the object.
rdfs:subPropertyOf | The subject and the object are OWL object, data, or annotation properties, and the subject is a subproperty of the object.
skos:relatedMatch | The subject and the object are associated in some unspecified way.
skos:closeMatch | The subject and the object are sufficiently similar that they can be used interchangeably in some information retrieval applications.
skos:exactMatch | The subject and the object can, with a high degree of confidence, be used interchangeably across a wide range of information retrieval applications.
skos:narrowMatch | The object is a narrower concept than the subject.
skos:broadMatch | The object is a broader concept than the subject.
oboInOwl:hasDbXref | Two terms are related in some way. The meaning is frequently consistent across a single set of mappings. Note this property is often overloaded even where the terms are of a different nature (e.g. interpro2go).
rdfs:seeAlso | The subject and the object are associated in some unspecified way. The object IRI often resolves to a resource on the web that provides additional information.

In addition, predicates from the following sources MAY also be encouraged:

Literal mappings

The SSSOM model is primarily intended to represent mappings between semantic entities. However, it may also be used to represent mappings where at least one side is a literal string that does not have an identifier of its own. Any such mapping is henceforth called a literal mapping.

To represent a mapping whose subject (resp. object) is a literal:

  • the subject_type (resp. object_type) slot MUST be set to rdfs literal;
  • the subject_label (resp. object_label) slot MUST be set to the literal itself;
  • the subject_id (resp. object_id) slot MAY be left empty.

The last point is an exception to the normal rules about required slots, which state that a mapping must always have a subject_id and an object_id. Implementations MUST accept a mapping without a subject_id (resp. object_id) if and only if the subject_type (resp. object_type) slot is set to rdfs literal.
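A minimal sketch of the corresponding check is shown below, assuming mappings are held as plain dictionaries keyed by slot name. The function name and the wording of the messages are hypothetical.

```python
from typing import Dict, List

LITERAL_TYPE = "rdfs literal"


def check_mapping_sides(mapping: Dict[str, str]) -> List[str]:
    """Return the problems found on the subject and object sides."""
    problems = []
    for side in ("subject", "object"):
        is_literal = mapping.get(f"{side}_type") == LITERAL_TYPE
        if is_literal:
            # A literal side must carry the literal in its label slot.
            if not mapping.get(f"{side}_label"):
                problems.append(f"{side}_label is required for a literal {side}")
        elif not mapping.get(f"{side}_id"):
            # Only a literal side may omit its identifier.
            problems.append(f"{side}_id is required")
    return problems


# A literal subject without subject_id is acceptable.
literal_mapping = {
    "subject_type": "rdfs literal",
    "subject_label": "heart attack",
    "object_id": "EX:1",
}
issues = check_mapping_sides(literal_mapping)
```

Returning a list of problems, rather than raising on the first one, lets a caller report every issue of a mapping at once; an implementation could equally choose to fail fast.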

All other slots in the Mapping class may be used normally in a literal mapping, with the same meaning as for a non-literal mapping.

When computing the cardinality of mappings in a set (e.g. to set the value of the mapping_cardinality slot), if the mapping has a literal subject (resp. object), then the subject_label (resp. object_label) slot must be used for determining the number of occurrences of the subject (resp. object) in the set.

Representing unmapped entities

The special value sssom:NoTermFound MAY be used as the object_id of a mapping to explicitly state that the subject of said mapping cannot be mapped to any entity in the domain represented by the object_source slot.

Likewise, the sssom:NoTermFound value MAY be used as the subject_id of a mapping to state that the object of said mapping cannot be mapped to any entity in the domain represented by the subject_source slot.

When that special value is used as the subject_id (respectively object_id), the subject_source (respectively object_source) slot SHOULD be defined.

The sssom:NoTermFound value MUST NOT be used in any other slot than subject_id or object_id.

The meaning of the NOT predicate modifier in a mapping that refers to sssom:NoTermFound is unspecified.

When computing cardinality values (to fill the mapping_cardinality slot), mappings that refer to sssom:NoTermFound MUST be ignored.
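The two cardinality rules given in this section and in the Literal mappings section (count literals by their label, ignore sssom:NoTermFound) can be sketched as follows. The sketch makes several assumptions that the text does not cover: mappings are plain dictionaries, the result uses the notation 1:1, 1:n, n:1 and n:n, the predicate is not taken into account, and ignored mappings receive no value (None).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

NO_TERM_FOUND = "sssom:NoTermFound"
LITERAL_TYPE = "rdfs literal"


def side_key(mapping: Dict[str, str], side: str) -> Tuple[str, Optional[str]]:
    """Value identifying one side of a mapping for counting purposes."""
    if mapping.get(f"{side}_type") == LITERAL_TYPE:
        # Literal sides are counted by their label.
        return ("literal", mapping.get(f"{side}_label"))
    return ("id", mapping.get(f"{side}_id"))


def is_ignored(mapping: Dict[str, str]) -> bool:
    """Mappings referring to sssom:NoTermFound are left out of the count."""
    return NO_TERM_FOUND in (mapping.get("subject_id"), mapping.get("object_id"))


def cardinalities(mappings: List[Dict[str, str]]) -> List[Optional[str]]:
    """Compute one cardinality value per mapping (None when ignored)."""
    objects_of = defaultdict(set)   # subject -> distinct objects
    subjects_of = defaultdict(set)  # object -> distinct subjects
    for m in mappings:
        if not is_ignored(m):
            subject, obj = side_key(m, "subject"), side_key(m, "object")
            objects_of[subject].add(obj)
            subjects_of[obj].add(subject)

    result: List[Optional[str]] = []
    for m in mappings:
        if is_ignored(m):
            result.append(None)
            continue
        subject, obj = side_key(m, "subject"), side_key(m, "object")
        left = "1" if len(subjects_of[obj]) == 1 else "n"
        right = "1" if len(objects_of[subject]) == 1 else "n"
        result.append(f"{left}:{right}")
    return result


example = [
    {"subject_id": "A:1", "object_id": "B:1"},
    {"subject_id": "A:1", "object_id": "B:2"},
    {"subject_id": "A:2", "object_id": "sssom:NoTermFound"},
    {"subject_type": "rdfs literal", "subject_label": "foo", "object_id": "B:3"},
]
values = cardinalities(example)
```

Keying each side by a (kind, value) pair keeps a literal label from being confused with an identifier that happens to have the same spelling.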

Non-standard slots

Implementations are only REQUIRED to support the standard metadata slots defined in the SSSOM LinkML schema.

However, implementations MAY support the use of supplementary, non-standard slots (hereafter called extension slots or simply extensions). There are two types of extension slots: defined extension slots and undefined extension slots.

Defined extensions

Defined extensions are non-standard slots that are explicitly declared (or, defined) before being used. Implementations SHOULD support the use of defined extensions.

Extensions are defined in the extension_definition slot of the MappingSet object. Each definition comprises three elements:

  • the name of the slot, as it will appear when used in a mapping set (slot_name);
  • a property intended to specify the meaning of the slot (property);
  • the type of values expected by the slot (type_hint).

A definition MUST have at least a slot_name. The name MUST be an XML “non-colonized name” (“NCName”, see Namespaces in XML, §2). The name MUST NOT match the name of an existing standard slot.

To avoid any conflict with a future version of the SSSOM specification (which could introduce new standard slot names), implementations are strongly encouraged to craft extension slot names that start with the ext_ prefix. No new standard slot with a name starting with ext_ will ever be introduced in any future version of the standard. (This advice is for SSSOM producers only; SSSOM consumers MUST NOT reject an extension slot solely on the basis that its name does not start with ext_.)

A definition SHOULD have a property. If it does not, implementations MUST automatically construct a default property by concatenating the prefix http://sssom.invalid/ with the name of the extension.

The slot name and the property MUST be unique to each definition. No two definitions can share the same name and/or the same property.

A definition MAY have a type_hint. If it does not, a default type of http://www.w3.org/2001/XMLSchema#string is assumed.

Once defined, an extension slot may be used as a supplementary slot in either the Mapping class or the MappingSet class (or both), as if it were a normal, standard slot. How those slots are represented internally and provided to client code is left at the discretion of the implementations.
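The rules on extension definitions can be illustrated with the following sketch, which validates a list of definitions and fills in the default property and type hint. The function name, the dictionary representation, and the standard_slots parameter are assumptions of the example; the NCName test is an ASCII-only approximation of the actual XML grammar, which allows many more characters.

```python
import re
from typing import Dict, Iterable, List, Set

DEFAULT_PROPERTY_PREFIX = "http://sssom.invalid/"
DEFAULT_TYPE_HINT = "http://www.w3.org/2001/XMLSchema#string"
# ASCII-only approximation of an XML NCName (assumption, see lead-in).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def normalise_definitions(
    definitions: Iterable[Dict[str, str]], standard_slots: Set[str]
) -> List[Dict[str, str]]:
    """Validate extension definitions and apply the default values."""
    seen_names: Set[str] = set()
    seen_properties: Set[str] = set()
    result = []
    for definition in definitions:
        name = definition.get("slot_name")
        if not name or not NCNAME.fullmatch(name):
            raise ValueError(f"invalid or missing slot_name: {name!r}")
        if name in standard_slots:
            raise ValueError(f"{name!r} is the name of a standard slot")
        # Default property: fixed prefix followed by the slot name.
        prop = definition.get("property") or DEFAULT_PROPERTY_PREFIX + name
        if name in seen_names or prop in seen_properties:
            raise ValueError(f"duplicate name or property in {name!r}")
        seen_names.add(name)
        seen_properties.add(prop)
        result.append({
            "slot_name": name,
            "property": prop,
            "type_hint": definition.get("type_hint") or DEFAULT_TYPE_HINT,
        })
    return result


# Only one standard slot is listed here to keep the example short.
normalised = normalise_definitions(
    [{"slot_name": "ext_score"}], standard_slots={"subject_id"}
)
```

Applying the defaults once, when the definitions are read, means the rest of such an implementation can assume every definition has all three elements.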

Undefined extensions

Undefined extensions are non-standard slots that are not explicitly defined as described in the previous section. Implementations MAY support undefined extensions.

Upon encountering a non-standard slot that is not a defined extension, an implementation that supports undefined extensions MUST behave as if the slot had been defined with:

  • a property constructed by concatenating the prefix http://sssom.invalid/ with the name of the slot;
  • a type_hint of http://www.w3.org/2001/XMLSchema#string.
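For an implementation that chooses to support undefined extensions, the fallback behaviour could look like the sketch below. The function name and the representation of definitions as dictionaries indexed by slot name are assumptions of the example.

```python
from typing import Dict


def definition_for(
    slot_name: str, defined: Dict[str, Dict[str, str]]
) -> Dict[str, str]:
    """Return the definition of a non-standard slot, synthesising it if needed."""
    if slot_name in defined:
        return defined[slot_name]
    # Undefined extension: behave as if it had been defined with the defaults.
    return {
        "slot_name": slot_name,
        "property": "http://sssom.invalid/" + slot_name,
        "type_hint": "http://www.w3.org/2001/XMLSchema#string",
    }


# "ext_note" was never declared, so a default definition is synthesised.
fallback = definition_for("ext_note", defined={})
```

With this approach, client code can handle defined and undefined extensions through the same path.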

Restrictions on the values of extension slots

General restrictions

The following restrictions apply to all extension slots, regardless of whether they are defined or undefined.

Each mapping set and each mapping can have at most one value for each extension slot. The expected behaviour upon encountering a repeated extension slot is unspecified.

An extension value MUST be either a string or an instance of a simple data type such as a numerical value (integer or floating point), a boolean value, or a date or datetime value. In particular, composite data structures (e.g. lists or dictionaries) MUST NOT be used as extension values.

It is always possible to use arbitrarily complex values by encoding them as literal strings. However, how complex values would be encoded is out of scope of this specification; implementations MUST treat such values as opaque strings.

Further restrictions for typed defined extensions

If a defined extension slot has a type_hint other than http://www.w3.org/2001/XMLSchema#string, implementations MAY enforce further constraints on extension values based on the type hint, according to the following table:

Type hint | Constraints
--------- | -----------
http://www.w3.org/2001/XMLSchema#integer | Implementations MAY check that the value is an integer
http://www.w3.org/2001/XMLSchema#double | Implementations MAY check that the value is a floating-point number
http://www.w3.org/2001/XMLSchema#boolean | Implementations MAY check that the value is either true or false
http://www.w3.org/2001/XMLSchema#date | Implementations MAY check that the value is a date in the ISO 8601 format (yyyy-mm-dd)
http://www.w3.org/2001/XMLSchema#datetime | Implementations MAY check that the value is a date and time value in the ISO 8601 format (yyyy-mm-ddThh:mm:ssTZ)

Implementations MAY decide to recognise more types and to enforce type-specific constraints. For example, an implementation could recognise the type http://www.w3.org/2001/XMLSchema#negativeInteger and check that the value starts with a minus sign.
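The optional type checks could be sketched as follows, assuming extension values arrive as strings (as they would when read from a text serialisation). The function names are hypothetical. Python's own parsers are used for brevity, and they are somewhat more lenient than the formats listed in the table (for example, int accepts surrounding whitespace), so a stricter implementation would need dedicated patterns.

```python
from datetime import date, datetime
from typing import Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


def _parses(parser: Callable[[str], object], value: str) -> bool:
    """Tell whether the given parser accepts the value."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


# One check per type hint listed in the table above.
CHECKS: Dict[str, Callable[[str], bool]] = {
    XSD + "integer": lambda v: _parses(int, v),
    XSD + "double": lambda v: _parses(float, v),
    XSD + "boolean": lambda v: v in ("true", "false"),
    XSD + "date": lambda v: _parses(date.fromisoformat, v),
    XSD + "datetime": lambda v: _parses(datetime.fromisoformat, v),
}


def value_matches_hint(type_hint: str, value: str) -> bool:
    """Check a value against its type hint; unknown hints are not checked."""
    check = CHECKS.get(type_hint)
    return True if check is None else check(value)


is_valid = value_matches_hint(XSD + "integer", "42")
```

Since the checks are optional, unrecognised type hints simply pass; supporting an additional type only requires adding an entry to the CHECKS table.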