SSSOM Official Data Model Documentation
Datamodel for Simple Standard for Sharing Ontological Mappings (SSSOM)
Schema PURL: https://w3id.org/sssom/schema/
Introduction
While the SSSOM model is quite general and mappings can be shared in different formats, the most common format is the SSSOM/TSV format. Here is a tabular representation of some example mappings for illustration purposes:
subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | author_id | confidence | comment |
---|---|---|---|---|---|---|---|---|
KF_FOOD:F001 | apple | skos:exactMatch | FOODON:00002473 | apple (whole) | semapv:ManualMappingCuration | orcid:0000-0002-7356-1779 | 0.95 | "We could map to FOODON:03310788 instead to cover sliced apples, but only 'whole' apple types exist." |
KF_FOOD:F002 | gala | skos:exactMatch | FOODON:00003348 | Gala apple (whole) | semapv:ManualMappingCuration | orcid:0000-0002-7356-1779 | 1.0 | |
KF_FOOD:F003 | pink | skos:exactMatch | FOODON:00004186 | Pink apple (whole) | semapv:ManualMappingCuration | orcid:0000-0002-7356-1779 | 0.9 | "We could map to FOODON:00004187 instead which more specifically refers to 'raw' Pink apples. Decided against to be consistent with other mapping choices." |
KF_FOOD:F004 | braeburn | skos:broadMatch | FOODON:00002473 | apple (whole) | semapv:ManualMappingCuration | orcid:0000-0002-7356-1779 | 1.0 | |
In the TSV format, mapping set metadata is included at the top of the file, before the mappings themselves, as YAML-like key-value pairs:
Example header (YAML format)
    curie_map:
      FOODON: http://purl.obolibrary.org/obo/FOODON_
      KF_FOOD: https://kewl-foodie.inc/food/
      orcid: https://orcid.org/
    mapping_set_id: https://w3id.org/sssom/tutorial/example1.sssom.tsv
    mapping_set_description: >
      Manually curated alignment of KEWL FOODIE INC internal food and nutrition database with Food Ontology (FOODON). Intended to be used for ontological analysis and grouping of KEWL FOODIE INC related data.
    license: https://creativecommons.org/licenses/by/4.0/
    mapping_date: 2022-05-02
See here for concrete examples.
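As a rough illustration of the layout described above, the following sketch separates the metadata block from the mapping rows using only the Python standard library. It assumes that metadata lines are prefixed with `#` in the TSV file and that the table is tab-separated; the function name `split_sssom_tsv` and the shortened sample text are hypothetical and chosen only for this example. The header is returned as raw text, since parsing it would need a YAML library.

```python
import csv
import io


def split_sssom_tsv(text):
    """Split SSSOM/TSV text into (header_text, rows).

    Assumption: metadata lines start with '#' and come before the table.
    """
    header_lines = []
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#") and not table_lines:
            # Drop only the '#'; indentation is significant in YAML.
            header_lines.append(line[1:])
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return "\n".join(header_lines), list(reader)


# Shortened sample built from the example mappings above.
example = (
    "#curie_map:\n"
    "#  FOODON: http://purl.obolibrary.org/obo/FOODON_\n"
    "#mapping_set_id: https://w3id.org/sssom/tutorial/example1.sssom.tsv\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "KF_FOOD:F001\tskos:exactMatch\tFOODON:00002473\tsemapv:ManualMappingCuration\t0.95\n"
    "KF_FOOD:F004\tskos:broadMatch\tFOODON:00002473\tsemapv:ManualMappingCuration\t1.0\n"
)

header, rows = split_sssom_tsv(example)
print(header)
print(rows[0]["subject_id"], rows[0]["predicate_id"], rows[0]["object_id"])
```

All values come back as strings; numeric slots such as confidence would still need to be converted by the caller.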
Mapping metadata elements
Mapping: Represents an individual mapping between a pair of entities
Column/Field | Description | Required |
---|---|---|
subject_id | The ID of the subject of the mapping. | Optional |
subject_label | The label of subject of the mapping. | Recommended |
subject_category | The conceptual category to which the subject belongs. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc.) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers: when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. | Optional |
predicate_id | The ID of the predicate or relation that relates the subject and object of this match. | Required |
predicate_label | The label of the predicate/relation of the mapping. | Optional |
predicate_modifier | A modifier for negating the predicate. See https://github.com/mapping-commons/sssom/issues/40 for discussion | Optional |
object_id | The ID of the object of the mapping. | Optional |
object_label | The label of object of the mapping. | Recommended |
object_category | The conceptual category to which the object belongs. This can be a string denoting the category or a term from a controlled vocabulary. This slot is deliberately underspecified. Conceptual categories can range from those that are found in general upper ontologies such as BFO (e.g. process, temporal region, etc.) to those that serve as upper ontologies in specific domains, such as COB or BioLink (e.g. gene, disease, chemical entity). The purpose of this optional field is documentation for human reviewers: when a category is known and documented clearly, the cost of interpreting and evaluating the mapping decreases. | Optional |
mapping_justification | A mapping justification is an action (or the written representation of that action) of showing a mapping to be right or reasonable. | Required |
author_id | Identifies the persons or groups responsible for asserting the mappings. Recommended to be a list of ORCIDs or otherwise identifying URIs. | Optional |
author_label | A string identifying the author of this mapping. In the spirit of provenance, consider using author_id instead. | Optional |
reviewer_id | Identifies the persons or groups that reviewed and confirmed the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | Optional |
reviewer_label | A string identifying the reviewer of this mapping. In the spirit of provenance, consider using reviewer_id instead. | Optional |
creator_id | Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | Optional |
creator_label | A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. | Optional |
license | A URL to the license of the mapping. In the absence of a license, no license is assumed. | Optional |
subject_type | The type of entity that is being mapped. | Optional |
subject_source | URI of vocabulary or identifier source for the subject. | Optional |
subject_source_version | Version IRI or version string of the source of the subject term. | Optional |
object_type | The type of entity that is being mapped. | Optional |
object_source | URI of vocabulary or identifier source for the object. | Optional |
object_source_version | Version IRI or version string of the source of the object term. | Optional |
predicate_type | The type of the predicate used to map the subject and object entities. | Optional |
mapping_provider | URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. | Optional |
mapping_source | The mapping set this mapping was originally defined in. mapping_source is used for example when merging multiple mapping sets or deriving one mapping set from another. | Optional |
mapping_cardinality | A string indicating whether this mapping is from a 1:1 (the subject_id maps to a single object_id), 1:n (the subject maps to more than one object_id), n:1, 1:0, 0:1 or n:n group. Note that this is a convenience field that should be derivable from the mapping set. | Optional |
mapping_tool | A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. | Optional |
mapping_tool_version | Version string that denotes the version of the mapping tool used. | Optional |
mapping_date | The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. | Optional |
publication_date | The date the mapping was published. This is different from the date the mapping was asserted. | Optional |
confidence | A score between 0 and 1 to denote the confidence or probability that the match is correct, where 1 denotes total confidence. | Optional |
curation_rule | A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule is captured as a resource rather than a string, which enables higher levels of transparency and sharing across mapping sets. The URI representation of the curation rule is expected to be a resolvable identifier which provides details about the nature of the curation rule. | Optional |
curation_rule_text | A curation rule is a (potentially) complex condition executed by an agent that led to the establishment of a mapping. Curation rules often involve complex domain-specific considerations, which are hard to capture in an automated fashion. The curation rule should be captured as a resource (entity reference) rather than a string (see curation_rule element), which enables higher levels of transparency and sharing across mapping sets. The textual representation of curation rule is intended to be used in cases where (1) the creation of a resource is not practical from the perspective of the mapping_provider and (2) as an additional piece of metadata to augment the curation_rule element with a human readable text. | Optional |
subject_match_field | A list of properties (term annotations on the subject) that were used for the match. | Optional |
object_match_field | A list of properties (term annotations on the object) that were used for the match. | Optional |
match_string | String that is shared by the subject and the object. It is recommended to indicate the fields used for the match with the subject_match_field and object_match_field slots. | Optional |
subject_preprocessing | Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | Optional |
object_preprocessing | Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | Optional |
similarity_score | A score between 0 and 1 to denote the similarity between two entities, where 1 denotes equivalence, and 0 denotes disjointness. The score is meant to be used in conjunction with the similarity_measure field, to document, for example, the lexical or semantic match of a matching algorithm. | Optional |
similarity_measure | The measure used for computing a similarity score. This field is meant to be used in conjunction with the similarity_score field, to document, for example, the lexical or semantic match of a matching algorithm. To make processing this field as unambiguous as possible, we recommend using wikidata CURIEs, but the type of this field is deliberately unspecified. | Optional |
see_also | A URL specific to the mapping instance. E.g. for kboom we have a per-mapping image that shows the surrounding axioms that drive the probability. Could also be a GitHub issue URL that discusses a complicated alignment. | Optional |
issue_tracker_item | The issue tracker item discussing this mapping. | Optional |
other | Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. NOTE. This field is not recommended for general use, and should be used sparingly. See https://github.com/mapping-commons/sssom/blob/master/examples/schema/extension-slots.sssom.tsv for an alternative approach based on extension slots. | Optional |
comment | Free text field containing either curator notes or tool-generated text that provides additional information. | Optional |
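The constraints in the table above can be checked mechanically. The sketch below is a minimal example, not an official validator: it flags rows that lack the two slots marked Required (predicate_id and mapping_justification) and scores that fall outside the documented 0 to 1 range. The function name `check_mapping` and the representation of a mapping as a plain dictionary of strings are assumptions made for illustration.

```python
# Slots marked "Required" in the Mapping table above.
REQUIRED_SLOTS = ("predicate_id", "mapping_justification")
# Slots documented as scores between 0 and 1.
SCORE_SLOTS = ("confidence", "similarity_score")


def check_mapping(row):
    """Return a list of problems found in one mapping row (a dict)."""
    problems = []
    for slot in REQUIRED_SLOTS:
        if not row.get(slot):
            problems.append(f"missing required slot: {slot}")
    for slot in SCORE_SLOTS:
        value = row.get(slot)
        if value in (None, ""):
            continue  # optional slot left empty
        try:
            score = float(value)
        except (TypeError, ValueError):
            problems.append(f"{slot} is not a number: {value!r}")
            continue
        if not 0.0 <= score <= 1.0:
            problems.append(f"{slot} out of range [0, 1]: {score}")
    return problems


good = {
    "subject_id": "KF_FOOD:F001",
    "predicate_id": "skos:exactMatch",
    "object_id": "FOODON:00002473",
    "mapping_justification": "semapv:ManualMappingCuration",
    "confidence": "0.95",
}
bad = {"mapping_justification": "semapv:ManualMappingCuration", "confidence": "1.5"}

print(check_mapping(good))  # []
print(check_mapping(bad))
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in a mapping set in one pass.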
Mapping set metadata elements
MappingSet: Represents a set of mappings
Column/Field | Description | Required |
---|---|---|
curie_map | A dictionary that contains prefixes as keys and their URI expansions as values. | Optional |
mappings | Contains a list of mapping objects. | Recommended |
mapping_set_id | A globally unique identifier for the mapping set (not each individual mapping). Should be IRI, ideally resolvable. | Required |
mapping_set_version | A version string for the mapping set. | Optional |
mapping_set_source | A mapping set or set of mapping sets that was used to derive the mapping set. | Optional |
mapping_set_title | The display name of a mapping set. | Optional |
mapping_set_description | A description of the mapping set. | Optional |
creator_id | Identifies the persons or groups responsible for the creation of the mapping. The creator is the agent that put the mapping in its published form, which may be different from the author, which is a person that was actively involved in the assertion of the mapping. Recommended to be a list of ORCIDs or otherwise identifying URIs. | Optional |
creator_label | A string identifying the creator of this mapping. In the spirit of provenance, consider using creator_id instead. | Optional |
license | A URL to the license of the mapping. In the absence of a license, no license is assumed. | Optional |
subject_type | The type of entity that is being mapped. | Optional |
subject_source | URI of vocabulary or identifier source for the subject. | Optional |
subject_source_version | Version IRI or version string of the source of the subject term. | Optional |
object_type | The type of entity that is being mapped. | Optional |
object_source | URI of vocabulary or identifier source for the object. | Optional |
object_source_version | Version IRI or version string of the source of the object term. | Optional |
predicate_type | The type of the predicate used to map the subject and object entities. | Optional |
mapping_provider | URL pointing to the source that provided the mapping, for example an ontology that already contains the mappings, or a database from which it was derived. | Optional |
mapping_tool | A reference to the tool or algorithm that was used to generate the mapping. Should be a URL pointing to more info about it, but can be free text. | Optional |
mapping_tool_version | Version string that denotes the version of the mapping tool used. | Optional |
mapping_date | The date the mapping was asserted. This is different from the date the mapping was published or compiled in a SSSOM file. | Optional |
publication_date | The date the mapping was published. This is different from the date the mapping was asserted. | Optional |
subject_match_field | A list of properties (term annotations on the subject) that were used for the match. | Optional |
object_match_field | A list of properties (term annotations on the object) that were used for the match. | Optional |
subject_preprocessing | Method of preprocessing applied to the fields of the subject. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | Optional |
object_preprocessing | Method of preprocessing applied to the fields of the object. If different preprocessing steps were performed on different fields, it is recommended to store the match in separate rows. | Optional |
see_also | A URL specific to the mapping instance. E.g. for kboom we have a per-mapping image that shows the surrounding axioms that drive the probability. Could also be a GitHub issue URL that discusses a complicated alignment. | Optional |
issue_tracker | A URL location of the issue tracker for this entity. | Optional |
other | Pipe separated list of key value pairs for properties not part of the SSSOM spec. Can be used to encode additional provenance data. NOTE. This field is not recommended for general use, and should be used sparingly. See https://github.com/mapping-commons/sssom/blob/master/examples/schema/extension-slots.sssom.tsv for an alternative approach based on extension slots. | Optional |
comment | Free text field containing either curator notes or tool-generated text that provides additional information. | Optional |
extension_definitions | A list that defines the extension slots used in the mapping set. | Optional |
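The mapping_cardinality slot is described above as a convenience field that should be derivable from the mapping set. The following sketch shows one way such a derivation could look. It rests on assumptions that the text does not settle: cardinality is counted over (subject_id, object_id) pairs regardless of predicate, and the 1:0 and 0:1 cases are not handled. The function name `derive_cardinality` is hypothetical.

```python
from collections import defaultdict


def derive_cardinality(mappings):
    """Return one cardinality string per mapping, in input order."""
    objects_of = defaultdict(set)   # subject_id -> object_ids it maps to
    subjects_of = defaultdict(set)  # object_id -> subject_ids mapped to it
    for m in mappings:
        objects_of[m["subject_id"]].add(m["object_id"])
        subjects_of[m["object_id"]].add(m["subject_id"])

    result = []
    for m in mappings:
        # Left side: how many subjects share this mapping's object.
        left = "1" if len(subjects_of[m["object_id"]]) == 1 else "n"
        # Right side: how many objects this mapping's subject maps to.
        right = "1" if len(objects_of[m["subject_id"]]) == 1 else "n"
        result.append(f"{left}:{right}")
    return result


# The four example mappings from the introduction.
mappings = [
    {"subject_id": "KF_FOOD:F001", "object_id": "FOODON:00002473"},
    {"subject_id": "KF_FOOD:F002", "object_id": "FOODON:00003348"},
    {"subject_id": "KF_FOOD:F003", "object_id": "FOODON:00004186"},
    {"subject_id": "KF_FOOD:F004", "object_id": "FOODON:00002473"},
]

print(derive_cardinality(mappings))  # ['n:1', '1:1', '1:1', 'n:1']
```

In the example data, F001 and F004 both map to FOODON:00002473, so those two rows come out as n:1 under these assumptions.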
Index (all classes, enums and elements)
Columns/Slots/Fields
Slot | Description |
---|---|
author_id | Identifies the persons or groups responsible for asserting the mappings |
author_label | A string identifying the author of this mapping |
comment | Free text field containing either curator notes or text generated by tool pro... |
confidence | A score between 0 and 1 to denote the confidence or probability that the matc... |
creator_id | Identifies the persons or groups responsible for the creation of the mapping |
creator_label | A string identifying the creator of this mapping |
curation_rule | A curation rule is a (potentially) complex condition executed by an agent tha... |
curation_rule_text | A curation rule is a (potentially) complex condition executed by an agent tha... |
curie_map | A dictionary that contains prefixes as keys and their URI expansions as value... |
documentation | A URL to the documentation of this mapping commons |
extension_definitions | A list that defines the extension slots used in the mapping set |
homepage | A URL to a homepage of this mapping commons |
imports | A list of registries that should be imported into this one |
issue_tracker | A URL location of the issue tracker for this entity |
issue_tracker_item | The issue tracker item discussing this mapping |
last_updated | The date this reference was last updated |
license | A URL to the license of the mapping |
local_name | The local name assigned to file that corresponds to the downloaded mapping se... |
mapping_cardinality | A string indicating whether this mapping is from a 1:1 (the subject_id maps t... |
mapping_date | The date the mapping was asserted |
mapping_justification | A mapping justification is an action (or the written representation of that a... |
mapping_provider | URL pointing to the source that provided the mapping, for example an ontology... |
mapping_registry_description | The description of a mapping registry |
mapping_registry_id | The unique identifier of a mapping registry |
mapping_registry_title | The title of a mapping registry |
mapping_set_description | A description of the mapping set |
mapping_set_group | Set by the owners of the mapping registry |
mapping_set_id | A globally unique identifier for the mapping set (not each individual mapping... |
mapping_set_references | A list of mapping set references |
mapping_set_source | A mapping set or set of mapping sets that was used to derive the mapping set |
mapping_set_title | The display name of a mapping set |
mapping_set_version | A version string for the mapping set |
mapping_source | The mapping set this mapping was originally defined in |
mapping_tool | A reference to the tool or algorithm that was used to generate the mapping |
mapping_tool_version | Version string that denotes the version of the mapping tool used |
mappings | Contains a list of mapping objects |
match_string | String that is shared by the subject and the object |
mirror_from | A URL location from which to obtain a resource, such as a mapping set |
object_category | The conceptual category to which the object belongs |
object_id | The ID of the object of the mapping |
object_label | The label of object of the mapping |
object_match_field | A list of properties (term annotations on the object) that was used for the m... |
object_preprocessing | Method of preprocessing applied to the fields of the object |
object_source | URI of vocabulary or identifier source for the object |
object_source_version | Version IRI or version string of the source of the object term |
object_type | The type of entity that is being mapped |
other | Pipe separated list of key value pairs for properties not part of the SSSOM s... |
predicate_id | The ID of the predicate or relation that relates the subject and object of th... |
predicate_label | The label of the predicate/relation of the mapping |
predicate_modifier | A modifier for negating the predicate |
predicate_type | The type of the predicate used to map the subject and object entities |
prefix_name | |
prefix_url | |
propagated | Indicates whether a slot can be propagated from a mapping down to individual ... |
property | The property associated with the extension slot |
publication_date | The date the mapping was published |
registry_confidence | This value is set by the registry that indexes the mapping set |
reviewer_id | Identifies the persons or groups that reviewed and confirmed the mapping |
reviewer_label | A string identifying the reviewer of this mapping |
see_also | A URL specific for the mapping instance |
similarity_measure | The measure used for computing a similarity score |
similarity_score | A score between 0 and 1 to denote the similarity between two entities, where ... |
slot_name | The name of the extension slot |
subject_category | The conceptual category to which the subject belongs |
subject_id | The ID of the subject of the mapping |
subject_label | The label of subject of the mapping |
subject_match_field | A list of properties (term annotations on the subject) that was used for the ... |
subject_preprocessing | Method of preprocessing applied to the fields of the subject |
subject_source | URI of vocabulary or identifier source for the subject |
subject_source_version | Version IRI or version string of the source of the subject term |
subject_type | The type of entity that is being mapped |
type_hint | Expected type of the values of the extension slot |
Classes
Class | Description |
---|---|
ExtensionDefinition | A definition of an extension (non-standard) slot. |
Mapping | Represents an individual mapping between a pair of entities. |
MappingRegistry | A registry for managing mapping sets. It holds a set of mapping set references, and can import other registries. |
MappingSet | Represents a set of mappings. |
MappingSetReference | A reference to a mapping set. It allows mapping set metadata to be augmented from the perspective of the registry, for example by providing a confidence, a local filename or a grouping. |
NoTermFound | sssom:NoTermFound can be used in place of a subject_id or object_id when the corresponding entity could not be found. It SHOULD be used in conjunction with a corresponding subject_source or object_source to signify where the term was not found. |
Prefix | |
Propagatable | Metamodel extension class to describe slots whose value can be propagated down from the MappingSet class to the Mapping class. |
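To make the idea behind Propagatable more concrete, the sketch below copies set-level values down to the individual mappings. It is only an illustration under stated assumptions: the `PROPAGATABLE` tuple is a hand-picked subset of slots that appear in both the Mapping and MappingSet tables, not the authoritative list, and the policy of skipping a slot as soon as any mapping carries its own value is an assumption of this example. The tool name in the sample data is made up.

```python
# Illustrative subset only; the schema marks the real slots via `propagated`.
PROPAGATABLE = ("mapping_tool", "mapping_tool_version", "subject_source", "object_source")


def propagate(set_metadata, mappings):
    """Return copies of the mappings with set-level values copied down."""
    out = [dict(m) for m in mappings]  # do not mutate the caller's data
    for slot in PROPAGATABLE:
        if slot not in set_metadata:
            continue
        if any(m.get(slot) for m in mappings):
            # Assumed policy: a mapping-level value blocks propagation of this slot.
            continue
        for m in out:
            m[slot] = set_metadata[slot]
    return out


set_metadata = {
    "mapping_tool": "hypothetical-matcher",  # made-up tool name
    "subject_source": "https://kewl-foodie.inc/food/",
}
mappings = [
    {"subject_id": "KF_FOOD:F001"},
    {"subject_id": "KF_FOOD:F002", "subject_source": "https://example.org/other"},
]

for m in propagate(set_metadata, mappings):
    print(m)
```

Here mapping_tool reaches both mappings, while subject_source is left untouched because the second mapping already states its own value.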
Enumerations
Enumeration | Description |
---|---|
EntityTypeEnum | |
MappingCardinalityEnum | |
PredicateModifierEnum | |
Types
Type | Description |
---|---|
Boolean | A binary (true or false) value |
Curie | a compact URI |
Date | a date (year, month and day) in an idealized calendar |
DateOrDatetime | Either a date or a datetime |
Datetime | The combination of a date and time |
Decimal | A real number with arbitrary precision that conforms to the xsd:decimal speci... |
Double | A real number that conforms to the xsd:double specification |
EntityReference | A reference to an entity involved in the mapping |
Float | A real number that conforms to the xsd:float specification |
Integer | An integer |
Jsonpath | A string encoding a JSON Path |
Jsonpointer | A string encoding a JSON Pointer |
Ncname | Prefix part of CURIE |
Nodeidentifier | A URI, CURIE or BNODE that represents a node in a model |
Objectidentifier | A URI or CURIE that represents an object in the model |
Sparqlpath | A string encoding a SPARQL Property Path |
String | A character string |
Time | A time object represents a (local) time of day, independent of any particular... |
Uri | a complete URI |
Uriorcurie | a URI or a CURIE |
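The Curie and Uri types are connected through the curie_map slot, which maps prefixes to their URI expansions. As a small sketch of that relationship, the function below expands a CURIE into a full URI using the prefixes from the example header in the introduction. The function name `expand_curie` and the choice to raise `KeyError` for unknown prefixes are assumptions of this example.

```python
def expand_curie(curie, curie_map):
    """Expand 'PREFIX:local_id' into a full URI using curie_map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise KeyError(f"no expansion known for {curie!r}")
    return curie_map[prefix] + local_id


# Prefixes taken from the example header in the introduction.
curie_map = {
    "FOODON": "http://purl.obolibrary.org/obo/FOODON_",
    "KF_FOOD": "https://kewl-foodie.inc/food/",
    "orcid": "https://orcid.org/",
}

print(expand_curie("FOODON:00002473", curie_map))
# http://purl.obolibrary.org/obo/FOODON_00002473
print(expand_curie("orcid:0000-0002-7356-1779", curie_map))
# https://orcid.org/0000-0002-7356-1779
```

Prefixes that are missing from the map passed in, such as skos in this example, raise an error here instead of being guessed.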