sssom package

Submodules

sssom.cli module

Command line interface for SSSOM.

Why does this file exist, and why not put this in __main__? You might be tempted to import things from __main__ later, but that will cause problems, because the code will get executed twice:

  • When you run python3 -m sssom, Python will execute __main__.py as a script. That means there won’t be any sssom.__main__ in sys.modules.

  • When you import __main__, it will get executed again (as a module) because there’s no sssom.__main__ in sys.modules.

sssom.cli.dynamically_generate_sssom_options(options)[source]

Dynamically generate click options.

Parameters:

options – List of all possible options.

Return type:

Callable[[Any], Any]

Returns:

Click options deduced from user input into parameters.

sssom.cliques module

Utilities for identifying and working with cliques/SCCs in mappings graphs.

sssom.cliques.get_src(src, curie)[source]

Get prefix of subject/object in the MappingSetDataFrame.

Parameters:
  • src (Optional[str]) – Source

  • curie (str) – CURIE

Returns:

Source

sssom.cliques.group_values(d)[source]

Group all keys in the dictionary that share the same value.

Return type:

Dict[str, List[str]]
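The grouping described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation; the name group_values_sketch and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_values_sketch(d: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a dictionary: map each value to the list of keys that share it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in d.items():
        grouped[value].append(key)
    return dict(grouped)


# Hypothetical input: CURIE -> clique identifier
node_to_clique = {"a:1": "c1", "b:1": "c1", "a:2": "c2"}
groups = group_values_sketch(node_to_clique)
```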

sssom.cliques.split_into_cliques(msdf)[source]

Split a MappingSetDataFrame into documents, each corresponding to a strongly connected component of the associated graph.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame object

Raises:
  • TypeError – If Mappings is not of type List

  • TypeError – If each mapping is not of type Mapping

Return type:

List[MappingSetDocument]

Returns:

List of MappingSetDocument objects
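To make the notion of a strongly connected component concrete, here is a standard-library sketch of Kosaraju's two-pass algorithm over a list of directed edges. It is an illustration only: how mappings are turned into directed edges is not specified here, so the edge list below is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def strongly_connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Return the SCCs of a directed graph using Kosaraju's two-pass algorithm."""
    graph: Dict[str, List[str]] = defaultdict(list)
    reverse: Dict[str, List[str]] = defaultdict(list)
    nodes: Set[str] = set()
    for source, target in edges:
        graph[source].append(target)
        reverse[target].append(source)
        nodes.update((source, target))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited: Set[str] = set()
    order: List[str] = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order.
    assigned: Set[str] = set()
    components: List[Set[str]] = []
    for start in reversed(order):
        if start in assigned:
            continue
        component = {start}
        assigned.add(start)
        todo = [start]
        while todo:
            node = todo.pop()
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    component.add(nxt)
                    todo.append(nxt)
        components.append(component)
    return components


# Assumed edges: x:1 and y:1 point at each other, y:1 points one way to z:1.
components = strongly_connected_components(
    [("x:1", "y:1"), ("y:1", "x:1"), ("y:1", "z:1")]
)
```

In this example x:1 and y:1 end up in the same component because each is reachable from the other, while z:1 forms a component of its own.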

sssom.cliques.summarize_cliques(doc)[source]

Summarize stats on a clique doc.

sssom.cliques.to_digraph(msdf)[source]

Convert to a graph where the nodes are entities’ CURIEs and edges are their mappings.

Return type:

DiGraph

sssom.cliquesummary module

sssom.constants module

Constants.

sssom.constants.MetadataType

The type for metadata that gets passed around in many places

alias of Dict[str, Any]

class sssom.constants.SEMAPV(value)[source]

Bases: Enum

SEMAPV Enum containing different mapping_justification.

See also: https://mapping-commons.github.io/semantic-mapping-vocabulary/#matchingprocess

CompositeMatching = 'semapv:CompositeMatching'
CrossSpeciesBroadMatch = 'semapv:crossSpeciesBroadMatch'
CrossSpeciesExactMatch = 'semapv:crossSpeciesExactMatch'
CrossSpeciesNarrowMatch = 'semapv:crossSpeciesNarrowMatch'
LexicalMatching = 'semapv:LexicalMatching'
LexicalSimilarityThresholdMatching = 'semapv:LexicalSimilarityThresholdMatching'
LogicalReasoning = 'semapv:LogicalReasoning'
ManualMappingCuration = 'semapv:ManualMappingCuration'
MappingChaining = 'semapv:MappingChaining'
MappingInversion = 'semapv:MappingInversion'
MappingReview = 'semapv:MappingReview'
SemanticSimilarityThresholdMatching = 'semapv:SemanticSimilarityThresholdMatching'
UnspecifiedMatching = 'semapv:UnspecifiedMatching'
class sssom.constants.SSSOMSchemaView[source]

Bases: object

SchemaView class from linkml which is instantiated when necessary.

Reason for this: https://github.com/mapping-commons/sssom-py/issues/322

Implemented via PR: https://github.com/mapping-commons/sssom-py/pull/323

static __new__(cls)[source]

Create an instance of the SSSOM schema view if non-existent.

property dict: dict

Return SchemaView as a dictionary.

property double_slots: Set[str]

Return the slot names for SSSOMSchemaView object.

property entity_reference_slots: Set[str]

Return set of entity reference slots.

instance = <sssom.constants.SSSOMSchemaView object>
property mapping_enum_keys: Set[str]

Return a set of mapping enum keys.

property mapping_set_slots: List[str]

Return list of mapping set slots.

property mapping_slots: List[str]

Return list of mapping slots.

property multivalued_slots: Set[str]

Return set of multivalued slots.

property slots: Dict[str, str]

Return the slots for SSSOMSchemaView object.

property view: SchemaView

Return SchemaView object.

class sssom.constants.SchemaValidationType(value)[source]

Bases: str, Enum

Schema validation types.

JsonSchema = 'JsonSchema'
PrefixMapCompleteness = 'PrefixMapCompleteness'
Shacl = 'Shacl'
Sparql = 'Sparql'
StrictCurieFormat = 'StrictCurieFormat'
__format__(format_spec)

Returns format using actual value type unless __str__ has been overridden.

sssom.constants.generate_mapping_set_id()[source]

Generate a mapping set ID.

Return type:

str

sssom.constants.get_default_metadata()[source]

Get default metadata.

Return type:

Dict[str, Any]

Returns:

A metadata dictionary containing a default license with value DEFAULT_LICENSE and an auto-generated mapping set ID

If you want to combine some metadata you loaded but ensure that there is also default metadata, the best tool is collections.ChainMap. You can do:

my_metadata: dict | None = ...

from collections import ChainMap
from sssom import get_default_metadata

metadata = dict(ChainMap(
    my_metadata or {},
    get_default_metadata()
))
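To see why ChainMap gives loaded metadata priority over the defaults, the following sketch uses plain dictionaries as hypothetical stand-ins for the return value of get_default_metadata(). The keys and URLs are assumptions for illustration only.

```python
from collections import ChainMap

# Hypothetical stand-in; the real defaults come from get_default_metadata().
default_metadata = {
    "license": "https://example.org/default-license",
    "mapping_set_id": "https://example.org/auto-generated-id",
}
my_metadata = {"license": "https://example.org/my-license"}

# Earlier mappings take priority, so loaded metadata overrides the defaults.
metadata = dict(ChainMap(my_metadata or {}, default_metadata))

# With no loaded metadata, only the defaults remain.
fallback = dict(ChainMap(None or {}, default_metadata))
```

Keys missing from the loaded metadata fall through to the defaults, which is why the order of the arguments matters.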

sssom.context module

Utilities for loading JSON-LD contexts.

sssom.context.ConverterHint

A type hint that specifies a place where one of three options can be given:

  1. a legacy prefix mapping dictionary, which will get upgraded into a curies.Converter,

  2. a converter, which might get modified. In SSSOM-py, this typically means chaining behind the “default” prefix map,

  3. None, which means a default converter is loaded.

alias of None | Mapping[str, str] | Converter

sssom.context.ensure_converter(prefix_map=None, *, use_defaults=True)[source]

Ensure a converter is available.

Parameters:
  • prefix_map (Union[None, Mapping[str, str], Converter]) –

    One of the following:

    1. An empty dictionary or None. This results in using the default extended prefix map (currently based on a variant of the Bioregistry) if use_defaults is set to true, otherwise just the builtin prefix map including the prefixes in SSSOM_BUILT_IN_PREFIXES

    2. A non-empty dictionary representing a prefix map. This is loaded as a converter with Converter.from_prefix_map(). It is chained behind the builtin prefix map to ensure none of the SSSOM_BUILT_IN_PREFIXES are overwritten with non-default values

    3. A pre-instantiated curies.Converter. Similarly to a prefix map passed into this function, this is chained behind the builtin prefix map

  • use_defaults (bool) – If an empty dictionary or None is passed to this function, this parameter chooses if the extended prefix map (currently based on a variant of the Bioregistry) gets loaded.

Return type:

Converter

Returns:

A re-usable converter

sssom.context.get_converter()[source]

Get a converter.

Return type:

Converter

sssom.io module

I/O utilities for SSSOM.

sssom.io.annotate_file(input, output=None, replace_multivalued=False, **kwargs)[source]

Annotate a file i.e. add custom metadata to the mapping set.

Parameters:
  • input (str) – SSSOM tsv file to be queried over.

  • output (Optional[TextIO]) – Output location.

  • replace_multivalued (bool) – Multivalued slots should be replaced or not, defaults to False

  • kwargs – Options provided by user which are added to the metadata (e.g.: --mapping_set_id http://example.org/abcd)

Return type:

MappingSetDataFrame

Returns:

Annotated MappingSetDataFrame object.

sssom.io.convert_file(input_path, output, output_format=None)[source]

Convert a file from one format to another.

Parameters:
  • input_path (str) – The path to the input SSSOM tsv file

  • output (TextIO) – The path to the output file. If none is given, will default to using stdout.

  • output_format (Optional[str]) – The format to which the SSSOM TSV should be converted.

Return type:

None

sssom.io.extract_iris(input, converter)[source]

Recursively extract a list of IRIs from a string or file.

Parameters:
  • input (Union[str, Path, Iterable[Union[str, Path]]]) – CURIE OR list of CURIEs OR file path containing the same.

  • converter (Converter) – Prefix map of mapping set (possibly) containing custom prefix:IRI combination.

Return type:

List[str]

Returns:

A list of IRIs.

sssom.io.filter_file(input, output=None, **kwargs)[source]

Filter a dataframe by dynamically generating queries based on user input.

e.g. sssom filter --subject_id x:% --subject_id y:% --object_id y:% --object_id z:% tests/data/basic.tsv

yields the query:

"SELECT * FROM df WHERE (subject_id LIKE 'x:%' OR subject_id LIKE 'y:%') AND (object_id LIKE 'y:%' OR object_id LIKE 'z:%')"

and displays the output.

Parameters:
  • input (str) – DataFrame to be queried over.

  • output (Optional[TextIO]) – Output location.

  • kwargs – Filter options provided by user which generate queries (e.g.: --subject_id x:%).

Raises:

ValueError – If parameter provided is invalid.

Return type:

MappingSetDataFrame

Returns:

Filtered MappingSetDataFrame object.
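The query generation described above can be sketched as follows: patterns for the same column are joined with OR, and the per-column groups are joined with AND. This is a hedged illustration, not the library's code; build_filter_query is a hypothetical name, and the string interpolation performs no validation or escaping.

```python
from typing import Dict, Sequence


def build_filter_query(filters: Dict[str, Sequence[str]], table: str = "df") -> str:
    """Build a SELECT statement: OR within one column, AND across columns."""
    clauses = []
    for column, patterns in filters.items():
        alternatives = " OR ".join(f"{column} LIKE '{pattern}'" for pattern in patterns)
        clauses.append(f"({alternatives})")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)


# Mirrors the command-line example: two subject_id and two object_id patterns.
query = build_filter_query(
    {"subject_id": ("x:%", "y:%"), "object_id": ("y:%", "z:%")}
)
```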

sssom.io.get_metadata_and_prefix_map(metadata_path=None, *, prefix_map_mode=None)[source]

Load metadata and a prefix map in a deprecated way.

Return type:

Tuple[Converter, Dict[str, Any]]

Deprecated since version 0.4.3: This functionality for loading SSSOM metadata from a YAML file is deprecated from the public API since it has internal assumptions which are usually not valid for downstream users.

sssom.io.parse_file(input_path, output, *, input_format=None, metadata_path=None, prefix_map_mode=None, clean_prefixes=True, strict_clean_prefixes=True, embedded_mode=True, mapping_predicate_filter=None)[source]

Parse an SSSOM metadata file and write to a table.

Parameters:
  • input_path (str) – The path to the input file in one of the legal formats, e.g. obographs, alignment-api-xml

  • output (TextIO) – The path to the output file.

  • input_format (Optional[str]) – The string denoting the input format.

  • metadata_path (Optional[str]) – The path to a file containing the sssom metadata (including prefix_map) to be used during parse.

  • prefix_map_mode (Optional[Literal['metadata_only', 'sssom_default_only', 'merged']]) – Defines whether the prefix map in the metadata should be extended or replaced with the SSSOM default prefix map derived from the bioregistry.

  • clean_prefixes (bool) – If True (default), records with unknown prefixes are removed from the SSSOM file.

  • strict_clean_prefixes (bool) – If True (default), clean_prefixes() will be in strict mode.

  • embedded_mode (bool) – If True (default), the dataframe and metadata are exported in one file (tsv), else two separate files (tsv and yaml).

  • mapping_predicate_filter (Optional[tuple]) – Optional list of mapping predicates or filepath containing the same.

Return type:

None

sssom.io.run_sql_query(query, inputs, output=None)[source]

Run a SQL query over one or more SSSOM files.

Each of the N inputs is assigned a table name df1, df2, …, dfN

Alternatively, the filenames can be used as table names. These are first stemmed, e.g. ~/dir/my.sssom.tsv becomes a table called 'my'.

Example:

sssom dosql -Q "SELECT * FROM df1 WHERE confidence>0.5 ORDER BY confidence" my.sssom.tsv

Example:

sssom dosql -Q "SELECT file1.*,file2.object_id AS ext_object_id, file2.object_label AS ext_object_label FROM file1 INNER JOIN file2 WHERE file1.object_id = file2.subject_id" file1.sssom.tsv file2.sssom.tsv

Parameters:
  • query (str) – Query to be executed over a pandas DataFrame (msdf.df).

  • inputs (List[str]) – Input files that form the source tables for query.

  • output (Optional[TextIO]) – Output.

Return type:

MappingSetDataFrame

Returns:

Filtered MappingSetDataFrame object.
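The two table-naming schemes described above (positional names df1 … dfN and stemmed filenames) can be sketched like this. The helper table_names is hypothetical, and the stemming rule (everything before the first dot of the file name) is an assumption that matches the my.sssom.tsv example.

```python
from pathlib import Path
from typing import Dict, List


def table_names(inputs: List[str]) -> Dict[str, List[str]]:
    """Map each input file to its positional and stem-based table names."""
    names: Dict[str, List[str]] = {}
    for index, path in enumerate(inputs, start=1):
        # Assumed stemming: "my.sssom.tsv" -> "my"
        stem = Path(path).name.split(".")[0]
        names[path] = [f"df{index}", stem]
    return names


names = table_names(["~/dir/my.sssom.tsv", "other.sssom.tsv"])
```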

sssom.io.split_file(input_path, output_directory)[source]

Split an SSSOM TSV by prefixes and relations.

Parameters:
  • input_path (str) – The path to the input file in one of the legal formats, e.g. obographs, alignment-api-xml

  • output_directory (Union[str, Path]) – The directory to which the split file should be exported.

Return type:

None

sssom.io.validate_file(input_path, validation_types)[source]

Validate the incoming SSSOM TSV according to the SSSOM specification.

Parameters:
  • input_path (str) – The path to the input file in one of the legal formats, e.g. obographs, alignment-api-xml

  • validation_types (List[SchemaValidationType]) – A list of validation types to run.

Return type:

None

sssom.parsers module

SSSOM parsers.

sssom.parsers.from_alignment_minidom(dom, prefix_map=None, meta=None, mapping_predicates=None)[source]

Read a minidom Document object.

Parameters:
  • dom (Document) – XML (minidom) object

  • prefix_map (Union[None, Mapping[str, str], Converter]) – A prefix map

  • meta (Optional[Dict[str, Any]]) – Optional meta data

  • mapping_predicates (Optional[List[str]]) – Optional list of mapping predicates to extract

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame

Raises:

ValueError – If the alignment format’s xml element is present but not set to yes. Only XML is supported.

sssom.parsers.from_obographs(jsondoc, prefix_map=None, meta=None, mapping_predicates=None)[source]

Convert an obographs JSON object to an SSSOM data frame.

Parameters:
  • jsondoc (Dict) – The JSON object representing the ontology in obographs format

  • prefix_map (Union[None, Mapping[str, str], Converter]) – The prefix map to be used

  • meta (Optional[Dict[str, Any]]) – Any additional metadata that needs to be added to the resulting SSSOM data frame, defaults to None

  • mapping_predicates (Optional[List[str]]) – Optional list of mapping predicates to extract

Raises:

Exception – When there is no CURIE

Return type:

MappingSetDataFrame

Returns:

An SSSOM data frame (MappingSetDataFrame)

sssom.parsers.from_sssom_dataframe(df, prefix_map=None, meta=None)[source]

Convert a dataframe to a MappingSetDataFrame.

Parameters:
  • df (DataFrame) – A mappings dataframe

  • prefix_map (Union[None, Mapping[str, str], Converter]) – A prefix map

  • meta (Optional[Dict[str, Any]]) – A metadata dictionary

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame

sssom.parsers.from_sssom_json(jsondoc, prefix_map=None, meta=None)[source]

Load a mapping set dataframe from a JSON object.

Parameters:
  • jsondoc (Union[str, dict, TextIO]) – JSON document

  • prefix_map (Union[None, Mapping[str, str], Converter]) – Prefix map

  • meta (Optional[Dict[str, Any]]) – metadata used to augment the metadata existing in the mapping set

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame object

sssom.parsers.from_sssom_rdf(g, prefix_map=None, meta=None)[source]

Convert an SSSOM RDF graph into an SSSOM data table.

Parameters:
  • g (Graph) – the Graph (rdflib)

  • prefix_map (Union[None, Mapping[str, str], Converter]) – A dictionary containing the prefix map, defaults to None

  • meta (Optional[Dict[str, Any]]) – Potentially additional metadata, defaults to None

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame object

sssom.parsers.get_parsing_function(input_format, filename)[source]

Return appropriate parser function based on input format of file.

Parameters:
  • input_format (Optional[str]) – File format

  • filename (str) – Filename

Raises:

Exception – Unknown file format

Return type:

Callable

Returns:

Appropriate ‘read’ function

sssom.parsers.parse_alignment_xml(file_path, prefix_map=None, meta=None, mapping_predicates=None)[source]

Parse an alignment API XML file into a MappingSetDataFrame.

Return type:

MappingSetDataFrame

sssom.parsers.parse_obographs_json(file_path, prefix_map=None, meta=None, mapping_predicates=None)[source]

Parse an obographs file as a JSON object and translate it into a MappingSetDataFrame.

Parameters:
  • file_path (str) – The path to the obographs file

  • prefix_map (Union[None, Mapping[str, str], Converter]) – an optional prefix map

  • meta (Optional[Dict[str, Any]]) – an optional dictionary of metadata elements

  • mapping_predicates (Optional[List[str]]) – an optional list of mapping predicates that should be extracted

Return type:

MappingSetDataFrame

Returns:

An SSSOM MappingSetDataFrame

sssom.parsers.parse_sssom_json(file_path, prefix_map=None, meta=None, **kwargs)[source]

Parse an SSSOM JSON file into a MappingSetDataFrame.

Return type:

MappingSetDataFrame

sssom.parsers.parse_sssom_rdf(file_path, prefix_map=None, meta=None, serialisation='turtle', **kwargs)[source]

Parse an SSSOM RDF file into a MappingSetDataFrame.

Return type:

MappingSetDataFrame

sssom.parsers.parse_sssom_table(file_path, prefix_map=None, meta=None, **kwargs)[source]

Parse a TSV to a MappingSetDocument to a MappingSetDataFrame.

Return type:

MappingSetDataFrame

sssom.parsers.split_dataframe(msdf)[source]

Group the mapping set dataframe into several subdataframes by prefix.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame object

Raises:

RuntimeError – DataFrame object within MappingSetDataFrame is None

Return type:

Dict[str, MappingSetDataFrame]

Returns:

A dictionary of MappingSetDataFrame objects keyed by name

sssom.parsers.split_dataframe_by_prefix(msdf, subject_prefixes, object_prefixes, relations)[source]

Split a mapping set dataframe by prefix.

Parameters:
  • msdf (MappingSetDataFrame) – An SSSOM MappingSetDataFrame

  • subject_prefixes (Iterable[str]) – a list of prefixes pertaining to the subject

  • object_prefixes (Iterable[str]) – a list of prefixes pertaining to the object

  • relations (Iterable[str]) – a list of relations of interest

Return type:

Dict[str, MappingSetDataFrame]

Returns:

a dict of SSSOM data frame names to MappingSetDataFrame

sssom.parsers.to_mapping_set_document(msdf)[source]

Convert a MappingSetDataFrame to a MappingSetDocument.

Return type:

MappingSetDocument

sssom.rdf_util module

Rewriting functionality for RDFlib graphs.

sssom.rdf_util.rewire_graph(g, mset, subject_to_object=True, precedence=None)[source]

Rewire an RDF graph, replacing nodes using equivalence mappings.

Return type:

int

sssom.sparql_util module

Utilities for querying mappings with SPARQL.

class sssom.sparql_util.EndpointConfig(url, graph, converter, predmap, predicates, limit, include_object_labels=False)[source]

Bases: object

A container for a SPARQL endpoint’s configuration.

converter: Converter
graph: URIRef
include_object_labels: bool = False
limit: Optional[int]
predicates: Optional[List[str]]
predmap: Dict[str, str]
url: str
sssom.sparql_util.query_mappings(config)[source]

Query a SPARQL endpoint to obtain a set of mappings.

Return type:

MappingSetDataFrame

sssom.sssom_document module

Additional SSSOM object models.

class sssom.sssom_document.MappingSetDocument(mapping_set, converter)[source]

Bases: object

Represents a single SSSOM document.

A document is simply a holder for a MappingSet object plus a CURIE map

converter: Converter
mapping_set: MappingSet

The main part of the document: a set of mappings plus metadata

property prefix_map: Dict[str, str]

Get a prefix map.

sssom.util module

Utility functions.

class sssom.util.EntityPair(subject_entity, object_entity)[source]

Bases: object

A tuple of entities.

Note that (e1,e2) == (e2,e1)

object_entity: Uriorcurie
subject_entity: Uriorcurie
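The order-insensitive equality noted above, (e1,e2) == (e2,e1), can be sketched with a small dataclass that compares and hashes on an unordered key. UnorderedPair is a hypothetical stand-in written for illustration; it is not the library's EntityPair.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class UnorderedPair:
    """Hypothetical pair whose equality and hash ignore the order of its members."""

    subject_entity: str
    object_entity: str

    def _key(self) -> frozenset:
        # A frozenset discards ordering, so (a, b) and (b, a) share one key.
        return frozenset((self.subject_entity, self.object_entity))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, UnorderedPair):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        return hash(self._key())


# Both orderings collapse to a single element when stored in a set.
pairs = {UnorderedPair("a:1", "b:1"), UnorderedPair("b:1", "a:1")}
```

Defining __hash__ consistently with __eq__ is what allows such pairs to be used in sets and as dictionary keys.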
sssom.util.KEY_FEATURES = ['subject_id', 'predicate_id', 'object_id', 'predicate_modifier']

The 4 columns whose combination would be used as primary keys while merging/grouping

class sssom.util.MappingSetDataFrame(df, converter=<factory>, metadata=<factory>)[source]

Bases: object

A collection of mappings represented as a DataFrame, together with additional metadata.

clean_context()[source]

Clean up the context.

Return type:

None

clean_prefix_map(strict=True)[source]

Remove unused prefixes from the internal prefix map based on the internal dataframe.

Parameters:

strict (bool) – If True, raise an error when any prefix in the dataframe is not listed in the ‘curie_map’.

Raises:

ValueError – If prefixes are absent from the ‘curie_map’ and strict is True

Return type:

None

converter: Converter
df: DataFrame
classmethod from_mapping_set(mapping_set, *, converter=None)[source]

Instantiate from a mapping set and an optional converter.

Parameters:
  • mapping_set (MappingSet) – A mapping set

  • converter (Union[None, Mapping[str, str], Converter]) – A prefix map or pre-instantiated converter. If none given, uses a default prefix map derived from the Bioregistry.

Return type:

MappingSetDataFrame

Returns:

A mapping set dataframe

classmethod from_mapping_set_document(doc)[source]

Instantiate from a mapping set document.

Return type:

MappingSetDataFrame

classmethod from_mappings(mappings, *, converter=None, metadata=None)[source]

Instantiate from a list of mappings, mapping set metadata, and an optional converter.

Return type:

MappingSetDataFrame

merge(*msdfs, inplace=True)[source]

Merge one or more MappingSetDataFrames into this one.

Parameters:
  • msdfs (MappingSetDataFrame) – Multiple/Single MappingSetDataFrame(s) to merge with self

  • inplace (bool) – If True, the given MappingSetDataFrames are merged into the calling MappingSetDataFrame; if False, the merged data frame is simply returned.

Return type:

MappingSetDataFrame

Returns:

Merged MappingSetDataFrame

metadata: Dict[str, Any]
property prefix_map

Get a simple, bijective prefix map.

remove_mappings(msdf)[source]

Remove mappings in right msdf from left msdf.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataframe object to be removed from primary msdf object.

Return type:

None

standardize_references()[source]

Standardize this MSDF’s dataframe and metadata with respect to its converter.

Return type:

None

to_mapping_set()[source]

Get a mapping set.

Return type:

MappingSet

to_mapping_set_document()[source]

Get a mapping set document.

Return type:

MappingSetDocument

to_mappings()[source]

Get a list of mappings.

Return type:

List[Mapping]

classmethod with_converter(converter, df, metadata=None)[source]

Instantiate with a converter instead of a vanilla prefix map.

Return type:

MappingSetDataFrame

class sssom.util.MappingSetDiff(unique_tuples1=None, unique_tuples2=None, common_tuples=None, combined_dataframe=None)[source]

Bases: object

Represents a difference between two mapping sets.

Currently this is limited to diffs at the level of entity-pairs. For example, if file1 has A owl:equivalentClass B, and file2 has A skos:closeMatch B, this is considered a mapping in common.

combined_dataframe: Optional[DataFrame] = None

Dataframe that combines the left and right dataframes, with information injected into the comment column

common_tuples: Optional[Set[EntityPair]] = None
unique_tuples1: Optional[Set[EntityPair]] = None
unique_tuples2: Optional[Set[EntityPair]] = None
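The entity-pair level of comparison described above can be illustrated with plain sets. This sketch assumes that pairs are unordered and that predicates are dropped before comparing; the helper entity_pairs and the sample triples are hypothetical and do not reflect the library's internals.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject_id, predicate_id, object_id)


def entity_pairs(triples: Iterable[Triple]) -> Set[FrozenSet[str]]:
    """Drop the predicate and keep unordered subject/object pairs."""
    return {frozenset((subject, obj)) for subject, _predicate, obj in triples}


file1 = [("A", "owl:equivalentClass", "B"), ("A", "skos:exactMatch", "C")]
file2 = [("A", "skos:closeMatch", "B"), ("D", "skos:exactMatch", "E")]

pairs1, pairs2 = entity_pairs(file1), entity_pairs(file2)
common_tuples = pairs1 & pairs2   # A-B counts as shared despite differing predicates
unique_tuples1 = pairs1 - pairs2
unique_tuples2 = pairs2 - pairs1
```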
sssom.util.add_default_confidence(df, confidence=nan)[source]

Add a confidence column to the DataFrame if absent and initialize it to the given confidence value (NaN by default).

If the confidence column already exists, only fill in the None values with the given confidence.

Parameters:

df (DataFrame) – DataFrame whose confidence column needs to be filled.

Return type:

DataFrame

Returns:

DataFrame with a complete confidence column.

sssom.util.are_params_slots(params)[source]

Check if parameters conform to the slots in MAPPING_SET_SLOTS.

Parameters:

params (dict) – Dictionary of parameters.

Raises:

ValueError – If params are not slots.

Return type:

bool

Returns:

True/False

sssom.util.assign_default_confidence(df)[source]

Assign numpy.nan to confidence values that are blank.

Parameters:

df (DataFrame) – SSSOM DataFrame

Return type:

Tuple[DataFrame, DataFrame]

Returns:

A Tuple consisting of the original DataFrame and dataframe consisting of empty confidence values.

sssom.util.augment_metadata(msdf, meta, replace_multivalued=False)[source]

Augment metadata with parameters passed.

Parameters:
  • msdf (MappingSetDataFrame) – MappingSetDataFrame (MSDF) object.

  • meta (dict) – Dictionary that needs to be added/updated to the metadata of the MSDF.

  • replace_multivalued (bool) – Multivalued slots should be replaced or not, defaults to False.

Raises:

ValueError – If type of slot is neither str nor list.

Return type:

MappingSetDataFrame

Returns:

MSDF with updated metadata.

sssom.util.collapse(df)[source]

Collapse rows with the same S/P/O and combine their confidence.

Return type:

DataFrame

sssom.util.compare_dataframes(df1, df2)[source]

Perform a diff between two SSSOM dataframes.

Parameters:
  • df1 (DataFrame) – A mapping dataframe

  • df2 (DataFrame) – A mapping dataframe

Return type:

MappingSetDiff

Returns:

A mapping set diff

Warning

currently does not discriminate between mappings with different predicates

sssom.util.create_entity(identifier, mappings)[source]

Create an Entity object.

Parameters:
  • identifier (str) – Entity Id

  • mappings (Dict[str, Any]) – Mapping dictionary

Return type:

Uriorcurie

Returns:

An Entity object

sssom.util.dataframe_to_ptable(df, *, inverse_factor=None, default_confidence=None)[source]

Export a KBOOM table.

Parameters:
  • df (DataFrame) – Pandas DataFrame

  • inverse_factor (Optional[float]) – Multiplier to (1 - confidence), defaults to 0.5

  • default_confidence (Optional[float]) – Default confidence to be assigned if absent.

Raises:
  • ValueError – Predicate value error

  • ValueError – Predicate type value error

Returns:

List of rows

sssom.util.deal_with_negation(df)[source]

Combine negative and positive rows with matching [SUBJECT_ID, OBJECT_ID, CONFIDENCE] combination.

Rule: negative trumps positive if the moduli of the confidence values are equal.

Parameters:

df (DataFrame) – Merged Pandas DataFrame

Return type:

DataFrame

Returns:

Pandas DataFrame with negations addressed

Raises:

ValueError – If the dataframe is none after assigning default confidence

sssom.util.filter_out_prefixes(df, filter_prefixes, features=None, require_all_prefixes=False)[source]

Filter out rows which contain a CURIE with a prefix in the filter_prefixes list.

Parameters:
  • df (DataFrame) – Pandas DataFrame of SSSOM Mapping

  • filter_prefixes (List[str]) – List of prefixes

  • features (Optional[list]) – List of dataframe column names to consider

  • require_all_prefixes (bool) – If True, all prefixes must be present in a row to be filtered out

Return type:

DataFrame

Returns:

Pandas Dataframe

sssom.util.filter_prefixes(df, filter_prefixes, features=None, require_all_prefixes=True)[source]

Filter out rows which do NOT contain a CURIE with a prefix in the filter_prefixes list.

Parameters:
  • df (DataFrame) – Pandas DataFrame of SSSOM Mapping

  • filter_prefixes (List[str]) – List of prefixes

  • features (Optional[list]) – List of dataframe column names to consider

  • require_all_prefixes (bool) – If True, all prefixes must be present in a row to be filtered out

Return type:

DataFrame

Returns:

Pandas Dataframe

sssom.util.filter_redundant_rows(df, ignore_predicate=False)[source]

Remove rows if there is another row with same S/O and higher confidence.

Parameters:
  • df (DataFrame) – Pandas DataFrame to filter

  • ignore_predicate (bool) – If true, the predicate_id column is ignored, defaults to False

Return type:

DataFrame

Returns:

Filtered pandas DataFrame
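As a rough illustration of the rule above, the following sketch works on a list of dictionaries rather than a pandas DataFrame. The function name, the handling of ties (rows tied for the highest confidence are all kept) and the sample rows are assumptions, not the library's behaviour.

```python
from typing import Dict, List, Tuple


def drop_lower_confidence(rows: List[dict], ignore_predicate: bool = False) -> List[dict]:
    """Keep only the highest-confidence rows for each grouping key."""

    def key_of(row: dict) -> Tuple[str, ...]:
        if ignore_predicate:
            return (row["subject_id"], row["object_id"])
        return (row["subject_id"], row["predicate_id"], row["object_id"])

    # First pass: find the best confidence per key.
    best: Dict[Tuple[str, ...], float] = {}
    for row in rows:
        key = key_of(row)
        best[key] = max(best.get(key, float("-inf")), row["confidence"])

    # Second pass: keep rows that reach the best confidence for their key.
    return [row for row in rows if row["confidence"] >= best[key_of(row)]]


rows = [
    {"subject_id": "a:1", "predicate_id": "skos:exactMatch", "object_id": "b:1", "confidence": 0.9},
    {"subject_id": "a:1", "predicate_id": "skos:exactMatch", "object_id": "b:1", "confidence": 0.5},
    {"subject_id": "a:1", "predicate_id": "skos:closeMatch", "object_id": "b:1", "confidence": 0.7},
]
kept = drop_lower_confidence(rows)
kept_ignoring_predicate = drop_lower_confidence(rows, ignore_predicate=True)
```

With the predicate taken into account, the skos:closeMatch row survives because it has no competitor; when the predicate is ignored, only the 0.9 row remains.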

sssom.util.get_all_prefixes(msdf)[source]

Fetch all prefixes in the MappingSetDataFrame.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame

Raises:
  • ValidationError – If slot is wrong.

Return type:

Set[str]

Returns:

Set of all prefixes.

sssom.util.get_dict_from_mapping(map_obj)[source]

Get information for linkml objects (MatchTypeEnum, PredicateModifierEnum) from the Mapping object and return the dictionary form of the object.

Parameters:

map_obj (Union[Any, Dict[Any, Any], Mapping]) – Mapping object

Return type:

dict

Returns:

Dictionary

sssom.util.get_file_extension(file)[source]

Get file extension.

Parameters:

file (Union[str, Path, TextIO]) – File path

Return type:

str

Returns:

format of the file passed, default tsv

sssom.util.get_prefix_from_curie(curie)[source]

Get the prefix from a CURIE.

Return type:

str
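A minimal sketch of prefix extraction, assuming the prefix is everything before the first colon. The name prefix_of and the empty-string result for input without a colon are assumptions made for this illustration.

```python
def prefix_of(curie: str) -> str:
    """Return the text before the first colon, or an empty string if there is none."""
    prefix, separator, _local_id = curie.partition(":")
    return prefix if separator else ""


skos_prefix = prefix_of("skos:exactMatch")
no_prefix = prefix_of("no-colon-here")
```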

sssom.util.get_prefixes_used_in_metadata(meta)[source]

Get a set of prefixes used in CURIEs in the metadata.

Return type:

Set[str]

sssom.util.get_prefixes_used_in_table(df, converter)[source]

Get a set of prefixes used in CURIEs in key feature columns in a dataframe.

Return type:

Set[str]

sssom.util.get_row_based_on_hierarchy(df)[source]

Get row based on hierarchy of predicates.

The hierarchy is as follows:

  1. owl:equivalentClass

  2. owl:equivalentProperty

  3. rdfs:subClassOf

  4. rdfs:subPropertyOf

  5. owl:sameAs

  6. skos:exactMatch

  7. skos:closeMatch

  8. skos:broadMatch

  9. skos:narrowMatch

  10. oboInOwl:hasDbXref

  11. skos:relatedMatch

  12. rdfs:seeAlso

Parameters:

df (DataFrame) – Dataframe containing multiple predicates for same subject and object.

Return type:

DataFrame

Returns:

Dataframe with the single row that ranks highest in the hierarchy.

Raises:

KeyError – if no rows are available
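Selecting by predicate hierarchy can be sketched as picking the predicate that appears earliest in an ordered list. This assumes the hierarchy above is listed from highest to lowest rank; highest_ranked and PREDICATE_ORDER are hypothetical names, and the sketch operates on predicate strings rather than DataFrame rows.

```python
from typing import List

# Order copied from the hierarchy above; earlier entries are assumed to rank higher.
PREDICATE_ORDER = [
    "owl:equivalentClass",
    "owl:equivalentProperty",
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "owl:sameAs",
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
    "oboInOwl:hasDbXref",
    "skos:relatedMatch",
    "rdfs:seeAlso",
]


def highest_ranked(predicates: List[str]) -> str:
    """Return the predicate that appears earliest in PREDICATE_ORDER."""
    ranked = [p for p in predicates if p in PREDICATE_ORDER]
    if not ranked:
        raise KeyError("no predicate from the hierarchy is available")
    return min(ranked, key=PREDICATE_ORDER.index)


chosen = highest_ranked(["skos:closeMatch", "owl:sameAs", "rdfs:seeAlso"])
```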

sssom.util.group_mappings(df)[source]

Group mappings by EntityPairs.

Return type:

Dict[EntityPair, List[Series]]

sssom.util.inject_metadata_into_df(msdf)[source]

Inject metadata dictionary key-value pair into DataFrame columns in a MappingSetDataFrame.DataFrame.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame with metadata separate.

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame with metadata as columns

sssom.util.invert_mappings(df, subject_prefix=None, merge_inverted=True, update_justification=True, predicate_invert_dictionary=None)[source]

Switch subjects and objects based on their prefixes and adjust predicates accordingly.

Parameters:
  • df (DataFrame) – Pandas dataframe.

  • subject_prefix (Optional[str]) – Prefix of subjects desired.

  • merge_inverted (bool) – If True (default), add the inverted dataframe to the input; otherwise, just return the inverted data.

  • update_justification (bool) – If True (default), the justification is updated to “semapv:MappingInversion”; otherwise it is left as it is.

  • predicate_invert_dictionary (Optional[dict]) – YAML file providing the inverse mapping for predicates.

Return type:

DataFrame

Returns:

Pandas dataframe with all subject IDs having the same prefix.

sssom.util.is_multivalued_slot(slot)[source]

Check whether the slot is multivalued according to the SSSOM specification.

Parameters:

slot (str) – Slot name

Return type:

bool

Returns:

Whether the slot is multivalued

sssom.util.merge_msdf(*msdfs, reconcile=False)[source]

Merge multiple MappingSetDataFrames into one.

Parameters:
  • msdfs (MappingSetDataFrame) – A Tuple of MappingSetDataFrames to be merged

  • reconcile (bool) – If True, dedupe (remove redundant lower-confidence mappings) and reconcile: if the msdf contains a higher-confidence negative mapping, the lower-confidence positive one is removed. If the confidence is the same, HumanCurated is preferred. If both are HumanCurated, the negative mapping is preferred. Defaults to False.

Return type:

MappingSetDataFrame

Returns:

Merged MappingSetDataFrame.

sssom.util.pandas_set_no_silent_downcasting(no_silent_downcasting=True)[source]

Set pandas future.no_silent_downcasting option. Context: https://github.com/pandas-dev/pandas/issues/57734

sssom.util.parse(filename)[source]

Parse a TSV to a pandas frame.

Return type:

DataFrame

sssom.util.raise_for_bad_path(file_path)[source]

Raise exception if file path is invalid.

Parameters:

file_path (Union[str, Path]) – File path

Raises:

FileNotFoundError – Invalid file path

Return type:

None

sssom.util.reconcile_prefix_and_data(msdf, prefix_reconciliation)[source]

Reconcile the prefix_map and translate the affected CURIEs in the dataframe accordingly.

Parameters:
  • msdf (MappingSetDataFrame) – Mapping Set DataFrame.

  • prefix_reconciliation (dict) – Prefix reconciliation dictionary from a YAML file

Return type:

MappingSetDataFrame

Returns:

Mapping Set DataFrame with reconciled prefix_map and data.

This method is built on curies.remap_curie_prefixes() and curies.rewire(). Note that if you want to overwrite a CURIE prefix in the Bioregistry extended prefix map, you need to provide a place for the old one to go as in {"geo": "ncbi.geo", "geogeo": "geo"}. Just doing {"geogeo": "geo"} would not work since geo already exists.
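The reason the old prefix needs somewhere to go can be illustrated on bare CURIE strings. This sketch is not how the curies package implements remapping (it may reject a clashing remapping rather than apply it); remap_prefixes is a hypothetical helper that renames prefixes in a single pass, and it assumes every input contains a colon.

```python
from typing import Dict, List


def remap_prefixes(curies: List[str], remapping: Dict[str, str]) -> List[str]:
    """Rename CURIE prefixes in one pass, so renames cannot cascade."""
    result = []
    for curie in curies:
        prefix, _, local_id = curie.partition(":")
        result.append(f"{remapping.get(prefix, prefix)}:{local_id}")
    return result


curies = ["geo:GSE1", "geogeo:0001"]

# The old "geo" moves out of the way, so the two namespaces stay distinct.
remapped = remap_prefixes(curies, {"geo": "ncbi.geo", "geogeo": "geo"})

# Without a new home for "geo", both CURIEs end up under the same prefix.
collided = remap_prefixes(curies, {"geogeo": "geo"})
```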

sssom.util.remove_unmatched(df)[source]

Remove rows where no match is found.

TODO: https://github.com/OBOFoundry/SSSOM/issues/28

Parameters:

df (DataFrame) – Pandas DataFrame

Return type:

DataFrame

Returns:

Pandas DataFrame with ‘PREDICATE_ID’ not ‘noMatch’.
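
The filter can be sketched on plain rows instead of a DataFrame. The helper name and the value of PREDICATE_ID (taken here to be the column name "predicate_id") are assumptions made for illustration.

```python
from typing import Dict, List

PREDICATE_ID = "predicate_id"  # assumed column name behind the PREDICATE_ID constant
NO_MATCH = "noMatch"


def remove_unmatched_sketch(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose predicate is not the 'noMatch' placeholder."""
    return [row for row in rows if row.get(PREDICATE_ID) != NO_MATCH]


rows = [
    {"subject_id": "A:1", PREDICATE_ID: "skos:exactMatch", "object_id": "B:1"},
    {"subject_id": "A:2", PREDICATE_ID: "noMatch", "object_id": ""},
]
kept = remove_unmatched_sketch(rows)
```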

sssom.util.safe_compress(uri, converter)[source]

Parse a CURIE from an IRI.

Parameters:
  • uri (str) – The URI to parse. If this is already a CURIE, return directly.

  • converter (Converter) – Converter used for compression

Return type:

str

Returns:

A CURIE
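
The idea of compressing an IRI while passing an existing CURIE through unchanged can be sketched as follows. The real function takes a curies Converter; here a plain prefix-to-URI-prefix dictionary stands in for it, and compress_sketch is a hypothetical name.

```python
from typing import Dict


def compress_sketch(uri: str, prefix_map: Dict[str, str]) -> str:
    """Return a CURIE for uri, or uri itself if no URI prefix matches."""
    # Try longer URI prefixes first so that the most specific one wins.
    for prefix, uri_prefix in sorted(prefix_map.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(uri_prefix):
            return f"{prefix}:{uri[len(uri_prefix):]}"
    # Nothing matched: assume the input is already a CURIE and return it directly.
    return uri


prefix_map = {"CHEBI": "http://purl.obolibrary.org/obo/CHEBI_"}
from_iri = compress_sketch("http://purl.obolibrary.org/obo/CHEBI_1234", prefix_map)
from_curie = compress_sketch("CHEBI:1234", prefix_map)
```

Both calls yield the same CURIE, which makes the function safe to apply to columns that mix IRIs and CURIEs.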

sssom.util.sort_df_rows_columns(df, by_columns=True, by_rows=True)[source]

Canonical sorting of DataFrame columns.

Parameters:
  • df (DataFrame) – Pandas DataFrame with random column sequence.

  • by_columns (bool) – Boolean flag to sort columns canonically.

  • by_rows (bool) – Boolean flag to sort rows by column #1 (ascending order).

Return type:

DataFrame

Returns:

Pandas DataFrame columns sorted canonically.
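
The two flags can be illustrated on plain rows. This is only a sketch: CANONICAL_ORDER is a hypothetical, shortened column order (the real one is derived from the SSSOM schema), and placing unknown columns last in alphabetical order is an assumption.

```python
from typing import Dict, List

# Hypothetical canonical column order, for illustration only.
CANONICAL_ORDER = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def sort_rows_columns(
    rows: List[Dict[str, str]], by_columns: bool = True, by_rows: bool = True
) -> List[Dict[str, str]]:
    """Reorder columns canonically, then sort rows by the first column (ascending)."""
    if not rows:
        return []
    columns = list(rows[0])
    if by_columns:
        known = [c for c in CANONICAL_ORDER if c in columns]
        extra = sorted(c for c in columns if c not in CANONICAL_ORDER)
        columns = known + extra
    result = [{c: row[c] for c in columns} for row in rows]
    if by_rows:
        result.sort(key=lambda row: row[columns[0]])
    return result


rows = [
    {"object_id": "B:2", "subject_id": "A:2", "predicate_id": "skos:exactMatch"},
    {"object_id": "B:1", "subject_id": "A:1", "predicate_id": "skos:exactMatch"},
]
sorted_rows = sort_rows_columns(rows)
```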

sssom.util.sort_sssom(df)[source]

Sort SSSOM by columns.

Parameters:

df (DataFrame) – SSSOM DataFrame to be sorted.

Return type:

DataFrame

Returns:

Sorted SSSOM DataFrame

sssom.util.to_mapping_set_dataframe(doc)[source]

Convert MappingSetDocument into MappingSetDataFrame.

Parameters:

doc (MappingSetDocument) – MappingSetDocument object

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame object

sssom.validators module

Validators.

sssom.validators.check_all_prefixes_in_curie_map(msdf, fail_on_error=True)[source]

Check all EntityReference slots are mentioned in ‘curie_map’.

Parameters:
  • msdf (MappingSetDataFrame) – MappingSetDataFrame

  • fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors

Raises:

ValidationError – If any prefix is missing from curie_map.

Return type:

None
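
A sketch of this kind of prefix check on a flat list of CURIEs. The helper names are hypothetical, a plain dictionary stands in for the curie_map, and ValueError is used as a stand-in for ValidationError.

```python
from typing import Dict, Iterable, List


def find_missing_prefixes(curies: Iterable[str], prefix_map: Dict[str, str]) -> List[str]:
    """Return the sorted prefixes used in curies that are absent from prefix_map."""
    used = {curie.split(":", 1)[0] for curie in curies if ":" in curie}
    return sorted(used - set(prefix_map))


def check_prefixes(
    curies: Iterable[str], prefix_map: Dict[str, str], fail_on_error: bool = True
) -> List[str]:
    """Raise on missing prefixes if fail_on_error is set, otherwise report them."""
    missing = find_missing_prefixes(curies, prefix_map)
    if missing and fail_on_error:
        # Stand-in for the ValidationError raised by the real validator.
        raise ValueError(f"prefixes missing from prefix map: {missing}")
    return missing


prefix_map = {"HP": "http://purl.obolibrary.org/obo/HP_"}
missing = check_prefixes(["HP:0000001", "MONDO:0000001"], prefix_map, fail_on_error=False)
```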

sssom.validators.check_strict_curie_format(msdf, fail_on_error=True)[source]

Check all EntityReference slots are formatted as unambiguous curies.

Implemented rules:
  • CURIE does not contain the pipe “|” character, to ensure that multivalued processing in TSV works correctly.

Parameters:
  • msdf (MappingSetDataFrame) – MappingSetDataFrame

  • fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors

Raises:

ValidationError – If any entity reference does not follow the strict CURIE format

Return type:

None
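
The reason for the pipe rule can be shown with a short round trip. The helpers below are hypothetical and only model a pipe-delimited cell; they are not the sssom parsing code.

```python
from typing import List


def join_multivalued(values: List[str]) -> str:
    """Serialize several values into one pipe-delimited TSV cell."""
    return "|".join(values)


def split_multivalued(cell: str) -> List[str]:
    """Parse a pipe-delimited TSV cell back into its values."""
    return cell.split("|")


def is_strict_curie(curie: str) -> bool:
    """Apply the single implemented rule: no pipe character in the CURIE."""
    return "|" not in curie


good = ["HP:0000001", "MONDO:0000001"]
bad = ["HP:0000001", "EX:a|b"]  # the second CURIE contains a pipe

good_round_trip = split_multivalued(join_multivalued(good))
bad_round_trip = split_multivalued(join_multivalued(bad))
```

The first list survives the round trip unchanged, while the second comes back as three values instead of two.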

sssom.validators.print_linkml_report(report, fail_on_error=True)[source]

Print the error messages in the report. Optionally throw exception.

Parameters:
  • report (ValidationReport) – A LinkML validation report

  • fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors

sssom.validators.validate(msdf, validation_types, fail_on_error=True)[source]

Validate SSSOM files against sssom-schema using LinkML’s validator function.

Parameters:
  • msdf (MappingSetDataFrame) – MappingSetDataFrame.

  • validation_types (List[SchemaValidationType]) – SchemaValidationType

  • fail_on_error (bool) – If true, throw an error when execution of a method has failed

Return type:

None

sssom.validators.validate_json_schema(msdf, fail_on_error=True)[source]

Validate JSON Schema using LinkML’s JsonSchemaDataValidator.

Parameters:
  • msdf (MappingSetDataFrame) – MappingSetDataFrame to be validated.

  • fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors

Return type:

None

sssom.validators.validate_shacl(msdf, fail_on_error=True)[source]

Validate SHACL file.

Raises:

NotImplementedError – Not yet implemented.

Return type:

None

sssom.validators.validate_sparql(msdf, fail_on_error=True)[source]

Validate SPARQL file.

Parameters:
  • msdf (MappingSetDataFrame) – MappingSetDataFrame

  • fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors

Raises:

NotImplementedError – Not yet implemented.

Return type:

None

sssom.writers module

Serialization functions for SSSOM.

sssom.writers.get_writer_function(*, output_format=None, output)[source]

Get appropriate writer function based on file format.

Parameters:
  • output (TextIO) – Output file

  • output_format (Optional[str]) – Output file format, defaults to None

Raises:

ValueError – Unknown output format

Return type:

Tuple[Callable[[MappingSetDataFrame, TextIO], None], str]

Returns:

Type of writer function
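
The general pattern of looking up a writer by format name can be sketched as below. Everything here is hypothetical: the two toy writers, the WRITERS table, and the fallback to the file extension when no format is given are assumptions, not the sssom.writers implementation.

```python
import io
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, TextIO, Tuple

Rows = List[Dict[str, str]]
Writer = Callable[[Rows, TextIO], None]


def write_tsv_sketch(rows: Rows, output: TextIO) -> None:
    """Write rows as tab-separated text with a header line."""
    if not rows:
        return
    columns = list(rows[0])
    output.write("\t".join(columns) + "\n")
    for row in rows:
        output.write("\t".join(row[c] for c in columns) + "\n")


def write_json_sketch(rows: Rows, output: TextIO) -> None:
    """Write rows as a JSON array."""
    json.dump(rows, output)


WRITERS: Dict[str, Writer] = {"tsv": write_tsv_sketch, "json": write_json_sketch}


def get_writer_sketch(output_name: str, output_format: Optional[str] = None) -> Tuple[Writer, str]:
    """Return (writer, format); use the file extension if no format is given."""
    fmt = output_format or Path(output_name).suffix.lstrip(".")
    if fmt not in WRITERS:
        raise ValueError(f"Unknown output format: {fmt}")
    return WRITERS[fmt], fmt


writer, fmt = get_writer_sketch("mappings.tsv")
buffer = io.StringIO()
writer([{"subject_id": "A:1", "object_id": "B:1"}], buffer)
```

Returning the resolved format alongside the function lets the caller pass it on as the serialisation argument without repeating the lookup.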

sssom.writers.to_fhir_json(msdf)[source]

Convert a mapping set dataframe to a JSON object.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame: Collection of mappings represented as a DataFrame, together with additional metadata.

Return type:

Dict

Returns:

Dict: A Dictionary serializable as JSON.

Resources:
  • ConceptMap::SSSOM mapping spreadsheet: https://docs.google.com/spreadsheets/d/1J19foBAYO8PCHwOfksaIGjNu-q5ILUKFh2HpOCgYle0/edit#gid=1389897118

TODO:
  • Add an r4 vs r5 param to the CLI and to these functions.

  • What if the msdf doesn’t have everything we need? (i) metadata, e.g. yml, (ii) what if we need to override?

  • Later: allow any nested arbitrary override (get in kwargs, else metadata.get(key, None)).

Minor TODOs:
  • mapping_justification: consider ValueString -> ValueCoding: https://github.com/timsbiomed/issues/issues/152

  • When/how to conform to R5 instead of R4?: https://build.fhir.org/conceptmap.html

sssom.writers.to_json(msdf)[source]

Convert a mapping set dataframe to a JSON object.

Return type:

JsonObj

sssom.writers.to_ontoportal_json(msdf)[source]

Convert a mapping set dataframe to a list of ontoportal mapping JSON objects.

Return type:

List[Dict]

sssom.writers.to_owl_graph(msdf)[source]

Convert a mapping set dataframe to OWL in an RDF graph.

Return type:

Graph

sssom.writers.to_rdf_graph(msdf)[source]

Convert a mapping set dataframe to an RDF graph.

Return type:

Graph

sssom.writers.write_fhir_json(msdf, output, serialisation='fhir_json')[source]

Write a mapping set dataframe to the file as FHIR ConceptMap JSON.

Return type:

None

Deprecated since version 0.4.7: Use write_json() instead

sssom.writers.write_json(msdf, output, serialisation='json')[source]

Write a mapping set dataframe to the file as JSON.

Parameters:

serialisation – The JSON format to use. Supported formats are:
  • fhir_json: Outputs JSON in FHIR ConceptMap format (https://fhir-ru.github.io/conceptmap.html)

Return type:

None

sssom.writers.write_ontoportal_json(msdf, output, serialisation='ontoportal_json')[source]

Write a mapping set dataframe to the file as the ontoportal mapping JSON model.

Return type:

None

Deprecated since version 0.4.7: Use write_json() instead

sssom.writers.write_owl(msdf, file, serialisation='turtle')[source]

Write a mapping set dataframe to the file as OWL.

Return type:

None

sssom.writers.write_rdf(msdf, file, serialisation=None)[source]

Write a mapping set dataframe to the file as RDF.

Return type:

None

sssom.writers.write_table(msdf, file, embedded_mode=True, serialisation='tsv', sort=False)[source]

Write a mapping set dataframe to the file as a table.

Return type:

None

sssom.writers.write_tables(sssom_dict, output_dir)[source]

Write table from MappingSetDataFrame object.

Parameters:
  • sssom_dict (Dict[str, MappingSetDataFrame]) – Dictionary of MappingSetDataframes

  • output_dir (Union[str, Path]) – The directory in which the derived SSSOM files are written

Return type:

None

Module contents

sssom-py package.