sssom package

Submodules

sssom.cli module

Command line interface for SSSOM.

Why does this file exist, and why not put this in __main__? You might be tempted to import things from __main__ later, but that causes problems because the code gets executed twice:

When you run python3 -m sssom, Python executes __main__.py as a script. That means there won't be any sssom.__main__ in sys.modules.
When you import __main__, it gets executed again (as a module) because there's no sssom.__main__ in sys.modules.

sssom.cliques module

Utilities for identifying and working with cliques/SCCs in mappings graphs.

sssom.cliques.get_src(src, curie)[source]

Get prefix of subject/object in the MappingSetDataFrame.

Parameters:

src (Optional[str]) – Source
curie (str) – CURIE

Returns:

Source

sssom.cliques.group_values(d)[source]

Group all keys in the dictionary that share the same value.

Return type: Dict[str, List[str]]

sssom.cliques.split_into_cliques(msdf)[source]

Split a MappingSetDataFrame into documents corresponding to the strongly connected components of the associated graph.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame object

Raises:

TypeError – If Mappings is not of type List
TypeError – If each mapping is not of type Mapping

Return type:

List[MappingSetDocument]

Returns:

List of MappingSetDocument objects

sssom.cliques.summarize_cliques(doc)[source]: Summarize stats on a clique doc.

sssom.cliques.to_digraph(msdf)[source]

Convert to a graph where the nodes are entities’ CURIEs and edges are their mappings.

Return type: DiGraph
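Splitting a mapping set into cliques comes down to building a directed graph over CURIEs and taking its strongly connected components. The following standalone sketch illustrates that idea with the standard library only (Kosaraju's algorithm). It is not sssom-py's implementation: the function names are hypothetical, and treating skos:exactMatch and owl:equivalentClass as symmetric (an edge in both directions) is an assumption made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumption for this sketch: these predicates contribute edges in both directions.
SYMMETRIC_PREDICATES = {"skos:exactMatch", "owl:equivalentClass"}


def build_digraph(mappings: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Build an adjacency list keyed by CURIE from (subject, predicate, object) rows."""
    graph: Dict[str, Set[str]] = {}
    for subject, predicate, obj in mappings:
        graph.setdefault(subject, set()).add(obj)
        graph.setdefault(obj, set())  # every entity becomes a node
        if predicate in SYMMETRIC_PREDICATES:
            graph[obj].add(subject)
    return graph


def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Kosaraju: record DFS finish order, then sweep the reversed graph."""
    visited: Set[str] = set()
    order: List[str] = []

    def visit(node: str) -> None:
        visited.add(node)
        for target in graph[node]:
            if target not in visited:
                visit(target)
        order.append(node)  # appended once all descendants are finished

    for node in graph:
        if node not in visited:
            visit(node)

    reverse: Dict[str, Set[str]] = {node: set() for node in graph}
    for node, targets in graph.items():
        for target in targets:
            reverse[target].add(node)

    assigned: Set[str] = set()
    components: List[Set[str]] = []
    for node in reversed(order):  # highest finish time first
        if node in assigned:
            continue
        component: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in assigned:
                continue
            assigned.add(current)
            component.add(current)
            stack.extend(reverse[current] - assigned)
        components.append(component)
    return components
```

With rows HP:1 exactMatch MP:1, MP:1 exactMatch UPHENO:1, and HP:2 broadMatch MP:2, the three exact matches form one component, while HP:2 and MP:2 end up in separate singleton components because the broad match only adds a one-way edge.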

sssom.cliquesummary module

sssom.constants module

Constants.

sssom.constants.MetadataType

The type for metadata that gets passed around in many places

alias of Dict[str, Any]

sssom.constants.PathOrIO

A hint for functions that can take a path or an IO

alias of str | Path | TextIO

class sssom.constants.SEMAPV(value)[source]

Bases: Enum

SEMAPV Enum containing the different mapping_justification values.

CompositeMatching = 'semapv:CompositeMatching'

CrossSpeciesBroadMatch = 'semapv:crossSpeciesBroadMatch'

CrossSpeciesExactMatch = 'semapv:crossSpeciesExactMatch'

CrossSpeciesNarrowMatch = 'semapv:crossSpeciesNarrowMatch'

LexicalMatching = 'semapv:LexicalMatching'

LexicalSimilarityThresholdMatching = 'semapv:LexicalSimilarityThresholdMatching'

LogicalReasoning = 'semapv:LogicalReasoning'

ManualMappingCuration = 'semapv:ManualMappingCuration'

MappingChaining = 'semapv:MappingChaining'

MappingInversion = 'semapv:MappingInversion'

MappingReview = 'semapv:MappingReview'

SemanticSimilarityThresholdMatching = 'semapv:SemanticSimilarityThresholdMatching'

UnspecifiedMatching = 'semapv:UnspecifiedMatching'

class sssom.constants.SSSOMSchemaView[source]

Bases: object

SchemaView class from linkml which is instantiated when necessary.

Reason for this: https://github.com/mapping-commons/sssom-py/issues/322 Implemented via PR: https://github.com/mapping-commons/sssom-py/pull/323

static __new__(cls)[source]: Create an instance of the SSSOM schema view if non-existent.

property dict: dict: Return SchemaView as a dictionary.

property double_slots: Set[str]: Return the names of slots whose range is double.

property entity_reference_slots: Set[str]: Return set of entity reference slots.

instance = <sssom.constants.SSSOMSchemaView object>

property mapping_enum_keys: Set[str]: Return a set of mapping enum keys.

property mapping_set_slots: List[str]: Return list of mapping set slots.

property mapping_slots: List[str]: Return list of mapping slots.

property multivalued_slots: Set[str]: Return set of multivalued slots.

property slots: Dict[str, str]: Return the slots for SSSOMSchemaView object.

property view: SchemaView: Return SchemaView object.
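SSSOMSchemaView defers building the linkml SchemaView until it is first needed and reuses one shared instance via __new__. The sketch below shows that lazy-singleton pattern in isolation. The class name LazySchemaView, the load_count counter, and the placeholder schema dictionary are hypothetical; they stand in for the expensive SchemaView construction and are not part of sssom-py.

```python
from typing import Any, Dict, Optional, Set


class LazySchemaView:
    """Hypothetical stand-in for SSSOMSchemaView: one shared, lazily built instance."""

    _instance: Optional["LazySchemaView"] = None
    load_count = 0  # how many times the expensive load actually ran

    def __new__(cls) -> "LazySchemaView":
        # Build the instance only on first use; later calls return the same object.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance._schema = cls._load_schema()
            cls._instance = instance
        return cls._instance

    @classmethod
    def _load_schema(cls) -> Dict[str, Any]:
        # Placeholder for the expensive step (building a linkml SchemaView).
        cls.load_count += 1
        return {"slots": {"subject_id": {}, "predicate_id": {}, "object_id": {}}}

    @property
    def slot_names(self) -> Set[str]:
        return set(self._schema["slots"])
```

Putting the work in __new__ rather than __init__ matters here: __init__ would run again on every LazySchemaView() call, whereas __new__ can return the cached object untouched.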

class sssom.constants.SchemaValidationType(value)[source]

Bases: str, Enum

Schema validation types.

JsonSchema = 'JsonSchema'

PrefixMapCompleteness = 'PrefixMapCompleteness'

Shacl = 'Shacl'

Sparql = 'Sparql'

StrictCurieFormat = 'StrictCurieFormat'

__format__(format_spec): Returns format using actual value type unless __str__ has been overridden.

sssom.constants.generate_mapping_set_id()[source]

Generate a mapping set ID.

Return type: str

sssom.constants.get_default_metadata()[source]

Get default metadata.

Return type: Dict[str, Any]
Returns: A metadata dictionary containing a default license with value DEFAULT_LICENSE and an auto-generated mapping set ID

If you want to combine metadata you loaded while ensuring that default metadata is also present, the best tool is collections.ChainMap. Entries from your own metadata take precedence over the defaults:

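from collections import ChainMap
from sssom.constants import get_default_metadata

my_metadata: dict | None = None  # e.g., metadata you loaded from a YAML file

# Keys in my_metadata win; the defaults fill in anything that is missing
metadata = dict(ChainMap(
    my_metadata or {},
    get_default_metadata()
))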

sssom.context module

Utilities for loading JSON-LD contexts.

sssom.context.ConverterHint

A type hint that specifies a place where one of three options can be given:

1. a legacy prefix mapping dictionary, which will get upgraded into a curies.Converter,
2. a converter, which might get modified (in SSSOM-py, this typically means chaining behind the "default" prefix map), or
3. None, which means a default converter is loaded.

alias of None | Mapping[str, str] | Converter

sssom.context.ensure_converter(prefix_map=None, *, use_defaults=True)[source]

Ensure a converter is available.

Parameters:

prefix_map (Union[None, Mapping[str, str], Converter]) –
One of the following:
1. An empty dictionary or None. This results in using the default extended prefix map (currently based on a variant of the Bioregistry) if use_defaults is set to true, otherwise just the builtin prefix map including the prefixes in SSSOM_BUILT_IN_PREFIXES
2. A non-empty dictionary representing a prefix map. This is loaded as a converter with Converter.from_prefix_map(). It is chained behind the builtin prefix map to ensure none of the SSSOM_BUILT_IN_PREFIXES are overwritten with non-default values
3. A pre-instantiated curies.Converter. Similarly to a prefix map passed into this function, this is chained behind the builtin prefix map
use_defaults (bool) – If an empty dictionary or None is passed to this function, this parameter chooses if the extended prefix map (currently based on a variant of the Bioregistry) gets loaded.

Return type:

Converter

Returns:

A re-usable converter
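The key property of ensure_converter is that user-supplied prefixes are chained behind the built-in ones, so a built-in prefix can never be overwritten. This sketch models that priority rule with plain dictionaries instead of curies.Converter objects. BUILT_IN is a hypothetical two-entry subset, not the actual contents of SSSOM_BUILT_IN_PREFIXES, and the sketch ignores the URI-prefix deduplication that real converter chaining also performs.

```python
from typing import Dict, Mapping, Optional

# Hypothetical subset of built-in prefixes for illustration only.
BUILT_IN: Dict[str, str] = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def chain_prefix_maps(*prefix_maps: Mapping[str, str]) -> Dict[str, str]:
    """Merge prefix maps so that earlier maps take priority over later ones."""
    merged: Dict[str, str] = {}
    for prefix_map in prefix_maps:
        for prefix, uri_prefix in prefix_map.items():
            merged.setdefault(prefix, uri_prefix)  # first occurrence wins
    return merged


def ensure_prefix_map(user_map: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    # The user's map goes *behind* the built-ins, so it cannot override them.
    return chain_prefix_maps(BUILT_IN, user_map or {})
```

Passing {"skos": "http://example.org/wrong#", "HP": "http://purl.obolibrary.org/obo/HP_"} keeps the built-in skos expansion while still adding HP.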

sssom.context.get_converter()[source]

Get a converter.

Return type: Converter

sssom.io module

I/O utilities for SSSOM.

sssom.io.annotate_file(input, output=None, replace_multivalued=False, **kwargs)[source]

Annotate a file, i.e., add custom metadata to the mapping set.

Parameters:

input (str) – SSSOM TSV file to be annotated.
output (Optional[TextIO]) – Output location.
replace_multivalued (bool) – Multivalued slots should be replaced or not, defaults to False
kwargs – Options provided by the user which are added to the metadata (e.g., --mapping_set_id http://example.org/abcd)

Return type:

MappingSetDataFrame

Returns:

Annotated MappingSetDataFrame object.

sssom.io.convert_file(input_path, output, output_format=None)[source]

Convert a file from one format to another.

Parameters:

input_path (str) – The path to the input SSSOM TSV file
output (TextIO) – The path to the output file. If none is given, will default to using stdout.
output_format (Optional[str]) – The format to which the SSSOM TSV should be converted.

Return type:

None

sssom.io.extract_iris(input, converter)[source]

Recursively extract a list of IRIs from a string or file.

Parameters:

input (Union[str, Path, Iterable[Union[str, Path]]]) – A CURIE, a list of CURIEs, or a path to a file containing them.
converter (Converter) – Converter for the mapping set's prefix map, possibly containing custom prefix-to-IRI entries.

Return type:

List[str]

Returns:

A list of IRIs.
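The recursion in extract_iris handles three cases: a file path (one entry per line), a single CURIE or IRI, and an iterable of either. The sketch below reproduces that shape with a plain prefix dictionary in place of a curies.Converter. The names expand and extract_iris_sketch are hypothetical. Passing strings that already start with http:// or https:// through unchanged, and de-duplicating and sorting the result, are choices made for this sketch.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Union

Input = Union[str, Path, Iterable[Union[str, Path]]]


def expand(curie_or_iri: str, prefix_map: Dict[str, str]) -> str:
    """Expand a CURIE with the prefix map; pass IRIs through unchanged."""
    if curie_or_iri.startswith(("http://", "https://")):
        return curie_or_iri
    prefix, _, local_id = curie_or_iri.partition(":")
    if prefix not in prefix_map:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


def extract_iris_sketch(inputs: Input, prefix_map: Dict[str, str]) -> List[str]:
    if isinstance(inputs, (str, Path)):
        path = Path(inputs)
        if path.is_file():
            # A file holds one CURIE or IRI per line; recurse over its lines.
            lines = [line.strip() for line in path.read_text().splitlines()]
            return extract_iris_sketch([line for line in lines if line], prefix_map)
        return [expand(str(inputs), prefix_map)]
    # An iterable: recurse into each item, then de-duplicate and sort.
    return sorted({iri for item in inputs for iri in extract_iris_sketch(item, prefix_map)})
```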

sssom.io.filter_file(input, output=None, **kwargs)[source]

Filter a dataframe by dynamically generating queries based on user input.

e.g., sssom filter --subject_id x:% --subject_id y:% --object_id y:% --object_id z:% tests/data/basic.tsv

yields the query:

"SELECT * FROM df WHERE (subject_id LIKE 'x:%' OR subject_id LIKE 'y:%') AND (object_id LIKE 'y:%' OR object_id LIKE 'z:%')" and displays the output.

Parameters:

input (str) – Path to the SSSOM TSV file to be queried over.
output (Optional[TextIO]) – Output location.
kwargs – Filter options provided by the user which generate queries (e.g., --subject_id x:%).

Raises:

ValueError – If parameter provided is invalid.

Return type:

MappingSetDataFrame

Returns:

Filtered MappingSetDataFrame object.
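The query shown for filter_file follows a simple rule: LIKE patterns given for the same column are OR-ed together, and the per-column groups are AND-ed. This sketch builds that query string and runs it against an in-memory SQLite table, which stands in for the dataframe. build_filter_query and run_filter are hypothetical helpers. Patterns are interpolated directly into SQL, so this is only suitable for trusted input.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def build_filter_query(filters: Dict[str, Sequence[str]], table: str = "df") -> str:
    """OR the LIKE patterns for the same column together, then AND across columns."""
    clauses = []
    for column, patterns in filters.items():
        alternatives = " OR ".join(f"{column} LIKE '{pattern}'" for pattern in patterns)
        clauses.append(f"({alternatives})")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)


def run_filter(
    rows: List[Tuple[str, str]], filters: Dict[str, Sequence[str]]
) -> List[Tuple[str, str]]:
    # An in-memory SQLite table stands in for the dataframe being queried.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE df (subject_id TEXT, object_id TEXT)")
    conn.executemany("INSERT INTO df VALUES (?, ?)", rows)
    result = conn.execute(build_filter_query(filters)).fetchall()
    conn.close()
    return result
```

For rows (x:1, y:1), (x:2, w:1), and (q:1, z:1) with the filters from the example above, only (x:1, y:1) satisfies both the subject group and the object group.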

sssom.io.get_metadata_and_prefix_map(metadata_path=None, *, prefix_map_mode=None)[source]

Load metadata and a prefix map in a deprecated way.

Return type: Tuple[Converter, Dict[str, Any]]

Deprecated since version 0.4.3: This functionality for loading SSSOM metadata from a YAML file is deprecated from the public API since it has internal assumptions which are usually not valid for downstream users.

sssom.io.parse_file(input_path, output, *, input_format=None, metadata_path=None, prefix_map_mode=None, clean_prefixes=True, strict_clean_prefixes=True, embedded_mode=True, mapping_predicate_filter=None)[source]

Parse an SSSOM metadata file and write to a table.

Parameters:

input_path (str) – The path to the input file in one of the legal formats, e.g., obographs, alignmentapi-xml
output (TextIO) – The path to the output file.
input_format (Optional[str]) – The string denoting the input format.
metadata_path (Optional[str]) – The path to a file containing the SSSOM metadata (including prefix_map) to be used during parsing.
prefix_map_mode (Optional[Literal['metadata_only', 'sssom_default_only', 'merged']]) – Defines whether the prefix map in the metadata should be extended or replaced with the SSSOM default prefix map derived from the Bioregistry.
clean_prefixes (bool) – If True (default), records with unknown prefixes are removed from the SSSOM file.
strict_clean_prefixes (bool) – If True (default), clean_prefixes() will be in strict mode.
embedded_mode (bool) – If True (default), the dataframe and metadata are exported in one file (TSV), else in two separate files (TSV and YAML).
mapping_predicate_filter (Optional[tuple]) – Optional list of mapping predicates or a file path containing the same.

Return type:

None

sssom.io.run_sql_query(query, inputs, output=None)[source]

Run a SQL query over one or more SSSOM files.

Each of the N inputs is assigned a table name df1, df2, …, dfN.

Alternatively, the filenames can be used as table names. These are first stemmed, e.g., ~/dir/my.sssom.tsv becomes a table called 'my'.

Example: sssom dosql -Q "SELECT * FROM df1 WHERE confidence>0.5 ORDER BY confidence" my.sssom.tsv
Example: sssom dosql -Q "SELECT file1.*, file2.object_id AS ext_object_id, file2.object_label AS ext_object_label FROM file1 INNER JOIN file2 WHERE file1.object_id = file2.subject_id" file1.sssom.tsv file2.sssom.tsv

Parameters:

query (str) – Query to be executed over a pandas DataFrame (msdf.df).
inputs (List[str]) – Input files that form the source tables for query.
output (Optional[TextIO]) – Output location.

Return type:

MappingSetDataFrame

Returns:

Filtered MappingSetDataFrame object.
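run_sql_query makes each input addressable both by position (df1, df2, …) and by its stemmed file name. This sketch shows that naming scheme with SQLite as a stand-in. sssom-py itself queries pandas DataFrames, and the two-column schema and helper names here are hypothetical. Two inputs with the same stem would collide in this sketch.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List, Tuple

Row = Tuple[str, str]


def table_stem(path: str) -> str:
    """Stem a file name down to its first dot-separated part: my.sssom.tsv -> my."""
    return Path(path).name.split(".")[0]


def run_sql_sketch(query: str, inputs: Dict[str, List[Row]]) -> List[tuple]:
    # SQLite stands in here for the pandas-based querying that sssom-py performs.
    conn = sqlite3.connect(":memory:")
    for index, (path, rows) in enumerate(inputs.items(), start=1):
        # Register each input under its positional name (df1, df2, ...) and its stem.
        for name in (f"df{index}", table_stem(path)):
            conn.execute(f'CREATE TABLE "{name}" (subject_id TEXT, object_id TEXT)')
            conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?)', rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

This mirrors the second CLI example: joining file1 to file2 on file1.object_id = file2.subject_id chains A:1 to B:1 to C:1 into the row (A:1, C:1).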

sssom.io.split_file(input_path, output_directory)[source]

Split an SSSOM TSV by prefixes and relations.

Parameters:

input_path (str) – The path to the input file in one of the legal formats, e.g., obographs, alignmentapi-xml
output_directory (Union[str, Path]) – The directory to which the split file should be exported.

Return type:

None

sssom.io.validate_file(input_path, validation_types=None, fail_on_error=True)[source]

Validate the incoming SSSOM TSV according to the SSSOM specification.

Parameters:

input_path (str) – The path to the input file in one of the legal formats, e.g., obographs, alignmentapi-xml
validation_types (Optional[List[SchemaValidationType]]) – A list of validation types to run.
fail_on_error (bool) – If True, raise an exception if any validator reports an error.

Return type:

dict[SchemaValidationType, ValidationReport]

Returns:

A dictionary from validation types to validation reports

sssom.parsers module

SSSOM parsers.

sssom.parsers.from_alignment_minidom(dom, prefix_map=None, meta=None, mapping_predicates=None)[source]

Read a minidom Document object.

Parameters:

dom (Document) – XML (minidom) object
prefix_map (Union[None, Mapping[str, str], Converter]) – A prefix map
meta (Optional[Dict[str, Any]]) – Optional metadata
mapping_predicates (Optional[List[str]]) – Optional list of mapping predicates to extract

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame

Raises:

ValueError – If the alignment's xml element is not set to yes. Only XML is supported!
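from_alignment_minidom reads an Alignment API document that has already been parsed with xml.dom.minidom. The sketch below shows the kind of traversal involved using only the standard library: it checks the xml flag and collects each Cell's two entities, relation, and measure. The sample document and the tuple output are illustrative assumptions. The sketch does not translate relation symbols into SSSOM predicates or contract IRIs into CURIEs, both of which the real parser has to handle.

```python
from typing import List, Tuple
from xml.dom import minidom

ALIGNMENT = """<?xml version="1.0"?>
<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Alignment>
    <xml>yes</xml>
    <map>
      <Cell>
        <entity1 rdf:resource="http://example.org/a#Heart"/>
        <entity2 rdf:resource="http://example.org/b#Heart"/>
        <relation>=</relation>
        <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.9</measure>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>"""


def read_cells(dom: minidom.Document) -> List[Tuple[str, str, str, float]]:
    """Return (entity1, relation, entity2, measure) for every Cell element."""
    xml_flag = dom.getElementsByTagName("xml")[0].firstChild.nodeValue.strip()
    if xml_flag != "yes":
        raise ValueError("xml element is not set to yes. Only XML is supported!")
    cells = []
    for cell in dom.getElementsByTagName("Cell"):
        # minidom looks attributes up by their qualified name, e.g. "rdf:resource".
        entity1 = cell.getElementsByTagName("entity1")[0].getAttribute("rdf:resource")
        entity2 = cell.getElementsByTagName("entity2")[0].getAttribute("rdf:resource")
        relation = cell.getElementsByTagName("relation")[0].firstChild.nodeValue.strip()
        measure = float(cell.getElementsByTagName("measure")[0].firstChild.nodeValue)
        cells.append((entity1, relation, entity2, measure))
    return cells
```

Usage: read_cells(minidom.parseString(ALIGNMENT)).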

sssom.parsers.from_obographs(jsondoc, prefix_map=None, meta=None, mapping_predicates=None)[source]

Convert an obographs JSON object to an SSSOM data frame.

Parameters:

jsondoc (Dict) – The JSON object representing the ontology in obographs format
prefix_map (Union[None, Mapping[str, str], Converter]) – The prefix map to be used
meta (Optional[Dict[str, Any]]) – Any additional metadata that needs to be added to the resulting SSSOM data frame, defaults to None
mapping_predicates (Optional[List[str]]) – Optional list of mapping predicates to extract

Raises:

Exception – When there is no CURIE

Return type:

MappingSetDataFrame

Returns:

An SSSOM data frame (MappingSetDataFrame)
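In obographs JSON, database cross-references sit under graphs → nodes → meta → xrefs, where each entry carries its value in "val". The sketch below walks that structure and emits one mapping row per xref. Using oboInOwl:hasDbXref as the predicate and keeping node IDs as IRIs are assumptions of the sketch. The real function also applies the prefix map, honours mapping_predicates, and handles other property values.

```python
from typing import Any, Dict, List


def xrefs_to_rows(jsondoc: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn each node xref in an obographs document into a simple mapping row."""
    rows = []
    for graph in jsondoc.get("graphs", []):
        for node in graph.get("nodes", []):
            for xref in node.get("meta", {}).get("xrefs", []):
                rows.append(
                    {
                        "subject_id": node["id"],  # an IRI in obographs
                        "subject_label": node.get("lbl", ""),
                        "predicate_id": "oboInOwl:hasDbXref",
                        "object_id": xref["val"],
                    }
                )
    return rows
```

Nodes without a meta block or without xrefs simply contribute no rows.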

sssom.parsers.from_sssom_dataframe(df, prefix_map=None, meta=None)[source]

Convert a dataframe to a MappingSetDataFrame.

Parameters:

df (DataFrame) – A mappings dataframe
prefix_map (Union[None, Mapping[str, str], Converter]) – A prefix map
meta (Optional[Dict[str, Any]]) – A metadata dictionary

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame

sssom.parsers.from_sssom_json(jsondoc, prefix_map=None, meta=None)[source]

Load a mapping set dataframe from a JSON object.

Parameters:

jsondoc (Union[str, dict, TextIO]) – JSON document
prefix_map (Union[None, Mapping[str, str], Converter]) – Prefix map
meta (Optional[Dict[str, Any]]) – metadata used to augment the metadata existing in the mapping set

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame object

sssom.parsers.from_sssom_rdf(g, prefix_map=None, meta=None)[source]

Convert an SSSOM RDF graph into an SSSOM data table.

Parameters:

g (Graph) – the Graph (rdflib)
prefix_map (Union[None, Mapping[str, str], Converter]) – A dictionary containing the prefix map, defaults to None
meta (Optional[Dict[str, Any]]) – Potentially additional metadata, defaults to None

Return type:

MappingSetDataFrame

Returns:

MappingSetDataFrame object

sssom.parsers.get_parsing_function(input_format, filename)[source]

Return appropriate parser function based on input format of file.

Parameters:

input_format (Optional[str]) – File format
filename (str) – Filename

Raises:

ValueError – Unknown file format

Return type:

Callable

Returns:

Appropriate ‘read’ function

sssom.parsers.parse_alignment_xml(file_path, prefix_map=None, meta=None, mapping_predicates=None)[source]

Parse an Alignment API XML file into a MappingSetDataFrame.

Return type: MappingSetDataFrame

sssom.parsers.parse_csv(*args, **kwargs)[source]

Parse an SSSOM CSV file, forwarding arguments to parse_sssom_table().

Return type: MappingSetDataFrame

sssom.parsers.parse_obographs_json(file_path, prefix_map=None, meta=None, mapping_predicates=None)[source]

Parse an obographs file as a JSON object and translate it into a MappingSetDataFrame.

Parameters:

file_path (Union[str, Path]) – The path to the obographs file
prefix_map (Union[None, Mapping[str, str], Converter]) – an optional prefix map
meta (Optional[Dict[str, Any]]) – an optional dictionary of metadata elements
mapping_predicates (Optional[List[str]]) – an optional list of mapping predicates that should be extracted

Return type:

MappingSetDataFrame

Returns:

An SSSOM MappingSetDataFrame

sssom.parsers.parse_sssom_json(file_path, prefix_map=None, meta=None, **kwargs)[source]

Parse an SSSOM JSON file into a MappingSetDataFrame.

Return type: MappingSetDataFrame

sssom.parsers.parse_sssom_rdf(file_path, prefix_map=None, meta=None, serialisation='turtle', **kwargs)[source]

Parse an SSSOM RDF file (Turtle by default; see serialisation) into a MappingSetDataFrame.

Return type: MappingSetDataFrame

sssom.parsers.parse_sssom_table(file_path, prefix_map=None, meta=None, *, strict=False, sep=None, **kwargs)[source]

Parse an SSSOM CSV or TSV file.

Parameters:

file_path (Union[str, Path, TextIO]) – A file path, URL, or I/O object that contains SSSOM encoded in TSV
prefix_map (Union[None, Mapping[str, str], Converter]) – A prefix map or curies.Converter used to validate prefixes, CURIEs, and IRIs appearing in the SSSOM TSV
meta (Optional[Dict[str, Any]]) – Additional document-level metadata for the SSSOM TSV document that is not contained within the document itself. For example, this may come from a companion SSSOM YAML file.
strict (bool) – If true, will fail parsing for undefined prefixes, CURIEs, or IRIs
sep (Optional[str]) – The separator. If not given, it is inferred from the file name.
kwargs (Any) – Additional keyword arguments (unhandled)

Return type:

MappingSetDataFrame

Returns:

A parsed dataframe wrapper object
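An SSSOM TSV (or CSV) file embeds its document-level metadata as YAML lines prefixed with "#", followed by the mapping table. The sketch below separates the two parts with the standard library and infers the separator from the file name, mirroring the sep behaviour described above. The helper names are hypothetical, and the metadata is returned as raw YAML text because parsing it would require a YAML library such as PyYAML.

```python
import csv
import io
from typing import Dict, List, Tuple


def infer_sep(file_name: str) -> str:
    """Comma for .csv files, tab otherwise."""
    return "," if file_name.endswith(".csv") else "\t"


def split_sssom_table(text: str, sep: str = "\t") -> Tuple[str, List[Dict[str, str]]]:
    """Separate the '#'-prefixed YAML metadata block from the mapping rows."""
    metadata_lines, table_lines = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            metadata_lines.append(line[1:])  # drop the comment marker, keep indentation
        elif line.strip():
            table_lines.append(line)
    rows = list(csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter=sep))
    return "\n".join(metadata_lines), rows
```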

sssom.parsers.parse_tsv(*args, **kwargs)[source]

Parse an SSSOM TSV file, forwarding arguments to parse_sssom_table().

Return type: MappingSetDataFrame

sssom.parsers.split_dataframe(msdf)[source]

Group the mapping set dataframe into several subdataframes by prefix.

Parameters: msdf (MappingSetDataFrame) – MappingSetDataFrame object
Raises: RuntimeError – DataFrame object within MappingSetDataFrame is None
Return type: Dict[str, MappingSetDataFrame]
Returns: A dictionary mapping split names to MappingSetDataFrame objects

sssom.parsers.split_dataframe_by_prefix(msdf, subject_prefixes, object_prefixes, relations)[source]

Split a mapping set dataframe by prefix.

Parameters:

msdf (MappingSetDataFrame) – An SSSOM MappingSetDataFrame
subject_prefixes (Iterable[str]) – a list of prefixes pertaining to the subject
object_prefixes (Iterable[str]) – a list of prefixes pertaining to the object
relations (Iterable[str]) – a list of relations of interest

Return type:

Dict[str, MappingSetDataFrame]

Returns:

a dict of SSSOM data frame names to MappingSetDataFrame

sssom.parsers.to_mapping_set_document(msdf)[source]

Convert a MappingSetDataFrame to a MappingSetDocument.

Return type: MappingSetDocument

sssom.rdf_util module

Rewriting functionality for RDFlib graphs.

sssom.rdf_util.rewire_graph(g, mset, subject_to_object=True, precedence=None)[source]

Rewire an RDF graph, replacing entities using equivalence mappings.

Return type: int

sssom.sparql_util module

Utilities for querying mappings with SPARQL.

class sssom.sparql_util.EndpointConfig(url, graph, converter, predmap, predicates, limit, include_object_labels=False)[source]

Bases: object

A container for a SPARQL endpoint’s configuration.

converter: Converter

graph: URIRef

include_object_labels: bool = False

limit: Optional[int]

predicates: Optional[List[str]]

predmap: Dict[str, str]

url: str

sssom.sparql_util.query_mappings(config)[source]

Query a SPARQL endpoint to obtain a set of mappings.

Return type: MappingSetDataFrame

sssom.sssom_document module

Additional SSSOM object models.

class sssom.sssom_document.MappingSetDocument(mapping_set, converter)[source]

Bases: object

Represents a single SSSOM document.

A document is simply a holder for a MappingSet object plus a CURIE map

converter: Converter

mapping_set: MappingSet

The main part of the document: a set of mappings plus metadata.

property prefix_map: Dict[str, str]: Get a prefix map.

sssom.util module

Utility functions.

class sssom.util.EntityPair(subject_entity, object_entity)[source]

Bases: object

A tuple of entities.

Note that (e1,e2) == (e2,e1)

object_entity: Uriorcurie

subject_entity: Uriorcurie
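The note that (e1,e2) == (e2,e1) means an entity pair behaves like an unordered pair. One way to get that behaviour in a dataclass is to compare and hash a frozenset of the two members, as in this sketch. UnorderedPair is a hypothetical name, and this is not necessarily how sssom-py implements EntityPair.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True, eq=False)
class UnorderedPair:
    subject_entity: str
    object_entity: str

    def _key(self) -> FrozenSet[str]:
        # Order-insensitive key shared by __eq__ and __hash__.
        return frozenset((self.subject_entity, self.object_entity))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, UnorderedPair):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        return hash(self._key())
```

Because __eq__ and __hash__ agree, the pairs (A, B) and (B, A) also collapse into a single entry when stored in a set, which is what grouping mappings by entity pair relies on.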

sssom.util.KEY_FEATURES = ['subject_id', 'predicate_id', 'object_id', 'predicate_modifier']: The four columns whose combination is used as the primary key when merging/grouping.

class sssom.util.MappingSetDataFrame(df, converter=<factory>, metadata=<factory>)[source]

Bases: object

A collection of mappings represented as a DataFrame, together with additional metadata.

clean_context()[source]

Clean up the context.

Return type: None

clean_prefix_map(strict=True)[source]

Remove unused prefixes from the internal prefix map based on the internal dataframe.

Parameters: strict (bool) – If True, raise an error if any prefix used in the dataframe is not listed in the 'curie_map'.
Raises: ValueError – If prefixes are absent from the 'curie_map' and strict is True
Return type: None

converter: Converter

df: DataFrame

classmethod from_mapping_set(mapping_set, *, converter=None)[source]

Instantiate from a mapping set and an optional converter.

Parameters:

mapping_set (MappingSet) – A mapping set
converter (Union[None, Mapping[str, str], Converter]) – A prefix map or pre-instantiated converter. If none given, uses a default prefix map derived from the Bioregistry.

Return type:

MappingSetDataFrame

Returns:

A mapping set dataframe

classmethod from_mapping_set_document(doc)[source]

Instantiate from a mapping set document.

Return type: MappingSetDataFrame

classmethod from_mappings(mappings, *, converter=None, metadata=None)[source]

Instantiate from a list of mappings, mapping set metadata, and an optional converter.

Return type: MappingSetDataFrame

merge(*msdfs, inplace=True)[source]

Merge this MappingSetDataFrame with one or more others.

Parameters:

msdfs (MappingSetDataFrame) – Multiple/Single MappingSetDataFrame(s) to merge with self
inplace (bool) – If True, the given MappingSetDataFrames are merged into the calling MappingSetDataFrame; if False, the merged data frame is simply returned.

Return type:

MappingSetDataFrame

Returns:

Merged MappingSetDataFrame

metadata: Dict[str, Any]

property prefix_map: Get a simple, bijective prefix map.

remove_mappings(msdf)[source]

Remove the mappings in the given msdf from this msdf.

Parameters: msdf (MappingSetDataFrame) – MappingSetDataFrame whose mappings are removed from this msdf.
Return type: None

standardize_references()[source]

Standardize this MSDF’s dataframe and metadata with respect to its converter.

Return type: None

to_mapping_set()[source]

Get a mapping set.

Return type: MappingSet

to_mapping_set_document()[source]

Get a mapping set document.

Return type: MappingSetDocument

to_mappings()[source]

Get a list of mappings.

Return type: List[Mapping]

classmethod with_converter(converter, df, metadata=None)[source]

Instantiate with a converter instead of a vanilla prefix map.

Return type: MappingSetDataFrame

class sssom.util.MappingSetDiff(unique_tuples1=None, unique_tuples2=None, common_tuples=None, combined_dataframe=None)[source]

Bases: object

Represents a difference between two mapping sets.

Currently this is limited to diffs at the level of entity-pairs. For example, if file1 has A owl:equivalentClass B, and file2 has A skos:closeMatch B, this is considered a mapping in common.

combined_dataframe: Optional[DataFrame] = None: Dataframe that combines the left and right dataframes, with information injected into the comment column

common_tuples: Optional[Set[EntityPair]] = None

unique_tuples1: Optional[Set[EntityPair]] = None

unique_tuples2: Optional[Set[EntityPair]] = None

sssom.util.add_default_confidence(df, confidence=nan)[source]

Add a confidence column to the DataFrame if absent, initialized to the given confidence value (NaN by default).

If the confidence column already exists, only its missing values are filled in.

Parameters: df (DataFrame) – DataFrame whose confidence column needs to be filled.
Return type: DataFrame
Returns: DataFrame with a complete confidence column.

sssom.util.are_params_slots(params)[source]

Check if parameters conform to the slots in MAPPING_SET_SLOTS.

Parameters: params (dict) – Dictionary of parameters.
Raises: ValueError – If params are not slots.
Return type: bool
Returns: True/False

sssom.util.assign_default_confidence(df)[source]

Assign numpy.nan to blank confidence values.

Parameters: df (DataFrame) – SSSOM DataFrame
Return type: Tuple[DataFrame, DataFrame]
Returns: A tuple of the original DataFrame and a DataFrame of the rows with empty confidence values.

sssom.util.augment_metadata(msdf, meta, replace_multivalued=False)[source]

Augment metadata with parameters passed.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame (MSDF) object.
meta (dict) – Dictionary that needs to be added/updated to the metadata of the MSDF.
replace_multivalued (bool) – Multivalued slots should be replaced or not, defaults to False.

Raises:

ValueError – If type of slot is neither str nor list.

Return type:

MappingSetDataFrame

Returns:

MSDF with updated metadata.

sssom.util.collapse(df)[source]

Collapse rows with the same S/P/O and combine their confidences.

Return type: DataFrame

sssom.util.compare_dataframes(df1, df2)[source]

Perform a diff between two SSSOM dataframes.

Parameters:

df1 (DataFrame) – A mapping dataframe
df2 (DataFrame) – A mapping dataframe

Return type:

MappingSetDiff

Returns:

A mapping set diff

Warning

Currently does not discriminate between mappings with different predicates.
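Because the diff works at the level of entity pairs and ignores predicates, it reduces to set arithmetic on unordered (subject, object) pairs. The sketch below computes the three sets that MappingSetDiff stores: unique to the first input, unique to the second, and common. It uses frozensets in place of EntityPair objects, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def entity_pairs(rows: Iterable[Tuple[str, str, str]]) -> Set[Pair]:
    # The predicate is ignored, and (A, B) is treated the same as (B, A).
    return {frozenset((subject, obj)) for subject, _predicate, obj in rows}


def diff_pairs(
    rows1: Iterable[Tuple[str, str, str]], rows2: Iterable[Tuple[str, str, str]]
) -> Tuple[Set[Pair], Set[Pair], Set[Pair]]:
    """Return (unique to rows1, unique to rows2, common)."""
    pairs1, pairs2 = entity_pairs(rows1), entity_pairs(rows2)
    return pairs1 - pairs2, pairs2 - pairs1, pairs1 & pairs2
```

As in the MappingSetDiff example above, A owl:equivalentClass B in one file and A skos:closeMatch B in the other count as a mapping in common.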

sssom.util.create_entity(identifier, mappings)[source]

Create an Entity object.

Parameters:

identifier (str) – Entity Id
mappings (Dict[str, Any]) – Mapping dictionary

Return type:

Uriorcurie

Returns:

An Entity object

sssom.util.dataframe_to_ptable(df, *, inverse_factor=None, default_confidence=None)[source]

Export a KBOOM table.

Parameters:

df (DataFrame) – Pandas DataFrame
inverse_factor (Optional[float]) – Multiplier applied to (1 - confidence), defaults to 0.5
default_confidence (Optional[float]) – Default confidence to be assigned if absent.

Raises:

ValueError – Predicate value error
ValueError – Predicate type value error

Returns:

List of rows

sssom.util.deal_with_negation(df)[source]

Combine negative and positive rows with matching [SUBJECT_ID, OBJECT_ID, CONFIDENCE] combination.

Rule: negative trumps positive if the absolute values of their confidences are equal.

Parameters: df (DataFrame) – Merged Pandas DataFrame
Return type: DataFrame
Returns: Pandas DataFrame with negations addressed
Raises: ValueError – If the dataframe is None after assigning default confidence
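The rule above can be illustrated per (subject, object) pair: the row with the higher confidence wins, and on a tie the negated row wins. This simplified sketch marks negation with predicate_modifier == "Not" (the SSSOM predicate modifier value) and compares plain confidences. How sssom-py encodes negation internally, for example as a sign on the confidence, is not assumed here, and resolve_negation is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def resolve_negation(rows: List[Row]) -> List[Row]:
    """Keep one row per (subject_id, object_id); ties go to the negated row."""
    best: Dict[Tuple[Any, Any], Row] = {}
    for row in rows:
        key = (row["subject_id"], row["object_id"])
        current = best.get(key)
        # Compare (confidence, is_negated): higher confidence first, then negation.
        rank = (row["confidence"], row.get("predicate_modifier") == "Not")
        if current is None or rank > (
            current["confidence"],
            current.get("predicate_modifier") == "Not",
        ):
            best[key] = row
    return list(best.values())
```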

sssom.util.filter_out_prefixes(df, filter_prefixes, features=None, require_all_prefixes=False)[source]

Filter out rows which contain a CURIE with a prefix in the filter_prefixes list.

Parameters:

df (DataFrame) – Pandas DataFrame of SSSOM Mapping
filter_prefixes (List[str]) – List of prefixes
features (Optional[list]) – List of dataframe column names to consider
require_all_prefixes (bool) – If True, all prefixes must be present in a row to be filtered out

Return type:

DataFrame

Returns:

Pandas Dataframe

sssom.util.filter_prefixes(df, filter_prefixes, features=None, require_all_prefixes=True)[source]

Filter out rows which do NOT contain a CURIE with a prefix in the filter_prefixes list.

Parameters:

df (DataFrame) – Pandas DataFrame of SSSOM Mapping
filter_prefixes (List[str]) – List of prefixes
features (Optional[list]) – List of dataframe column names to consider
require_all_prefixes (bool) – If True, all prefixes must be present in a row for it to be kept

Return type:

DataFrame

Returns:

Pandas Dataframe

sssom.util.filter_redundant_rows(df, ignore_predicate=False)[source]

Remove rows if there is another row with same S/O and higher confidence.

Parameters:

df (DataFrame) – Pandas DataFrame to filter
ignore_predicate (bool) – If true, the predicate_id column is ignored, defaults to False

Return type:

DataFrame

Returns:

Filtered pandas DataFrame

sssom.util.get_all_prefixes(msdf)[source]

Fetch all prefixes in the MappingSetDataFrame.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame

Raises:

ValidationError – If slot is wrong.

Return type:

Set[str]

Returns:

Set of all prefixes.

sssom.util.get_dict_from_mapping(map_obj)[source]

Get information for linkml objects (MatchTypeEnum, PredicateModifierEnum) from the Mapping object and return the dictionary form of the object.

Parameters: map_obj (Union[Any, Dict[Any, Any], Mapping]) – Mapping object
Return type: dict
Returns: Dictionary

sssom.util.get_file_extension(file)[source]

Get file extension.

Parameters: file (Union[str, Path, TextIO]) – File path
Return type: Optional[Literal['tsv', 'csv']]
Returns: Format of the file passed, defaulting to tsv

sssom.util.get_prefix_from_curie(curie)[source]

Get the prefix from a CURIE.

Return type: str

sssom.util.get_prefixes_used_in_metadata(meta)[source]

Get a set of prefixes used in CURIEs in the metadata.

Return type: Set[str]

sssom.util.get_prefixes_used_in_table(df)[source]

Get the set of prefixes used in CURIEs in the key feature columns of a dataframe.

Return type: Set[str]

sssom.util.get_row_based_on_hierarchy(df)[source]

Get row based on hierarchy of predicates.

The hierarchy is as follows:

1. owl:equivalentClass
2. owl:equivalentProperty
3. rdfs:subClassOf
4. rdfs:subPropertyOf
5. owl:sameAs
6. skos:exactMatch
7. skos:closeMatch
8. skos:broadMatch
9. skos:narrowMatch
10. oboInOwl:hasDbXref
11. skos:relatedMatch
12. rdfs:seeAlso

Parameters: df (DataFrame) – Dataframe containing multiple predicates for the same subject and object.
Return type: DataFrame
Returns: Dataframe with the single row that ranks highest in the hierarchy.
Raises: KeyError – If no rows are available
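Choosing a row by predicate hierarchy is a minimum over each predicate's position in the list above. This sketch applies the documented order to plain dictionaries rather than a DataFrame. Ranking predicates that are not in the list last is an assumption of the sketch, and pick_by_hierarchy is a hypothetical name.

```python
from typing import Any, Dict, List

# The order listed in the documentation above, highest rank first.
PREDICATE_HIERARCHY = [
    "owl:equivalentClass",
    "owl:equivalentProperty",
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "owl:sameAs",
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
    "oboInOwl:hasDbXref",
    "skos:relatedMatch",
    "rdfs:seeAlso",
]
RANK = {predicate: index for index, predicate in enumerate(PREDICATE_HIERARCHY)}


def pick_by_hierarchy(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the row whose predicate ranks highest; unknown predicates rank last."""
    if not rows:
        raise KeyError("no rows are available")
    return min(rows, key=lambda row: RANK.get(row["predicate_id"], len(RANK)))
```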

sssom.util.group_mappings(df)[source]

Group mappings by EntityPairs.

Return type: Dict[EntityPair, List[Series]]

sssom.util.inject_metadata_into_df(msdf)[source]

Inject the metadata dictionary's key-value pairs as columns of the DataFrame in a MappingSetDataFrame.

Parameters: msdf (MappingSetDataFrame) – MappingSetDataFrame with metadata separate.
Return type: MappingSetDataFrame
Returns: MappingSetDataFrame with metadata as columns

sssom.util.invert_mappings(df, subject_prefix=None, merge_inverted=True, update_justification=True, predicate_invert_dictionary=None)[source]

Switch subjects and objects based on their prefixes and adjust predicates accordingly.

Parameters:

df (DataFrame) – Pandas dataframe.
subject_prefix (Optional[str]) – Prefix of subjects desired.
merge_inverted (bool) – If True (default), add the inverted dataframe to the input; otherwise, just return the inverted data.
update_justification (bool) – If True (default), the justification is updated to "semapv:MappingInversion", else it is left as it is.
predicate_invert_dictionary (Optional[dict]) – Dictionary (e.g., loaded from a YAML file) providing the inverse mapping for predicates.

Return type:

DataFrame

Returns:

Pandas dataframe with all subject IDs having the same prefix.
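Inverting a mapping means swapping subject and object, replacing the predicate with its inverse, and (with update_justification) recording semapv:MappingInversion as the justification. The sketch below does this for rows whose object, but not whose subject, carries the desired prefix. It corresponds roughly to merge_inverted=False, since it returns only the flipped rows. INVERSES is an example table assumed for the sketch, not the library's default, and rows whose predicate has no known inverse are skipped rather than guessed.

```python
from typing import Any, Dict, List, Optional

# Example inverse table (an assumption for this sketch, not the library's default).
INVERSES: Dict[str, str] = {
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def invert_rows(
    rows: List[Dict[str, Any]],
    subject_prefix: str,
    inverses: Optional[Dict[str, str]] = None,
) -> List[Dict[str, Any]]:
    """Return only the rows flipped so that their subject carries subject_prefix."""
    inverses = INVERSES if inverses is None else inverses
    wanted = subject_prefix + ":"
    inverted = []
    for row in rows:
        needs_flip = not row["subject_id"].startswith(wanted) and row["object_id"].startswith(wanted)
        predicate = inverses.get(row["predicate_id"])
        if not needs_flip or predicate is None:
            continue  # already oriented, or no known inverse for this predicate
        inverted.append(
            {
                **row,
                "subject_id": row["object_id"],
                "object_id": row["subject_id"],
                "predicate_id": predicate,
                "mapping_justification": "semapv:MappingInversion",
            }
        )
    return inverted
```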

sssom.util.is_multivalued_slot(slot)[source]

Check whether the slot is multivalued according to the SSSOM specification.

Parameters: slot (str) – Slot name
Return type: bool
Returns: Whether the slot is multivalued

sssom.util.merge_msdf(*msdfs, reconcile=False)[source]

Merge multiple MappingSetDataFrames into one.

Parameters:

msdfs (MappingSetDataFrame) – A Tuple of MappingSetDataFrames to be merged
reconcile (bool) – If True, dedupe (remove redundant lower confidence mappings) and reconcile (if msdf contains a higher confidence negative mapping, remove the lower confidence positive one; if confidence is the same, prefer HumanCurated; if both are HumanCurated, prefer the negative mapping). Defaults to False.

Return type:

MappingSetDataFrame

Returns:

Merged MappingSetDataFrame.

sssom.util.pandas_set_no_silent_downcasting(no_silent_downcasting=True)[source]: Set pandas future.no_silent_downcasting option. Context https://github.com/pandas-dev/pandas/issues/57734.

sssom.util.raise_for_bad_path(file_path)[source]

Raise an exception if the file path is invalid.

Parameters:: file_path (Union[str, Path]) – File path
Raises:: FileNotFoundError – Invalid file path
Return type:: None
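A minimal sketch of the check, assuming "invalid" means "not an existing file". `check_path` is a hypothetical name, and the library's own function may accept other inputs (for example URLs) that this sketch would reject.

```python
from pathlib import Path
from typing import Union


def check_path(file_path: Union[str, Path]) -> None:
    """Raise FileNotFoundError unless file_path points to an existing file."""
    if not Path(file_path).is_file():
        raise FileNotFoundError(f"{file_path} is not a valid file path")
```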

sssom.util.reconcile_prefix_and_data(msdf, prefix_reconciliation)[source]

Reconcile the prefix_map and translate the switched CURIE prefixes in the dataframe.

Parameters:

msdf (MappingSetDataFrame) – Mapping Set DataFrame.
prefix_reconciliation (dict) – Prefix reconciliation dictionary from a YAML file

Return type:

MappingSetDataFrame

Returns:

Mapping Set DataFrame with reconciled prefix_map and data.

This method is built on curies.remap_curie_prefixes() and curies.rewire(). Note that if you want to overwrite a CURIE prefix in the Bioregistry extended prefix map, you need to provide a place for the old one to go, as in {"geo": "ncbi.geo", "geogeo": "geo"}. Just doing {"geogeo": "geo"} would not work, since geo already exists.
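The prefix-collision rule in the note above can be illustrated with a small, hypothetical sketch over a plain prefix map (CURIE prefix to URI prefix). `remap_prefixes` and `rewrite_curie` are illustrative helpers, not the curies API, and the URI prefixes are placeholders. Here a collision raises an error so the failure is visible; curies.remap_curie_prefixes() may handle the same situation differently.

```python
from typing import Dict


def remap_prefixes(prefix_map: Dict[str, str], remapping: Dict[str, str]) -> Dict[str, str]:
    """Rename all prefixes at once; fail if two prefixes end up with the same name."""
    result: Dict[str, str] = {}
    for prefix, uri_prefix in prefix_map.items():
        new_prefix = remapping.get(prefix, prefix)
        if new_prefix in result:
            raise ValueError(f"cannot rename to {new_prefix!r}: prefix already in use")
        result[new_prefix] = uri_prefix
    return result


def rewrite_curie(curie: str, remapping: Dict[str, str]) -> str:
    """Apply the same renaming to a CURIE found in the mapping data."""
    prefix, _, local_id = curie.partition(":")
    return f"{remapping.get(prefix, prefix)}:{local_id}"


# Placeholder URI prefixes, for illustration only.
prefix_map = {"geo": "https://example.org/ncbi-geo/", "geogeo": "https://example.org/geogeo/"}
```

With {"geo": "ncbi.geo", "geogeo": "geo"} the old geo entry moves out of the way first, so the rename succeeds; with only {"geogeo": "geo"}, both entries would be named geo and the sketch raises.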

sssom.util.remove_unmatched(df)[source]

Remove rows where no match is found.

TODO: https://github.com/OBOFoundry/SSSOM/issues/28

Parameters:: df (DataFrame) – Pandas DataFrame
Return type:: DataFrame
Returns:: Pandas DataFrame without the rows whose ‘PREDICATE_ID’ is ‘noMatch’.
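As a sketch, the filter amounts to dropping rows whose predicate is `noMatch`. `drop_unmatched` is a hypothetical helper that works on row dictionaries rather than a DataFrame.

```python
from typing import Dict, List

NO_MATCH = "noMatch"  # predicate value that marks an unmatched row, per the docstring


def drop_unmatched(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose predicate_id is not 'noMatch'."""
    return [row for row in rows if row["predicate_id"] != NO_MATCH]
```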

sssom.util.safe_compress(uri, converter)[source]

Parse a CURIE from an IRI.

Parameters:

uri (str) – The URI to parse. If this is already a CURIE, it is returned unchanged.
converter (Converter) – Converter used for compression

Return type:

str

Returns:

A CURIE
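The compression behaviour described above can be sketched without the curies Converter by using a plain prefix map. `compress_or_pass` is hypothetical: it assumes a string is "already a CURIE" when the part before its first colon is a known prefix, prefers the longest matching URI prefix, and raises ValueError when nothing matches (the library's failure behaviour may differ).

```python
from typing import Dict


def compress_or_pass(uri: str, prefix_map: Dict[str, str]) -> str:
    """Return a CURIE for uri using prefix_map (CURIE prefix -> URI prefix)."""
    head, sep, _ = uri.partition(":")
    if sep and head in prefix_map:
        return uri  # already a CURIE with a known prefix
    # Try longer URI prefixes first so nested namespaces compress correctly.
    for curie_prefix, uri_prefix in sorted(
        prefix_map.items(), key=lambda item: len(item[1]), reverse=True
    ):
        if uri.startswith(uri_prefix):
            return f"{curie_prefix}:{uri[len(uri_prefix):]}"
    raise ValueError(f"no prefix in the map matches {uri!r}")
```

With both "GO" (http://purl.obolibrary.org/obo/GO_) and a broader "obo" entry in the map, the longest-match rule yields GO:0008150 rather than obo:GO_0008150.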

sssom.util.sort_df_rows_columns(df, by_columns=True, by_rows=True)[source]

Canonically sort the columns and/or rows of a DataFrame.

Parameters:

df (DataFrame) – Pandas DataFrame with an arbitrary column order.
by_columns (bool) – Boolean flag to sort columns canonically.
by_rows (bool) – Boolean flag to sort rows by column #1 (ascending order).

Return type:

DataFrame

Returns:

Pandas DataFrame with columns (and, if requested, rows) sorted canonically.
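A hypothetical sketch of canonical sorting on a header plus list-of-lists table. `CANONICAL` is an illustrative subset of SSSOM slot names, not the library's full ordering, and placing unknown columns alphabetically after the known ones is an assumption. Rows are then sorted by the first column in ascending order, as the by_rows parameter describes.

```python
from typing import List, Tuple

# Illustrative subset of a canonical column order (assumption).
CANONICAL = [
    "subject_id", "subject_label", "predicate_id",
    "object_id", "object_label", "mapping_justification",
]


def sort_table(
    header: List[str],
    rows: List[List[str]],
    by_columns: bool = True,
    by_rows: bool = True,
) -> Tuple[List[str], List[List[str]]]:
    """Reorder columns canonically, then sort rows by the first column."""
    if by_columns:
        known = [c for c in CANONICAL if c in header]
        extra = sorted(c for c in header if c not in CANONICAL)
        new_header = known + extra
        index = [header.index(c) for c in new_header]
        rows = [[row[i] for i in index] for row in rows]
        header = new_header
    if by_rows:
        rows = sorted(rows, key=lambda row: row[0])
    return header, rows
```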

sssom.util.sort_sssom(df)[source]

Sort an SSSOM DataFrame by its columns.

Parameters:: df (DataFrame) – SSSOM DataFrame to be sorted.
Return type:: DataFrame
Returns:: Sorted SSSOM DataFrame

sssom.util.to_mapping_set_dataframe(doc)[source]

Convert MappingSetDocument into MappingSetDataFrame.

Parameters:: doc (MappingSetDocument) – MappingSetDocument object
Return type:: MappingSetDataFrame
Returns:: MappingSetDataFrame object

sssom.validators module

Validators.

sssom.validators.check_all_prefixes_in_curie_map(msdf, fail_on_error=True)[source]

Check that the prefixes of all EntityReference slots are mentioned in ‘curie_map’.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame
fail_on_error (bool) – If true, raise a ValidationError when there are errors

Raises:

ValidationError – If any prefix is not in curie_map.

Return type:

ValidationReport
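The prefix check reduces to a set difference between the prefixes used in the CURIEs and the keys of the curie_map. In this hypothetical sketch, `find_missing_prefixes` raises ValueError as a stand-in for the library's ValidationError.

```python
from typing import Dict, Iterable, List


def find_missing_prefixes(
    curies: Iterable[str], curie_map: Dict[str, str], fail_on_error: bool = True
) -> List[str]:
    """Return sorted prefixes used in curies that curie_map does not define."""
    missing = sorted({c.partition(":")[0] for c in curies} - set(curie_map))
    if missing and fail_on_error:
        # ValueError stands in for the library's ValidationError here.
        raise ValueError(f"prefixes missing from curie_map: {missing}")
    return missing
```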

sssom.validators.check_strict_curie_format(msdf, fail_on_error=True)[source]

Check that all EntityReference slots are formatted as unambiguous CURIEs.

Implemented rules:

A CURIE must not contain the pipe (“|”) character, so that multivalued processing in TSV works correctly.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame
fail_on_error (bool) – If true, raise a ValidationError when there are errors

Raises:

ValidationError – If any entity reference does not follow the strict CURIE format

Return type:

ValidationReport
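The pipe rule exists because multivalued cells in SSSOM/TSV are pipe-separated. The sketch below, built around the hypothetical helper `find_pipe_curies`, flags offending CURIEs and shows how such a value would be split apart.

```python
from typing import Iterable, List


def find_pipe_curies(curies: Iterable[str]) -> List[str]:
    """Return the CURIEs that pipe-delimited TSV parsing would split apart."""
    return [c for c in curies if "|" in c]


# A multivalued TSV cell is split on "|", so this single CURIE would
# silently become two values.
cell = "EX:a|b"
print(cell.split("|"))  # ['EX:a', 'b']
```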

sssom.validators.print_linkml_report(report, fail_on_error=True)[source]

Print the error messages in the report. Optionally raise an exception.

Parameters:

report (ValidationReport) – A LinkML validation report
fail_on_error (bool) – If true, raise a ValidationError when there are errors

sssom.validators.validate(msdf, validation_types=None, fail_on_error=True)[source]

Validate SSSOM files against sssom-schema using LinkML’s validator function.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame.
validation_types (Optional[List[SchemaValidationType]]) – The SchemaValidationType values to run
fail_on_error (bool) – If true, throw an error when execution of a method has failed

Return type:

dict[SchemaValidationType, ValidationReport]

Returns:

A dictionary from validation types to validation reports

sssom.validators.validate_json_schema(msdf, fail_on_error=True)[source]

Validate against the JSON Schema using LinkML’s JsonSchemaDataValidator.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame to be validated.
fail_on_error (bool) – If true, raise a ValidationError when there are errors

Return type:

ValidationReport

sssom.validators.validate_shacl(msdf, fail_on_error=True)[source]

Validate SHACL file.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame. TODO: https://github.com/linkml/linkml/issues/850
fail_on_error (bool) – If true, raise a ValidationError when there are errors

Raises:

NotImplementedError – Not yet implemented.

Return type:

ValidationReport

sssom.validators.validate_sparql(msdf, fail_on_error=True)[source]

Validate SPARQL file.

Parameters:

msdf (MappingSetDataFrame) – MappingSetDataFrame
fail_on_error (bool) – If true, raise a ValidationError when there are errors

Raises:

NotImplementedError – Not yet implemented.

Return type:

ValidationReport

sssom.writers module

Serialization functions for SSSOM.

sssom.writers.get_writer_function(*, output_format=None, output)[source]

Get appropriate writer function based on file format.

Parameters:

output (TextIO) – Output file
output_format (Optional[str]) – Output file format, defaults to None

Raises:

ValueError – Unknown output format

Return type:

Tuple[Callable[[MappingSetDataFrame, TextIO], None], str]

Returns:

A tuple of the writer function and the corresponding output format string
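The dispatch described above can be sketched as a lookup table keyed by format name. Everything here is hypothetical: the writers take a plain dict rather than a MappingSetDataFrame, only "json" and "tsv" are registered, and inferring the format from the file name (falling back to TSV for streams without an extension) is an assumption about the default behaviour.

```python
import json
from typing import Callable, Dict, List, Optional, TextIO, Tuple

# Hypothetical writer signature: (mapping set as a plain dict, output stream).
Writer = Callable[[Dict, TextIO], None]


def write_json_sketch(mapping_set: Dict, output: TextIO) -> None:
    json.dump(mapping_set, output, indent=2)


def write_tsv_sketch(mapping_set: Dict, output: TextIO) -> None:
    rows: List[Dict[str, str]] = mapping_set["mappings"]
    columns = list(rows[0]) if rows else []
    output.write("\t".join(columns) + "\n")
    for row in rows:
        output.write("\t".join(str(row[c]) for c in columns) + "\n")


WRITERS: Dict[str, Writer] = {"json": write_json_sketch, "tsv": write_tsv_sketch}


def pick_writer(*, output_format: Optional[str] = None, output: TextIO) -> Tuple[Writer, str]:
    """Return (writer, format), inferring the format from the file name if needed."""
    if output_format is None:
        name = str(getattr(output, "name", ""))
        # "mappings.json" -> "json"; streams without a usable name fall back to TSV.
        output_format = name.rsplit(".", 1)[-1] if "." in name else "tsv"
    if output_format not in WRITERS:
        raise ValueError(f"Unknown output format: {output_format}")
    return WRITERS[output_format], output_format
```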

sssom.writers.to_fhir_json(msdf)[source]

Convert a mapping set dataframe to a FHIR ConceptMap JSON object.

Parameters:: msdf (MappingSetDataFrame) – MappingSetDataFrame: Collection of mappings represented as a DataFrame, together with additional metadata.
Return type:: Dict
Returns:: Dict: A dictionary serializable as JSON.

Resources:

ConceptMap::SSSOM mapping spreadsheet:

https://docs.google.com/spreadsheets/d/1J19foBAYO8PCHwOfksaIGjNu-q5ILUKFh2HpOCgYle0/edit#gid=1389897118

TODO: add to CLI and to these functions: r4 vs r5 param.

TODO: What if the msdf doesn’t have everything we need? (i) metadata, e.g. yml; (ii) what if we need to override?

todo: later: allow any nested arbitrary override: (get in kwargs, else metadata.get(key, None))

Minor todos:

todo: mapping_justification: consider ValueString -> ValueCoding https://github.com/timsbiomed/issues/issues/152

todo: when/how to conform to R5 instead of R4?: https://build.fhir.org/conceptmap.html

sssom.writers.to_json(msdf)[source]

Convert a mapping set dataframe to a JSON object.

Return type:: JsonObj

sssom.writers.to_ontoportal_json(msdf)[source]

Convert a mapping set dataframe to a list of ontoportal mapping JSON objects.

Return type:: List[Dict]

sssom.writers.to_owl_graph(msdf)[source]

Convert a mapping set dataframe to OWL in an RDF graph.

Return type:: Graph

sssom.writers.to_rdf_graph(msdf)[source]

Convert a mapping set dataframe to an RDF graph.

Return type:: Graph

sssom.writers.write_fhir_json(msdf, output, serialisation='fhir_json')[source]

Write a mapping set dataframe to the file as FHIR ConceptMap JSON.

Return type:: None

Deprecated since version 0.4.7: Use write_json() instead

sssom.writers.write_json(msdf, output, serialisation='json')[source]

Write a mapping set dataframe to the file as JSON.

Parameters:

msdf (MappingSetDataFrame) – A mapping set dataframe
output (Union[str, Path, TextIO]) – A path or write-supported file object to write JSON to
serialisation –
The JSON format to use. Supported formats are:
- fhir_json: Outputs JSON in FHIR ConceptMap format (https://fhir-ru.github.io/conceptmap.html) https://mapping-commons.github.io/sssom-py/sssom.html#sssom.writers.to_fhir_json
- json: Outputs to SSSOM JSON https://mapping-commons.github.io/sssom-py/sssom.html#sssom.writers.to_json
- ontoportal_json: Outputs JSON in Ontoportal format (https://ontoportal.org/) https://mapping-commons.github.io/sssom-py/sssom.html#sssom.writers.to_ontoportal_json

Return type:

None

sssom.writers.write_ontoportal_json(msdf, output, serialisation='ontoportal_json')[source]

Write a mapping set dataframe to the file as the OntoPortal mapping JSON model.

Return type:: None

Deprecated since version 0.4.7: Use write_json() instead

sssom.writers.write_owl(msdf, file, serialisation='turtle')[source]

Write a mapping set dataframe to the file as OWL.

Return type:: None

sssom.writers.write_rdf(msdf, file, serialisation=None)[source]

Write a mapping set dataframe to the file as RDF.

Return type:: None

sssom.writers.write_table(msdf, file, embedded_mode=True, serialisation='tsv', sort=False)[source]

Write a mapping set dataframe to the file as a table.

Return type:: None
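The embedded_mode flag suggests the mapping-set metadata is written into the same file as the table. The sketch below assumes the SSSOM/TSV convention of "#"-prefixed YAML lines above the header; `write_table_sketch` is hypothetical, handles only flat string metadata, and simply omits the metadata when embedded_mode is False (where the library puts it in that case is not shown here).

```python
import csv
from typing import Dict, List, TextIO


def write_table_sketch(
    metadata: Dict[str, str],
    rows: List[Dict[str, str]],
    file: TextIO,
    embedded_mode: bool = True,
    sort: bool = False,
) -> None:
    """Write mappings as TSV, optionally preceded by '#'-prefixed metadata lines."""
    if embedded_mode:
        # Flat "key: value" lines only; nested metadata such as curie_map
        # would need a real YAML serializer.
        for key, value in metadata.items():
            file.write(f"#{key}: {value}\n")
    columns = list(rows[0]) if rows else []
    if sort:
        rows = sorted(rows, key=lambda row: [row[c] for c in columns])
    writer = csv.DictWriter(file, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```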

sssom.writers.write_tables(sssom_dict, output_dir)[source]

Write tables from a dictionary of MappingSetDataFrame objects.

Parameters:

sssom_dict (Dict[str, MappingSetDataFrame]) – Dictionary of MappingSetDataFrames
output_dir (Union[str, Path]) – The directory in which the derived SSSOM files are written

Return type:

None
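A hypothetical sketch of writing one file per dictionary entry. `write_tables_sketch` takes plain row lists instead of MappingSetDataFrames, and the `<name>.sssom.tsv` file naming is an assumption, not necessarily what the library uses.

```python
import csv
from pathlib import Path
from typing import Dict, List, Union


def write_tables_sketch(
    tables: Dict[str, List[Dict[str, str]]], output_dir: Union[str, Path]
) -> List[Path]:
    """Write each named table to <output_dir>/<name>.sssom.tsv and return the paths."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name, rows in tables.items():
        path = out_dir / f"{name}.sssom.tsv"  # hypothetical naming scheme
        with path.open("w", newline="") as handle:
            columns = list(rows[0]) if rows else []
            writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```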

sssom.writers.write_tsv(msdf, path, embedded_mode=True, sort=False)[source]

Write a mapping set to a TSV file.

Return type:: None

Module contents

sssom-py package.