sssom package
Submodules
sssom.cli module
Command line interface for SSSOM.
Why does this file exist, and why not put this in __main__? You might be tempted to import things from __main__ later, but that will cause problems: the code will get executed twice:
- When you run python3 -m sssom, Python will execute __main__.py as a script. That means there won't be any sssom.__main__ in sys.modules.
- When you import __main__, it will get executed again (as a module) because there's no sssom.__main__ in sys.modules.
sssom.cliques module
Utilities for identifying and working with cliques/SCCs in mappings graphs.
- sssom.cliques.get_src(src, curie)[source]
Get prefix of subject/object in the MappingSetDataFrame.
- Parameters:
src (Optional[str]) – Source
curie (str) – CURIE
- Returns:
Source
- sssom.cliques.group_values(d)[source]
Group all keys in the dictionary that share the same value.
- Return type:
Dict[str, List[str]]
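As a rough illustration of the grouping described above, the following standalone sketch inverts a dictionary so that each distinct value maps to the list of keys that share it. The function name `group_values_sketch` and the sample data are hypothetical, and the assumption that the result is keyed by value is inferred from the description and return type, not from the library source.

```python
from collections import defaultdict
from typing import Dict, List


def group_values_sketch(d: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct value to the list of keys that carry it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in d.items():
        grouped[value].append(key)
    return dict(grouped)


# Hypothetical node -> clique-label assignment.
membership = {"A:1": "clique1", "B:1": "clique1", "C:1": "clique2"}
groups = group_values_sketch(membership)
```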
- sssom.cliques.split_into_cliques(msdf)[source]
Split a MappingSetDataFrame into documents, each corresponding to a strongly connected component of the associated graph.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame object
- Raises:
TypeError – If Mappings is not of type List
TypeError – If each mapping is not of type Mapping
- Return type:
List[MappingSetDocument]
- Returns:
List of MappingSetDocument objects
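To make the notion of a strongly connected component concrete, here is a minimal standalone sketch using Kosaraju's algorithm on a small directed graph. It is not the library's implementation: the function name, the edge list, and the assumption that each mapping contributes a directed subject-to-object edge are all illustrative.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def strongly_connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Return the SCCs of a directed graph given as (source, target) pairs."""
    graph = defaultdict(list)
    reverse = defaultdict(list)
    nodes = set()
    for source, target in edges:
        graph[source].append(target)
        reverse[target].append(source)
        nodes.update((source, target))

    # Pass 1: iterative depth-first search recording the finish order.
    visited: Set[str] = set()
    order: List[str] = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse finish order.
    assigned: Set[str] = set()
    components: List[Set[str]] = []
    for start in reversed(order):
        if start in assigned:
            continue
        component: Set[str] = set()
        todo = [start]
        assigned.add(start)
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


# Hypothetical mappings: A:1 and B:1 point at each other, C:1 is only a target.
components = strongly_connected_components(
    [("A:1", "B:1"), ("B:1", "A:1"), ("B:1", "C:1")]
)
```

In this sketch, A:1 and B:1 land in the same component because each is reachable from the other, while C:1 forms a component on its own.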
sssom.cliquesummary module
sssom.constants module
Constants.
- sssom.constants.MetadataType
The type for metadata that gets passed around in many places
alias of Dict[str, Any]
- class sssom.constants.SEMAPV(value)[source]
Bases:
Enum
SEMAPV enum containing the different mapping_justification values.
See also: https://mapping-commons.github.io/semantic-mapping-vocabulary/#matchingprocess
- CompositeMatching = 'semapv:CompositeMatching'
- CrossSpeciesBroadMatch = 'semapv:crossSpeciesBroadMatch'
- CrossSpeciesExactMatch = 'semapv:crossSpeciesExactMatch'
- CrossSpeciesNarrowMatch = 'semapv:crossSpeciesNarrowMatch'
- LexicalMatching = 'semapv:LexicalMatching'
- LexicalSimilarityThresholdMatching = 'semapv:LexicalSimilarityThresholdMatching'
- LogicalReasoning = 'semapv:LogicalReasoning'
- ManualMappingCuration = 'semapv:ManualMappingCuration'
- MappingChaining = 'semapv:MappingChaining'
- MappingInversion = 'semapv:MappingInversion'
- MappingReview = 'semapv:MappingReview'
- SemanticSimilarityThresholdMatching = 'semapv:SemanticSimilarityThresholdMatching'
- UnspecifiedMatching = 'semapv:UnspecifiedMatching'
- class sssom.constants.SSSOMSchemaView[source]
Bases:
object
SchemaView class from linkml which is instantiated when necessary.
Reason for this: https://github.com/mapping-commons/sssom-py/issues/322 Implemented via PR: https://github.com/mapping-commons/sssom-py/pull/323
- property dict: dict
Return SchemaView as a dictionary.
- property double_slots: Set[str]
Return the slot names for SSSOMSchemaView object.
- property entity_reference_slots: Set[str]
Return set of entity reference slots.
- instance = <sssom.constants.SSSOMSchemaView object>
- property mapping_enum_keys: Set[str]
Return a set of mapping enum keys.
- property mapping_set_slots: List[str]
Return list of mapping set slots.
- property mapping_slots: List[str]
Return list of mapping slots.
- property multivalued_slots: Set[str]
Return set of multivalued slots.
- property slots: Dict[str, str]
Return the slots for SSSOMSchemaView object.
- property view: SchemaView
Return SchemaView object.
- class sssom.constants.SchemaValidationType(value)[source]
Bases:
str, Enum
Schema validation types.
- JsonSchema = 'JsonSchema'
- PrefixMapCompleteness = 'PrefixMapCompleteness'
- Shacl = 'Shacl'
- Sparql = 'Sparql'
- StrictCurieFormat = 'StrictCurieFormat'
- __format__(format_spec)
Returns format using actual value type unless __str__ has been overridden.
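Because SchemaValidationType derives from both str and Enum, its members can be looked up from plain strings and compared with them directly. The sketch below demonstrates that behaviour on a hypothetical stand-in class, `ValidationKind`, which copies two of the members listed above; it does not import the library.

```python
from enum import Enum


class ValidationKind(str, Enum):
    """Illustrative stand-in mirroring two members listed above."""

    JsonSchema = "JsonSchema"
    PrefixMapCompleteness = "PrefixMapCompleteness"


# Look a member up from a plain string, e.g. one typed on the command line.
selected = ValidationKind("JsonSchema")

# Because the enum also subclasses str, members compare equal to their values.
same_as_string = selected == "JsonSchema"
```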
- sssom.constants.get_default_metadata()[source]
Get default metadata.
- Return type:
Dict[str, Any]
- Returns:
A metadata dictionary containing a default license with value DEFAULT_LICENSE and an auto-generated mapping set ID
If you want to combine some metadata you loaded but ensure that there is also default metadata, the best tool is collections.ChainMap. You can do:

    my_metadata: dict | None = ...

    from collections import ChainMap
    from sssom import get_default_metadata

    metadata = dict(ChainMap(
        my_metadata or {},
        get_default_metadata()
    ))
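The precedence rule behind the ChainMap recipe can be checked without the library. In the sketch below, `fake_default_metadata` is a hypothetical stand-in for get_default_metadata(), and all URLs and keys are placeholder values: entries in the first mapping win, and the defaults only fill in what is missing.

```python
from collections import ChainMap
from typing import Any, Dict


def fake_default_metadata() -> Dict[str, Any]:
    """Hypothetical stand-in for get_default_metadata(); values are placeholders."""
    return {
        "license": "https://example.org/default-license",
        "mapping_set_id": "https://example.org/generated-id",
    }


# Metadata loaded by the user: it overrides the license only.
my_metadata = {"license": "https://example.org/my-license"}

# Earlier maps in the chain take precedence over later ones.
metadata = dict(ChainMap(my_metadata or {}, fake_default_metadata()))
```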
sssom.context module
Utilities for loading JSON-LD contexts.
- sssom.context.ConverterHint
A type hint that specifies a place where one of three options can be given:
1. a legacy prefix mapping dictionary can be given, which will get upgraded into a curies.Converter,
2. a converter can be given, which might get modified. In SSSOM-py, this typically means chaining behind the “default” prefix map
3. None, which means a default converter is loaded
alias of None | Mapping[str, str] | Converter
- sssom.context.ensure_converter(prefix_map=None, *, use_defaults=True)[source]
Ensure a converter is available.
- Parameters:
prefix_map (Union[None, Mapping[str, str], Converter]) – One of the following:
1. An empty dictionary or None. This results in using the default extended prefix map (currently based on a variant of the Bioregistry) if use_defaults is set to true, otherwise just the builtin prefix map including the prefixes in SSSOM_BUILT_IN_PREFIXES
2. A non-empty dictionary representing a prefix map. This is loaded as a converter with Converter.from_prefix_map(). It is chained behind the builtin prefix map to ensure none of the SSSOM_BUILT_IN_PREFIXES are overwritten with non-default values
3. A pre-instantiated curies.Converter. Similarly to a prefix map passed into this function, this is chained behind the builtin prefix map
use_defaults (bool) – If an empty dictionary or None is passed to this function, this parameter chooses if the extended prefix map (currently based on a variant of the Bioregistry) gets loaded.
- Return type:
Converter
- Returns:
A re-usable converter
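The idea of chaining a user prefix map "behind" the builtin one can be pictured with plain dictionaries. This is a simplified sketch only: `BUILT_IN` is an illustrative subset rather than the real SSSOM_BUILT_IN_PREFIXES, `chain_behind_builtin` is a hypothetical helper, and a real curies.Converter also tracks URI prefixes and synonyms, which this sketch ignores.

```python
from typing import Dict, Mapping, Optional

# Illustrative subset; the real SSSOM_BUILT_IN_PREFIXES may differ.
BUILT_IN: Dict[str, str] = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def chain_behind_builtin(prefix_map: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Merge a user prefix map with the builtin one; builtin entries win."""
    merged = dict(prefix_map or {})
    merged.update(BUILT_IN)  # builtin definitions overwrite conflicting user ones
    return merged


# A user map that tries to redefine a builtin prefix and adds a new one.
custom = {"owl": "http://example.org/not-owl#", "ex": "http://example.org/"}
effective = chain_behind_builtin(custom)
```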
sssom.io module
I/O utilities for SSSOM.
- sssom.io.annotate_file(input, output=None, replace_multivalued=False, **kwargs)[source]
Annotate a file, i.e. add custom metadata to the mapping set.
- Parameters:
input (str) – SSSOM TSV file to be queried over.
output (Optional[TextIO]) – Output location.
replace_multivalued (bool) – Whether multivalued slots should be replaced, defaults to False
kwargs – Options provided by user which are added to the metadata (e.g.: --mapping_set_id http://example.org/abcd)
- Return type:
MappingSetDataFrame
- Returns:
Annotated MappingSetDataFrame object.
- sssom.io.convert_file(input_path, output, output_format=None)[source]
Convert a file from one format to another.
- Parameters:
input_path (str) – The path to the input SSSOM TSV file
output (TextIO) – The path to the output file. If none is given, will default to using stdout.
output_format (Optional[str]) – The format to which the SSSOM TSV should be converted.
- Return type:
None
- sssom.io.extract_iris(input, converter)[source]
Recursively extracts a list of IRIs from a string or file.
- Parameters:
input (Union[str, Path, Iterable[Union[str, Path]]]) – CURIE OR list of CURIEs OR file path containing the same.
converter (Converter) – Prefix map of mapping set (possibly) containing custom prefix:IRI combination.
- Return type:
List[str]
- Returns:
A list of IRIs.
- sssom.io.filter_file(input, output=None, **kwargs)[source]
Filter a dataframe by dynamically generating queries based on user input.
e.g. sssom filter --subject_id x:% --subject_id y:% --object_id y:% --object_id z:% tests/data/basic.tsv
yields the query:
"SELECT * FROM df WHERE (subject_id LIKE 'x:%' OR subject_id LIKE 'y:%') AND (object_id LIKE 'y:%' OR object_id LIKE 'z:%')" and displays the output.
- Parameters:
input (str) – DataFrame to be queried over.
output (Optional[TextIO]) – Output location.
kwargs – Filter options provided by user which generate queries (e.g.: --subject_id x:%).
- Raises:
ValueError – If parameter provided is invalid.
- Return type:
MappingSetDataFrame
- Returns:
Filtered MappingSetDataFrame object.
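The query shown for filter_file follows a simple pattern: values given for the same column are joined with OR, and the per-column groups are joined with AND. The sketch below reproduces that pattern in isolation. `build_filter_query` is a hypothetical helper and not part of the library, and it interpolates strings purely for illustration, which is not safe for untrusted input.

```python
from typing import Dict, List


def build_filter_query(filters: Dict[str, List[str]], table: str = "df") -> str:
    """Build a SELECT statement: OR within a column, AND across columns."""
    clauses = []
    for column, patterns in filters.items():
        alternatives = " OR ".join(f"{column} LIKE '{pattern}'" for pattern in patterns)
        clauses.append(f"({alternatives})")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)


# The same options as in the command-line example above.
query = build_filter_query(
    {"subject_id": ["x:%", "y:%"], "object_id": ["y:%", "z:%"]}
)
```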
- sssom.io.get_metadata_and_prefix_map(metadata_path=None, *, prefix_map_mode=None)[source]
Load metadata and a prefix map in a deprecated way.
- Return type:
Tuple[Converter, Dict[str, Any]]
Deprecated since version 0.4.3: This functionality for loading SSSOM metadata from a YAML file is deprecated from the public API since it has internal assumptions which are usually not valid for downstream users.
- sssom.io.parse_file(input_path, output, *, input_format=None, metadata_path=None, prefix_map_mode=None, clean_prefixes=True, strict_clean_prefixes=True, embedded_mode=True, mapping_predicate_filter=None)[source]
Parse an SSSOM metadata file and write to a table.
- Parameters:
input_path (str) – The path to the input file in one of the legal formats, e.g. obographs, alignmentapi-xml
output (TextIO) – The path to the output file.
input_format (Optional[str]) – The string denoting the input format.
metadata_path (Optional[str]) – The path to a file containing the sssom metadata (including prefix_map) to be used during parse.
prefix_map_mode (Optional[Literal['metadata_only', 'sssom_default_only', 'merged']]) – Defines whether the prefix map in the metadata should be extended or replaced with the SSSOM default prefix map derived from the bioregistry.
clean_prefixes (bool) – If True (default), records with unknown prefixes are removed from the SSSOM file.
strict_clean_prefixes (bool) – If True (default), clean_prefixes() will be in strict mode.
embedded_mode – If True (default), the dataframe and metadata are exported in one file (tsv), else two separate files (tsv and yaml).
mapping_predicate_filter (Optional[tuple]) – Optional list of mapping predicates or filepath containing the same.
- Return type:
None
- sssom.io.run_sql_query(query, inputs, output=None)[source]
Run a SQL query over one or more SSSOM files.
Each of the N inputs is assigned a table name df1, df2, …, dfN
Alternatively, the filenames can be used as table names. These are first stemmed, e.g. ~/dir/my.sssom.tsv becomes a table called 'my'.
- Example:
sssom dosql -Q "SELECT * FROM df1 WHERE confidence>0.5 ORDER BY confidence" my.sssom.tsv
- Example:
sssom dosql -Q "SELECT file1.*,file2.object_id AS ext_object_id, file2.object_label AS ext_object_label FROM file1 INNER JOIN file2 WHERE file1.object_id = file2.subject_id" FROM file1.sssom.tsv file2.sssom.tsv
- Parameters:
query (str) – Query to be executed over a pandas DataFrame (msdf.df).
inputs (List[str]) – Input files that form the source tables for query.
output (Optional[TextIO]) – Output.
- Return type:
MappingSetDataFrame
- Returns:
Filtered MappingSetDataFrame object.
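The two table-naming schemes described for run_sql_query (positional names df1 to dfN, and stemmed filenames) can be sketched as follows. `table_names` is a hypothetical helper written for this illustration, and the stemming rule used here (everything before the first dot of the file name) is an assumption based on the single example given above.

```python
from pathlib import Path
from typing import Dict, List


def table_names(inputs: List[str]) -> Dict[str, str]:
    """Map every table name usable in a query to the file it refers to."""
    names: Dict[str, str] = {}
    for index, path in enumerate(inputs, start=1):
        # Positional name: df1, df2, ..., dfN
        names[f"df{index}"] = path
        # Stemmed file name: "~/dir/my.sssom.tsv" -> "my"
        names[Path(path).name.split(".")[0]] = path
    return names


tables = table_names(["~/dir/my.sssom.tsv", "other.sssom.tsv"])
```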
- sssom.io.split_file(input_path, output_directory)[source]
Split an SSSOM TSV by prefixes and relations.
- Parameters:
input_path (str) – The path to the input file in one of the legal formats, e.g. obographs, alignmentapi-xml
output_directory (Union[str, Path]) – The directory to which the split file should be exported.
- Return type:
None
- sssom.io.validate_file(input_path, validation_types)[source]
Validate the incoming SSSOM TSV according to the SSSOM specification.
- Parameters:
input_path (str) – The path to the input file in one of the legal formats, e.g. obographs, alignmentapi-xml
validation_types (List[SchemaValidationType]) – A list of validation types to run.
- Return type:
None
sssom.parsers module
SSSOM parsers.
- sssom.parsers.from_alignment_minidom(dom, prefix_map=None, meta=None, mapping_predicates=None)[source]
Read a minidom Document object.
- Parameters:
dom (Document) – XML (minidom) object
prefix_map (Union[None, Mapping[str, str], Converter]) – A prefix map
meta (Optional[Dict[str, Any]]) – Optional metadata
mapping_predicates (Optional[List[str]]) – Optional list of mapping predicates to extract
- Return type:
- Returns:
MappingSetDocument
- Raises:
ValueError – If the alignment format's xml element is present but not set to yes. Only XML is supported!
- sssom.parsers.from_obographs(jsondoc, prefix_map=None, meta=None, mapping_predicates=None)[source]
Convert an obographs JSON object to an SSSOM data frame.
- Parameters:
jsondoc (Dict) – The JSON object representing the ontology in obographs format
prefix_map (Union[None, Mapping[str, str], Converter]) – The prefix map to be used
meta (Optional[Dict[str, Any]]) – Any additional metadata that needs to be added to the resulting SSSOM data frame, defaults to None
mapping_predicates (Optional[List[str]]) – Optional list of mapping predicates to extract
- Raises:
Exception – When there is no CURIE
- Return type:
MappingSetDataFrame
- Returns:
An SSSOM data frame (MappingSetDataFrame)
- sssom.parsers.from_sssom_dataframe(df, prefix_map=None, meta=None)[source]
Convert a dataframe to a MappingSetDataFrame.
- Parameters:
df (DataFrame) – A mappings dataframe
prefix_map (Union[None, Mapping[str, str], Converter]) – A prefix map
meta (Optional[Dict[str, Any]]) – A metadata dictionary
- Return type:
MappingSetDataFrame
- Returns:
MappingSetDataFrame
- sssom.parsers.from_sssom_json(jsondoc, prefix_map=None, meta=None)[source]
Load a mapping set dataframe from a JSON object.
- Parameters:
jsondoc (Union[str, dict, TextIO]) – JSON document
prefix_map (Union[None, Mapping[str, str], Converter]) – Prefix map
meta (Optional[Dict[str, Any]]) – metadata used to augment the metadata existing in the mapping set
- Return type:
MappingSetDataFrame
- Returns:
MappingSetDataFrame object
- sssom.parsers.from_sssom_rdf(g, prefix_map=None, meta=None)[source]
Convert an SSSOM RDF graph into an SSSOM data table.
- Parameters:
g (Graph) – the Graph (rdflib)
prefix_map (Union[None, Mapping[str, str], Converter]) – A dictionary containing the prefix map, defaults to None
meta (Optional[Dict[str, Any]]) – Potentially additional metadata, defaults to None
- Return type:
MappingSetDataFrame
- Returns:
MappingSetDataFrame object
- sssom.parsers.get_parsing_function(input_format, filename)[source]
Return appropriate parser function based on input format of file.
- Parameters:
input_format (Optional[str]) – File format
filename (str) – Filename
- Raises:
Exception – Unknown file format
- Return type:
Callable
- Returns:
Appropriate ‘read’ function
- sssom.parsers.parse_alignment_xml(file_path, prefix_map=None, meta=None, mapping_predicates=None)[source]
Parse a TSV -> MappingSetDocument -> MappingSetDataFrame.
- Return type:
MappingSetDataFrame
- sssom.parsers.parse_obographs_json(file_path, prefix_map=None, meta=None, mapping_predicates=None)[source]
Parse an obographs file as a JSON object and translate it into a MappingSetDataFrame.
- Parameters:
file_path (str) – The path to the obographs file
prefix_map (Union[None, Mapping[str, str], Converter]) – an optional prefix map
meta (Optional[Dict[str, Any]]) – an optional dictionary of metadata elements
mapping_predicates (Optional[List[str]]) – an optional list of mapping predicates that should be extracted
- Return type:
MappingSetDataFrame
- Returns:
An SSSOM MappingSetDataFrame
- sssom.parsers.parse_sssom_json(file_path, prefix_map=None, meta=None, **kwargs)[source]
Parse a TSV to a MappingSetDocument to a MappingSetDataFrame.
- Return type:
MappingSetDataFrame
- sssom.parsers.parse_sssom_rdf(file_path, prefix_map=None, meta=None, serialisation='turtle', **kwargs)[source]
Parse a TSV to a MappingSetDocument to a MappingSetDataFrame.
- Return type:
MappingSetDataFrame
- sssom.parsers.parse_sssom_table(file_path, prefix_map=None, meta=None, **kwargs)[source]
Parse a TSV to a MappingSetDocument to a MappingSetDataFrame.
- Return type:
MappingSetDataFrame
- sssom.parsers.split_dataframe(msdf)[source]
Group the mapping set dataframe into several subdataframes by prefix.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame object
- Raises:
RuntimeError – DataFrame object within MappingSetDataFrame is None
- Return type:
Dict[str, MappingSetDataFrame]
- Returns:
Mapping object
- sssom.parsers.split_dataframe_by_prefix(msdf, subject_prefixes, object_prefixes, relations)[source]
Split a mapping set dataframe by prefix.
- Parameters:
msdf (MappingSetDataFrame) – An SSSOM MappingSetDataFrame
subject_prefixes (Iterable[str]) – a list of prefixes pertaining to the subject
object_prefixes (Iterable[str]) – a list of prefixes pertaining to the object
relations (Iterable[str]) – a list of relations of interest
- Return type:
Dict[str, MappingSetDataFrame]
- Returns:
a dict of SSSOM data frame names to MappingSetDataFrame
sssom.rdf_util module
Rewriting functionality for RDFlib graphs.
sssom.sparql_util module
Utilities for querying mappings with SPARQL.
- class sssom.sparql_util.EndpointConfig(url, graph, converter, predmap, predicates, limit, include_object_labels=False)[source]
Bases:
object
A container for a SPARQL endpoint’s configuration.
- converter: Converter
- graph: URIRef
- include_object_labels: bool = False
- limit: Optional[int]
- predicates: Optional[List[str]]
- predmap: Dict[str, str]
- url: str
sssom.sssom_document module
Additional SSSOM object models.
- class sssom.sssom_document.MappingSetDocument(mapping_set, converter)[source]
Bases:
object
Represents a single SSSOM document.
A document is simply a holder for a MappingSet object plus a CURIE map
- converter: Converter
- mapping_set: MappingSet
The main part of the document: a set of mappings plus metadata
- property prefix_map: Dict[str, str]
Get a prefix map.
sssom.util module
Utility functions.
- class sssom.util.EntityPair(subject_entity, object_entity)[source]
Bases:
object
A tuple of entities.
Note that (e1,e2) == (e2,e1)
- object_entity: Uriorcurie
- subject_entity: Uriorcurie
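The note that (e1,e2) == (e2,e1) describes an order-insensitive pair. One way to get that behaviour is to compare and hash on a sorted key, as in the sketch below. `SymmetricPair` is a hypothetical class written for illustration and is not the library's EntityPair implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True, eq=False)
class SymmetricPair:
    """A pair of entity identifiers whose order does not matter."""

    subject_entity: str
    object_entity: str

    def _key(self) -> Tuple[str, ...]:
        # Sorting makes (a, b) and (b, a) produce the same key.
        return tuple(sorted((self.subject_entity, self.object_entity)))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SymmetricPair):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        return hash(self._key())


forward = SymmetricPair("A:1", "B:1")
backward = SymmetricPair("B:1", "A:1")
unique_pairs = {forward, backward}
```

Defining `__hash__` consistently with `__eq__` is what lets such pairs be used in sets and as dictionary keys, which matters for grouping mappings by entity pair.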
- sssom.util.KEY_FEATURES = ['subject_id', 'predicate_id', 'object_id', 'predicate_modifier']
The 4 columns whose combination would be used as primary keys while merging/grouping
- class sssom.util.MappingSetDataFrame(df, converter=<factory>, metadata=<factory>)[source]
Bases:
object
A collection of mappings represented as a DataFrame, together with additional metadata.
- clean_prefix_map(strict=True)[source]
Remove unused prefixes from the internal prefix map based on the internal dataframe.
- Parameters:
strict (
bool
) – Boolean if True, errors out if all prefixes in dataframe are not listed in the ‘curie_map’.- Raises:
ValueError – If prefixes absent in ‘curie_map’ and strict flag = True
- Return type:
None
-
converter:
Converter
-
df:
DataFrame
- classmethod from_mapping_set(mapping_set, *, converter=None)[source]
Instantiate from a mapping set and an optional converter.
- Parameters:
mapping_set (MappingSet) – A mapping set
converter (Union[None, Mapping[str, str], Converter]) – A prefix map or pre-instantiated converter. If none given, uses a default prefix map derived from the Bioregistry.
- Return type:
MappingSetDataFrame
- Returns:
A mapping set dataframe
- classmethod from_mapping_set_document(doc)[source]
Instantiate from a mapping set document.
- Return type:
MappingSetDataFrame
- classmethod from_mappings(mappings, *, converter=None, metadata=None)[source]
Instantiate from a list of mappings, mapping set metadata, and an optional converter.
- Return type:
MappingSetDataFrame
- merge(*msdfs, inplace=True)[source]
Merge one or more MappingSetDataFrames with this one.
- Parameters:
msdfs (MappingSetDataFrame) – One or more MappingSetDataFrame(s) to merge with self
inplace (bool) – If true, the given MappingSetDataFrames are merged into the calling MappingSetDataFrame; if false, the merged data frame is simply returned.
- Return type:
MappingSetDataFrame
- Returns:
Merged MappingSetDataFrame
- metadata: Dict[str, Any]
- property prefix_map
Get a simple, bijective prefix map.
- remove_mappings(msdf)[source]
Remove mappings in right msdf from left msdf.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame object to be removed from primary msdf object.
- Return type:
None
- standardize_references()[source]
Standardize this MSDF’s dataframe and metadata with respect to its converter.
- Return type:
None
- class sssom.util.MappingSetDiff(unique_tuples1=None, unique_tuples2=None, common_tuples=None, combined_dataframe=None)[source]
Bases:
object
Represents a difference between two mapping sets.
Currently this is limited to diffs at the level of entity-pairs. For example, if file1 has A owl:equivalentClass B, and file2 has A skos:closeMatch B, this is considered a mapping in common.
- combined_dataframe: Optional[DataFrame] = None
Dataframe that combines the left and right dataframes, with information injected into the comment column
- common_tuples: Optional[Set[EntityPair]] = None
- unique_tuples1: Optional[Set[EntityPair]] = None
- unique_tuples2: Optional[Set[EntityPair]] = None
- sssom.util.add_default_confidence(df, confidence=nan)[source]
Add a confidence column to the DataFrame if absent and initialize it to the given confidence value (NaN by default, per the signature).
If the confidence column already exists, only fill in the None entries with that value.
- Parameters:
df (DataFrame) – DataFrame whose confidence column needs to be filled.
- Return type:
DataFrame
- Returns:
DataFrame with a complete confidence column.
- sssom.util.are_params_slots(params)[source]
Check if parameters conform to the slots in MAPPING_SET_SLOTS.
- Parameters:
params (dict) – Dictionary of parameters.
- Raises:
ValueError – If params are not slots.
- Return type:
bool
- Returns:
True/False
- sssom.util.assign_default_confidence(df)[source]
Assign numpy.nan to confidence values that are blank.
- Parameters:
df (DataFrame) – SSSOM DataFrame
- Return type:
Tuple[DataFrame, DataFrame]
- Returns:
A Tuple consisting of the original DataFrame and a dataframe consisting of empty confidence values.
- sssom.util.augment_metadata(msdf, meta, replace_multivalued=False)[source]
Augment metadata with parameters passed.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame (MSDF) object.
meta (dict) – Dictionary that needs to be added/updated to the metadata of the MSDF.
replace_multivalued (bool) – Whether multivalued slots should be replaced, defaults to False.
- Raises:
ValueError – If type of slot is neither str nor list.
- Return type:
MappingSetDataFrame
- Returns:
MSDF with updated metadata.
- sssom.util.collapse(df)[source]
Collapse rows with same S/P/O and combines confidence.
- Return type:
DataFrame
- sssom.util.compare_dataframes(df1, df2)[source]
Perform a diff between two SSSOM dataframes.
- Parameters:
df1 (DataFrame) – A mapping dataframe
df2 (DataFrame) – A mapping dataframe
- Return type:
MappingSetDiff
- Returns:
A mapping set diff
Warning: Currently does not discriminate between mappings with different predicates.
- sssom.util.create_entity(identifier, mappings)[source]
Create an Entity object.
- Parameters:
identifier (str) – Entity Id
mappings (Dict[str, Any]) – Mapping dictionary
- Return type:
Uriorcurie
- Returns:
An Entity object
- sssom.util.dataframe_to_ptable(df, *, inverse_factor=None, default_confidence=None)[source]
Export a KBOOM table.
- Parameters:
df (DataFrame) – Pandas DataFrame
inverse_factor (Optional[float]) – Multiplier to (1 - confidence), defaults to 0.5
default_confidence (Optional[float]) – Default confidence to be assigned if absent.
- Raises:
ValueError – Predicate value error
ValueError – Predicate type value error
- Returns:
List of rows
- sssom.util.deal_with_negation(df)[source]
Combine negative and positive rows with matching [SUBJECT_ID, OBJECT_ID, CONFIDENCE] combination.
Rule: negative trumps positive if modulus of confidence values are equal.
- Parameters:
df (DataFrame) – Merged Pandas DataFrame
- Return type:
DataFrame
- Returns:
Pandas DataFrame with negations addressed
- Raises:
ValueError – If the dataframe is None after assigning default confidence
- sssom.util.filter_out_prefixes(df, filter_prefixes, features=None, require_all_prefixes=False)[source]
Filter out rows which contain a CURIE with a prefix in the filter_prefixes list.
- Parameters:
df (DataFrame) – Pandas DataFrame of SSSOM Mapping
filter_prefixes (List[str]) – List of prefixes
features (Optional[list]) – List of dataframe column names to consider
require_all_prefixes (bool) – If True, all prefixes must be present in a row to be filtered out
- Return type:
DataFrame
- Returns:
Pandas DataFrame
- sssom.util.filter_prefixes(df, filter_prefixes, features=None, require_all_prefixes=True)[source]
Filter out rows which do NOT contain a CURIE with a prefix in the filter_prefixes list.
- Parameters:
df (DataFrame) – Pandas DataFrame of SSSOM Mapping
filter_prefixes (List[str]) – List of prefixes
features (Optional[list]) – List of dataframe column names to consider
require_all_prefixes (bool) – If True, all prefixes must be present in a row to be filtered out
- Return type:
DataFrame
- Returns:
Pandas DataFrame
- sssom.util.filter_redundant_rows(df, ignore_predicate=False)[source]
Remove rows if there is another row with same S/O and higher confidence.
- Parameters:
df (DataFrame) – Pandas DataFrame to filter
ignore_predicate (bool) – If true, the predicate_id column is ignored, defaults to False
- Return type:
DataFrame
- Returns:
Filtered pandas DataFrame
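The redundancy rule above can be illustrated on plain dictionaries instead of a DataFrame. This sketch is not the library's implementation: `drop_lower_confidence` is a hypothetical helper, and grouping on subject, predicate and object when ignore_predicate is False is an assumption inferred from the parameter description.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def drop_lower_confidence(rows: List[Row], ignore_predicate: bool = False) -> List[Row]:
    """Keep only rows that carry the highest confidence within their group."""

    def key(row: Row) -> Tuple[str, ...]:
        if ignore_predicate:
            return (row["subject_id"], row["object_id"])
        return (row["subject_id"], row["predicate_id"], row["object_id"])

    # Record the best confidence seen for every group.
    best: Dict[Tuple[str, ...], float] = {}
    for row in rows:
        group = key(row)
        best[group] = max(best.get(group, row["confidence"]), row["confidence"])

    # A row survives only if no other row in its group beats it.
    return [row for row in rows if row["confidence"] >= best[key(row)]]


# Hypothetical mappings between the same subject and object.
rows = [
    {"subject_id": "A:1", "predicate_id": "skos:exactMatch", "object_id": "B:1", "confidence": 0.9},
    {"subject_id": "A:1", "predicate_id": "skos:exactMatch", "object_id": "B:1", "confidence": 0.5},
    {"subject_id": "A:1", "predicate_id": "skos:closeMatch", "object_id": "B:1", "confidence": 0.7},
]
kept = drop_lower_confidence(rows)
kept_ignoring_predicate = drop_lower_confidence(rows, ignore_predicate=True)
```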
- sssom.util.get_all_prefixes(msdf)[source]
Fetch all prefixes in the MappingSetDataFrame.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame
- Raises:
ValidationError – If slot is wrong.
- Return type:
Set[str]
- Returns:
Set of all prefixes.
- sssom.util.get_dict_from_mapping(map_obj)[source]
Get information for linkml objects (MatchTypeEnum, PredicateModifierEnum) from the Mapping object and return the dictionary form of the object.
- Parameters:
map_obj (Union[Any, Dict[Any, Any], Mapping]) – Mapping object
- Return type:
dict
- Returns:
Dictionary
- sssom.util.get_file_extension(file)[source]
Get file extension.
- Parameters:
file (Union[str, Path, TextIO]) – File path
- Return type:
str
- Returns:
format of the file passed, default tsv
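A minimal sketch of the "extension, defaulting to tsv" behaviour, restricted to path-like inputs. `extension_or_tsv` is a hypothetical helper, not the library function, and it does not handle the TextIO case that the real signature accepts.

```python
from pathlib import Path
from typing import Union


def extension_or_tsv(file: Union[str, Path]) -> str:
    """Return the last file extension without the dot, or 'tsv' if there is none."""
    suffix = Path(file).suffix  # e.g. ".tsv" for "my.sssom.tsv"
    return suffix[1:] if suffix else "tsv"


from_tsv = extension_or_tsv("my.sssom.tsv")
from_json = extension_or_tsv(Path("mappings.json"))
without_extension = extension_or_tsv("mappings")
```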
- sssom.util.get_prefixes_used_in_metadata(meta)[source]
Get a set of prefixes used in CURIEs in the metadata.
- Return type:
Set[str]
- sssom.util.get_prefixes_used_in_table(df, converter)[source]
Get a set of prefixes used in CURIEs in key feature columns in a dataframe.
- Return type:
Set[str]
- sssom.util.get_row_based_on_hierarchy(df)[source]
Get row based on hierarchy of predicates.
The hierarchy is as follows:
1. owl:equivalentClass
2. owl:equivalentProperty
3. rdfs:subClassOf
4. rdfs:subPropertyOf
5. owl:sameAs
6. skos:exactMatch
7. skos:closeMatch
8. skos:broadMatch
9. skos:narrowMatch
10. oboInOwl:hasDbXref
11. skos:relatedMatch
12. rdfs:seeAlso
- Parameters:
df (DataFrame) – Dataframe containing multiple predicates for same subject and object.
- Return type:
DataFrame
- Returns:
Dataframe with a single row which ranks higher in the hierarchy.
- Raises:
KeyError – if no rows are available
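Selecting by predicate hierarchy amounts to picking the candidate that appears earliest in the ordered list. The sketch below works on a list of predicate strings in place of DataFrame rows. `pick_highest_ranked` is a hypothetical helper; raising KeyError when nothing matches mirrors the documented exception, but the exact conditions in the library may differ.

```python
from typing import Iterable, List

# Order copied from the hierarchy documented above (highest rank first).
PREDICATE_HIERARCHY: List[str] = [
    "owl:equivalentClass",
    "owl:equivalentProperty",
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "owl:sameAs",
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
    "oboInOwl:hasDbXref",
    "skos:relatedMatch",
    "rdfs:seeAlso",
]


def pick_highest_ranked(predicates: Iterable[str]) -> str:
    """Return the candidate predicate that ranks highest in the hierarchy."""
    candidates = set(predicates)
    for predicate in PREDICATE_HIERARCHY:
        if predicate in candidates:
            return predicate
    raise KeyError("no candidate predicate appears in the hierarchy")


chosen = pick_highest_ranked(["skos:closeMatch", "owl:sameAs", "rdfs:seeAlso"])
```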
- sssom.util.group_mappings(df)[source]
Group mappings by EntityPairs.
- Return type:
Dict[EntityPair, List[Series]]
- sssom.util.inject_metadata_into_df(msdf)[source]
Inject metadata dictionary key-value pair into DataFrame columns in a MappingSetDataFrame.DataFrame.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame with metadata separate.
- Return type:
MappingSetDataFrame
- Returns:
MappingSetDataFrame with metadata as columns
- sssom.util.invert_mappings(df, subject_prefix=None, merge_inverted=True, update_justification=True, predicate_invert_dictionary=None)[source]
Switch subjects and objects based on their prefixes and adjust predicates accordingly.
- Parameters:
df (DataFrame) – Pandas dataframe.
subject_prefix (Optional[str]) – Prefix of subjects desired.
merge_inverted (bool) – If True (default), add the inverted dataframe to the input; otherwise, just return the inverted data.
update_justification (bool) – If True (default), the justification is updated to “semapv:MappingInversion”, else it is left as it is.
predicate_invert_dictionary (Optional[dict]) – YAML file providing the inverse mapping for predicates.
- Return type:
DataFrame
- Returns:
Pandas dataframe with all subject IDs having the same prefix.
- sssom.util.is_multivalued_slot(slot)[source]
Check whether the slot is multivalued according to the SSSOM specification.
- Parameters:
slot (str) – Slot name
- Return type:
bool
- Returns:
Whether the slot is multivalued
- sssom.util.merge_msdf(*msdfs, reconcile=False)[source]
Merge multiple MappingSetDataFrames into one.
- Parameters:
msdfs (MappingSetDataFrame) – A Tuple of MappingSetDataFrames to be merged
reconcile (bool) – If reconcile=True, then dedupe (remove redundant lower confidence mappings) and reconcile (if msdf contains a higher confidence _negative_ mapping, then remove lower confidence positive one. If confidence is the same, prefer HumanCurated. If both HumanCurated, prefer negative mapping). Defaults to False.
- Return type:
MappingSetDataFrame
- Returns:
Merged MappingSetDataFrame.
- sssom.util.pandas_set_no_silent_downcasting(no_silent_downcasting=True)[source]
Set pandas future.no_silent_downcasting option. Context https://github.com/pandas-dev/pandas/issues/57734.
- sssom.util.raise_for_bad_path(file_path)[source]
Raise exception if file path is invalid.
- Parameters:
file_path (Union[str, Path]) – File path
- Raises:
FileNotFoundError – Invalid file path
- Return type:
None
- sssom.util.reconcile_prefix_and_data(msdf, prefix_reconciliation)[source]
Reconcile the prefix_map and translate the switched CURIEs in the dataframe.
- Parameters:
msdf (MappingSetDataFrame) – Mapping Set DataFrame.
prefix_reconciliation (dict) – Prefix reconciliation dictionary from a YAML file
- Return type:
MappingSetDataFrame
- Returns:
Mapping Set DataFrame with reconciled prefix_map and data.
This method is built on curies.remap_curie_prefixes() and curies.rewire(). Note that if you want to overwrite a CURIE prefix in the Bioregistry extended prefix map, you need to provide a place for the old one to go as in {"geo": "ncbi.geo", "geogeo": "geo"}. Just doing {"geogeo": "geo"} would not work since geo already exists.
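The reason the old prefix needs "a place to go" can be seen with a dictionary-only sketch of simultaneous renaming. `remap_prefixes` is a hypothetical helper and the URI prefixes are placeholders. The sketch raises on a collision to make the conflict visible, and the real curies functions may handle that situation differently.

```python
from typing import Dict


def remap_prefixes(prefix_map: Dict[str, str], remapping: Dict[str, str]) -> Dict[str, str]:
    """Rename prefixes simultaneously, refusing to map two entries to one name."""
    result: Dict[str, str] = {}
    for prefix, uri_prefix in prefix_map.items():
        new_prefix = remapping.get(prefix, prefix)
        if new_prefix in result:
            raise ValueError(f"prefix {new_prefix!r} would be defined twice")
        result[new_prefix] = uri_prefix
    return result


# Placeholder URI prefixes, for illustration only.
prefix_map = {
    "geo": "https://example.org/ncbi-geo/",
    "geogeo": "https://example.org/geogeo/",
}

# Moving the old "geo" out of the way first frees the name for "geogeo".
remapped = remap_prefixes(prefix_map, {"geo": "ncbi.geo", "geogeo": "geo"})
```

With only `{"geogeo": "geo"}` as the remapping, this sketch raises a ValueError, because the untouched `geo` entry still occupies the target name.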
- sssom.util.remove_unmatched(df)[source]
Remove rows where no match is found.
TODO: https://github.com/OBOFoundry/SSSOM/issues/28
- Parameters:
df (DataFrame) – Pandas DataFrame
- Return type:
DataFrame
- Returns:
Pandas DataFrame with ‘PREDICATE_ID’ not ‘noMatch’.
- sssom.util.safe_compress(uri, converter)[source]
Parse a CURIE from an IRI.
- Parameters:
uri (str) – The URI to parse. If this is already a CURIE, return directly.
converter (Converter) – Converter used for compression
- Return type:
str
- Returns:
A CURIE
- sssom.util.sort_df_rows_columns(df, by_columns=True, by_rows=True)[source]
Canonical sorting of DataFrame columns.
- Parameters:
df (DataFrame) – Pandas DataFrame with random column sequence.
by_columns (bool) – Boolean flag to sort columns canonically.
by_rows (bool) – Boolean flag to sort rows by column #1 (ascending order).
- Return type:
DataFrame
- Returns:
Pandas DataFrame columns sorted canonically.
- sssom.util.sort_sssom(df)[source]
Sort SSSOM by columns.
- Parameters:
df (DataFrame) – SSSOM DataFrame to be sorted.
- Return type:
DataFrame
- Returns:
Sorted SSSOM DataFrame
- sssom.util.to_mapping_set_dataframe(doc)[source]
Convert MappingSetDocument into MappingSetDataFrame.
- Parameters:
doc (MappingSetDocument) – MappingSetDocument object
- Return type:
MappingSetDataFrame
- Returns:
MappingSetDataFrame object
sssom.validators module
Validators.
- sssom.validators.check_all_prefixes_in_curie_map(msdf, fail_on_error=True)[source]
Check all EntityReference slots are mentioned in ‘curie_map’.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame
fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors
- Raises:
ValidationError – If not all prefixes are in curie_map.
- Return type:
None
- sssom.validators.check_strict_curie_format(msdf, fail_on_error=True)[source]
Check all EntityReference slots are formatted as unambiguous curies.
- Implemented rules:
CURIE does not contain pipe “|” character to ensure that multivalued processing of in TSV works correctly.
- Parameters:
msdf (
MappingSetDataFrame
) – MappingSetDataFramefail_on_error (
bool
) – if true, the function will throw an ValidationError exception when there are errors
- Raises:
ValidationError – If any entity reference does not follow the strict CURIE format
- Return type:
None
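The reason for the pipe rule is that multivalued TSV cells are split on “|”, so a CURIE containing that character would be read as two values. A small sketch of such a check, using a hypothetical helper name and plain strings instead of a MappingSetDataFrame:

```python
from typing import Iterable, List


def curies_with_pipe(curies: Iterable[str]) -> List[str]:
    """Return the CURIEs that contain the multivalue separator "|"."""
    return [curie for curie in curies if "|" in curie]


cell = "HP:0000001|MP:0000002"
# A multivalued cell is split on "|", giving two separate CURIEs.
values = cell.split("|")
print(values)
# A single CURIE containing "|" would be split the same way, hence the rule.
print(curies_with_pipe(["HP:0000001", "EX:a|b"]))
```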
- sssom.validators.print_linkml_report(report, fail_on_error=True)[source]
Print the error messages in the report. Optionally throw exception.
- Parameters:
report (ValidationReport) – A LinkML validation report
fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors
- sssom.validators.validate(msdf, validation_types, fail_on_error=True)[source]
Validate SSSOM files against sssom-schema using LinkML’s validator function.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame.
validation_types (List[SchemaValidationType]) – SchemaValidationType
fail_on_error (bool) – If true, throw an error when execution of a method has failed
- Return type:
None
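Running a list of validation types against one object is a dispatch pattern, which can be sketched as below. Everything here is hypothetical: CheckType stands in for SchemaValidationType, and the registry and check functions are invented for the illustration. How the library maps types to methods internally is not shown in this reference.

```python
from enum import Enum
from typing import Callable, Dict, List


class CheckType(Enum):
    """Hypothetical stand-in for SchemaValidationType."""

    PREFIXES = "prefixes"
    STRICT_CURIE = "strict_curie"


calls: List[str] = []


def check_prefixes_stub(data: dict, fail_on_error: bool) -> None:
    calls.append("prefixes")


def check_strict_curie_stub(data: dict, fail_on_error: bool) -> None:
    calls.append("strict_curie")


# Registry mapping each validation type to the function that implements it.
REGISTRY: Dict[CheckType, Callable[[dict, bool], None]] = {
    CheckType.PREFIXES: check_prefixes_stub,
    CheckType.STRICT_CURIE: check_strict_curie_stub,
}


def run_checks(data: dict, check_types: List[CheckType], fail_on_error: bool = True) -> None:
    """Run the requested checks in the order given."""
    for check_type in check_types:
        REGISTRY[check_type](data, fail_on_error)


run_checks({}, [CheckType.STRICT_CURIE, CheckType.PREFIXES])
print(calls)
```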
- sssom.validators.validate_json_schema(msdf, fail_on_error=True)[source]
Validate JSON Schema using LinkML’s JsonSchemaDataValidator.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame to be validated.
fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors
- Return type:
None
- sssom.validators.validate_shacl(msdf, fail_on_error=True)[source]
Validate SHACL file.
- Parameters:
msdf (MappingSetDataFrame) – TODO: https://github.com/linkml/linkml/issues/850 .
fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors
- Raises:
NotImplementedError – Not yet implemented.
- Return type:
None
- sssom.validators.validate_sparql(msdf, fail_on_error=True)[source]
Validate SPARQL file.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame
fail_on_error (bool) – if true, the function will throw a ValidationError exception when there are errors
- Raises:
NotImplementedError – Not yet implemented.
- Return type:
None
sssom.writers module
Serialization functions for SSSOM.
- sssom.writers.get_writer_function(*, output_format=None, output)[source]
Get appropriate writer function based on file format.
- Parameters:
output (TextIO) – Output file
output_format (Optional[str]) – Output file format, defaults to None
- Raises:
ValueError – Unknown output format
- Return type:
Tuple[Callable[[MappingSetDataFrame, TextIO], None], str]
- Returns:
Type of writer function
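Selecting a writer from a format string, and raising ValueError for unknown formats, can be sketched like this. The stub writers, the WRITERS table, and the fallback to the file extension when no format is given are assumptions for the example, not a description of the library's internals.

```python
import io
from pathlib import Path
from typing import Callable, Dict, Optional, TextIO, Tuple

Writer = Callable[[dict, TextIO], None]


def write_tsv_stub(data: dict, output: TextIO) -> None:
    output.write("tsv output\n")


def write_json_stub(data: dict, output: TextIO) -> None:
    output.write("json output\n")


# Hypothetical table of supported formats.
WRITERS: Dict[str, Writer] = {"tsv": write_tsv_stub, "json": write_json_stub}


def pick_writer(*, output_name: str, output_format: Optional[str] = None) -> Tuple[Writer, str]:
    """Return (writer, format); infer the format from the file name if absent."""
    if output_format is None:
        output_format = Path(output_name).suffix.lstrip(".")
    if output_format not in WRITERS:
        raise ValueError(f"Unknown output format: {output_format}")
    return WRITERS[output_format], output_format


writer, fmt = pick_writer(output_name="mappings.tsv")
buffer = io.StringIO()
writer({}, buffer)
print(fmt, buffer.getvalue().strip())
```

Returning the resolved format alongside the function lets the caller pass it on as the serialisation argument without repeating the inference.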
- sssom.writers.to_fhir_json(msdf)[source]
Convert a mapping set dataframe to a JSON object.
- Parameters:
msdf (MappingSetDataFrame) – MappingSetDataFrame: Collection of mappings represented as DataFrame, together with additional metadata.
- Return type:
Dict
- Returns:
Dict: A Dictionary serializable as JSON.
- Resources:
ConceptMap::SSSOM mapping spreadsheet:
TODO: add to CLI & to these functions: r4 vs r5 param
TODO: What if the msdf doesn’t have everything we need? (i) metadata, e.g. yml, (ii) what if we need to override?
todo: later: allow any nested arbitrary override: (get in kwargs, else metadata.get(key, None))
Minor todos
todo: mapping_justification: consider ValueString -> ValueCoding https://github.com/timsbiomed/issues/issues/152
todo: when/how to conform to R5 instead of R4?: https://build.fhir.org/conceptmap.html
- sssom.writers.to_json(msdf)[source]
Convert a mapping set dataframe to a JSON object.
- Return type:
JsonObj
- sssom.writers.to_ontoportal_json(msdf)[source]
Convert a mapping set dataframe to a list of ontoportal mapping JSON objects.
- Return type:
List[Dict]
- sssom.writers.to_owl_graph(msdf)[source]
Convert a mapping set dataframe to OWL in an RDF graph.
- Return type:
Graph
- sssom.writers.to_rdf_graph(msdf)[source]
Convert a mapping set dataframe to an RDF graph.
- Return type:
Graph
- sssom.writers.write_fhir_json(msdf, output, serialisation='fhir_json')[source]
Write a mapping set dataframe to the file as FHIR ConceptMap JSON.
- Return type:
None
Deprecated since version 0.4.7: Use write_json() instead
- sssom.writers.write_json(msdf, output, serialisation='json')[source]
Write a mapping set dataframe to the file as JSON.
- Parameters:
serialisation –
The JSON format to use. Supported formats are:
- fhir_json: Outputs JSON in FHIR ConceptMap format (https://fhir-ru.github.io/conceptmap.html)
- json: Outputs to SSSOM JSON https://mapping-commons.github.io/sssom-py/sssom.html#sssom.writers.to_json
- ontoportal_json: Outputs JSON in Ontoportal format (https://ontoportal.org/) https://mapping-commons.github.io/sssom-py/sssom.html#sssom.writers.to_ontoportal_json
- Return type:
None
- sssom.writers.write_ontoportal_json(msdf, output, serialisation='ontoportal_json')[source]
Write a mapping set dataframe to the file as the ontoportal mapping JSON model.
- Return type:
None
Deprecated since version 0.4.7: Use write_json() instead
- sssom.writers.write_owl(msdf, file, serialisation='turtle')[source]
Write a mapping set dataframe to the file as OWL.
- Return type:
None
- sssom.writers.write_rdf(msdf, file, serialisation=None)[source]
Write a mapping set dataframe to the file as RDF.
- Return type:
None
- sssom.writers.write_table(msdf, file, embedded_mode=True, serialisation='tsv', sort=False)[source]
Write a mapping set dataframe to the file as a table.
- Return type:
None
- sssom.writers.write_tables(sssom_dict, output_dir)[source]
Write table from MappingSetDataFrame object.
- Parameters:
sssom_dict (Dict[str, MappingSetDataFrame]) – Dictionary of MappingSetDataFrames
output_dir (Union[str, Path]) – The directory in which the derived SSSOM files are written
- Return type:
None
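Writing one file per dictionary entry into an output directory can be sketched with the standard library. This is only an illustration of the pattern: the function name, the use of lists of row dictionaries in place of MappingSetDataFrame objects, and the "<key>.tsv" file naming are all assumptions.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List, Union

Row = Dict[str, str]


def write_tables_sketch(tables: Dict[str, List[Row]], output_dir: Union[str, Path]) -> List[Path]:
    """Write each table to ``<output_dir>/<key>.tsv`` and return the paths."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in tables.items():
        path = directory / f"{name}.tsv"
        # Column order is taken from the first row (assumed uniform rows).
        fieldnames = list(rows[0]) if rows else []
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written


example_tables = {
    "hp_mp": [{"subject_id": "HP:0000001", "object_id": "MP:0000002"}],
}
with tempfile.TemporaryDirectory() as tmp:
    for path in write_tables_sketch(example_tables, tmp):
        print(path.name, path.read_text(encoding="utf-8").splitlines())
```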
Module contents
sssom-py package.