I/O
High-level I/O operations used by the CLI commands.
sssom.io
I/O utilities for SSSOM.
convert_file(input_path, output, output_format=None, propagate=True, condense=True)
Convert a file from one format to another.
:param input_path: The path to the input SSSOM TSV file.
:param output: The path to the output file. If None is given, output defaults to stdout.
:param output_format: The format to which the SSSOM TSV should be converted.
:param propagate: Propagate condensed slots in the input file.
:param condense: Condense slots in the output file.
Source code in src/sssom/io.py
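The propagate and condense options refer to SSSOM's propagatable slots, which can be stated once at the mapping-set level instead of on every mapping. The stdlib-only sketch below illustrates the idea on plain dicts. It is not sssom-py's implementation: the `propagate`/`condense` helpers and the `PROPAGATABLE_SLOTS` subset are hypothetical, and the SSSOM specification defines the authoritative slot list.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical subset of propagatable slots; see the SSSOM specification for the full list.
PROPAGATABLE_SLOTS = ("mapping_tool", "mapping_date", "subject_source", "object_source")

Record = Dict[str, Any]


def propagate(metadata: Record, mappings: List[Record]) -> Tuple[Record, List[Record]]:
    """Move set-level values of propagatable slots down into every mapping record."""
    new_meta = dict(metadata)
    new_mappings = [dict(m) for m in mappings]
    for slot in PROPAGATABLE_SLOTS:
        if slot not in new_meta:
            continue
        value = new_meta.pop(slot)
        for m in new_mappings:
            # In this sketch, a value already set on an individual record is kept.
            m.setdefault(slot, value)
    return new_meta, new_mappings


def condense(metadata: Record, mappings: List[Record]) -> Tuple[Record, List[Record]]:
    """Move a propagatable slot up to the set level when all records share one value."""
    new_meta = dict(metadata)
    new_mappings = [dict(m) for m in mappings]
    for slot in PROPAGATABLE_SLOTS:
        values = {m.get(slot) for m in new_mappings}
        # Condense only if every record carries the slot with the same value.
        if len(values) == 1 and None not in values:
            new_meta[slot] = values.pop()
            for m in new_mappings:
                del m[slot]
    return new_meta, new_mappings
```

Applying condense after propagate restores the compact form only when every record ended up with the same value; mixed values stay on the individual mappings.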
parse_file(input_path, output, *, input_format=None, metadata_path=None, prefix_map_mode=None, clean_prefixes=True, strict_clean_prefixes=True, embedded_mode=True, mapping_predicate_filter=None, propagate=True, condense=True)
Parse an SSSOM metadata file and write to a table.
:param input_path: The path to the input file in one of the legal formats, e.g. OBO Graphs or
Alignment API XML.
:param output: The path to the output file.
:param input_format: The string denoting the input format.
:param metadata_path: The path to a file containing the SSSOM metadata (including prefix_map) to
be used during parsing.
:param prefix_map_mode: Defines whether the prefix map in the metadata should be extended or
replaced with the SSSOM default prefix map derived from the bioregistry package.
:param clean_prefixes: If True (default), records with unknown prefixes are removed from the
SSSOM file.
:param strict_clean_prefixes: If True (default), clean_prefixes() will be in strict mode.
:param embedded_mode: If True (default), the dataframe and metadata are exported in one file
(tsv), else two separate files (tsv and yaml).
:param mapping_predicate_filter: Optional list of mapping predicates, or a path to a file
containing them.
:param propagate: If true, propagate all condensed slots in the input set.
:param condense: If true, condense slots in the output set.
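To illustrate embedded_mode, here is a minimal stdlib sketch that writes mapping records either as a single TSV with the metadata in `#`-prefixed header lines, or as a TSV plus a separate YAML file. The `write_mapping_set` helper, the `.sssom.tsv`/`.sssom.yml` file names, and the tiny YAML writer (flat strings plus one level of nesting, no quoting) are assumptions for illustration; a real implementation would use a YAML library.

```python
import csv
from pathlib import Path
from typing import Dict, List, Union

Metadata = Dict[str, Union[str, Dict[str, str]]]


def _yaml_lines(metadata: Metadata) -> List[str]:
    """Very small YAML writer: flat strings plus one nested mapping (e.g. curie_map), no quoting."""
    lines: List[str] = []
    for key, value in metadata.items():
        if isinstance(value, dict):
            lines.append(f"{key}:")
            lines.extend(f"  {k}: {v}" for k, v in value.items())
        else:
            lines.append(f"{key}: {value}")
    return lines


def write_mapping_set(
    rows: List[Dict[str, str]],
    metadata: Metadata,
    directory: Path,
    name: str,
    embedded: bool = True,
) -> List[Path]:
    """Write rows as TSV, with metadata embedded as '#' lines or in a separate YAML file."""
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    tsv_path = directory / f"{name}.sssom.tsv"
    with tsv_path.open("w", newline="") as fh:
        if embedded:
            # Embedded mode: each metadata line is prefixed with '#' above the table header.
            for line in _yaml_lines(metadata):
                fh.write(f"#{line}\n")
        writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    if embedded:
        return [tsv_path]
    # Non-embedded mode: plain TSV plus a sibling YAML file holding the metadata.
    yaml_path = directory / f"{name}.sssom.yml"
    yaml_path.write_text("\n".join(_yaml_lines(metadata)) + "\n")
    return [tsv_path, yaml_path]
```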
validate_file(input_path, validation_types=None, fail_on_error=True, propagate=True)
Validate the incoming SSSOM TSV according to the SSSOM specification.
:param input_path: The path to the input SSSOM TSV file.
:param validation_types: A list of validation types to run.
:param fail_on_error: If True (default), raise an exception when any validator reports an error.
:param propagate: If true, propagate condensed slots in the input set.
:returns: A dictionary from validation types to validation reports.
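The following sketch mirrors the documented contract of validate_file: run the selected validators, return a dictionary from validation type to report, and raise when fail_on_error is set. The validation-type names and checks are hypothetical stand-ins, not the validation types sssom-py actually provides; the four required columns are the mapping slots the SSSOM specification marks as required.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
# A validator returns a list of error messages; an empty list means "valid".
Validator = Callable[[List[Record]], List[str]]

REQUIRED = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def check_required(mappings: List[Record]) -> List[str]:
    return [
        f"row {i}: missing {col}"
        for i, m in enumerate(mappings)
        for col in REQUIRED
        if not m.get(col)
    ]


def check_confidence(mappings: List[Record]) -> List[str]:
    errors = []
    for i, m in enumerate(mappings):
        value = m.get("confidence")
        if value not in (None, "") and not 0.0 <= float(value) <= 1.0:
            errors.append(f"row {i}: confidence {value} outside [0, 1]")
    return errors


# Hypothetical validation-type names mapped to their checks.
VALIDATORS: Dict[str, Validator] = {
    "required_columns": check_required,
    "confidence_range": check_confidence,
}


def validate(
    mappings: List[Record],
    validation_types: Optional[List[str]] = None,
    fail_on_error: bool = True,
) -> Dict[str, List[str]]:
    """Run the selected validators and return one report per validation type."""
    types = validation_types if validation_types is not None else list(VALIDATORS)
    reports = {t: VALIDATORS[t](mappings) for t in types}
    failed = [t for t, errors in reports.items() if errors]
    if fail_on_error and failed:
        raise ValueError(f"validation failed for: {', '.join(failed)}")
    return reports
```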
split_file(input_path, output_directory, *, method=None)
Split an SSSOM TSV by prefixes and relations.
:param input_path: The path to the input SSSOM TSV file.
:param output_directory: The directory to which the split files should be exported.
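A rough sketch of splitting by prefixes and relations: records are grouped by (subject prefix, predicate, object prefix), and each group is written to its own TSV. The grouping key and the file-name scheme are assumptions for illustration and are not necessarily what split_file produces.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, str, str]


def _prefix(curie: str) -> str:
    return curie.split(":", 1)[0]


def split_mappings(mappings: List[Record]) -> Dict[Key, List[Record]]:
    """Group records by (subject prefix, predicate, object prefix)."""
    groups: Dict[Key, List[Record]] = defaultdict(list)
    for m in mappings:
        groups[(_prefix(m["subject_id"]), m["predicate_id"], _prefix(m["object_id"]))].append(m)
    return dict(groups)


def write_splits(mappings: List[Record], output_directory: Path) -> List[Path]:
    """Write each group to its own TSV in output_directory."""
    output_directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for (subj, pred, obj), rows in split_mappings(mappings).items():
        # Hypothetical naming scheme, e.g. hp_exactMatch_mondo.sssom.tsv
        relation = pred.split(":", 1)[-1]
        path = output_directory / f"{subj.lower()}_{relation}_{obj.lower()}.sssom.tsv"
        fieldnames = list(dict.fromkeys(k for row in rows for k in row))
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths
```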
get_metadata_and_prefix_map(metadata_path=None, *, prefix_map_mode=None)
Load metadata and a prefix map in a deprecated way.
extract_iris(input, converter)
Recursively extract a list of IRIs from a string or file.
:param input: A CURIE, a list of CURIEs, or a path to a file containing them.
:param converter: The converter (prefix map) of the mapping set, possibly containing custom prefix:IRI combinations.
:returns: A list of IRIs.
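A stdlib sketch of the recursive expansion, using a plain dict as a stand-in for the converter. The `extract_iris_sketch` name and the one-CURIE-per-line file format are assumptions; strings that are not CURIEs with a known prefix (for example, full IRIs) are passed through unchanged in this sketch.

```python
from pathlib import Path
from typing import Dict, List, Union


def extract_iris_sketch(value: Union[str, List[str]], prefix_map: Dict[str, str]) -> List[str]:
    """Recursively expand a CURIE, a list of CURIEs, or a file of CURIEs into IRIs."""
    if isinstance(value, list):
        iris: List[str] = []
        for item in value:
            iris.extend(extract_iris_sketch(item, prefix_map))
        return iris
    value = value.strip()
    if not value:
        return []
    if Path(value).is_file():
        # Assumed file format: one CURIE per line; recurse on the lines.
        return extract_iris_sketch(Path(value).read_text().splitlines(), prefix_map)
    prefix, sep, local_id = value.partition(":")
    if sep and prefix in prefix_map:
        return [prefix_map[prefix] + local_id]
    # Unknown prefix (or already an IRI): returned unchanged in this sketch.
    return [value]
```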
run_sql_query(query, inputs, output=None)
Run a SQL query over one or more SSSOM files.
Each of the N inputs is assigned a table name df1, df2, ..., dfN.
Alternatively, the filenames can be used as table names; these are first stemmed, e.g. ~/dir/my.sssom.tsv becomes a table called 'my'.
Example: sssom dosql -Q "SELECT * FROM df1 WHERE confidence>0.5 ORDER BY confidence" my.sssom.tsv
Example: sssom dosql -Q "SELECT file1.*, file2.object_id AS ext_object_id, file2.object_label AS ext_object_label FROM file1 INNER JOIN file2 WHERE file1.object_id = file2.subject_id" file1.sssom.tsv file2.sssom.tsv
:param query: Query to be executed over a pandas DataFrame (msdf.df).
:param inputs: Input files that form the source tables for the query.
:param output: Output location.
:returns: Filtered MappingSetDataFrame object.
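The table-naming scheme can be sketched with Python's built-in sqlite3 module: each TSV is loaded into an in-memory database under both dfN and its stemmed file name. This is a hypothetical stand-in (run_sql_query itself queries pandas DataFrames). Storing only the `confidence` column as REAL is an assumption made so that numeric comparisons such as confidence>0.5 behave as expected; every other column is TEXT.

```python
import csv
import sqlite3
from pathlib import Path
from typing import List, Tuple

NUMERIC_COLUMNS = {"confidence"}  # assumption: only this column is compared numerically


def table_stem(path: str) -> str:
    """Strip the directory and every suffix: '~/dir/my.sssom.tsv' -> 'my'."""
    return Path(path).name.split(".", 1)[0]


def run_sql_over_tsvs(query: str, inputs: List[str]) -> Tuple[List[str], List[tuple]]:
    """Load each TSV as df1..dfN and as its stem, run the query, return (columns, rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        for i, path in enumerate(inputs, start=1):
            with open(path, newline="") as fh:
                # Skip '#'-prefixed embedded metadata lines before the header row.
                lines = [line for line in fh if not line.startswith("#")]
            rows = list(csv.reader(lines, delimiter="\t"))
            header, data = rows[0], [r for r in rows[1:] if r]
            cols = ", ".join(
                f'"{c}" {"REAL" if c in NUMERIC_COLUMNS else "TEXT"}' for c in header
            )
            marks = ", ".join("?" for _ in header)
            # Register the same data under both naming schemes.
            for name in (f"df{i}", table_stem(path)):
                conn.execute(f'CREATE TABLE "{name}" ({cols})')
                conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', data)
        cursor = conn.execute(query)
        columns = [d[0] for d in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()
```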
filter_file(input, output=None, **kwargs)
Filter a dataframe by dynamically generating queries based on user input.
For example, sssom filter --subject_id x:% --subject_id y:% --object_id y:% --object_id z:% tests/data/basic.tsv
yields the query:
"SELECT * FROM df WHERE (subject_id LIKE 'x:%' OR subject_id LIKE 'y:%') AND (object_id LIKE 'y:%' OR object_id LIKE 'z:%')" and displays the output.
:param input: The SSSOM TSV file to be queried over.
:param output: Output location.
:param kwargs: Filter options provided by the user, which generate queries (e.g. --subject_id x:%).
:returns: Filtered MappingSetDataFrame object.
:raises ValueError: If a provided parameter is invalid.
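A minimal sketch of the query generation described for filter_file: patterns given for the same column are combined with OR, and different columns with AND. `build_filter_query` is hypothetical. It doubles single quotes inside patterns, but it is still meant only for trusted command-line input, not as protection against SQL injection.

```python
from typing import Iterable, List


def build_filter_query(columns: Iterable[str], **filters: List[str]) -> str:
    """OR the LIKE patterns for one column, AND the columns together."""
    known = set(columns)
    invalid = [k for k in filters if k not in known]
    if invalid:
        raise ValueError(f"The params are invalid: {invalid}")
    clauses = []
    for column, patterns in filters.items():
        if not patterns:
            continue
        escaped = [p.replace("'", "''") for p in patterns]
        ors = " OR ".join(f"{column} LIKE '{p}'" for p in escaped)
        clauses.append(f"({ors})")
    query = "SELECT * FROM df"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query
```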
annotate_file(input, output=None, replace_multivalued=False, **kwargs)
Annotate a file, i.e. add custom metadata to the mapping set.
:param input: The SSSOM TSV file to annotate.
:param output: Output location.
:param replace_multivalued: Whether multivalued slots should be replaced; defaults to False.
:param kwargs: Options provided by user which are added to the metadata (e.g. --mapping_set_id
http://example.org/abcd)
:returns: Annotated MappingSetDataFrame object.
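A sketch of the metadata merge, assuming that with replace_multivalued=False new values are appended to a multivalued slot, and with True they replace its current values. The `annotate` helper, the `MULTIVALUED_SLOTS` subset, and the rule that single-valued slots are simply overwritten are hypothetical choices for illustration.

```python
from typing import Any, Dict, List, Union

# Assumed subset of multivalued mapping-set slots; check the SSSOM schema for the full list.
MULTIVALUED_SLOTS = {"creator_id", "see_also"}


def annotate(
    metadata: Dict[str, Any],
    replace_multivalued: bool = False,
    **kwargs: Union[str, List[str]],
) -> Dict[str, Any]:
    """Return a copy of metadata with the user-supplied slots added."""
    result = dict(metadata)
    for slot, value in kwargs.items():
        values = value if isinstance(value, list) else [value]
        if slot in MULTIVALUED_SLOTS:
            existing = [] if replace_multivalued else list(result.get(slot, []))
            # Append only values that are not already present.
            result[slot] = existing + [v for v in values if v not in existing]
        else:
            # Single-valued slots are overwritten with the last value given.
            result[slot] = values[-1]
    return result
```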