CLI Utils
Utility functions used by the KGX command-line interface.
kgx.cli.cli_utils
- kgx.cli.cli_utils.apply_operations(source: dict, graph: BaseGraph) -> BaseGraph
Apply operations as defined in the YAML.
- Parameters:
source (dict) – The source from the YAML
graph (kgx.graph.base_graph.BaseGraph) – The graph corresponding to the source
- Returns:
The graph corresponding to the source
- Return type:
kgx.graph.base_graph.BaseGraph
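The idea of applying operations listed in a YAML source can be sketched as follows. This is a minimal illustration, not KGX's implementation: it assumes the source dict carries an `operations` list whose entries have a dotted `name` and an optional `args` mapping, and `apply_named_operations` is a hypothetical helper.

```python
import importlib
from typing import Any, Dict


def apply_named_operations(source: Dict[str, Any], graph: Any) -> Any:
    """Call each operation named in source["operations"] on the graph."""
    # Assumed entry shape: {"name": "package.module.function", "args": {...}}
    for operation in source.get("operations", []):
        module_name, _, function_name = operation["name"].rpartition(".")
        module = importlib.import_module(module_name)
        function = getattr(module, function_name)
        # The graph is passed first; any args become keyword arguments.
        function(graph, **operation.get("args", {}))
    return graph
```

Resolving the callable from its dotted name keeps the YAML free of code: the config only names which function to run and with which arguments.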
- kgx.cli.cli_utils.get_input_file_types() -> Tuple
Get all input file formats supported by KGX.
- Returns:
A tuple of supported file formats
- Return type:
Tuple
- kgx.cli.cli_utils.get_output_file_types() -> Tuple
Get all output file formats supported by KGX.
- Returns:
A tuple of supported file formats
- Return type:
Tuple
- kgx.cli.cli_utils.get_report_format_types() -> Tuple
Get all graph summary report formats supported by KGX.
- Returns:
A tuple of supported file formats
- Return type:
Tuple
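Because the three getters above return plain tuples, a caller can check a user-supplied format before doing any work. A minimal sketch: `check_format` is a hypothetical helper, and the tuple in the usage lines is a placeholder, not KGX's actual list of formats.

```python
from typing import Tuple


def check_format(requested: str, supported: Tuple[str, ...]) -> str:
    """Return `requested` if it is in `supported`, else raise ValueError."""
    if requested not in supported:
        raise ValueError(
            f"Unsupported format {requested!r}; expected one of {supported}"
        )
    return requested


# Placeholder tuple for illustration only. With KGX installed, pass
# kgx.cli.cli_utils.get_input_file_types() instead.
example_formats = ("tsv", "json")
checked = check_format("tsv", example_formats)
```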
- kgx.cli.cli_utils.graph_summary(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str], report_type: str, report_format: Optional[str] = None, graph_name: Optional[str] = None, node_facet_properties: Optional[List] = None, edge_facet_properties: Optional[List] = None, error_log: str = '') -> Dict
Loads and summarizes a knowledge graph from a set of input files.
- Parameters:
inputs (List[str]) – Input files
input_format (str) – Input file format
input_compression (Optional[str]) – The input compression type
output (Optional[str]) – Where to write the output (stdout, by default)
report_type (str) – The summary report type: “kgx-map” or “meta-knowledge-graph”
report_format (Optional[str]) – The summary report format file types: ‘yaml’ or ‘json’
graph_name (Optional[str]) – User-specified name of the graph being summarized
node_facet_properties (Optional[List]) – A list of node properties from which to generate counts per value for those properties. For example, ['provided_by']
edge_facet_properties (Optional[List]) – A list of edge properties (e.g. knowledge_source tags) to facet on. For example, ['original_knowledge_source', 'aggregator_knowledge_source']
error_log (str) – Where to write any graph processing error message (stderr, by default)
- Returns:
A dictionary with the graph stats
- Return type:
Dict
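A call to `graph_summary` might be assembled as below. This is a sketch under stated assumptions: the file names are hypothetical, `summary_kwargs` and `summarize` are illustrative helpers, and passing `None` for `output` is assumed to select stdout as the parameter description suggests.

```python
from typing import Any, Dict, List


def summary_kwargs(inputs: List[str], input_format: str) -> Dict[str, Any]:
    """Collect graph_summary arguments using values named in the parameter list."""
    return {
        "inputs": inputs,
        "input_format": input_format,
        "input_compression": None,
        "output": None,  # assumed to mean stdout
        "report_type": "kgx-map",
        "report_format": "yaml",
        "node_facet_properties": ["provided_by"],
        "edge_facet_properties": [
            "original_knowledge_source",
            "aggregator_knowledge_source",
        ],
    }


def summarize(inputs: List[str], input_format: str) -> Dict:
    # Imported lazily so this sketch loads even where KGX is not installed.
    from kgx.cli.cli_utils import graph_summary

    return graph_summary(**summary_kwargs(inputs, input_format))


# Hypothetical file names, for illustration only.
kwargs = summary_kwargs(["nodes.tsv", "edges.tsv"], "tsv")
```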
- kgx.cli.cli_utils.merge(merge_config: str, source: Optional[List] = None, destination: Optional[List] = None, processes: int = 1) -> BaseGraph
Load nodes and edges from files and KGs, as defined in a config YAML, and merge them into a single graph. The merged graph can then be written to a local/remote Neo4j instance OR be serialized into a file.
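- Parameters:
merge_config (str) – The merge config YAML
source (Optional[List]) – A list of sources to load from the YAML
destination (Optional[List]) – A list of destinations to write to, as defined in the YAML
processes (int) – Number of processes to use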
- Returns:
The merged graph
- Return type:
kgx.graph.base_graph.BaseGraph
- kgx.cli.cli_utils.neo4j_download(uri: str, username: str, password: str, output: str, output_format: str, output_compression: Optional[str], stream: bool, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) -> Transformer
Download nodes and edges from Neo4j database.
- Parameters:
uri (str) – Neo4j URI. For example, https://localhost:7474
username (str) – Username for authentication
password (str) – Password for authentication
output (str) – Where to write the output (stdout, by default)
output_format (Optional[str]) – The output type (tsv, by default)
output_compression (Optional[str]) – The output compression type
stream (bool) – Whether to parse input as a stream
node_filters (Optional[Tuple]) – Node filters
edge_filters (Optional[Tuple]) – Edge filters
- Returns:
The NeoTransformer
- Return type:
kgx.Transformer
- kgx.cli.cli_utils.neo4j_upload(inputs: List[str], input_format: str, input_compression: Optional[str], uri: str, username: str, password: str, stream: bool, node_filters: Optional[Tuple] = None, edge_filters: Optional[Tuple] = None) -> Transformer
Upload a set of nodes/edges to a Neo4j database.
- Parameters:
inputs (List[str]) – A list of files that contain nodes/edges
input_format (str) – The input format
input_compression (Optional[str]) – The input compression type
uri (str) – The full HTTP address for Neo4j database
username (str) – Username for authentication
password (str) – Password for authentication
stream (bool) – Whether to parse input as a stream
node_filters (Optional[Tuple]) – Node filters
edge_filters (Optional[Tuple]) – Edge filters
- Returns:
The NeoTransformer
- Return type:
kgx.Transformer
- kgx.cli.cli_utils.parse_source(key: str, source: dict, output_directory: str, prefix_map: Optional[Dict[str, str]] = None, node_property_predicates: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, checkpoint: bool = False) -> Sink
Parse a source from a merge config YAML.
- Parameters:
key (str) – Source key
source (Dict) – Source configuration
output_directory (str) – Location to write output to
node_property_predicates (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)
predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)
checkpoint (bool) – Whether to serialize each individual source to a TSV
- Returns:
Returns an instance of Sink
- Return type:
Sink
- kgx.cli.cli_utils.prepare_input_args(key: str, source: Dict, output_directory: Optional[str], prefix_map: Optional[Dict[str, str]] = None, node_property_predicates: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None) -> Dict
Prepare input arguments for Transformer.
- Parameters:
key (str) – Source key
source (Dict) – Source configuration
output_directory (str) – Location to write output to
node_property_predicates (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)
predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)
- Returns:
Input arguments as dictionary
- Return type:
Dict
- kgx.cli.cli_utils.prepare_output_args(key: str, source: Dict, output_directory: Optional[str], reverse_prefix_map: Optional[Dict] = None, reverse_predicate_mappings: Optional[Dict] = None, property_types: Optional[Dict] = None) -> Dict
Prepare output arguments for Transformer.
- Parameters:
key (str) – Source key
source (Dict) – Source configuration
output_directory (str) – Location to write output to
reverse_prefix_map (Dict[str, str]) – Non-canonical CURIE mappings for export
reverse_predicate_mappings (Dict[str, str]) – A mapping of property names to predicate IRIs (This is applicable for RDF)
property_types (Dict[str, str]) – The xml property type for properties that are other than xsd:string. Relevant for RDF export.
- Returns:
Output arguments as dictionary
- Return type:
Dict
- kgx.cli.cli_utils.prepare_top_level_args(d: Dict) -> Dict
Parse top-level configuration.
- Parameters:
d (Dict) – The configuration section from the transform/merge YAML
- Returns:
A parsed dictionary with parameters from configuration
- Return type:
Dict
- kgx.cli.cli_utils.transform(inputs: Optional[List[str]], input_format: Optional[str] = None, input_compression: Optional[str] = None, output: Optional[str] = None, output_format: Optional[str] = None, output_compression: Optional[str] = None, stream: bool = False, node_filters: Optional[List[Tuple[str, str]]] = None, edge_filters: Optional[List[Tuple[str, str]]] = None, transform_config: Optional[str] = None, source: Optional[List] = None, knowledge_sources: Optional[List[Tuple[str, str]]] = None, processes: int = 1, infores_catalog: Optional[str] = None) -> None
Transform a Knowledge Graph from one serialization form to another.
- Parameters:
inputs (Optional[List[str]]) – A list of files that contain nodes/edges
input_format (Optional[str]) – The input format
input_compression (Optional[str]) – The input compression type
output (Optional[str]) – The output file
output_format (Optional[str]) – The output format
output_compression (Optional[str]) – The output compression type
stream (bool) – Whether to parse input as a stream
node_filters (Optional[List[Tuple[str, str]]]) – Node input filters
edge_filters (Optional[List[Tuple[str, str]]]) – Edge input filters
transform_config (Optional[str]) – The transform config YAML
source (Optional[List]) – A list of sources to load from the YAML
knowledge_sources (Optional[List[Tuple[str, str]]]) – A list of named knowledge sources with (string, boolean or tuple rewrite) specification
processes (int) – Number of processes to use
infores_catalog (Optional[str]) – Optional dump of a TSV file of InfoRes CURIE to Knowledge Source mappings (not yet available in transform_config calling mode)
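The parameter list above implies two calling modes: passing input files directly, or pointing at a transform config YAML. A minimal sketch of keeping them apart, assuming the two modes are mutually exclusive; `transform_kwargs` and `run_transform` are hypothetical helpers and the file names are placeholders.

```python
from typing import Any, Dict, List, Optional


def transform_kwargs(
    inputs: Optional[List[str]] = None,
    input_format: Optional[str] = None,
    output: Optional[str] = None,
    output_format: Optional[str] = None,
    transform_config: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for one of the two calling modes."""
    if transform_config is not None:
        if inputs:
            raise ValueError("Pass either transform_config or inputs, not both")
        # Config mode: the remaining settings are read from the YAML file.
        return {"inputs": None, "transform_config": transform_config}
    if not inputs:
        raise ValueError("Direct mode needs at least one input file")
    return {
        "inputs": inputs,
        "input_format": input_format,
        "output": output,
        "output_format": output_format,
    }


def run_transform(**options: Any) -> None:
    # Imported lazily so this sketch loads even where KGX is not installed.
    from kgx.cli.cli_utils import transform

    transform(**transform_kwargs(**options))


# Hypothetical file names, for illustration only.
direct = transform_kwargs(
    inputs=["nodes.tsv", "edges.tsv"],
    input_format="tsv",
    output="graph",
    output_format="json",
)
from_config = transform_kwargs(transform_config="transform.yaml")
```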
- kgx.cli.cli_utils.transform_source(key: str, source: Dict, output_directory: Optional[str], prefix_map: Optional[Dict[str, str]] = None, node_property_predicates: Optional[Set[str]] = None, predicate_mappings: Optional[Dict[str, str]] = None, reverse_prefix_map: Optional[Dict] = None, reverse_predicate_mappings: Optional[Dict] = None, property_types: Optional[Dict] = None, checkpoint: bool = False, preserve_graph: bool = True, stream: bool = False, infores_catalog: Optional[str] = None) -> Sink
Transform a source from a transform config YAML.
- Parameters:
key (str) – Source key
source (Dict) – Source configuration
output_directory (Optional[str]) – Location to write output to
node_property_predicates (Set[str]) – A set of predicates that ought to be treated as node properties (This is applicable for RDF)
predicate_mappings (Dict[str, str]) – A mapping of predicate IRIs to property names (This is applicable for RDF)
reverse_prefix_map (Dict[str, str]) – Non-canonical CURIE mappings for export
reverse_predicate_mappings (Dict[str, str]) – A mapping of property names to predicate IRIs (This is applicable for RDF)
property_types (Dict[str, str]) – The xml property type for properties that are other than xsd:string. Relevant for RDF export.
checkpoint (bool) – Whether to serialize each individual source to a TSV
preserve_graph (bool) – Whether or not to preserve the graph corresponding to the source
stream (bool) – Whether to parse input as a stream
infores_catalog (Optional[str]) – Optional dump of a TSV file of InfoRes CURIE to Knowledge Source mappings
- Returns:
Returns an instance of Sink
- Return type:
Sink
- kgx.cli.cli_utils.validate(inputs: List[str], input_format: str, input_compression: Optional[str], output: Optional[str], biolink_release: Optional[str] = None) -> Dict
Run the KGX validator on the input files to check for Biolink Model compliance.
- Parameters:
inputs (List[str]) – Input files
input_format (str) – The input format
input_compression (Optional[str]) – The input compression type
output (Optional[str]) – Path to output file (stdout, by default)
biolink_release (Optional[str]) – SemVer version of the Biolink Model release used for validation (default: latest Biolink Model Toolkit version)
- Returns:
A dictionary of entities which have parse errors indexed by [message_level][error_type][message]
- Return type:
Dict
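The returned dictionary is nested three levels deep, so a flat view can be easier to scan. A minimal sketch, assuming each innermost value is a collection of affected entities; the sample report below is made up for illustration and is not real validator output.

```python
from typing import Any, Dict, List, Tuple


def flatten_report(report: Dict[str, Any]) -> List[Tuple[str, str, str, int]]:
    """Turn [message_level][error_type][message] into flat rows with counts."""
    rows = []
    for level, by_type in report.items():
        for error_type, by_message in by_type.items():
            for message, entities in by_message.items():
                rows.append((level, error_type, message, len(entities)))
    return rows


# Made-up report in the documented shape; keys and entities are placeholders.
sample_report = {
    "ERROR": {
        "EXAMPLE_ERROR_TYPE": {"example error message": ["entity:1", "entity:2"]},
    },
    "WARNING": {
        "EXAMPLE_WARNING_TYPE": {"example warning message": ["entity:3"]},
    },
}

rows = flatten_report(sample_report)
```

Each row carries the level, the error type, the message, and the number of entities it affects, which is enough to sort or filter before printing.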