KGX CLI

kgx

Knowledge Graph Exchange CLI entrypoint.

Usage

kgx [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

arangodb-download

Download nodes and edges from an ArangoDB database.

Parameters

uri: str

ArangoDB URI. For example, http://localhost:8529

database: str

The database name

username: str

Username for authentication

password: str

Password for authentication

output: str

Where to write the output (stdout, by default)

output_format: str

The output type (tsv, by default)

output_compression: str

The output compression type

stream: bool

Whether to parse input as a stream

node_filters: Tuple[str, str]

Node filters

edge_filters: Tuple[str, str]

Edge filters

node_collection: Tuple[str]

Names of vertex collections

edge_collection: Tuple[str]

Names of edge collections

all_collections: bool

Whether to discover and export all non-system collections

Usage

kgx arangodb-download [OPTIONS]

Options

-l, --uri <uri>

Required ArangoDB URI to download from. For example, http://localhost:8529

-d, --database <database>

Required ArangoDB database name

-u, --username <username>

Required ArangoDB username

-p, --password <password>

Required ArangoDB password

-o, --output <output>

Required Where to write the output

-f, --output-format <output_format>

Required The output format. Can be one of (‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘neo4j’, ‘arangodb’, ‘nt’, ‘jelly’, ‘null’, ‘sql’, ‘tsv’, ‘parquet’)

--output-compression <output_compression>

The output compression type

-s, --stream

Parse input as a stream

-n, --node-filters <node_filters>

Filters for filtering nodes from the input graph

-e, --edge-filters <edge_filters>

Filters for filtering edges from the input graph

--node-collection <node_collection>

Name of a vertex collection (repeatable; default: nodes)

--edge-collection <edge_collection>

Name of an edge collection (repeatable; default: edges)

--all-collections

Discover and export all non-system collections in the database
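A minimal sketch in Python, not part of KGX, that assembles the argument list for kgx arangodb-download from the options documented above. It repeats --node-collection and --edge-collection once per name. The helper name, database name, collection names and credentials are hypothetical. The resulting list could be passed to subprocess.run to actually execute it.

```python
from typing import List, Sequence


def arangodb_download_argv(
    uri: str,
    database: str,
    username: str,
    password: str,
    output: str,
    output_format: str = "tsv",
    node_collections: Sequence[str] = (),
    edge_collections: Sequence[str] = (),
    all_collections: bool = False,
) -> List[str]:
    """Build (but do not run) an argv list for `kgx arangodb-download`.

    Hypothetical helper; option names are taken from the reference above.
    """
    argv = [
        "kgx", "arangodb-download",
        "--uri", uri,
        "--database", database,
        "--username", username,
        "--password", password,
        "--output", output,
        "--output-format", output_format,
    ]
    # The collection options are repeatable: emit one flag per collection name.
    for name in node_collections:
        argv += ["--node-collection", name]
    for name in edge_collections:
        argv += ["--edge-collection", name]
    if all_collections:
        argv.append("--all-collections")
    return argv


argv = arangodb_download_argv(
    "http://localhost:8529", "kg", "root", "secret", "kg_export",
    node_collections=["genes", "diseases"], edge_collections=["associations"],
)
```

The reference does not say how --all-collections interacts with explicit collection names, so the sketch passes through whatever it is given.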

arangodb-upload

Upload a set of nodes/edges to an ArangoDB database.

Parameters

inputs: List[str]

A list of files that contain nodes/edges

input_format: str

The input format

input_compression: str

The input compression type

uri: str

The full HTTP address of the ArangoDB database

database: str

The database name

username: str

Username for authentication

password: str

Password for authentication

stream: bool

Whether to parse input as a stream

node_filters: Tuple[str, str]

Node filters

edge_filters: Tuple[str, str]

Edge filters

node_collection: str

Name of the vertex collection

edge_collection: str

Name of the edge collection

curie_routing: bool

Whether to route to per-CURIE-prefix collections

Usage

kgx arangodb-upload [OPTIONS] INPUTS...

Options

-i, --input-format <input_format>

Required The input format. Can be one of (‘tsv’, ‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘obojson’, ‘obo-json’, ‘trapi-json’, ‘neo4j’, ‘arangodb’, ‘duckdb’, ‘nt’, ‘jelly’, ‘owl’, ‘sssom’, ‘parquet’)

-c, --input-compression <input_compression>

The input compression type

-l, --uri <uri>

Required ArangoDB URI to upload to. For example, http://localhost:8529

-d, --database <database>

Required ArangoDB database name

-u, --username <username>

Required ArangoDB username

-p, --password <password>

Required ArangoDB password

-s, --stream

Parse input as a stream

-n, --node-filters <node_filters>

Filters for filtering nodes from the input graph

-e, --edge-filters <edge_filters>

Filters for filtering edges from the input graph

--node-collection <node_collection>

Name of the vertex collection (default: nodes)

--edge-collection <edge_collection>

Name of the edge collection (default: edges)

--curie-routing

Route nodes/edges to per-CURIE-prefix collections (e.g., CL:1000300 -> collection CL)
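The --curie-routing example (CL:1000300 -> collection CL) suggests that the collection name is the CURIE prefix. The Python sketch below illustrates that idea for nodes. It is an approximation, not KGX's implementation. The fallback to a "nodes" collection for identifiers without a prefix is an assumption of this sketch, and it does not model how edges are routed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collection_for_curie(curie: str, default: str = "nodes") -> str:
    """Return the CURIE prefix (text before the first ':') as a collection name.

    Illustrative approximation only; KGX's own routing rules may differ,
    e.g. for identifiers without a prefix.
    """
    prefix, sep, _local_id = curie.partition(":")
    # Assumption: fall back to the default collection when there is no usable prefix.
    return prefix if sep and prefix else default


def route_by_prefix(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records (dicts with an 'id' key) by their target collection."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        buckets[collection_for_curie(record["id"])].append(record)
    return dict(buckets)


routed = route_by_prefix([
    {"id": "CL:1000300"},
    {"id": "CL:0000000"},
    {"id": "HP:0000118"},
])
```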

Arguments

INPUTS

Required argument(s)
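Because --password is required, scripts often end up containing the secret. The hedged Python sketch below builds the kgx arangodb-upload argument list and reads the password from an environment variable. The variable name ARANGO_PASSWORD and the helper name are hypothetical, not something KGX reads itself. INPUTS are appended after the options, matching the usage line.

```python
import os
from typing import List, Sequence


def arangodb_upload_argv(
    inputs: Sequence[str],
    input_format: str,
    uri: str,
    database: str,
    username: str,
    password_env: str = "ARANGO_PASSWORD",  # hypothetical variable name
    curie_routing: bool = False,
) -> List[str]:
    """Build an argv list for `kgx arangodb-upload`, taking the password from the environment."""
    password = os.environ.get(password_env)
    if password is None:
        raise RuntimeError(f"environment variable {password_env} is not set")
    argv = [
        "kgx", "arangodb-upload",
        "--input-format", input_format,
        "--uri", uri,
        "--database", database,
        "--username", username,
        "--password", password,
    ]
    if curie_routing:
        argv.append("--curie-routing")
    # INPUTS are positional and follow the options, as in the usage line.
    argv.extend(inputs)
    return argv


# Demo only: provide a placeholder value if the variable is not already set.
os.environ.setdefault("ARANGO_PASSWORD", "example-only")
argv = arangodb_upload_argv(
    ["nodes.tsv", "edges.tsv"], "tsv", "http://localhost:8529", "kg", "root",
    curie_routing=True,
)
```

This keeps the secret out of scripts and shell history. It is still visible in the child process's argument list while kgx runs.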

graph-summary

Loads and summarizes a knowledge graph from a set of input files.

Parameters

inputs: List[str]

Input files

input_format: str

Input file format

input_compression: Optional[str]

The input compression type

output: Optional[str]

Where to write the output (stdout, by default)

report_type: str

The summary report type: “kgx-map” or “meta-knowledge-graph”

report_format: Optional[str]

The summary report format: ‘yaml’ or ‘json’ (the default depends on report_type)

graph_name: str

User-specified name of the graph being summarized

node_facet_properties: Optional[List]

A list of node properties from which to generate counts per value for those properties. For example, ['provided_by']

edge_facet_properties: Optional[List]

A list of edge properties from which to generate counts per value for those properties. For example, ['original_knowledge_source', 'aggregator_knowledge_source']

error_log: str

Where to write any graph processing error messages (stderr by default, if no value is given)

Usage

kgx graph-summary [OPTIONS] INPUTS...

Options

-i, --input-format <input_format>

Required The input format. Can be one of (‘tsv’, ‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘obojson’, ‘obo-json’, ‘trapi-json’, ‘neo4j’, ‘arangodb’, ‘duckdb’, ‘nt’, ‘jelly’, ‘owl’, ‘sssom’, ‘parquet’)

-c, --input-compression <input_compression>

The input compression type

-o, --output <output>

Required Where to write the summary output

-r, --report-type <report_type>

The summary report type. Must be one of (‘kgx-map’, ‘meta-knowledge-graph’)

-f, --report-format <report_format>

The report format. Can be one of (‘yaml’, ‘json’)

-n, --graph-name <graph_name>

User-specified name of the graph being summarized (default: ‘Graph’)

--node-facet-properties <node_facet_properties>

A list of node properties from which to generate counts per value for those properties

--edge-facet-properties <edge_facet_properties>

A list of edge properties from which to generate counts per value for those properties

-l, --error-log <error_log>

File to which graph data parsing errors are written (default: “stderr”)

Arguments

INPUTS

Required argument(s)
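A hedged Python sketch that builds a kgx graph-summary argument list and checks report_type and report_format against the choices listed above. In this subcommand -n is --graph-name, not a node filter as in the upload/download commands. Passing one --node-facet-properties / --edge-facet-properties flag per property is an assumption about how multiple values are given. The helper name and file names are hypothetical.

```python
from typing import List, Optional, Sequence

REPORT_TYPES = ("kgx-map", "meta-knowledge-graph")
REPORT_FORMATS = ("yaml", "json")


def graph_summary_argv(
    inputs: Sequence[str],
    input_format: str,
    output: str,
    report_type: str = "kgx-map",
    report_format: Optional[str] = None,
    graph_name: Optional[str] = None,
    node_facets: Sequence[str] = (),
    edge_facets: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for `kgx graph-summary`, checking the documented choices."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"report_type must be one of {REPORT_TYPES}")
    if report_format is not None and report_format not in REPORT_FORMATS:
        raise ValueError(f"report_format must be one of {REPORT_FORMATS}")
    argv = ["kgx", "graph-summary", "-i", input_format, "-o", output, "-r", report_type]
    if report_format is not None:
        argv += ["-f", report_format]
    if graph_name is not None:
        argv += ["-n", graph_name]  # -n is --graph-name here
    # Assumption: one flag per facet property when several are given.
    for prop in node_facets:
        argv += ["--node-facet-properties", prop]
    for prop in edge_facets:
        argv += ["--edge-facet-properties", prop]
    argv.extend(inputs)
    return argv


argv = graph_summary_argv(
    ["nodes.tsv", "edges.tsv"], "tsv", "summary.json",
    report_type="meta-knowledge-graph", report_format="json",
    node_facets=["provided_by"],
)
```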

merge

Load nodes and edges from files and KGs, as defined in a config YAML, and merge them into a single graph. The merged graph can then be written to a local or remote Neo4j instance, or serialized into a file.

Note

Everything here is driven by the merge-config YAML.

Parameters

merge_config: str

Merge config YAML

source: List

A list of sources to load, as defined in the YAML

destination: List

A list of destinations to write to, as defined in the YAML

processes: int

Number of processes to use

Usage

kgx merge [OPTIONS]

Options

--merge-config <merge_config>

Required Merge config YAML

--source <source>

Source(s) from the YAML to process

--destination <destination>

Destination(s) from the YAML to process

-p, --processes <processes>

Number of processes to use
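Because kgx merge takes almost everything from the merge-config YAML, the command line only selects what to run. The Python sketch below illustrates that selection. The source and destination names ("hp", "go", "merged-tsv") are hypothetical keys in a merge YAML. Repeating --source and --destination once per name is an assumption based on the "Source(s)" and "Destination(s)" wording.

```python
from typing import List, Optional, Sequence


def merge_argv(
    merge_config: str,
    sources: Sequence[str] = (),
    destinations: Sequence[str] = (),
    processes: Optional[int] = None,
) -> List[str]:
    """Build an argv list for `kgx merge`; everything else comes from the YAML."""
    argv = ["kgx", "merge", "--merge-config", merge_config]
    # Assumption: each selected source/destination gets its own flag.
    for name in sources:
        argv += ["--source", name]
    for name in destinations:
        argv += ["--destination", name]
    if processes is not None:
        if processes < 1:
            raise ValueError("processes must be a positive integer")
        argv += ["--processes", str(processes)]
    return argv


argv = merge_argv("merge.yaml", sources=["hp", "go"], destinations=["merged-tsv"], processes=2)
```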

neo4j-download

Download nodes and edges from a Neo4j database.

Parameters

uri: str

Neo4j URI. For example, https://localhost:7474

username: str

Username for authentication

password: str

Password for authentication

output: str

Where to write the output (stdout, by default)

output_format: str

The output type (tsv, by default)

output_compression: str

The output compression type

stream: bool

Whether to parse input as a stream

node_filters: Tuple[str, str]

Node filters

edge_filters: Tuple[str, str]

Edge filters

Usage

kgx neo4j-download [OPTIONS]

Options

-l, --uri <uri>

Required Neo4j URI to download from. For example, https://localhost:7474

-u, --username <username>

Required Neo4j username

-p, --password <password>

Required Neo4j password

-o, --output <output>

Required Where to write the output

-f, --output-format <output_format>

Required The output format. Can be one of (‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘neo4j’, ‘arangodb’, ‘nt’, ‘jelly’, ‘null’, ‘sql’, ‘tsv’, ‘parquet’)

-d, --output-compression <output_compression>

The output compression type

-s, --stream

Parse input as a stream

-n, --node-filters <node_filters>

Filters for filtering nodes from the input graph

-e, --edge-filters <edge_filters>

Filters for filtering edges from the input graph
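The filter options are typed Tuple[str, str], so each filter is a key/value pair following its flag. A minimal Python sketch expands such pairs for kgx neo4j-download. The property name "category" and value "biolink:Gene" are illustrative, as are the credentials and output name. Repeating -n/-e once per pair is an assumption.

```python
from typing import Iterable, List, Tuple

Filter = Tuple[str, str]


def filter_args(flag: str, filters: Iterable[Filter]) -> List[str]:
    """Expand (key, value) pairs into `flag key value` triples."""
    args: List[str] = []
    for key, value in filters:
        args += [flag, key, value]
    return args


def neo4j_download_argv(
    uri: str,
    username: str,
    password: str,
    output: str,
    output_format: str = "tsv",
    node_filters: Iterable[Filter] = (),
    edge_filters: Iterable[Filter] = (),
) -> List[str]:
    """Build an argv list for `kgx neo4j-download` (not executed here)."""
    argv = [
        "kgx", "neo4j-download",
        "-l", uri, "-u", username, "-p", password,
        "-o", output, "-f", output_format,
    ]
    argv += filter_args("-n", node_filters)
    argv += filter_args("-e", edge_filters)
    return argv


argv = neo4j_download_argv(
    "https://localhost:7474", "neo4j", "secret", "genes",
    node_filters=[("category", "biolink:Gene")],
)
```

In this subcommand -d selects the output compression, not a database name.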

neo4j-upload

Upload a set of nodes/edges to a Neo4j database.

Parameters

inputs: List[str]

A list of files that contain nodes/edges

input_format: str

The input format

input_compression: str

The input compression type

uri: str

The full HTTP address of the Neo4j database

username: str

Username for authentication

password: str

Password for authentication

stream: bool

Whether to parse input as a stream

node_filters: Tuple[str, str]

Node filters

edge_filters: Tuple[str, str]

Edge filters

Usage

kgx neo4j-upload [OPTIONS] INPUTS...

Options

-i, --input-format <input_format>

Required The input format. Can be one of (‘tsv’, ‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘obojson’, ‘obo-json’, ‘trapi-json’, ‘neo4j’, ‘arangodb’, ‘duckdb’, ‘nt’, ‘jelly’, ‘owl’, ‘sssom’, ‘parquet’)

-c, --input-compression <input_compression>

The input compression type

-l, --uri <uri>

Required Neo4j URI to upload to. For example, https://localhost:7474

-u, --username <username>

Required Neo4j username

-p, --password <password>

Required Neo4j password

-s, --stream

Parse input as a stream

-n, --node-filters <node_filters>

Filters for filtering nodes from the input graph

-e, --edge-filters <edge_filters>

Filters for filtering edges from the input graph

Arguments

INPUTS

Required argument(s)
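When logging or echoing a kgx neo4j-upload command, it helps to hide the -p value. The Python sketch below builds the argument list and renders a shell-quoted copy with the password masked. The helper names, file names and credentials are hypothetical, and shlex.join requires Python 3.8 or later.

```python
import shlex
from typing import List, Sequence


def neo4j_upload_argv(
    inputs: Sequence[str],
    input_format: str,
    uri: str,
    username: str,
    password: str,
    stream: bool = False,
) -> List[str]:
    """Build an argv list for `kgx neo4j-upload` (not executed here)."""
    argv = [
        "kgx", "neo4j-upload",
        "-i", input_format, "-l", uri, "-u", username, "-p", password,
    ]
    if stream:
        argv.append("-s")
    argv.extend(inputs)
    return argv


def redacted(argv: List[str]) -> str:
    """Render argv as a shell string with the value after -p/--password masked."""
    shown = list(argv)
    for i, token in enumerate(argv[:-1]):
        if token in ("-p", "--password"):
            shown[i + 1] = "****"
    return shlex.join(shown)  # shlex.join requires Python 3.8+


argv = neo4j_upload_argv(
    ["nodes.tsv", "edges.tsv"], "tsv", "https://localhost:7474", "neo4j", "secret", stream=True,
)
print(redacted(argv))
```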

transform

Transform a Knowledge Graph from one serialization form to another.

Parameters

inputs: List[str]

A list of files that contain nodes/edges

input_format: str

The input format

input_compression: str

The input compression type

output: str

The output file

output_format: str

The output format

output_compression: str

The output compression type

stream: bool

Whether to parse input as a stream

node_filters: Optional[List[Tuple[str, str]]]

Node input filters

edge_filters: Optional[List[Tuple[str, str]]]

Edge input filters

transform_config: str

Transform config YAML

source: List

A list of source(s) to load from the YAML

knowledge_sources: Optional[List[Tuple[str, str]]]

A list of named knowledge sources with (string, boolean or tuple rewrite) specification

infores_catalog: Optional[str]

Optional dump of a TSV file of InfoRes CURIE to Knowledge Source mappings

processes: int

Number of processes to use

Usage

kgx transform [OPTIONS] [INPUTS]...

Options

-i, --input-format <input_format>

The input format. Can be one of (‘tsv’, ‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘obojson’, ‘obo-json’, ‘trapi-json’, ‘neo4j’, ‘arangodb’, ‘duckdb’, ‘nt’, ‘jelly’, ‘owl’, ‘sssom’, ‘parquet’)

-c, --input-compression <input_compression>

The input compression type

-o, --output <output>

Where to write the output

-f, --output-format <output_format>

The output format. Can be one of (‘tsv’, ‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘obojson’, ‘obo-json’, ‘trapi-json’, ‘neo4j’, ‘arangodb’, ‘duckdb’, ‘nt’, ‘jelly’, ‘owl’, ‘sssom’, ‘parquet’)

-d, --output-compression <output_compression>

The output compression type

--stream

Parse input as a stream

-n, --node-filters <node_filters>

Filters for filtering nodes from the input graph

-e, --edge-filters <edge_filters>

Filters for filtering edges from the input graph

--transform-config <transform_config>

Transform config YAML

--source <source>

Source(s) from the YAML to process

-k, --knowledge-sources <knowledge_sources>

A named knowledge source with (string, boolean or tuple rewrite) specification

--infores-catalog <infores_catalog>

Optional dump of a TSV file of InfoRes CURIE to Knowledge Source mappings

-p, --processes <processes>

Number of processes to use

Arguments

INPUTS

Optional argument(s)
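A hedged Python sketch of a file-to-file kgx transform invocation that combines node filters, a knowledge-source specification and streaming. For transform, --stream is documented without a -s short form. The pair ("aggregator_knowledge_source", "true") is an illustrative (field, specification) value, not a verified recommendation. File names and the helper name are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def transform_argv(
    inputs: Sequence[str],
    input_format: str,
    output: str,
    output_format: str,
    node_filters: Iterable[Tuple[str, str]] = (),
    knowledge_sources: Iterable[Tuple[str, str]] = (),
    stream: bool = False,
    processes: Optional[int] = None,
) -> List[str]:
    """Build an argv list for a file-to-file `kgx transform` run (not executed here)."""
    argv = ["kgx", "transform", "-i", input_format, "-o", output, "-f", output_format]
    for key, value in node_filters:
        argv += ["-n", key, value]
    # Each knowledge source is passed as a (field, specification) pair.
    for field, spec in knowledge_sources:
        argv += ["-k", field, spec]
    if stream:
        argv.append("--stream")  # transform documents no -s short form
    if processes is not None:
        argv += ["-p", str(processes)]
    argv.extend(inputs)
    return argv


argv = transform_argv(
    ["graph.json"], "json", "graph_out", "tsv",
    node_filters=[("category", "biolink:Gene")],
    knowledge_sources=[("aggregator_knowledge_source", "true")],
    stream=True,
)
```

INPUTS is optional for this command. A config-driven run would instead pass --transform-config (and optionally --source) and omit the positional files.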

validate

Run the KGX validator on one or more input files to check for Biolink Model compliance.

Parameters

inputs: List[str]

Input files

input_format: str

The input format

input_compression: str

The input compression type

output: str

Path to output file

biolink_release: Optional[str]

SemVer version of Biolink Model Release used for validation (default: latest Biolink Model Toolkit version)

Usage

kgx validate [OPTIONS] INPUTS...

Options

-i, --input-format <input_format>

Required The input format. Can be one of (‘tsv’, ‘csv’, ‘graph’, ‘json’, ‘jsonl’, ‘obojson’, ‘obo-json’, ‘trapi-json’, ‘neo4j’, ‘arangodb’, ‘duckdb’, ‘nt’, ‘jelly’, ‘owl’, ‘sssom’, ‘parquet’)

-c, --input-compression <input_compression>

The input compression type

-o, --output <output>

File to write validation reports to

-b, --biolink-release <biolink_release>

Biolink Model Release (SemVer) used for validation (default: latest Biolink Model Toolkit version)

Arguments

INPUTS

Required argument(s)
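A minimal Python sketch that builds a kgx validate argument list and rejects a --biolink-release value that does not look like MAJOR.MINOR.PATCH before kgx is invoked. The strict pattern is this sketch's own assumption; KGX may accept other forms. The release "4.2.0" and the file names are hypothetical.

```python
import re
from typing import List, Optional, Sequence

# Loose MAJOR.MINOR.PATCH pattern; an assumption, not KGX's own check.
SEMVER = re.compile(r"\d+\.\d+\.\d+")


def validate_argv(
    inputs: Sequence[str],
    input_format: str,
    output: Optional[str] = None,
    biolink_release: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `kgx validate` (not executed here)."""
    argv = ["kgx", "validate", "-i", input_format]
    if output is not None:
        argv += ["-o", output]
    if biolink_release is not None:
        if SEMVER.fullmatch(biolink_release) is None:
            raise ValueError(f"expected a release such as 1.2.3, got {biolink_release!r}")
        argv += ["-b", biolink_release]
    argv.extend(inputs)
    return argv


argv = validate_argv(["nodes.tsv", "edges.tsv"], "tsv", output="report.json", biolink_release="4.2.0")
```

If -b is omitted, the validator uses the release bundled with the installed Biolink Model Toolkit.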