Source#

A Source can be implemented for any file or any local or remote store that can contain a graph. A Source is responsible for reading nodes and edges from the graph.

A source must subclass the kgx.source.source.Source class and implement the following methods:

parse
read_nodes
read_edges

parse method

Responsible for parsing a graph from a file/store
Must return a generator that iterates over the node and edge records from the graph

read_nodes method

Responsible for reading nodes from the file/store
Must return a generator that iterates over the node records
Each node record must be a 2-tuple (node_id, node_data) where,
- node_id is the node CURIE
- node_data is a dictionary that represents the node properties

read_edges method

Responsible for reading edges from the file/store
Must return a generator that iterates over the edge records
Each edge record must be a 4-tuple (subject_id, object_id, edge_key, edge_data) where,
- subject_id is the subject node CURIE
- object_id is the object node CURIE
- edge_key is the unique key for the edge
- edge_data is a dictionary that represents the edge properties
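
The contract described above can be sketched without KGX installed. The class below is a hypothetical stand-in: `ListSource` and its constructor arguments are not part of KGX, and a real implementation would subclass kgx.source.source.Source and read from a file or store. Yielding nodes before edges is a choice made for this sketch.

```python
from typing import Any, Dict, Generator, List, Tuple

NodeRecord = Tuple[str, Dict[str, Any]]
EdgeRecord = Tuple[str, str, str, Dict[str, Any]]


class ListSource:
    """Hypothetical stand-in for a Source backed by in-memory lists."""

    def __init__(self, nodes: List[Dict[str, Any]], edges: List[Dict[str, Any]]):
        self.nodes = nodes
        self.edges = edges

    def read_nodes(self) -> Generator[NodeRecord, None, None]:
        # Each node record is a 2-tuple: (node_id, node_data).
        for node in self.nodes:
            yield node["id"], node

    def read_edges(self) -> Generator[EdgeRecord, None, None]:
        # Each edge record is a 4-tuple:
        # (subject_id, object_id, edge_key, edge_data).
        for edge in self.edges:
            yield edge["subject"], edge["object"], edge["id"], edge

    def parse(self) -> Generator[tuple, None, None]:
        # This sketch yields all node records first, then all edge records.
        yield from self.read_nodes()
        yield from self.read_edges()


source = ListSource(
    nodes=[{"id": "HGNC:11603", "name": "TBX4", "category": ["biolink:Gene"]}],
    edges=[{
        "id": "e1",
        "subject": "HGNC:11603",
        "predicate": "biolink:related_to",
        "object": "MONDO:0005002",
    }],
)
records = list(source.parse())
```

Because every method returns a generator, a consumer can process records one at a time instead of holding the whole graph in memory.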

kgx.source.source#

Base class for all Sources in KGX.

class kgx.source.source.Source(owner)[source]#

Bases: object

A Source is responsible for reading data as records from a store where the store is a file or a database.

check_edge_filter(edge: Dict) → bool[source]#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool[source]#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()[source]#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str][source]#: Return the InfoRes Context of the source

set_edge_filter(key: str, value: set) → None[source]#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None[source]#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)[source]#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None[source]#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Note

When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None[source]#

Set node filters.

Parameters:: filters (Dict) – Node filters
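
As a rough illustration of how key/value filters of this shape can narrow a set of records, the sketch below keeps a record only if, for every filter key, the record's value overlaps the allowed set. The `passes_filters` helper and its matching rule are assumptions made for illustration; they are not taken from the KGX implementation.

```python
from typing import Any, Dict, Set


def passes_filters(record: Dict[str, Any], filters: Dict[str, Set[str]]) -> bool:
    """Return True if the record matches every filter (hypothetical helper)."""
    for key, allowed in filters.items():
        if key not in record:
            return False
        value = record[key]
        # Normalise scalar values to a set so lists and strings are handled alike.
        values = set(value) if isinstance(value, (list, set, tuple)) else {value}
        if not values & allowed:
            return False
    return True


# Filters map a property name to the set of accepted values.
node_filters = {"category": {"biolink:Gene"}}
gene = {"id": "HGNC:11603", "category": ["biolink:Gene"]}
disease = {"id": "MONDO:0005002", "category": ["biolink:Disease"]}
kept = [n["id"] for n in (gene, disease) if passes_filters(n, node_filters)]
```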

set_node_provenance(node_data)[source]#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]#

Update default prefix map.

Parameters:: m (Dict) – A dictionary with prefix to IRI mappings
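
A prefix-to-IRI map of this kind is typically used to expand CURIEs into full IRIs. The sketch below shows the idea with a plain dictionary; `expand_curie` is a hypothetical helper, and the IRI bases are illustrative values rather than KGX's defaults.

```python
from typing import Dict

# Illustrative prefix map; the IRI bases are assumptions for this example.
prefix_map: Dict[str, str] = {
    "HGNC": "http://identifiers.org/hgnc/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand 'PREFIX:local_id' to an IRI; return the input if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local_id
    return curie
```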

set_provenance_map(kwargs)[source]#: Set up a provenance (Knowledge Source to InfoRes) map

validate_edge(edge: Dict) → Optional[Dict][source]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict][source]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict
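
The idea of checking required properties and applying defaults can be sketched as follows. The required property and the fallback category used here are assumptions chosen for illustration, and `check_node` is a hypothetical helper rather than the KGX method. It returns None for a record without an id, which is one way to read the Optional[Dict] return type in the signature.

```python
from typing import Any, Dict, Optional

DEFAULT_CATEGORY = "biolink:NamedThing"  # assumed fallback, for illustration only


def check_node(node: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a copy of the node with defaults applied, or None if it has no id."""
    if not node.get("id"):
        return None
    checked = dict(node)  # shallow copy so the caller's dict is left untouched
    if not checked.get("category"):
        checked["category"] = [DEFAULT_CATEGORY]
    return checked
```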

kgx.source.graph_source#

GraphSource is responsible for reading from an instance of kgx.graph.base_graph.BaseGraph and must use only the methods exposed by BaseGraph to access the graph.

class kgx.source.graph_source.GraphSource(owner)[source]#

Bases: Source

GraphSource is responsible for reading data as records from an in-memory graph representation.

The underlying store must be an instance of kgx.graph.base_graph.BaseGraph

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

parse(graph: BaseGraph, **kwargs: Any) → Generator[source]#

This method reads from a graph and yields records.

Parameters:

graph (kgx.graph.base_graph.BaseGraph) – The graph to read from
kwargs (Any) – Any additional arguments

Returns:

A generator for node and edge records read from the graph

Return type:

Generator

read_edges() → Generator[source]#

Read edges as records from the graph.

Returns:: A generator for edges
Return type:: Generator

read_nodes() → Generator[source]#

Read nodes as records from the graph.

Returns:: A generator for nodes
Return type:: Generator

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None#

Update default prefix map.

Parameters:: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.tsv_source#

TsvSource is responsible for reading KGX formatted CSV or TSV files using Pandas. Every flat file is treated as a Pandas DataFrame, from which data are read in chunks.

KGX expects two separate files - one for nodes and another for edges.
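
To illustrate the shape of the data, the sketch below reads a small tab-separated nodes table and yields (node_id, node_data) tuples. It uses the standard-library csv module rather than Pandas so that it runs without extra dependencies, and it assumes list-valued columns are pipe-delimited. `read_node_rows` and `LIST_COLUMNS` are hypothetical names, not part of KGX.

```python
import csv
import io
from typing import Any, Dict, Iterator, TextIO, Tuple

LIST_COLUMNS = {"category", "xref", "synonym"}  # assumed multivalued columns

NODES_TSV = (
    "id\tname\tcategory\n"
    "HGNC:11603\tTBX4\tbiolink:Gene\n"
    "CHEBI:15365\tacetaminophen\tbiolink:SmallMolecule|biolink:ChemicalEntity\n"
)


def read_node_rows(handle: TextIO) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (node_id, node_data) for each row of a tab-separated nodes table."""
    for row in csv.DictReader(handle, delimiter="\t"):
        node: Dict[str, Any] = {}
        for key, value in row.items():
            if not value:
                continue  # skip empty cells
            # Split pipe-delimited cells into lists for multivalued columns.
            node[key] = value.split("|") if key in LIST_COLUMNS else value
        yield node["id"], node


node_records = list(read_node_rows(io.StringIO(NODES_TSV)))
```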

class kgx.source.tsv_source.TsvSource(owner)[source]#

Bases: Source

TsvSource is responsible for reading data as records from a TSV/CSV.

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

This method reads from a TSV/CSV and yields records.

Parameters:

filename (str) – The filename to parse
format (str) – The format (tsv, csv)
compression (Optional[str]) – The compression type (tar, tar.gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for node and edge records

Return type:

Generator

read_edge(edge: Dict) → Optional[Tuple][source]#

Load an edge into an instance of BaseGraph.

Parameters:: edge (Dict) – An edge
Returns:: A tuple that contains subject id, object id, edge key, and edge data
Return type:: Optional[Tuple]

read_edges(df: DataFrame) → Generator[source]#

Load edges from pandas.DataFrame into an instance of BaseGraph.

Parameters:: df (pandas.DataFrame) – Dataframe containing records that represent edges
Returns:: A generator for edge records
Return type:: Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]][source]#

Prepare a node.

Parameters:: node (Dict) – A node
Returns:: A tuple that contains node id and node data
Return type:: Optional[Tuple[str, Dict]]

read_nodes(df: DataFrame) → Generator[source]#

Read records from pandas.DataFrame and yield records.

Parameters:: df (pandas.DataFrame) – Dataframe containing records that represent nodes
Returns:: A generator for node records
Return type:: Generator

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]#

Add or override default prefix to IRI map.

Parameters:: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None[source]#

Add or override default IRI to prefix map.

Parameters:: m (Dict) – IRI to prefix map
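
An IRI-to-prefix map supports the opposite direction: contracting a full IRI back into a CURIE. The sketch below is a hypothetical illustration (`contract_iri` and the IRI bases are not taken from KGX) that prefers the longest matching IRI base.

```python
from typing import Dict

# Illustrative reverse prefix map; the IRI bases are assumptions for this example.
reverse_prefix_map: Dict[str, str] = {
    "http://purl.obolibrary.org/obo/MONDO_": "MONDO",
    "http://identifiers.org/hgnc/": "HGNC",
}


def contract_iri(iri: str, reverse_prefixes: Dict[str, str]) -> str:
    """Contract an IRI to 'PREFIX:local_id'; return the input if no base matches."""
    # Try longer bases first so the most specific base wins.
    for base in sorted(reverse_prefixes, key=len, reverse=True):
        if iri.startswith(base):
            return f"{reverse_prefixes[base]}:{iri[len(base):]}"
    return iri
```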

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.json_source#

JsonSource is responsible for reading data from a KGX formatted JSON using the ijson library, which allows for streaming data from the file.

class kgx.source.json_source.JsonSource(owner)[source]#

Bases: TsvSource

JsonSource is responsible for reading data as records from a JSON.

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

This method reads from a JSON and yields records.

Parameters:

filename (str) – The filename to parse
format (str) – The format (json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for node and edge records read from the file

Return type:

Generator
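
The following sketch shows one way to read node and edge records from an optionally gzip-compressed JSON file using only the standard library. It is not the KGX implementation: it loads the whole document with json.load instead of streaming it, `read_json_records` is a hypothetical helper, and the top-level "nodes" and "edges" arrays are an assumption about the file layout that this section does not spell out.

```python
import gzip
import json
import os
import tempfile
from typing import Iterator, Optional


def read_json_records(filename: str, compression: Optional[str] = None) -> Iterator[tuple]:
    """Yield node 2-tuples, then edge 4-tuples (hypothetical helper)."""
    opener = gzip.open if compression == "gz" else open
    with opener(filename, "rt", encoding="utf-8") as handle:
        data = json.load(handle)  # whole document in memory, unlike a streaming parser
    for node in data.get("nodes", []):
        yield node["id"], node
    for edge in data.get("edges", []):
        yield edge["subject"], edge["object"], edge.get("id"), edge


# Write a tiny gzip-compressed example file, then read it back.
graph = {
    "nodes": [{"id": "HGNC:11603", "category": ["biolink:Gene"]}],
    "edges": [{"id": "e1", "subject": "HGNC:11603",
               "predicate": "biolink:related_to", "object": "MONDO:0005002"}],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        json.dump(graph, out)
    json_records = list(read_json_records(path, compression="gz"))
```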

read_edge(edge: Dict) → Optional[Tuple]#

Load an edge into an instance of BaseGraph.

Parameters:: edge (Dict) – An edge
Returns:: A tuple that contains subject id, object id, edge key, and edge data
Return type:: Optional[Tuple]

read_edges(filename: str) → Generator[source]#

Read edge records from a JSON.

Parameters:: filename (str) – The filename to read from
Returns:: A generator for edge records
Return type:: Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]#

Prepare a node.

Parameters:: node (Dict) – A node
Returns:: A tuple that contains node id and node data
Return type:: Optional[Tuple[str, Dict]]

read_nodes(filename: str) → Generator[source]#

Read node records from a JSON.

Parameters:: filename (str) – The filename to read from
Returns:: A generator for node records
Return type:: Generator

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None#

Add or override default prefix to IRI map.

Parameters:: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None#

Add or override default IRI to prefix map.

Parameters:: m (Dict) – IRI to prefix map

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.jsonl_source#

JsonlSource is responsible for reading data from KGX formatted JSON Lines files using the jsonlines library.

KGX expects two separate JSON Lines files - one for nodes and another for edges.

KGX JSON Lines Format Specification#

The JSON Lines format provides an efficient way to represent KGX data where each line contains a single JSON object representing either a node or an edge. This format is ideal for streaming large graphs and combines the advantages of JSON with line-oriented processing.

File Structure#

{filename}_nodes.jsonl: Contains one node per line, each as a complete JSON object
{filename}_edges.jsonl: Contains one edge per line, each as a complete JSON object

Node Record Format#

Required Properties#

id (string): A CURIE that uniquely identifies the node in the graph
category (array of strings): List of Biolink categories for the node, from the NamedThing hierarchy

Common Optional Properties#

name (string): Human-readable name of the entity
description (string): Human-readable description of the entity
provided_by (array of strings): List of sources that provided this node
xref (array of strings): List of database cross-references as CURIEs
synonym (array of strings): List of alternative names for the entity

Edge Record Format#

Required Properties#

subject (string): CURIE of the source node
predicate (string): Biolink predicate representing the relationship type
object (string): CURIE of the target node
knowledge_level (string): Level of knowledge representation (observation, assertion, concept, statement) according to Biolink Model
agent_type (string): Autonomous agents for edges (informational, computational, biochemical, biological) according to Biolink Model

Common Optional Properties#

id (string): Unique identifier for the edge, often a UUID
relation (string): Relation CURIE from a formal relation ontology (e.g., RO)
category (array of strings): List of Biolink association categories
knowledge_source (array of strings): Sources of knowledge (deprecated: provided_by)
primary_knowledge_source (array of strings): Primary knowledge sources
aggregator_knowledge_source (array of strings): Knowledge aggregator sources
publications (array of strings): List of publication CURIEs supporting the edge

Examples#

Node Example (nodes.jsonl):

Each line in a nodes.jsonl file represents a complete node record. The examples below show different node types, pretty-printed for readability:

{
  "id": "HGNC:11603",
  "name": "TBX4",
  "category": ["biolink:Gene"]
},
{
  "id": "MONDO:0005002",
  "name": "chronic obstructive pulmonary disease",
  "category": ["biolink:Disease"]
},
{
  "id": "CHEBI:15365",
  "name": "acetaminophen",
  "category": ["biolink:SmallMolecule", "biolink:ChemicalEntity"]
}

In the actual jsonlines file, each record is on a single line, without the line breaks and indentation shown above:

{"id":"HGNC:11603","name":"TBX4","category":["biolink:Gene"]}
{"id":"MONDO:0005002","name":"chronic obstructive pulmonary disease","category":["biolink:Disease"]}
{"id":"CHEBI:15365","name":"acetaminophen","category":["biolink:SmallMolecule","biolink:ChemicalEntity"]}
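
One way to produce this single-line form from a Python dictionary is json.dumps with compact separators. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import json

node = {"id": "HGNC:11603", "name": "TBX4", "category": ["biolink:Gene"]}

# Compact separators drop the spaces after ',' and ':' so the record fits on one line.
line = json.dumps(node, separators=(",", ":"))

# Parsing the line gives back the original dictionary.
round_trip = json.loads(line)
```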

Edge Example (edges.jsonl):

Each line in an edges.jsonl file represents a complete edge record. The examples below show different edge types, pretty-printed for readability:

{
  "id": "a8575c4e-61a6-428a-bf09-fcb3e8d1644d",
  "subject": "HGNC:11603",
  "object": "MONDO:0005002",
  "predicate": "biolink:related_to",
  "relation": "RO:0003304",
  "knowledge_level": "assertion",
  "agent_type": "computational"
},
{
  "id": "urn:uuid:5b06e86f-d768-4cd9-ac27-abe31e95ab1e",
  "subject": "HGNC:11603",
  "predicate": "biolink:contributes_to",
  "object": "MONDO:0005002",
  "relation": "RO:0003304",
  "category": ["biolink:GeneToDiseaseAssociation"],
  "primary_knowledge_source": ["infores:gwas-catalog"],
  "publications": ["PMID:26634245", "PMID:26634244"],
  "knowledge_level": "observation",
  "agent_type": "biological"
},
{
  "id": "c7d632b4-6708-4296-9cfe-44bc586d32c8",
  "subject": "CHEBI:15365",
  "predicate": "biolink:affects",
  "object": "GO:0006915",
  "relation": "RO:0002434",
  "category": ["biolink:ChemicalToProcessAssociation"],
  "primary_knowledge_source": ["infores:monarchinitiative"],
  "aggregator_knowledge_source": ["infores:biolink-api"],
  "publications": ["PMID:12345678"],
  "knowledge_level": "assertion",
  "agent_type": "computational"
}

In the actual jsonlines file, each record is on a single line, without the line breaks and indentation shown above:

{"id":"a8575c4e-61a6-428a-bf09-fcb3e8d1644d","subject":"HGNC:11603","object":"MONDO:0005002","predicate":"biolink:related_to","relation":"RO:0003304","knowledge_level":"assertion","agent_type":"computational"}
{"id":"urn:uuid:5b06e86f-d768-4cd9-ac27-abe31e95ab1e","subject":"HGNC:11603","predicate":"biolink:contributes_to","object":"MONDO:0005002","relation":"RO:0003304","category":["biolink:GeneToDiseaseAssociation"],"primary_knowledge_source":["infores:gwas-catalog"],"publications":["PMID:26634245","PMID:26634244"],"knowledge_level":"observation","agent_type":"biological"}
{"id":"c7d632b4-6708-4296-9cfe-44bc586d32c8","subject":"CHEBI:15365","predicate":"biolink:affects","object":"GO:0006915","relation":"RO:0002434","category":["biolink:ChemicalToProcessAssociation"],"primary_knowledge_source":["infores:monarchinitiative"],"aggregator_knowledge_source":["infores:biolink-api"],"publications":["PMID:12345678"],"knowledge_level":"assertion","agent_type":"computational"}

Reading JSON Lines with KGX#

When using KGX to read JSON Lines files, the library will:

Parse each line as a complete JSON object
Validate required fields are present
Convert the data into the internal graph representation
Handle arrays properly as native Python lists (unlike TSV where lists are often pipe-delimited strings)
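
The steps above can be approximated with the standard library. This sketch is not KGX's reader, which uses the jsonlines library; `read_jsonl_edges` and `REQUIRED_EDGE_FIELDS` are names introduced here, with the field set taken from the required edge properties listed earlier. Skipping records that lack a required field, and generating a UUID when an edge has no id, are policies assumed for the example.

```python
import json
import uuid
from typing import Any, Dict, Iterable, Iterator, Tuple

REQUIRED_EDGE_FIELDS = {"subject", "predicate", "object", "knowledge_level", "agent_type"}


def read_jsonl_edges(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, Dict[str, Any]]]:
    """Yield (subject_id, object_id, edge_key, edge_data) for each valid line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        edge = json.loads(line)  # each line is one complete JSON object
        if not REQUIRED_EDGE_FIELDS.issubset(edge):
            continue  # assumed policy: skip records missing a required field
        edge_key = edge.get("id") or str(uuid.uuid4())
        yield edge["subject"], edge["object"], edge_key, edge


edge_lines = [
    '{"id":"e1","subject":"HGNC:11603","predicate":"biolink:related_to",'
    '"object":"MONDO:0005002","knowledge_level":"assertion",'
    '"agent_type":"computational","publications":["PMID:26634245"]}',
    '{"subject":"HGNC:11603","object":"MONDO:0005002"}',
]
edge_records = list(read_jsonl_edges(edge_lines))
```

Array-valued properties such as publications arrive as Python lists directly from json.loads, so no delimiter splitting is needed.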

class kgx.source.jsonl_source.JsonlSource(owner)[source]#

Bases: JsonSource

JsonlSource is responsible for reading data as records from JSON Lines.

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

parse(filename: str, format: str = 'jsonl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

This method reads from JSON Lines and yields records.

Parameters:

filename (str) – The filename to parse
format (str) – The format (jsonl)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for records

Return type:

Generator

read_edge(edge: Dict) → Optional[Tuple]#

Load an edge into an instance of BaseGraph.

Parameters:: edge (Dict) – An edge
Returns:: A tuple that contains subject id, object id, edge key, and edge data
Return type:: Optional[Tuple]

read_edges(filename: str) → Generator#

Read edge records from a JSON.

Parameters:: filename (str) – The filename to read from
Returns:: A generator for edge records
Return type:: Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]#

Prepare a node.

Parameters:: node (Dict) – A node
Returns:: A tuple that contains node id and node data
Return type:: Optional[Tuple[str, Dict]]

read_nodes(filename: str) → Generator#

Read node records from a JSON.

Parameters:: filename (str) – The filename to read from
Returns:: A generator for node records
Return type:: Generator

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None#

Add or override default prefix to IRI map.

Parameters:: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None#

Add or override default IRI to prefix map.

Parameters:: m (Dict) – IRI to prefix map

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.trapi_source#

TrapiSource is responsible for reading data from a Translator Reasoner API formatted JSON.

class kgx.source.trapi_source.TrapiSource(owner)[source]#

Bases: JsonSource

TrapiSource is responsible for reading data as records from a TRAPI (Translator Reasoner API) compliant JSON.

This class handles the TRAPI 1.5.0 specification.

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used when the metadata is no longer needed, but it may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

load_edge(edge: Dict) → Tuple[str, str, str, Dict][source]#

Load a TRAPI edge into KGX format

Parameters:: edge (Dict) – A TRAPI edge
Returns:: A tuple containing (subject_id, object_id, edge_id, edge_data) in KGX format
Return type:: Tuple[str, str, str, Dict]

load_node(node: Dict) → Tuple[str, Dict][source]#

Load a TRAPI node into KGX format

Parameters:: node (Dict) – A TRAPI node
Returns:: A tuple containing (node_id, node_data) in KGX format
Return type:: Tuple[str, Dict]
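
The kind of conversion these two methods perform can be sketched as below. The sketch assumes the TRAPI knowledge graph layout in which nodes and edges are dictionaries keyed by identifier and node categories are stored under `categories`. The helpers `trapi_node_to_kgx` and `trapi_edge_to_kgx` are hypothetical, take the identifier as a separate argument (unlike load_node and load_edge), and copy only a few core fields; they are not the KGX implementation.

```python
from typing import Any, Dict, Tuple


def trapi_node_to_kgx(node_id: str, node: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Map a TRAPI node to a KGX-style (node_id, node_data) tuple."""
    data: Dict[str, Any] = {"id": node_id, "category": list(node.get("categories") or [])}
    if node.get("name"):
        data["name"] = node["name"]
    return node_id, data


def trapi_edge_to_kgx(edge_id: str, edge: Dict[str, Any]) -> Tuple[str, str, str, Dict[str, Any]]:
    """Map a TRAPI edge to a KGX-style (subject_id, object_id, edge_id, edge_data) tuple."""
    data = {
        "id": edge_id,
        "subject": edge["subject"],
        "predicate": edge["predicate"],
        "object": edge["object"],
    }
    return edge["subject"], edge["object"], edge_id, data


# A tiny knowledge graph in the assumed TRAPI layout.
trapi_kg = {
    "nodes": {
        "HGNC:11603": {"name": "TBX4", "categories": ["biolink:Gene"]},
        "MONDO:0005002": {"categories": ["biolink:Disease"]},
    },
    "edges": {
        "e1": {"subject": "HGNC:11603", "predicate": "biolink:related_to",
               "object": "MONDO:0005002"},
    },
}
kgx_nodes = [trapi_node_to_kgx(i, n) for i, n in trapi_kg["nodes"].items()]
kgx_edges = [trapi_edge_to_kgx(i, e) for i, e in trapi_kg["edges"].items()]
```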

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

This method reads from a TRAPI JSON and yields KGX records.

Parameters:

filename (str) – The filename to parse
format (str) – The format (json or jsonl)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for node and edge records

Return type:

Generator

read_edge(edge: Dict) → Optional[Tuple]#

Load an edge into an instance of BaseGraph.

Parameters:: edge (Dict) – An edge
Returns:: A tuple that contains subject id, object id, edge key, and edge data
Return type:: Optional[Tuple]

read_edges(filename: str, compression: Optional[str] = None) → Generator[source]#

Read edge records from a TRAPI JSON.

Parameters:

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns:

A generator for edge records

Return type:

Generator

read_edges_jsonl(filename: str, compression: Optional[str] = None) → Generator[source]#

Read edge records from a TRAPI JSONL file.

Parameters:

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns:

A generator for edge records

Return type:

Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]]#

Prepare a node.

Parameters:: node (Dict) – A node
Returns:: A tuple that contains node id and node data
Return type:: Optional[Tuple[str, Dict]]

read_nodes(filename: str, compression: Optional[str] = None) → Generator[source]#

Read node records from a TRAPI JSON.

Parameters:

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns:

A generator for node records

Return type:

Generator

read_nodes_jsonl(filename: str, compression: Optional[str] = None) → Generator[source]#

Read node records from a TRAPI JSONL file.

Parameters:

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns:

A generator for node records

Return type:

Generator

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None#

Add or override default prefix to IRI map.

Parameters:: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None#

Add or override default IRI to prefix map.

Parameters:: m (Dict) – IRI to prefix map
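
As a rough illustration of what a prefix to IRI map and its reverse are used for, the sketch below expands a CURIE into an IRI and contracts an IRI back into a CURIE. The helper names expand and contract and the example prefixes are assumptions made for this illustration; they are not part of the KGX API.

```python
from typing import Dict, Optional

# Hypothetical prefix map; in KGX such a map would be supplied via set_prefix_map().
prefix_map: Dict[str, str] = {
    "HGNC": "http://identifiers.org/hgnc/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}
# The reverse (IRI to prefix) map is the same mapping inverted.
reverse_prefix_map: Dict[str, str] = {iri: prefix for prefix, iri in prefix_map.items()}


def expand(curie: str, pm: Dict[str, str]) -> Optional[str]:
    """Turn 'PREFIX:local_id' into a full IRI, or None if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in pm:
        return None
    return pm[prefix] + local_id


def contract(iri: str, rpm: Dict[str, str]) -> Optional[str]:
    """Turn a full IRI into a CURIE, preferring the longest matching IRI base."""
    matches = [base for base in rpm if iri.startswith(base)]
    if not matches:
        return None
    base = max(matches, key=len)
    return f"{rpm[base]}:{iri[len(base):]}"
```

Overriding an entry in either map changes how identifiers are expanded or contracted, which is why the two setters are described as "add or override".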

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict
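
The validation described above (check required properties, then apply defaults) can be sketched as follows. This is only an illustration of the pattern: the function name, the choice of id as the required property, and the default category value are assumptions, not the rules KGX actually applies.

```python
from typing import Dict, Optional

REQUIRED_NODE_PROPERTIES = {"id"}          # assumption for this sketch
DEFAULT_CATEGORY = ["biolink:NamedThing"]  # assumption for this sketch


def validate_node_sketch(node: Dict) -> Optional[Dict]:
    """Return a copy of the node with defaults applied, or None if it is invalid."""
    missing = REQUIRED_NODE_PROPERTIES - set(node)
    if missing:
        # A required property is absent; signal failure with None.
        return None
    validated = dict(node)
    # Only fill in the default when the node does not already declare a category.
    validated.setdefault("category", list(DEFAULT_CATEGORY))
    return validated
```

Returning None for an invalid record is consistent with the Optional[Dict] return annotation in the signature, and lets a caller skip the record without raising.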

kgx.source.obograph_source#

ObographSource is responsible for reading data from OBOGraphs in JSON.

class kgx.source.obograph_source.ObographSource(owner)[source]#

Bases: JsonSource

ObographSource is responsible for reading data as records from an OBO Graph JSON.

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_category(curie: str, node: dict) → Optional[str][source]#

Get category for a given CURIE.

Parameters:

curie (str) – Curie for node
node (dict) – Node data

Returns:

Category for the given node CURIE.

Return type:

Optional[str]

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

This method reads from JSON and yields records.

Parameters:

filename (str) – The filename to parse
format (str) – The format (json)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for records

Return type:

Generator

parse_meta(node: str, meta: Dict) → Dict[source]#

Parse ‘meta’ field of a node.

Parameters:

node (str) – Node identifier
meta (Dict) – meta dictionary for the node

Returns:

A dictionary that contains ‘description’, ‘subsets’, ‘synonyms’, ‘xrefs’, a ‘deprecated’ flag and/or ‘equivalent_nodes’.

Return type:

Dict
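
The sketch below shows one way a node's meta block could be flattened into the properties listed above. It assumes the usual OBO Graphs JSON layout (a definition object with a val field, and lists of synonym and xref objects that each carry val); the function name is hypothetical and the handling of equivalent_nodes is omitted, so treat it as an illustration rather than the KGX implementation.

```python
from typing import Any, Dict


def parse_meta_sketch(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an OBO Graphs 'meta' dictionary into simple node properties."""
    properties: Dict[str, Any] = {}
    if "definition" in meta:
        properties["description"] = meta["definition"].get("val")
    if "subsets" in meta:
        properties["subsets"] = list(meta["subsets"])
    if "synonyms" in meta:
        properties["synonyms"] = [s["val"] for s in meta["synonyms"] if "val" in s]
    if "xrefs" in meta:
        properties["xrefs"] = [x["val"] for x in meta["xrefs"] if "val" in x]
    if meta.get("deprecated"):
        properties["deprecated"] = True
    return properties


# Illustrative input; the values are made up for this example.
example_meta = {
    "definition": {"val": "A disorder of glucose metabolism."},
    "synonyms": [{"pred": "hasExactSynonym", "val": "T2D"}],
    "xrefs": [{"val": "DOID:9352"}],
}
example_properties = parse_meta_sketch(example_meta)
```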

read_edge(edge: Dict) → Optional[Tuple][source]#

Read and parse an edge record.

Parameters:: edge (Dict) – The edge record
Returns:: The processed edge
Return type:: Optional[Tuple]

read_edges(filename: str, compression: Optional[str] = None) → Generator[source]#

Read edge records from a JSON file.

Parameters:

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns:

A generator for edge records

Return type:

Generator

read_node(node: Dict) → Optional[Tuple[str, Dict]][source]#

Read and parse a node record.

Parameters:: node (Dict) – The node record
Returns:: The processed node
Return type:: Optional[Tuple[str, Dict]]

read_nodes(filename: str, compression: Optional[str] = None) → Generator[source]#

Read node records from a JSON file.

Parameters:

filename (str) – The filename to read from
compression (Optional[str]) – The compression type

Returns:

A generator for node records

Return type:

Generator

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None#

Add or override default prefix to IRI map.

Parameters:: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None#

Add or override default IRI to prefix map.

Parameters:: m (Dict) – IRI to prefix map

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.sssom_source#

SssomSource is responsible for reading data from SSSOM-formatted files.

KGX Source for Simple Standard for Sharing Ontology Mappings (“SSSOM”)

class kgx.source.sssom_source.SssomSource(owner)[source]#

Bases: Source

SssomSource is responsible for reading data as records from an SSSOM file.

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

load_edge(edge: Dict) → Generator[source]#

Load an edge into an instance of BaseGraph

Parameters:: edge (Dict) – An edge
Returns:: A generator for node and edge records
Return type:: Generator

load_edges(df: DataFrame) → Generator[source]#

Load edges from pandas.DataFrame into an instance of BaseGraph

Parameters:: df (pandas.DataFrame) – Dataframe containing records that represent edges
Returns:: A generator for edge records
Return type:: Generator

load_node(node_data: Dict) → Optional[Tuple[str, Dict]][source]#

Load a node into an instance of BaseGraph

Parameters:: node_data (Dict) – A node
Returns:: A tuple that contains node id and node data
Return type:: Optional[Tuple[str, Dict]]

parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

Parse an SSSOM TSV file.

Parameters:

filename (str) – File to read from
format (str) – The input file format (tsv, by default)
compression (Optional[str]) – The compression (gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for node and edge records

Return type:

Generator

parse_header(filename: str, compression: Optional[str] = None) → None[source]#

Parse metadata from SSSOM headers.

Parameters:

filename (str) – Filename to parse
compression (Optional[str]) – Compression type
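
SSSOM TSV files carry their metadata in a block of comment lines at the top of the file, each starting with #. The following sketch separates that header block from the tabular body. It is a simplified stand-in, not the KGX parse_header implementation: the function name is hypothetical, and the header lines are returned as raw text rather than being parsed as YAML.

```python
import io
from typing import List, TextIO, Tuple


def split_sssom_header(handle: TextIO) -> Tuple[List[str], List[str]]:
    """Split an SSSOM TSV stream into header lines (without '#') and body lines."""
    header: List[str] = []
    body: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#") and not body:
            # Drop only the leading '#', keeping indentation of nested keys.
            header.append(line[1:])
        else:
            body.append(line)
    return header, body


# Illustrative file content; the mapping row is made up for this example.
example = io.StringIO(
    "#curie_map:\n"
    "#  HP: http://purl.obolibrary.org/obo/HP_\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "HP:0000001\tskos:exactMatch\tMP:0000001\n"
)
example_header, example_body = split_sssom_header(example)
```

Keeping the indentation matters because the header is YAML, where nested keys such as the entries under curie_map are expressed through leading spaces.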

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None[source]#

Add or override default prefix to IRI map.

Parameters:: m (Dict) – Prefix to IRI map

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

set_reverse_prefix_map(m: Dict) → None[source]#

Add or override default IRI to prefix map.

Parameters:: m (Dict) – IRI to prefix map

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.neo_source#

NeoSource is responsible for reading data from a local or remote Neo4j instance.

class kgx.source.neo_source.NeoSource(owner)[source]#

Bases: Source

NeoSource is responsible for reading data as records from a Neo4j instance.

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

count(is_directed: bool = True) → int[source]#

Get the total count of records to be fetched from the Neo4j database.

Parameters:: is_directed (bool) – Whether edges are directed. True by default, since edges in most cases are directed.
Returns:: The total count of records
Return type:: int

static format_edge_filter(edge_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]#

Get the value for an edge filter as defined by key. This is used as a convenience method for generating Cypher queries.

Parameters:

edge_filters (Dict) – All edge filters
key (str) – Name of the edge filter
variable (Optional[str]) – Variable binding for the Cypher query
prefix (Optional[str]) – Prefix for the Cypher query
op (Optional[str]) – The operator

Returns:

Value corresponding to the given edge filter key, formatted for CQL

Return type:

str

static format_node_filter(node_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) → str[source]#

Get the value for a node filter as defined by key. This is used as a convenience method for generating Cypher queries.

Parameters:

node_filters (Dict) – All node filters
key (str) – Name of the node filter
variable (Optional[str]) – Variable binding for the Cypher query
prefix (Optional[str]) – Prefix for the Cypher query
op (Optional[str]) – The operator

Returns:

Value corresponding to the given node filter key, formatted for CQL

Return type:

str

get_edges(skip: int = 0, limit: int = 0, is_directed: bool = True, **kwargs: Any) → List[source]#

Get a page of edges from the Neo4j database.

Parameters:

skip (int) – Records to skip
limit (int) – Total number of records to query for
is_directed (bool) – Whether edges are directed (True by default, since edges in most cases are directed)
kwargs (Any) – Any additional arguments

Returns:

A list of 3-tuples

Return type:

List

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

get_nodes(skip: int = 0, limit: int = 0, **kwargs: Any) → List[source]#

Get a page of nodes from the Neo4j database.

Parameters:

skip (int) – Records to skip
limit (int) – Total number of records to query for
kwargs (Any) – Any additional arguments

Returns:

A list of nodes

Return type:

List

get_pages(query_function, start: int = 0, end: Optional[int] = None, page_size: int = 50000, **kwargs: Any) → Iterator[source]#

Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start) / page_size.

Parameters:

query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges
start (int) – Start for pagination
end (Optional[int]) – End for pagination
page_size (int) – Size of each page (50000, by default)
kwargs (Any) – Any additional arguments that might be relevant for query_function

Returns:

An iterator for a list of records from Neo4j. The size of the list is page_size

Return type:

Iterator
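
The paging behaviour described for get_pages can be sketched with a plain generator that repeatedly calls a query function with skip and limit arguments. This is an illustration only: get_pages_sketch and fake_query are hypothetical names, the in-memory list stands in for a Neo4j database, and the stopping conditions are assumptions rather than the documented behaviour of the real method.

```python
from typing import Any, Callable, Iterator, List, Optional


def get_pages_sketch(
    query_function: Callable[..., List],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 50000,
    **kwargs: Any,
) -> Iterator[List]:
    """Yield successive pages of records until 'end' or the data is exhausted."""
    skip = start
    while end is None or skip < end:
        # Never request records beyond 'end' when an end is given.
        limit = page_size if end is None else min(page_size, end - skip)
        page = query_function(skip=skip, limit=limit, **kwargs)
        if not page:
            break
        yield page
        if len(page) < limit:
            # A short page means the store has no more records.
            break
        skip += limit


RECORDS = list(range(10))  # stand-in for records held in a database


def fake_query(skip: int = 0, limit: int = 0) -> List[int]:
    """Stand-in for a method such as get_nodes or get_edges."""
    return RECORDS[skip:skip + limit]


pages = list(get_pages_sketch(fake_query, page_size=4))
```

Because the function is a generator, only one page is held in memory at a time, which is the point of fetching in batches.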

load_edge(edge_record: List) → Tuple[source]#

Load an edge into an instance of BaseGraph

Parameters:: edge_record (List) – A 4-tuple edge record
Returns:: A tuple with subject ID, object ID, edge key, and edge data
Return type:: Tuple

load_edges(edges: List) → None[source]#

Load edges into an instance of BaseGraph

Parameters:: edges (List) – A list of edge records

load_node(node_data: Dict) → Optional[Tuple][source]#

Load a node into an instance of BaseGraph

Parameters:: node_data (Dict) – A node
Returns:: A tuple with node ID and node data
Return type:: Optional[Tuple]

load_nodes(nodes: List) → Generator[source]#

Load nodes into an instance of BaseGraph

Parameters:: nodes (List) – A list of nodes

parse(uri: str, username: str, password: str, node_filters: Optional[Dict] = None, edge_filters: Optional[Dict] = None, start: int = 0, end: Optional[int] = None, is_directed: bool = True, page_size: int = 50000, **kwargs: Any) → Generator[source]#

This method reads from a Neo4j instance and yields records.

Parameters:

uri (str) – The URI for the Neo4j instance. For example, http://localhost:7474
username (str) – The username
password (str) – The password
node_filters (Optional[Dict]) – Node filters
edge_filters (Optional[Dict]) – Edge filters
start (int) – Number of records to skip before streaming
end (Optional[int]) – Total number of records to fetch
is_directed (bool) – Whether or not the edges should be treated as directed
page_size (int) – The size of each page/batch fetched from Neo4j (50000)
kwargs (Any) – Any additional arguments

Returns:

A generator for records

Return type:

Generator

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_provenance(node_data)#: Set a specific node provenance value.

set_prefix_map(m: Dict) → None#

Update default prefix map.

Parameters:: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.rdf_source#

RdfSource is responsible for reading data from RDF N-Triples.

This source makes use of a custom kgx.parsers.ntriples_parser.CustomNTriplesParser for parsing N-Triples, which extends rdflib.plugins.parsers.ntriples.W3CNTriplesParser.

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

sort -k 1,2 -t ' ' data.nt > data_sorted.nt

class kgx.source.rdf_source.RdfSource(owner)[source]#

Bases: Source

RdfSource is responsible for reading data as records from RDF.

Note

Currently only RDF N-Triples are supported.

add_edge(subject_iri: URIRef, object_iri: URIRef, predicate_iri: URIRef, data: Optional[Dict[Any, Any]] = None) → Dict[source]#

Add an edge to cache.

Parameters:

subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns:

The edge data

Return type:

Dict

add_node(iri: URIRef, data: Optional[Dict] = None) → Dict[source]#

Add a node to cache.

Parameters:

iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties

Returns:

The node data

Return type:

Dict

add_node_attribute(iri: Union[URIRef, str], key: str, value: Union[str, List]) → None[source]#

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters:

iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string
value (Union[str, List]) – The value of the attribute

Returns:

The node data

Return type:

Dict

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None[source]#

Dereify a node to create a corresponding edge.

Parameters:

n (str) – Node identifier
node (Dict) – Node data
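
Dereification turns a node that describes a statement (a reified association) into an ordinary edge. A minimal sketch of the idea follows. The function name, the property names subject, predicate and object, and the way the edge key is built are all assumptions for illustration; the actual KGX method returns None and updates internal state rather than returning a tuple.

```python
from typing import Dict, Optional, Tuple


def dereify_sketch(node_id: str, node: Dict) -> Optional[Tuple[str, str, str, Dict]]:
    """Convert a reified node into a (subject, object, key, data) edge record."""
    if not all(k in node for k in ("subject", "predicate", "object")):
        # Not a reified statement, so there is no edge to create.
        return None
    edge_data = dict(node)
    # Keep the identifier of the reified node as the edge identifier.
    edge_data.setdefault("id", node_id)
    # Hypothetical key scheme; a real implementation may derive keys differently.
    edge_key = f"{node['subject']}-{node['predicate']}-{node['object']}"
    return node["subject"], node["object"], edge_key, edge_data


# Illustrative reified association; identifiers are made up for this example.
association = {
    "subject": "HGNC:11603",
    "predicate": "biolink:related_to",
    "object": "MONDO:0005148",
}
edge_record = dereify_sketch("ex:assoc1", association)
```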

get_biolink_element(predicate: Any) → Optional[Element][source]#

Returns a Biolink Model element for a given predicate.

Parameters:: predicate (Any) – The CURIE of a predicate
Returns:: The corresponding Biolink Model element
Return type:: Optional[Element]

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

parse(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

This method reads from RDF N-Triples and yields records.

Note

To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.

`sort -k 1,2 -t ' ' data.nt > data_sorted.nt`

Parameters:

filename (str) – The filename to parse
format (str) – The format (nt)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for records

Return type:

Generator
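
One plausible reading of the sorting recommendation in the note above: when all triples for a subject are adjacent, a reader can assemble and emit that subject's record as soon as the subject changes, instead of keeping every partially built node in memory. The sketch below demonstrates this with itertools.groupby over already parsed triples. It does not use the KGX parser, and the function name and sample triples are hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def stream_by_subject(triples: Iterable[Triple]) -> Iterator[Tuple[str, Dict[str, List[str]]]]:
    """Emit one record per run of consecutive triples sharing a subject."""
    for subject, group in groupby(triples, key=lambda t: t[0]):
        properties: Dict[str, List[str]] = {}
        for _, predicate, obj in group:
            properties.setdefault(predicate, []).append(obj)
        yield subject, properties


sorted_triples = [
    ("ex:a", "rdfs:label", "A"),
    ("ex:a", "rdf:type", "ex:Thing"),
    ("ex:b", "rdfs:label", "B"),
]
unsorted_triples = [sorted_triples[0], sorted_triples[2], sorted_triples[1]]

complete_records = list(stream_by_subject(sorted_triples))
fragmented_records = list(stream_by_subject(unsorted_triples))
```

With sorted input, each subject yields exactly one complete record. With unsorted input, the same subject appears in several fragments, so a streaming reader would either emit incomplete records or have to buffer everything.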

process_predicate(p: Optional[Union[URIRef, str]]) → Tuple[source]#

Process a predicate where the method checks if there is a mapping in Biolink Model.

Parameters:: p (Optional[Union[URIRef, str]]) – The predicate
Returns:: A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
Return type:: Tuple

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_property_predicates(predicates) → None[source]#

Set predicates that are to be treated as node properties.

Parameters:: predicates (Set) – Set of predicates

set_node_provenance(node_data)#: Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None[source]#

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters:: m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_prefix_map(m: Dict) → None#

Update default prefix map.

Parameters:: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

triple(s: URIRef, p: URIRef, o: URIRef) → None[source]#

Parse a triple.

Parameters:

s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict[source]#

Update an edge with properties.

Parameters:

subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties

Returns:

The edge data

Return type:

Dict

update_node(n: Union[URIRef, str], data: Optional[Dict] = None) → Dict[source]#

Update a node with properties.

Parameters:

n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties

Returns:

The node data

Return type:

Dict

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict

kgx.source.owl_source#

OwlSource is responsible for parsing an OWL ontology.

When parsing an OWL ontology, this source also adds OwlStar annotations to certain OWL axioms.

class kgx.source.owl_source.OwlSource(owner)[source]#

Bases: RdfSource

OwlSource is responsible for parsing an OWL ontology.

Note

This is a simple parser that loads direct class-class relationships. For more formal OWL parsing, refer to Robot: http://robot.obolibrary.org/

add_edge(subject_iri: URIRef, object_iri: URIRef, predicate_iri: URIRef, data: Optional[Dict[Any, Any]] = None) → Dict#

Add an edge to cache.

Parameters:

subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties

Returns:

The edge data

Return type:

Dict

add_node(iri: URIRef, data: Optional[Dict] = None) → Dict#

Add a node to cache.

Parameters:

iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties

Returns:

The node data

Return type:

Dict

add_node_attribute(iri: Union[URIRef, str], key: str, value: Union[str, List]) → None#

Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.

The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.

Parameters:

iri (Union[rdflib.URIRef, str]) – The IRI of a node in the rdflib.Graph
key (str) – The name of the attribute. Can be an rdflib.URIRef or a URI string
value (Union[str, List]) – The value of the attribute

Returns:

The node data

Return type:

Dict

check_edge_filter(edge: Dict) → bool#

Check if an edge passes defined edge filters.

Parameters:: edge (Dict) – An edge
Returns:: Whether the given edge has passed all defined edge filters
Return type:: bool

check_node_filter(node: Dict) → bool#

Check if a node passes defined node filters.

Parameters:: node (Dict) – A node
Returns:: Whether the given node has passed all defined node filters
Return type:: bool

clear_graph_metadata()#: Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.

dereify(n: str, node: Dict) → None#

Dereify a node to create a corresponding edge.

Parameters:

n (str) – Node identifier
node (Dict) – Node data

get_biolink_element(predicate: Any) → Optional[Element]#

Returns a Biolink Model element for a given predicate.

Parameters:: predicate (Any) – The CURIE of a predicate
Returns:: The corresponding Biolink Model element
Return type:: Optional[Element]

get_infores_catalog() → Dict[str, str]#: Return the InfoRes Context of the source

load_graph(rdfgraph: Graph, **kwargs: Any) → None[source]#

Walk through the rdflib.Graph and load all triples into kgx.graph.base_graph.BaseGraph

Parameters:

rdfgraph (rdflib.Graph) – Graph containing nodes and edges
kwargs (Any) – Any additional arguments

parse(filename: str, format: str = 'owl', compression: Optional[str] = None, **kwargs: Any) → Generator[source]#

This method reads from an OWL file and yields records.

Parameters:

filename (str) – The filename to parse
format (str) – The format (owl)
compression (Optional[str]) – The compression type (gz)
kwargs (Any) – Any additional arguments

Returns:

A generator for node and edge records read from the file

Return type:

Generator

process_predicate(p: Optional[Union[URIRef, str]]) → Tuple#

Process a predicate where the method checks if there is a mapping in Biolink Model.

Parameters:: p (Optional[Union[URIRef, str]]) – The predicate
Returns:: A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
Return type:: Tuple

set_edge_filter(key: str, value: set) → None#

Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching edges from the underlying store.

Parameters:

key (str) – The key for edge filter
value (Union[str, set]) – The value for the edge filter. Can be either a string or a set.

set_edge_filters(filters: Dict) → None#

Set edge filters.

Parameters:: filters (Dict) – Edge filters

set_edge_provenance(edge_data)#: Set a specific edge provenance value.

set_node_filter(key: str, value: Union[str, set]) → None#

Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.

Parameters:

key (str) – The key for node filter
value (Union[str, set]) – The value for the node filter. Can be either a string or a set.

set_node_filters(filters: Dict) → None#

Set node filters.

Parameters:: filters (Dict) – Node filters

set_node_property_predicates(predicates) → None#

Set predicates that are to be treated as node properties.

Parameters:: predicates (Set) – Set of predicates

set_node_provenance(node_data)#: Set a specific node provenance value.

set_predicate_mapping(m: Dict) → None#

Set predicate mappings.

Use this method to update mappings for predicates that are not in Biolink Model.

Parameters:: m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names

set_prefix_map(m: Dict) → None#

Update default prefix map.

Parameters:: m (Dict) – A dictionary with prefix to IRI mappings

set_provenance_map(kwargs)#: Set up a provenance (Knowledge Source to InfoRes) map

triple(s: URIRef, p: URIRef, o: URIRef) → None#

Parse a triple.

Parameters:

s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object

update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) → Dict#

Update an edge with properties.

Parameters:

subject_curie (str) – Subject CURIE
object_curie (str) – Object CURIE
edge_key (str) – Edge key
data (Optional[Dict[Any, Any]]) – Edge properties

Returns:

The edge data

Return type:

Dict

update_node(n: Union[URIRef, str], data: Optional[Dict] = None) → Dict#

Update a node with properties.

Parameters:

n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties

Returns:

The node data

Return type:

Dict

validate_edge(edge: Dict) → Optional[Dict]#

Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.

Parameters:: edge (Dict) – An edge represented as a dict
Returns:: An edge represented as a dict, with default assumptions applied.
Return type:: Dict

validate_node(node: Dict) → Optional[Dict]#

Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.

Parameters:: node (Dict) – A node represented as a dict
Returns:: A node represented as a dict, with default assumptions applied.
Return type:: Dict