Source#
A Source can be implemented for any file, local store, and/or remote store that contains a graph. A Source is responsible for reading nodes and edges from the graph.
A source must subclass the kgx.source.source.Source class and must implement the following methods:
- parse
- read_nodes
- read_edges
parse method
- Responsible for parsing a graph from a file/store
- Must return a generator that iterates over the node and edge records from the graph
read_nodes method
- Responsible for reading nodes from the file/store
- Must return a generator that iterates over the node records
- Each node record must be a 2-tuple (node_id, node_data) where:
  - node_id is the node CURIE
  - node_data is a dictionary that represents the node properties
read_edges method
- Responsible for reading edges from the file/store
- Must return a generator that iterates over the edge records
- Each edge record must be a 4-tuple (subject_id, object_id, edge_key, edge_data) where:
  - subject_id is the subject node CURIE
  - object_id is the object node CURIE
  - edge_key is the unique key for the edge
  - edge_data is a dictionary that represents the edge properties
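The record shapes described above can be illustrated with a small in-memory sketch. To stay runnable without KGX installed, `DictSource` is a hypothetical stand-alone class; a real source would subclass kgx.source.source.Source. The store layout, the choice of the edge `id` as the edge key, and the nodes-before-edges ordering in `parse` are assumptions made for illustration.

```python
from typing import Any, Dict, Generator, Tuple


class DictSource:
    """Hypothetical stand-in; a real source would subclass kgx.source.source.Source."""

    def __init__(self, store: Dict[str, Any]) -> None:
        # Assumed store layout: {"nodes": [...], "edges": [...]}
        self.store = store

    def read_nodes(self) -> Generator[Tuple[str, Dict], None, None]:
        for node in self.store["nodes"]:
            # Node record: (node_id, node_data)
            yield node["id"], node

    def read_edges(self) -> Generator[Tuple[str, str, str, Dict], None, None]:
        for edge in self.store["edges"]:
            # Edge record: (subject_id, object_id, edge_key, edge_data)
            yield edge["subject"], edge["object"], edge["id"], edge

    def parse(self) -> Generator[Tuple, None, None]:
        # Yield all node records first, then all edge records (assumed order).
        yield from self.read_nodes()
        yield from self.read_edges()


store = {
    "nodes": [
        {"id": "HGNC:11603", "name": "TBX4", "category": ["biolink:Gene"]},
        {"id": "MONDO:0005002", "category": ["biolink:Disease"]},
    ],
    "edges": [
        {
            "id": "e1",
            "subject": "HGNC:11603",
            "predicate": "biolink:related_to",
            "object": "MONDO:0005002",
        }
    ],
}
records = list(DictSource(store).parse())
```

Node records can be told apart from edge records by their length (2 versus 4), which is convenient when a single generator yields both.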
kgx.source.source#
Base class for all Sources in KGX.
- class kgx.source.source.Source(owner)[source]#
Bases:
object
A Source is responsible for reading data as records from a store where the store is a file or a database.
- check_edge_filter(edge: Dict) bool [source]#
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool [source]#
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()[source]#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistent problems downstream.
- set_edge_filter(key: str, value: set) None [source]#
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_filters(filters: Dict) None [source]#
Set edge filters.
- Parameters:
filters (Dict) – Edge filters
- set_node_filter(key: str, value: Union[str, set]) None [source]#
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_filters(filters: Dict) None [source]#
Set node filters.
- Parameters:
filters (Dict) – Node filters
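As a rough illustration of how key/value filters can narrow a set of records, the sketch below uses a hypothetical `passes_filters` helper. It is not KGX's implementation: the matching rule (a record passes when each filtered property shares at least one value with the filter) is an assumption.

```python
from typing import Dict, Set, Union

Filters = Dict[str, Union[str, Set[str]]]


def passes_filters(record: Dict, filters: Filters) -> bool:
    """Return True if the record satisfies every filter (assumed semantics)."""
    for key, wanted in filters.items():
        if key not in record:
            return False
        value = record[key]
        # Normalise both sides to sets so scalars and lists compare uniformly.
        wanted_set = {wanted} if isinstance(wanted, str) else set(wanted)
        value_set = set(value) if isinstance(value, (list, set, tuple)) else {value}
        if not wanted_set & value_set:
            return False
    return True


node_filters: Filters = {"category": {"biolink:Gene"}}
gene = {"id": "HGNC:11603", "category": ["biolink:Gene"]}
disease = {"id": "MONDO:0005002", "category": ["biolink:Disease"]}
```

Under these assumed semantics, `gene` passes the filter and `disease` does not; an empty filter dictionary lets every record through.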
- set_prefix_map(m: Dict) None [source]#
Update default prefix map.
- Parameters:
m (Dict) – A dictionary with prefix to IRI mappings
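A prefix map is a plain dictionary from CURIE prefix to IRI base. The sketch below shows how such a map can be used to expand a CURIE; `expand_curie` is a hypothetical helper written for illustration, not a KGX function.

```python
from typing import Dict


def expand_curie(curie: str, prefix_map: Dict[str, str]) -> str:
    """Expand a CURIE to an IRI; return it unchanged if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + local_id
    return curie


# Example mapping, using the OBO PURL base for MONDO.
prefix_map = {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}
iri = expand_curie("MONDO:0005002", prefix_map)
```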
- validate_edge(edge: Dict) Optional[Dict] [source]#
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] [source]#
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
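The idea of checking required properties and applying default assumptions can be sketched as follows. This is an illustration only: `validate_node_sketch` is hypothetical, and both the choice of `id` as the required property and `biolink:NamedThing` as the default category are assumptions, not a statement of what KGX does.

```python
from typing import Dict, Optional

DEFAULT_CATEGORY = "biolink:NamedThing"  # assumed default, for illustration


def validate_node_sketch(node: Dict) -> Optional[Dict]:
    """Return the node with defaults applied, or None if it is unusable."""
    if "id" not in node:
        # Without an identifier the record cannot be placed in the graph.
        return None
    validated = dict(node)  # copy so the caller's dict is left untouched
    if not validated.get("category"):
        validated["category"] = [DEFAULT_CATEGORY]
    return validated
```

Returning `None` for an unusable record mirrors the `Optional[Dict]` return type in the signature above.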
kgx.source.graph_source#
GraphSource is responsible for reading from an instance of kgx.graph.base_graph.BaseGraph and must use only the methods exposed by BaseGraph to access the graph.
- class kgx.source.graph_source.GraphSource(owner)[source]#
Bases:
Source
GraphSource is responsible for reading data as records from an in memory graph representation.
The underlying store must be an instance of kgx.graph.base_graph.BaseGraph.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistent problems downstream.
- parse(graph: BaseGraph, **kwargs: Any) Generator [source]#
This method reads from a graph and yields records.
- Parameters:
graph (kgx.graph.base_graph.BaseGraph) – The graph to read from
kwargs (Any) – Any additional arguments
- Returns:
A generator for node and edge records read from the graph
- Return type:
Generator
- read_edges() Generator [source]#
Read edges as records from the graph.
- Returns:
A generator for edges
- Return type:
Generator
- read_nodes() Generator [source]#
Read nodes as records from the graph.
- Returns:
A generator for nodes
- Return type:
Generator
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None #
Update default prefix map.
- Parameters:
m (Dict) – A dictionary with prefix to IRI mappings
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.tsv_source#
TsvSource is responsible for reading from KGX formatted CSV or TSV using Pandas, where every flat file is treated as a Pandas DataFrame from which data are read in chunks.
KGX expects two separate files: one for nodes and another for edges.
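To show what turning a nodes TSV into node records involves, here is a dependency-free sketch. It deliberately swaps in Python's standard `csv` module for the Pandas chunked reading that TsvSource uses, so it is not how TsvSource is implemented. The column names, the `LIST_COLUMNS` set, and the `|` list delimiter are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

# A tiny nodes file held in memory; a real file would be opened from disk.
NODES_TSV = (
    "id\tname\tcategory\n"
    "HGNC:11603\tTBX4\tbiolink:Gene\n"
    "CHEBI:15365\tacetaminophen\tbiolink:SmallMolecule|biolink:ChemicalEntity\n"
)

LIST_COLUMNS = {"category"}  # assumed multivalued columns


def read_node_rows(handle) -> Iterator[Tuple[str, Dict]]:
    """Yield (node_id, node_data) records from a tab-separated nodes file."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        node = dict(row)
        for column in LIST_COLUMNS:
            if column in node:
                # Split pipe-delimited strings into Python lists.
                node[column] = node[column].split("|")
        yield node["id"], node


nodes = list(read_node_rows(io.StringIO(NODES_TSV)))
```

Because the function is a generator, rows are processed one at a time rather than loaded all at once, which is the same motivation behind reading a DataFrame in chunks.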
- class kgx.source.tsv_source.TsvSource(owner)[source]#
Bases:
Source
TsvSource is responsible for reading data as records from a TSV/CSV.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistent problems downstream.
- parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) Generator [source]#
This method reads from a TSV/CSV and yields records.
- read_edge(edge: Dict) Optional[Tuple] [source]#
Load an edge into an instance of BaseGraph.
- Parameters:
edge (Dict) – An edge
- Returns:
A tuple that contains subject id, object id, edge key, and edge data
- Return type:
Optional[Tuple]
- read_edges(df: DataFrame) Generator [source]#
Load edges from pandas.DataFrame into an instance of BaseGraph.
- Parameters:
df (pandas.DataFrame) – Dataframe containing records that represent edges
- Returns:
A generator for edge records
- Return type:
Generator
- read_node(node: Dict) Optional[Tuple[str, Dict]] [source]#
Prepare a node.
- Parameters:
node (Dict) – A node
- Returns:
A tuple that contains node id and node data
- Return type:
Optional[Tuple[str, Dict]]
- read_nodes(df: DataFrame) Generator [source]#
Read records from pandas.DataFrame and yield records.
- Parameters:
df (pandas.DataFrame) – Dataframe containing records that represent nodes
- Returns:
A generator for node records
- Return type:
Generator
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None [source]#
Add or override default prefix to IRI map.
- Parameters:
m (Dict) – Prefix to IRI map
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- set_reverse_prefix_map(m: Dict) None [source]#
Add or override default IRI to prefix map.
- Parameters:
m (Dict) – IRI to prefix map
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.json_source#
JsonSource is responsible for reading data from a KGX formatted JSON using the ijson library, which allows for streaming data from the file.
- class kgx.source.json_source.JsonSource(owner)[source]#
Bases:
TsvSource
JsonSource is responsible for reading data as records from a JSON.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistent problems downstream.
- parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) Generator [source]#
This method reads from a JSON and yields records.
- read_edge(edge: Dict) Optional[Tuple] #
Load an edge into an instance of BaseGraph.
- Parameters:
edge (Dict) – An edge
- Returns:
A tuple that contains subject id, object id, edge key, and edge data
- Return type:
Optional[Tuple]
- read_edges(filename: str) Generator [source]#
Read edge records from a JSON.
- Parameters:
filename (str) – The filename to read from
- Returns:
A generator for edge records
- Return type:
Generator
- read_node(node: Dict) Optional[Tuple[str, Dict]] #
Prepare a node.
- Parameters:
node (Dict) – A node
- Returns:
A tuple that contains node id and node data
- Return type:
Optional[Tuple[str, Dict]]
- read_nodes(filename: str) Generator [source]#
Read node records from a JSON.
- Parameters:
filename (str) – The filename to read from
- Returns:
A generator for node records
- Return type:
Generator
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None #
Add or override default prefix to IRI map.
- Parameters:
m (Dict) – Prefix to IRI map
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- set_reverse_prefix_map(m: Dict) None #
Add or override default IRI to prefix map.
- Parameters:
m (Dict) – IRI to prefix map
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.jsonl_source#
JsonlSource is responsible for reading data from a KGX formatted JSON Lines using the jsonlines library.
KGX expects two separate JSON Lines files: one for nodes and another for edges.
KGX JSON Lines Format Specification#
The JSON Lines format provides an efficient way to represent KGX data where each line contains a single JSON object representing either a node or an edge. This format is ideal for streaming large graphs and combines the advantages of JSON with line-oriented processing.
File Structure#
- {filename}_nodes.jsonl: Contains one node per line, each as a complete JSON object
- {filename}_edges.jsonl: Contains one edge per line, each as a complete JSON object
Node Record Format#
Required Properties#
- id (string): A CURIE that uniquely identifies the node in the graph
- category (array of strings): List of Biolink categories for the node, from the NamedThing hierarchy
Common Optional Properties#
- name (string): Human-readable name of the entity
- description (string): Human-readable description of the entity
- provided_by (array of strings): List of sources that provided this node
- xref (array of strings): List of database cross-references as CURIEs
- synonym (array of strings): List of alternative names for the entity
Edge Record Format#
Required Properties#
- subject (string): CURIE of the source node
- predicate (string): Biolink predicate representing the relationship type
- object (string): CURIE of the target node
- knowledge_level (string): Level of knowledge representation (observation, assertion, concept, statement) according to Biolink Model
- agent_type (string): Autonomous agents for edges (informational, computational, biochemical, biological) according to Biolink Model
Common Optional Properties#
- id (string): Unique identifier for the edge, often a UUID
- relation (string): Relation CURIE from a formal relation ontology (e.g., RO)
- category (array of strings): List of Biolink association categories
- knowledge_source (array of strings): Sources of knowledge (deprecated: provided_by)
- primary_knowledge_source (array of strings): Primary knowledge sources
- aggregator_knowledge_source (array of strings): Knowledge aggregator sources
- publications (array of strings): List of publication CURIEs supporting the edge
Examples#
Node Example (nodes.jsonl):
Each line in a nodes.jsonl file represents a complete node record. Here are examples of different node types:
{
"id": "HGNC:11603",
"name": "TBX4",
"category": ["biolink:Gene"]
},
{
"id": "MONDO:0005002",
"name": "chronic obstructive pulmonary disease",
"category": ["biolink:Disease"]
},
{
"id": "CHEBI:15365",
"name": "acetaminophen",
"category": ["biolink:SmallMolecule", "biolink:ChemicalEntity"]
}
In the actual jsonlines file, each record would be on a single line without comments and formatting:
{"id":"HGNC:11603","name":"TBX4","category":["biolink:Gene"]}
{"id":"MONDO:0005002","name":"chronic obstructive pulmonary disease","category":["biolink:Disease"]}
{"id":"CHEBI:15365","name":"acetaminophen","category":["biolink:SmallMolecule","biolink:ChemicalEntity"]}
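One way to produce the compact single-line form from Python dictionaries is the standard `json` module with compact separators. This is a minimal sketch; `to_jsonl` is a hypothetical helper, not part of KGX.

```python
import json
from typing import Dict, Iterable

nodes = [
    {"id": "HGNC:11603", "name": "TBX4", "category": ["biolink:Gene"]},
    {
        "id": "MONDO:0005002",
        "name": "chronic obstructive pulmonary disease",
        "category": ["biolink:Disease"],
    },
]


def to_jsonl(records: Iterable[Dict]) -> str:
    """Serialise each record as one compact JSON object per line."""
    lines = (json.dumps(record, separators=(",", ":")) for record in records)
    return "\n".join(lines) + "\n"


jsonl_text = to_jsonl(nodes)
```

The `separators=(",", ":")` argument removes the spaces that `json.dumps` inserts by default, so each record occupies exactly one line with no extra whitespace.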
Edge Example (edges.jsonl):
Each line in a jsonlines file represents a complete edge record. Here are examples of different edge types:
{
"id": "a8575c4e-61a6-428a-bf09-fcb3e8d1644d",
"subject": "HGNC:11603",
"object": "MONDO:0005002",
"predicate": "biolink:related_to",
"relation": "RO:0003304",
"knowledge_level": "assertion",
"agent_type": "computational"
},
{
"id": "urn:uuid:5b06e86f-d768-4cd9-ac27-abe31e95ab1e",
"subject": "HGNC:11603",
"predicate": "biolink:contributes_to",
"object": "MONDO:0005002",
"relation": "RO:0003304",
"category": ["biolink:GeneToDiseaseAssociation"],
"primary_knowledge_source": ["infores:gwas-catalog"],
"publications": ["PMID:26634245", "PMID:26634244"],
"knowledge_level": "observation",
"agent_type": "biological"
},
{
"id": "c7d632b4-6708-4296-9cfe-44bc586d32c8",
"subject": "CHEBI:15365",
"predicate": "biolink:affects",
"object": "GO:0006915",
"relation": "RO:0002434",
"category": ["biolink:ChemicalToProcessAssociation"],
"primary_knowledge_source": ["infores:monarchinitiative"],
"aggregator_knowledge_source": ["infores:biolink-api"],
"publications": ["PMID:12345678"],
"knowledge_level": "assertion",
"agent_type": "computational"
}
In the actual jsonlines file, each record would be on a single line without comments and formatting:
{"id":"a8575c4e-61a6-428a-bf09-fcb3e8d1644d","subject":"HGNC:11603","object":"MONDO:0005002","predicate":"biolink:related_to","relation":"RO:0003304","knowledge_level":"assertion","agent_type":"computational"}
{"id":"urn:uuid:5b06e86f-d768-4cd9-ac27-abe31e95ab1e","subject":"HGNC:11603","predicate":"biolink:contributes_to","object":"MONDO:0005002","relation":"RO:0003304","category":["biolink:GeneToDiseaseAssociation"],"primary_knowledge_source":["infores:gwas-catalog"],"publications":["PMID:26634245","PMID:26634244"],"knowledge_level":"observation","agent_type":"biological"}
Reading JSON Lines with KGX#
When using KGX to read JSON Lines files, the library will:
Parse each line as a complete JSON object
Validate required fields are present
Convert the data into the internal graph representation
Handle arrays properly as native Python lists (unlike TSV where lists are often pipe-delimited strings)
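The steps above can be sketched with the standard library alone. This is not KGX's code (JsonlSource uses the jsonlines library); `read_jsonl_nodes` is a hypothetical helper, and raising `ValueError` on a missing required field is an assumption about error handling made for illustration.

```python
import io
import json
from typing import Dict, Iterator, Tuple

REQUIRED_NODE_FIELDS = ("id", "category")


def read_jsonl_nodes(handle) -> Iterator[Tuple[str, Dict]]:
    """Yield (node_id, node_data) records from a JSON Lines stream."""
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        node = json.loads(line)  # each line is a complete JSON object
        missing = [f for f in REQUIRED_NODE_FIELDS if f not in node]
        if missing:
            raise ValueError(f"line {line_number}: missing required fields {missing}")
        yield node["id"], node


sample = io.StringIO(
    '{"id":"HGNC:11603","name":"TBX4","category":["biolink:Gene"]}\n'
    '{"id":"CHEBI:15365","name":"acetaminophen",'
    '"category":["biolink:SmallMolecule","biolink:ChemicalEntity"]}\n'
)
nodes = list(read_jsonl_nodes(sample))
```

Note that `category` arrives as a native Python list straight from `json.loads`, with no delimiter splitting required.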
- class kgx.source.jsonl_source.JsonlSource(owner)[source]#
Bases:
JsonSource
JsonlSource is responsible for reading data as records from JSON Lines.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistent problems downstream.
- parse(filename: str, format: str = 'jsonl', compression: Optional[str] = None, **kwargs: Any) Generator [source]#
This method reads from JSON Lines and yields records.
- read_edge(edge: Dict) Optional[Tuple] #
Load an edge into an instance of BaseGraph.
- Parameters:
edge (Dict) – An edge
- Returns:
A tuple that contains subject id, object id, edge key, and edge data
- Return type:
Optional[Tuple]
- read_edges(filename: str) Generator #
Read edge records from a JSON.
- Parameters:
filename (str) – The filename to read from
- Returns:
A generator for edge records
- Return type:
Generator
- read_node(node: Dict) Optional[Tuple[str, Dict]] #
Prepare a node.
- Parameters:
node (Dict) – A node
- Returns:
A tuple that contains node id and node data
- Return type:
Optional[Tuple[str, Dict]]
- read_nodes(filename: str) Generator #
Read node records from a JSON.
- Parameters:
filename (str) – The filename to read from
- Returns:
A generator for node records
- Return type:
Generator
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None #
Add or override default prefix to IRI map.
- Parameters:
m (Dict) – Prefix to IRI map
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- set_reverse_prefix_map(m: Dict) None #
Add or override default IRI to prefix map.
- Parameters:
m (Dict) – IRI to prefix map
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.trapi_source#
TrapiSource is responsible for reading data from a Translator Reasoner API formatted JSON.
- class kgx.source.trapi_source.TrapiSource(owner)[source]#
Bases:
JsonSource
TrapiSource is responsible for reading data as records from a TRAPI (Translator Reasoner API) compliant JSON.
This class handles the TRAPI 1.5.0 specification.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistent problems downstream.
- load_node(node: Dict) Tuple[str, Dict] [source]#
Load a TRAPI node into KGX format
- Parameters:
node (Dict) – A TRAPI node
- Returns:
A tuple containing (node_id, node_data) in KGX format
- Return type:
Tuple[str, Dict]
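To give a feel for the conversion, the sketch below maps a TRAPI knowledge graph node to a `(node_id, node_data)` tuple. In TRAPI, knowledge graph nodes are keyed by CURIE and carry `name` and `categories`. The function `trapi_node_to_kgx`, its two-argument signature, and the exact field mapping are assumptions for illustration and may differ from what `load_node` actually does.

```python
from typing import Dict, Tuple


def trapi_node_to_kgx(node_id: str, trapi_node: Dict) -> Tuple[str, Dict]:
    """Map a TRAPI node to a KGX-style (node_id, node_data) tuple (assumed mapping)."""
    node_data = {
        "id": node_id,
        "name": trapi_node.get("name"),
        # TRAPI uses the plural 'categories'; KGX node records use 'category'.
        "category": list(trapi_node.get("categories") or []),
    }
    return node_id, node_data


trapi_nodes = {
    "MONDO:0005002": {
        "name": "chronic obstructive pulmonary disease",
        "categories": ["biolink:Disease"],
    }
}
kgx_nodes = [trapi_node_to_kgx(curie, node) for curie, node in trapi_nodes.items()]
```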
- parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) Generator [source]#
This method reads from a TRAPI JSON and yields KGX records.
- read_edge(edge: Dict) Optional[Tuple] #
Load an edge into an instance of BaseGraph.
- Parameters:
edge (Dict) – An edge
- Returns:
A tuple that contains subject id, object id, edge key, and edge data
- Return type:
Optional[Tuple]
- read_edges(filename: str, compression: Optional[str] = None) Generator [source]#
Read edge records from a TRAPI JSON.
- read_edges_jsonl(filename: str, compression: Optional[str] = None) Generator [source]#
Read edge records from a TRAPI JSONL file.
- read_node(node: Dict) Optional[Tuple[str, Dict]] #
Prepare a node.
- Parameters:
node (Dict) – A node
- Returns:
A tuple that contains node id and node data
- Return type:
Optional[Tuple[str, Dict]]
- read_nodes(filename: str, compression: Optional[str] = None) Generator [source]#
Read node records from a TRAPI JSON.
- read_nodes_jsonl(filename: str, compression: Optional[str] = None) Generator [source]#
Read node records from a TRAPI JSONL file.
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None #
Add or override default prefix to IRI map.
- Parameters:
m (Dict) – Prefix to IRI map
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- set_reverse_prefix_map(m: Dict) None #
Add or override default IRI to prefix map.
- Parameters:
m (Dict) – IRI to prefix map
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.obograph_source#
ObographSource is responsible for reading data from OBOGraphs in JSON.
- class kgx.source.obograph_source.ObographSource(owner)[source]#
Bases:
JsonSource
ObographSource is responsible for reading data as records from an OBO Graph JSON.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistent problems downstream.
- parse(filename: str, format: str = 'json', compression: Optional[str] = None, **kwargs: Any) Generator [source]#
This method reads from JSON and yields records.
- parse_meta(node: str, meta: Dict) Dict [source]#
Parse ‘meta’ field of a node.
- Parameters:
node (str) – Node identifier
meta (Dict) – meta dictionary for the node
- Returns:
A dictionary that contains ‘description’, ‘subsets’, ‘synonyms’, ‘xrefs’, a ‘deprecated’ flag and/or ‘equivalent_nodes’.
- Return type:
Dict
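As an illustration of the kind of extraction described above, the sketch below pulls a few fields out of an OBO Graphs `meta` object, where a definition is stored under `definition.val` and synonyms and xrefs are lists of objects with a `val` key. The helper `parse_meta_sketch` is hypothetical, it omits ‘equivalent_nodes’, and it is not the library's implementation.

```python
from typing import Dict


def parse_meta_sketch(meta: Dict) -> Dict:
    """Extract selected fields from an OBO Graphs 'meta' object (illustrative)."""
    parsed: Dict = {}
    if "definition" in meta:
        parsed["description"] = meta["definition"].get("val")
    if "subsets" in meta:
        parsed["subsets"] = list(meta["subsets"])
    if "synonyms" in meta:
        parsed["synonyms"] = [s["val"] for s in meta["synonyms"]]
    if "xrefs" in meta:
        parsed["xrefs"] = [x["val"] for x in meta["xrefs"]]
    if meta.get("deprecated"):
        parsed["deprecated"] = True
    return parsed


# Made-up sample values, shaped like an OBO Graphs meta object.
meta = {
    "definition": {"val": "An example definition."},
    "synonyms": [{"pred": "hasExactSynonym", "val": "example synonym"}],
    "xrefs": [{"val": "EX:0001"}],
    "deprecated": False,
}
parsed = parse_meta_sketch(meta)
```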
- read_edge(edge: Dict) Optional[Tuple] [source]#
Read and parse an edge record.
- Parameters:
edge (Dict) – The edge record
- Returns:
The processed edge
- Return type:
Dict
- read_edges(filename: str, compression: Optional[str] = None) Generator [source]#
Read edge records from a JSON.
- read_node(node: Dict) Optional[Tuple[str, Dict]] [source]#
Read and parse a node record.
- Parameters:
node (Dict) – The node record
- Returns:
The processed node
- Return type:
Dict
- read_nodes(filename: str, compression: Optional[str] = None) Generator [source]#
Read node records from a JSON.
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None #
Add or override default prefix to IRI map.
- Parameters:
m (Dict) – Prefix to IRI map
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- set_reverse_prefix_map(m: Dict) None #
Add or override default IRI to prefix map.
- Parameters:
m (Dict) – IRI to prefix map
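To make the two map shapes concrete, the following stand-alone sketch converts between CURIEs and IRIs using a prefix to IRI map and its reverse. The helper names `expand` and `contract` and the sample prefixes are assumptions for illustration, not part of the KGX API.

```python
from typing import Dict, Optional

# Prefix to IRI map, in the shape accepted by set_prefix_map (example values).
prefix_map: Dict[str, str] = {
    "HGNC": "http://identifiers.org/hgnc/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}
# IRI to prefix map, in the shape accepted by set_reverse_prefix_map.
reverse_prefix_map: Dict[str, str] = {iri: prefix for prefix, iri in prefix_map.items()}


def expand(curie: str, pm: Dict[str, str]) -> Optional[str]:
    """Expand a CURIE to an IRI, or return None if the prefix is unknown."""
    prefix, _, reference = curie.partition(":")
    base = pm.get(prefix)
    return base + reference if base is not None else None


def contract(iri: str, rpm: Dict[str, str]) -> Optional[str]:
    """Contract an IRI to a CURIE using the longest matching IRI base."""
    for base in sorted(rpm, key=len, reverse=True):
        if iri.startswith(base):
            return f"{rpm[base]}:{iri[len(base):]}"
    return None
```

Trying the longest IRI base first is a design choice of this sketch, so that a more specific namespace wins over a shorter one that happens to share its beginning.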
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.sssom_source#
SssomSource is responsible for reading data from SSSOM-formatted files.
KGX Source for Simple Standard for Sharing Ontology Mappings (“SSSOM”)
- class kgx.source.sssom_source.SssomSource(owner)[source]#
Bases:
Source
SssomSource is responsible for reading data as records from an SSSOM file.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
- load_edge(edge: Dict) Generator [source]#
Load an edge into an instance of BaseGraph
- Parameters:
edge (Dict) – An edge
- Returns:
A generator for node and edge records
- Return type:
Generator
- load_edges(df: DataFrame) Generator [source]#
Load edges from pandas.DataFrame into an instance of BaseGraph
- Parameters:
df (pandas.DataFrame) – Dataframe containing records that represent edges
- Returns:
A generator for edge records
- Return type:
Generator
- load_node(node_data: Dict) Optional[Tuple[str, Dict]] [source]#
Load a node into an instance of BaseGraph
- Parameters:
node_data (Dict) – A node
- Returns:
A tuple that contains node id and node data
- Return type:
Optional[Tuple[str, Dict]]
- parse(filename: str, format: str, compression: Optional[str] = None, **kwargs: Any) Generator [source]#
Parse an SSSOM TSV.
- parse_header(filename: str, compression: Optional[str] = None) None [source]#
Parse metadata from SSSOM headers.
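For orientation, here is a minimal sketch of separating header metadata from mapping rows, assuming the usual SSSOM TSV layout in which metadata lines are prefixed with `#` and the remaining lines form a tab-separated table. The function name `split_sssom` and the sample content are hypothetical, and the header lines are left as raw text because decoding them as YAML would need a third-party parser.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_sssom(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Separate '#'-prefixed header lines from the tab-separated mapping rows."""
    header: List[str] = []
    body: List[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            header.append(line[1:].rstrip())  # drop the comment marker
        elif line.strip():
            body.append(line)
    rows = list(csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t"))
    return header, rows


# Made-up sample in the assumed layout.
sample = (
    "#mapping_set_id: https://example.org/mappings/demo\n"
    "#curie_map:\n"
    "#  EX: https://example.org/\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "EX:1\tskos:exactMatch\tEX:2\n"
)
header, rows = split_sssom(sample)
```

Each row in `rows` carries a subject, a predicate, and an object, which is the information needed to build one edge and its two endpoint nodes.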
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None [source]#
Add or override default prefix to IRI map.
- Parameters:
m (Dict) – Prefix to IRI map
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- set_reverse_prefix_map(m: Dict) None [source]#
Add or override default IRI to prefix map.
- Parameters:
m (Dict) – IRI to prefix map
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.neo_source#
NeoSource is responsible for reading data from a local or remote Neo4j instance.
- class kgx.source.neo_source.NeoSource(owner)[source]#
Bases:
Source
NeoSource is responsible for reading data as records from a Neo4j instance.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
- count(is_directed: bool = True) int [source]#
Get the total count of records to be fetched from the Neo4j database.
- static format_edge_filter(edge_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) str [source]#
Get the value for the edge filter as defined by key. This is used as a convenience method for generating Cypher queries.
- Parameters:
edge_filters (Dict)
key (str)
variable (Optional[str])
prefix (Optional[str])
op (Optional[str])
- Returns:
Value corresponding to the given edge filter key, formatted for CQL
- Return type:
str
- static format_node_filter(node_filters: Dict, key: str, variable: Optional[str] = None, prefix: Optional[str] = None, op: Optional[str] = None) str [source]#
Get the value for the node filter as defined by key. This is used as a convenience method for generating Cypher queries.
- Parameters:
node_filters (Dict)
key (str)
variable (Optional[str])
prefix (Optional[str])
op (Optional[str])
- Returns:
Value corresponding to the given node filter key, formatted for CQL
- Return type:
str
- get_edges(skip: int = 0, limit: int = 0, is_directed: bool = True, **kwargs: Any) List [source]#
Get a page of edges from the Neo4j database.
- get_nodes(skip: int = 0, limit: int = 0, **kwargs: Any) List [source]#
Get a page of nodes from the Neo4j database.
- get_pages(query_function, start: int = 0, end: Optional[int] = None, page_size: int = 50000, **kwargs: Any) Iterator [source]#
Get pages of size page_size from Neo4j. Returns an iterator of pages, where the number of pages is (end - start) / page_size.
- Parameters:
query_function (func) – The function to use to fetch records. Usually this is self.get_nodes or self.get_edges
start (int) – Start for pagination
end (Optional[int]) – End for pagination
page_size (int) – Size of each page (50000, by default)
kwargs (Dict) – Any additional arguments that might be relevant for query_function
- Returns:
An iterator for a list of records from Neo4j. The size of the list is page_size
- Return type:
Iterator
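The paging behaviour described for get_pages can be approximated by the following stand-alone sketch. It is not KGX's implementation: `iter_pages` and `fake_get_nodes` are hypothetical names, the query function is backed by an in-memory list instead of Neo4j, and the assumption that the query function accepts `skip` and `limit` keyword arguments is taken from the get_nodes and get_edges signatures.

```python
from typing import Any, Callable, Iterator, List, Optional


def iter_pages(
    query_function: Callable[..., List],
    start: int = 0,
    end: Optional[int] = None,
    page_size: int = 50000,
    **kwargs: Any,
) -> Iterator[List]:
    """Yield successive pages from query_function until `end` or an empty page."""
    skip = start
    while end is None or skip < end:
        # Never request records past `end`, so the last page may be shorter.
        limit = page_size if end is None else min(page_size, end - skip)
        page = query_function(skip=skip, limit=limit, **kwargs)
        if not page:
            break  # the store has no more records
        yield page
        skip += limit


# Hypothetical query function backed by an in-memory list instead of Neo4j.
RECORDS = list(range(10))


def fake_get_nodes(skip: int = 0, limit: int = 0) -> List[int]:
    return RECORDS[skip:skip + limit]


pages = list(iter_pages(fake_get_nodes, start=0, end=7, page_size=3))
```

In this sketch the final page is truncated so that no more than `end - start` records are returned in total; when `end` is None, paging continues until the query function returns an empty list.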
- load_edge(edge_record: List) Tuple [source]#
Load an edge into an instance of BaseGraph
- Parameters:
edge_record (List) – A 4-tuple edge record
- Returns:
A tuple with subject ID, object ID, edge key, and edge data
- Return type:
Tuple
- load_edges(edges: List) None [source]#
Load edges into an instance of BaseGraph
- Parameters:
edges (List) – A list of edge records
- load_node(node_data: Dict) Optional[Tuple] [source]#
Load node into an instance of BaseGraph
- Parameters:
node_data (Dict) – A node
- Returns:
A tuple with node ID and node data
- Return type:
Optional[Tuple]
- load_nodes(nodes: List) Generator [source]#
Load nodes into an instance of BaseGraph
- Parameters:
nodes (List) – A list of nodes
- parse(uri: str, username: str, password: str, node_filters: Optional[Dict] = None, edge_filters: Optional[Dict] = None, start: int = 0, end: Optional[int] = None, is_directed: bool = True, page_size: int = 50000, **kwargs: Any) Generator [source]#
This method reads from Neo4j instance and yields records
- Parameters:
uri (str) – The URI for the Neo4j instance. For example, http://localhost:7474
username (str) – The username
password (str) – The password
node_filters (Dict) – Node filters
edge_filters (Dict) – Edge filters
start (int) – Number of records to skip before streaming
end (int) – Total number of records to fetch
is_directed (bool) – Whether or not the edges should be treated as directed
page_size (int) – The size of each page/batch fetched from Neo4j (50000 by default)
kwargs (Any) – Any additional arguments
- Returns:
A generator for records
- Return type:
Generator
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_prefix_map(m: Dict) None #
Update default prefix map.
- Parameters:
m (Dict) – A dictionary with prefix to IRI mappings
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.rdf_source#
RdfSource is responsible for reading data from RDF N-Triples.
This source makes use of a custom kgx.parsers.ntriples_parser.CustomNTriplesParser for parsing N-Triples, which extends rdflib.plugins.parsers.ntriples.W3CNTriplesParser.
To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.
sort -k 1,2 -t ' ' data.nt > data_sorted.nt
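One way to see why sorted input keeps memory low: once all triples for a subject are adjacent, a reader can emit that subject's record as soon as the subject changes, instead of holding every subject until the end of the file. The sketch below illustrates the idea with `itertools.groupby` on naively split lines. It is not the KGX parser, the function names are hypothetical, and the sample triples are made up.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple


def split_triple(line: str) -> Tuple[str, str, str]:
    """Naively split one N-Triples line into subject, predicate, object."""
    s, p, o = line.rstrip().rstrip(".").rstrip().split(" ", 2)
    return s, p, o


def stream_by_subject(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, List[str]]]]:
    """Yield (subject, properties) as soon as each subject's block ends."""
    triples = (split_triple(line) for line in lines if line.strip())
    for subject, group in groupby(triples, key=lambda t: t[0]):
        props: Dict[str, List[str]] = {}
        for _, p, o in group:
            props.setdefault(p, []).append(o)
        yield subject, props


# Input already sorted by subject, as the sort command above would produce.
sorted_lines = [
    '<http://example.org/A> <http://example.org/name> "a" .',
    '<http://example.org/A> <http://example.org/related> <http://example.org/B> .',
    '<http://example.org/B> <http://example.org/name> "b" .',
]
records = list(stream_by_subject(sorted_lines))
```

With unsorted input, `groupby` would yield the same subject more than once, which is the kind of fragmented record that sorting avoids.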
- class kgx.source.rdf_source.RdfSource(owner)[source]#
Bases:
Source
RdfSource is responsible for reading data as records from RDF.
Note
Currently only RDF N-Triples are supported.
- add_edge(subject_iri: URIRef, object_iri: URIRef, predicate_iri: URIRef, data: Optional[Dict[Any, Any]] = None) Dict [source]#
Add an edge to cache.
- Parameters:
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns:
The edge data
- Return type:
Dict
- add_node(iri: URIRef, data: Optional[Dict] = None) Dict [source]#
Add a node to cache.
- Parameters:
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns:
The node data
- Return type:
Dict
- add_node_attribute(iri: Union[URIRef, str], key: str, value: Union[str, List]) None [source]#
Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
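The distinction between single-valued and multi-valued attributes can be sketched as follows. This is a stand-alone illustration rather than KGX code: the `MULTIVALUED` set, the `add_attribute` helper, and the rule that single-valued keys are overwritten are all assumptions made for the example.

```python
from typing import Dict, List, Set, Union

# Hypothetical set of property names treated as multi-valued.
MULTIVALUED: Set[str] = {"category", "synonym", "xref"}


def add_attribute(node: Dict, key: str, value: Union[str, List[str]]) -> None:
    """Add value under key, appending for multi-valued keys and overwriting otherwise."""
    values = value if isinstance(value, list) else [value]
    if key in MULTIVALUED:
        existing = node.setdefault(key, [])
        for v in values:
            if v not in existing:  # avoid duplicates while preserving order
                existing.append(v)
    elif values:
        node[key] = values[-1]  # single-valued: the latest value wins


node: Dict = {"id": "EX:1"}
add_attribute(node, "synonym", "alpha")
add_attribute(node, "synonym", ["alpha", "beta"])
add_attribute(node, "name", "first")
add_attribute(node, "name", "second")
```

Here the repeated `synonym` values accumulate into one de-duplicated list, while the second `name` replaces the first.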
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
- dereify(n: str, node: Dict) None [source]#
Dereify a node to create a corresponding edge.
- Parameters:
n (str) – Node identifier
node (Dict) – Node data
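As a rough picture of dereification, the sketch below turns a node that carries subject, predicate, and object properties into a 4-tuple edge record. It is a simplified stand-in: unlike the documented method, which returns None, this hypothetical `dereify_sketch` returns the edge record directly, and the edge key format is an assumption.

```python
from typing import Dict, Optional, Tuple


def dereify_sketch(n: str, node: Dict) -> Optional[Tuple[str, str, str, Dict]]:
    """Turn a reified statement node into a (subject, object, key, data) edge record."""
    required = ("subject", "predicate", "object")
    if not all(k in node for k in required):
        return None  # not a complete reified statement
    # Carry over every property except the statement node's own identifier.
    edge_data = {k: v for k, v in node.items() if k != "id"}
    # Hypothetical edge key format; KGX may derive keys differently.
    edge_key = f"{node['subject']}-{node['predicate']}-{node['object']}"
    return node["subject"], node["object"], edge_key, edge_data


statement = {
    "id": "EX:stmt1",
    "subject": "EX:1",
    "predicate": "biolink:related_to",
    "object": "EX:2",
    "publications": ["PMID:1"],
}
edge = dereify_sketch("EX:stmt1", statement)
```

Any extra properties on the statement node, such as the made-up `publications` list here, end up as properties of the resulting edge.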
- get_biolink_element(predicate: Any) Optional[Element] [source]#
Returns a Biolink Model element for a given predicate.
- Parameters:
predicate (Any) – The CURIE of a predicate
- Returns:
The corresponding Biolink Model element
- Return type:
Optional[Element]
- parse(filename: str, format: str = 'nt', compression: Optional[str] = None, **kwargs: Any) Generator [source]#
This method reads from RDF N-Triples and yields records.
Note
To ensure proper parsing of N-Triples and a relatively low memory footprint, it is recommended that the N-Triples be sorted based on the subject IRIs.
`sort -k 1,2 -t ' ' data.nt > data_sorted.nt`
- process_predicate(p: Optional[Union[URIRef, str]]) Tuple [source]#
Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters:
p (Optional[Union[URIRef, str]]) – The predicate
- Returns:
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type:
Tuple
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_property_predicates(predicates) None [source]#
Set predicates that are to be treated as node properties.
- Parameters:
predicates (Set) – Set of predicates
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_predicate_mapping(m: Dict) None [source]#
Set predicate mappings.
Use this method to update mappings for predicates that are not in Biolink Model.
- Parameters:
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
- set_prefix_map(m: Dict) None #
Update default prefix map.
- Parameters:
m (Dict) – A dictionary with prefix to IRI mappings
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- triple(s: URIRef, p: URIRef, o: URIRef) None [source]#
Parse a triple.
- Parameters:
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
- update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) Dict [source]#
Update an edge with properties.
- update_node(n: Union[URIRef, str], data: Optional[Dict] = None) Dict [source]#
Update a node with properties.
- Parameters:
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns:
The node data
- Return type:
Dict
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict
kgx.source.owl_source#
OwlSource is responsible for parsing an OWL ontology.
When parsing an OWL ontology, this source also adds OwlStar annotations to certain OWL axioms.
- class kgx.source.owl_source.OwlSource(owner)[source]#
Bases:
RdfSource
OwlSource is responsible for parsing an OWL ontology.
Note
This is a simple parser that loads direct class-class relationships. For more formal OWL parsing, refer to Robot: http://robot.obolibrary.org/
- add_edge(subject_iri: URIRef, object_iri: URIRef, predicate_iri: URIRef, data: Optional[Dict[Any, Any]] = None) Dict #
Add an edge to cache.
- Parameters:
subject_iri (rdflib.URIRef) – Subject IRI for the subject in a triple
object_iri (rdflib.URIRef) – Object IRI for the object in a triple
predicate_iri (rdflib.URIRef) – Predicate IRI for the predicate in a triple
data (Optional[Dict[Any, Any]]) – Additional edge properties
- Returns:
The edge data
- Return type:
Dict
- add_node(iri: URIRef, data: Optional[Dict] = None) Dict #
Add a node to cache.
- Parameters:
iri (rdflib.URIRef) – IRI of a node
data (Optional[Dict]) – Additional node properties
- Returns:
The node data
- Return type:
Dict
- add_node_attribute(iri: Union[URIRef, str], key: str, value: Union[str, List]) None #
Add an attribute to a node in cache, while taking into account whether the attribute should be multi-valued.
The key may be an rdflib.URIRef or a URI string that maps onto a property name as defined in rdf_utils.property_mapping.
- check_edge_filter(edge: Dict) bool #
Check if an edge passes defined edge filters.
- Parameters:
edge (Dict) – An edge
- Returns:
Whether the given edge has passed all defined edge filters
- Return type:
bool
- check_node_filter(node: Dict) bool #
Check if a node passes defined node filters.
- Parameters:
node (Dict) – A node
- Returns:
Whether the given node has passed all defined node filters
- Return type:
bool
- clear_graph_metadata()#
Clears a Source graph’s internal graph_metadata. The value of such graph metadata is (now) generally a Callable function. This operation can be used in the code when the metadata is no longer needed, but may cause peculiar Python object persistence problems downstream.
- dereify(n: str, node: Dict) None #
Dereify a node to create a corresponding edge.
- Parameters:
n (str) – Node identifier
node (Dict) – Node data
- get_biolink_element(predicate: Any) Optional[Element] #
Returns a Biolink Model element for a given predicate.
- Parameters:
predicate (Any) – The CURIE of a predicate
- Returns:
The corresponding Biolink Model element
- Return type:
Optional[Element]
- load_graph(rdfgraph: Graph, **kwargs: Any) None [source]#
Walk through the rdflib.Graph and load all triples into kgx.graph.base_graph.BaseGraph
- Parameters:
rdfgraph (rdflib.Graph) – Graph containing nodes and edges
kwargs (Any) – Any additional arguments
- parse(filename: str, format: str = 'owl', compression: Optional[str] = None, **kwargs: Any) Generator [source]#
This method reads from an OWL file and yields records.
- process_predicate(p: Optional[Union[URIRef, str]]) Tuple #
Process a predicate where the method checks if there is a mapping in Biolink Model.
- Parameters:
p (Optional[Union[URIRef, str]]) – The predicate
- Returns:
A tuple that contains the Biolink CURIE (if available), the Biolink slot_uri CURIE (if available), the CURIE form of p, the reference of p
- Return type:
Tuple
- set_edge_filter(key: str, value: set) None #
Set an edge filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘subject_category’ or ‘object_category’ filter, the value should be of type set. This method also sets the ‘category’ node filter, to get a consistent set of nodes in the subgraph.
- set_edge_provenance(edge_data)#
Set a specific edge provenance value.
- set_node_filter(key: str, value: Union[str, set]) None #
Set a node filter, as defined by a key and value pair. These filters are used to filter (or reduce) the search space when fetching nodes from the underlying store.
Note
When defining the ‘category’ filter, the value should be of type set. This method also sets the ‘subject_category’ and ‘object_category’ edge filters, to get a consistent set of nodes in the subgraph.
- set_node_property_predicates(predicates) None #
Set predicates that are to be treated as node properties.
- Parameters:
predicates (Set) – Set of predicates
- set_node_provenance(node_data)#
Set a specific node provenance value.
- set_predicate_mapping(m: Dict) None #
Set predicate mappings.
Use this method to update mappings for predicates that are not in Biolink Model.
- Parameters:
m (Dict) – A dictionary where the keys are IRIs and values are their corresponding property names
- set_prefix_map(m: Dict) None #
Update default prefix map.
- Parameters:
m (Dict) – A dictionary with prefix to IRI mappings
- set_provenance_map(kwargs)#
Set up a provenance (Knowledge Source to InfoRes) map
- triple(s: URIRef, p: URIRef, o: URIRef) None #
Parse a triple.
- Parameters:
s (URIRef) – Subject
p (URIRef) – Predicate
o (URIRef) – Object
- update_edge(subject_curie: str, object_curie: str, edge_key: str, data: Optional[Dict[Any, Any]]) Dict #
Update an edge with properties.
- update_node(n: Union[URIRef, str], data: Optional[Dict] = None) Dict #
Update a node with properties.
- Parameters:
n (Union[URIRef, str]) – Node identifier
data (Optional[Dict]) – Node properties
- Returns:
The node data
- Return type:
Dict
- validate_edge(edge: Dict) Optional[Dict] #
Given an edge as a dictionary, check for required properties. This method will return the edge dictionary with default assumptions applied, if any.
- Parameters:
edge (Dict) – An edge represented as a dict
- Returns:
An edge represented as a dict, with default assumptions applied.
- Return type:
Dict
- validate_node(node: Dict) Optional[Dict] #
Given a node as a dictionary, check for required properties. This method will return the node dictionary with default assumptions applied, if any.
- Parameters:
node (Dict) – A node represented as a dict
- Returns:
A node represented as a dict, with default assumptions applied.
- Return type:
Dict