Graph Merge
The Graph Merge operation takes one or more instances of kgx.graph.base_graph.BaseGraph and merges them into a single graph.
Depending on the desired outcome, there are two entry points for merging graphs:
- kgx.graph_operations.graph_merge.merge_all_graphs: This method takes a list of graphs, identifies the largest graph in the list, and merges all the remaining graphs into it. Reusing the largest graph as the target reduces the memory footprint. The side effect is that the largest incoming graph is modified during this operation.
- kgx.graph_operations.graph_merge.merge_graphs: This method takes a list of graphs and merges all of them into a separate target graph. While this approach ensures that the incoming graphs are not modified, it requires more memory to accommodate the additional graph.
The following criteria are used for merging graphs:
- Two nodes are said to be identical if they have the same id.
- If two identical nodes have conflicting node properties:
  - when preserve is True, the values for the properties are concatenated to a list, if and only if the node property is not a core node property
  - when preserve is False, the values for the properties are replaced with the values from the incoming node, if and only if the node property is not a core node property
- Two edges are said to be identical if they have the same subject, object, and edge key, where the edge key can be a pre-defined UUID or an ID autogenerated from the edge's subject, predicate, and object.
- If two identical edges have conflicting edge properties:
  - when preserve is True, the values for the properties are concatenated to a list, if and only if the edge property is not a core edge property
  - when preserve is False, the values for the properties are replaced with the values from the incoming edge, if and only if the edge property is not a core edge property
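The conflict rules above can be illustrated with a small, self-contained sketch. It uses plain dictionaries rather than KGX graphs, and the function name merge_properties and the CORE_PROPERTIES set are hypothetical names chosen for this illustration; this is not the library's implementation, and the actual list of core properties may differ.

```python
from typing import Any, Dict, Set

# Hypothetical set of core properties, assumed only for this illustration.
CORE_PROPERTIES: Set[str] = {"id", "name"}


def merge_properties(
    existing: Dict[str, Any],
    incoming: Dict[str, Any],
    preserve: bool = True,
    core: Set[str] = CORE_PROPERTIES,
) -> Dict[str, Any]:
    """Return a new dict that combines `incoming` into `existing`."""
    merged = dict(existing)
    for key, new_value in incoming.items():
        if key not in merged:
            merged[key] = new_value  # no conflict: simply add the property
        elif key in core or merged[key] == new_value:
            continue  # core properties are never overwritten; equal values need no work
        elif preserve:
            # Concatenate both values into one list, skipping duplicates.
            old = merged[key] if isinstance(merged[key], list) else [merged[key]]
            new = new_value if isinstance(new_value, list) else [new_value]
            merged[key] = old + [v for v in new if v not in old]
        else:
            merged[key] = new_value  # replace with the incoming value
    return merged


existing = {"id": "A", "name": "alpha", "synonym": "a1"}
incoming = {"id": "A", "name": "ALPHA", "synonym": "a2", "xref": "X:1"}
kept = merge_properties(existing, incoming, preserve=True)
replaced = merge_properties(existing, incoming, preserve=False)
```

With preserve set to True the conflicting synonym values end up together in a list, while with preserve set to False the incoming value wins. In both cases the core property name keeps its original value.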
kgx.graph_operations.graph_merge
- kgx.graph_operations.graph_merge.add_all_edges(g1: BaseGraph, g2: BaseGraph, preserve: bool = True) -> int
Add all edges from source graph (g2) to target graph (g1).
- Parameters:
g1 (kgx.graph.base_graph.BaseGraph) – Target graph
g2 (kgx.graph.base_graph.BaseGraph) – Source graph
preserve (bool) – Whether or not to preserve conflicting properties
- Returns:
Number of edges merged during this operation
- Return type:
int
- kgx.graph_operations.graph_merge.add_all_nodes(g1: BaseGraph, g2: BaseGraph, preserve: bool = True) -> int
Add all nodes from source graph (g2) to target graph (g1).
- Parameters:
g1 (kgx.graph.base_graph.BaseGraph) – Target graph
g2 (kgx.graph.base_graph.BaseGraph) – Source graph
preserve (bool) – Whether or not to preserve conflicting properties
- Returns:
Number of nodes merged during this operation
- Return type:
int
- kgx.graph_operations.graph_merge.merge_all_graphs(graphs: List[BaseGraph], preserve: bool = True) -> BaseGraph
Merge one or more graphs.
Note
This method will first pick the largest graph in graphs and use that as the target into which the remaining graphs are merged. This is to reduce the memory footprint for this operation. The largest graph is the one with the largest number of edges.
The caveat is that the merge operation has a side effect where the largest graph is altered.
If you would like to ensure that all incoming graphs remain as-is, then look at merge_graphs.
The outcome of the merge on node and edge properties depends on the preserve parameter.
If preserve is True, then:
- core properties will not be overwritten
- other properties will be concatenated to a list
If preserve is False, then:
- core properties will not be overwritten
- other properties will be replaced
- Parameters:
graphs (List[kgx.graph.base_graph.BaseGraph]) – A list of instances of BaseGraph to merge
preserve (bool) – Whether or not to preserve conflicting properties
- Returns:
The merged graph
- Return type:
kgx.graph.base_graph.BaseGraph
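The target selection described in the note for merge_all_graphs can be sketched as follows. The toy graph layout (a dict with "nodes" and "edges" entries) and the function name pick_merge_target are assumptions made for this illustration only; they are not part of KGX.

```python
from typing import Any, Dict, List, Tuple

# Toy stand-in for a graph: {"nodes": {id: props}, "edges": {(s, o, key): props}}
ToyGraph = Dict[str, Dict[Any, Dict[str, Any]]]


def pick_merge_target(graphs: List[ToyGraph]) -> Tuple[ToyGraph, List[ToyGraph]]:
    """Split `graphs` into the merge target and the graphs to merge into it."""
    if not graphs:
        raise ValueError("at least one graph is required")
    # "Largest" means the largest number of edges, not nodes.
    target = max(graphs, key=lambda g: len(g["edges"]))
    # Compare by identity so that equal-looking graphs are not dropped.
    remaining = [g for g in graphs if g is not target]
    return target, remaining


small = {"nodes": {"C": {}, "D": {}, "E": {}}, "edges": {}}
large = {
    "nodes": {"A": {}, "B": {}},
    "edges": {("A", "B", "k1"): {"predicate": "related_to"}},
}
target, remaining = pick_merge_target([small, large])
```

Here large is chosen even though small has more nodes, because only the edge count is considered. The returned target is the same object that was passed in, which is why a merge into it alters that input graph.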
- kgx.graph_operations.graph_merge.merge_edge(g: BaseGraph, u: str, v: str, key: str, data: dict, preserve: bool = True) -> dict
Merge edge u -> v into graph g.
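Since merge_edge identifies an edge by u, v, and key, the following minimal sketch shows how an edge identity could be derived when no key is supplied. The helper names make_edge_key and edge_identity, the hyphen-joined key format, and the example identifiers are all hypothetical; the source only states that keys may be pre-defined or autogenerated from the subject, predicate, and object.

```python
from typing import Optional, Tuple

EdgeId = Tuple[str, str, str]  # (subject, object, edge key)


def make_edge_key(subject: str, predicate: str, obj: str) -> str:
    """Build a deterministic key from an edge's subject, predicate, and object."""
    # Hypothetical format, assumed for illustration only.
    return f"{subject}-{predicate}-{obj}"


def edge_identity(
    subject: str, predicate: str, obj: str, key: Optional[str] = None
) -> EdgeId:
    """Return the triple that decides whether two edges are identical."""
    if key is None:
        key = make_edge_key(subject, predicate, obj)
    return (subject, obj, key)


a = edge_identity("X:1", "biolink:related_to", "Y:2")
b = edge_identity("X:1", "biolink:related_to", "Y:2")
c = edge_identity("X:1", "biolink:contributes_to", "Y:2")
d = edge_identity("X:1", "biolink:related_to", "Y:2", key="predefined-key-1")
```

Edges a and b compare equal and would be merged. Edge c differs because the predicate changes the generated key, and edge d differs because a pre-defined key takes precedence over the generated one.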
- kgx.graph_operations.graph_merge.merge_graphs(graph: BaseGraph, graphs: List[BaseGraph], preserve: bool = True) -> BaseGraph
Merge all graphs in graphs to graph.
- Parameters:
graph (kgx.graph.base_graph.BaseGraph) – An instance of BaseGraph
graphs (List[kgx.graph.base_graph.BaseGraph]) – A list of instances of BaseGraph to merge
preserve (bool) – Whether or not to preserve conflicting properties
- Returns:
The merged graph
- Return type:
kgx.graph.base_graph.BaseGraph
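To illustrate why merging into a separate target leaves the inputs untouched, here is a minimal sketch using toy dictionaries in place of BaseGraph instances. The names new_toy_graph and merge_into_target are hypothetical, and conflict handling is deliberately left out (the first value seen is kept), so this does not reproduce the preserve behaviour of merge_graphs.

```python
import copy
from typing import Any, Dict, List

# Toy stand-in for a graph: {"nodes": {id: props}, "edges": {(s, o, key): props}}
ToyGraph = Dict[str, Dict[Any, Dict[str, Any]]]


def new_toy_graph() -> ToyGraph:
    """Create an empty toy graph."""
    return {"nodes": {}, "edges": {}}


def merge_into_target(target: ToyGraph, graphs: List[ToyGraph]) -> ToyGraph:
    """Copy every node and edge of `graphs` into `target` and return `target`."""
    for source in graphs:
        for section in ("nodes", "edges"):
            for record_id, props in source[section].items():
                record = target[section].setdefault(record_id, {})
                for prop, value in props.items():
                    # Deep-copy values so the target shares no state with an input.
                    # Conflicts are not resolved in this sketch: first value wins.
                    record.setdefault(prop, copy.deepcopy(value))
    return target


g1 = {"nodes": {"A": {"name": "a"}}, "edges": {}}
g2 = {
    "nodes": {"B": {"name": "b"}},
    "edges": {("A", "B", "k1"): {"predicate": "related_to"}},
}
snapshot = copy.deepcopy([g1, g2])
merged = merge_into_target(new_toy_graph(), [g1, g2])
```

The price of leaving g1 and g2 unchanged is that merged holds a second copy of all their data, which mirrors the increased memory requirement mentioned for this entry point.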
- kgx.graph_operations.graph_merge.merge_node(g: BaseGraph, n: str, data: dict, preserve: bool = True) -> dict
Merge node n into graph g.
- Parameters:
g (kgx.graph.base_graph.BaseGraph) – The target graph
n (str) – Node id
data (dict) – Node properties
preserve (bool) – Whether or not to preserve conflicting properties
- Returns:
The merged node
- Return type:
dict