Validator
The Validator validates an instance of kgx.graph.base_graph.BaseGraph for Biolink Model compliance.
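To validate a graph:
from kgx.validator import Validator
v = Validator()
v.validate(graph)  # graph: an existing kgx.graph.base_graph.BaseGraph instance
errors = v.get_errors()  # nested dictionary of any problems found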
Streaming Data Processing Mode
For very large graphs, the Validator can now process graph data just as well in streaming mode (command flag --stream=True), which significantly reduces the memory footprint required to process such graphs.
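The memory saving comes from checking records one at a time instead of first loading the whole graph. The following generic sketch (not KGX's streaming implementation) illustrates the idea with the Python standard library: a generator yields rows of a TSV node file lazily, and each row is checked and then discarded, so only a running count is kept. The helper names stream_records and count_invalid, the id/category columns and the "non-empty id and category" rule are assumptions for illustration.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def stream_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Lazily yield one TSV row at a time as a dict (hypothetical helper)."""
    # csv.DictReader consumes its input incrementally, so only the current
    # row is held in memory, never the whole file.
    yield from csv.DictReader(lines, delimiter="\t")


def count_invalid(records: Iterable[Dict[str, str]]) -> int:
    """Check each record as it arrives, keeping only a running count."""
    invalid = 0
    for record in records:
        # Assumed rule for illustration: a node needs a non-empty id and category.
        if not record.get("id") or not record.get("category"):
            invalid += 1
    return invalid


# io.StringIO stands in for an open handle on a (large) nodes file.
data = io.StringIO("id\tcategory\n" "EX:1\tbiolink:Gene\n" "EX:2\t\n")
print(count_invalid(stream_records(data)))  # prints 1
```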
Biolink Model Versioning
By default, the Validator validates against the latest Biolink Model release hosted by the current Biolink Model Toolkit; however, one may override this default at the Validator class level using Validator.set_biolink_model(version="#.#.#"), where #.#.# is the major.minor.patch semantic version of the desired Biolink Model release.
Every Validator() instance persistently keeps the class-level Biolink Model version that was most recently set when it was instantiated. Resetting the class-level Biolink Model does not change the version of previously instantiated Validator() objects. In a multi-threaded environment that instantiates multiple Validator objects, it may therefore be necessary to wrap the Validator.set_biolink_model call and the Validator() instantiation together within a single thread-locked block.
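The locking advice above can be sketched as follows. So that the example runs without KGX installed, it uses a hypothetical StandInValidator that mimics the documented behaviour: a class-level version that each new instance captures and keeps. With KGX, the two statements inside the lock would be Validator.set_biolink_model(version=...) and Validator(). The helper make_validator, the lock name and the version strings are placeholders.

```python
import threading
from typing import Optional


class StandInValidator:
    """Hypothetical stand-in mimicking the documented versioning behaviour."""

    _class_version: Optional[str] = None  # most recently set class-level version

    @classmethod
    def set_biolink_model(cls, version: Optional[str]) -> None:
        cls._class_version = version

    def __init__(self) -> None:
        # Capture the class-level version current at instantiation time; later
        # calls to set_biolink_model do not affect this instance.
        self.version = StandInValidator._class_version


# One module-wide lock guarding the "set version, then instantiate" pair.
_validator_lock = threading.Lock()


def make_validator(version: str) -> StandInValidator:
    # Without the lock, another thread could reset the class-level version
    # between these two statements, so this instance would get the wrong one.
    # With KGX: Validator.set_biolink_model(version=version); return Validator()
    with _validator_lock:
        StandInValidator.set_biolink_model(version=version)
        return StandInValidator()


v1 = make_validator("1.0.0")  # placeholder version strings
v2 = make_validator("2.0.0")
print(v1.version, v2.version)  # prints 1.0.0 2.0.0
```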
Note that the kgx validate CLI operation also has an optional biolink_release argument for the same purpose.
kgx.validator
KGX Validator class
- class kgx.validator.Validator(verbose: bool = False, progress_monitor: Optional[Callable[[GraphEntityType, List], None]] = None, schema: Optional[str] = None, error_log: Optional[str] = None)
Bases: ErrorDetecting
Class for validating a property graph.
The optional ‘progress_monitor’ for the validator should be a lightweight Callable, which is injected into the class ‘inspector’ Callable designed to intercept node and edge records streaming through the Validator (inside a Transformer.process() call). The first argument of the Callable, a GraphEntityType, tags the record as a NODE or an EDGE; the second argument is the current record itself. The intent of this Callable is to give KGX applications a hook for its namesake function: passively monitoring the graph data stream. For example, the Callable could simply tally the number of times it is called with a NODE or an EDGE, then provide a suitable (quick!) report of those counts back to the KGX application. The Callable (a function or callable class) is strictly meant to be procedural: it must not modify the record, and it should be of low complexity, so as not to introduce a large computational overhead to validation!
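A sketch of the kind of tallying monitor described above. GraphEntityType is redefined here as a minimal stand-in enum so the example is self-contained, RecordTally is a hypothetical name, and the records passed in are placeholders; with KGX, an instance would be handed to the Validator as Validator(progress_monitor=tally).

```python
from collections import Counter
from enum import Enum
from typing import Any


class GraphEntityType(Enum):
    """Minimal stand-in for KGX's GraphEntityType (only NODE and EDGE here)."""

    NODE = "node"
    EDGE = "edge"


class RecordTally:
    """Passive progress monitor: counts records by type, never modifies them."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, entity_type: GraphEntityType, record: Any) -> None:
        # Constant-time work per record; the record is received but not changed.
        self.counts[entity_type] += 1

    def report(self) -> str:
        # Counter returns 0 for entity types that were never seen.
        return (
            f"nodes={self.counts[GraphEntityType.NODE]} "
            f"edges={self.counts[GraphEntityType.EDGE]}"
        )


# Simulated stream of placeholder records.
tally = RecordTally()
tally(GraphEntityType.NODE, ["EX:1", {}])
tally(GraphEntityType.NODE, ["EX:2", {}])
tally(GraphEntityType.EDGE, ["EX:1", "EX:2", {}])
print(tally.report())  # prints nodes=2 edges=1
```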
- Parameters:
verbose (bool) – Whether the generated report should be verbose or not (default: False)
progress_monitor (Optional[Callable[[GraphEntityType, List], None]]) – Function given a peek at the current record being processed by the class-wrapped Callable.
schema (Optional[str]) – URL to the (Biolink) Model schema to be used for validation (default: None, i.e. use the default Biolink Model Toolkit schema)
error_log (Optional[str]) – Where to write any graph processing error messages (stderr, by default)
- clear_errors()
Clears the current error log list
- static get_all_prefixes(jsonld: Optional[Dict] = None) -> set
Get all prefixes from Biolink Model JSON-LD context.
It also sets self.prefixes for subsequent access.
- Parameters:
jsonld (Optional[Dict]) – The JSON-LD context
- Returns:
A set of prefixes
- Return type:
set
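How prefixes might be pulled out of a JSON-LD context can be sketched as follows. This is an illustration under stated assumptions, not KGX's own logic: a prefix is taken to be any context key that is not a JSON-LD keyword (starting with "@") and either maps directly to an IRI string or is an expanded definition marked with "@prefix": true. The function name and the sample context are made up, and contexts given as arrays are not handled.

```python
from typing import Any, Dict, Set


def prefixes_from_context(jsonld: Dict[str, Any]) -> Set[str]:
    """Collect candidate CURIE prefixes from a JSON-LD document (sketch)."""
    # Accept either a full document with an "@context" key or a bare context.
    context = jsonld.get("@context", jsonld)
    prefixes: Set[str] = set()
    for key, value in context.items():
        if key.startswith("@"):
            continue  # keywords such as @vocab are not prefixes
        if isinstance(value, str):
            prefixes.add(key)  # simple term definition mapping to an IRI
        elif isinstance(value, dict) and value.get("@prefix") is True:
            prefixes.add(key)  # expanded definition explicitly marked as a prefix
    return prefixes


sample = {
    "@context": {
        "@vocab": "https://w3id.org/biolink/vocab/",
        "biolink": "https://w3id.org/biolink/vocab/",
        "MONDO": {"@id": "http://purl.obolibrary.org/obo/MONDO_", "@prefix": True},
        "name": {"@id": "rdfs:label"},  # a term, not a prefix
    }
}
print(sorted(prefixes_from_context(sample)))  # prints ['MONDO', 'biolink']
```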
- get_errors(level: Optional[str] = None) -> Dict
Get the index list of distinct error messages.
- Parameters:
level (Optional[str]) – Optional, case-insensitive filter on the name of the error message level (generally either “Error” or “Warning”)
- Returns:
A raw dictionary of entities indexed by [message_level][error_type][message], or, if the optional level filter is given, indexed only by [error_type][message] for that message level
- Return type:
Dict
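Because the unfiltered result is nested three levels deep, callers will typically walk it to build a summary. The sketch below counts affected entities per (message level, error type), assuming the innermost values are collections of entity identifiers. The summarize_errors function and the sample report (its level names, error types and messages) are hypothetical, not real KGX output; a level-filtered result has one level less and would need only the inner two loops.

```python
from typing import Dict, Iterable, Tuple


def summarize_errors(
    errors: Dict[str, Dict[str, Dict[str, Iterable[str]]]]
) -> Dict[Tuple[str, str], int]:
    """Count entities per (message_level, error_type) in an unfiltered report."""
    summary: Dict[Tuple[str, str], int] = {}
    for level, by_type in errors.items():
        for error_type, by_message in by_type.items():
            # Sum the entities reported under every distinct message.
            summary[(level, error_type)] = sum(
                len(list(entities)) for entities in by_message.values()
            )
    return summary


report = {
    "ERROR": {"MISSING_CATEGORY": {"Missing category": {"EX:1", "EX:2"}}},
    "WARNING": {"DEPRECATED": {"Deprecated predicate": {"EX:3"}}},
}
print(summarize_errors(report))
# prints {('ERROR', 'MISSING_CATEGORY'): 2, ('WARNING', 'DEPRECATED'): 1}
```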
- static get_required_edge_properties(toolkit: Optional[Toolkit] = None) -> list
Get all properties for an edge that are required, as defined by Biolink Model.
- Parameters:
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns:
A list of required edge properties
- Return type:
list
- static get_required_node_properties(toolkit: Optional[Toolkit] = None) -> list
Get all properties for a node that are required, as defined by Biolink Model.
- Parameters:
toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
- Returns:
A list of required node properties
- Return type:
list
- classmethod get_the_validator(verbose: bool = False, progress_monitor: Optional[Callable[[GraphEntityType, List], None]] = None, schema: Optional[str] = None, error_log: Optional[str] = None)
Creates and manages a default singleton Validator in the module when called.
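The singleton pattern behind get_the_validator can be sketched with a hypothetical stand-in class: the first call creates the shared instance and later calls return it. In this sketch, arguments passed after the instance exists are ignored; whether KGX behaves the same way is an assumption.

```python
from typing import Optional


class SingletonValidatorStandIn:
    """Hypothetical stand-in for a class that manages a default singleton."""

    _the_validator: Optional["SingletonValidatorStandIn"] = None

    def __init__(self, verbose: bool = False) -> None:
        self.verbose = verbose

    @classmethod
    def get_the_validator(cls, verbose: bool = False) -> "SingletonValidatorStandIn":
        # Create the shared instance on the first call only; in this sketch,
        # arguments passed on later calls are ignored.
        if cls._the_validator is None:
            cls._the_validator = cls(verbose=verbose)
        return cls._the_validator


first = SingletonValidatorStandIn.get_the_validator()
second = SingletonValidatorStandIn.get_the_validator(verbose=True)
print(first is second, second.verbose)  # prints True False
```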
- log_error(entity: str, error_type: ErrorType, message: str, message_level: MessageLevel = MessageLevel.ERROR)
Log an error to the list of such errors.
- Parameters:
entity – source of parse error
error_type – ValidationError ErrorType
message – message string describing the error
message_level – ValidationError MessageLevel
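One plausible way for log_error to fill the [message_level][error_type][message] index described under get_errors is sketched below. ErrorLogSketch is a hypothetical stand-in (not KGX's ErrorDetecting class), plain strings replace the ErrorType and MessageLevel enums, and each distinct message is stored once with the reporting entities gathered into a set.

```python
from collections import defaultdict
from typing import Dict, Optional


class ErrorLogSketch:
    """Hypothetical index: message level -> error type -> message -> entities."""

    def __init__(self) -> None:
        # Missing keys are created on demand, ending in a set of entity ids.
        self.errors: Dict = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))

    def log_error(
        self, entity: str, error_type: str, message: str, message_level: str = "ERROR"
    ) -> None:
        # A repeated message is stored once; only its set of entities grows.
        self.errors[message_level][error_type][message].add(entity)

    def get_errors(self, level: Optional[str] = None) -> Dict:
        if level is None:
            return self.errors
        # Case-insensitive match on the level name, as documented for get_errors.
        for name, by_type in self.errors.items():
            if name.upper() == level.upper():
                return by_type
        return {}


log = ErrorLogSketch()
log.log_error("EX:1", "MISSING_CATEGORY", "Missing category")
log.log_error("EX:2", "MISSING_CATEGORY", "Missing category")
print(sorted(log.get_errors("error")["MISSING_CATEGORY"]["Missing category"]))
# prints ['EX:1', 'EX:2']
```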
- classmethod set_biolink_model(version: Optional[str])
Set Biolink Model version of Validator Toolkit
- validate(graph: BaseGraph)
Validate nodes and edges in a graph.
- Parameters:
graph (kgx.graph.base_graph.BaseGraph) – The graph to validate
- validate_categories(node: str, data: dict, toolkit: Optional[Toolkit] = None)
Validate the category field of a given node.
- validate_edge_predicate(subject: str, object: str, data: dict, toolkit: Optional[Toolkit] = None)
Validate the edge_predicate field of a given edge.
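As a rough illustration of these two checks, the sketch below tests that a node's category is a non-empty list of known classes and that an edge's predicate is a known biolink: CURIE in snake_case. The KNOWN_* sets are hypothetical stand-ins for Biolink Model Toolkit lookups (the toolkit argument above), and the "predicate" key name and the snake_case rule are assumptions.

```python
import re
from typing import Any, Dict, List

# Hypothetical stand-ins for Biolink Model Toolkit lookups.
KNOWN_CATEGORIES = {"biolink:Gene", "biolink:Disease", "biolink:NamedThing"}
KNOWN_PREDICATES = {"biolink:related_to", "biolink:interacts_with"}
SNAKE_CASE_CURIE = re.compile(r"^biolink:[a-z][a-z0-9_]*$")


def category_problems(node: str, data: Dict[str, Any]) -> List[str]:
    categories = data.get("category")
    if not isinstance(categories, list) or not categories:
        return [f"{node}: 'category' must be a non-empty list"]
    # Report every category that the (stand-in) model does not know.
    return [f"{node}: unknown category {c!r}" for c in categories if c not in KNOWN_CATEGORIES]


def predicate_problems(subject: str, obj: str, data: Dict[str, Any]) -> List[str]:
    predicate = data.get("predicate")
    edge = f"{subject}->{obj}"
    # First check the form of the value, then whether the model knows it.
    if not isinstance(predicate, str) or not SNAKE_CASE_CURIE.match(predicate):
        return [f"{edge}: predicate {predicate!r} is not a snake_case biolink CURIE"]
    if predicate not in KNOWN_PREDICATES:
        return [f"{edge}: unknown predicate {predicate!r}"]
    return []


print(predicate_problems("EX:1", "EX:2", {"predicate": "biolink:treats"}))
# prints ["EX:1->EX:2: unknown predicate 'biolink:treats'"]
```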
- validate_edge_properties(subject: str, object: str, data: dict, required_properties: list)
Checks if all the required edge properties exist for a given edge.
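The presence check amounts to comparing the edge's data against the list of required property names. In the sketch below, edge_property_problems is a hypothetical helper and the required list is a placeholder; in practice the list would come from get_required_edge_properties() (or from get_required_node_properties() for the node equivalent, validate_node_properties).

```python
from typing import Any, Dict, List


def edge_property_problems(
    subject: str, obj: str, data: Dict[str, Any], required_properties: List[str]
) -> List[str]:
    """Report each required property name that is absent from the edge data."""
    return [
        f"Edge {subject}->{obj} is missing required property {p!r}"
        for p in required_properties
        if p not in data
    ]


required = ["id", "subject", "predicate", "object"]  # placeholder list
edge_data = {"subject": "EX:1", "predicate": "biolink:related_to", "object": "EX:2"}
print(edge_property_problems("EX:1", "EX:2", edge_data, required))
# prints ["Edge EX:1->EX:2 is missing required property 'id'"]
```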
- validate_edge_property_types(subject: str, object: str, data: dict, toolkit: Optional[Toolkit] = None)
Checks if edge properties have the expected value type.
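A type check can be sketched as a lookup from property name to expected Python type. EXPECTED_TYPES below is a hypothetical mapping chosen for illustration; the real expectations come from the Biolink Model, and properties without an entry are simply skipped.

```python
from typing import Any, Dict, List

# Hypothetical expected Python types per property (illustration only).
EXPECTED_TYPES: Dict[str, type] = {
    "id": str,
    "predicate": str,
    "publications": list,
}


def property_type_problems(data: Dict[str, Any]) -> List[str]:
    problems = []
    for key, value in data.items():
        expected = EXPECTED_TYPES.get(key)
        # Properties with no known expectation are skipped in this sketch.
        if expected is not None and not isinstance(value, expected):
            problems.append(
                f"{key!r} should be {expected.__name__}, got {type(value).__name__}"
            )
    return problems


print(property_type_problems({"id": "EX:e1", "publications": "PMID:1"}))
# prints ["'publications' should be list, got str"]
```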
- validate_edge_property_values(subject: str, object: str, data: dict)
Validate an edge property’s value.
- validate_edges(graph: BaseGraph)
Validate all the edges in a graph.
This method validates the following: edge properties, edge property type, edge property value type and edge predicate.
- Parameters:
graph (kgx.graph.base_graph.BaseGraph) – The graph to validate
- validate_node_properties(node: str, data: dict, required_properties: list)
Checks if all the required node properties exist for a given node.
- validate_node_property_types(node: str, data: dict, toolkit: Optional[Toolkit] = None)
Checks if node properties have the expected value type.
- validate_nodes(graph: BaseGraph)
Validate all the nodes in a graph.
This method validates the following: node properties, node property type, node property value type and node categories.
- Parameters:
graph (kgx.graph.base_graph.BaseGraph) – The graph to validate
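The overall shape of validate_nodes and validate_edges, applying a set of checks to every node and every edge, can be sketched with a toy graph. Here a dict of nodes and a list of (subject, object, data) tuples stand in for BaseGraph, validate_graph is a hypothetical name, and has_category and has_predicate are placeholder checks standing in for the property, type, value and category/predicate checks listed above.

```python
from typing import Any, Callable, Dict, List, Tuple

NodeCheck = Callable[[str, Dict[str, Any]], List[str]]
EdgeCheck = Callable[[str, str, Dict[str, Any]], List[str]]


def has_category(node: str, data: Dict[str, Any]) -> List[str]:
    return [] if data.get("category") else [f"{node}: missing category"]


def has_predicate(subject: str, obj: str, data: Dict[str, Any]) -> List[str]:
    return [] if data.get("predicate") else [f"{subject}->{obj}: missing predicate"]


def validate_graph(
    nodes: Dict[str, Dict[str, Any]],
    edges: List[Tuple[str, str, Dict[str, Any]]],
    node_checks: List[NodeCheck],
    edge_checks: List[EdgeCheck],
) -> List[str]:
    problems: List[str] = []
    # Run every node check on every node, then every edge check on every edge.
    for node, data in nodes.items():
        for check in node_checks:
            problems.extend(check(node, data))
    for subject, obj, data in edges:
        for check in edge_checks:
            problems.extend(check(subject, obj, data))
    return problems


nodes = {"EX:1": {"category": ["biolink:Gene"]}, "EX:2": {}}
edges = [("EX:1", "EX:2", {"predicate": "biolink:related_to"}), ("EX:2", "EX:1", {})]
print(validate_graph(nodes, edges, [has_category], [has_predicate]))
# prints ['EX:2: missing category', 'EX:2->EX:1: missing predicate']
```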