Validator#

The Validator validates an instance of kgx.graph.base_graph.BaseGraph for Biolink Model compliance.

To validate a graph:

from kgx.validator import Validator

# `graph` is an existing kgx.graph.base_graph.BaseGraph instance
v = Validator()
v.validate(graph)

Streaming Data Processing Mode#

For very large graphs, the Validator can also process graph data in streaming mode (command flag --stream=True), which significantly reduces the memory footprint required to process such graphs.
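To make the memory argument concrete, here is a generic sketch of record-at-a-time processing. It is not KGX code: the tab-separated layout and the names stream_nodes and count_uncategorized are hypothetical, chosen only to show that each record can be checked and discarded without loading the whole graph.

```python
from typing import Dict, Iterable, Iterator, Tuple


def stream_nodes(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (node id, properties) lazily from tab-separated lines."""
    for line in lines:
        node_id, category = line.rstrip("\n").split("\t")
        yield node_id, {"category": category}


def count_uncategorized(nodes: Iterable[Tuple[str, Dict[str, str]]]) -> int:
    """Check each record as it arrives; nothing is accumulated in memory."""
    return sum(1 for _, data in nodes if not data["category"])


# Any iterable of lines works here, including an open file handle.
sample = ["HGNC:11603\tbiolink:Gene\n", "MONDO:0005002\t\n"]
uncategorized = count_uncategorized(stream_nodes(sample))
print(uncategorized)
```

Because stream_nodes is a generator, only one record is held at a time, however long the input is.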

kgx.validator#

KGX Validator class

class kgx.validator.Validator(verbose: bool = False, progress_monitor: Optional[Callable[[GraphEntityType, List], None]] = None, schema: Optional[str] = None, error_log: Optional[str] = None)[source]#

Bases: ErrorDetecting

Class for validating a property graph.

The optional ‘progress_monitor’ for the validator should be a lightweight Callable which is injected into the class ‘inspector’ Callable, designed to intercept node and edge records streaming through the Validator (inside a Transformer.process() call). The first (GraphEntityType) argument of the Callable tags the record as a NODE or an EDGE. The second argument given to the Callable is the current record itself. The intent of this Callable is to give KGX applications a hook for passively monitoring the graph data stream. As such, the Callable could simply tally up the number of times it is called with a NODE or an EDGE, then provide a suitable (quick!) report of that count back to the KGX application. The Callable (function or callable class) is strictly meant to be procedural: it should not mutate the record and should be of low complexity, so as not to introduce a large computational overhead to validation.

Parameters:
  • verbose (bool) – Whether the generated report should be verbose or not (default: False)

  • progress_monitor (Optional[Callable[[GraphEntityType, List], None]]) – Function that is given a peek at each record as it is processed by the class ‘inspector’ Callable.

  • schema (Optional[str]) – URL of the (Biolink) Model schema to be used for validation (default: None, which uses the default Biolink Model Toolkit schema)

  • error_log (Optional[str]) – Where to write any graph processing error messages (stderr, by default)
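As an illustration of the kind of lightweight progress_monitor described above, the following minimal sketch tallies NODE and EDGE records without touching them. The GraphEntityType enum defined here is only a local stand-in so that the snippet runs on its own (a real application would use the enum supplied by KGX), and RecordTally is a hypothetical name.

```python
from collections import Counter
from enum import Enum
from typing import List


class GraphEntityType(Enum):
    """Local stand-in for the KGX enum of the same name (assumption)."""
    NODE = "node"
    EDGE = "edge"


class RecordTally:
    """Hypothetical progress monitor: counts records, never mutates them."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, entity_type: GraphEntityType, record: List) -> None:
        # Constant-time work only, so validation is not slowed down.
        self.counts[entity_type] += 1

    def report(self) -> str:
        return (
            f"nodes={self.counts[GraphEntityType.NODE]} "
            f"edges={self.counts[GraphEntityType.EDGE]}"
        )


monitor = RecordTally()
# Simulated stream; in KGX the Validator would invoke the monitor itself.
monitor(GraphEntityType.NODE, ["HGNC:11603", {"name": "TBX4"}])
monitor(GraphEntityType.NODE, ["MONDO:0005002", {"name": "COPD"}])
monitor(GraphEntityType.EDGE, ["HGNC:11603", "MONDO:0005002", "e1", {}])
print(monitor.report())
```

An instance of such a class would be passed as the progress_monitor argument when constructing the Validator.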

analyse_edge(u, v, k, data)[source]#

Analyse edge

analyse_node(n, data)[source]#

Analyse Node

clear_errors()#

Clears the current error log list

static get_all_prefixes(jsonld: Optional[Dict] = None) set[source]#

Get all prefixes from Biolink Model JSON-LD context.

It also sets self.prefixes for subsequent access.

Parameters:

jsonld (Optional[Dict]) – The JSON-LD context

Returns:

A set of prefixes

Return type:

set
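One plausible way to collect prefixes from a JSON-LD context is sketched below. It is not taken from the KGX source: the rule used (a term counts as a prefix if it maps to a string IRI, or to an object flagged with "@prefix": true) is a simplifying assumption based on common JSON-LD conventions, and collect_prefixes is a hypothetical name.

```python
from typing import Dict, Set


def collect_prefixes(context: Dict) -> Set[str]:
    """Return the term names in a JSON-LD context that look like prefixes."""
    prefixes: Set[str] = set()
    for term, definition in context.items():
        if term.startswith("@"):
            continue  # skip JSON-LD keywords such as @vocab
        if isinstance(definition, str):
            prefixes.add(term)
        elif isinstance(definition, dict) and definition.get("@prefix") is True:
            prefixes.add(term)
    return prefixes


example_context = {
    "@vocab": "https://w3id.org/biolink/vocab/",
    "biolink": "https://w3id.org/biolink/vocab/",
    "HGNC": "http://identifiers.org/hgnc/",
    "MONDO": {"@id": "http://purl.obolibrary.org/obo/MONDO_", "@prefix": True},
    "category": {"@id": "biolink:category", "@type": "@id"},
}
print(sorted(collect_prefixes(example_context)))
```

The "category" entry is skipped because it defines a property rather than a prefix.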

classmethod get_default_model_version()[source]#

Get the Default Biolink Model version

get_errors(level: Optional[str] = None) Dict#

Get the index list of distinct error messages.

Parameters:

level (Optional[str]) – Optional filter: the name (case insensitive) of an error message level (generally either “Error” or “Warning”)

Returns:

A raw dictionary of entities indexed by [message_level][error_type][message] or, if the optional level filter is given, indexed by [error_type][message] for that message level only

Return type:

Dict
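The nested layout of the returned dictionary can be pictured with the stand-in data below. This is a sketch of the structure only: the error type names, the messages, and the assumption that each leaf is a list of entity identifiers are illustrative, and select_level is a hypothetical helper that mimics the effect of the level filter.

```python
from typing import Dict, Optional

# Hypothetical error index shaped as [message_level][error_type][message];
# the leaf lists hold the identifiers of the offending entities (assumption).
errors: Dict = {
    "Error": {
        "MISSING_NODE_PROPERTY": {
            "Required node property 'category' is missing": ["HGNC:11603"],
        },
    },
    "Warning": {
        "NO_CATEGORY": {
            "Category not recognized": ["MONDO:0005002", "MONDO:0005148"],
        },
    },
}


def select_level(index: Dict, level: Optional[str] = None) -> Dict:
    """Return the whole index, or only [error_type][message] for one level."""
    if level is None:
        return index
    wanted = level.lower()
    for name, by_type in index.items():
        if name.lower() == wanted:  # case-insensitive match
            return by_type
    return {}


print(list(select_level(errors)))
print(list(select_level(errors, "warning")))
```

Without a level, the top-level keys are the message levels; with a level, they are the error types recorded at that level.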

static get_required_edge_properties(toolkit: Optional[Toolkit] = None) list[source]#

Get all properties for an edge that are required, as defined by Biolink Model.

Parameters:

toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns:

A list of required edge properties

Return type:

list

static get_required_node_properties(toolkit: Optional[Toolkit] = None) list[source]#

Get all properties for a node that are required, as defined by Biolink Model.

Parameters:

toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

Returns:

A list of required node properties

Return type:

list

classmethod get_the_validator(verbose: bool = False, progress_monitor: Optional[Callable[[GraphEntityType, List], None]] = None, schema: Optional[str] = None, error_log: Optional[str] = None)[source]#

Creates and manages a default singleton Validator in the module, when called

classmethod get_toolkit() Toolkit[source]#

Get the current default Validator Toolkit

get_validating_toolkit()[source]#

Get Validating Biolink Model toolkit

get_validation_model_version()[source]#

Get Validating Biolink Model version

log_error(entity: str, error_type: ErrorType, message: str, message_level: MessageLevel = MessageLevel.ERROR)#

Log an error to the list of such errors.

Parameters:
  • entity – source of parse error

  • error_type – ValidationError ErrorType

  • message – message string describing the error

  • message_level – ValidationError MessageLevel

Set Biolink Model version of Validator Toolkit

validate(graph: BaseGraph)[source]#

Validate nodes and edges in a graph.

Parameters:

graph (kgx.graph.base_graph.BaseGraph) – The graph to validate

validate_categories(node: str, data: dict, toolkit: Optional[Toolkit] = None)[source]#

Validate category field of a given node.

Parameters:
  • node (str) – Node identifier

  • data (dict) – Node properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

validate_edge_predicate(subject: str, object: str, data: dict, toolkit: Optional[Toolkit] = None)[source]#

Validate edge_predicate field of a given edge.

Parameters:
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

validate_edge_properties(subject: str, object: str, data: dict, required_properties: list)[source]#

Checks if all the required edge properties exist for a given edge.

Parameters:
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

  • required_properties (list) – Required edge properties

validate_edge_property_types(subject: str, object: str, data: dict, toolkit: Optional[Toolkit] = None)[source]#

Checks if edge properties have the expected value type.

Parameters:
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)

validate_edge_property_values(subject: str, object: str, data: dict)[source]#

Validate an edge property’s value.

Parameters:
  • subject (str) – Subject identifier

  • object (str) – Object identifier

  • data (dict) – Edge properties

validate_edges(graph: BaseGraph)[source]#

Validate all the edges in a graph.

This method validates the following:
  • Edge properties

  • Edge property type

  • Edge property value type

  • Edge predicate

Parameters:

graph (kgx.graph.base_graph.BaseGraph) – The graph to validate

validate_node_properties(node: str, data: dict, required_properties: list)[source]#

Checks if all the required node properties exist for a given node.

Parameters:
  • node (str) – Node identifier

  • data (dict) – Node properties

  • required_properties (list) – Required node properties
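The check described here amounts to a membership test per required property. A minimal sketch follows, assuming the record is a plain dict; missing_properties is a hypothetical helper and the required list is illustrative rather than the one defined by the Biolink Model.

```python
from typing import Dict, List


def missing_properties(data: Dict, required_properties: List[str]) -> List[str]:
    """Return required property names that are absent from a record."""
    return [name for name in required_properties if name not in data]


node_data = {"id": "HGNC:11603", "name": "TBX4"}
required = ["id", "category"]  # illustrative list, not the Biolink Model's
for name in missing_properties(node_data, required):
    print(f"HGNC:11603: required node property '{name}' is missing")
```

In the Validator itself, each missing property would be recorded through the error log rather than printed.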

validate_node_property_types(node: str, data: dict, toolkit: Optional[Toolkit] = None)[source]#

Checks if node properties have the expected value type.

Parameters:
  • node (str) – Node identifier

  • data (dict) – Node properties

  • toolkit (Optional[Toolkit]) – Optional externally provided toolkit (default: use Validator class defined toolkit)
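A type check of this kind can be sketched as a lookup of the expected Python type for each property. The EXPECTED_TYPES mapping below is hypothetical; how KGX derives the expected types from the Biolink Model toolkit is not shown here.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical expectations, hard-coded for illustration only.
EXPECTED_TYPES: Dict[str, type] = {"id": str, "name": str, "category": list}


def type_mismatches(data: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """List (property, expected type, actual type) for ill-typed values."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key in data and not isinstance(data[key], expected):
            problems.append((key, expected.__name__, type(data[key]).__name__))
    return problems


node_data = {"id": "HGNC:11603", "name": "TBX4", "category": "biolink:Gene"}
print(type_mismatches(node_data))
```

Here the category is flagged because it is a single string where a list is expected.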

validate_node_property_values(node: str, data: dict)[source]#

Validate a node property’s value.

Parameters:
  • node (str) – Node identifier

  • data (dict) – Node properties

validate_nodes(graph: BaseGraph)[source]#

Validate all the nodes in a graph.

This method validates the following:
  • Node properties

  • Node property type

  • Node property value type

  • Node categories

Parameters:

graph (kgx.graph.base_graph.BaseGraph) – The graph to validate

write_report(outstream: Optional[TextIO] = None, level: Optional[str] = None) None#

Write the error report (as returned by get_errors) to an output stream

Parameters:
  • outstream (Optional[TextIO]) – The stream to which to write

  • level (Optional[str]) – Optional filter: the name (case insensitive) of an error message level (generally either “Error” or “Warning”)
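Writing such a report to any text stream can be sketched as follows. This is not the KGX output format: write_index is a hypothetical helper, the indented text layout is invented for illustration, and falling back to sys.stdout when no stream is given is an assumption.

```python
import io
import sys
from typing import Dict, Optional, TextIO


def write_index(index: Dict, outstream: Optional[TextIO] = None) -> None:
    """Write an [error_type][message] -> entities index as indented text."""
    stream = outstream if outstream is not None else sys.stdout
    for error_type, messages in index.items():
        stream.write(f"{error_type}\n")
        for message, entities in messages.items():
            stream.write(f"  {message}: {', '.join(entities)}\n")


buffer = io.StringIO()
write_index({"NO_CATEGORY": {"Category not recognized": ["MONDO:0005002"]}}, buffer)
report_text = buffer.getvalue()
print(report_text, end="")
```

Accepting a TextIO object means the same code can target an open file, an in-memory buffer, or a standard stream.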