This section describes how a Neo4j database is mapped to the Biolink Model.
Although specific to Neo4j, these recommendations should hold for any Property Graph (PG) model, e.g. a Python NetworkX graph (specifically a MultiDiGraph).
For mapping to RDF graphs, refer to Mapping to RDF.
All nodes in the Neo4j database should be mapped to NamedThing.
Biolink Model defines a typology of nodes, all of which inherit from NamedThing.
Nodes in Neo4j (and property graphs in general) may have node properties.
The NamedThing class defines core properties for a node, plus additional (optional) ones.
Core properties for a node:
id field MUST be provided and MUST be a CURIE.
Note: this is distinct from the internal autogenerated id field in Neo4j.
name field SHOULD correspond to a concise display label for the entity, for example asthma or Wnt signaling pathway. If the node is an ontology class, then name will correspond to the rdfs:label of that class.
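A node's core properties can be checked before loading. The sketch below is a minimal, hypothetical validator: check_core_node_properties is not part of any Biolink tooling. It assumes nodes arrive as plain Python dicts and uses a deliberately loose CURIE pattern (a prefix, a colon, and a local identifier) instead of a registered prefix map. MONDO:0004979 appears only as an illustrative identifier.

```python
import re

# Loose CURIE check (assumption): a prefix, a colon, then a non-empty local part.
# The (?!//) lookahead rejects full IRIs such as "http://...", since id MUST be a CURIE.
# A production check would also confirm the prefix against a registered prefix map.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*:(?!//)\S+$")


def check_core_node_properties(node: dict) -> list:
    """Hypothetical helper: list problems with a node's core Biolink properties."""
    problems = []
    node_id = node.get("id")
    if node_id is None:
        problems.append("id is required (MUST)")
    elif not isinstance(node_id, str) or not CURIE_PATTERN.match(node_id):
        problems.append(f"id {node_id!r} is not a CURIE (MUST)")
    # name is only a SHOULD, so a caller may choose to treat this as a warning.
    if not node.get("name"):
        problems.append("name is missing (SHOULD)")
    return problems


print(check_core_node_properties({"id": "MONDO:0004979", "name": "asthma"}))  # []
print(check_core_node_properties({"id": "asthma"}))
```

The second call reports two problems: the id has no prefix, and the node has no name.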
Any Neo4j instance MAY provide as many additional properties as required. These SHOULD come from a registered list of properties for that node type.
Note: While the CURIE for a property is biolink:name, that does not necessarily mean the property name has to be biolink:name in Neo4j. Instead, the prefix part of the property can be omitted, so that the property name is just name.
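One way to apply this convention during export is to strip the biolink: prefix from property keys. The helper names below are hypothetical. The sketch assumes that keys with any other prefix, or with no prefix, should be left unchanged.

```python
def to_neo4j_property_name(curie: str, prefix: str = "biolink") -> str:
    """Drop the given prefix from a property CURIE, e.g. 'biolink:name' -> 'name'."""
    head, sep, local = curie.partition(":")
    if sep and head == prefix and local:
        return local
    # Not prefixed with `prefix` (or already bare): keep the key as-is.
    return curie


def to_neo4j_properties(props: dict) -> dict:
    """Rename every biolink:-prefixed key in a property dict."""
    return {to_neo4j_property_name(key): value for key, value in props.items()}


print(to_neo4j_properties({"biolink:id": "MONDO:0004979", "biolink:name": "asthma"}))
# {'id': 'MONDO:0004979', 'name': 'asthma'}
```

If a source mixes prefixed and bare keys (for example both biolink:name and name), the later key silently wins in the dict comprehension. A real exporter may therefore want to detect such collisions.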
Nodes in Neo4j can be tagged with label(s) indicating a grouping to which the node belongs. The category field in the model MUST map to a Neo4j label. The Biolink Model class name in CamelCase MUST be used for the label.
Additionally, the Neo4j implementation MAY have additional categories which are super-classes of a specific category.
Consequently, any number of additional local labels MAY also be used.
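As an illustration, a loader might turn a node's category values into Neo4j labels and add any super-class labels alongside them. The function below is a hypothetical sketch that only builds a Cypher string plus parameters; it does not connect to a database. It assumes category values are either CURIEs such as biolink:Gene or bare class names, and that the caller supplies the super-class labels (here just NamedThing). HGNC:11998 is an illustrative identifier.

```python
def category_to_label(category: str) -> str:
    """'biolink:Gene' -> 'Gene'; a bare CamelCase class name is returned unchanged."""
    return category.split(":", 1)[-1]


def build_create_node(node: dict, extra_labels=()) -> tuple:
    """Hypothetical helper: build a parameterized Cypher CREATE for one node."""
    categories = node["category"]
    if isinstance(categories, str):  # allow a single category as well as a list
        categories = [categories]
    labels = []
    for label in [category_to_label(c) for c in categories] + list(extra_labels):
        if label not in labels:  # avoid duplicate labels, keep first-seen order
            labels.append(label)
    # Labels cannot be passed as Cypher parameters, so they are interpolated
    # (backtick-quoted), while property values travel in the $props map.
    label_str = "".join(f":`{label}`" for label in labels)
    query = f"CREATE (n{label_str}) SET n = $props"
    return query, {"props": dict(node)}


query, params = build_create_node(
    {"id": "HGNC:11998", "name": "TP53", "category": ["biolink:Gene"]},
    extra_labels=["NamedThing"],
)
print(query)  # CREATE (n:`Gene`:`NamedThing`) SET n = $props
```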
In addition to Neo4j labels, additional subclass_of edges may be used to connect a node to an ontology class node.
Implementation Note: Cypher queries that use labels are optimized for speed, under the assumption that an index has already been generated in Neo4j for said label(s).
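Such indexes can be created up front for each label a loader uses. The snippet below only generates Cypher statements. It assumes Neo4j 4.1 or later (for the CREATE INDEX ... IF NOT EXISTS FOR ... ON form) and indexes the id property within each label, which supports lookups such as MATCH (n:Gene {id: $id}). The <label>_<property> index naming scheme is hypothetical.

```python
def index_statements(labels, prop: str = "id") -> list:
    """Return one CREATE INDEX statement per label (names follow a hypothetical scheme)."""
    statements = []
    for label in labels:
        index_name = f"{label.lower()}_{prop}"  # e.g. 'gene_id'
        statements.append(
            f"CREATE INDEX {index_name} IF NOT EXISTS FOR (n:`{label}`) ON (n.{prop})"
        )
    return statements


for stmt in index_statements(["Gene", "Disease"]):
    print(stmt)
```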
Terminology note: The term label is overloaded. In Neo4j it denotes a grouping of nodes, whereas in RDF it usually denotes the name of an entity (rdfs:label). For this reason, we use category as the property name in Biolink Model.
Each edge in the Neo4j graph should have an edge label or relationship type that is a sub-property of related_to.
For example, two protein nodes may be related via physically_interacts_with relationship type.
Note: Always use snake_case to represent edge labels.
The set of edge labels is deliberately kept minimal. This is partly for practical reasons. Neo4j has no easy way to automatically use sub-property relationship types in Cypher queries. For example, if we have a deep hierarchy of interaction relationships including specific physical interactions such as ‘phosphorylates’, then queries for any kind of interaction must be expanded to include all sub-property relationship types.
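To see why a deep hierarchy is costly, consider what a query for any kind of interaction has to do: collect every sub-property of interacts_with and list them all as alternative relationship types. The sketch below does this over a small, hand-written hierarchy. The PARENT table is a hypothetical fragment, and phosphorylates is the illustrative relation from the paragraph above, not a claim about the Biolink predicate list.

```python
# Hypothetical fragment of a predicate hierarchy, stored as child -> parent.
PARENT = {
    "interacts_with": "related_to",
    "physically_interacts_with": "interacts_with",
    "genetically_interacts_with": "interacts_with",
    "phosphorylates": "physically_interacts_with",  # illustrative only
}


def descendants(predicate: str, parent: dict = PARENT) -> set:
    """Return the predicate together with all of its transitive sub-properties."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    found, stack = set(), [predicate]
    while stack:  # iterative depth-first walk down the hierarchy
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def expanded_match(predicate: str) -> str:
    """Build a Cypher MATCH that covers the predicate and every sub-property."""
    types = "|".join(sorted(descendants(predicate)))
    return f"MATCH (a)-[r:{types}]->(b) RETURN a, r, b"


print(expanded_match("physically_interacts_with"))
# MATCH (a)-[r:phosphorylates|physically_interacts_with]->(b) RETURN a, r, b
```

Every extra level in the hierarchy lengthens this list of alternatives. That is the practical argument for keeping edge labels coarse and carrying finer distinctions in a property instead.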
More precise relationship types are allowed through the use of the relation property.
Neo4j uses a property graph model, where any number of properties can be attached to an edge. Some properties may be generic, while some may only pertain to particular kinds of relationship type.
Edges SHOULD have a relation property which encodes the most specific relationship type for the relationship. This MAY correspond to the edge label, or it MAY be more specific. The relation property MUST be encoded as a CURIE or IRI.
Edges can also have generic properties that change the meaning of the edge itself. For example, the generic property negated logically negates the assertion defined by the edge.
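A loader could check these edge properties before writing them. The hypothetical check_edge below assumes edges are plain dicts. It treats a missing relation as a SHOULD-level problem, accepts either a CURIE or an http(s) IRI using loose patterns chosen for this sketch, and expects negated, when present, to be a boolean. BFO:0000050 (part of) serves as an example relation.

```python
import re

# Loose patterns (assumptions for this sketch, not a formal CURIE/IRI grammar).
CURIE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*:(?!//)\S+$")
IRI_PATTERN = re.compile(r"^https?://\S+$")


def check_edge(edge: dict) -> list:
    """Hypothetical helper: list problems with an edge's relation and negated properties."""
    problems = []
    relation = edge.get("relation")
    if relation is None:
        problems.append("relation is missing (SHOULD)")
    elif not isinstance(relation, str) or not (
        CURIE_PATTERN.match(relation) or IRI_PATTERN.match(relation)
    ):
        problems.append(f"relation {relation!r} is not a CURIE or IRI (MUST)")
    # negated flips the meaning of the edge, so it must be an unambiguous boolean.
    if "negated" in edge and not isinstance(edge["negated"], bool):
        problems.append("negated must be a boolean")
    return problems


print(check_edge({"edge_label": "part_of", "relation": "BFO:0000050"}))  # []
print(check_edge({"edge_label": "part_of", "relation": "part of", "negated": "no"}))
```

A string such as "no" is rejected on purpose. It is truthy in Python, so a consumer testing `if edge["negated"]` would wrongly treat the edge as negated.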
Biolink Model includes a hierarchy of Association.
Note: This is distinct from the relation hierarchy, although in some cases they parallel one another.
For example, the relation hierarchy has a generic relation part_of, which can be used in different contexts, such as connecting two anatomical entities or connecting a pathway to a sub-pathway.
Different association types may have different properties associated with them.
The core properties are:
Note: Three of these properties are built-in, so they do not need to correspond to edge properties (but they may, for the sake of explicitness).
edge_label is a snake_case human-readable high-level grouping relationship.
In contrast, relation is a CURIE from a more refined relationship ontology like RO or SIO.
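To make the split concrete, the sketch below writes one edge to Neo4j as Cypher. In this sketch, edge_label becomes the relationship type, subject and object become the start and end nodes, and relation is stored as an ordinary relationship property. The Edge class and edge_to_cypher are hypothetical, the EX: identifiers are placeholders, and RO:0002436 (molecularly interacts with) is given only as an illustrative relation CURIE.

```python
import re
from dataclasses import dataclass, field

# snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


@dataclass
class Edge:
    subject: str     # CURIE of the start node
    edge_label: str  # coarse snake_case grouping, used as the Neo4j relationship type
    object: str      # CURIE of the end node
    relation: str    # finer-grained CURIE, e.g. from RO or SIO
    properties: dict = field(default_factory=dict)


def edge_to_cypher(edge: Edge) -> tuple:
    """Hypothetical helper: build a parameterized Cypher statement for one edge."""
    # Relationship types cannot be Cypher parameters, so validate before interpolating.
    if not SNAKE_CASE.match(edge.edge_label):
        raise ValueError(f"edge_label {edge.edge_label!r} is not snake_case")
    query = (
        "MATCH (s {id: $subject}), (o {id: $object}) "
        f"CREATE (s)-[r:{edge.edge_label}]->(o) "
        "SET r = $props"
    )
    props = {"relation": edge.relation, **edge.properties}
    return query, {"subject": edge.subject, "object": edge.object, "props": props}


query, params = edge_to_cypher(
    Edge("EX:protein_a", "physically_interacts_with", "EX:protein_b", "RO:0002436")
)
print(query)
print(params["props"])  # {'relation': 'RO:0002436'}
```

The MATCH clause here uses no labels, so Neo4j cannot use a label index for it. In practice, adding the nodes' category labels to the pattern lets the lookup use such an index.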