This document describes how the Biolink Model is used in the context of RDF, whether stored in a triplestore or serialized to one of the RDF syntaxes, such as Turtle.
In RDF, a graph is a collection of triples <S P O> (subject, predicate, object). The subject S must be a resource (an IRI or a blank node), and the predicate P must be an IRI. The object O can be a literal or a resource.
Graphs can in turn be organized into a collection of named graphs (an RDF dataset). Each triple can then be conceived of as a quad <S P O G>, where G names the graph that contains the triple.
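As a minimal sketch of the triple/quad distinction (Python, standard library only), a quad can be modeled as a triple tagged with the name of its graph and serialized as one N-Quads line. The `Quad` class and the helper names are hypothetical, and terms are assumed to already be written in N-Triples syntax (IRIs in angle brackets, literals in double quotes).

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """A triple <S P O> tagged with the named graph G it belongs to."""
    s: str
    p: str
    o: str
    g: Optional[str] = None  # None stands for the default graph


def triples_in_graph(quads, graph):
    """Return the set of <S P O> triples stored in one named graph."""
    return {(q.s, q.p, q.o) for q in quads if q.g == graph}


def to_nquads(q):
    """Serialize a quad as one N-Quads line; a default-graph quad has no 4th term."""
    terms = [q.s, q.p, q.o] + ([q.g] if q.g is not None else [])
    return " ".join(terms) + " ."
```

Dropping the fourth term recovers the plain <S P O> triple, which is why a dataset of quads can always be read back as a collection of graphs.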
Each node in a graph corresponds to an RDF resource.
Biolink Model defines a typology of nodes, all of which inherit from biolink:NamedThing.
Core properties for a node:
- biolink:id MUST be provided and MUST be a CURIE, which maps to the resource IRI/URI using a standard prefix expansion. The RDF graph MAY also include the CURIE short form as a literal, using the predicate dcterms:identifier.
- biolink:name SHOULD be a concise label for the entity, and maps to rdfs:label.
For example, the Biolink node with ID MONDO:0001083 and name "Fanconi syndrome" is expressed in RDF (Turtle syntax) as:

```turtle
PREFIX MONDO: <http://purl.obolibrary.org/obo/MONDO_>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
MONDO:0001083 rdfs:label "Fanconi syndrome" .
```
When the CURIEs are expanded, this is rendered (in N-Triples) as:

```turtle
<http://purl.obolibrary.org/obo/MONDO_0001083> <http://www.w3.org/2000/01/rdf-schema#label> "Fanconi syndrome" .
```
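The expansion above can be reproduced with a small sketch like the following. The prefix map is an illustrative assumption (a real pipeline would use the prefix context shipped with the model), and `expand_curie` and `node_to_ntriples` are hypothetical helpers, not part of any Biolink tooling. The optional `dcterms:identifier` triple carries the CURIE as a literal, as described for biolink:id.

```python
# Illustrative prefix map (an assumption); use the model's own prefix context in practice.
PREFIXES = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a CURIE such as 'MONDO:0001083' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand CURIE: {curie!r}")
    return prefixes[prefix] + local


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def node_to_ntriples(node_id, name, include_identifier=False):
    """Emit N-Triples lines for a node's biolink:name (and optionally its CURIE)."""
    subject = f"<{expand_curie(node_id)}>"
    lines = [f"{subject} <{expand_curie('rdfs:label')}> {literal(name)} ."]
    if include_identifier:
        # dcterms:identifier holds the CURIE short form as a plain literal.
        lines.append(f"{subject} <{expand_curie('dcterms:identifier')}> {literal(node_id)} .")
    return lines
```

Calling `node_to_ntriples("MONDO:0001083", "Fanconi syndrome")` yields the same expanded line as the N-Triples example for Fanconi syndrome.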
To define the type of a node, use rdf:type to link it to a specific Biolink node class. You MAY use the predicate biolink:category to represent additional categories for that node.
The rdf:type triples MAY be partitioned into separate named graphs. For example, it can be convenient to put the directly asserted rdf:type triple in the main graph, and the full set of types (i.e., asserted plus inferred) in a separate ‘inferred’ graph.
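The following sketch shows one way to partition type triples as N-Quads: the directly asserted type goes to a main graph, and the full closure goes to an ‘inferred’ graph. The class chain in `PARENTS` is a small illustrative fragment, not an authoritative copy of the Biolink hierarchy; the graph IRIs under `http://example.org/` are placeholders; and `https://w3id.org/biolink/vocab/` is assumed as the expansion of the `biolink:` prefix.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
BIOLINK = "https://w3id.org/biolink/vocab/"  # assumed expansion of the biolink: prefix

# Illustrative child -> parent fragment; consult the model for the real hierarchy.
PARENTS = {
    "Disease": "DiseaseOrPhenotypicFeature",
    "DiseaseOrPhenotypicFeature": "BiologicalEntity",
    "BiologicalEntity": "NamedThing",
}


def ancestors_and_self(cls):
    """Walk up the parent map, returning cls followed by all of its ancestors."""
    chain = [cls]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain


def type_quads(node_iri, direct_class, main_graph, inferred_graph):
    """Direct rdf:type in main_graph; asserted plus inferred types in inferred_graph."""
    subject = f"<{node_iri}>"
    quads = [f"{subject} {RDF_TYPE} <{BIOLINK}{direct_class}> <{main_graph}> ."]
    for cls in ancestors_and_self(direct_class):
        quads.append(f"{subject} {RDF_TYPE} <{BIOLINK}{cls}> <{inferred_graph}> .")
    return quads
```

Because the direct type is repeated in the inferred graph, a query over that graph alone sees every type, while the main graph stays minimal.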
You MAY provide as many additional properties as required. These SHOULD come from a registered list of properties for that node type.
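Because this is a SHOULD rather than a MUST, a loader might report unregistered properties instead of rejecting the node. A minimal sketch, in which `REGISTERED_PROPERTIES` is a hypothetical stand-in for whatever registry of properties per node type is actually used:

```python
# Hypothetical registry: node type -> property CURIEs registered for it.
REGISTERED_PROPERTIES = {
    "biolink:Disease": {"biolink:id", "biolink:name", "biolink:category", "biolink:description"},
}


def unregistered_properties(node_type, properties):
    """Return the sorted property names not registered for node_type.

    Returns a list to warn about rather than raising, since the rule is a SHOULD.
    """
    allowed = REGISTERED_PROPERTIES.get(node_type, set())
    return sorted(name for name in properties if name not in allowed)
```

Sorting keeps the warning output stable regardless of the order in which properties arrive.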
An edge maps to an RDF triple where both the subject and object are nodes representing Biolink entities.
RDF reification is used for representing edge properties. RDF* provides a convenient syntax for abstracting over this.
For example, an edge between x and y with edge label p and an additional edge property publication=PMID:123 would be represented in RDF* as:

```turtle
<<:x :p :y>> bl:publication <http://identifiers.org/pmid/123> .
```
This is syntactic sugar for the more verbose reification triples:

```turtle
:x :p :y .
[ a rdf:Statement ;
  rdf:subject :x ;
  rdf:predicate :p ;
  rdf:object :y ;
  bl:publication <http://identifiers.org/pmid/123> ] .
```
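To show what an RDF* annotation expands to, here is a sketch that generates the reification triples in N-Triples form. It assumes `:x`, `:p` and `:y` expand under a placeholder `http://example.org/` namespace and that `bl:` expands to `https://w3id.org/biolink/vocab/`; property values are treated as IRIs only, which is a simplification. The blank-node label is passed in so that each edge gets its own rdf:Statement node.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify_edge(s, p, o, props, stmt_label="e1"):
    """N-Triples for edge <s p o> plus an rdf:Statement node carrying its properties.

    All arguments are full IRIs; property values are assumed to be IRIs as well.
    """
    stmt = f"_:{stmt_label}"
    lines = [
        f"<{s}> <{p}> <{o}> .",  # the edge itself, asserted directly
        f"{stmt} <{RDF}type> <{RDF}Statement> .",
        f"{stmt} <{RDF}subject> <{s}> .",
        f"{stmt} <{RDF}predicate> <{p}> .",
        f"{stmt} <{RDF}object> <{o}> .",
    ]
    for key, value in props.items():
        lines.append(f"{stmt} <{key}> <{value}> .")
    return lines
```

Each additional edge property becomes one more triple on the same statement node, so the four rdf:subject/predicate/object/type triples are a fixed per-edge overhead.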
It is not yet settled whether referring to an RDF* triple also asserts that triple. In some RDF* implementations you therefore need to assert the triple explicitly if you also want it as a direct triple, as in the reification example above:

```turtle
:x :p :y .
<<:x :p :y>> bl:publication <http://identifiers.org/pmid/123> .
```

(For example, GraphDB’s RDF* support does not assert the triple automatically.)
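A serializer can make this choice explicit. The sketch below emits Turtle-star lines for an annotated edge and, by default, also asserts the plain triple, which is the safe option for stores that do not assert quoted triples automatically. The function name and the `assert_triple` flag are hypothetical, and IRIs are written out in full to avoid prefix declarations.

```python
def rdf_star_edge(s, p, o, props, assert_triple=True):
    """Turtle-star lines annotating the edge <s p o> with IRI-valued properties.

    With assert_triple=True the plain triple is emitted too, so the edge is
    present as a direct triple even in stores that do not assert quoted triples.
    """
    triple = f"<{s}> <{p}> <{o}>"
    lines = [f"{triple} ."] if assert_triple else []
    for key, value in props.items():
        lines.append(f"<< {triple} >> <{key}> <{value}> .")
    return lines
```

Emitting the extra triple costs one line per edge, and it stays harmless in stores that already assert quoted triples, since the duplicate triple is simply merged.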
See biolink:Association for a taxonomy of associations defined by the model, and for a list of the generic properties associated with an edge.
The mapping is similar to Mapping to Neo4j. Differences include: