biolinkml

Biolink modeling language

This project is maintained by biolink

biolinkml - biolink modeling language

biolinkml is a general purpose modeling language following object-oriented and ontological principles. Models are authored in YAML. A variety of artefacts can be generated from the model, including ShEx, JSON-Schema, OWL, Python dataclasses, UML diagrams, Markdown pages for deployment in a GitHub pages site, and more.

biolinkml is used for development of the BioLink Model, but the framework is general purpose and can be used for any kind of modeling.

This documentation is best viewed on the biolinkml site, but it can also be read in the GitHub repository.

Quickstart docs:

Further details about the general design of BiolinkML are in the Biolink Modeling Language Specification.

For a worked example, see the Jupyter notebook example.

Installation

This project is installed with pipenv. Some IDEs, such as PyCharm, also have direct support for pipenv. Once pipenv is available, the project can be installed with:

> pipenv install biolinkml

Language Features

Examples

biolinkml can be used as a modeling language in its own right, or it can be compiled to other schema and modeling languages.

We use a basic schema for illustrative purposes:

id: http://example.org/sample/organization
name: organization

types:
  yearCount:
    base: int
    uri: xsd:int
  string:
    base: str
    uri: xsd:string

classes:

  organization:
    slots:
      - id
      - name
      - has boss

  employee:
    description: A person
    slots:
      - id
      - first name
      - last name
      - aliases
      - age in years
    slot_usage:
      last name:
        required: true

  manager:
    description: An employee who manages others
    is_a: employee
    slots:
      - has employees

slots:
  id:
    description: Unique identifier of a person
    identifier: true

  name:
    description: human readable name
    range: string

  aliases:
    is_a: name
    description: An alternative name
    multivalued: true

  first name:
    is_a: name
    description: The first name of a person

  last name:
    is_a: name
    description: The last name of a person

  age in years:
    description: The age of a person if living or age of death if not
    range: yearCount

  has employees:
    range: employee
    multivalued: true
    inlined: true

  has boss:
    range: manager
    inlined: true

Note that this schema does not illustrate the more advanced features of biolinkml.
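
As a rough illustration of how `is_a` affects a class, the sketch below collects the slots that `manager` ends up with once the slots of `employee` are inherited. The `CLASSES` dictionary is a hand-copied subset of the schema above, and `all_slots` is a hypothetical helper written for this example; it is not part of biolinkml and does not claim to mirror how biolinkml resolves inheritance internally.

```python
from typing import Dict, List

# Hand-copied subset of the classes in the example schema above.
CLASSES: Dict[str, dict] = {
    "organization": {"slots": ["id", "name", "has boss"]},
    "employee": {
        "slots": ["id", "first name", "last name", "aliases", "age in years"],
    },
    "manager": {"is_a": "employee", "slots": ["has employees"]},
}


def all_slots(class_name: str, classes: Dict[str, dict]) -> List[str]:
    """Return inherited slots first, then the slots declared on the class itself."""
    definition = classes[class_name]
    parent = definition.get("is_a")
    inherited = all_slots(parent, classes) if parent else []
    # Skip slots that were already inherited so each name appears once.
    own = [s for s in definition.get("slots", []) if s not in inherited]
    return inherited + own


print(all_slots("manager", CLASSES))
```

Here `manager` picks up the five `employee` slots and adds `has employees`, which matches the way the generated ShEx for `Manager` includes the `Employee` triple expressions.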

Generators

See

JSON Schema

JSON Schema is a schema language for JSON documents.

JSON schema can be derived from a biolinkml schema, for example:

pipenv run gen-json-schema examples/organization.yaml

Output: examples/organization.schema.json

Note that any JSON that conforms to the derived JSON-Schema can be converted to RDF using the derived JSON-LD context.

JSON-LD Context

JSON-LD contexts provide a mapping from JSON to RDF.

A JSON-LD context can be derived from a biolinkml schema, for example:

pipenv run gen-jsonld-context examples/organization.yaml

Output: examples/organization.context.jsonld

You can control the prefix mappings in the generated context via the schema's prefixes declarations and default_curi_maps.

Any JSON that conforms to the derived JSON-Schema (see above) can be converted to RDF using this context. See the Jupyter notebook example for an example.

You can also combine a JSON instance file with a JSON-LD context using simple code or a tool like jq:

jq -s '.[0] * .[1]' examples/organization-data.json examples/organization.context.jsonld > examples/organization-data.jsonld
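
The jq expression `.[0] * .[1]` merges two objects recursively, with the right-hand value winning when both sides define the same non-object key. A minimal Python sketch of the same merge is shown here. The `instance` and `context` values are hypothetical stand-ins for the contents of the two files, and `merge_recursive` is a helper written for this example rather than anything provided by biolinkml.

```python
import json
from typing import Any, Dict


def merge_recursive(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two JSON objects; on a key clash the right value wins unless both are objects."""
    merged = dict(left)
    for key, value in right.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold objects: merge them with the same strategy.
            merged[key] = merge_recursive(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-ins for the instance data and the generated context.
instance = {"id": "org:1", "name": "Acme"}
context = {"@context": {"name": "http://example.org/sample/organization/name"}}

combined = merge_recursive(instance, context)
print(json.dumps(combined, indent=2))
```

In practice the two dictionaries would be read with `json.load` from the instance file and the context file, and the result written out as the `.jsonld` file.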

You can then use a standard JSON-LD conversion tool to produce other RDF syntaxes, e.g.

riot examples/organization-data.jsonld > examples/organization-data.nt

See examples/organization-data.nt

Python DataClasses

pipenv run gen-py-classes examples/organization.yaml > examples/organization.py

See examples/organization.py

For example:

@dataclass
class Organization(YAMLRoot):
    _inherited_slots: ClassVar[List[str]] = []

    class_class_uri: ClassVar[URIRef] = URIRef("http://example.org/sample/organization/Organization")
    class_class_curie: ClassVar[str] = None
    class_name: ClassVar[str] = "organization"
    class_model_uri: ClassVar[URIRef] = URIRef("http://example.org/sample/organization/Organization")

    id: Union[str, OrganizationId]
    name: Optional[str] = None
    has_boss: Optional[Union[dict, "Manager"]] = None

    def __post_init__(self, **kwargs: Dict[str, Any]):
        if self.id is None:
            raise ValueError(f"id must be supplied")
        if not isinstance(self.id, OrganizationId):
            self.id = OrganizationId(self.id)
        if self.has_boss is not None and not isinstance(self.has_boss, Manager):
            self.has_boss = Manager(self.has_boss)
        super().__post_init__(**kwargs)

For more details see PythonGenNotes

The Python object can be directly serialized as RDF. See the Jupyter notebook example for an example.
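
The `__post_init__` method in the generated class coerces plain values into their typed counterparts, so an organization can be built from nested dictionaries. The sketch below imitates that pattern with simplified stand-in classes so that it runs without the generated module. These classes are assumptions made for illustration: they do not extend `YAMLRoot`, they omit most slots, and the dictionary is unpacked with `**` rather than passed directly as in the generated code.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Manager:
    # Simplified stand-in: the generated Manager has more slots and extends YAMLRoot.
    id: str
    last_name: str


@dataclass
class Organization:
    id: str
    name: Optional[str] = None
    has_boss: Optional[Union[dict, Manager]] = None

    def __post_init__(self) -> None:
        if self.id is None:
            raise ValueError("id must be supplied")
        # Coerce a plain dict (e.g. parsed from YAML or JSON) into a Manager.
        if self.has_boss is not None and not isinstance(self.has_boss, Manager):
            self.has_boss = Manager(**self.has_boss)


org = Organization(
    id="org:1",
    name="Acme",
    has_boss={"id": "emp:7", "last_name": "Smith"},
)
print(type(org.has_boss).__name__)
```

After construction, `org.has_boss` is a `Manager` instance even though a dictionary was supplied.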

ShEx

ShEx - Shape Expressions Language

pipenv run gen-shex examples/organization.yaml > examples/organization.shex

BASE <http://example.org/sample/organization/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd1: <http://example.org/UNKNOWN/xsd/>


<YearCount> xsd1:int

<String> xsd1:string

<Employee>  (
    CLOSED {
       (  $<Employee_tes> (  <first_name> @<String> ? ;
             <last_name> @<String> ;
             <aliases> @<String> * ;
             <age_in_years> @<YearCount> ?
          ) ;
          rdf:type [ <Employee> ]
       )
    } OR @<Manager>
)

<Manager> CLOSED {
    (  $<Manager_tes> (  &<Employee_tes> ;
          rdf:type [ <Employee> ] ? ;
          <has_employees> @<Employee> *
       ) ;
       rdf:type [ <Manager> ]
    )
}

<Organization> CLOSED {
    (  $<Organization_tes> (  <name> @<String> ? ;
          <has_boss> @<Manager> ?
       ) ;
       rdf:type [ <Organization> ]
    )
}

See examples/organization.shex for full output
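
The cardinality markers in the output above follow from the `required` and `multivalued` settings of each slot: `last name` is required and has no marker, `first name` is optional and gets `?`, and `aliases` is multivalued and gets `*`. The function below is a small sketch of that mapping. `shex_cardinality` is a hypothetical helper, not the generator's code, and the `+` case for a slot that is both required and multivalued is an assumption based on standard ShEx syntax, since the example schema has no such slot.

```python
def shex_cardinality(required: bool = False, multivalued: bool = False) -> str:
    """Return the ShEx cardinality marker for a slot with the given settings."""
    if multivalued:
        # Assumption: required multivalued slots map to "+" (one or more).
        return "+" if required else "*"
    return "" if required else "?"


# Slots taken from the example schema.
print(repr(shex_cardinality(required=True)))       # last name
print(repr(shex_cardinality()))                    # first name
print(repr(shex_cardinality(multivalued=True)))    # aliases
```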

OWL

Web Ontology Language (OWL)

pipenv run gen-owl examples/organization.yaml > examples/organization.owl.ttl

...
<http://example.org/sample/organization/Organization> a owl:Class,
        meta:ClassDefinition ;
    rdfs:label "organization" ;
    rdfs:subClassOf [ a owl:Restriction ;
            owl:onClass <http://example.org/sample/organization/String> ;
            owl:onProperty <http://example.org/sample/organization/id> ;
            owl:qualifiedCardinality 1 ],
        [ a owl:Restriction ;
            owl:maxQualifiedCardinality 1 ;
            owl:onClass <http://example.org/sample/organization/String> ;
            owl:onProperty <http://example.org/sample/organization/name> ],
        [ a owl:Restriction ;
            owl:maxQualifiedCardinality 1 ;
            owl:onClass <http://example.org/sample/organization/Manager> ;
            owl:onProperty <http://example.org/sample/organization/has_boss> ] .

See examples/organization.owl.ttl for full output

Generating Markdown documentation

pipenv run gen-markdown examples/organization.yaml -d examples/organization-docs/

This will generate a markdown document for every class and slot in the model. These can be used in a static site, e.g. via GitHub pages.

Others

Specification

See the specification.

Also see the semantics folder for an experimental specification in terms of FOL.

FAQ

Why not use X as the modeling framework?

Why invent our own YAML and not use JSON-Schema, SQL, UML, ProtoBuf, OWL, …?

Each of these is tied to a particular formalism: for example, JSON-Schema is tied to trees and OWL to open-world logic. There are various impedance mismatches in converting between these. The goal was to develop something simple and more general that is not tied to any one serialization format or set of assumptions.

There are other projects with similar goals, e.g. https://github.com/common-workflow-language/schema_salad

It may be possible to align with these.

Why not use X as the datamodel?

Here X may be bioschemas, some upper ontology (BioTop), UMLS metathesaurus, bio*, various other attempts to model all of biology in an object model.

As far as we know, there is currently no existing reference datamodel that is flexible enough to be used here.

Type Definitions

  typeof:
    domain: type definition
    range: type definition
    description: supertype

  base:
    domain: type definition
    description: python base type that implements this type definition
    inherited: true

  type uri:
    domain: type definition
    range: uri
    alias: uri
    description: the URI to be used for the type in semantic web mappings

  repr:
    domain: type definition
    range: string
    description: the python representation of this type if different than the base type
    inherited: true
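
Because `base` and `repr` are marked `inherited: true`, a type that declares a `typeof` supertype can leave them out and take the value from its supertype. The sketch below shows one way such a lookup could work. The `TYPES` table is hypothetical (in the example schema earlier, `yearCount` declares `base` directly), and `resolve_inherited` is a helper written for this illustration, not a biolinkml function.

```python
from typing import Dict, Optional

# Hypothetical type table: yearCount relies on its supertype for `base`.
TYPES: Dict[str, dict] = {
    "integer": {"base": "int", "uri": "xsd:integer"},
    "yearCount": {"typeof": "integer", "uri": "xsd:int"},
}


def resolve_inherited(type_name: str, field: str, types: Dict[str, dict]) -> Optional[str]:
    """Walk the typeof chain until a type that declares `field` is found."""
    seen = set()
    current: Optional[str] = type_name
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental typeof cycles
        definition = types[current]
        if field in definition:
            return definition[field]
        current = definition.get("typeof")
    return None


print(resolve_inherited("yearCount", "base", TYPES))
```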

Slot Definitions

Developers Notes

Release to PyPI

A GitHub action is set up to automatically release the PyPI package. When the project is ready for a new release, create a GitHub release. The version should be in the vX.X.X format, following the semantic versioning specification.

After the release is created, the GitHub action is triggered to publish to PyPI. The release version is used as the version of the PyPI package.

If the PyPI release fails, make fixes, delete the GitHub release, and create a new release with the same version.
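
Before creating the GitHub release, it can be useful to check that the tag matches the vX.X.X format. The following is a minimal sketch of such a check. `is_release_tag` is a hypothetical helper that is not part of this repository, and it assumes only plain three-part versions are used, so pre-release or build suffixes are rejected.

```python
import re

# "v" followed by three dot-separated numbers without leading zeros.
RELEASE_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_release_tag(tag: str) -> bool:
    """Return True if the tag has the form vX.X.X."""
    return RELEASE_TAG.fullmatch(tag) is not None


for candidate in ("v1.2.3", "1.2.3", "v1.2"):
    print(candidate, is_release_tag(candidate))
```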

Additional Documentation

Example Projects