Human Phenotype Ontology Annotations
Source Information
InfoRes ID: infores:hpo-annotations
Description: The Human Phenotype Ontology (HPO) provides a standard vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as Atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains over 18,000 terms and over 156,000 annotations to hereditary diseases. The HPO project and others have developed software for phenotype-driven differential diagnostics, genomic diagnostics, and translational research. The Human Phenotype Ontology group curates and assembles over 115,000 HPO-related annotations ("HPOA") to hereditary diseases using the HPO ontology. Here we create Biolink associations between diseases and phenotypic features, together with their evidence, and age of onset and frequency (if known). Disease annotations here are also cross-referenced to the MONarch Disease Ontology (MONDO) (https://mondo.monarchinitiative.org/). There are four HPOA ingests ('disease-to-phenotype' (includes capture of disease modes of inheritance, 'gene-to-phenotype' and 'gene-to-disease') that parse out records from the HPO Phenotype Annotation File (http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa).
Citations: - https://doi.org/10.1093/nar/gkaa1043
Data Access Locations: - https://hpo.jax.org/data/annotations
Data Provision Mechanisms: file_download, api_endpoint
Data Formats: tsv, other
Data Versioning and Releases: GitHub managed releases at https://github.com/obophenotype/human-phenotype-ontology/releases No consistent cadence for releases. Versioning is based on the month and year of the release
Additional Notes: None
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: The HPO and associated annotations are a flagship product of the Monarch Initiative (https://monarchinitiative.org/), an NIH-supported international consortium dedicated to semantic integration of biomedical and model organism data with the ultimate goal of improving biomedical research. The human phenotype/disease/gene knowledge integration aligns well with the general mission of the Biomedical Data Translator. As a consequence, several members of the Monarch Initiative are direct participants in the Biomedical Data Translator, with Monarch data forming one primary knowledge source contributing to Translator knowledge graphs.
Scope: Covers curated Disease, Phenotype and Genes relationships annotated with Human Phenotype Ontology terms.
Relevant Files
File Name | Location | Description |
---|---|---|
phenotype.hpoa | https://hpo.jax.org/data/annotations | disease to HPO phenotype annotations, including inheritance information |
genes_to_disease.txt | https://hpo.jax.org/data/annotations | gene to HPO disease annotations |
genes_to_phenotype.txt | https://hpo.jax.org/data/annotations | gene to HPO phenotype annotations |
Included Content
File Name | Included Records | Fields Used |
---|---|---|
phenotype.hpoa | Disease to Phenotype relationships (i.e., rows with 'aspect' == 'P') | database_id, qualifier, hpo_id, reference, evidence, onset, frequency, sex, aspect |
phenotype.hpoa | Disease "Mode of Inheritance" relationships (i.e., rows with 'aspect' == 'I') represented as node properties rather than edges | database_id, qualifier, hpo_id, reference, evidence, onset, frequency, sex, aspect |
genes_to_disease.txt | Mendelian Gene to Disease relationships (i.e., rows with 'association_type' == 'MENDELIAN') | ncbi_gene_id, gene_symbol, association_type, disease_id, source |
genes_to_disease.txt | Polygenic Gene to Disease relationships (i.e., rows with 'association_type' == 'POLYGENIC') | ncbi_gene_id, gene_symbol, association_type, disease_id, source |
genes_to_disease.txt | General Gene Contributions to Disease relationships (i.e., rows with 'association_type' == 'UNKNOWN') | ncbi_gene_id, gene_symbol, association_type, disease_id, source |
genes_to_phenotype.txt | Records where we determine that the reported G-P association was inferred over a G-D associated type with the value "MENDELIAN" | ncbi_gene_id, gene_symbol, hpo_id, hpo_name, frequency, disease_id |
Filtered Content
File Name | Filtered Records | Rationale |
---|---|---|
genes_to_phenotype.txt | Records where we determine that the reported G-P association was inferred over a G-D associated type with the value "POLYGENIC" or "UNKNOWN" | HPO will infer a Gene-Phenotype association G1-P1 in cases where G1 causes, contributes_to, or is associated with D1, and D1 is associated with a Phenotype P1. This logic holds for Mendelian disease where a single gene is causal and thus responsible for all associated phenotypes. It does not necessarily hold for Polygenic or Unknown diseases where the gene may be one of many contributing factors, and thus does not necessarily contribute to or have an association with each phenotype of the disease. |
Future Content Considerations
edge_content: Consider bringing back G-P associations based on inferences over Polygenic or Unknown Diseases if we establish a confidence annotation paradigm that lets us indicate these inferences to be weaker than those inferred over Mendelian diseases where the Gene is individually causal for the disease and all of its phenotypes. - Relevant files: genes_to_phenotype.txt
Additional Notes: None
Target Information
Target InfoRes ID: infores:translator-hpo-annotations-kgx
Edge Types
Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
---|---|---|---|---|---|
biolink:Disease | biolink:PhenotypicFeature | knowledge_assertion | manual_agent | HPO curators manually review clinical data and published evidence to determine phenotypes that manifest in a Disease, which are reported using the has_phenotype predicate. | |
biolink:Gene | biolink:Disease | knowledge_assertion | manual_agent | HPOA aggregates manually curated Gene-Disease associations from sources like Orphanet and MIM2Gene and DECIPHER. For Mendelian diseases with a single causal gene, we report that a genetic variant form of the gene 'causes' the disease. | |
biolink:Gene | biolink:Disease | knowledge_assertion | manual_agent | HPOA aggregates manually curated Gene-Disease associations from sources like Orphanet and MIM2Gene. For polygenic diseases with multiple contributing genes, we report that a genetic variant form of the gene 'contributes to' the disease. | |
biolink:Gene | biolink:Disease | knowledge_assertion | manual_agent | HPOA aggregates manually curated Gene-Disease associations from sources like Orphanet and MIM2Gene. When the genetic etiology of the diseases is not sufficiently specified, the relationship is reported using the 'associated_with' predicate. | |
biolink:Gene | biolink:PhenotypicFeature | logical_entailment | automated_agent | HPOA provides direct Gene-Phenotype associations between genes with variants causing or contributing to a disease, and each phenotype associated with the disease. For Mendelian diseases with a single causal gene, we report that a genetic variant form of the gene 'causes' each of the phenotypes associated with the disease. |
Node Types
Node Category | Source Identifier Types | Additional Notes |
---|---|---|
biolink:Disease | OMIM, ORPHANET, DECIPHER | None |
biolink:PhenotypicFeature | HP | None |
biolink:Gene | NCBIGene | None |
biolink:GeneticInheritance | HP | None |
Future Modeling Considerations
spoq_pattern: Consider alternate patterns for representing G-causes-D and G-contributes_to-D associations where we place more semantics into predicates, per https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/22 Should we consider creating support paths in our data/graphs, for the G-D-P hops over which HPO infers G-P associations? (e.g. GENE1 -causes-> DISEASE1 -has_phenotype-> PHENO1 ----> GENE1 -causes-> PHENO1
)
Provenance Information
Contributors: - Richard Bruskiewich - data modeling, domain expertise, code author - Kevin Schaper - code author - Sierra Moxon - data modeling, domain expertise, code support - Matthew Brush - data modeling, domain expertise
Artifacts: - Ingest Survey (https://docs.google.com/spreadsheets/d/1R9z-vywupNrD_3ywuOt_sntcTrNlGmhiUWDXUdkPVpM/edit?gid=0#gid=0) - Ingest Ticket (https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/24) - Modeling Ticket (https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/22)