GO Annotations (GOA) Reference Ingest Guide
Source Information
InfoRes ID: infores:goa
Description: GO Annotations connect a gene or gene product to a Gene Ontology term that describes a molecular function it enables, a biological process in which it participates, or a cellular component in which it is located. Most are produced through rigorous manual curation of the literature, although some come from automated pipelines that assign GO terms based on evidence such as orthology or sequence similarity.
Citations:
- Data Archive: https://zenodo.org/records/10536401
- Publication: https://doi.org/10.1093/nar/gky1055
Data Access Locations:
- All downloads: https://geneontology.org/docs/download-go-annotations/
- Commonly studied organisms: https://current.geneontology.org/products/pages/downloads.html
Data Provision Mechanisms: file_download
Data Formats: tsv
Data Versioning and Releases:
- Release cadence: Approximately every four weeks, synchronized with UniProtKB.
- Versioning: By date; each GAF header includes a !Generated: YYYY-MM-DD line.
- Release notes: https://geneontology.org/docs/download-go-annotations/ and https://geneontology.org/docs/go-annotation-file-gaf-format-2.2/
- Release archive: https://release.geneontology.org/
- Formats: tsv in GAF format (17 columns)
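As an illustration of date-based versioning, the following minimal sketch reads the release date from a GAF header. It assumes the header line has the form "!Generated: YYYY-MM-DD" as described above; the function name gaf_release_date is hypothetical and not part of the ingest code.

```python
import re
from typing import Iterable, Optional

# Assumption: the version line looks like "!Generated: YYYY-MM-DD".
# Real files may use a different header key; adjust the pattern if so.
_GENERATED = re.compile(r"^!\s*Generated:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)


def gaf_release_date(lines: Iterable[str]) -> Optional[str]:
    """Return the YYYY-MM-DD date from the GAF header, or None if absent."""
    for line in lines:
        if not line.startswith("!"):
            break  # header comments end at the first data row
        match = _GENERATED.match(line)
        if match:
            return match.group(1)
    return None
```

Because the function accepts any iterable of lines, it can be passed an open file handle directly and stops reading as soon as the header ends.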
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: GOA is a rich source of manually curated knowledge about gene function with broad relevance to all Translator queries and use cases.
Scope: This initial ingest of GOA covers molecular function, biological process, and cellular component annotations about human, mouse, and rat genes, including manually curated and electronically inferred content, from GAF files (GPAD and GPI formats not ingested). Other species may be added in future updates to the ingest.
Relevant Files
File Name | Location | Description |
---|---|---|
goa_human.gaf | https://current.geneontology.org/products/pages/downloads.html | Human gene-product to GO term associations (GAF 2.2) |
mgi.gaf | https://current.geneontology.org/products/pages/downloads.html | Mouse gene-product to GO term associations (GAF 2.2) |
rgd.gaf | https://current.geneontology.org/products/pages/downloads.html | Rat gene-product to GO term associations (GAF 2.2) |
Included Content
File Name | Included Records | Fields Used |
---|---|---|
goa_human.gaf | All records included | DB, DB Object ID, DB Object Symbol, Relation, GO ID, DB:Reference(s), Evidence Code, With (or) From, Aspect, DB Object Name, DB Object Type, Taxon |
mgi.gaf | All records included | DB, DB Object ID, DB Object Symbol, Relation, GO ID, DB:Reference(s), Evidence Code, With (or) From, Aspect, DB Object Name, DB Object Type, Taxon |
rgd.gaf | All records included | DB, DB Object ID, DB Object Symbol, Relation, GO ID, DB:Reference(s), Evidence Code, With (or) From, Aspect, DB Object Name, DB Object Type, Taxon |
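The field selection in the table above can be sketched as follows. This is a hedged example, not the ingest's actual parser: the column order follows the GAF 2.2 specification (17 tab-separated columns), column 4 is labelled "Relation" to match this guide, and the names GAF_COLUMNS, FIELDS_USED, and parse_gaf_line are hypothetical.

```python
from typing import Dict, Optional

# The 17 GAF 2.2 columns in file order. Column 4 is called "Qualifier" in the
# GAF specification; it is named "Relation" here to match this guide.
GAF_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Relation", "GO ID",
    "DB:Reference(s)", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]

# The subset of fields listed under "Fields Used" above.
FIELDS_USED = [
    "DB", "DB Object ID", "DB Object Symbol", "Relation", "GO ID",
    "DB:Reference(s)", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Type", "Taxon",
]


def parse_gaf_line(line: str) -> Optional[Dict[str, str]]:
    """Parse one GAF row; return None for header comments and blank lines."""
    if not line.strip() or line.startswith("!"):
        return None
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(GAF_COLUMNS):
        raise ValueError(f"expected 17 columns, found {len(values)}")
    record = dict(zip(GAF_COLUMNS, values))
    return {name: record[name] for name in FIELDS_USED}
```

Splitting on tabs directly, rather than using a quoting-aware CSV reader, avoids misreading quotation marks that can appear in gene product names.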
Future Content Considerations
edge_content: Consider ingesting Gene/Product to GO Term annotations from other taxa (beyond human, mouse, and rat)
edge_content: Consider adding qualifying information (as may be found in the Annotation Extension or With (or) From columns) to existing and new Gene/Product to GO Term annotations
edge_content: Consider ingesting associations between two GO Terms, per the specification at https://wiki.geneontology.org/index.php/Annotation_Relations#Standard_Annotation:_Annotation_Extension_Relations
node_property_content: To be determined whether we will bring in taxon info about gene/gene product nodes from GOA, or rely on other gene property authorities for this information (e.g. ncbigene)
Target Information
Target InfoRes ID: infores:translator-goa-kgx
Edge Types
Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
---|---|---|---|---|---|
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | enables | biolink:MolecularActivity | varies | varies | A GO Annotation uses the 'enables' predicate when a gene product is solely capable of executing the reported function. |
biolink:MacromolecularComplex | contributes_to | biolink:MolecularActivity | varies | varies | A GO Annotation uses the 'contributes_to' predicate when a gene product is required as part of a macromolecular complex for executing the reported function. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | involved_in | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'involved_in' predicate when a gene product's molecular function plays an integral role in the reported biological process. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | acts_upstream_of_or_within | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_or_within' predicate when neither the mechanism / timing of the gene product's activity relative to the reported biological process nor the directionality of its effect on the process is known. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | acts_upstream_of_or_within_positive_effect | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_or_within_positive_effect' predicate when the mechanism / timing of the gene product's activity relative to the reported biological process is not known, but the activity of the gene product has a positive effect on the process. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | acts_upstream_of_or_within_negative_effect | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_or_within_negative_effect' predicate when the mechanism / timing of the gene product's activity relative to the reported biological process is not known, but the activity of the gene product has a negative effect on the process. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | acts_upstream_of | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of' predicate when a gene product acts through a known mechanism upstream of the reported biological process, does not regulate the process, and the directionality of its effect on the process is not known. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | acts_upstream_of_positive_effect | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_positive_effect' predicate when a gene product acts through a known mechanism upstream of the reported biological process, and its activity is required for the process but does not regulate it. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | acts_upstream_of_negative_effect | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_negative_effect' predicate when a gene product acts through a known mechanism upstream of the reported biological process, and its activity prevents or reduces the process but does not regulate it. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | is_active_in | biolink:CellularComponent | varies | varies | A GO Annotation uses the 'is_active_in' predicate when a gene product is present in and performs its molecular function in the reported cellular component. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | located_in | biolink:CellularComponent | varies | varies | A GO Annotation uses the 'located_in' predicate when a gene product is detected in the reported cellular component. |
biolink:Gene, biolink:Protein, biolink:RNAProduct | part_of | biolink:MacromolecularComplex | varies | varies | A GO Annotation uses the 'part_of' predicate when a gene product is a component of the reported macromolecular complex. |
biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | colocalizes_with | biolink:CellularComponent | varies | varies | A GO Annotation uses the 'colocalizes_with' predicate when a gene product has a transient or dynamic association with the reported cellular component. |
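The pairing of GO relations with object categories in the table above could be encoded as a lookup, sketched below. Several points are assumptions rather than statements about the actual ingest: the relation names are the GO names quoted in the UI Explanation column and are not mapped to Biolink predicates, the handling of a "NOT" entry in the relation field follows the GAF 2.2 convention of pipe-separated qualifiers, and the names RELATION_TO_OBJECT_CATEGORY and build_edge are hypothetical.

```python
from typing import Dict

_BP_RELATIONS = [
    "involved_in",
    "acts_upstream_of_or_within",
    "acts_upstream_of_or_within_positive_effect",
    "acts_upstream_of_or_within_negative_effect",
    "acts_upstream_of",
    "acts_upstream_of_positive_effect",
    "acts_upstream_of_negative_effect",
]

# GO relation -> object category, as listed in the Edge Types table.
RELATION_TO_OBJECT_CATEGORY: Dict[str, str] = {
    "enables": "biolink:MolecularActivity",
    "contributes_to": "biolink:MolecularActivity",
    **{rel: "biolink:BiologicalProcess" for rel in _BP_RELATIONS},
    "is_active_in": "biolink:CellularComponent",
    "located_in": "biolink:CellularComponent",
    "colocalizes_with": "biolink:CellularComponent",
    "part_of": "biolink:MacromolecularComplex",
}


def build_edge(subject_id: str, relation_field: str, go_id: str) -> Dict[str, object]:
    """Build a simple edge dict from one annotation's relation field."""
    parts = relation_field.split("|")
    negated = "NOT" in parts  # assumption: GAF 2.2 negation, e.g. "NOT|enables"
    relations = [p for p in parts if p != "NOT"]
    if len(relations) != 1 or relations[0] not in RELATION_TO_OBJECT_CATEGORY:
        raise ValueError(f"unsupported relation field: {relation_field!r}")
    relation = relations[0]
    return {
        "subject": subject_id,
        "predicate": relation,  # GO relation name, not a Biolink predicate
        "object": go_id,
        "object_category": RELATION_TO_OBJECT_CATEGORY[relation],
        "negated": negated,
    }
```

Rejecting unknown relations outright, rather than passing them through, makes it obvious when a new GO relation appears in a release and the table needs a new row.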
Node Types
Node Category | Source Identifier Types | Additional Notes |
---|---|---|
biolink:Gene | MGI, RGD | |
biolink:Protein | UniProtKB accession | |
biolink:MacromolecularComplex | ComplexPortal IDs | |
biolink:RNAProduct | RNAcentral IDs | |
biolink:BiologicalProcess | Gene Ontology IDs (Aspect P) | |
biolink:MolecularActivity | Gene Ontology IDs (Aspect F) | |
biolink:CellularComponent | Gene Ontology IDs (Aspect C) | |
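A minimal sketch of how node categories might be assigned from the table above follows. It assumes that the GAF DB column carries the values "MGI", "RGD", "UniProtKB", "ComplexPortal", and "RNAcentral", and that some DB Object IDs already include their prefix (for example "MGI:1918911", a hypothetical value used only for illustration). The names subject_node and go_node are likewise hypothetical.

```python
from typing import Dict

# Assumed GAF DB column value -> subject category, per the Node Types table.
DB_TO_SUBJECT_CATEGORY: Dict[str, str] = {
    "MGI": "biolink:Gene",
    "RGD": "biolink:Gene",
    "UniProtKB": "biolink:Protein",
    "ComplexPortal": "biolink:MacromolecularComplex",
    "RNAcentral": "biolink:RNAProduct",
}

# GAF Aspect code -> GO term category, per the Node Types table.
ASPECT_TO_GO_CATEGORY: Dict[str, str] = {
    "P": "biolink:BiologicalProcess",
    "F": "biolink:MolecularActivity",
    "C": "biolink:CellularComponent",
}


def subject_node(db: str, object_id: str) -> Dict[str, str]:
    """Return an id/category pair for the annotated gene or gene product."""
    if db not in DB_TO_SUBJECT_CATEGORY:
        raise ValueError(f"unsupported source database: {db!r}")
    # Avoid doubling the prefix when the ID already carries it.
    curie = object_id if object_id.startswith(db + ":") else f"{db}:{object_id}"
    return {"id": curie, "category": DB_TO_SUBJECT_CATEGORY[db]}


def go_node(go_id: str, aspect: str) -> Dict[str, str]:
    """Return an id/category pair for the GO term, based on its aspect."""
    if aspect not in ASPECT_TO_GO_CATEGORY:
        raise ValueError(f"unknown GO aspect: {aspect!r}")
    return {"id": go_id, "category": ASPECT_TO_GO_CATEGORY[aspect]}
```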
Future Modeling Considerations
qualifiers: Introduce qualifier-based representation if/when we decide to ingest any qualifying context on GO annotations
node_properties: If we end up ingesting taxon info for gene nodes, we may have to update the Biolink Model to support this (currently in_taxon is represented as a predicate, and species_context_qualifier as an edge property - but there is no taxon node property)
Provenance Information
Contributors:
- Adilbek Bazarkulov: code author
- Evan Morris: code support
- Adilbek Bazarkulov: code support, domain expertise
- Sierra Moxon: data modeling, domain expertise
- Matthew Brush: data modeling, domain expertise
Artifacts:
- Ingest Survey: https://docs.google.com/spreadsheets/d/18wGm2a0W1oIXm7cn8TZ99xn_aAMJ91SgAsuPDcV-lII/edit?gid=325339947#gid=325339947
- Ingest Ticket: https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/8