GO Annotations (GOA) Reference Ingest Guide

Source Information

InfoRes ID: infores:goa

Description: A GO Annotation connects a gene or gene product to a Gene Ontology term that describes a molecular function it enables, a biological process in which it participates, or a cellular component in which it is located. Most annotations are produced through manual curation of the literature, although some come from automated pipelines that assign GO terms using evidence such as orthology or sequence similarity.

Citations:
- Data Archive: https://zenodo.org/records/10536401
- Publication: https://doi.org/10.1093/nar/gky1055

Data Access Locations:
- All downloads: https://geneontology.org/docs/download-go-annotations/
- Commonly studied organisms: https://current.geneontology.org/products/pages/downloads.html

Data Provision Mechanisms: file_download

Data Formats: tsv

Data Versioning and Releases:
- Release cadence: Approximately every four weeks, synchronized with UniProtKB.
- Versioning: By date. Each GAF header includes a !Generated: YYYY-MM-DD line.
- Release notes: https://geneontology.org/docs/download-go-annotations/ and https://geneontology.org/docs/go-annotation-file-gaf-format-2.2/
- Release archive: https://release.geneontology.org/
- Formats: tsv in GAF format (17 columns)
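
Because releases are versioned by the date in the GAF header, an ingest can read that date before processing any rows. The following is a minimal sketch, not the ingest's actual code. The function name `gaf_release_date` is hypothetical, and the pattern also tolerates a `!date-generated:` spelling, which is an assumption about header variants rather than something stated above.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Matches "!Generated: YYYY-MM-DD" as described above. The optional "date-"
# prefix is an assumption about alternative header spellings.
_GENERATED = re.compile(
    r"^!(?:date-)?generated:\s*(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE
)


def gaf_release_date(lines: Iterable[str]) -> Optional[date]:
    """Return the release date found in the GAF header, or None if absent."""
    for line in lines:
        if not line.startswith("!"):
            break  # header comment lines end at the first data row
        match = _GENERATED.match(line)
        if match:
            year, month, day = map(int, match.groups())
            return date(year, month, day)
    return None
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines and stops at the first non-header row.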

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: GOA is a rich source of manually curated knowledge about gene function with broad relevance to all Translator queries and use cases.

Scope: This initial ingest of GOA covers molecular function, biological process, and cellular component annotations about human, mouse, and rat genes, including manually curated and electronically inferred content, from GAF files (GPAD and GPI formats not ingested). Other species may be added in future updates to the ingest.

Relevant Files

| File Name | Location | Description |
| --- | --- | --- |
| goa_human.gaf | https://current.geneontology.org/products/pages/downloads.html | Human gene-product to GO term associations (GAF 2.2) |
| mgi.gaf | https://current.geneontology.org/products/pages/downloads.html | Mouse gene-product to GO term associations (GAF 2.2) |
| rgd.gaf | https://current.geneontology.org/products/pages/downloads.html | Rat gene-product to GO term associations (GAF 2.2) |

Included Content

| File Name | Included Records | Fields Used |
| --- | --- | --- |
| goa_human.gaf | All records included | DB, DB Object ID, DB Object Symbol, Relation, GO ID, DB:Reference(s), Evidence Code, With (or) From, Aspect, DB Object Name, DB Object Type, Taxon |
| mgi.gaf | All records included | DB, DB Object ID, DB Object Symbol, Relation, GO ID, DB:Reference(s), Evidence Code, With (or) From, Aspect, DB Object Name, DB Object Type, Taxon |
| rgd.gaf | All records included | DB, DB Object ID, DB Object Symbol, Relation, GO ID, DB:Reference(s), Evidence Code, With (or) From, Aspect, DB Object Name, DB Object Type, Taxon |
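
To make the field selection above concrete, here is a hedged Python sketch that reads GAF rows and keeps only the twelve fields listed. The column order reflects our reading of the GAF 2.2 layout (17 tab-separated columns) and should be checked against the format specification. The fourth column is labeled "Relation" to match this guide, and the names `GAF_COLUMNS`, `FIELDS_USED`, and `iter_gaf_records` are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Assumed GAF 2.2 column order (17 columns); verify against the GAF spec.
GAF_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Relation", "GO ID",
    "DB:Reference(s)", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]

# The fields this ingest uses, as listed in the Included Content table.
FIELDS_USED = [
    "DB", "DB Object ID", "DB Object Symbol", "Relation", "GO ID",
    "DB:Reference(s)", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Type", "Taxon",
]


def iter_gaf_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, restricted to FIELDS_USED."""
    for line in lines:
        line = line.rstrip("\r\n")  # keep trailing tabs (empty columns)
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        values = line.split("\t")
        if len(values) != len(GAF_COLUMNS):
            raise ValueError(
                f"expected {len(GAF_COLUMNS)} columns, got {len(values)}"
            )
        row = dict(zip(GAF_COLUMNS, values))
        yield {name: row[name] for name in FIELDS_USED}
```

Raising on a wrong column count is a design choice for this sketch; a production ingest might prefer to log and skip malformed rows instead.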

Future Content Considerations

edge_content: Consider ingesting Gene/Product to GO Term annotations from other taxa (beyond human, mouse, and rat)

edge_content: Consider inclusion of qualifying information (as may be found in the Annotation Extensions, or With or From columns) to existing and new Gene/Product to GO Term annotations

edge_content: Consider ingesting associations between two GO Terms, per the specification at https://wiki.geneontology.org/index.php/Annotation_Relations#Standard_Annotation:_Annotation_Extension_Relations

node_property_content: To be determined: whether to bring in taxon info about gene/gene product nodes from GOA, or to rely on other gene property authorities for this information (e.g. ncbigene)

Target Information

Target InfoRes ID: infores:translator-goa-kgx

Edge Types

| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
| --- | --- | --- | --- | --- | --- |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:MolecularActivity | varies | varies | A GO Annotation uses the 'enables' predicate when a gene product is solely capable of executing the reported function. |
| biolink:MacromolecularComplex | | biolink:MolecularActivity | varies | varies | A GO Annotation uses the 'contributes_to' predicate when a gene product is required as part of a macromolecular complex for executing the reported function. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'involved_in' predicate when a gene product's molecular function plays an integral role in the reported biological process. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_or_within' predicate when neither the mechanism or timing of the gene product's activity relative to the reported biological process nor the directionality of its effect on the process is known. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_or_within_positive_effect' predicate when the mechanism or timing of the gene product's activity relative to the reported biological process is not known, but the activity of the gene product has a positive effect on the process. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_or_within_negative_effect' predicate when the mechanism or timing of the gene product's activity relative to the reported biological process is not known, but the activity of the gene product has a negative effect on the process. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of' predicate when a gene product acts through a known mechanism upstream of the reported biological process, does not regulate the process, and the directionality of its effect on the process is not known. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_positive_effect' predicate when a gene product acts through a known mechanism upstream of the reported biological process, and the activity of the gene product is required for the process but does not regulate it. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:BiologicalProcess | varies | varies | A GO Annotation uses the 'acts_upstream_of_negative_effect' predicate when a gene product acts through a known mechanism upstream of the reported biological process, and the activity of the gene product prevents or reduces the process but does not regulate it. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:CellularComponent | varies | varies | A GO Annotation uses the 'is_active_in' predicate when a gene product is present in and performs its molecular function in the reported cellular component. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:CellularComponent | varies | varies | A GO Annotation uses the 'located_in' predicate when a gene product is detected in the reported cellular component. |
| biolink:Gene, biolink:Protein, biolink:RNAProduct | | biolink:MacromolecularComplex | varies | varies | A GO Annotation uses the 'part_of' predicate when a gene product is a component of the reported macromolecular complex. |
| biolink:Gene, biolink:Protein, biolink:MacromolecularComplex, biolink:RNAProduct | | biolink:CellularComponent | varies | varies | A GO Annotation uses the 'colocalizes_with' predicate when a gene product has a transient or dynamic association with the reported cellular component. |
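
The Edge Types rows above pair each GO relation with one object category, which can be expressed as a lookup for sanity-checking rows during ingest. This is an illustrative sketch only: the dictionary is transcribed from the UI Explanation column, the names `RELATION_TO_OBJECT_CATEGORY` and `expected_object_category` are hypothetical, and the sketch says nothing about which Biolink predicate each relation maps to.

```python
from typing import Optional

# GO relation -> expected object category, transcribed from the Edge Types rows.
RELATION_TO_OBJECT_CATEGORY = {
    "enables": "biolink:MolecularActivity",
    "contributes_to": "biolink:MolecularActivity",
    "involved_in": "biolink:BiologicalProcess",
    "acts_upstream_of_or_within": "biolink:BiologicalProcess",
    "acts_upstream_of_or_within_positive_effect": "biolink:BiologicalProcess",
    "acts_upstream_of_or_within_negative_effect": "biolink:BiologicalProcess",
    "acts_upstream_of": "biolink:BiologicalProcess",
    "acts_upstream_of_positive_effect": "biolink:BiologicalProcess",
    "acts_upstream_of_negative_effect": "biolink:BiologicalProcess",
    "is_active_in": "biolink:CellularComponent",
    "located_in": "biolink:CellularComponent",
    "colocalizes_with": "biolink:CellularComponent",
    "part_of": "biolink:MacromolecularComplex",
}


def expected_object_category(relation: str) -> Optional[str]:
    """Return the object category listed for a GO relation, or None if unlisted."""
    return RELATION_TO_OBJECT_CATEGORY.get(relation.strip())
```

Returning None for an unlisted relation leaves the decision (skip, warn, or fail) to the caller.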

Node Types

| Node Category | Source Identifier Types | Additional Notes |
| --- | --- | --- |
| biolink:Gene | MGI, RGD | |
| biolink:Protein | UniProtKB accession | |
| biolink:MacromolecularComplex | ComplexPortal IDs | |
| biolink:RNAProduct | RNAcentral IDs | |
| biolink:BiologicalProcess | Gene Ontology IDs (Aspect P) | |
| biolink:MolecularActivity | Gene Ontology IDs (Aspect F) | |
| biolink:CellularComponent | Gene Ontology IDs (Aspect C) | |
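
The Node Types rows above amount to two small lookups: one from the source database of the annotated entity to a subject category, and one from the GO aspect letter to a term category. A minimal sketch follows, assuming the GAF DB column carries exactly the strings `MGI`, `RGD`, `UniProtKB`, `ComplexPortal`, and `RNAcentral`. That spelling, and the function names, are assumptions for illustration.

```python
from typing import Optional

# Source database (GAF "DB" column, assumed spelling) -> subject node category.
DB_TO_CATEGORY = {
    "MGI": "biolink:Gene",
    "RGD": "biolink:Gene",
    "UniProtKB": "biolink:Protein",
    "ComplexPortal": "biolink:MacromolecularComplex",
    "RNAcentral": "biolink:RNAProduct",
}

# GO aspect letter (GAF "Aspect" column) -> GO term node category.
ASPECT_TO_CATEGORY = {
    "P": "biolink:BiologicalProcess",
    "F": "biolink:MolecularActivity",
    "C": "biolink:CellularComponent",
}


def subject_category(db: str) -> Optional[str]:
    """Category for the annotated entity, or None for an unlisted source."""
    return DB_TO_CATEGORY.get(db.strip())


def term_category(aspect: str) -> Optional[str]:
    """Category for the GO term, or None for an unrecognized aspect."""
    return ASPECT_TO_CATEGORY.get(aspect.strip().upper())
```

Both helpers return None rather than raising, so unexpected sources or aspects can be counted and reported by the caller.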

Future Modeling Considerations

qualifiers: Introduce qualifier-based representation if/when we decide to ingest any qualifying context on GO annotations

node_properties: If we end up ingesting taxon info for gene nodes, we may have to update the Biolink Model to support this (currently in_taxon is represented as a predicate, and species_context_qualifier as an edge property - but there is no taxon node property)

Provenance Information

Contributors:
- Adilbek Bazarkulov: code author
- Evan Morris: code support
- Adilbek Bazarkulov: code support, domain expertise
- Sierra Moxon: data modeling, domain expertise
- Matthew Brush: data modeling, domain expertise

Artifacts:
- Ingest Survey: https://docs.google.com/spreadsheets/d/18wGm2a0W1oIXm7cn8TZ99xn_aAMJ91SgAsuPDcV-lII/edit?gid=325339947#gid=325339947
- Ingest Ticket: https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/8