Gene Ontology Causal Activity Models (GO-CAM) Reference Ingest Guide

Source Information

InfoRes ID: infores:gocam

Description: GO-CAM (Gene Ontology Causal Activity Models) is a framework that extends standard GO annotations by connecting molecular functions, biological processes, and cellular components into causally linked pathways. GO-CAMs provide explicit causal connections between gene products and their activities within specific biological contexts, enabling more detailed representation of biological mechanisms than traditional GO annotations.

Citations:
- Thomas PD, Hill DP, Mi H, Osumi-Sutherland D, Van Auken K, Carbon S, Balhoff JP, Albou LP, Good B, Gaudet P, Lewis SE, Mungall CJ. Gene Ontology Causal Activity Modeling (GO-CAM) moves beyond GO annotations to structured descriptions of biological functions and systems. Nat Genet. 2019 Oct;51(10):1429-1433. doi: 10.1038/s41588-019-0500-1

Data Access Locations:
- GO-CAMs are downloaded one model at a time with kghub-downloader, which reads an index file listing every available GO-CAM by identifier and then downloads each model in turn.
- Index: https://s3.amazonaws.com/provider-to-model.json
- URL pattern: https://live-go-cam.geneontology.io/product/yaml/go-cam/[id].json
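The index-then-iterate download described above can be sketched as follows. This is a minimal illustration, not the kghub-downloader implementation: the shape of the index (a mapping from provider name to a list of model identifiers) is an assumption, `model_urls` is a hypothetical helper, and the example index content is invented. No network access is performed; the sketch only expands the URL pattern.

```python
from typing import Dict, Iterable, List

# URL pattern copied from this guide; "[id]" stands for a model identifier.
URL_PATTERN = "https://live-go-cam.geneontology.io/product/yaml/go-cam/[id].json"


def model_urls(index: Dict[str, Iterable[str]], pattern: str = URL_PATTERN) -> List[str]:
    """Return one download URL per distinct model identifier in the index.

    ASSUMPTION: the index maps a provider name to a list of model identifiers.
    """
    seen = set()
    urls = []
    for model_ids in index.values():
        for model_id in model_ids:
            if model_id in seen:  # skip repeats in case an id is listed twice
                continue
            seen.add(model_id)
            urls.append(pattern.replace("[id]", model_id))
    return urls


# Hypothetical index content, for illustration only.
example_index = {
    "provider_a": ["5a7e68a100001817"],
    "provider_b": ["5a7e68a100001817", "0000000000000000"],
}
urls = model_urls(example_index)
```

Each resulting URL would then be fetched and saved as its own JSON file, one per model.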

Data Provision Mechanisms: file_download

Data Formats: json

Data Versioning and Releases: New GO-CAMs are added to the index weekly. Releases page / change log: https://geneontology.org/docs/download-go-cams/. Latest status page: https://geneontology.org/docs/go-cam-overview/

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: GO-CAMs provide structured causal relationships between gene products that are essential for pathway analysis, mechanistic understanding, and systems biology approaches in Translator. Unlike traditional GO annotations, GO-CAMs explicitly model how gene products causally regulate each other, making them valuable for reasoning about biological mechanisms and predicting downstream effects of perturbations.

Scope: This initial ingest focuses on gene-to-gene causal regulatory relationships extracted from GO-CAM models. The scope includes direct regulatory relationships (positive and negative regulation) between gene products, with associated molecular function, biological process, and cellular component annotations for context.

Relevant Files

| File Name | Location | Description |
| --- | --- | --- |
| provider-to-model.json | https://s3.amazonaws.com/provider-to-model.json | Index file of models |
| 5a7e68a100001817.json, etc. | https://live-go-cam.geneontology.io/product/yaml/go-cam/[id].json | Each model individually |

Included Content

| File Name | Included Records | Fields Used |
| --- | --- | --- |
| 5a7e68a100001817.json, etc. | Gene to Gene edges | source, target, causal_predicate |

Filtered Content

| File Name | Filtered Records | Rationale |
| --- | --- | --- |
| GO-CAM models | GO Term nodes and non-gene entities | Initial focus on gene-gene relationships; GO Terms and other entity types will be included in future iterations |
| GO-CAM models | Edges without clear causal predicates | Only including edges with explicit causal relationship predicates to ensure high-quality causal assertions |
| GO-CAM models | Non-human/mouse models | Species filtering applied to include only human (NCBITaxon:9606) and mouse (NCBITaxon:10090) models based on model_info.taxon field |
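Two of the filters above (species and causal predicate) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the ingest code: the `model_info.taxon` path and the `causal_predicate` field come from this guide, while the top-level `edges` key, the helper names, and all example values are hypothetical.

```python
from typing import Any, Dict, List

# Taxa named in this guide: human and mouse.
ALLOWED_TAXA = {"NCBITaxon:9606", "NCBITaxon:10090"}


def is_in_scope(model: Dict[str, Any]) -> bool:
    """True when model_info.taxon is one of the allowed taxa."""
    taxon = (model.get("model_info") or {}).get("taxon")
    return taxon in ALLOWED_TAXA


def causal_edges(edges: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only edges that carry a non-empty causal_predicate."""
    return [edge for edge in edges if edge.get("causal_predicate")]


# Hypothetical model; the "edges" key and every value are placeholders.
example_model = {
    "model_info": {"taxon": "NCBITaxon:9606"},
    "edges": [
        {"source": "UniProtKB:EXAMPLE1", "target": "UniProtKB:EXAMPLE2",
         "causal_predicate": "EXAMPLE:predicate"},
        {"source": "UniProtKB:EXAMPLE1", "target": "UniProtKB:EXAMPLE3"},
    ],
}
kept = causal_edges(example_model["edges"]) if is_in_scope(example_model) else []
```

Checking the taxon first means that out-of-scope models are rejected before any of their edges are inspected.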

Future Content Considerations

edge_content: GO Terms are currently excluded from the edges. This first pass at the GO-CAMs only puts the Gene to Gene edges in place. Future iterations will include GO Terms, and potentially other edge types.

node_property_content: Nodes include only the gene identifier and a category of 'Gene'. (Note: some nodes likely represent Gene Products rather than Genes, but we are not distinguishing between these at this time because we will NodeNormalize the category and id.)

edge_property_content: TODO: Substantial work remains to make these edges Biolink compliant beyond source, target, and causal_predicate, which this ingest maps to 'biolink:subject', 'biolink:object', and the appropriate 'biolink:predicate', respectively. Other edge properties are not yet mapped to Biolink Model edge properties; this will be done in future iterations.
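The field mapping described above (source to subject, target to object, causal_predicate to a Biolink predicate) might look like the following sketch. The helper name `to_biolink_edge` is hypothetical, and the predicate map is a placeholder supplied by the caller: this guide does not list the actual predicate mappings, so the entry shown is a stand-in and not the ingest's real mapping.

```python
from typing import Any, Dict, Optional


def to_biolink_edge(
    edge: Dict[str, Any], predicate_map: Dict[str, str]
) -> Optional[Dict[str, str]]:
    """Map a GO-CAM edge to subject/predicate/object keys.

    Returns None when the edge has no causal_predicate or the predicate
    has no entry in predicate_map.
    """
    predicate = predicate_map.get(edge.get("causal_predicate"))
    if predicate is None:
        return None
    return {
        "subject": edge["source"],
        "predicate": predicate,
        "object": edge["target"],
    }


# PLACEHOLDER mapping: the key and value are illustrative assumptions only.
example_map = {"EXAMPLE:predicate": "biolink:regulates"}

mapped = to_biolink_edge(
    {"source": "UniProtKB:EXAMPLE1", "target": "UniProtKB:EXAMPLE2",
     "causal_predicate": "EXAMPLE:predicate"},
    example_map,
)
```

Returning None for unmapped predicates lets the caller drop or log such edges rather than emit an edge without a valid predicate.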

Target Information

Target InfoRes ID: infores:translator-gocam-kgx

Edge Types

| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
| --- | --- | --- | --- | --- | --- |
| biolink:Gene | | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide explicit causal relationships where one gene product directly positively regulates another gene product's activity. |
| biolink:Gene | | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide explicit causal relationships where one gene product directly negatively regulates another gene product's activity. |
| biolink:Gene | | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide causal relationships where one gene product positively regulates another gene product's activity, potentially through indirect mechanisms. |
| biolink:Gene | | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide causal relationships where one gene product negatively regulates another gene product's activity, potentially through indirect mechanisms. |
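Every edge type above shares the same knowledge level and agent type, so these values can be attached uniformly. The sketch below assumes the edge already has its subject, predicate, and object; the helper name `with_provenance` is hypothetical, and using `infores:gocam` as `primary_knowledge_source` is an assumption inferred from the source InfoRes ID and the primary_knowledge_provider ingest category in this guide.

```python
from typing import Dict

# Values shared by all edge types listed in this guide.
KNOWLEDGE_LEVEL = "knowledge_assertion"
AGENT_TYPE = "manual_agent"
# ASSUMPTION: the source InfoRes ID is recorded as the primary knowledge source.
PRIMARY_KNOWLEDGE_SOURCE = "infores:gocam"


def with_provenance(edge: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the edge with the shared provenance fields added."""
    return {
        **edge,
        "knowledge_level": KNOWLEDGE_LEVEL,
        "agent_type": AGENT_TYPE,
        "primary_knowledge_source": PRIMARY_KNOWLEDGE_SOURCE,
    }


# Placeholder edge; identifiers and predicate are illustrative only.
base_edge = {
    "subject": "UniProtKB:EXAMPLE1",
    "predicate": "biolink:regulates",
    "object": "UniProtKB:EXAMPLE2",
}
annotated = with_provenance(base_edge)
```

The function copies rather than mutates its input, so the original edge dictionary stays unchanged.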

Node Types

| Node Category | Source Identifier Types | Additional Notes |
| --- | --- | --- |
| biolink:Gene | UniProtKB, MGI | Gene identifiers from human and mouse models only (NCBITaxon:9606, NCBITaxon:10090) |
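A node record consistent with the table above could be built as in this sketch. The helper name `gene_node` and the example identifiers are hypothetical, and representing `category` as a list is an assumption about the output format rather than something stated in this guide.

```python
from typing import Dict, List, Optional, Union

# Identifier prefixes listed in this guide for gene nodes.
ALLOWED_PREFIXES = {"UniProtKB", "MGI"}


def gene_node(identifier: str) -> Optional[Dict[str, Union[str, List[str]]]]:
    """Return a minimal gene node, or None for an unsupported prefix."""
    prefix = identifier.split(":", 1)[0]
    if prefix not in ALLOWED_PREFIXES:
        return None
    # Only id and category are emitted, matching the current scope.
    return {"id": identifier, "category": ["biolink:Gene"]}


# Placeholder identifiers, for illustration only.
node = gene_node("UniProtKB:EXAMPLE1")
skipped = gene_node("GO:EXAMPLE")
```

Identifiers with any other prefix, such as GO Terms, yield None and can be skipped, in line with the current gene-only scope.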

Future Modeling Considerations

other: Consider including GO Term nodes and their relationships to genes in future iterations

other: Evaluate modeling of complex regulatory cascades and multi-step pathways

other: Assess integration with other pathway databases and resources

Provenance Information

Contributors:
- Sierra Moxon: code
- Matthew Brush: data modeling, domain expertise

Artifacts:
- Ingest Survey: https://docs.google.com/spreadsheets/d/1R9z-vywupNrD_3ywuOt_sntcTrNlGmhiUWDXUdkPVpM/edit?gid=0#gid=0