Gene Ontology Causal Activity Models (GO-CAM) Reference Ingest Guide
Source Information
InfoRes ID: infores:gocam
Description: GO-CAM (Gene Ontology Causal Activity Models) is a framework that extends standard GO annotations by connecting molecular functions, biological processes, and cellular components into causally linked pathways. GO-CAMs provide explicit causal connections between gene products and their activities within specific biological contexts, enabling more detailed representation of biological mechanisms than traditional GO annotations.
Citations: - Thomas PD, Hill DP, Mi H, Osumi-Sutherland D, Van Auken K, Carbon S, Balhoff JP, Albou LP, Good B, Gaudet P, Lewis SE, Mungall CJ. Gene Ontology Causal Activity Modeling (GO-CAM) moves beyond GO annotations to structured descriptions of biological functions and systems. Nat Genet. 2019 Oct;51(10):1429-1433. doi: 10.1038/s41588-019-0500-1
Data Access Locations: - GO-CAMs are downloaded one model at a time with kghub-downloader, which reads an index file listing every available GO-CAM by identifier and then downloads each model in turn. - Index: https://s3.amazonaws.com/provider-to-model.json - URL pattern: https://live-go-cam.geneontology.io/product/yaml/go-cam/[id].json
Data Provision Mechanisms: file_download
Data Formats: json
Data Versioning and Releases: New GO-CAMs are added to the index weekly. Releases page / change log: https://geneontology.org/docs/download-go-cams/. Latest status page: https://geneontology.org/docs/go-cam-overview/
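The model-by-model download can be sketched as follows. This is a minimal illustration, not the kghub-downloader configuration. It assumes (hypothetically) that provider-to-model.json maps each provider to a list of model IDs and that IDs may carry a `gomodel:` prefix that does not appear in the per-model file names. The function name `model_urls` is also hypothetical.

```python
import json

# URL pattern from Data Access Locations; {id} stands in for [id].
URL_PATTERN = "https://live-go-cam.geneontology.io/product/yaml/go-cam/{id}.json"


def model_urls(index_json: str) -> list[str]:
    """Return one download URL per distinct model ID in the index.

    ASSUMPTION: the index is a JSON object mapping provider -> list of model IDs,
    and IDs may start with "gomodel:", which is stripped to form the file name.
    """
    index = json.loads(index_json)
    seen: set[str] = set()
    urls: list[str] = []
    for model_ids in index.values():
        for raw_id in model_ids:
            model_id = raw_id.removeprefix("gomodel:")
            # Skip IDs already seen, in case a model is listed under several providers.
            if model_id not in seen:
                seen.add(model_id)
                urls.append(URL_PATTERN.format(id=model_id))
    return urls
```

Each returned URL is then fetched in turn; deduplicating before downloading avoids fetching the same model twice if the index repeats it.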
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: GO-CAMs provide structured causal relationships between gene products that are essential for pathway analysis, mechanistic understanding, and systems biology approaches in Translator. Unlike traditional GO annotations, GO-CAMs explicitly model how gene products causally regulate each other, making them valuable for reasoning about biological mechanisms and predicting downstream effects of perturbations.
Scope: This initial ingest focuses on gene-to-gene causal regulatory relationships extracted from GO-CAM models. The scope includes direct regulatory relationships (positive and negative regulation) between gene products, with associated molecular function, biological process, and cellular component annotations for context.
Relevant Files
File Name | Location | Description |
---|---|---|
provider-to-model.json | https://s3.amazonaws.com/provider-to-model.json | index file of models |
5a7e68a100001817.json, etc. | https://live-go-cam.geneontology.io/product/yaml/go-cam/[id].json | each model individually |
Included Content
File Name | Included Records | Fields Used |
---|---|---|
5a7e68a100001817.json, etc. | Gene-to-gene edges | source, target, causal_predicate |
Filtered Content
File Name | Filtered Records | Rationale |
---|---|---|
GO-CAM models | GO Term nodes and non-gene entities | Initial focus on gene-gene relationships; GO Terms and other entity types will be included in future iterations |
GO-CAM models | Edges without clear causal predicates | Only including edges with explicit causal relationship predicates to ensure high-quality causal assertions |
GO-CAM models | Non-human/mouse models | Species filtering applied to include only human (NCBITaxon:9606) and mouse (NCBITaxon:10090) models, based on the model_info.taxon field |
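The three filters in the table above might be applied as in this sketch. It assumes each model is a parsed JSON dict whose `model_info.taxon` is a single CURIE string, that extracted edges are dicts keyed by `source`, `target`, and `causal_predicate`, and that a node counts as a gene when its prefix is one of the identifier types listed under Node Types (UniProtKB, MGI). The helper names are hypothetical.

```python
# Taxa kept by this ingest (from the Filtered Content table).
ALLOWED_TAXA = {"NCBITaxon:9606", "NCBITaxon:10090"}
# ASSUMPTION: gene nodes are recognized by the prefixes listed under Node Types;
# anything else (e.g. GO:... terms) is treated as a non-gene entity and dropped.
GENE_PREFIXES = {"UniProtKB", "MGI"}


def keep_model(model: dict) -> bool:
    """True if the model's model_info.taxon is human or mouse (assumed to be one CURIE)."""
    taxon = model.get("model_info", {}).get("taxon")
    return taxon in ALLOWED_TAXA


def is_gene(curie: str) -> bool:
    """True if the CURIE prefix is a gene identifier type used by this ingest."""
    return curie.split(":", 1)[0] in GENE_PREFIXES


def keep_edge(edge: dict) -> bool:
    """True for gene-to-gene edges that carry an explicit causal predicate.

    ASSUMPTION: edges are dicts with "source", "target", "causal_predicate" keys.
    """
    return (
        bool(edge.get("causal_predicate"))
        and is_gene(edge.get("source", ""))
        and is_gene(edge.get("target", ""))
    )
```

Filtering at the model level first means non-human/mouse models are never walked for edges at all.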
Future Content Considerations
edge_content: GO terms are currently excluded from the edges. This first pass establishes the gene-to-gene edges; future iterations will add GO terms and potentially other edge types.
node_property_content: Nodes carry only the gene identifier and the category 'Gene' (biolink:Gene). Some nodes likely represent gene products rather than genes, but we do not distinguish between the two at this time because node identifiers and categories will be resolved by node normalization.
edge_property_content: TODO: This ingest maps source, target, and causal_predicate to biolink:subject, biolink:object, and the appropriate biolink:predicate, respectively. Substantial work remains to make these edges fully Biolink-compliant: other edge properties are not yet mapped to Biolink Model edge properties, and this will be done in future iterations.
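A minimal sketch of the node and edge mapping described above, assuming the same edge dict shape (`source`, `target`, `causal_predicate`). Because this guide does not list the Biolink predicate CURIEs, the causal-predicate-to-Biolink mapping is passed in as `predicate_map`; that map and all function names here are hypothetical.

```python
from typing import Optional


def gene_node(curie: str) -> dict:
    """Minimal node: the identifier plus biolink:Gene.

    Gene vs. gene product is not distinguished; normalization resolves it later.
    """
    return {"id": curie, "category": ["biolink:Gene"]}


def to_kgx_edge(edge: dict, predicate_map: dict[str, str]) -> Optional[dict]:
    """Rename source/target/causal_predicate to subject/object/predicate.

    ASSUMPTION: predicate_map (GO-CAM causal predicate -> Biolink predicate)
    is supplied by the ingest. Unmapped predicates return None, so the edge
    is dropped rather than given a guessed predicate.
    """
    predicate = predicate_map.get(edge["causal_predicate"])
    if predicate is None:
        return None
    return {"subject": edge["source"], "predicate": predicate, "object": edge["target"]}


def transform(edges: list[dict], predicate_map: dict[str, str]) -> tuple[list[dict], list[dict]]:
    """Return (nodes, edges): one Gene node per distinct endpoint of a kept edge."""
    nodes: dict[str, dict] = {}
    kgx_edges: list[dict] = []
    for edge in edges:
        kgx = to_kgx_edge(edge, predicate_map)
        if kgx is None:
            continue
        kgx_edges.append(kgx)
        for curie in (kgx["subject"], kgx["object"]):
            nodes.setdefault(curie, gene_node(curie))
    return list(nodes.values()), kgx_edges
```

Nodes are derived from the kept edges, so a gene appears in the output only if it participates in at least one mapped causal edge.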
Target Information
Target InfoRes ID: infores:translator-gocam-kgx
Edge Types
Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
---|---|---|---|---|---|
biolink:Gene |  | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide explicit causal relationships where one gene product directly positively regulates another gene product's activity. |
biolink:Gene |  | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide explicit causal relationships where one gene product directly negatively regulates another gene product's activity. |
biolink:Gene |  | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide causal relationships where one gene product positively regulates another gene product's activity, potentially through indirect mechanisms. |
biolink:Gene |  | biolink:Gene | knowledge_assertion | manual_agent | GO-CAM models provide causal relationships where one gene product negatively regulates another gene product's activity, potentially through indirect mechanisms. |
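Every row of the Edge Types table uses the same knowledge level and agent type, so they can be attached to each edge as constants, as in this hypothetical helper. Setting `primary_knowledge_source` to infores:gocam is an assumption based on the Source InfoRes ID and the primary_knowledge_provider ingest category; the table itself does not state it.

```python
# Values shared by every row of the Edge Types table.
EDGE_PROVENANCE = {
    "knowledge_level": "knowledge_assertion",
    "agent_type": "manual_agent",
    # ASSUMPTION: primary knowledge source taken from the Source InfoRes ID.
    "primary_knowledge_source": "infores:gocam",
}


def with_provenance(edge: dict) -> dict:
    """Return a copy of an edge with the shared provenance slots added."""
    return {**edge, **EDGE_PROVENANCE}
```

Returning a copy leaves the input edge unchanged, which keeps the helper safe to apply in a streaming transform.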
Node Types
Node Category | Source Identifier Types | Additional Notes |
---|---|---|
biolink:Gene | UniProtKB, MGI | Gene identifiers from human and mouse models only (NCBITaxon:9606, NCBITaxon:10090) |
Future Modeling Considerations
other: Consider including GO Term nodes and their relationships to genes in future iterations
other: Evaluate modeling of complex regulatory cascades and multi-step pathways
other: Assess integration with other pathway databases and resources
Provenance Information
Contributors: - Sierra Moxon: code - Matthew Brush: data modeling, domain expertise
Artifacts: - Ingest Survey: https://docs.google.com/spreadsheets/d/1R9z-vywupNrD_3ywuOt_sntcTrNlGmhiUWDXUdkPVpM/edit?gid=0#gid=0