Comparative Toxicogenomics Database (CTD) Reference Ingest Guide
Source Information
InfoRes ID: infores:ctd
Description: CTD is a robust, publicly available database that aims to advance understanding of how environmental exposures affect human health. It provides knowledge, manually curated from the literature, about chemicals and their relationships to other biological entities: chemical-gene/protein interactions, chemical-disease relationships, and gene-disease relationships. These data are integrated with functional and pathway data to aid in developing hypotheses about the mechanisms underlying environmentally influenced diseases. CTD also generates novel inferences by further analyzing the knowledge it curates, based on statistically significant connections through intermediate concepts (e.g. Chemical X is associated with Disease Y based on shared associations with a common set of genes).
Citations: - Davis AP, Wiegers TC, Johnson RJ, Sciaky D, Wiegers J, Mattingly CJ. Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res. 2022 Sep 28.
Terms of Use: No formal license. Bespoke 'terms of use' are described here: https://ctdbase.org/about/legal.jsp
Data Access Locations:
- CTD Bulk Downloads: http://ctdbase.org/downloads/
- CTD Catalog: https://ctdbase.org/reports/
Data Provision Mechanisms: file_download
Data Formats: tsv, csv, xml, obo
Data Versioning and Releases: No consistent cadence for releases, but on average there are 1-2 releases each month. Versioning is based on the month and year of the release. Releases page: https://ctdbase.org/about/changes/
Additional Notes: Latest status page: https://ctdbase.org/about/dataStatus.go
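The shared-intermediate idea in the Description (Chemical X linked to Disease Y through a common set of genes) can be illustrated with a toy sketch. It only groups chemical-disease pairs by the genes that connect them; it does not reproduce CTD's InferenceScore statistic, and the function name and toy identifiers are hypothetical.

```python
from collections import defaultdict


def shared_gene_inferences(chem_gene_pairs, gene_disease_pairs):
    """Hypothetical illustration: map each (chemical, disease) pair to the genes linking them.

    This does not compute CTD's InferenceScore; it only shows the shared-gene idea.
    """
    # Index diseases by gene so each chemical-gene pair can be expanded quickly.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease_pairs:
        diseases_by_gene[gene].add(disease)

    inferred = defaultdict(set)
    for chemical, gene in chem_gene_pairs:
        for disease in diseases_by_gene.get(gene, ()):
            inferred[(chemical, disease)].add(gene)

    # Sorted gene lists keep the output deterministic.
    return {pair: sorted(genes) for pair, genes in inferred.items()}


if __name__ == "__main__":
    chem_gene = [("ChemX", "GENE1"), ("ChemX", "GENE2"), ("ChemZ", "GENE3")]
    gene_disease = [("GENE1", "DiseaseY"), ("GENE2", "DiseaseY"), ("GENE3", "DiseaseW")]
    for (chem, disease), genes in sorted(shared_gene_inferences(chem_gene, gene_disease).items()):
        print(chem, disease, genes)
```

A real inference method would also test whether the overlap is statistically significant; the sketch stops at collecting the candidate links.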
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: CTD is a rich source of manually curated associations between chemicals and other biological entities. These associations are an important type of edge for Translator query and reasoning use cases, including treatment predictions, chemical-gene regulation predictions, and pathfinder queries. CTD is one of the few sources that focus on non-drug chemicals (e.g. environmental stressors) and how they relate to diseases, biological processes, and genes.
Scope: This initial ingest of CTD covers curated Chemical-Disease associations that report therapeutic or marker/mechanism relationships, as well as statistical Chemical-Disease associations inferred by CTD. Additional types of Chemical associations will be added later.
Relevant Files
File Name | Location | Description |
---|---|---|
CTD_chemicals_diseases.tsv.gz | http://ctdbase.org/downloads/ | Manually curated and computationally inferred associations between chemicals and diseases |
CTD_exposure_events.tsv.gz | http://ctdbase.org/downloads/ | Descriptions of statistical studies of how exposure to chemicals affects a particular population, with some records providing outcomes |
Included Content
File Name | Included Records | Fields Used |
---|---|---|
CTD_chemicals_diseases.tsv.gz | Curated therapeutic (T) and marker/mechanism (M) associations (rows whose DirectEvidence column is populated), as well as inferred associations (rows with an empty DirectEvidence column) | ChemicalName, ChemicalID, CasRN, DiseaseName, DiseaseID, DirectEvidence, InferenceGeneSymbol, InferenceScore, OmimIDs, PubMedIDs |
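Reading the file and sorting rows into the included record types might look like the following minimal sketch. Assumptions: comment and header lines start with `#`, data lines are tab-separated with columns in the order of the Fields Used list, and DirectEvidence may hold either the T/M codes used in this guide or spelled-out labels, `|`-delimited. The names `read_rows`, `read_file`, and `classify` are hypothetical.

```python
import csv
import gzip

# Assumption: column order matches the "Fields Used" list above; check it
# against the header comment line of the downloaded file.
FIELDS = [
    "ChemicalName", "ChemicalID", "CasRN", "DiseaseName", "DiseaseID",
    "DirectEvidence", "InferenceGeneSymbol", "InferenceScore", "OmimIDs", "PubMedIDs",
]

# Assumption: the raw file may spell out evidence labels instead of T/M codes,
# so both forms are normalized to the single-letter codes used in this guide.
EVIDENCE_CODES = {"t": "T", "therapeutic": "T", "m": "M", "marker/mechanism": "M"}


def read_rows(lines):
    """Yield one dict per data line, skipping blank lines and '#' comment/header lines."""
    data = (line for line in lines if line.strip() and not line.startswith("#"))
    # QUOTE_NONE keeps any quote characters inside chemical names literal.
    for values in csv.reader(data, delimiter="\t", quoting=csv.QUOTE_NONE):
        yield dict(zip(FIELDS, values))


def read_file(path):
    """Stream rows from the gzipped TSV without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from read_rows(handle)


def classify(row):
    """Return {'T'}, {'M'}, or both for curated rows; {'inferred'} when DirectEvidence is empty."""
    raw = row.get("DirectEvidence", "").strip()
    if not raw:
        return {"inferred"}
    labels = (value.strip().lower() for value in raw.split("|"))
    return {EVIDENCE_CODES[label] for label in labels if label in EVIDENCE_CODES}
```

Because every row is either curated (T or M) or inferred, this keeps all rows, in line with the Filtered Content table; an unrecognized DirectEvidence label yields an empty set, so it can be flagged rather than silently mapped.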
Filtered Content
File Name | Filtered Records | Rationale |
---|---|---|
CTD_chemicals_diseases.tsv.gz | None | All records are currently ingested, with no publication-count or inference-score cutoffs; such cutoffs may be added in future iterations. |
Future Content Considerations
edge_content: While the current ingest includes only Chemical-Disease Associations, future iterations will include additional types of associations between Chemicals and GO Terms, Molecular Phenotypes, Genes, etc. - Relevant files: Multiple CTD files
edge_content: Consider ingesting additional chemical-disease edges from CTD_exposure_events.tsv.gz that report statistical correlations observed in environmental exposure studies - Relevant files: CTD_exposure_events.tsv.gz
edge_content: Consider adding publication-count or inference-score cutoffs to filter out lower-quality or lower-confidence records - Relevant files: CTD_chemicals_diseases.tsv.gz
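If publication-count or inference-score cutoffs are adopted, a filter could look like this minimal sketch. The threshold values and function names are hypothetical placeholders, not values proposed by this guide; the sketch assumes `|`-delimited PubMedIDs and applies the publication cutoff only to curated rows and the score cutoff only to inferred rows.

```python
# Hypothetical thresholds: this guide does not define any cutoff values.
MIN_PUBLICATIONS = 2
MIN_INFERENCE_SCORE = 10.0


def publication_count(row):
    """Count the '|'-delimited PubMed IDs in the PubMedIDs column."""
    return len([pmid for pmid in row.get("PubMedIDs", "").split("|") if pmid.strip()])


def passes_cutoffs(row, min_pubs=MIN_PUBLICATIONS, min_score=MIN_INFERENCE_SCORE):
    """Keep curated rows with enough publications and inferred rows with a high enough score."""
    if row.get("DirectEvidence", "").strip():
        return publication_count(row) >= min_pubs
    score = row.get("InferenceScore", "").strip()
    # Inferred rows without a score are dropped rather than assumed to pass.
    return bool(score) and float(score) >= min_score
```

Splitting the cutoffs by row type is one possible design; applying the publication cutoff to inferred rows as well would be an equally reasonable choice.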
Target Information
Target InfoRes ID: infores:translator-ctd-kgx
Edge Types
Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
---|---|---|---|---|---|
biolink:ChemicalEntity | biolink:treats_or_applied_or_studied_to_treat | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | CTD Chemical-Disease records with a T (therapeutic) DirectEvidence code indicate that the chemical is a potential treatment by virtue of its clinical use or study, which maps best to the Biolink predicate treats_or_applied_or_studied_to_treat. |
biolink:ChemicalEntity | biolink:correlates_with_or_contributes_to | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | CTD Chemical-Disease records with an M (marker/mechanism) DirectEvidence code indicate that the chemical correlates with or plays an etiological role in a condition, which maps best to the Biolink predicate correlates_with_or_contributes_to. |
biolink:ChemicalEntity | biolink:associated_with | biolink:DiseaseOrPhenotypicFeature | statistical_association | data_analysis_pipeline | CTD Chemical-Disease records with an inference score indicate a statistically significant number of shared gene associations that suggest a biological relationship may exist. The statistical basis of this relationship maps to the Biolink associated_with predicate. |
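The Edge Types table amounts to a lookup from evidence type to predicate, knowledge level, and agent type. A minimal sketch of that lookup for one parsed row follows. Assumptions: rows are dicts keyed by CTD column names; DirectEvidence may hold T/M codes or spelled-out labels, `|`-delimited; PubMed IDs get a `PMID:` prefix; and InferenceScore is attached as `has_confidence_score`, which the Future Modeling Considerations section suggests but this guide does not specify. The helper names are hypothetical, and the output uses KGX-style dict keys rather than any particular library's API.

```python
# Assumption: DirectEvidence may use T/M codes or spelled-out labels.
EVIDENCE_CODES = {"t": "T", "therapeutic": "T", "m": "M", "marker/mechanism": "M"}

# Evidence type -> (predicate, knowledge_level, agent_type), per the Edge Types table.
EDGE_TYPES = {
    "T": ("biolink:treats_or_applied_or_studied_to_treat", "knowledge_assertion", "manual_agent"),
    "M": ("biolink:correlates_with_or_contributes_to", "knowledge_assertion", "manual_agent"),
    "inferred": ("biolink:associated_with", "statistical_association", "data_analysis_pipeline"),
}


def evidence_types(row):
    """Sorted evidence types for a row; rows without DirectEvidence are 'inferred'."""
    raw = row.get("DirectEvidence", "").strip()
    if not raw:
        return ["inferred"]
    labels = (value.strip().lower() for value in raw.split("|"))
    return sorted({EVIDENCE_CODES[label] for label in labels if label in EVIDENCE_CODES})


def build_edges(row):
    """Hypothetical helper: emit one KGX-style edge dict per evidence type in the row."""
    chemical_id = row["ChemicalID"].strip()
    subject = chemical_id if chemical_id.startswith("MESH:") else f"MESH:{chemical_id}"
    pmids = [f"PMID:{p.strip()}" for p in row.get("PubMedIDs", "").split("|") if p.strip()]
    edges = []
    for code in evidence_types(row):
        predicate, knowledge_level, agent_type = EDGE_TYPES[code]
        edge = {
            "subject": subject,
            "predicate": predicate,
            "object": row["DiseaseID"].strip(),  # disease IDs are used as-is
            "knowledge_level": knowledge_level,
            "agent_type": agent_type,
            "primary_knowledge_source": "infores:ctd",
            "publications": pmids,
        }
        score = row.get("InferenceScore", "").strip()
        if code == "inferred" and score:
            # Assumption: the inference score is carried as has_confidence_score.
            edge["has_confidence_score"] = float(score)
        edges.append(edge)
    return edges
```

Emitting one edge per evidence type means a row carrying both marker/mechanism and therapeutic evidence yields two edges with different predicates, matching the one-predicate-per-code mapping in the table.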
Node Types
Node Category | Source Identifier Types | Additional Notes |
---|---|---|
biolink:ChemicalEntity | MeSH | Most are Biolink SmallMolecules. Values in the ChemicalID column are expected to need a MESH: prefix added. |
biolink:DiseaseOrPhenotypicFeature | MeSH | Disease identifier used as-is. |
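The Node Types table reduces to two normalization rules: chemical IDs gain a `MESH:` prefix, and disease IDs pass through unchanged. A minimal sketch, assuming rows are dicts keyed by CTD column names and that nodes are deduplicated by CURIE; the helper names are hypothetical, and every chemical is typed as biolink:ChemicalEntity because this guide does not describe how the more specific SmallMolecule category would be assigned.

```python
def chemical_node(row):
    """Build a chemical node, adding the MESH: prefix unless it is already present."""
    raw_id = row["ChemicalID"].strip()
    curie = raw_id if raw_id.startswith("MESH:") else f"MESH:{raw_id}"
    return {"id": curie, "name": row.get("ChemicalName", ""), "category": ["biolink:ChemicalEntity"]}


def disease_node(row):
    """Build a disease node, using the DiseaseID value as-is."""
    return {
        "id": row["DiseaseID"].strip(),
        "name": row.get("DiseaseName", ""),
        "category": ["biolink:DiseaseOrPhenotypicFeature"],
    }


def nodes_from_rows(rows):
    """Collect unique nodes across rows, keeping the first record seen for each id."""
    nodes = {}
    for row in rows:
        for node in (chemical_node(row), disease_node(row)):
            nodes.setdefault(node["id"], node)
    return list(nodes.values())
```

Deduplicating by id matters because the same chemical or disease appears in many rows of the association file.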
Future Modeling Considerations
edge_properties: Revisit use of has_confidence_score edge property if/when we refactor this part of the Biolink Model
predicates: Revisit the correlates_with_or_contributes_to and treats_or_applied_or_studied_to_treat predicates if/when we refactor modeling or conventions here
Provenance Information
Contributors:
- Kevin Schaper: code author
- Evan Morris: code support
- Sierra Moxon: code support
- Vlado Dancik: code support, domain expertise
- Matthew Brush: data modeling, domain expertise
Artifacts: - Ingest Survey: https://docs.google.com/spreadsheets/d/1R9z-vywupNrD_3ywuOt_sntcTrNlGmhiUWDXUdkPVpM/edit?gid=0#gid=0