
Comparative Toxicogenomics Database (CTD) Reference Ingest Guide

Source Information

InfoRes ID: infores:ctd

Description: CTD is a robust, publicly available database that aims to advance understanding of how environmental exposures affect human health. It provides knowledge, manually curated from the literature, about chemicals and their relationships to other biological entities: chemical to gene/protein interactions, plus chemical to disease and gene to disease relationships. These data are integrated with functional and pathway data to aid in the development of hypotheses about the mechanisms underlying environmentally influenced diseases. CTD also generates novel inferences by further analyzing the knowledge it curates, based on statistically significant connections through intermediate concepts (e.g. Chemical X is associated with Disease Y based on shared associations with a common set of genes).

Citations: - Davis AP, Wiegers TC, Johnson RJ, Sciaky D, Wiegers J, Mattingly CJ. Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res. 2022 Sep 28.

Terms of Use: No formal license. Bespoke 'terms of use' are described here: https://ctdbase.org/about/legal.jsp

Data Access Locations:
- CTD Bulk Downloads: http://ctdbase.org/downloads/
- CTD Catalog: https://ctdbase.org/reports/

Data Provision Mechanisms: file_download

Data Formats: tsv, csv, xml, obo

Data Versioning and Releases: No consistent cadence for releases, but on average there are 1-2 releases each month. Versioning is based on the month and year of the release. Releases page: https://ctdbase.org/about/changes/

Additional Notes: Latest status page: https://ctdbase.org/about/dataStatus.go

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: CTD is a rich source of manually curated associations between chemicals and other biological entities. These associations are an important type of edge for Translator query and reasoning use cases, including treatment predictions, chemical-gene regulation predictions, and pathfinder queries. CTD is one of the few sources that focus on non-drug chemicals, e.g. environmental stressors, and how these are related to diseases, biological processes, and genes.

Scope: This initial ingest of CTD covers Chemical to Disease associations: curated associations that report therapeutic or marker/mechanism relationships, and statistical (inferred) associations generated by CTD. Additional types of Chemical associations will be added later.

Relevant Files

| File Name | Location | Description |
| --- | --- | --- |
| CTD_chemicals_diseases.tsv.gz | http://ctdbase.org/downloads/ | Manually curated and computationally inferred associations between chemicals and diseases |
| CTD_exposure_events.tsv.gz | http://ctdbase.org/downloads/ | Descriptions of statistical studies of how exposure to chemicals affects a particular population, with some records providing outcomes |

Included Content

| File Name | Included Records | Fields Used |
| --- | --- | --- |
| CTD_chemicals_diseases.tsv.gz | Curated therapeutic and marker/mechanism associations (rows where a DirectEvidence value is populated with type T or M), as well as inferred associations (rows lacking a value in the DirectEvidence column) | ChemicalName, ChemicalID, CasRN, DiseaseName, DiseaseID, DirectEvidence, InferenceGeneSymbol, InferenceScore, OmimIDs, PubMedIDs |
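
The record selection described above can be sketched in Python. This is a minimal illustration, not the ingest's actual code. It assumes that the file's column order matches the Fields Used list, that non-data lines start with `#`, and that DirectEvidence may hold either a one-letter code (T, M) or a spelled-out label (`therapeutic`, `marker/mechanism`). The function names `iter_records`, `is_included`, and `read_included` are hypothetical.

```python
import csv
import gzip

# Assumption: column order matches the "Fields Used" list in this guide.
FIELDS = [
    "ChemicalName", "ChemicalID", "CasRN", "DiseaseName", "DiseaseID",
    "DirectEvidence", "InferenceGeneSymbol", "InferenceScore", "OmimIDs", "PubMedIDs",
]

# Assumption: accept both one-letter codes and spelled-out labels.
CURATED_EVIDENCE = {"T", "M", "therapeutic", "marker/mechanism"}


def iter_records(lines):
    """Yield one dict per data row, skipping blank lines and '#' comment lines."""
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(
        data_lines, fieldnames=FIELDS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    yield from reader


def is_included(record):
    """Keep curated T/M rows and inferred rows (empty DirectEvidence)."""
    evidence = (record.get("DirectEvidence") or "").strip()
    return evidence == "" or evidence in CURATED_EVIDENCE


def read_included(path):
    """Stream included records from a gzipped TSV file at a local path."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from (rec for rec in iter_records(handle) if is_included(rec))
```

Calling `read_included("CTD_chemicals_diseases.tsv.gz")` on a locally downloaded copy would stream the selected rows one at a time, so the whole file never has to sit in memory.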

Filtered Content

| File Name | Filtered Records | Rationale |
| --- | --- | --- |
| CTD_chemicals_diseases.tsv.gz | None | Currently taking all records with no publication count or inference score cutoffs - but these may be added in future iterations |

Future Content Considerations

edge_content: While the current ingest includes only Chemical-Disease Associations, future iterations will include additional types of associations between Chemicals and GO Terms, Molecular Phenotypes, Genes, etc. - Relevant files: Multiple CTD files

edge_content: Consider ingesting additional chemical-disease edges reporting statistical correlations from environmental exposure studies from CTD_exposure_events.tsv.gz - Relevant files: CTD_exposure_events.tsv.gz

edge_content: Consider adding publication count or inference score cutoffs to filter lower quality / confidence records - Relevant files: CTD_chemicals_diseases.tsv.gz

Target Information

Target InfoRes ID: infores:translator-ctd-kgx

Edge Types

| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
| --- | --- | --- | --- | --- | --- |
| biolink:ChemicalEntity | biolink:treats_or_applied_or_studied_to_treat | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | CTD Chemical-Disease records with a T (therapeutic) DirectEvidence code indicate the chemical to be a potential treatment in virtue of its clinical use or study - which maps best to the Biolink predicate treats_or_applied_or_studied_to_treat. |
| biolink:ChemicalEntity | biolink:correlates_with_or_contributes_to | biolink:DiseaseOrPhenotypicFeature | knowledge_assertion | manual_agent | CTD Chemical-Disease records with an M (marker/mechanism) DirectEvidence code indicate the chemical to correlate with or play an etiological role in a condition - which maps best to the Biolink predicate correlates_with_or_contributes_to. |
| biolink:ChemicalEntity | biolink:associated_with | biolink:DiseaseOrPhenotypicFeature | statistical_association | data_analysis_pipeline | CTD Chemical-Disease records with an inference score indicate a statistically significant number of shared gene associations that suggest a biological relationship may exist. The statistical basis of this relationship maps to the Biolink associated_with predicate. |
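
The mapping in the Edge Types table can be expressed as a small lookup. The sketch below is illustrative only and is not taken from the ingest code. The `EdgeType` tuple and `classify_edge` function are hypothetical names, and accepting the spelled-out labels `therapeutic` and `marker/mechanism` alongside the T and M codes is an assumption about how the source file encodes DirectEvidence.

```python
from typing import NamedTuple, Optional


class EdgeType(NamedTuple):
    predicate: str
    knowledge_level: str
    agent_type: str


THERAPEUTIC = EdgeType(
    "biolink:treats_or_applied_or_studied_to_treat", "knowledge_assertion", "manual_agent"
)
MARKER = EdgeType(
    "biolink:correlates_with_or_contributes_to", "knowledge_assertion", "manual_agent"
)
INFERRED = EdgeType(
    "biolink:associated_with", "statistical_association", "data_analysis_pipeline"
)

# Assumption: DirectEvidence may appear as a code or as a spelled-out label.
CURATED = {
    "T": THERAPEUTIC,
    "therapeutic": THERAPEUTIC,
    "M": MARKER,
    "marker/mechanism": MARKER,
}


def classify_edge(direct_evidence: Optional[str],
                  inference_score: Optional[str]) -> Optional[EdgeType]:
    """Return the edge type for a record, or None if no table row applies."""
    evidence = (direct_evidence or "").strip()
    if evidence:
        # Curated rows: only T and M are mapped; other values yield None.
        return CURATED.get(evidence)
    if (inference_score or "").strip():
        # Inferred rows: empty DirectEvidence plus an inference score.
        return INFERRED
    return None
```

Returning `None` for unmapped evidence values, rather than raising, leaves it to the caller to decide whether to skip or log such rows.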

Node Types

| Node Category | Source Identifier Types | Additional Notes |
| --- | --- | --- |
| biolink:ChemicalEntity | MeSH | Majority are Biolink SmallMolecules. Values in the ChemicalID column are expected to need a MESH: prefix added. |
| biolink:DiseaseOrPhenotypicFeature | MeSH | Disease identifier used as-is. |
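
The identifier handling noted above could look like the following sketch. The helper names `chemical_curie` and `disease_curie` are hypothetical, and the function assumes ChemicalID values arrive without a prefix, as the note suggests. It also guards against double-prefixing in case a value is already a CURIE.

```python
def chemical_curie(chemical_id: str) -> str:
    """Add the MESH: prefix to a bare ChemicalID value."""
    value = chemical_id.strip()
    if not value:
        raise ValueError("empty ChemicalID")
    # Guard against double-prefixing if the value is already a CURIE.
    return value if value.startswith("MESH:") else f"MESH:{value}"


def disease_curie(disease_id: str) -> str:
    """Return the DiseaseID unchanged apart from surrounding whitespace."""
    return disease_id.strip()
```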

Future Modeling Considerations

edge_properties: Revisit use of has_confidence_score edge property if/when we refactor this part of the Biolink Model

predicates: Revisit correlates_with_or_contributes_to and treats_or_applied_or_studied_to_treat predicates if/when we refactor modeling or conventions here

Provenance Information

Contributors:
- Kevin Schaper - code author
- Evan Morris - code support
- Sierra Moxon - code support
- Vlado Dancik - code support, domain expertise
- Matthew Brush - data modeling, domain expertise

Artifacts: - Ingest Survey: https://docs.google.com/spreadsheets/d/1R9z-vywupNrD_3ywuOt_sntcTrNlGmhiUWDXUdkPVpM/edit?gid=0#gid=0