Therapeutic Target Database (TTD) Reference Ingest Guide
Source Information
InfoRes ID: infores:ttd
Citations:
- 2024: https://doi.org/10.1093/nar/gkad751
- 2022: https://doi.org/10.1093/nar/gkab953
- 2020: https://doi.org/10.1093/nar/gkz981
- 2018: https://doi.org/10.1093/nar/gkx1076
- 2016: https://doi.org/10.1093/nar/gkv1230
- 2014: https://doi.org/10.1093/nar/gkt1129
- 2012: https://doi.org/10.1093/nar/gkr797
- 2010: https://doi.org/10.1093/nar/gkp1014
- 2002 first paper: https://doi.org/10.1093/nar/30.1.412
Data Access Locations: - Downloads page: https://db.idrblab.net/ttd/full-data-download
Data Provision Mechanisms: file_download
Data Formats: other
Data Versioning and Releases: A new release appears roughly every 2 years. Versioning is inconsistent across files. Some files have a header section that includes a semantic version number and a date (and these dates can differ substantially from file to file); others have no version information at all. The Downloads page has a 'Last update by' section with a date, but it is unclear whether that date applies to every file that lacks version info in its header.
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: TTD provides associations among drugs/chemicals, therapeutic targets (mostly proteins), and diseases that appear to be manually curated from literature review. This literature review includes information that may not be covered by other resources, including drug industry reports, drug pipeline reports from hundreds of companies, patents from multiple countries, and manual review of PubMed literature searches. These associations could be used in MVP1 (may treat disease X), MVP2 (drug Y may increase/decrease gene Z's activity), or Pathfinder queries.
Scope: This ingest covers the drug-disease associations, plus the protein-drug associations from only one of the files that contain them (P1-07-Drug-TargetMapping.xlsx). For more details on the data content and decision-making on what files to ingest, see Colleen Xu's internal document and https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/30.
Relevant Files
File Name | Location | Description |
---|---|---|
P1-05-Drug_disease.txt | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page is 'Drug to disease mapping with ICD identifiers'. Includes chemical/drug 'treats' disease associations. Uses TTD drug IDs, so another file is needed to map them to IDs usable in Translator. |
P1-07-Drug-TargetMapping.xlsx | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page is 'Target to drug mapping with mode of action'. Has chemical/drug 'affects' protein associations. Uses TTD target IDs and drug IDs, so other files are needed to map them to IDs usable in Translator. |
P1-03-TTD_crossmatching.txt | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page is 'Cross-matching ID between TTD drugs and public databases'. Used for ID mapping only. Maps TTD drug IDs (which start with 'D') to PUBCHEM.COMPOUND, CAS, and/or CHEBI. This file does not cover all TTD drug IDs, and it has no information on TTD chemical IDs (which start with 'C'). |
P2-01-TTD_uniprot_all.txt | https://db.idrblab.net/ttd/full-data-download | Description on Downloads page is 'Download Uniprot IDs for all targets'. Used for ID mapping only. Maps TTD target IDs to UniProt names (not UniProt IDs). Despite its description, this file does not include all TTD target IDs. It also contains special values ('NOUNIPROTAC' appears to mean no name/mapping). |
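
As a rough illustration of how the mapping-only target file might be consumed, the sketch below builds a TTD target ID to UniProt name lookup and skips the 'NOUNIPROTAC' placeholder. The three-column tab-separated layout (TTD ID, field name, value), the field name `UNIPROID`, and the sample IDs are all assumptions made for illustration; check them against the actual download before relying on this.

```python
from typing import Dict, Iterable

# Placeholder value mentioned in the file description; treated as "no mapping".
NO_MAPPING = "NOUNIPROTAC"


def build_target_lookup(lines: Iterable[str], field: str = "UNIPROID") -> Dict[str, str]:
    """Map TTD target ID -> UniProt name.

    Assumes tab-separated rows of (TTD ID, field name, value); both the layout
    and the default field name are hypothetical.
    """
    lookup: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # header text, separators, or blank lines
        ttd_id, name, value = (p.strip() for p in parts)
        if name != field or not value or value == NO_MAPPING:
            continue  # wrong field, empty value, or placeholder
        lookup[ttd_id] = value
    return lookup


# Made-up sample rows, not real TTD content.
sample = [
    "free-text header line\n",
    "T00001\tTARGETID\tT00001\n",
    "T00001\tUNIPROID\tEXAMPLE_HUMAN\n",
    "T00002\tUNIPROID\tNOUNIPROTAC\n",
]
target_lookup = build_target_lookup(sample)
```

Target IDs that never make it into the lookup are the ones that would later be dropped, since no UniProt name is available for them.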
Filtered Content
File Name | Filtered Records | Rationale |
---|---|---|
P1-05-Drug_disease.txt | ICD-11 ID is 'N.A.' | This means there isn't an ID for this disease, and we need one for each node. |
P1-05-Drug_disease.txt | TTD drug ID doesn't have mapping to an external namespace, or its name doesn't successfully retrieve an entity in NameResolver. | Need node IDs that are in NodeNorm's scope. |
P1-07-Drug-TargetMapping.xlsx | TTD drug ID doesn't have mapping to an external namespace, or its name doesn't successfully retrieve an entity in NameResolver. | Need node IDs that are in NodeNorm's scope. |
P1-07-Drug-TargetMapping.xlsx | TTD target ID doesn't have a mapping to a UniProt name, or that UniProt name doesn't successfully map to a UniProt ID. | Need node IDs that are in NodeNorm's scope. |
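
The drug-disease filters above can be sketched roughly as follows. The record keys (`ttd_drug_id`, `icd11`), the helper name, and the sample values are hypothetical; `drug_ids` stands in for whatever TTD drug ID to external ID lookup the ingest ends up with, however it was obtained (mapping file or NameResolver).

```python
from typing import Dict, Iterable, List

MISSING_ICD11 = "N.A."  # value TTD uses when a disease has no ICD-11 ID


def filter_drug_disease(
    records: Iterable[Dict[str, str]], drug_ids: Dict[str, str]
) -> List[Dict[str, str]]:
    """Keep only records whose disease and drug can both be given node IDs."""
    kept: List[Dict[str, str]] = []
    for rec in records:
        icd11 = rec.get("icd11", "").strip()
        # Empty values are dropped too; that is an extra assumption of this sketch.
        if not icd11 or icd11 == MISSING_ICD11:
            continue
        if rec.get("ttd_drug_id") not in drug_ids:
            continue  # no usable external ID for this drug
        kept.append(rec)
    return kept


# Made-up sample data, not real TTD content.
records = [
    {"ttd_drug_id": "D0001", "icd11": "XX00"},
    {"ttd_drug_id": "D0002", "icd11": "N.A."},
    {"ttd_drug_id": "D0003", "icd11": "XX01"},
]
drug_ids = {"D0001": "PUBCHEM.COMPOUND:1", "D0002": "PUBCHEM.COMPOUND:2"}
kept = filter_drug_disease(records, drug_ids)
```

In this sample only the first record survives: the second lacks an ICD-11 ID and the third has no drug mapping.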
Future Content Considerations
edge_content: Could ingest another file 'Target to compound mapping with activity data', which contains chemical/drug 'affects' protein associations. Involves more parsing and filtering work. See https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/30#issuecomment-3209860820 for details. - Relevant files: P1-09-Target_compound_activity.txt
other: Could run disease names through NameResolver when the ICD-11 ID is 'N.A.' (fewer than 200 records at present). However, the output would need manual review to check that it is accurate. - Relevant files: P1-05-Drug_disease.txt
Target Information
Edge Types
Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
---|---|---|---|---|---|
biolink:ChemicalEntity | | biolink:Disease | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status of 'approved', 'phase 4', 'approved (orphan drug)', or 'NDA filed'. |
biolink:ChemicalEntity | | biolink:Disease | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status of 'investigative', 'patented', 'discontinued in preregistration', 'preregistration', or 'withdrawn from market'. |
biolink:ChemicalEntity | | biolink:Disease | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status of 'preclinical'. |
biolink:ChemicalEntity | | biolink:Disease | knowledge_assertion | manual_agent | The TTD curators assigned this relationship a clinical status related to clinical trials (which could be a specific phase, registered, various submissions, discontinued in a specific phase, or terminated). |
biolink:ChemicalEntity | | biolink:Gene, biolink:Protein | knowledge_assertion | manual_agent | The TTD curators associated this chemical or drug with its therapeutic target (reported in literature) and sometimes included the mode of action. |
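
One way to picture how the four drug-disease edge types divide up the clinical status values is the sketch below. The group labels are hypothetical placeholders, not Biolink predicates, and the keyword list used to recognize clinical-trial statuses is a guess at how such values might be worded; the exact status strings come from the UI explanations in the table.

```python
# Exact status values quoted in the table above, lowercased for matching.
APPROVED = {"approved", "phase 4", "approved (orphan drug)", "nda filed"}
INVESTIGATIVE = {
    "investigative",
    "patented",
    "discontinued in preregistration",
    "preregistration",
    "withdrawn from market",
}
PRECLINICAL = {"preclinical"}

# Assumed keywords for the clinical-trial group; adjust to the real values.
TRIAL_KEYWORDS = ("phase", "clinical trial", "registered", "submission", "submitted", "terminated")


def status_group(status: str) -> str:
    """Return a hypothetical group label for a TTD clinical status string."""
    s = " ".join(status.lower().split())  # normalize case and whitespace
    if s in APPROVED:
        return "approved_group"
    if s in INVESTIGATIVE:
        return "investigative_group"
    if s in PRECLINICAL:
        return "preclinical_group"
    if any(keyword in s for keyword in TRIAL_KEYWORDS):
        return "clinical_trial_group"
    return "unrecognized"  # surface unknown values instead of guessing


groups = [status_group(s) for s in ("Approved", "Phase 4", "Phase 2", "Preclinical")]
```

Exact matches are checked before the keyword fallback, so 'phase 4' lands in the approved group rather than the clinical-trial group. Returning "unrecognized" rather than a default group makes unexpected status values easy to spot during review.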
Node Types
Node Category | Source Identifier Types | Additional Notes |
---|---|---|
biolink:ChemicalEntity | PUBCHEM.COMPOUND, CAS, CHEBI | Original ID is TTD drug ID, but we are using TTD mapping files to get these external IDs that can be NodeNormed. |
biolink:Disease | icd11 | |
biolink:Gene | UNIPROTKB | Original ID is TTD target ID, but we are using TTD mapping files and NameResolver to get UniProt IDs that can be NodeNormed. Some are non-human. |
biolink:Protein | UNIPROTKB | Original ID is TTD target ID, but we are using TTD mapping files and NameResolver to get UniProt IDs that can be NodeNormed. Some are non-human. |
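
As an illustration of turning the cross-matched identifiers into a single node ID for a drug, the sketch below picks the first available namespace and formats it as a CURIE. The preference order simply mirrors the order in the table and is an assumption, as are the function name and the shape of the `xrefs` dictionary; the sample identifiers are illustrative only.

```python
from typing import Dict, Optional

# Assumed preference order; the guide does not state which namespace wins.
PREFERENCE = ("PUBCHEM.COMPOUND", "CAS", "CHEBI")


def pick_drug_curie(xrefs: Dict[str, str]) -> Optional[str]:
    """Return a CURIE from the first namespace that has a value, else None."""
    for prefix in PREFERENCE:
        local_id = xrefs.get(prefix, "").strip()
        if not local_id:
            continue
        # Avoid doubling the prefix when the value already carries it.
        if local_id.upper().startswith(prefix + ":"):
            local_id = local_id[len(prefix) + 1:]
        return f"{prefix}:{local_id}"
    return None


curie_a = pick_drug_curie({"PUBCHEM.COMPOUND": "2244", "CAS": "50-78-2"})
curie_b = pick_drug_curie({"CHEBI": "CHEBI:15365"})
curie_c = pick_drug_curie({})
```

A drug for which this returns `None` has no external mapping and would fall under the filtering rules described earlier.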
Future Modeling Considerations
other: P1-02 ('Download TTD drug information in raw format') contains TTD drug ID to INCHIKEY mappings, which could be explored to see if it fills gaps in the TTD drug ID mappings.
node_properties: TTD has files with information on drugs and therapeutic target proteins (P1-01 targets, P1-02 drugs). This could potentially be used for node properties (but it may be better to use existing resources that are updated more frequently).
Provenance Information
Contributors:
- Colleen Xu - code author, data modeling
- Andrew Su - code support, domain expertise
- Matthew Brush - data modeling, domain expertise
Artifacts: - https://github.com/NCATSTranslator/Data-Ingest-Coordination-Working-Group/issues/30