ChEMBL Reference Ingest Guide
Source Information
InfoRes ID: infores:chembl
Description: ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
Citations:
- Barbara Zdrazil, Eloy Felix, Fiona Hunter, Emma J Manners, James Blackshaw, Sybilla Corbett, Marleen de Veij, Harris Ioannidis, David Mendez Lopez, Juan F Mosquera, Maria Paula Magarinos, Nicolas Bosc, Ricardo Arcila, Tevfik Kizilören, Anna Gaulton, A Patrícia Bento, Melissa F Adasme, Peter Monecke, Gregory A Landrum, Andrew R Leach. Nucleic Acids Res. 2023: gkad1004. doi: 10.1093/nar/gkad1004
- Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Bellis L, Overington JP. Nucleic Acids Res. 2015; 43(W1):W612-20. doi: 10.1093/nar/gkv352
Data Access Locations: - https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/
Data Provision Mechanisms: file_download, database_dump
Data Formats: mysql, postgresql, sqlite
Data Versioning and Releases: semiannual
Ingest Information
Ingest Categories: primary_knowledge_provider
Utility: ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
Scope: Drug and probe mechanisms, metabolism, bioactivity, and gene targets
Relevant Files
File Name | Location | Description |
---|---|---|
chembl_35_sqlite.tar.gz | https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/ | ChEMBL SQLite database |
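
A minimal sketch of unpacking the archive and opening the database with Python's standard library. It assumes `chembl_35_sqlite.tar.gz` has already been downloaded to a local path and that the archive contains exactly one file ending in `.db`; the helper name `open_chembl_sqlite` is illustrative and not part of any ChEMBL tooling.

```python
import shutil
import sqlite3
import tarfile
from pathlib import Path


def open_chembl_sqlite(archive_path, extract_dir):
    """Extract the SQLite file from a ChEMBL archive and open it read-only.

    Assumption: the archive holds exactly one member ending in '.db'.
    """
    extract_dir = Path(extract_dir)
    extract_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        db_members = [
            m for m in archive.getmembers()
            if m.isfile() and m.name.endswith(".db")
        ]
        if len(db_members) != 1:
            raise ValueError(f"expected one .db file, found {len(db_members)}")
        member = db_members[0]
        # Copy only the database file, dropping any directory prefix.
        target = extract_dir / Path(member.name).name
        with archive.extractfile(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    # mode=ro keeps the ingest from modifying the reference data.
    return sqlite3.connect(target.resolve().as_uri() + "?mode=ro", uri=True)
```

Opening the file read-only is a design choice for this sketch: an ingest only needs to query the dump, so accidental writes fail loudly instead of altering it.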
Included Content
File Name | Included Records | Fields Used |
---|---|---|
chembl_35_sqlite.tar.gz | Drug mechanisms from the following tables: drug_mechanism, molecule_dictionary, target_dictionary, binding_sites, compound_records, docs, source, target_components, component_sequences, variant_sequences. | drug_mechanism: mec_id, molregno, mechanism_of_action, action_type, direct_interaction, mechanism_comment, selectivity_comment, binding_site_comment, variant_id; molecule_dictionary: molregno, chembl_id; target_dictionary: tid, chembl_id (AS target_chembl_id), pref_name (AS target_name), target_type, organism (AS target_organism); binding_sites: site_id, site_name; compound_records: record_id, doc_id, src_id; docs: doc_id, chembl_id (AS document_chembl_id); source: src_id, src_description (AS source_description); target_components: tid, component_id; component_sequences: component_id, component_type, accession, description, tax_id, organism; variant_sequences: variant_id, mutation, accession (AS mutation_accession) |
chembl_35_sqlite.tar.gz | Gene targets - same as drug mechanisms, with targets mapped to genes | drug_mechanism: mec_id, molregno, mechanism_of_action, action_type, direct_interaction, mechanism_comment, selectivity_comment, binding_site_comment, variant_id; molecule_dictionary: molregno, chembl_id; target_dictionary: tid, chembl_id (AS target_chembl_id), pref_name (AS target_name), target_type, organism (AS target_organism); binding_sites: site_id, site_name; compound_records: record_id, doc_id, src_id; docs: doc_id, chembl_id (AS document_chembl_id); source: src_id, src_description (AS source_description); target_components: tid, component_id; component_sequences: component_id, component_type, accession, description, tax_id, organism; variant_sequences: variant_id, mutation, accession (AS mutation_accession) |
chembl_35_sqlite.tar.gz | Metabolism information from the following tables: metabolism, molecule_dictionary, compound_records, compound_structures, target_dictionary, metabolism_refs | metabolism: drug_record_id, substrate_record_id, metabolite_record_id, met_id, enzyme_name, met_conversion, met_comment, organism, tax_id, enzyme_tid; molecule_dictionary: molregno, chembl_id, pref_name; compound_records: molregno, record_id, compound_name; target_dictionary: tid, target_type, chembl_id; compound_structures: molregno, standard_inchi, standard_inchi_key, canonical_smiles; metabolism_refs: met_id, ref_type, ref_id, ref_url |
chembl_35_sqlite.tar.gz | Bioactivity and assay information from the following tables: activities, molecule_dictionary, assays, bioassay_ontology, target_dictionary, target_components, component_sequences, cell_dictionary, assay_type, tissue_dictionary, docs, source, ligand_eff, relationship_type, confidence_score_lookup, curation_lookup | activities: activity_id, standard_type, standard_relation, standard_value, standard_units, pchembl_value, activity_comment, data_validity_comment, standard_text_value, standard_upper_value, uo_units, potential_duplicate, action_type, src_id, doc_id; molecule_dictionary: molregno, chembl_id; assays: assay_id, chembl_id, description, assay_organism, assay_cell_type, assay_subcellular_fraction, bao_format, assay_category, assay_tax_id, assay_tissue, relationship_type, confidence_score, curated_by, src_id, assay_type, cell_id, tissue_id, tid; bioassay_ontology: bao_id, label; target_dictionary: tid, chembl_id, pref_name, organism, target_type; target_components: tid, component_id; component_sequences: component_id, component_type, accession; cell_dictionary: cell_id, chembl_id, cell_name, cell_description, cell_source_tissue, cell_source_organism, cell_source_tax_id, clo_id, efo_id, cellosaurus_id, cl_lincs_id, cell_ontology_id; assay_type: assay_type, assay_desc; tissue_dictionary: tissue_id, chembl_id, pref_name; docs: doc_id, chembl_id, journal, title, year, authors, pubmed_id, doi; source: src_id, src_description; ligand_eff: activity_id, bei, le, lle, sei; relationship_type: relationship_type, relationship_desc; confidence_score_lookup: confidence_score, description, target_mapping; curation_lookup: curated_by, description |
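
As an illustration of how the drug mechanism fields listed above could be pulled together, here is a hedged sketch of a single query over the SQLite database. The join columns `drug_mechanism.tid`, `drug_mechanism.site_id` and `drug_mechanism.record_id` are assumptions taken from the public ChEMBL schema rather than from the field list above, and the query is not necessarily the one the ingest runs. The function name `fetch_drug_mechanisms` is hypothetical.

```python
import sqlite3

# Join keys (tid, site_id, record_id) are assumed from the public ChEMBL
# schema; the field list in this guide does not name them.
DRUG_MECHANISM_QUERY = """
SELECT
    dm.mec_id,
    md.chembl_id      AS molecule_chembl_id,
    dm.mechanism_of_action,
    dm.action_type,
    td.chembl_id      AS target_chembl_id,
    td.pref_name      AS target_name,
    td.target_type,
    td.organism       AS target_organism,
    bs.site_name,
    d.chembl_id       AS document_chembl_id,
    s.src_description AS source_description
FROM drug_mechanism dm
JOIN molecule_dictionary md ON md.molregno = dm.molregno
LEFT JOIN target_dictionary td ON td.tid = dm.tid
LEFT JOIN binding_sites bs ON bs.site_id = dm.site_id
LEFT JOIN compound_records cr ON cr.record_id = dm.record_id
LEFT JOIN docs d ON d.doc_id = cr.doc_id
LEFT JOIN source s ON s.src_id = cr.src_id
ORDER BY dm.mec_id
"""


def fetch_drug_mechanisms(conn):
    """Return one dict per drug_mechanism row, keyed by the column aliases."""
    cursor = conn.cursor()
    # Set the row factory on the cursor so the caller's connection is unchanged.
    cursor.row_factory = sqlite3.Row
    return [dict(row) for row in cursor.execute(DRUG_MECHANISM_QUERY)]
```

The `LEFT JOIN`s keep mechanisms that lack a target, binding site, or document, so any filtering of unmappable targets can happen as a separate, explicit step.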
Filtered Content
File Name | Filtered Records | Rationale |
---|---|---|
chembl_35_sqlite.tar.gz | Gene targets - removed targets that cannot be mapped to genes (such as cell lines) | Targets that cannot be mapped to genes |
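
The filter above can be sketched as follows. The mapping criterion used here, at least one `component_sequences` entry of type `PROTEIN` with an accession, is an assumption made for illustration; this guide does not state the exact rule the ingest applies. The function names and the sample records are hypothetical.

```python
def is_gene_mappable(components):
    """True if any component is a protein with a non-empty accession.

    Assumption: only protein components with accessions can be mapped to genes.
    """
    return any(
        c.get("component_type") == "PROTEIN" and bool(c.get("accession"))
        for c in components
    )


def filter_gene_targets(targets, components_by_tid):
    """Split targets into (kept, dropped) by whether they map to a gene."""
    kept, dropped = [], []
    for target in targets:
        components = components_by_tid.get(target["tid"], [])
        if is_gene_mappable(components):
            kept.append(target)
        else:
            dropped.append(target)
    return kept, dropped


# Hypothetical records: a protein target with a component, and a cell-line
# target with none.
targets = [
    {"tid": 1, "target_type": "SINGLE PROTEIN"},
    {"tid": 2, "target_type": "CELL-LINE"},
]
components_by_tid = {
    1: [{"component_type": "PROTEIN", "accession": "HYPOTHETICAL_ACC"}],
}
kept, dropped = filter_gene_targets(targets, components_by_tid)
print([t["tid"] for t in kept], [t["tid"] for t in dropped])
```

Returning the dropped targets alongside the kept ones makes it easy to log how many records the filter removes on each release.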
Target Information
Target InfoRes ID: infores:translator-chembl-kgx
Edge Types
Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
---|---|---|---|---|---|
SmallMolecule, MolecularMixture, ChemicalEntity, Protein | | Gene, MacromolecularComplexMixin, NucleicAcidEntity, Organism, BiologicalProcess, Protein, ProteinComplex | knowledge_assertion | manual_agent | Drug mechanisms, some mapped to genes, from ChEMBL |
SmallMolecule, MolecularMixture, ChemicalEntity, Protein | | Gene, MacromolecularComplexMixin, NucleicAcidEntity, Organism, BiologicalProcess, Protein, ProteinComplex | knowledge_assertion | manual_agent | Drug mechanisms, some mapped to genes, from ChEMBL |
SmallMolecule, MolecularMixture, ChemicalEntity, Protein | | AnatomicalEntity, Cell, CellLine, OrganismTaxon, PhenotypicFeature | observation | manual_agent | Drug activities from ChEMBL |
SmallMolecule, MolecularMixture, ChemicalEntity | | SmallMolecule, MolecularMixture, ChemicalEntity | knowledge_assertion | manual_agent | Metabolism information from ChEMBL |
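
To make the edge attributes concrete, here is a rough sketch of assembling one edge record as a plain dictionary. The property names (`subject`, `predicate`, `object`, `knowledge_level`, `agent_type`, `primary_knowledge_source`) follow common Biolink/KGX usage and are an assumption about the output shape, not a description of the ingest code. The predicate is left as a caller-supplied argument because the table above does not list predicate values; `make_edge` is a hypothetical helper.

```python
# The two knowledge levels and the single agent type that appear in the
# Edge Types table.
KNOWLEDGE_LEVELS = {"knowledge_assertion", "observation"}
AGENT_TYPES = {"manual_agent"}


def make_edge(subject, predicate, obj, knowledge_level, agent_type="manual_agent"):
    """Build one edge dict; the caller supplies CURIEs and the predicate."""
    if knowledge_level not in KNOWLEDGE_LEVELS:
        raise ValueError(f"unexpected knowledge_level: {knowledge_level}")
    if agent_type not in AGENT_TYPES:
        raise ValueError(f"unexpected agent_type: {agent_type}")
    return {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "knowledge_level": knowledge_level,
        "agent_type": agent_type,
        # Source InfoRes ID from the Source Information section.
        "primary_knowledge_source": "infores:chembl",
    }
```

Validating the two enumerated fields up front means a typo in a knowledge level surfaces at build time rather than in downstream graph validation.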
Node Types
Node Category | Source Identifier Types | Additional Notes |
---|---|---|
AnatomicalEntity | ChEMBL | |
Cell | ChEMBL | |
CellLine | ChEMBL | |
CellularComponent | ChEMBL | |
ChemicalEntity | ChEMBL | |
Gene | ChEMBL | |
MacromolecularComplexMixin | ChEMBL | |
MolecularMixture | ChEMBL | |
ProteinComplex | ChEMBL | |
ProteinFamily | ChEMBL | |
SmallMolecule | ChEMBL | |
Provenance Information
Contributors:
- Vlado Dancik - code author, domain expertise
- Kevin Schaper - code support
- Evan Morris - code support
- Sierra Moxon - data modeling, code support
- Matthew Brush - data modeling, domain expertise
Artifacts: - Ingest Survey: https://docs.google.com/spreadsheets/d/1CENLoKukPCHW2SimabAm6HfXeldeRYmPTV_--Fs9whg/edit?gid=0#gid=0