
ChEMBL Reference Ingest Guide

Source Information

InfoRes ID: infores:chembl

Description: ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.

Citations:
- Barbara Zdrazil, Eloy Felix, Fiona Hunter, Emma J Manners, James Blackshaw, Sybilla Corbett, Marleen de Veij, Harris Ioannidis, David Mendez Lopez, Juan F Mosquera, Maria Paula Magarinos, Nicolas Bosc, Ricardo Arcila, Tevfik Kizilören, Anna Gaulton, A Patrícia Bento, Melissa F Adasme, Peter Monecke, Gregory A Landrum, Andrew R Leach. Nucleic Acids Res. 2023: gkad1004. doi: 10.1093/nar/gkad1004
- Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Bellis L, Overington JP. Nucleic Acids Res. 2015; 43(W1):W612-20. doi: 10.1093/nar/gkv352

Data Access Locations: - https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/

Data Provision Mechanisms: file_download, database_dump

Data Formats: mysql, postgresql, sqlite

Data Versioning and Releases: semiannual

Ingest Information

Ingest Categories: primary_knowledge_provider

Utility: ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.

Scope: Drug and probe mechanisms, metabolism, bioactivity, and gene targets

Relevant Files

| File Name | Location | Description |
|---|---|---|
| chembl_35_sqlite.tar.gz | https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/ | ChEMBL SQLite database |
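Once the archive has been downloaded, the SQLite file has to be unpacked before it can be queried. The following is a minimal sketch of that step, not the ingest's actual code. It assumes the archive contains a single file ending in `.db` (the internal layout is not documented here), and the helper names `extract_sqlite_db` and `open_read_only` are hypothetical.

```python
import shutil
import sqlite3
import tarfile
from pathlib import Path


def extract_sqlite_db(archive_path, dest_dir):
    """Copy the first '*.db' member of a .tar.gz archive into dest_dir.

    Assumption: the archive holds one SQLite file whose name ends in '.db'.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".db"):
                # Drop any directory prefix so the file lands directly in dest_dir.
                target = dest / Path(member.name).name
                with tar.extractfile(member) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
                return target
    raise FileNotFoundError(f"no .db file found in {archive_path}")


def open_read_only(db_path):
    """Open the database read-only so an ingest cannot modify the source file."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

A call such as `open_read_only(extract_sqlite_db("chembl_35_sqlite.tar.gz", "data"))` would then return a connection for the queries an ingest needs. Opening in read-only mode is a design choice here, not something the source prescribes.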

Included Content

| File Name | Included Records | Fields Used |
|---|---|---|
| chembl_35_sqlite.tar.gz | Drug mechanisms from the following tables: drug_mechanism, molecule_dictionary, target_dictionary, binding_sites, compound_records, docs, source, target_components, component_sequences, variant_sequences | drug_mechanism: mec_id, molregno, mechanism_of_action, action_type, direct_interaction, mechanism_comment, selectivity_comment, binding_site_comment, variant_id; molecule_dictionary: molregno, chembl_id; target_dictionary: tid, chembl_id (AS target_chembl_id), pref_name (AS target_name), target_type, organism (AS target_organism); binding_sites: site_id, site_name; compound_records: record_id, doc_id, src_id; docs: doc_id, chembl_id (AS document_chembl_id); source: src_id, src_description (AS source_description); target_components: tid, component_id; component_sequences: component_id, component_type, accession, description, tax_id, organism; variant_sequences: variant_id, mutation, accession (AS mutation_accession) |
| chembl_35_sqlite.tar.gz | Gene targets: same as drug mechanisms, with targets mapped to genes | drug_mechanism: mec_id, molregno, mechanism_of_action, action_type, direct_interaction, mechanism_comment, selectivity_comment, binding_site_comment, variant_id; molecule_dictionary: molregno, chembl_id; target_dictionary: tid, chembl_id (AS target_chembl_id), pref_name (AS target_name), target_type, organism (AS target_organism); binding_sites: site_id, site_name; compound_records: record_id, doc_id, src_id; docs: doc_id, chembl_id (AS document_chembl_id); source: src_id, src_description (AS source_description); target_components: tid, component_id; component_sequences: component_id, component_type, accession, description, tax_id, organism; variant_sequences: variant_id, mutation, accession (AS mutation_accession) |
| chembl_35_sqlite.tar.gz | Metabolism information from the following tables: metabolism, molecule_dictionary, compound_records, compound_structures, target_dictionary, metabolism_refs | metabolism: drug_record_id, substrate_record_id, metabolite_record_id, met_id, enzyme_name, met_conversion, met_comment, organism, tax_id, enzyme_tid; molecule_dictionary: molregno, chembl_id, pref_name; compound_records: molregno, record_id, compound_name; target_dictionary: tid, target_type, chembl_id; compound_structures: molregno, standard_inchi, standard_inchi_key, canonical_smiles; metabolism_refs: met_id, ref_type, ref_id, ref_url |
| chembl_35_sqlite.tar.gz | Bioactivity and assay information from the following tables: activities, molecule_dictionary, assays, bioassay_ontology, target_dictionary, target_components, component_sequences, cell_dictionary, assay_type, tissue_dictionary, docs, source, ligand_eff, relationship_type, confidence_score_lookup, curation_lookup | activities: activity_id, standard_type, standard_relation, standard_value, standard_units, pchembl_value, activity_comment, data_validity_comment, standard_text_value, standard_upper_value, uo_units, potential_duplicate, action_type, src_id, doc_id; molecule_dictionary: molregno, chembl_id; assays: assay_id, chembl_id, description, assay_organism, assay_cell_type, assay_subcellular_fraction, bao_format, assay_category, assay_tax_id, assay_tissue, relationship_type, confidence_score, curated_by, src_id, assay_type, cell_id, tissue_id, tid; bioassay_ontology: bao_id, label; target_dictionary: tid, chembl_id, pref_name, organism, target_type; target_components: tid, component_id; component_sequences: component_id, component_type, accession; cell_dictionary: cell_id, chembl_id, cell_name, cell_description, cell_source_tissue, cell_source_organism, cell_source_tax_id, clo_id, efo_id, cellosaurus_id, cl_lincs_id, cell_ontology_id; assay_type: assay_type, assay_desc; tissue_dictionary: tissue_id, chembl_id, pref_name; docs: doc_id, chembl_id, journal, title, year, authors, pubmed_id, doi; source: src_id, src_description; ligand_eff: activity_id, bei, le, lle, sei; relationship_type: relationship_type, relationship_desc; confidence_score_lookup: confidence_score, description, target_mapping; curation_lookup: curated_by, description |
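To illustrate how the drug mechanism fields listed above fit together, here is a rough sketch of a join across three of the tables, run against a tiny in-memory stand-in for the database. It is not the ingest's actual query. The join on `drug_mechanism.tid` is an assumption, since `tid` is not among the fields listed for that table, and all row values (`CHEMBL_TOY_*`, "Toy kinase") are made up for the example.

```python
import sqlite3

# Column names and aliases follow the "Fields Used" list; the join on dm.tid
# is an assumption about the schema, not something stated in this guide.
MECHANISM_QUERY = """
SELECT
    dm.mec_id,
    md.chembl_id  AS molecule_chembl_id,
    dm.mechanism_of_action,
    dm.action_type,
    td.chembl_id  AS target_chembl_id,
    td.pref_name  AS target_name,
    td.target_type,
    td.organism   AS target_organism
FROM drug_mechanism AS dm
JOIN molecule_dictionary AS md ON md.molregno = dm.molregno
JOIN target_dictionary  AS td ON td.tid = dm.tid
ORDER BY dm.mec_id
"""


def fetch_mechanisms(conn):
    """Return one dict per mechanism row, keyed by the aliases in the query."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row
    return [dict(row) for row in cur.execute(MECHANISM_QUERY)]


def build_toy_db():
    """Create a minimal in-memory database with made-up rows for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE molecule_dictionary (molregno INTEGER PRIMARY KEY, chembl_id TEXT);
        CREATE TABLE target_dictionary (tid INTEGER PRIMARY KEY, chembl_id TEXT,
            pref_name TEXT, target_type TEXT, organism TEXT);
        CREATE TABLE drug_mechanism (mec_id INTEGER PRIMARY KEY, molregno INTEGER,
            tid INTEGER, mechanism_of_action TEXT, action_type TEXT);
        INSERT INTO molecule_dictionary VALUES (1, 'CHEMBL_TOY_MOLECULE');
        INSERT INTO target_dictionary VALUES
            (10, 'CHEMBL_TOY_TARGET', 'Toy kinase', 'SINGLE PROTEIN', 'Homo sapiens');
        INSERT INTO drug_mechanism VALUES (100, 1, 10, 'Toy kinase inhibitor', 'INHIBITOR');
    """)
    return conn


if __name__ == "__main__":
    for row in fetch_mechanisms(build_toy_db()):
        print(row)
```

The remaining tables in the list (binding_sites, docs, source, variant_sequences and so on) would presumably be attached with further joins on the keys shown in the table above.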

Filtered Content

| File Name | Filtered Records | Rationale |
|---|---|---|
| chembl_35_sqlite.tar.gz | Gene targets: removed targets that cannot be mapped to genes (such as cell lines) | Targets that cannot be mapped to genes |
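The filter described above could be approximated as follows. This is only a sketch of one plausible criterion, not the ingest's actual rule: a target counts as mappable when it has at least one component with `component_type = 'PROTEIN'` and a non-null accession. The function name `split_targets` and the toy rows are hypothetical, and the later accession-to-gene mapping step is not shown.

```python
import sqlite3

# Assumed criterion: a target is "mappable" if it has a protein component
# with an accession. The real ingest may use a different rule.
MAPPABLE_TARGETS_QUERY = """
SELECT DISTINCT td.chembl_id, cs.accession
FROM target_dictionary AS td
JOIN target_components   AS tc ON tc.tid = td.tid
JOIN component_sequences AS cs ON cs.component_id = tc.component_id
WHERE cs.component_type = 'PROTEIN' AND cs.accession IS NOT NULL
ORDER BY td.chembl_id, cs.accession
"""


def split_targets(conn):
    """Return (mappable, unmappable): a dict of target -> accessions, and a list."""
    mappable = {}
    for chembl_id, accession in conn.execute(MAPPABLE_TARGETS_QUERY):
        mappable.setdefault(chembl_id, []).append(accession)
    all_ids = [row[0] for row in conn.execute(
        "SELECT chembl_id FROM target_dictionary ORDER BY tid")]
    unmappable = [cid for cid in all_ids if cid not in mappable]
    return mappable, unmappable


def build_toy_db():
    """In-memory database with one protein target and one cell-line target."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target_dictionary (tid INTEGER PRIMARY KEY, chembl_id TEXT,
            target_type TEXT);
        CREATE TABLE target_components (tid INTEGER, component_id INTEGER);
        CREATE TABLE component_sequences (component_id INTEGER PRIMARY KEY,
            component_type TEXT, accession TEXT);
        INSERT INTO target_dictionary VALUES (1, 'TOY_PROTEIN_TARGET', 'SINGLE PROTEIN');
        INSERT INTO target_dictionary VALUES (2, 'TOY_CELL_LINE_TARGET', 'CELL-LINE');
        INSERT INTO target_components VALUES (1, 11);
        INSERT INTO component_sequences VALUES (11, 'PROTEIN', 'TOY_ACCESSION');
    """)
    return conn


if __name__ == "__main__":
    kept, dropped = split_targets(build_toy_db())
    print("kept:", kept)
    print("dropped:", dropped)
```

In the toy data the cell-line target has no components at all, so it falls into the dropped list, which mirrors the cell-line example given in the rationale.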

Target Information

Target InfoRes ID: infores:translator-chembl-kgx

Edge Types

| Subject Categories | Predicate | Object Categories | Knowledge Level | Agent Type | UI Explanation |
|---|---|---|---|---|---|
| SmallMolecule, MolecularMixture, ChemicalEntity, Protein | | Gene, MacromolecularComplexMixin, NucleicAcidEntity, Organism, BiologicalProcess, Protein, ProteinComplex | knowledge_assertion | manual_agent | Drug mechanisms, some mapped to genes, from ChEMBL |
| SmallMolecule, MolecularMixture, ChemicalEntity, Protein | | Gene, MacromolecularComplexMixin, NucleicAcidEntity, Organism, BiologicalProcess, Protein, ProteinComplex | knowledge_assertion | manual_agent | Drug mechanisms, some mapped to genes, from ChEMBL |
| SmallMolecule, MolecularMixture, ChemicalEntity, Protein | | AnatomicalEntity, Cell, CellLine, OrganismTaxon, PhenotypicFeature | observation | manual_agent | Drug activities from ChEMBL |
| SmallMolecule, MolecularMixture, ChemicalEntity | | SmallMolecule, MolecularMixture, ChemicalEntity | knowledge_assertion | manual_agent | Metabolism information from ChEMBL |
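As a rough illustration of how the knowledge level and agent type columns might be attached to emitted edges, consider the sketch below. It is not the ingest's implementation. The edge-kind labels (`mechanism`, `activity`, `metabolism`), the `make_edge` helper, and the use of `primary_knowledge_source` are assumptions, and because the predicates are not listed in the table above, the predicate is left as a caller-supplied argument.

```python
PRIMARY_KNOWLEDGE_SOURCE = "infores:chembl"  # Source InfoRes ID from this guide
AGENT_TYPE = "manual_agent"                  # the same for every edge type listed

# Knowledge level per edge kind, taken from the Edge Types table.
# The keys are hypothetical labels chosen for this sketch.
KNOWLEDGE_LEVELS = {
    "mechanism": "knowledge_assertion",
    "activity": "observation",
    "metabolism": "knowledge_assertion",
}


def make_edge(subject, predicate, obj, edge_kind):
    """Build a plain-dict edge record with the provenance fields filled in.

    The predicate is supplied by the caller; this guide does not list the
    predicates, so none is hard-coded here.
    """
    if edge_kind not in KNOWLEDGE_LEVELS:
        raise ValueError(f"unknown edge kind: {edge_kind!r}")
    return {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "knowledge_level": KNOWLEDGE_LEVELS[edge_kind],
        "agent_type": AGENT_TYPE,
        "primary_knowledge_source": PRIMARY_KNOWLEDGE_SOURCE,
    }


if __name__ == "__main__":
    # Placeholder identifiers and a generic placeholder predicate.
    print(make_edge("EXAMPLE:drug", "biolink:related_to", "EXAMPLE:target", "activity"))
```

Keeping the knowledge level in a single lookup table means a bioactivity edge cannot accidentally be emitted as a knowledge assertion.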

Node Types

| Node Category | Source Identifier Types | Additional Notes |
|---|---|---|
| AnatomicalEntity | ChEMBL | |
| Cell | ChEMBL | |
| CellLine | ChEMBL | |
| CellularComponent | ChEMBL | |
| ChemicalEntity | ChEMBL | |
| Gene | ChEMBL | |
| MacromolecularComplexMixin | ChEMBL | |
| MolecularMixture | ChEMBL | |
| ProteinComplex | ChEMBL | |
| ProteinFamily | ChEMBL | |
| SmallMolecule | ChEMBL | |

Provenance Information

Contributors:
- Vlado Dancik - code author, domain expertise
- Kevin Schaper - code support
- Evan Morris - code support
- Sierra Moxon - data modeling, code support
- Matthew Brush - data modeling, domain expertise

Artifacts: - Ingest Survey: https://docs.google.com/spreadsheets/d/1CENLoKukPCHW2SimabAm6HfXeldeRYmPTV_--Fs9whg/edit?gid=0#gid=0