Columbia Open Health Data (COHD) for COVID-19 Research (infores:cohd-covid)

Status: released

Knowledge Level: statistical_association

Agent Type: not_provided

Description: The Columbia Open Health Data (COHD) for COVID-19 Research API provides access to counts and frequencies (i.e., EHR visit prevalence) of conditions, procedures, and drug exposures, and to the co-occurrence frequencies between them, for a cohort of hospitalized COVID-19 patients and two comparator cohorts: hospitalized influenza patients and general hospitalized patients. Count and frequency data were derived from the Columbia University Medical Center's OHDSI database, including inpatient data.

A count is the number of inpatient visits associated with a concept, e.g., visits in which a condition was diagnosed, the patient was exposed to a drug, or a procedure was performed. A frequency is the number of unique visits associated with the concept divided by the total number of visits in the dataset, i.e., the prevalence in the electronic health records. To protect patient privacy, all concepts and pairs of concepts with a count <= 10 were excluded, and the remaining counts were randomized using the Poisson distribution.

Datasets from three primary cohorts are available:

1) COVID-19: Hospitalized patients aged 18 or older with a COVID-19 related condition diagnosis and/or a confirmed positive COVID-19 test during their hospitalization period or within the prior 21 days. Date range: March 1, 2020 to September 1, 2020. This cohort is further stratified by sex (male and female) and age (adult: 18-64, senior: 65+).
2) General inpatient: All hospitalized patients aged 18 or older. Date range: January 1, 2014 to December 31, 2019.
3) Influenza: Hospitalized patients aged 18 or older who had at least one occurrence of influenza conditions, pre-coordinated positive measurements, or positive influenza testing during their hospitalization period or within the prior 21 days. Date range: January 1, 2014 to December 31, 2019.

Both hierarchical and non-hierarchical datasets are available for each cohort. In the hierarchical datasets, the count for each concept includes the visits from all of its descendant concepts. For example, the count for ibuprofen (ID 1177480) includes visits with Ibuprofen 600 MG Oral Tablet (ID 19019073), Ibuprofen 400 MG Oral Tablet (ID 19019072), Ibuprofen 20 MG/ML Oral Suspension (ID 19019050), etc.

Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. API methods are provided to map to and from other vocabularies supported in OMOP and other ontologies using the EMBL-EBI Ontology Xref Service (OxO).

The following resources are available through this API:

1. Metadata: Metadata on the COHD database, including dataset descriptions, number of concepts, etc.
2. OMOP: Access to the common vocabulary for name and concept identifier mapping.
3. Clinical Frequencies: Access to the counts and frequencies of conditions, procedures, and drug exposures, and the associations between them. Frequency was determined as the number of visits with the code(s) divided by the total number of visits.
4. Concept Associations: Inferred associations between concepts using chi-square analysis, the ratio of observed to expected frequency, and relative frequency.

A Python notebook demonstrates simple examples of how to use the COHD API.

COHD was developed at the Columbia University Department of Biomedical Informatics as a collaboration between the Weng Lab, the Tatonetti Lab, and the NCATS Biomedical Data Translator program (TReK Team). This work was supported in part by grants NCATS 1OT2TR003434, NLM R01LM012895, NCATS OT3TR002027, NLM R01LM009886-08A1, and NIGMS R01GM107145.

The following external resources may be useful:

- OHDSI
- OMOP Common Data Model
- Athena (OMOP vocabularies, search, concept relationships, concept hierarchy)
- Atlas (OMOP vocabularies, search, concept relationships, concept hierarchy, concept sets)
- NCATS Biomedical Data Translator
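The definitions in the description (hierarchical counts, visit frequency, and the three association measures) can be illustrated with a small local sketch. It does not call the COHD API. All function names and the in-memory data layout are hypothetical, and the formulas are the textbook forms (Pearson chi-square on a 2x2 table without continuity correction, plain observed/expected ratio). The service itself may transform or correct these values differently.

```python
from typing import Dict, Iterable, Optional, Set

# From the description: concepts and pairs with count <= 10 are excluded.
SUPPRESSION_THRESHOLD = 10


def hierarchical_count(concept_id: int,
                       visits: Iterable[Set[int]],
                       descendants: Dict[int, Set[int]]) -> int:
    """Count visits that contain the concept or any of its descendants.

    `visits` is a hypothetical layout: one set of concept IDs per visit.
    """
    family = {concept_id} | descendants.get(concept_id, set())
    return sum(1 for concepts in visits if family & concepts)


def suppress(count: int) -> Optional[int]:
    """Return None for counts at or below the privacy threshold."""
    return None if count <= SUPPRESSION_THRESHOLD else count


def frequency(count: int, total_visits: int) -> float:
    """Visit prevalence: visits with the concept / total visits."""
    return count / total_visits


def obs_exp_ratio(pair_count: int, count_1: int, count_2: int,
                  total_visits: int) -> float:
    """Observed pair count over the count expected under independence."""
    expected = count_1 * count_2 / total_visits
    return pair_count / expected


def relative_frequency(pair_count: int, base_count: int) -> float:
    """Share of the base concept's visits that also include the other concept."""
    return pair_count / base_count


def chi_square(pair_count: int, count_1: int, count_2: int,
               total_visits: int) -> float:
    """Pearson chi-square statistic for the 2x2 co-occurrence table."""
    a = pair_count                                   # both concepts
    b = count_1 - pair_count                         # concept 1 only
    c = count_2 - pair_count                         # concept 2 only
    d = total_visits - count_1 - count_2 + pair_count  # neither
    numerator = total_visits * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator
```

With made-up numbers (10,000 visits, two concepts seen in 500 and 400 visits, 60 visits containing both), the expected co-occurrence count under independence is 500 * 400 / 10,000 = 20, so the observed/expected ratio is 3 and the relative frequency with respect to the first concept is 60 / 500 = 0.12.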

Cross References: