Overview

ccda_to_omop converts HL7 C-CDA (Consolidated Clinical Document Architecture) XML documents into tabular records that follow the OMOP Common Data Model (CDM).

Architecture

The conversion is driven by a metadata configuration layer. Each OMOP domain (e.g. condition_occurrence, drug_exposure, observation) has a corresponding metadata file that describes which C-CDA XPaths map to which OMOP columns. Adding a new domain requires only a new metadata config; the core parsing engine stays unchanged.
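
To make the metadata-driven idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the package's actual config format or parser: the OBSERVATION_CONFIG layout, the parse_domain function, and the sample document are all hypothetical and only illustrate how a per-domain mapping of XPaths to OMOP columns can drive a generic parser.

```python
import xml.etree.ElementTree as ET

# C-CDA elements live in the HL7 v3 namespace.
NS = {"hl7": "urn:hl7-org:v3"}

# Tiny stand-in for a C-CDA document (real documents are far larger).
SAMPLE_CCDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody><component><section>
    <entry>
      <observation classCode="OBS" moodCode="EVN">
        <code code="8302-2" codeSystem="2.16.840.1.113883.6.1"/>
        <effectiveTime value="20200101"/>
        <value value="180" unit="cm"/>
      </observation>
    </entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""

# Hypothetical metadata for one domain: a root XPath that selects one
# element per output row, plus (relative XPath, attribute) per OMOP column.
OBSERVATION_CONFIG = {
    "root": ".//hl7:entry/hl7:observation",
    "fields": {
        "observation_source_value": ("hl7:code", "code"),
        "observation_date": ("hl7:effectiveTime", "value"),
        "value_as_string": ("hl7:value", "value"),
        "unit_source_value": ("hl7:value", "unit"),
    },
}


def parse_domain(xml_text, config):
    """Return one dict per element matched by config['root'].

    The function knows nothing about any specific OMOP domain; all
    domain knowledge comes from the config argument.
    """
    document = ET.fromstring(xml_text)
    rows = []
    for node in document.findall(config["root"], NS):
        row = {}
        for column, (path, attribute) in config["fields"].items():
            element = node.find(path, NS)
            # Missing elements or attributes become None rather than errors.
            row[column] = element.get(attribute) if element is not None else None
        rows.append(row)
    return rows


rows = parse_domain(SAMPLE_CCDA, OBSERVATION_CONFIG)
print(rows)
```

In this sketch, supporting another domain means writing another dict shaped like OBSERVATION_CONFIG and passing it to the same parse_domain function. Values are returned as raw strings; date parsing and concept mapping are deliberately left out.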

Key modules

  • data_driven_parse — XPath-based C-CDA XML parser driven by metadata configs

  • layer_datasets — orchestrates per-file, per-config parsing into pandas DataFrames

  • value_transformations — concept code mapping and value normalization

  • visit_reconciliation — links clinical events to visit_occurrence records

  • ddl — OMOP table definitions and domain→table name mappings

  • domain_dataframe_column_types — pandas dtype specifications per OMOP table

  • util — shared helpers (codemap loading, etc.)
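
As an illustration of what linking clinical events to visit_occurrence records can involve, the sketch below matches each event to a visit for the same person whose date range contains the event date. This matching rule is an assumption made for the example, not a description of how visit_reconciliation actually works; the function name link_events_to_visits and the event_date field are hypothetical, and plain dicts stand in for the pandas DataFrames the package uses.

```python
from datetime import date


def link_events_to_visits(events, visits):
    """Return copies of events with a visit_occurrence_id added.

    Assumed rule: an event belongs to the first visit with the same
    person_id whose start/end dates contain the event date. Events
    with no such visit get visit_occurrence_id = None.
    """
    linked = []
    for event in events:
        match = None
        for visit in visits:
            same_person = visit["person_id"] == event["person_id"]
            in_range = (
                visit["visit_start_date"]
                <= event["event_date"]
                <= visit["visit_end_date"]
            )
            if same_person and in_range:
                match = visit["visit_occurrence_id"]
                break
        linked.append({**event, "visit_occurrence_id": match})
    return linked


# Made-up rows for illustration only.
visits = [
    {"visit_occurrence_id": 10, "person_id": 1,
     "visit_start_date": date(2020, 1, 1), "visit_end_date": date(2020, 1, 3)},
    {"visit_occurrence_id": 11, "person_id": 2,
     "visit_start_date": date(2020, 2, 1), "visit_end_date": date(2020, 2, 1)},
]
events = [
    {"person_id": 1, "event_date": date(2020, 1, 2)},   # inside visit 10
    {"person_id": 1, "event_date": date(2020, 3, 15)},  # no covering visit
    {"person_id": 2, "event_date": date(2020, 2, 1)},   # inside visit 11
]

linked = link_events_to_visits(events, visits)
for row in linked:
    print(row["person_id"], row["event_date"], row["visit_occurrence_id"])
```

A real implementation would also have to decide what to do with overlapping visits and with events that carry a time range rather than a single date; the sketch simply takes the first match.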

Installation

git clone https://github.com/croeder/CCDA_OMOP_Conversion_Package.git
cd CCDA_OMOP_Conversion_Package
python -m venv env
source env/bin/activate
pip install -r requirements.txt

Usage

From the repository root:

bin/process.sh