ccda_to_omop.util module

Shared utility functions and type aliases.

Covers: codemap CSV loading (create_codemap_dict_from_csv), date/datetime parsing and casting, and the CodemapDict type alias used throughout the conversion pipeline.

ccda_to_omop.util.cast_to_date(string_value: str) → date | None

Parse a date string and return a datetime.date.

TODO: Does CCDA always use YYYYMMDD? The definition at https://build.fhir.org/ig/HL7/CDA-ccda/StructureDefinition-USRealmDateTimeInterval-definitions.html says YYYYMMDD, but its examples show ISO 8601. This function should use a regex and detect parse failure.

TODO: When is a value a date and when is it a datetime?
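The regex approach suggested in the TODO could look roughly like the sketch below. It is not the module's implementation: parse_ccda_date is a hypothetical helper, and the choice to accept both a YYYYMMDD prefix and a YYYY-MM-DD prefix is an assumption based on the two formats the TODO mentions.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical patterns: a compact YYYYMMDD prefix and an ISO 8601 YYYY-MM-DD prefix.
_COMPACT_DATE = re.compile(r"^(\d{4})(\d{2})(\d{2})")
_ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def parse_ccda_date(string_value: str) -> Optional[date]:
    """Return a date for either supported prefix, or None if parsing fails."""
    if not isinstance(string_value, str):
        return None
    text = string_value.strip()
    match = _ISO_DATE.match(text) or _COMPACT_DATE.match(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() rejects impossible values such as month 13.
        return date(year, month, day)
    except ValueError:
        return None
```

Any time portion after the date prefix is ignored here, which sidesteps rather than answers the date-versus-datetime question raised above.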

ccda_to_omop.util.cast_to_datetime(string_value: str) → datetime | None

Parse a datetime string and return a datetime.datetime, or None on failure.
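As an illustration of the "None on failure" contract, here is a minimal sketch of a datetime parser. It assumes the input is either a compact HL7-style timestamp (YYYYMMDD, YYYYMMDDHHMM, or YYYYMMDDHHMMSS, optionally followed by a fraction and a +/-HHMM offset) or an ISO 8601 string. parse_ccda_datetime is a hypothetical name, and the accepted formats are assumptions rather than the module's documented behavior.

```python
import re
from datetime import datetime
from typing import Optional

# Compact timestamp: digits, optional fractional seconds, optional +/-HHMM offset.
_COMPACT_TS = re.compile(r"(\d+)(?:\.\d+)?([+-]\d{4})?")
# Picking the format by digit count avoids ambiguous strptime matches
# (for example, 12 digits being read as single-digit minutes and seconds).
_FORMAT_BY_LENGTH = {8: "%Y%m%d", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def parse_ccda_datetime(string_value: str) -> Optional[datetime]:
    """Return a datetime for the assumed formats, or None on failure."""
    if not isinstance(string_value, str):
        return None
    text = string_value.strip()
    match = _COMPACT_TS.fullmatch(text)
    try:
        if match is not None:
            digits, offset = match.groups()
            fmt = _FORMAT_BY_LENGTH.get(len(digits))
            if fmt is None:
                return None
            if offset:
                return datetime.strptime(digits + offset, fmt + "%z")
            # Fractional seconds, if present, are dropped in this sketch.
            return datetime.strptime(digits, fmt)
        # Fall back to ISO 8601, e.g. 2020-01-15T10:30:00.
        return datetime.fromisoformat(text)
    except ValueError:
        return None
```

Values with an offset come back timezone-aware and values without one come back naive, so a caller that mixes the two would need to normalize them before comparing.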

ccda_to_omop.util.create_codemap_dict(codemap_df: DataFrame) → dict[tuple[str, str], list[dict[str, int | str]]]

Create a dictionary mapping each (code_system, code) key to a list of {source_concept_id: n, target_domain_id: m, target_concept_id: o} entries, built from a Spark DataFrame.

ccda_to_omop.util.create_codemap_dict_from_csv(map_csv_filepath: str) → dict[tuple[str, str], list[dict[str, int | str]]]

Create a dictionary mapping each (code_system, code) key to a list of {source_concept_id: n, target_domain_id: m, target_concept_id: o} entries, built from a CSV file with these columns:

OID, code, codeSystem, target_id, target_domain, source_concept_id
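To make the CSV-to-dictionary shape concrete, the sketch below builds a comparable structure with the standard csv module. It is not the module's code. load_codemap is a hypothetical helper that reads from an open file object rather than a path, and the sample rows are made-up placeholders. Using the OID column as the code_system half of the key is an assumption, so it is exposed as a parameter.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple, Union

# Local alias mirroring the documented return annotation.
CodemapDict = Dict[Tuple[str, str], List[Dict[str, Union[int, str]]]]


def load_codemap(csv_file: TextIO, system_column: str = "OID") -> CodemapDict:
    """Group CSV rows into lists keyed by (code system, code)."""
    grouped = defaultdict(list)
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    for row in reader:
        # Assumption: system_column holds the code_system part of the key.
        key = (row[system_column], row["code"])
        grouped[key].append(
            {
                "source_concept_id": int(row["source_concept_id"]),
                "target_domain_id": row["target_domain"],
                "target_concept_id": int(row["target_id"]),
            }
        )
    return dict(grouped)


# Made-up placeholder rows; two rows share a key to show the list values.
sample = io.StringIO(
    "OID, code, codeSystem, target_id, target_domain, source_concept_id\n"
    "1.2.3, A1, DEMO, 101, Condition, 11\n"
    "1.2.3, A1, DEMO, 102, Observation, 11\n"
    "1.2.3, B2, DEMO, 201, Measurement, 22\n"
)
codemap = load_codemap(sample)
```

Grouping into lists matters because a single (code system, code) pair can map to more than one target concept.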

ccda_to_omop.util.logger = <Logger ccda_to_omop.util (WARNING)>

These three functions create dictionaries from the vocabulary crosswalk (xwalk) pandas DataFrames. Given a vocabulary and a code, each dictionary provides source_concept_id, target_domain_id, and target_concept_id by returning a row-like dictionary with those field names as keys. The columns in the source datasets differ, so read carefully: only the codemap provides source_concept_id; the others provide just the two target fields.

Each key may have more than one value:

    {
        (vocab, code): [
            {
                'source_concept_id': None,
                'target_domain_id': row['target_domain_id'],
                'target_concept_id': row['target_concept_id']
            }
        ]
    }
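Because each key maps to a list, callers need to handle zero, one, or several entries. The sketch below shows one way to do that against a dictionary of the shape above. The helper names and the sample entries are hypothetical placeholders, not part of the module.

```python
from typing import Dict, List, Tuple, Union

Entry = Dict[str, Union[int, str, None]]
Mapping = Dict[Tuple[str, str], List[Entry]]

# Made-up placeholder entries in the shape described above.
codemap: Mapping = {
    ("vocab-x", "code-1"): [
        {"source_concept_id": None, "target_domain_id": "Condition", "target_concept_id": 101},
        {"source_concept_id": None, "target_domain_id": "Observation", "target_concept_id": 202},
    ],
}


def lookup_entries(mapping: Mapping, vocab: str, code: str) -> List[Entry]:
    """Return every entry for (vocab, code), or an empty list if unmapped."""
    return mapping.get((vocab, code), [])


def target_concept_ids(mapping: Mapping, vocab: str, code: str) -> list:
    """Collect target_concept_id from each matching entry."""
    return [entry["target_concept_id"] for entry in lookup_entries(mapping, vocab, code)]


ids = target_concept_ids(codemap, "vocab-x", "code-1")
missing = target_concept_ids(codemap, "vocab-x", "no-such-code")
```

Returning an empty list for an unmapped key lets the caller iterate without a separate None check.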