ccda_to_omop.util module
Shared utility functions and type aliases.
Covers: codemap CSV loading (create_codemap_dict_from_csv), date/datetime
parsing and casting, and the CodemapDict type alias used throughout the
conversion pipeline.
- ccda_to_omop.util.cast_to_date(string_value: str) → date | None
Parse a date string and return a datetime.date.
TODO: Does CCDA always use YYYYMMDD? The https://build.fhir.org/ig/HL7/CDA-ccda/StructureDefinition-USRealmDateTimeInterval-definitions.html documentation says YYYYMMDD, but its examples show ISO-8601. This should use a regex and detect parse failures. TODO: When is a value a date, and when is it a datetime?
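A minimal sketch of the regex-based parsing the TODO suggests, assuming the input is either the compact YYYYMMDD form or ISO-8601 (YYYY-MM-DD), optionally followed by a time that is ignored. The name cast_to_date_sketch is hypothetical; this is not the module's current implementation. The backreference \2 forces both separators to agree, so a mixed form such as 2023-0115 is rejected, and an impossible date such as 20230230 returns None instead of raising. Reduced-precision values such as 2023 or 202301 are also rejected by this sketch.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern: YYYYMMDD or YYYY-MM-DD at the start of the string.
# Group 2 captures the separator ("" or "-"); \2 requires the same one again.
# Anything after the date (a time or UTC offset) is ignored.
_DATE_RE = re.compile(r"^(\d{4})(-?)(\d{2})\2(\d{2})")


def cast_to_date_sketch(string_value: Optional[str]) -> Optional[date]:
    """Return a datetime.date, or None when the value cannot be parsed."""
    if not string_value:
        return None
    match = _DATE_RE.match(string_value.strip())
    if match is None:
        return None  # not a recognised date layout
    year, month, day = int(match.group(1)), int(match.group(3)), int(match.group(4))
    try:
        return date(year, month, day)
    except ValueError:
        return None  # layout matched but the date is impossible
```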
- ccda_to_omop.util.cast_to_datetime(string_value: str) → datetime | None
Parse a datetime string and return a datetime.datetime, or None on failure.
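One way to parse both forms into a datetime.datetime, sketched under assumptions: the compact HL7 form YYYYMMDD[HHMM[SS]][+/-ZZZZ] is matched with fixed-width regex groups, and anything else falls back to datetime.fromisoformat. The name cast_to_datetime_sketch is hypothetical. Fixed widths are used instead of datetime.strptime because strptime's %M and %S directives also accept a single digit, so a compact string such as 202301151030 could be split in an unintended way. Fractional seconds in the compact form are not handled and yield None.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical pattern for the compact form YYYYMMDD[HHMM[SS]][+/-ZZZZ].
_HL7_TS_RE = re.compile(
    r"^(\d{4})(\d{2})(\d{2})"       # date, required
    r"(?:(\d{2})(\d{2})(\d{2})?)?"  # optional HHMM, then optional SS
    r"([+-]\d{4})?$"                # optional UTC offset such as -0500
)


def cast_to_datetime_sketch(string_value: Optional[str]) -> Optional[datetime]:
    """Return a datetime.datetime, or None when the value cannot be parsed."""
    if not string_value:
        return None
    text = string_value.strip()
    match = _HL7_TS_RE.match(text)
    if match:
        year, month, day, hour, minute, second, offset = match.groups()
        try:
            tzinfo = None
            if offset:
                sign = -1 if offset[0] == "-" else 1
                delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
                tzinfo = timezone(sign * delta)  # raises ValueError if out of range
            return datetime(
                int(year), int(month), int(day),
                int(hour or 0), int(minute or 0), int(second or 0),
                tzinfo=tzinfo,
            )
        except ValueError:
            return None  # e.g. month 13 or hour 25
    try:
        # Fallback for ISO-8601 strings such as 2023-01-15T10:30:00.
        return datetime.fromisoformat(text)
    except ValueError:
        return None
```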
- ccda_to_omop.util.create_codemap_dict(codemap_df: DataFrame) → dict[tuple[str, str], list[dict[str, int | str]]]
Create a dictionary mapping (code_system, code) -> [{source_concept_id: n, target_domain_id: m, target_concept_id: o}, ...] from a Spark DataFrame.
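The grouping this function performs can be sketched in plain Python over an iterable of row mappings, which keeps the example independent of any DataFrame library. The column names code_system, code, source_concept_id, target_domain_id and target_concept_id are assumptions, as is the helper name build_codemap_from_rows. With pandas, the rows could come from df.to_dict(orient="records"); with Spark, from [row.asDict() for row in df.collect()].

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, List, Mapping, Tuple

# Hypothetical alias mirroring the documented return type.
CodemapDict = Dict[Tuple[str, str], List[Dict[str, Any]]]


def build_codemap_from_rows(rows: Iterable[Mapping[str, Any]]) -> CodemapDict:
    """Group row mappings by (code_system, code); hypothetical helper."""
    codemap: DefaultDict[Tuple[str, str], List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # ASSUMPTION: these column names exist in the codemap rows.
        key = (row["code_system"], row["code"])
        # Append rather than assign so one key can keep several mappings.
        codemap[key].append(
            {
                "source_concept_id": row["source_concept_id"],
                "target_domain_id": row["target_domain_id"],
                "target_concept_id": row["target_concept_id"],
            }
        )
    # A plain dict makes a missing key raise KeyError instead of
    # silently creating an empty list.
    return dict(codemap)
```

Appending to a list rather than overwriting a single value is what lets one (code_system, code) key carry mappings into more than one target domain.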
- ccda_to_omop.util.create_codemap_dict_from_csv(map_csv_filepath: str) → dict[tuple[str, str], list[dict[str, int | str]]]
Create a dictionary mapping (code_system, code) -> [{source_concept_id: n, target_domain_id: m, target_concept_id: o}, ...] from a CSV file with these columns:
OID, code, codeSystem, target_id, target_domain, source_concept_id
- ccda_to_omop.util.logger = <Logger ccda_to_omop.util (WARNING)>
These three functions create dictionaries from the vocabulary crosswalk (xwalk) pandas DataFrames. Given a vocabulary and a code, each dictionary provides source_concept_id, target_domain_id, and target_concept_id by returning a row-like dictionary with those field names as keys. The columns in the source datasets differ, so read them carefully: only the codemap provides source_concept_id; the others provide just the two target fields.
Each key may have more than one value:
    {
        (vocab, code): [
            {
                'source_concept_id': None,
                'target_domain_id': row['target_domain_id'],
                'target_concept_id': row['target_concept_id'],
            },
        ],
    }
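For the crosswalks that lack a source concept, a structure of this shape can be built and queried as in the sketch below. The input column names vocab and code, and the helper names xwalk_dict_from_rows and lookup_candidates, are hypothetical. Because a key may map to several entries, the lookup returns the whole list (or an empty list for an unknown key), leaving the choice among candidates to the caller.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, List, Mapping, Tuple

Entry = Dict[str, Any]


def xwalk_dict_from_rows(
    rows: Iterable[Mapping[str, Any]],
) -> Dict[Tuple[str, str], List[Entry]]:
    """Build (vocab, code) -> [entry, ...] for a crosswalk without source concepts."""
    result: DefaultDict[Tuple[str, str], List[Entry]] = defaultdict(list)
    for row in rows:
        # ASSUMPTION: hypothetical column names "vocab" and "code".
        result[(row["vocab"], row["code"])].append(
            {
                "source_concept_id": None,  # only the codemap supplies this
                "target_domain_id": row["target_domain_id"],
                "target_concept_id": row["target_concept_id"],
            }
        )
    return dict(result)


def lookup_candidates(
    mapping: Dict[Tuple[str, str], List[Entry]], vocab: str, code: str
) -> List[Entry]:
    """Return every entry for (vocab, code), or [] when the key is absent."""
    return mapping.get((vocab, code), [])
```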