Automatic Outlier Detection in Laboratory Result Distributions Within a Real World Data Network.

Laboratory data must be interoperable so that the results of a lab test can be compared accurately between healthcare organizations. To achieve this, terminologies such as LOINC (Logical Observation Identifiers, Names and Codes) provide unique identification codes for laboratory tests. Once standardized, the numeric results of laboratory tests can be aggregated and represented in histograms. Because of the characteristics of Real World Data (RWD), outliers and abnormal values are common, but these cases should be treated as exceptions and excluded from analysis. This work analyses two methods for automating the selection of histogram limits to sanitize the generated lab test result distributions, Tukey's box-plot method and a "Distance to Density" approach, within the TriNetX Real World Data Network. With clinical RWD, the generated limits are generally wider for Tukey's method and narrower for the "Distance to Density" approach, and both depend strongly on the values chosen for the algorithms' parameters.
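
As an illustration of the first method only, the following is a minimal sketch of Tukey's box-plot fences used as histogram limits. It is not the authors' implementation: the function names, the sample values, and the default multiplier k = 1.5 (the conventional choice for Tukey's fences) are assumptions made for this example, and the abstract does not state which parameter values were used. The "Distance to Density" approach is not sketched because the abstract does not describe how it works.

```python
from statistics import quantiles


def tukey_limits(values, k=1.5):
    """Return (lower, upper) Tukey fences for a list of numeric results.

    k is the fence multiplier; 1.5 is the conventional default and is an
    assumption here, not a value taken from the paper.
    """
    # Quartiles of the observed values (inclusive method: data treated
    # as the full population rather than a sample).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def trim_to_limits(values, k=1.5):
    """Keep only the values that fall inside the Tukey fences."""
    lower, upper = tukey_limits(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical numeric results for one lab test, with one implausible value.
results = [4.1, 4.3, 4.4, 4.6, 4.8, 5.0, 5.1, 5.3, 48.0]

lower, upper = tukey_limits(results)
kept = trim_to_limits(results)
print(f"limits: [{lower:.2f}, {upper:.2f}], kept {len(kept)} of {len(results)}")
```

Raising k widens the fences and lowering it narrows them, which is one concrete way in which the resulting limits depend on the parameter values.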

Studies in health technology and informatics. 2023 May;302:88-92.

ISSN 1879-8365

Authors: Aída Muñoz Monjas, David Rubio Ruiz, David Pérez-Rey, Matvey Palchuk

PMID 37203615
