Endocrine Abstracts (2023) 90 P95 | DOI: 10.1530/endoabs.90.P95

ECE2023 Poster Presentations Diabetes, Obesity, Metabolism and Nutrition (159 abstracts)

Machine learning-derived low-density lipoprotein cholesterol (LDL-C) estimation agrees better with directly measured LDL-C than conventional equations in individuals with type 2 diabetes mellitus.

Gerald Sng 1, You Liang Khoo 2, Hong Chang Tan 1 & Yong Mong Bee 1

1 Singapore General Hospital, Department of Endocrinology, Singapore, Singapore; 2 Singapore General Hospital, Health Services Research Unit, Singapore, Singapore


Introduction: Elevated low-density lipoprotein cholesterol (LDL-C) is an important risk factor for atherosclerotic cardiovascular disease (ASCVD). Because direct LDL-C measurement is not widely performed, LDL-C is typically estimated using the Friedewald (FLDL), Martin-Hopkins (MLDL) or Sampson (SLDL) equations, which may be inaccurate at high triglyceride (TG) or low LDL-C levels. We aimed to determine whether machine learning (ML)-derived LDL-C levels agree better with direct LDL-C than conventional equations in patients with type 2 diabetes mellitus (T2DM).
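For orientation, two of the conventional equations named above can be sketched in a few lines. This is an illustration only, not the authors' code: the function names and unit handling are assumptions, the Friedewald and Sampson coefficients are the commonly published ones, and the Martin-Hopkins equation is omitted because it relies on a lookup table of TG:VLDL-C factors that is not reproduced here.

```python
# Conversion factors from mmol/L to mg/dL (commonly used values).
CHOL_MGDL_PER_MMOL = 38.67  # total, HDL and LDL cholesterol
TG_MGDL_PER_MMOL = 88.57    # triglycerides


def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate; inputs and output in mmol/L.

    VLDL-C is approximated as TG / 2.2 (equivalent to TG / 5 in mg/dL).
    """
    return tc - hdl - tg / 2.2


def sampson_ldl(tc, hdl, tg):
    """Sampson estimate; inputs and output in mmol/L.

    The published coefficients are expressed in mg/dL, so the inputs are
    converted first and the result is converted back.
    """
    tc_mg = tc * CHOL_MGDL_PER_MMOL
    hdl_mg = hdl * CHOL_MGDL_PER_MMOL
    tg_mg = tg * TG_MGDL_PER_MMOL
    non_hdl_mg = tc_mg - hdl_mg
    ldl_mg = (
        tc_mg / 0.948
        - hdl_mg / 0.971
        - (tg_mg / 8.56 + tg_mg * non_hdl_mg / 2140 - tg_mg ** 2 / 16100)
        - 9.44
    )
    return ldl_mg / CHOL_MGDL_PER_MMOL


# Hypothetical lipid panel (mmol/L), for illustration only.
print(friedewald_ldl(tc=5.0, hdl=1.2, tg=1.5))
print(sampson_ldl(tc=5.0, hdl=1.2, tg=1.5))
```

Both functions take total cholesterol, HDL-C and TG as their only inputs, which is why their error grows when the fixed TG-to-VLDL-C relationship they assume no longer holds.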

Methods: We performed a retrospective cohort study of patients with T2DM from a multi-institutional diabetes registry in Singapore, covering 2013 to 2020. Directly measured LDL-C values were compared, using measures of agreement and correlation, against LDL-C values estimated by the FLDL, MLDL and SLDL equations and by ML models based on linear regression (LR), random forest (RF) and k-nearest neighbours (KNN). Values were considered discordant if estimated LDL-C was <1.8 mmol/l but directly measured LDL-C was ≥1.8 mmol/l, as this might lead to undertreatment in a real-world setting. Training and testing were repeated on the subset of patients with TG values >4.5 mmol/l.
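The discordance rule described above can be written as a short check. The sketch below assumes that the rate is expressed as a percentage of all paired results; the abstract does not state the denominator, and the function names are illustrative.

```python
LDL_THRESHOLD = 1.8  # mmol/L, the cut-off stated in the Methods


def is_discordant(estimated, direct, threshold=LDL_THRESHOLD):
    """True when the estimate falls below the threshold but the direct
    measurement does not, i.e. the case that might lead to undertreatment."""
    return estimated < threshold and direct >= threshold


def discordance_rate(estimated_values, direct_values, threshold=LDL_THRESHOLD):
    """Percentage of paired results that are discordant.

    Assumption: the denominator is every paired result supplied.
    """
    pairs = list(zip(estimated_values, direct_values))
    if not pairs:
        raise ValueError("no paired values supplied")
    n_discordant = sum(is_discordant(e, d, threshold) for e, d in pairs)
    return 100.0 * n_discordant / len(pairs)


# Hypothetical values (mmol/L): only the first pair is discordant.
estimated = [1.6, 1.9, 1.7, 2.4]
direct = [1.9, 2.0, 1.5, 2.5]
print(discordance_rate(estimated, direct))  # 25.0
```

Note that the rule is one-sided: an estimate that is too high relative to the direct measurement is not counted, because it would not lead to undertreatment.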

Results: A total of 11,475 patients with 39,417 unique sets of lipid panel results were included in the final analysis; 31,533 sets were used for training and 7,884 for testing. All three ML models demonstrated better goodness-of-fit (lower root-mean-square error, RMSE) and stronger correlation (higher R² and r values) than any of the conventional equations. Of the three ML models, LR performed least well (RMSE 0.231, R² 0.954 and r 0.977, P<0.001) compared with RF (RMSE 0.209, R² 0.962 and r 0.981, P<0.001) and KNN (RMSE 0.212, R² 0.961 and r 0.98, P<0.001). All three ML methods had much lower discordance rates (LR 2.17%, RF 2.18%, KNN 2.04%) than the conventional equations (FLDL 23.14%, SLDL 17.90%, MLDL 14.22%). ML methods performed less well in the subset of patients with TG >4.5 mmol/l, although all three models still demonstrated better goodness-of-fit and correlation. Discordance rates were also lower (LR 3.69%, RF 3.69%, KNN 2.30%), although the MLDL equation had the lowest discordance rate in this subgroup (1.84%).
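The agreement and correlation measures reported above can be computed as in the sketch below. The abstract does not say which definition of R² was used; the version here is the coefficient of determination, whereas the reported R² values are close to the squares of the reported r values (for example 0.981² ≈ 0.962), which would also fit a squared-correlation definition. Function names and the sample values are illustrative.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between estimated and directly measured values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical estimated and direct LDL-C values (mmol/L).
estimated = [2.0, 3.0, 4.0]
direct = [2.1, 2.9, 4.0]
print(rmse(estimated, direct), r_squared(estimated, direct), pearson_r(estimated, direct))
```

A lower RMSE indicates closer agreement in absolute terms, while r only measures how well the two sets of values track each other, so the two can diverge when an estimator is systematically biased.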

Discussion: Conventional LDL-C estimation equations have known limitations and are reported to perform poorly at high TG levels. ML methods may offer a more accurate alternative for estimating LDL-C and may reduce misclassification and undertreatment in patients with T2DM at high ASCVD risk.

Volume 90

25th European Congress of Endocrinology

Istanbul, Turkey
13 May 2023 - 16 May 2023

European Society of Endocrinology 
