A Data-Driven Comparative Analysis of Machine-Learning Models for Familial Hypercholesterolemia Detection
Abstract
This study assesses the probability of familial hypercholesterolemia (FH) using several algorithms (CatBoost, XGBoost, Random Forest, SVM) and their ensembles, trained on electronic health record data. The primary objective is to explore a method for estimating FH probability that outperforms the currently recommended Dutch Lipid Clinic Network (DLCN) score. The models were trained on the largest Polish cohort of patients enrolled in an FH clinic, all of whom underwent genetic testing for FH-associated mutations. The initial dataset comprised over 100 parameters per patient. These were reduced to 48 clinically accessible features so that the models remain applicable in routine outpatient settings. To preserve balance, the data were stratified by DLCN score range (0–2, 3–5, 6–8, and ≥9), representing increasing levels of FH likelihood. The dataset was then split into training and test sets in an 80/20 ratio. The machine-learning models were trained with hyperparameters optimized via grid search. The accuracy of the DLCN score in predicting FH was first evaluated as the proportion of patients with a positive DNA test among those with a DLCN score of 6 or above, which is the threshold for genetic testing. The DLCN score achieved an accuracy of approximately 40%. In contrast, the CatBoost model and its ensembles achieved over 80% accuracy. While the DLCN score remains a clinically valuable tool, its diagnostic accuracy is limited. The findings indicate that ML models offer a substantial improvement in the precision of FH diagnosis, demonstrating their potential to enhance clinical decision-making in identifying patients with FH.
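The stratified 80/20 split described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's pipeline. The record layout (a dict with a hypothetical `dlcn` key), the fixed random seed and the per-stratum rounding are choices made here for illustration only. Each DLCN stratum is shuffled and split separately, so every score range keeps roughly the same share in the training and test sets.

```python
import random
from collections import defaultdict


def dlcn_stratum(score: int) -> str:
    """Map a DLCN score to one of the four ranges named in the abstract."""
    if score <= 2:
        return "0-2"
    if score <= 5:
        return "3-5"
    if score <= 8:
        return "6-8"
    return ">=9"


def stratified_split(records, score_key="dlcn", test_fraction=0.2, seed=42):
    """Split records into train/test lists, stratified by DLCN range.

    Assumption: each record is a dict holding its DLCN score under
    `score_key` (a hypothetical field name chosen for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = defaultdict(list)
    for rec in records:
        strata[dlcn_stratum(rec[score_key])].append(rec)

    train, test = [], []
    for members in strata.values():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)  # per-stratum test size
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Synthetic example: 100 patients whose scores cycle through 0..12.
patients = [{"id": i, "dlcn": i % 13} for i in range(100)]
train, test = stratified_split(patients)
print(len(train), len(test))  # 79 21
```

With these synthetic scores the strata hold 24, 24, 24 and 28 patients, so 5, 5, 5 and 6 of them go to the test set. In a scikit-learn workflow, `train_test_split` with its `stratify` argument (passed the stratum labels) does the same job.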
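The DLCN baseline reported in the abstract is a single ratio: among patients at or above the genetic-testing threshold (score ≥ 6), the share with a positive DNA test. This corresponds to the positive predictive value (precision) of the rule "DLCN ≥ 6 implies FH". The sketch below assumes each patient is a dict with hypothetical `dlcn` and `dna_positive` fields. The toy records are invented for illustration and are not study data.

```python
def dlcn_threshold_accuracy(patients, threshold=6):
    """Return the share of patients with DLCN >= threshold whose DNA test was positive.

    This equals the positive predictive value of the rule
    "DLCN >= threshold => FH". Field names are hypothetical.
    """
    flagged = [p for p in patients if p["dlcn"] >= threshold]
    if not flagged:
        raise ValueError("no patients at or above the DLCN threshold")
    positives = sum(1 for p in flagged if p["dna_positive"])
    return positives / len(flagged)


# Toy records (invented, not study data).
toy = [
    {"dlcn": 3, "dna_positive": True},   # below threshold: not counted
    {"dlcn": 5, "dna_positive": False},  # below threshold: not counted
    {"dlcn": 6, "dna_positive": False},
    {"dlcn": 7, "dna_positive": True},
    {"dlcn": 9, "dna_positive": False},
    {"dlcn": 12, "dna_positive": False},
]
print(dlcn_threshold_accuracy(toy))  # 0.25
```

Patients below the threshold, including genetically positive ones, never enter this ratio, so it says nothing about FH cases the score misses.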
Full text
The full text of the publication is not available in the portal.
Details
- Category: Journal publication
- Type: journal articles
- Published in: Applied Sciences-Basel, no. 14, ISSN: 2076-3417
- Language: English
- Year of publication: 2024
- Bibliographic description: Kocejko T.: A Data-Driven Comparative Analysis of Machine-Learning Models for Familial Hypercholesterolemia Detection // Applied Sciences-Basel, iss. 14 (2024)
- DOI: 10.3390/app142311187
- Funding sources: Statutory activity/subsidy
- Verification: Politechnika Gdańska (Gdańsk University of Technology)
Publications you may be interested in
Personalized prediction of the secondary oocytes number after ovarian stimulation: A machine learning model based on clinical and genetic data
- K. Zieliński,
- S. Pukszta,
- M. Mickiewicz
- + 7 authors
Deep CNN based decision support system for detection and assessing the stage of diabetic retinopathy
- A. Kwasigroch,
- B. Jarzembinski,
- M. Grochowski