Data Balancing for Anemia Type Classification Using SMOTE and Development of a Streamlit Diagnostic Tool
Abstract
Anemia is a blood disorder characterized by low hemoglobin levels, which reduces the oxygen supply to the body. Manual diagnosis from a Complete Blood Count (CBC) is often time-consuming, especially when differentiating between similar types of anemia. This study aims to improve the accuracy of anemia type classification by applying the Synthetic Minority Oversampling Technique (SMOTE) to an imbalanced anemia dataset of 1,281 samples with 14 predictors and 9 classes. The primary model is random forest, a bagging ensemble, with XGBoost, a boosting ensemble, as a comparator. Both models are evaluated with accuracy, precision, recall, F1-score, and ROC AUC, before and after SMOTE is applied. SMOTE is expected to improve performance on the minority classes, especially Leukemia with Thrombocytopenia. The best-performing model is then deployed in a Streamlit-based application that predicts the anemia type automatically from CBC values. The study makes an academic contribution through improved classification performance and a practical contribution in the form of a simple diagnostic tool.
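To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbors. It is an illustration only, not the pipeline used in this study, which does not state its implementation here. The function name `smote_oversample`, the toy feature values, and the parameter choices are all assumptions.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class rows.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric feature vectors from a single class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class with two made-up features (values are not study data).
minority_rows = [[7.1, 62.0], [6.8, 60.5], [7.4, 64.2], [6.5, 59.0]]
new_rows = smote_oversample(minority_rows, n_new=6, k=3, seed=42)
```

In a workflow like the one described, oversampling would normally be applied to the training split only, so that the test set keeps the original class distribution and the reported metrics are not inflated by synthetic samples.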