OUCH: Oversampling and Undersampling Cannot Help Improve Accuracy in Our Bayesian Classifiers That Predict Preeclampsia

  • Franklin Parrales-Bravo
  • Rosangela Caicedo-Quiroz
  • Elena Tolozano-Benitez
  • Víctor Gómez-Rodríguez
  • Lorenzo Cevallos-Torres
  • Jorge Charco-Aguirre
  • Leonel Vasquez-Cevallos

  • Universidad de Guayaquil
  • Universidad Bolivariana del Ecuador
  • Instituto Superior Tecnológico Urdesa (ITSU)

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Imbalanced data can affect the machine learning (ML) algorithms that build predictive models. This manuscript studies how oversampling and undersampling strategies influence the learning of Bayesian classification models that predict the risk of preeclampsia. Given the properties of our dataset, only oversampling and undersampling methods that handle both numerical and categorical attributes are considered: the synthetic minority oversampling technique for nominal and continuous data (SMOTE-NC), SMOTE-Encoded Nominal and Continuous (SMOTE-ENC), random oversampling examples (ROSE), random undersampling examples (UNDER), and random oversampling techniques (OVER). According to the results, balancing the classes in the training dataset does not improve the accuracy percentages. However, on the test dataset, the models built on a balanced training dataset classified both positive and negative cases of preeclampsia accurately, whereas the models built on the imbalanced training dataset were poor at detecting positive cases. We conclude that although oversampling and undersampling techniques can address imbalance in the training dataset before prediction models are built, they do not always guarantee an improvement in model accuracy. Even so, in binary classification problems such as the one addressed in this manuscript, the sensitivity and specificity percentages improve in most cases.
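
The difference between accuracy and sensitivity/specificity described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The function names (`random_oversample`, `random_undersample`, `binary_metrics`) and the toy 90/10 class split are assumptions made only for illustration. Only plain random oversampling and undersampling are shown, not SMOTE-NC, SMOTE-ENC, or ROSE, and no Bayesian classifier is trained.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):  # sampling without replacement
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on positives) and specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy data: 90 negative cases (0) and 10 positive cases (1).
    rows = list(range(100))
    labels = [0] * 90 + [1] * 10

    _, over_labels = random_oversample(rows, labels)
    _, under_labels = random_undersample(rows, labels)
    print("original:", Counter(labels))
    print("oversampled:", Counter(over_labels))
    print("undersampled:", Counter(under_labels))

    # A predictor that always answers "negative" looks accurate on imbalanced data
    # while detecting no positive case at all.
    print(binary_metrics(labels, [0] * len(labels)))
```

On this toy data, the always-negative predictor reaches an accuracy of 0.9 with a sensitivity of 0.0 and a specificity of 1.0. This shows why accuracy alone can hide poor detection of the minority class.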

Original language: English
Article number: 3351
Journal: Mathematics
Volume: 12
Issue number: 21
State: Published - Nov 2024

Keywords

  • ROSE
  • SMOTE-ENC
  • SMOTE-NC
  • Bayesian network classifiers
  • class imbalance
  • oversampling
  • preeclampsia
  • undersampling
