Kafkas Üniversitesi Veteriner Fakültesi Dergisi 2024, Vol 30, Issue 1
Comparison of Some Balancing Methods for Classification of Pacing Horses Using Tree-based Machine Learning Algorithms
Hülya ÖZEN1, Doğukan ÖZEN2, Banu YÜCEER ÖZKUL3, Ceyhan ÖZBEYAZ3
1University of Health Sciences, Gulhane Faculty of Medicine, Department of Medical Informatics, TR-06018 Ankara - TÜRKİYE
2Ankara University, Faculty of Veterinary Medicine, Department of Biostatistics, TR-06070 Ankara - TÜRKİYE
3Ankara University, Faculty of Veterinary Medicine, Department of Animal Science, TR-06070 Ankara - TÜRKİYE
DOI: 10.9775/kvfd.2023.30325

Machine learning classifiers generally assume that observations are evenly distributed across classes. However, real-world data sets frequently exhibit skewed class distributions, a condition known as class imbalance, which causes classifiers to make highly biased predictions. Class balancing methods are one of several groups of methods that address the imbalanced data problem. We aimed to compare some class balancing methods for the classification of pacing horses according to their origins. The data set contains morphological traits of horses and four origin classes with different sample sizes, which leads to a multi-class imbalanced data problem. The training data set was modified with different balancing methods. C5.0, Random Forest and Extreme Gradient Boosting Machine classifiers were trained on each balanced data set. The methods were compared using performance metrics computed on the original test set. The best prediction result was obtained on the data set balanced with the random undersampling method with respect to both G-mean and Matthews Correlation Coefficient; however, the best result according to F1 score was observed on the data set balanced with the Adaptive Synthetic Sampling Approach (ADASYN). The most important variables in the best models were body length, withers height, chest circumference and rump height. The Bulgarian origin was the most accurately predicted class despite having the smallest sample size. Class balancing methods clearly improved the performance of classifiers for predicting the origins of pacing horses.

Keywords: Class balancing methods, Imbalanced data, Machine learning, Multi-class classification, Pacing horses
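
For readers unfamiliar with two of the terms used in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that random undersampling reduces every class to the size of the smallest class, and that the multi-class G-mean is the geometric mean of the per-class recalls. The function names `random_undersample` and `g_mean`, the class labels, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so that every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    # Target size: the number of samples in the rarest class.
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multi-class G-mean definition)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy, made-up training labels with an imbalanced class distribution.
train_rows = list(range(11))
train_labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 2
balanced_rows, balanced_labels = random_undersample(train_rows, train_labels)
balanced_counts = Counter(balanced_labels)

# Toy predictions, scored against true labels from an unmodified test set.
score = g_mean(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

Only the training data is balanced in this sketch; the score is computed on labels that were not resampled, mirroring the abstract's use of the original test set for evaluation. Because the G-mean multiplies per-class recalls, a classifier that ignores any one class scores zero, which is why it is often preferred to plain accuracy on imbalanced data.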