e-ISSN 2231-8526
ISSN 0128-7680
Andri, Darwin, Ng Poi Wong, Sutarman, and Erna Budhiarti Nababan
Pertanika Journal of Science & Technology, Pre-Press
DOI: https://doi.org/10.47836/jst.34.1.13
Keywords: CatBoost, complex dataset, hyperparameter tuning, imbalanced dataset, random forest
Published: 2026-02-06
Research on hyperparameter tuning of Random Forest (RF) and CatBoost on imbalanced datasets often focuses on each algorithm separately, with limited joint evaluation across a wide range of dataset complexities. The effects of hyperparameter tuning in handling pattern complexity and class distribution in real-world imbalanced datasets remain unexplored. This gap hinders the understanding of how hyperparameter optimisation can improve performance on such data, which can lead to suboptimal models and makes it difficult to select the most effective algorithm. This research evaluates the impact of RF and CatBoost hyperparameter tuning on complex and imbalanced real-world datasets, with XGBoost added as a comparative baseline. The datasets include binary and multi-class problems with varying degrees of class imbalance and feature complexity, as measured using Shannon Entropy and Coefficient of Variation (CV). Hyperparameter tuning uses Bayesian Optimisation (BO-TPE), Hyperband (HB), and Random Search (RS). Results show that datasets with high CV exhibit significant differences between accuracy and F1-score. Hyperparameter tuning of RF improved the average accuracy and F1-score on binary-class datasets, but did not have a significant impact on AUC. In contrast, tuning RF on multi-class datasets provided more consistent improvements across all three evaluation metrics. CatBoost and XGBoost tuning, on the other hand, provided consistent average improvements on all three metrics for both binary and multi-class datasets. CatBoost generally shows the best efficiency on large datasets, followed by XGBoost and RF. On small datasets, XGBoost is the most efficient, followed by RF and CatBoost.
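As a rough illustration of the two measures named in the abstract, the sketch below computes Shannon entropy and the coefficient of variation over the class counts of a label vector. It is not taken from the paper: the function name `class_distribution_stats`, the use of base-2 logarithms, the population standard deviation, and the choice to compute CV over class counts rather than over feature values are all assumptions made for the example.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def class_distribution_stats(labels):
    """Summarise how evenly the classes in `labels` are distributed.

    Hypothetical helper for illustration only; the paper may define
    these measures differently (log base, sample vs population std).
    """
    counts = list(Counter(labels).values())
    total = sum(counts)

    # Shannon entropy of the class proportions, in bits (assumed base 2).
    probs = [c / total for c in counts]
    entropy = -sum(p * math.log2(p) for p in probs)

    # Coefficient of variation of the class counts:
    # population standard deviation divided by the mean count.
    cv = pstdev(counts) / mean(counts)

    return {"n_classes": len(counts), "entropy": entropy, "cv": cv}


if __name__ == "__main__":
    balanced = [0, 1] * 50            # 50 samples per class
    imbalanced = [0] * 90 + [1] * 10  # 90/10 split
    print(class_distribution_stats(balanced))
    print(class_distribution_stats(imbalanced))
```

Under these assumptions, a perfectly balanced dataset gives the maximum entropy for its number of classes and a CV of zero, while a skewed dataset gives lower entropy and a higher CV, which is the regime where the abstract reports accuracy and F1-score diverging.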