PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY

 

e-ISSN 2231-8526
ISSN 0128-7680


An Interpretable Random Forest with SHAP Explanations for Multiclass Skill Level Classification Model in Malaysia’s Labour Market

Rabi’atul’adawiah Shabli, Ahmad Zia Ul-Saufie, Arief Gusnanto, and Nurain Ibrahim

Pertanika Journal of Science & Technology, Volume 34, Issue 2, April 2026

DOI: https://doi.org/10.47836/pjst.34.2.25

Keywords: Classification, explainable artificial intelligence, feature selection, labour market, random forest, SHAP, skill level, XGBoost feature importance

Published on: 2026-04-30

Skill level classification is essential for effective human capital interventions, wage policy formulation and long-term workforce development. However, the black-box nature of machine learning algorithms limits the transparency and interpretability that policymaking requires. This study aims to develop an accurate and interpretable multiclass skill level classification model for the Malaysian labour market. The model was constructed using secondary data from the Department of Statistics Malaysia’s 2023 Salaries and Wages Survey, covering 120,518 Malaysian workers. Data preprocessing involved cleaning, transformation and correction of class imbalance using the Synthetic Minority Oversampling Technique (SMOTE). The nineteen features with the highest XGBoost feature importance scores were retained, and a Random Forest model, chosen for its robust ensemble mechanism, was trained on them. The proposed model shows strong predictive performance across all metrics, achieving an accuracy of 0.8754, sensitivity of 0.8485, specificity of 0.9384 and F1-score of 0.8378. The most influential predictors were interpreted using SHapley Additive exPlanations (SHAP) at both global and local levels, revealing the importance of salaries and wages activity, highest certificate and education level. These features provide deeper insight into how workers are classified into the skilled, semi-skilled and low-skilled categories. The findings emphasise the value of combining high predictive accuracy with transparent feature interpretability in labour market analysis. This study is among the first labour market studies in Malaysia to integrate Random Forest with SHAP. The proposed interpretable skill level framework can be used to support evidence-based policymaking, targeted human capital interventions and workforce development planning in Malaysia.
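
The abstract reports accuracy, sensitivity, specificity and F1-score for a three-class problem but does not state how the per-class values were aggregated. As a minimal sketch, assuming one-vs-rest counts and macro-averaging (an assumption, not confirmed by the source), the four metrics could be computed from a confusion matrix as shown here. The function name `one_vs_rest_metrics` and the toy matrix are hypothetical illustrations and are not the study's data or code.

```python
def one_vs_rest_metrics(cm):
    """Macro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro-averaging is an assumption of this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    sensitivities, specificities, f1_scores = [], [], []
    for k in range(n):
        # Treat class k as "positive" and every other class as "negative".
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp

        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)

        sensitivities.append(recall)
        specificities.append(specificity)
        f1_scores.append(f1)

    return {
        "accuracy": correct / total,
        "sensitivity": sum(sensitivities) / n,
        "specificity": sum(specificities) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical 3-class matrix (rows: true, columns: predicted) in the
# order skilled, semi-skilled, low-skilled. Invented numbers for illustration.
toy_cm = [
    [50, 8, 2],
    [10, 70, 20],
    [0, 10, 30],
]
metrics = one_vs_rest_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under macro-averaging each class contributes equally regardless of its size, so a minority class that is classified poorly pulls the averages down as much as a majority class would. Specificity tends to be higher than sensitivity in multiclass settings because the one-vs-rest "negative" pool for each class is large.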

Article ID

JST-6220-2025
