e-ISSN 2231-8526
ISSN 0128-7680
Rabi’atul’adawiah Shabli, Ahmad Zia Ul-Saufie, Arief Gusnanto, and Nurain Ibrahim
Pertanika Journal of Science & Technology, Volume 34, Issue 2, April 2026
DOI: https://doi.org/10.47836/pjst.34.2.25
Keywords: Classification, explainable artificial intelligence, feature selection, labour market, random forest, SHAP, skill level, XGBoost feature importance
Published on: 2026-04-30
Skill level classification is essential for effective human capital interventions, wage policy formulation and long-term workforce development. However, the black-box nature of machine learning algorithms limits the transparency and interpretability required for policymaking. This study aims to develop an accurate and interpretable multiclass skill level classification model for the Malaysian labour market. The model was constructed using secondary data from the Department of Statistics Malaysia’s 2023 Salaries and Wages Survey, covering 120,518 Malaysian workers. Data preprocessing involved cleaning, transformation and correction of class imbalance using the Synthetic Minority Oversampling Technique (SMOTE). A Random Forest model, selected for its robust ensemble mechanism, was trained on the nineteen features ranked highest by XGBoost feature importance. The proposed model exhibits strong predictive performance across all metrics, achieving an accuracy of 0.8754, sensitivity of 0.8485, specificity of 0.9384 and F1-score of 0.8378. The most influential predictors were interpreted using SHapley Additive exPlanations (SHAP) at both global and local levels, revealing the importance of salaries and wages activity, highest certificate and education level. These features offer deeper insight into how workers are classified across the skilled, semi-skilled and low-skilled categories. The findings emphasise the significance of combining high predictive accuracy with transparent feature interpretability in labour market analysis. This study is among the first labour market studies in Malaysia to integrate Random Forest with SHAP. The proposed interpretable skill level framework can be used to support evidence-based policymaking, targeted human capital interventions and workforce development planning in Malaysia.
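For readers less familiar with multiclass evaluation, the sketch below shows one common way that accuracy, sensitivity, specificity and F1-score can be derived from a three-class confusion matrix. It assumes one-vs-rest counting with macro averaging, which the abstract does not confirm. The function name `one_vs_rest_metrics` and the counts in `toy_cm` are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro averaging is an assumption made here,
    not a detail reported in the abstract.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    sens, spec, f1 = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class differs
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        sens.append(recall)
        spec.append(tn / (tn + fp) if (tn + fp) else 0.0)
        denom = precision + recall
        f1.append(2 * precision * recall / denom if denom else 0.0)
    return {
        "accuracy": sum(cm[c][c] for c in range(k)) / total,
        "sensitivity": sum(sens) / k,
        "specificity": sum(spec) / k,
        "f1": sum(f1) / k,
    }


# Hypothetical counts for skilled, semi-skilled and low-skilled classes.
toy_cm = [
    [50, 8, 2],
    [5, 40, 5],
    [1, 9, 30],
]
metrics = one_vs_rest_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this convention, specificity for a class counts every sample from the other two classes as a negative, which is one reason it can sit well above sensitivity when the classes differ in size.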