e-ISSN 2231-8526
ISSN 0128-7680
Allistair Nallie Konsil, Stephanie Chua, Jacey Lynn Minoi, Md Mizanur Rahman, Gerraint Gillan, Rafazila Ramli, Rasitasam Safii, Ahmad Sofian bin Shminan, and Lee Jun Choi
Pertanika Journal of Science & Technology, Pre-Press
DOI: https://doi.org/10.47836/pjst.34.3.17
Keywords: BERT, ensemble, lightweight models, question answering, robustness
Published: 2026-06-19
Large language models (LLMs) such as BERT (Bidirectional Encoder Representations from Transformers) have driven substantial progress in natural language processing (NLP), and lightweight variants such as DistilBERT, MobileBERT, and TinyBERT have been developed to lower the resource requirements of deploying these models in real-world settings. However, while past research has investigated improving non-distilled models on a variety of benchmarks by changing their architecture, few studies have explored how the lightweight variants perform under the same conditions. To address this gap, this study applies architectural modifications and ensemble techniques to lightweight BERT models for extractive question answering (QA). The experiments were conducted on three datasets: SQuAD, AdversarialQA, and a newly curated Sexual and Reproductive Health QA (SRHQA) dataset consisting of 1000 samples. The results show that applying the modifications that enhance the base BERT models to the lightweight variants generally decreased F1 by 0.14 to 5.20%, with marginal exceptions in specific dataset-model combinations. Ensembling, on the other hand, improved performance on all datasets, with F1 gains of 2.19 to 19.46% over the BERT baseline. These results highlight the sensitivity of the lightweight models to modification, their trade-offs with efficiency, and the validity of ensembling as a way to utilise lightweight models without architectural modifications.
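The abstract does not state which ensemble technique was used. As an illustration only, the sketch below shows one common option for extractive QA: averaging the start and end logits produced by several models and selecting the highest-scoring span. It assumes that all models share the same tokenisation, so that token positions align across models. The function names and the `max_answer_len` parameter are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def average_logits(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average one logit vector per model, position by position."""
    n_models = len(per_model)
    return [sum(column) / n_models for column in zip(*per_model)]


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) maximising start + end logit, with start <= end."""
    best = (0, 0, float("-inf"))
    for start, start_score in enumerate(start_logits):
        # Only consider spans of at most max_answer_len tokens.
        last = min(len(end_logits), start + max_answer_len)
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best[2]:
                best = (start, end, score)
    return best


def ensemble_span(
    starts_per_model: Sequence[Sequence[float]],
    ends_per_model: Sequence[Sequence[float]],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Assumed ensemble rule: average logits across models, then pick the best span."""
    avg_start = average_logits(starts_per_model)
    avg_end = average_logits(ends_per_model)
    start, end, _ = best_span(avg_start, avg_end, max_answer_len)
    return start, end


# Toy logits for two hypothetical models over a four-token context.
starts = [[0.1, 2.0, 0.3, 0.0], [0.2, 1.0, 1.5, 0.0]]
ends = [[0.0, 0.5, 2.5, 0.1], [0.1, 0.2, 1.0, 2.0]]
span = ensemble_span(starts, ends)
```

Here the two models disagree individually (the second favours a later end token), while the averaged logits select a single span. Other ensemble rules, such as voting over the decoded answer strings, would not need aligned token positions.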
ISSN 0128-7702
e-ISSN 2231-8534