Pertanika Journal

Studying the Effects of Knowledge Distillation and the Fragility of Lightweight BERT-based Models

Allistair Nallie Konsil, Stephanie Chua, Jacey Lynn Minoi, Md Mizanur Rahman, Gerraint Gillan, Rafazila Ramli, Rasitasam Safii, Ahmad Sofian bin Shminan, and Lee Jun Choi

Pertanika Journal of Science & Technology, Pre-Press

DOI: https://doi.org/10.47836/pjst.34.3.17

Keywords: BERT, ensemble, lightweight models, question answering, robustness

Published: 2026-06-19

Abstract

Large language models (LLMs) such as BERT (Bidirectional Encoder Representations from Transformers) have driven substantial progress in natural language processing (NLP), and lightweight variants such as DistilBERT, MobileBERT, and TinyBERT have been developed to lower the resource requirements of deploying these models in real-world settings. However, while past research has investigated improving non-distilled models on a variety of benchmarks by changing their architecture, few studies have explored how the lightweight variants perform under the same conditions. To address this gap, this study applies modifications and ensemble techniques to lightweight BERT models for extractive question answering (QA). The experiments were conducted on three datasets: SQuAD, AdversarialQA, and a newly curated Sexual and Reproductive Health QA (SRHQA) dataset consisting of 1000 samples. The results show that applying the modifications that enhance the base BERT models to the lightweight variants generally decreased F1 by 0.14 to 5.20%, with marginal exceptions observed in specific dataset-model combinations. Ensembling, on the other hand, improved F1 on all datasets, by 2.19 to 19.46% over the BERT baseline. These results highlight the sensitivity of the lightweight models, their trade-offs with efficiency, and the viability of ensembling as a way to use the lightweight models without architectural modifications.
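
For readers unfamiliar with the metric, the F1 figures quoted for extractive QA are commonly computed as token-overlap F1 between a predicted answer span and a reference answer, following the SQuAD evaluation convention. The sketch below illustrates that convention only; it assumes the study uses this style of F1, which the abstract does not state, and the function names `normalize_answer` and `token_f1` are illustrative rather than taken from the paper.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between one predicted span and one reference answer.

    Assumption: SQuAD-style scoring; the study's exact metric may differ.
    """
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this convention a dataset-level F1 would be the mean of per-question scores, typically taking the maximum over the available reference answers for each question.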

ISSN 0128-7702

e-ISSN 2231-8534

Article ID: JST-6276-2025
