e-ISSN 2231-8526
ISSN 0128-7680
Tuong Kiet Ngang and Chih How Bong
Pertanika Journal of Science & Technology, Pre-Press
DOI: https://doi.org/10.47836/pjst.34.3.10
Keywords: Artificial intelligence, automatic question generation, deep learning, natural language processing, question generation, Text-to-Text Transfer Transformer (T5), transfer learning
Published: 2026-06-19
Creating assessment questions to evaluate student achievement is time-intensive, particularly for the structured, calculation-based problems typical of Science, Technology, Engineering, and Mathematics (STEM) subjects such as Physics. This study presents a web-based Automatic Question Generation (AQG) system that generates Physics questions for Malaysian upper secondary levels (Form 4 and 5). Covering topics such as Force and Motion, Heat, Light and Optics (Form 4), Pressure, and Electricity (Form 5), the system leverages transfer learning by fine-tuning the Text-to-Text Transfer Transformer (T5), a state-of-the-art natural language processing (NLP) model. The methodology encompasses data construction, pre-processing, dataset generation, fine-tuning of the T5 model, evaluation, and inference. Performance was assessed through system experiments, ROUGE-L automatic evaluations, and human evaluations by expert educators, focusing on relevance, correctness, usefulness, and variety. The high ROUGE-L scores (0.82–0.85) indicate strong alignment with reference questions, while human evaluations show that the system generates contextually relevant, high-quality questions. The results show that the AQG system matches the template-based approach in quality while being far more flexible and substantially reducing teachers' manual work. It can also be scaled easily should more questions be needed. A comparative analysis with ChatGPT-4 was conducted, revealing the edge that a purpose-built, structured generator has over a broad, open-ended one. In short, deep-learning NLP can automate domain-specific question writing and make large-scale assessment design much simpler. These findings should interest researchers in computational linguistics, AI, and test automation.
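To make the ROUGE-L figures above concrete, the following is a minimal sketch of how a sentence-level ROUGE-L F-score can be computed from the longest common subsequence (LCS) of tokens. It is not the authors' evaluation code: the function names (`lcs_length`, `rouge_l`), the lowercase whitespace tokenisation, and the sample Physics questions are all illustrative assumptions, and published ROUGE implementations may differ in tokenisation and stemming.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score (beta=1.0 gives the harmonic mean).

    Tokenisation is plain lowercase whitespace splitting, which is an
    assumption of this sketch rather than the method used in the study.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    # Hypothetical generated and reference questions, invented for illustration.
    generated = "A car accelerates from rest at 2 m/s^2 for 5 s. Calculate its final velocity."
    reference = "A car accelerates from rest at 2 m/s^2 for 5 s. What is its final velocity?"
    print(f"ROUGE-L F1: {rouge_l(generated, reference):.2f}")
```

A score near 1 means the generated question shares most of its word sequence with the reference, so ROUGE-L rewards structural overlap rather than correctness, which is why the study pairs it with human evaluation.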