e-ISSN 2231-8526
ISSN 0128-7680
Nazliah Chikon, Shuzlina Abdul-Rahman and Syaripah Ruzaini Syed Aris
Pertanika Journal of Science & Technology, Pre-Press
DOI: https://doi.org/10.47836/pjst.33.3.07
Keywords: Artificial intelligence, big data analytics, data analytics, data quality, governance
Published: 2025-03-26
In an era of exponential data generation and growing reliance on big data analytics (BDA) across industries, data quality has become a critical issue in both research and practice. This study conducts a thematic analysis of literature published between 2020 and 2024 to examine the prevailing trends, challenges, and advancements in data quality studies within the domain of BDA. Guided by the systematic thematic review methodology, the research analysed 34 peer-reviewed studies identified from the Scopus and Web of Science (WoS) databases, using qualitative data analysis tools such as ATLAS.ti. The findings reveal five major themes: Ontology and Data Quality Frameworks, Big Data Analytics in Various Industries, Machine Learning and AI Integration, Governance and Data Stewardship, and Tools and Techniques for Data Analysis. These themes highlight a shift towards interdisciplinary approaches that integrate advanced technologies such as Artificial Intelligence (AI) and the Internet of Things (IoT) to address data quality issues. Limitations include potential selection bias from database restrictions and the exclusion of subscription-based journals, which may limit the generalisability of the findings. Theoretically, the study provides a comprehensive synthesis of data quality trends and their implications across various sectors. Methodologically, it demonstrates the utility of thematic analysis for consolidating diverse research. Practically, the insights inform data practitioners and policymakers on governance and technological strategies for ensuring data integrity. The review is original in its systematic exploration of thematic trends in data quality; it offers a roadmap for future research at the intersection of data quality and BDA.