PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY

 

e-ISSN 2231-8526
ISSN 0128-7680


Thematic Trends on Data Quality Studies in Big Data Analytics: A Review

Nazliah Chikon, Shuzlina Abdul-Rahman and Syaripah Ruzaini Syed Aris

Pertanika Journal of Science & Technology, Pre-Press

DOI: https://doi.org/10.47836/pjst.33.3.07

Keywords: Artificial intelligence, big data analytics, data analytics, data quality, governance

Published: 2025-03-26

Data quality has become a critical issue in research and practice as data generation grows exponentially and industries rely increasingly on big data analytics (BDA). This study conducts a thematic analysis of literature published between 2020 and 2024 to examine the prevailing trends, challenges, and advancements in data quality studies within BDA. Guided by the systematic thematic review methodology, the study analysed 34 peer-reviewed studies identified from the Scopus and Web of Science (WoS) databases, using qualitative data analysis tools such as ATLAS.ti. The findings reveal five major themes: Ontology and Data Quality Frameworks, Big Data Analytics in Various Industries, Machine Learning and AI Integration, Governance and Data Stewardship, and Tools and Techniques for Data Analysis. These themes highlight a shift towards interdisciplinary approaches that integrate advanced technologies such as Artificial Intelligence (AI) and the Internet of Things (IoT) to address data quality issues. Limitations include potential selection bias from database restrictions and the exclusion of subscription-based journals, which may limit the generalisability of the findings. Theoretically, the study contributes a comprehensive synthesis of data quality trends and their implications across various sectors. Methodologically, it demonstrates the utility of thematic analysis for consolidating diverse research. Practically, the insights inform data practitioners and policymakers on governance and technological strategies for ensuring data integrity. The review's originality lies in its systematic exploration of thematic trends in data quality, which offers a roadmap for future research at the intersection of data quality and BDA.



Article ID: JST-5409-2024
