e-ISSN 2231-8526
ISSN 0128-7680
Chin-Teng Lin, Mohammed Thanoon and Sami Karali
Pertanika Journal of Science & Technology, Volume 32, Issue 4, July 2024
DOI: https://doi.org/10.47836/pjst.32.4.09
Keywords: Arabic, Arabic dataset, Arabic news, ML, NLP, Twitter
Published on: 25 July 2024
This research develops a classification model for Arabic news tweets using Bidirectional Long Short-Term Memory (BiLSTM) networks. Tweets about Arabic news were gathered between August 2016 and August 2020 and divided into five categories. The data were collected with custom Python scripts, the Twitter API, and the GetOldTweets3 Python library. A BiLSTM network was used to train and test the model, which achieved an average accuracy, precision, recall, and F1-score of 0.88, 0.92, 0.88, and 0.89, respectively. The results could have practical implications for Arabic machine learning and NLP tasks in research and practice.
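To make the four reported metrics concrete, the following is a minimal sketch of how accuracy and class-averaged precision, recall, and F1-score can be computed for a multi-class classifier. It is not the authors' evaluation code. The category names and toy labels are hypothetical, and macro averaging (an unweighted mean over classes) is an assumption, since the abstract does not state which averaging scheme was used.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels; the paper's actual five categories are not named here.
y_true = ["sports", "sports", "politics", "politics", "economy", "economy"]
y_pred = ["sports", "politics", "politics", "politics", "economy", "sports"]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under macro averaging each class contributes equally regardless of how many tweets it contains. A support-weighted average would instead weight each class by its frequency, which can give noticeably different figures on an imbalanced dataset.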