e-ISSN 2231-8526
ISSN 0128-7680
Kah Em Chu, Pantea Keikhosrokiani and Moussa Pourya Asl
Pertanika Journal of Science & Technology, Volume 30, Issue 4, October 2022
DOI: https://doi.org/10.47836/pjst.30.4.14
Keywords: Iranian diaspora, life writing, sentiment analysis, text mining, topic modeling
Published on: 28 September 2022
Despite the growing number of computer analytics software programs, computer-based data mining and processing methods remain sparsely adopted and applied in literary studies and analyses. This study proposes a text analytics lifecycle to detect and visualize the prevailing themes in a corpus of literary texts. It pursues two objectives. First, it applies a Topic Modeling approach, using the LDA, LSI, NMF, and HDP algorithms, to detect the recurring topics related to the major themes developed in the dataset. Second, it applies a Sentiment Analysis model, using the VADER and TextBlob algorithms, to analyze the polarity of the writers' discourse on the detected thematic topics. Topic Modeling detected six thematic topics: sex, family, revolution, imprisonment, intellectual, and death. The Sentiment Analysis further revealed that the sentiments attached to all the identified themes are largely negative, and are expressed towards socio-political issues.
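The sentiment step pairs each passage with a detected theme and scores its polarity. The following is a minimal sketch of that idea, assuming a simplified VADER-style scorer. It sums word valences and squashes the total into [-1, 1] with VADER's normalization x / sqrt(x^2 + 15). It then applies VADER's suggested +/-0.05 cut-offs for positive, neutral and negative labels. Several parts are assumptions made for illustration:

* the lexicon values and example sentences are hypothetical and do not come from VADER or from the study's corpus;
* the helpers `compound`, `label` and `polarity_by_theme` are illustrative names;
* the real VADER also applies rules for negation, intensifiers, capitalization and punctuation, which are omitted here.

This is not the authors' implementation, and TextBlob scores polarity differently.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# HYPOTHETICAL toy valence lexicon on a -4..+4 scale, for illustration only.
# These values are not taken from the real VADER lexicon.
TOY_LEXICON = {
    "love": 3.2,
    "hope": 1.9,
    "free": 2.3,
    "fear": -2.2,
    "prison": -2.3,
    "death": -2.9,
    "torture": -3.5,
}

ALPHA = 15  # normalization constant used in VADER's compound score


def compound(text: str) -> float:
    """Hypothetical helper: sum word valences, then squash into (-1, 1)."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return total / math.sqrt(total * total + ALPHA)


def label(score: float) -> str:
    """Hypothetical helper: map a compound score using the +/-0.05 cut-offs."""
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"


def polarity_by_theme(passages: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Hypothetical helper: average compound score per theme for (theme, text) pairs."""
    scores = defaultdict(list)
    for theme, text in passages:
        scores[theme].append(compound(text))
    return {theme: sum(vals) / len(vals) for theme, vals in scores.items()}


if __name__ == "__main__":
    # Invented example sentences, NOT drawn from the study's corpus.
    demo = [
        ("imprisonment", "The prison was a place of fear and torture."),
        ("imprisonment", "They spoke of hope."),
        ("death", "He wrote about death."),
    ]
    for theme, mean in polarity_by_theme(demo).items():
        print(f"{theme}: {mean:+.3f} ({label(mean)})")
```

Averaging compound scores per theme is only one simple aggregation choice. Counting the positive, neutral and negative passages within each theme would be an equally plausible alternative.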