PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY

 

e-ISSN 2231-8526
ISSN 0128-7680

A Topic Modeling and Sentiment Analysis Model for Detection and Visualization of Themes in Literary Texts

Kah Em Chu, Pantea Keikhosrokiani and Moussa Pourya Asl

Pertanika Journal of Science & Technology, Volume 30, Issue 4, October 2022

DOI: https://doi.org/10.47836/pjst.30.4.14

Keywords: Iranian diaspora, life writing, sentiment analysis, text mining, topic modeling

Published on: 28 September 2022

Despite the growing availability of computer-based analytics software, data mining and text processing methods remain rarely used in literary studies. This study proposes a text analytics lifecycle to detect and visualize the prevailing themes in a corpus of literary texts. It pursues two objectives. First, it applies a Topic Modeling approach, using the LDA, LSI, NMF, and HDP algorithms, to detect the recurring topics that correspond to the major themes developed in the dataset. Second, it applies a Sentiment Analysis model, using the VADER and TextBlob algorithms, to analyze the polarity of the writers’ discourse on the detected thematic topics. Topic Modeling detected six thematic topics: sex, family, revolution, imprisonment, intellectual, and death. Sentiment Analysis further revealed that the feelings attached to all the identified themes are largely negative and are expressed towards socio-political issues.
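The two-stage idea in the abstract (group passages by theme, then score the polarity of each theme) can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-written seed words stand in for topics that the study derives with LDA, LSI, NMF, and HDP, and the tiny polarity lexicon stands in for the much larger resources used by VADER and TextBlob. All names, words, scores, and sample passages below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical seed words per theme (assumption). In the study, topics are
# produced by topic models, not by a hand-written keyword list.
THEME_KEYWORDS = {
    "family": {"mother", "father", "home"},
    "imprisonment": {"prison", "cell", "guard"},
}

# Toy polarity lexicon (assumption); real lexicon-based tools are far larger.
LEXICON = {"love": 1.0, "warm": 0.5, "fear": -1.0, "cold": -0.5, "cruel": -1.0}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [w.strip(".,;:!?\"'").lower() for w in text.split()]


def assign_theme(tokens):
    """Return the theme whose seed words overlap most, or None if none match."""
    scores = {t: len(kw & set(tokens)) for t, kw in THEME_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def polarity(tokens):
    """Mean lexicon score over matched words; 0.0 when nothing matches."""
    hits = [LEXICON[w] for w in tokens if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def theme_sentiment(passages):
    """Average the passage-level polarity for each assigned theme."""
    by_theme = defaultdict(list)
    for passage in passages:
        tokens = tokenize(passage)
        theme = assign_theme(tokens)
        if theme is not None:
            by_theme[theme].append(polarity(tokens))
    return {t: sum(v) / len(v) for t, v in by_theme.items()}


# Made-up sample passages, not taken from the study's corpus.
passages = [
    "The guard was cruel and the cell was cold.",
    "I remember my mother and the warm love of home.",
    "Fear followed me out of the prison.",
]
result = theme_sentiment(passages)
print(result)
```

A negative average for a theme indicates that the passages assigned to it lean negative under this toy lexicon. In practice, a topic model would replace the keyword matching and an established sentiment library would replace the toy lexicon.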



Article ID: JST-3160-2021
