Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary - Publication - Bridge of Knowledge

Abstract

This paper presents a methodology for textual content classification based on a combination of algorithms: preliminary formation of a contextual framework for the texts in a particular problem area; manual creation of a Hierarchical Sentiment Dictionary (HSD) on the basis of a topically oriented corpus; and recognition of text tonality, using the HSD to analyze documents as collections of topically complete fragments (paragraphs). The proposed methodology was verified in a case study on a corpus of Polish-language film reviews. The main scientific contributions of this research are the following findings: the writing style of the analyzed text determines whether text classification algorithms can be adapted to it; the hierarchical structure of the HSD allows the classification process to be customized for qualitative recognition of text tonality in the context of individual paragraph topics; and texts in the persuasive style are most often given a certain tonality by their authors from the outset. The tone expressed in the author's opinion affects the qualitative indicators of sentiment recognition. Negative emotions of the author usually reduce the level of vocabulary variability and the variety of topics raised in the document, but at the same time increase the unpredictability of the words used in context with both positive and negative emotional coloring.
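As a rough illustration of the paragraph-level, dictionary-based idea described in the abstract, the sketch below scores each paragraph against a small two-level dictionary and sums the results for the document. It is not the authors' implementation: the dictionary layout, the topic names, the English stand-in terms, the polarity values, and the function names `score_paragraph` and `classify_document` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical two-level dictionary: topic -> {term: polarity}.
# Topics, terms, and scores are invented for illustration only.
HSD = {
    "acting": {"brilliant": 1.0, "convincing": 0.5, "wooden": -1.0},
    "plot": {"gripping": 1.0, "predictable": -0.5, "boring": -1.0},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score_paragraph(paragraph, hsd=HSD):
    """Return (topic, score) for one paragraph.

    The topic is the dictionary branch with the most matching terms;
    the score is the sum of the polarities of the matched terms in
    that branch. Returns (None, 0.0) when no term matches.
    """
    tokens = tokenize(paragraph)
    hits = Counter()
    for topic, terms in hsd.items():
        hits[topic] = sum(1 for t in tokens if t in terms)
    topic, count = max(hits.items(), key=lambda kv: kv[1])
    if count == 0:
        return None, 0.0
    score = sum(hsd[topic][t] for t in tokens if t in hsd[topic])
    return topic, score


def classify_document(text, hsd=HSD):
    """Score every paragraph, then label the document by the total."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    per_paragraph = [score_paragraph(p, hsd) for p in paragraphs]
    total = sum(score for _, score in per_paragraph)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, per_paragraph
```

Treating each paragraph as one topical fragment means a review that praises the acting but criticizes the plot keeps both judgments visible in the per-paragraph list, rather than having them cancel out silently in a single document score.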

Citations

  • CrossRef: 3
  • Web of Science: 0
  • Scopus: 4

Full text

Publication version: Accepted or Published Version
License: Copyright (2018 by SCITEPRESS – Science and Technology Publications, Lda)

Details

Category: Conference activity
Type: publication in a peer-reviewed collective volume (including conference proceedings)
Title of issue: Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, pages 1-9
Language: English
Publication year: 2018
Bibliographic description: Rizun N., Waloszek W.: Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary. Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2018, pp. 1-9
DOI: 10.5220/0006932602120220
Verified by: Gdańsk University of Technology
