Abstract
This paper presents a methodology for textual content classification that combines three algorithms: (1) preliminary formation of a contextual framework for the texts in a particular problem area; (2) manual creation of a Hierarchical Sentiment Dictionary (HSD) on the basis of a topically oriented corpus; and (3) recognition of text tonality by using the HSD to analyse each document as a collection of topically complete fragments (paragraphs). The proposed methodology was verified in a case study on Polish-language film review corpora. The main scientific contributions of this research are the following findings: the writing style of the analysed text determines whether text classification algorithms can be adapted to it; the hierarchical structure of the HSD allows the classification process to be customized so that text tonality is recognized in the context of individual paragraph topics; and texts of persuasive style are most often given a certain tonality by their authors from the outset. The tone expressed in the author's opinion affects the qualitative indicators of sentiment recognition. Negative emotions of the author usually reduce the level of vocabulary variability as well as the variety of topics raised in the document, but at the same time increase the unpredictability of the words used in context with both positive and negative emotional colouring.
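The paragraph-level use of a hierarchical dictionary described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-level `HSD` layout (topic, then word, then polarity weight), the English toy entries, the blank-line paragraph split, and the function names `score_paragraph` and `classify_document` are all assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary: topic -> {word: polarity weight}.
# The real HSD is built manually from a Polish film review corpus;
# these English entries are invented for the example.
HSD = {
    "plot": {"gripping": 1.0, "predictable": -1.0},
    "acting": {"brilliant": 1.0, "wooden": -1.0},
}


def score_paragraph(paragraph, hsd):
    """Return (topic, score) for one paragraph.

    The topic is the dictionary branch with the most matching words;
    the score is the sum of the matched polarity weights in that branch.
    """
    tokens = re.findall(r"\w+", paragraph.lower())
    hits, scores = {}, {}
    for topic, lexicon in hsd.items():
        matched = [lexicon[t] for t in tokens if t in lexicon]
        hits[topic] = len(matched)
        scores[topic] = float(sum(matched))
    topic = max(hits, key=hits.get)
    if hits[topic] == 0:
        return None, 0.0  # no dictionary word found in this paragraph
    return topic, scores[topic]


def classify_document(text, hsd):
    """Score each paragraph separately, then aggregate to a document label."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    per_paragraph = [score_paragraph(p, hsd) for p in paragraphs]
    total = sum(score for _, score in per_paragraph)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, per_paragraph


review = "The plot was gripping.\n\nThe acting was wooden, truly wooden."
label, per_paragraph = classify_document(review, HSD)
```

Summing raw weights is the simplest possible aggregation; the paper's own scoring and topic assignment rules may differ.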
Full text
- Publication version: Accepted or Published Version
- License: Copyright (2018 by SCITEPRESS – Science and Technology Publications, Lda)
Details
- Category: Conference activity
- Type: publication in a peer-reviewed collective volume (including conference proceedings)
- Title of issue: Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, pages 1-9
- Language: English
- Publication year: 2018
- Bibliographic description: Rizun N., Waloszek W.: Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary // Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2018, pp. 1-9
- DOI: 10.5220/0006932602120220
- Verified by: Gdańsk University of Technology