Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary - Publication - MOST Wiedzy




This paper presents a methodology for textual content classification based on a combination of algorithms: preliminary formation of a contextual framework for the texts in a particular problem area; manual creation of a Hierarchical Sentiment Dictionary (HSD) from a topically oriented corpus; and recognition of text tonality by using the HSD to analyze documents as collections of topically complete fragments (paragraphs). The proposed methodology was verified in a case study on a corpus of Polish-language film reviews. The main scientific contributions of this research are as follows: the writing style of the analyzed text determines whether text classification algorithms can be adapted to it; the hierarchical structure of the HSD allows the classification process to be tailored to the qualitative recognition of text tonality in the context of individual paragraph topics; and authors of texts written in a persuasive style most often give them a definite tonality from the outset. The tone of the author's opinion affects the qualitative indicators of sentiment recognition. An author's negative emotions usually reduce vocabulary variability and the variety of topics raised in the document, but at the same time increase the unpredictability of the words used in context with both positive and negative emotional coloring.
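The paragraph-level use of the HSD described above can be illustrated with a minimal Python sketch. It assumes a two-level dictionary (topic -> polarity -> terms). Each paragraph is assigned to the topic whose vocabulary it matches most often. The paragraph is then scored as positive hits minus negative hits within that topic, and the paragraphs vote on the document label. The toy English entries, the helpers `tokenize`, `score_paragraph` and `classify_document`, and the majority-vote rule are hypothetical illustrations, not the authors' algorithm. In the paper, topics come from a contextual framework built over the corpus rather than from simple keyword matching.

```python
import re
from collections import Counter

# Hypothetical two-level HSD: topic of the contextual framework -> polarity -> terms.
# Toy English entries stand in for the Polish vocabulary built from the corpus.
HSD = {
    "acting": {"pos": {"convincing", "brilliant"}, "neg": {"wooden", "overacted"}},
    "plot": {"pos": {"gripping", "original"}, "neg": {"predictable", "boring"}},
}


def tokenize(text):
    # Lower-cased word tokens; real Polish text would first need lemmatization.
    return re.findall(r"\w+", text.lower())


def score_paragraph(paragraph, hsd):
    """Assign the paragraph to the topic with the most dictionary hits and
    return (topic, positive hits minus negative hits); (None, 0) if no hits."""
    tokens = tokenize(paragraph)
    best_topic, best_hits, best_score = None, 0, 0
    for topic, lexicon in hsd.items():
        pos = sum(t in lexicon["pos"] for t in tokens)
        neg = sum(t in lexicon["neg"] for t in tokens)
        if pos + neg > best_hits:
            best_topic, best_hits, best_score = topic, pos + neg, pos - neg
    return best_topic, best_score


def classify_document(text, hsd):
    """Treat blank-line-separated blocks as paragraphs, score each one within
    its topic, and let the non-neutral paragraphs vote on the document label."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    details = [score_paragraph(p, hsd) for p in paragraphs]
    votes = Counter("positive" if s > 0 else "negative" for _, s in details if s != 0)
    if votes["positive"] == votes["negative"]:
        return "neutral", details
    label = "positive" if votes["positive"] > votes["negative"] else "negative"
    return label, details


review = """The acting was wooden and overacted.

Sadly the plot was predictable and boring, though the ending felt original."""
print(classify_document(review, HSD))
```

Running it prints `('negative', [('acting', -2), ('plot', -1)])`. Both paragraphs lean negative within their own topics. Keeping the per-paragraph (topic, score) pairs is what would let a scheme like this report tonality per topic rather than only a single document label.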



Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda

Details

Conference activity
Publication in a peer-reviewed edited volume (including conference proceedings)
Year of publication: 2018
Bibliographic description:
Rizun N., Waloszek W.: Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary. In: Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2018, pp. 1-9
DOI: 10.5220/0006932602120220
Politechnika Gdańska
