Abstract
Typical text clustering methods use the bag-of-words (BoW) representation to describe the content of documents. However, this representation is known to have several limitations. Employing Wikipedia as a lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based measures for calculating document relatedness in such a conceptual space and compare them with the widely used Path Length approach. We evaluate them using the OPTICS clustering algorithm for categorization of keyword-based search results. The results show that our method outperforms the Path Length approach.
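As a rough illustration of the Path Length baseline named in the abstract, the sketch below scores two categories by the number of edges on the shortest path between them in a category graph. It is not the authors' implementation: the toy graph, the function names `path_length` and `path_relatedness`, and the `1 / (1 + d)` scoring formula are all assumptions made for this example.

```python
from collections import deque


def path_length(graph, source, target):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_relatedness(graph, a, b):
    """Assumed scoring: shorter paths give scores closer to 1, no path gives 0."""
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Hypothetical toy category hierarchy, stored as an undirected adjacency map.
edges = [
    ("Science", "Computer science"),
    ("Science", "Biology"),
    ("Computer science", "Data mining"),
    ("Computer science", "Algorithms"),
]
toy_graph = {}
for parent, child in edges:
    toy_graph.setdefault(parent, set()).add(child)
    toy_graph.setdefault(child, set()).add(parent)

print(path_relatedness(toy_graph, "Data mining", "Algorithms"))
print(path_relatedness(toy_graph, "Data mining", "Biology"))
```

In this toy graph, sibling categories under "Computer science" are two edges apart, so they score higher than a pair that has to be connected through "Science".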
Citations
- CrossRef: 1
- Web of Science: 0
- Scopus: 1
Details
- Category:
- Conference activity
- Type:
- peer-reviewed publication in a collective volume (including conference proceedings)
- Title of issue:
- In: Foundations of Intelligent Systems, pages 435-444
- Language:
- English
- Publication year:
- 2014
- Bibliographic description:
- Kucharczyk Ł., Szymański J.: Evaluation of Path Based Methods for Conceptual Representation of the Text// In: Foundations of Intelligent Systems/ ed. Andreasen, Troels and Christiansen, Henning and Cubero, Juan-Carlos and Raś, Zbigniew : Springer International Publishing, 2014, pp. 435-444
- DOI:
- 10.1007/978-3-319-08326-1_44
- Verified by:
- Gdańsk University of Technology