Total results: 13
Search results for: DOCUMENTS CLUSTERING
-
External Validation Measures for Nested Clustering of Text Documents
Publication: This article addresses the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used to validate partitioning clusterings, such as the Rand statistic, the Hubert Γ statistic, or the F-measure, are not applicable to nested clusterings. In addition to earlier work in which the F-measure was adapted to hierarchical classification as the hF-measure, here some methods to...
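For reference, the two standard flat-partition indices named above can be sketched as follows. This is a minimal illustration of the conventional pair-counting Rand index and the class-weighted clustering F-measure (each class matched to its best cluster), assuming plain label lists as input; it is not the nested-clustering measure the article proposes, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of document pairs on which two flat partitions agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as agreement if both partitions treat it the same way
        agree += (same_true == same_pred)
        pairs += 1
    return agree / pairs if pairs else 1.0


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted F-measure; each class uses its best-matching cluster."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    overlap = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total


truth = ["a", "a", "b", "b"]
found = [0, 0, 0, 1]
print(rand_index(truth, found))  # 0.5
print(clustering_f_measure(truth, found))
```

Both indices assume every document belongs to exactly one class and one cluster, which is the flat-partition premise that nested clusterings violate.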
-
Development and Research of the Text Messages Semantic Clustering Methodology
Publication: A methodology for the semantic clustering analysis of a collection of customer text opinions is developed. The author's version of the mathematical models for formalizing and practically implementing a semantic clustering procedure for short text messages is proposed, based on the Latent Semantic Analysis knowledge-extraction method applied to the collection of customer text opinions. An algorithm for semantic clustering of the text opinions is developed,...
-
Information Retrieval with the Use of Music Clustering by Directions Algorithm
Publication: This paper introduces the Music Clustering by Directions (MCBD) algorithm. The algorithm is designed to help users of query-by-humming systems formulate queries. Such systems make it possible to retrieve songs and tunes on the basis of a melody recorded by the user. The Music Clustering by Directions algorithm is a kind of interactive query expansion method. On the basis of the query, the algorithm provides suggestions...
-
Retrieval with Semantic Sieve
Publication: The article presents an algorithm, which we call Semantic Sieve, for refining search results in a repository of text documents. The algorithm calculates so-called conceptual directions that enable interaction with the user and allow the set of results to be narrowed to the most relevant ones. We present the system in which the algorithm has been implemented. In the presentation layer, the system also offers clustering of the results into...
-
Web search results clusterization with background knowledge
Publication: Clustering web pages is an attractive way of presenting web resources. Arranging pages into groups of similar topics simplifies and shortens the search process. This paper concerns the problem of clustering web pages and presents our approach to this issue. Our solution focuses on finding similarities between documents returned by different web search engines. This was accomplished by applying the WordNet dictionary.
-
Evaluation of Path Based Methods for Conceptual Representation of the Text
Publication: Typical text clustering methods use the bag-of-words (BoW) representation to describe the content of documents. However, this representation is known to have several limitations. Using Wikipedia as a lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of that trend exploit the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...
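One well-known limitation of the BoW representation mentioned above is that documents using different words for the same concept share no terms. The sketch below builds term-frequency vectors and compares them with cosine similarity; the tokenizer, function names, and example phrases are hypothetical choices made for illustration, not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased term-frequency vector; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


d1 = bag_of_words("car repair shop")
d2 = bag_of_words("automobile maintenance garage")  # same topic, no shared words
d3 = bag_of_words("car repair garage")

print(cosine(d1, d2))  # 0.0
print(cosine(d1, d3))
```

The first pair is topically related yet scores zero, which is the kind of gap that a concept-level representation (for example, one derived from Wikipedia categories) aims to close.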
-
Increasing K-Means Clustering Algorithm Effectivity for Using in Source Code Plagiarism Detection
Publication: The problem of plagiarism is becoming increasingly significant with the growth of Internet technologies and the availability of information resources. Many tools have been successfully developed to detect plagiarism in text documents, but the situation is more complicated for source code plagiarism, where the problem is equally serious. At present, there are no complex tools available to detect plagiarism...
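The abstract is truncated before it describes how K-Means was made more effective, so the sketch below shows only the baseline: plain Lloyd's K-Means over numeric feature vectors (for instance, vectors extracted from source files). The random initialization, the empty-cluster handling, and all names are assumptions for illustration, not the paper's modification.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's K-Means; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return assignment, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, 2)
print(labels)
```

Cluster indices depend on the random initialization, so only the grouping, not the specific label numbers, is meaningful.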
-
Path-based methods on categorical structures for conceptual representation of Wikipedia articles
Publication: Machine learning algorithms applied to text categorization mostly employ the bag-of-words (BoW) representation to describe the content of documents. This method has been used successfully in many applications, but it is known to have several limitations. One way of improving text representation is to use Wikipedia as a lexical knowledge base, an approach that has already shown promising results in many research studies....
-
Semantic Analysis and Text Summarization in Socio-Technical Systems
Publication: In this chapter the authors present the results of developing a methodology for increasing the operational reliability of the Socio-Technical System. Existing methods and algorithms for processing unstructured (textual) information were studied. Taking into account the strengths and weaknesses, noted above, of the Discriminant and Probabilistic approaches to Latent Semantic Relations analysis in of the summarization projection...
-
Social learning in cluster initiatives
Publication: Purpose – The purpose of the paper is to portray social learning in cluster initiatives (CIs), namely: 1) to explore, through the lens of communities of practice (CoPs) theory, in what ways social learning occurs in CIs; 2) to discover how various CoPs emerge and evolve in CIs to facilitate a collective journey in their learning process. Subsequently, the authors address the research questions: In what ways does social learning...
-
Social learning and knowledge flows in cluster initiatives. In: Sanz S.C., Blanco F.P., Urzelai B. (Eds.), Human and Relational Resources (pp. 44-45). The 4th International Conference on Clusters and Industrial Districts CLUSTERING, University of Valencia, Spain, May 23–24 (ISBN: 978-84-09-11926-4).
Publication: Purpose – The purpose of the paper is to explore how learning manifests itself and how knowledge flows in cluster initiatives (CIs) as a result of interactions undertaken by their members. The paper addresses the research question of how social learning occurs and how knowledge flows in CIs. Design/methodology/approach – A qualitative study of four cluster initiatives helped to identify various symptoms of social learning and knowledge flows in...
-
Information Retrieval in Wikipedia with Conceptual Directions
Publication: The paper describes our algorithm for retrieving textual information from Wikipedia. The experiments show that the algorithm improves typical measures of retrieval quality. The improvement was achieved with a two-phase approach. In the first phase, the algorithm extends the set of content indexed by the specified keywords and thus increases Recall. Then, using the...
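The two retrieval-quality measures involved here are simple set ratios. The sketch below computes them and walks through a hypothetical two-phase run: expanding the result set raises recall, and a later refinement step raises precision. The document IDs and result sets are invented purely for illustration and are not the paper's data.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of a single result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}                 # hypothetical ground-truth document IDs
keyword_hits = {1, 2, 9}                # plain keyword search
expanded = {1, 2, 3, 4, 8, 9, 10, 11}   # phase 1: expansion raises recall
refined = {1, 2, 3, 4, 8}               # phase 2: narrowing raises precision

for name, result in [("keywords", keyword_hits), ("expanded", expanded), ("refined", refined)]:
    print(name, precision_recall(result, relevant))
```

Expansion alone trades precision for recall (from 2/3 and 0.5 to 0.5 and 1.0 in this toy run), which is why a second, precision-oriented phase is needed.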
-
Interactive Information Retrieval Algorithm for Wikipedia Articles
Publication: The article presents an algorithm for retrieving textual information from a document collection. The algorithm employs the category system that organizes the repository and uses interaction with the user to improve search precision. The algorithm was implemented for the Simple English Wikipedia, and the first evaluation results indicate that the proposed method can help retrieve information from large document repositories.