Search results for: text representation · document categorization wikipedia · word2vec · paragraph vector · self-organizing maps
-
Text Categorization Improvement via User Interaction
Publication: In this paper, we propose an approach to improving text categorization through interaction with the user. The quality of categorization is defined in terms of the distribution of objects related to the classes and projected onto self-organizing maps. For the experiments, we use articles and categories from a subset of Simple Wikipedia. We test three different approaches to text representation. As a baseline we use...
-
Self-Organizing Map representation for clustering Wikipedia search results
Publication: The article presents an approach to the automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles, extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To ensure efficient data processing, we performed linear dimensionality reduction of the raw data using Principal...
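The abstract names DBSCAN as the step that extracts groups of similar articles from the Self-Organizing Map. As a rough illustration only, the sketch below is a plain, textbook DBSCAN over arbitrary vectors (for example SOM node weight vectors or dimensionality-reduced document vectors). The function name `dbscan`, the Euclidean metric, and the `eps`/`min_pts` values in the usage lines are assumptions, not settings reported in the article.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts, distance=math.dist):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    A point is a core point if at least `min_pts` points, itself included,
    lie within distance `eps` of it. Neighbour search is brute force, O(n^2).
    """
    labels = [None] * len(points)  # None = not visited yet

    def region_query(i):
        return [j for j in range(len(points)) if distance(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
print(dbscan(points, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Here `min_pts` counts the point itself, as scikit-learn's `min_samples` does; isolated vectors such as `(10, 0)` are reported as noise rather than forced into a group.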
-
Evaluation of Path Based Methods for Conceptual Representation of the Text
Publication: Typical text clustering methods use the bag of words (BoW) representation to describe the content of documents. However, this method is known to have several limitations. Employing Wikipedia as a lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...
-
Comparative Analysis of Text Representation Methods Using Classification
Publication: In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article, an evaluation of approaches to text representation for machine learning tasks, indicates that text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...
-
Path-based methods on categorical structures for conceptual representation of wikipedia articles
Publication: Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is to use Wikipedia as a lexical knowledge base, an approach that has already shown promising results in many research studies....
-
Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network
Publication: To effectively process textual data, many approaches have been proposed to create text representations. Transforming a text into a numerical form that computers can process is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...
-
An Analysis of Neural Word Representations for Wikipedia Articles Classification
Publication: One of the currently popular methods of generating word representations is based on the analysis of large document collections with neural networks. It creates so-called word embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...
-
Self Organizing Maps for Visualization of Categories
Publication: Visualization of Wikipedia categories using Self-Organizing Maps shows an overview of categories and their relations, helping to narrow down search domains. By selecting particular neurons, this approach enables the retrieval of conceptually similar categories. Evaluation of neural activations indicates that they form coherent patterns that may be useful for building user interfaces for navigating category structures.
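The retrieval step in this abstract ("selecting particular neurons") rests on the standard Kohonen map: each grid node holds a weight vector, and each input is attached to its best-matching unit (BMU). A minimal sketch of that mechanism, assuming a Gaussian neighbourhood with linearly decaying learning rate and radius; the grid size, schedule, function names, and the toy "category vectors" are all hypothetical and are not the paper's settings or data.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map with a Gaussian neighbourhood.

    Learning rate and neighbourhood radius decay linearly over training;
    the radius is floored at 0.5 grid units.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            radius = max(radius0 * (1 - frac), 0.5)
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # neighbourhood weight
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def items_by_node(weights, items):
    """Group named vectors by their BMU: what 'selecting a neuron' returns."""
    groups = {}
    for name, vec in items.items():
        groups.setdefault(best_matching_unit(weights, vec), []).append(name)
    return groups


# Hypothetical category vectors, for illustration only.
items = {"Football": [0.9, 0.1, 0.0], "Tennis": [0.8, 0.2, 0.1],
         "Elections": [0.0, 0.1, 0.9], "Parliament": [0.1, 0.0, 0.8]}
weights = train_som(list(items.values()), rows=3, cols=3)
print(items_by_node(weights, items))
```

In this toy setup the sports and politics vectors settle on different parts of the grid; inspecting the items attached to a node and its grid neighbours is one way to read "conceptually similar" off the map.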
-
Music Mood Visualization Using Self-Organizing Maps
Publication: Due to the increasing amount of music made available in digital form on the Internet, automatic organization of music is sought. The paper presents an approach to the graphical representation of song mood based on Self-Organizing Maps. Parameters describing the mood of music are proposed and calculated, and then analyzed by correlating them with mood dimensions based on Multidimensional Scaling. A map is created in which...
-
Wikipedia Articles Representation with Matrix'u
Publication: In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.
-
Text classifiers for automatic articles categorization
Publication: The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically into the categories created by Wikipedia editors.
-
Standard of living in Poland at regional level - classification with Kohonen self-organizing maps
Publication: The standard of living is spatially diversified, and its analysis supports the shaping of regional policy. Therefore, it is crucial to assess the standard of living and to classify regions according to their standard of living, based on a wide set of determinants. The most common research methods are those based on composite indicators; however, they are not ideal. Among the current critiques of the use of composite...
-
Assessment of the water quality of Kłodnica River catchment using self-organizing maps
Publication: Risk assessment of industrial areas heavily polluted by anthropogenic activity is of increasing concern worldwide. This is the case in the Polish Silesia region, where mostly heavy industry, such as smelters, mining, chemical industries, and heat and electricity production facilities, is located. Such a situation raises numerous questions about the environmental state of local water bodies, with special attention paid to the Kłodnica...
-
Parallel Computations of Text Similarities for Categorization Task
Publication: In this chapter we describe an approach to the parallel computation of similarities in high-dimensional spaces. The computed similarities have been used for textual data categorization. The test datasets were created from Wikipedia articles which, together with their hyperlinks, formed the graph used in our experiments. Similarities based on Euclidean distance and the cosine measure have been used to process the data with the k-means algorithm....
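The chapter's parallel implementation is not reproduced here. As a serial sketch only, the code below shows how k-means can take either Euclidean distance or cosine distance (1 minus cosine similarity) as a pluggable measure. Initializing with the first k points and the term-count vectors in the example are simplifying assumptions; the function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def kmeans(points, k, distance, iterations=100):
    """Lloyd-style k-means with a pluggable distance; returns labels and centroids."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic initialization
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid under the chosen measure.
        new_assignment = [
            min(range(k), key=lambda j: distance(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


docs = [[5, 0, 1], [4, 0, 0], [0, 6, 1], [0, 5, 0]]  # hypothetical term counts
for dist in (euclidean, cosine_distance):
    labels, _ = kmeans(docs, k=2, distance=dist)
    print(dist.__name__, labels)
# euclidean [0, 0, 1, 1]
# cosine_distance [1, 1, 0, 0]
```

Both measures recover the same partition here (cluster ids are arbitrary). With cosine distance, a common refinement is spherical k-means, which normalizes vectors to unit length so that only direction, not document length, matters.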
-
Categorization of Wikipedia articles with spectral clustering
Publication: The article reports the application of clustering algorithms for creating hierarchical groups of Wikipedia articles. We evaluate three spectral clustering algorithms on datasets constructed using Wikipedia categories. The selected algorithm has been implemented in a system that categorizes Wikipedia search results on the fly.
-
Comparative Study of Self-Organizing Maps vs. Subjective Evaluation of Quality of Allophone Pronunciation for Nonnative English Speakers
Publication: The purpose of this study was to apply Self-Organizing Maps to differentiate between correct and incorrect allophone pronunciations and to compare the results with subjective evaluation. Recordings of a list of target words, containing selected allophones of English plosive consonants, the velar nasal, and the lateral consonant, were made twice. First, the target words were read from the list by 9 non-native speakers and...
-
Self-Organizing Wireless Nodes Monitoring Network
Publication: The concept of a data monitoring system and a self-organizing network of multipurpose data transfer nodes is presented. Two practical applications of this system are also presented: the first is a wireless monitoring system for containers, and the second is a mobile monitoring system for measuring gaseous air pollution.
-
Novel approach to ecotoxicological risk assessment of sediments cores around the shipwreck by the use of self-organizing maps
Publication: Marine and coastal pollution plays an increasingly important role due to recent severe accidents, which drew attention to the consequences of oil spills causing widespread devastation of marine ecosystems. None of these problems can be solved without conducting environmental studies in areas of possible oil spills and performing chemometric evaluation of the data obtained, looking for similar patterns among pollutants and optimize...
-
Spectral Clustering Wikipedia Keyword-Based search Results
Publication: The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of a spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles, based on a combination of words and links, used for categorization of search results in this...
-
Self-Organizing Wireless Monitoring System for Containers
Publication: This paper describes a new global monitoring system for containers, with a layer-modular structure, as a solution for enhancing the security and efficiency of container transport, with particular emphasis on the practical implementation of the system at maritime container terminals. In particular, the Smart Container Module (SCM) architecture and its operation as part of the Self-Organizing Container Monitoring Network...
-
Distributed infrastructure of self-organizing service servers
Publication: This paper presents the idea of creating a distributed system consisting of autonomous self-organizing service providers. It shows an implemented system that allows dynamic service search without client interaction, based on inter-server communication. Moreover, the presented system can easily be extended with new elements without restarting the already existing servers.
-
Concept of managing quality in baking industry, in vector representation
Publication: The author introduces an innovative metrisable method of describing a manufacturing process. The idea of a vector structure of a manufacturing process makes it possible to formulate quantitative relations between the activity of input streams, elements of product quality, and measurable effects of losses. This structure was the basis for formulating the concept of the process of managing product quality in the baking industry in a vector...
-
Self-organizing wireless monitoring system for cargo containers
Publication: This paper describes a new global monitoring system for containers with a layer-modular structure, as a solution for enhancing the security and efficiency of container transport, with particular emphasis on the practical implementation of the system at maritime container terminals. In particular, the Smart Container Module (SCM) architecture and its operation as part of the Self-Organizing Container Monitoring Network is...
-
Lefschetz periodic point free self-maps of compact manifolds
Publication: Let f be a self-map of a compact connected manifold M. We characterize Lefschetz periodic point free continuous self-maps of M for several classes of manifolds and generalize the results of Guirao and Llibre [J.L.G. Guirao, J. Llibre, On the Lefschetz periodic point free continuous self-maps on connected compact manifolds, Topology Appl. 158 (16) (2011) 2165-2169].
-
Text categorization with semantic commonsense knowledge: First results
Publication: Text processing typically relies on the BOW representation. However, this approach does not give good results when similar documents do not share words. The article presents an approach to constructing kernel functions for SVM classifiers based on an external knowledge base of linguistic concepts.
-
Analysing the Residential Market Using Self-Organizing Map
Publication: Although the residential property market has strong connections with various sectors, such as construction, logistics, and investment, it works through different dynamics than other markets; thus, it can be analysed from various perspectives. Researchers and investors are mostly interested in price trends, the impact of external factors on residential property prices, and price prediction. When analysing price trends, it is beneficial...
-
Text Documents Classification with Support Vector Machines
Publication
-
Embedded Representations of Wikipedia Categories
Publication: In this paper, we present an approach to building neural representations of the Wikipedia category graph. We test four different methods and examine the neural embeddings in terms of preservation of graph edges, neighborhood coverage in the representation space, and their influence on the results of a task predicting the parent of two categories. The main contribution of this paper is the application of neural representations for improving the...
-
Self-organizing maps classification of epidemiological data and toenail selenium content monitored on cancer and healthy patients from Poland
Publication: The paper presents the results of a multivariate analysis of measurement data (using the self-organizing map (SOM) technique) aimed at estimating the selenium content of toenail samples collected from residents of the Pomeranian Voivodeship (including a group of healthy individuals and a group diagnosed with cancer) and the Lubusz Voivodeship. As a result of the analysis, the participants diagnosed with cancer were divided into three distinct groups: 1...
-
Representation of hypertext documents based on terms, Links and text compressibility
Publication: Methods of representing text documents based on words, mutual links, and compression methods are described. The methods are evaluated using an SVM classifier.
-
Estimation of the minimal number of periodic points for smooth self-maps of odd dimensional real projective spaces
Publication: Let f be a smooth self-map of a closed connected manifold of dimension m⩾3. The authors introduced in [G. Graff, J. Jezierski, Minimizing the number of periodic points for smooth maps. Non-simply connected case, Topology Appl. 158 (3) (2011) 276-290] the topological invariant NJD_r[f], where r is a fixed natural number, which is equal to the minimal number of r-periodic points in the smooth homotopy class of f. In this paper smooth...
-
Interactive Information Retrieval Algorithm for Wikipedia Articles
Publication: The article presents an algorithm for retrieving textual information in a document collection. The algorithm employs a category system that organizes the repository and, using interaction with the user, improves search precision. The algorithm was implemented for Simple English Wikipedia, and the first evaluation results indicate that the proposed method can help to retrieve information from large document repositories.
-
Bidirectional Fragment to Fragment Links in Wikipedia
Publication: The paper presents a WikiLinks system that extends the Wikipedia linkage model with bidirectional links between fragments of articles and overlapping link anchors. The proposed model adopts some ideas from research conducted in the field of nonlinear, computer-aided writing, often called hypertext. WikiLinks may be considered a web augmentation tool, but it presents a new approach to the problem that addresses the specific...
-
Minimization of the number of periodic points for smooth self-maps of simply-connected manifolds with periodic sequence of Lefschetz numbers
Publication: Let f be a smooth self-map of an m-dimensional, m ≥ 4, smooth closed connected and simply-connected manifold, and r a fixed natural number. For the class of maps with periodic sequence of Lefschetz numbers of iterations the authors introduced in [Graff G., Kaczkowska A., Reducing the number of periodic points in smooth homotopy class of self-maps of simply-connected manifolds with periodic sequence of Lefschetz numbers, Ann. Polon. Math....
-
Unsupervised Learning for Biomechanical Data Using Self-organising Maps, an Approach for Temporomandibular Joint Analysis
Publication: We propose applying a specific machine learning technique called Self-Organising Maps (SOM) to identify similarities in the performance of muscles around the human temporomandibular joint (TMJ). The performance was assessed by measuring muscle activation with surface electromyography (sEMG). The SOM algorithm used in the study was able to find clusters of data in sEMG test results. The SOM analysis was based on processed sEMG...
-
Asynchronous and self-organizing radiolocation system — AEGIR
Publication -
Selection of Relevant Features for Text Classification with K-NN
Publication: In this paper, we describe five feature selection techniques used for text classification. Information gain, the independent significance feature test, the chi-squared test, the odds ratio test, and frequency filtering have been compared on text benchmarks based on Wikipedia. For each method we present the classification quality obtained on the test datasets using a K-NN based approach. A main advantage of the evaluated...
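Of the five techniques listed, the chi-squared test is the easiest to show compactly. The sketch below uses the standard chi-squared statistic of a 2x2 term/class contingency table, as commonly applied to text feature selection; the paper's exact formulation, thresholds, and the subsequent K-NN step are not shown, and the function names and toy documents are hypothetical.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic of a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: term or class present in every document or in none
    return n * (a * d - c * b) ** 2 / denominator


def select_features(docs, labels, target, k):
    """Rank terms by chi-squared against `target` and keep the top k.

    docs: list of token sets; labels: class label of each document.
    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*docs)
    scores = {}
    for term in vocabulary:
        a = b = c = d = 0
        for tokens, label in zip(docs, labels):
            has_term = term in tokens
            in_class = label == target
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[term] = chi_squared(a, b, c, d)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "election"}]
labels = ["sport", "sport", "politics", "politics"]
top, scores = select_features(docs, labels, target="sport", k=2)
print(top)  # ['goal', 'vote']
```

Note that "vote" ranks as high as "goal": chi-squared measures association in either direction, so a term whose presence signals the *other* class is also kept as informative.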
-
Categorization of Cloud Workload Types with Clustering
Publication: The paper presents a new classification schema of IaaS cloud workload types, based on their functional characteristics. We show the results of an automatic categorization experiment performed with different benchmarks that represent particular workload types. Monitoring of resource utilization allowed us to construct workload models that can be processed with machine learning algorithms. The direct connection between the functional...
-
Minimal number of periodic points of smooth boundary-preserving self-maps of simply-connected manifolds
Publication: Let M be a smooth compact simply-connected manifold with simply-connected boundary ∂M, and let r be a fixed odd natural number. We consider f, a C^1 self-map of M preserving ∂M. Under the assumption that the dimension of M is at least 4, we define an invariant D_r(f; M, ∂M) that is equal to the minimal number of r-periodic points for all maps preserving ∂M and C^1-homotopic to f. As an application, we give necessary and sufficient...
-
Computations of the least number of periodic points of smooth boundary-preserving self-maps of simply-connected manifolds
Publication: Let $r$ be an odd natural number, $M$ a compact simply-connected smooth manifold, $\dim M\geq 4$, such that its boundary $\partial M$ is also simply-connected. We consider $f$, a $C^1$ self-map of $M$ preserving $\partial M$. In [G. Graff and J. Jezierski, Geom. Dedicata 187 (2017), 241-258] the smooth Nielsen type periodic number $D_r(f;M,\partial M)$ was defined and proved to be equal to the minimal number of $r$-periodic points...
-
Reliable Document-Centric Processing in Loosely Coupled Email-Based Systems
Publication: Email is a simple way to exchange digital documents of any kind. The Mobile INteractive Document architecture (MIND) enables self-coordination and self-steering of document agent systems based on commonly available email services. In this paper, a mechanism for providing integrity and reliability of such an email-based agent system is proposed to cope with soft or hard message bounces, user interrupts, and other unexpected events....
-
Autonomous, Ground Based, Self-Organizing Radiolocation Systems - AEGIR
Publication: This article describes the construction and operation of an autonomous ground-based radiolocation system that was developed as a technology demonstrator at the Technical University of Gdansk. Preliminary results and conclusions are presented, together with an analysis of the system's effectiveness. The basic blocks of the system are also described.
-
Towards Extending Wikipedia with Bidirectional Links
Publication: In this paper, we present the results of our WikiLinks project, which aims at extending the current Wikipedia linkage mechanisms. Wikipedia has recently become one of the most important information sources on the Internet, yet it is still based on relatively simple linkage facilities. The WikiLinks system extends Wikipedia with bidirectional links between fragments of articles. However, there were several attempts to introduce bidirectional...
-
Document Agents with the Intelligent Negotiations Capability
Publication: The paper focuses on augmenting proactive document agents with built-in intelligence that enables them to recognize the execution context provided by devices visited during the business process, and to reach a collaboration agreement despite their conflicting requirements. We propose a solution based on neural networks to improve simple multi-issue negotiation between the document and the device, practically with no excessive cost...
-
Algebraic periods and minimal number of periodic points for smooth self-maps of 1-connected 4-manifolds with definite intersection forms
Publication: Let M be a closed 1-connected smooth 4-manifold, and let r be a non-negative integer. We study the problem of finding the minimal number of r-periodic points in the smooth homotopy class of a given map f: M → M. This task is related to determining a topological invariant D^4_r[f], defined in Graff and Jezierski (Forum Math 21(3):491–509, 2009), expressed in terms of Lefschetz numbers of iterations and local fixed point indices of...
-
TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia
Open Research Data: The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label matrix]. Label scores for each document are also provided for enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...
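The dataset stores documents as TF-IDF weighted bag-of-words vectors. The exact tokenization and weighting variant used to build SimpleWiki2K is not stated in this description, so the sketch below assumes one common textbook form: raw term count times log(N / df), with lowercase whitespace tokenization. The function name `tfidf_matrix` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build TF-IDF weighted bag-of-words vectors (one row per document).

    Assumed weighting (one common variant, not necessarily the dataset's):
    tf = raw count of the term in the document,
    idf = log(N / df), where df = number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: each term counted at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] * idf[term] for term in vocab])
    return vocab, rows


docs = ["cat sat on mat", "dog sat on log", "cat chased dog"]
vocab, X = tfidf_matrix(docs)
print(vocab)  # ['cat', 'chased', 'dog', 'log', 'mat', 'on', 'sat']
```

Under this variant a term occurring in every document gets weight 0. Library implementations differ: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf and L2-normalizes rows by default, so its values will not match this sketch exactly.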
-
Relation-based Wikipedia Search System for Factoid Questions Answering
Publication: In this paper we propose an alternative keyword search mechanism for Wikipedia, designed as a prototype solution towards factoid question answering. The method considers relations between articles for finding the best matching article. Unlike the standard Wikipedia search engine and the Google search engine, which search article content independently, requiring the entire query to be satisfied by a single article, the proposed...
-
The Application of the IODA Document Architecture to Music Data
Publication: This paper is concerned with storing music data using a document architecture called Interactive Open Document Architecture (IODA). This architecture makes it possible to create documents which are executable, mobile, interactive, and intelligent. Such documents consist of many files that are semantically related to each other. Semantic links are defined in XML files which are part of a document. IODA documents with music...