Search results for: text representation - Bridge of Knowledge

  • Comparative Analysis of Text Representation Methods Using Classification

    Publication

    In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

    Full text to download in external service

  • Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network

    To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

    Full text available to download

  • Evaluation of Path Based Methods for Conceptual Representation of the Text

    Publication

    Typical text clustering methods use the bag of words (BoW) representation to describe the content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...

    Full text to download in external service

  • Representation of hypertext documents based on terms, Links and text compressibility

    Publication

    Methods of representing text documents based on words, mutual links, and compression methods are described. They were evaluated using an SVM classifier.

  • Wikipedia Articles Representation with Matrix'u

    Publication

    - Year 2013

    In the article we evaluate different text representation methods used for the task of categorizing Wikipedia articles. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.

    Full text to download in external service

  • Text Categorization Improvement via User Interaction

    Publication

    - Year 2018

    In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

    Full text to download in external service

  • Path-based methods on categorical structures for conceptual representation of wikipedia articles

    Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is the use of Wikipedia as the lexical knowledge base – an approach that has already shown promising results in many research studies....

    Full text available to download

  • Fusion-based Representation Learning Model for Multimode User-generated Social Network Content

    As mobile networks and APPs are developed, user-generated content (UGC), which includes multi-source heterogeneous data like user reviews, tags, scores, images, and videos, has become an essential basis for improving the quality of personalized services. Due to the multi-source heterogeneous nature of the data, big data fusion offers both promise and drawbacks. With the rise of mobile networks and applications, UGC, which includes...

    Full text to download in external service

  • Ontologies vs. Rules — Comparison of Methods of Knowledge Representation Based on the Example of IT Services Management

    Publication

    - Year 2013

    This text provides a brief overview of selected structures aimed at knowledge representation in the form of ontologies based on description logic and aims at comparing them with their counterparts based on the rule-based approach. Due to the limitations on the length of the article, only elements associated with the representation of concepts could be shown, without including roles. The formalisms of the OWL language were used...

    Full text to download in external service

  • An Analysis of Neural Word Representations for Wikipedia Articles Classification

    Publication

    - CYBERNETICS AND SYSTEMS - Year 2019

    One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

    Full text to download in external service

  • Review on Wikification methods

    Publication

    - AI COMMUNICATIONS - Year 2019

    The paper reviews methods for automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of the documents. Also, it can be considered as automatic tagging of the text. We describe typical...

    Full text to download in external service

  • Spectral Clustering Wikipedia Keyword-Based search Results

    The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of the spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles that is based on a combination of words and links and is used for categorization of search results in this...

    Full text available to download

  • Concept description vectors and the 20 question game

    Publication

    - Year 2005

    Knowledge of properties that are applicable to a given object is a necessary prerequisite for formulating intelligent questions. Concept description vectors provide the simplest representation of this knowledge, storing for each object information about the values of its properties. Experiments with automatic creation of concept description vectors from various sources, including ontologies, dictionaries, encyclopedias and unstructured...

    Full text to download in external service

  • Ontologie vs. reguły — porównanie metod reprezentacji wiedzy na przykładzie dziedziny zarządzania usługami informatycznymi

    The text is a brief overview of selected constructs used for knowledge representation in the form of ontologies based on description logic, and a comparison of them with their counterparts based on rule notation. Because of the limited number of pages, only elements related to the representation of concepts are shown, without considering roles. The formalisms of the OWL language were used to write the ontologies, while the rules were expressed in Prolog. To better illustrate these...

    Full text available to download

  • Wykluczenie finansowe starszych konsumentów na rynku usług finansowych

    Publication

    The aim of the study is to identify the determinants of the financial exclusion of older consumers in the financial services market. To this end, the basic concepts concerning the exclusion (including financial exclusion) of older people were identified, among others. The threats arising from the financial exclusion of older consumers are illustrated through an analysis of statistical data on population ageing. The basic...

    Full text to download in external service

  • Semantic Memory for Avatars in Cyberspace

    Publication

    - Year 2005

    Avatars that show intelligent behavior should have access to general knowledge about the world, knowledge that humans store in their semantic memories. The simplest knowledge representation for semantic memory is based on the Concept Description Vectors (CDVs) that store, for each concept, information on whether a given property can be applied to this concept or not. Unfortunately, large-scale semantic memories are not available....

  • Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

    Publication

    - Year 2021

    This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

    Full text available to download

  • Text classifiers for automatic articles categorization

    Publication

    The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles automatically classified into the categories created by Wikipedia editors.

  • A collection of directed graphs for the minimum cycle mean weight computation

    Open Research Data
    open access

    This dataset contains definitions of the 16 directed graphs with weighted edges that were described in the following paper: Paweł Pilarczyk, A space-efficient algorithm for computing the minimum cycle mean in a directed graph, Journal of Mathematics and Computer Science, 20 (2020), no. 4, 349--355, DOI: 10.22436/jmcs.020.04.08, URL: http://dx.doi.org/10.22436/jmcs.020.04.08   These...

  • TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia

    Open Research Data

    The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label matrix]. Label scores for each document are also provided for enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...

  • Ontology-based text convolution neural network (TextCNN) for prediction of construction accidents

    Publication
    • S. Donghui
    • L. Zhigang
    • J. Zurada
    • A. Manikas
    • J. Guan
    • P. Weichbroth

    - KNOWLEDGE AND INFORMATION SYSTEMS - Year 2024

    The construction industry suffers from workplace accidents, including injuries and fatalities, which represent a significant economic and social burden for employers, workers, and society as a whole. The existing research on construction accidents heavily relies on expert evaluations, which often suffer from issues such as low efficiency, insufficient intelligence, and subjectivity. However, expert opinions provided in construction...

    Full text to download in external service

  • Time frequency representation of Doppler blood flow recordings

    Open Research Data
    open access

    Vital signals registration plays a great role in biomedical engineering and the education process. Well-acquired data allow future engineers to observe certain physical phenomena as well as learn how to correctly process and interpret the data. This data set was designed for students to learn about Doppler phenomena and to demonstrate correctly and incorrectly...

  • Data from environmental sensors installed in two locations

    Open Research Data

    The dataset contains data gathered from environmental sensors installed in two locations:

  • Marek Czachor prof. dr hab.

  • The Chow Ring of flag manifolds

    Open Research Data
    open access

    Schubert calculus is the intersection theory of the 19th century. Justifying this calculus is the content of the 15th problem of Hilbert. In the course of establishing the foundations of algebraic geometry, van der Waerden and A. Weil attributed the problem to the determination of the Chow ring of flag manifolds G/P, where G is a compact Lie group and P is...

  • Architektura a dekonstrukcja. Przypadek Petera Eisenmana i Bernarda Tschumiego

    Publication

    - Year 2015

    Architecture and Deconstruction. Case of Peter Eisenman and Bernard Tschumi. Introduction. Towards deconstruction in architecture. Intensive relations between philosophical deconstruction and architecture, which were present in the late 1980s and early 1990s, belong to the past and therefore may be described from a greater than...

    Full text available to download

  • LDRAW based renders of LEGO bricks moving on a conveyor belt

    Open Research Data
    open access
    • T. Boiński
    • K. Zawora
    • S. Zaraziński
    • B. Śledź
    • B. Łobacz
    - series: LEGO

    The set contains renders of 5237 LEGO bricks moving on a white conveyor belt. The images were prepared for training a neural network for recognition of LEGO bricks. For each brick, a starting position, alignment and color were selected (simulating the brick falling down on the conveyor belt) and then 10 images were created while the brick was moved across...

  • SkinDepth - synthetic 3D skin lesion database

    Open Research Data
    version 1.0 open access

    SkinDepth is the first synthetic 3D skin lesion database. The release of SkinDepth dataset intends to contribute to the development of algorithms for:

  • LDRAW based renders of LEGO bricks moving on a conveyor belt with extracted models

    Open Research Data
    version 3.0 open access - series: LEGO

    The set contains renders of LEGO bricks moving on a white conveyor belt. The images were prepared for training a neural network for recognition of LEGO bricks. For each brick, a starting position, alignment and color were selected (simulating the brick falling down on the conveyor belt) and then 10 images were created while the brick was moved across the...

  • Collaborative approach to WordNet and Wikipedia integration

    Publication

    In this article we present a collaborative approach to creating mappings between WordNet and Wikipedia. Wikipedia articles have been first matched with WordNet synsets in an automatic way. Then such associations have been evaluated and complemented in a collaborative way using a web application. We describe algorithms used for creating automatic mappings as well as a system for their collaborative development. The outcome enables further...
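
Several entries in this listing rely on the bag-of-words (BoW) representation with TF-IDF weighting, for example the SimpleWiki2K-scores dataset and the Wikipedia categorization papers. The listing does not say which weighting variant those works use, so the following is only a minimal sketch of the general idea, assuming whitespace tokenization, term frequency normalized by document length, and the plain `log(N / df)` inverse document frequency. The function names `tokenize` and `tfidf_bow` and the sample documents are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines
    # usually also strip punctuation, remove stop words, and stem.
    return text.lower().split()


def tfidf_bow(documents):
    """Return (vocabulary, matrix); matrix[i][j] is the TF-IDF weight
    of vocabulary[j] in documents[i]."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(documents)

    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))

    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        row = [
            (counts[term] / length) * math.log(n_docs / df[term])
            for term in vocabulary
        ]
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical sample documents, not taken from any dataset listed above.
docs = [
    "wikipedia articles about text",
    "text representation methods",
    "graphs and cycles",
]
vocab, X = tfidf_bow(docs)
```

Each row of `X` is one document and each column one vocabulary term, which matches the feature-matrix layout described for the dataset. A term that occurs in every document gets weight zero under this variant, which is why libraries often use smoothed alternatives.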