Search results for: text representation - MOST Wiedzy



  • Comparative Analysis of Text Representation Methods Using Classification


    In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

    Full text available for download from an external service

  • Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network

    To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

    Full text available for download in the portal

  • Evaluation of Path Based Methods for Conceptual Representation of the Text


    Typical text clustering methods use the bag of words (BoW) representation to describe the content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...

    Full text available for download from an external service

  • Representation of hypertext documents based on terms, Links and text compressibility


    Methods of representing text documents based on words, mutual links, and compression methods are described. They were evaluated using an SVM classifier.

  • Wikipedia Articles Representation with Matrix'u


    - Year 2013

    In the article, we evaluate different text representation methods used for the task of categorizing Wikipedia articles. We present the Matrix’u application, used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.

    Full text available for download from an external service

  • Text Categorization Improvement via User Interaction


    - Year 2018

    In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

    Full text available for download from an external service

  • Path-based methods on categorical structures for conceptual representation of wikipedia articles

    Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is usage of Wikipedia as the lexical knowledge base – an approach that has already shown promising results in many research studies....

    Full text available for download in the portal

  • Fusion-based Representation Learning Model for Multimode User-generated Social Network Content

    As mobile networks and APPs are developed, user-generated content (UGC), which includes multi-source heterogeneous data like user reviews, tags, scores, images, and videos, has become an essential basis for improving the quality of personalized services. Due to the multi-source heterogeneous nature of the data, big data fusion offers both promise and drawbacks. With the rise of mobile networks and applications, UGC, which includes...

    Full text available for download from an external service

  • Ontologies vs. Rules — Comparison of Methods of Knowledge Representation Based on the Example of IT Services Management


    - Year 2013

    This text provides a brief overview of selected structures aimed at knowledge representation in the form of ontologies based on description logic and aims at comparing them with their counterparts based on the rule-based approach. Due to the limitations on the length of the article, only elements associated with the representation of concepts could be shown, without including roles. The formalisms of the OWL language were used...

    Full text available for download from an external service

  • An Analysis of Neural Word Representations for Wikipedia Articles Classification



    One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

    Full text available for download from an external service

  • Review on Wikification methods


    - AI COMMUNICATIONS - Year 2019

    The paper reviews methods for automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of the documents. Also, it can be considered as automatic tagging of the text. We describe typical...

    Full text available for download from an external service

  • Spectral Clustering Wikipedia Keyword-Based search Results

    The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of the spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles that is based on a combination of words and links and used for categorization of search results in this...

    Full text available for download in the portal

  • Concept description vectors and the 20 question game


    - Year 2005

    Knowledge of the properties applicable to a given object is a necessary prerequisite for formulating intelligent questions. Concept description vectors provide the simplest representation of this knowledge, storing for each object information about the values of its properties. Experiments with automatic creation of concept description vectors from various sources, including ontologies, dictionaries, encyclopedias and unstructured...

    Full text available for download from an external service

  • Ontologie vs. reguły — porównanie metod reprezentacji wiedzy na przykładzie dziedziny zarządzania usługami informatycznymi

    The text is a brief overview of selected constructs for representing knowledge in the form of ontologies based on description logic, and a comparison of them with their rule-based counterparts. Because of the page limit, only elements related to the representation of concepts are shown, without roles. The ontologies are written using the formalisms of the OWL language, while the rules are expressed in Prolog. To better illustrate these...

    Full text available for download in the portal

  • Wykluczenie finansowe starszych konsumentów na rynku usług finansowych


    The aim of the study is to identify the determinants of the financial exclusion of older consumers in the financial services market. To achieve this aim, the study identifies, among other things, the basic concepts concerning the exclusion (including financial exclusion) of older people. The threats arising from the financial exclusion of older consumers are illustrated through an analysis of statistical data on population ageing. The basic...

    Full text available for download from an external service

  • Enhancing Word Embeddings for Improved Semantic Alignment

    This study introduces a method for the improvement of word vectors, addressing the limitations of traditional approaches like Word2Vec or GloVe through introducing into embeddings richer semantic properties. Our approach leverages supervised learning methods, with shifts in vectors in the representation space enhancing the quality of word embeddings. This ensures better alignment with semantic reference resources, such as WordNet....

    Full text available for download from an external service

  • Semantic Memory for Avatars in Cyberspace


    - Year 2005

    Avatars that show intelligent behavior should have access to general knowledge about the world, knowledge that humans store in their semantic memories. The simplest knowledge representation for semantic memory is based on Concept Description Vectors (CDVs) that store, for each concept, information on whether a given property can be applied to this concept or not. Unfortunately, large-scale semantic memories are not available....

  • Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention


    - Year 2021

    This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

    Full text available for download in the portal

  • Text classifiers for automatic articles categorization


    The article concerns the problem of automatic classification of textual content. We present selected methods for generation of documents representation and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically to their categories made by Wikipedia editors.

  • A collection of directed graphs for the minimum cycle mean weight computation

    Research Data
    open access

    This dataset contains definitions of the 16 directed graphs with weighted edges that were described in the following paper: Paweł Pilarczyk, A space-efficient algorithm for computing the minimum cycle mean in a directed graph, Journal of Mathematics and Computer Science, 20 (2020), no. 4, 349--355, DOI: 10.22436/jmcs.020.04.08, URL: http://dx.doi.org/10.22436/jmcs.020.04.08   These...
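
    For orientation, the quantity these graphs are meant to exercise, the minimum cycle mean, can be sketched with Karp's classic dynamic-programming algorithm. This is not the space-efficient algorithm from the cited paper: it keeps a full (n+1) x n table. The function name and the edge-list input format are illustrative assumptions, not the dataset's file format.

```python
from math import inf


def min_cycle_mean(n, edges):
    """Karp's classic algorithm (O(n*m) time, O(n^2) memory).

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (u, v, weight) directed edges (assumed input format)
    Returns the minimum mean weight over all cycles, or None if acyclic.
    """
    # d[k][v] = minimum weight of a walk with exactly k edges ending at v,
    # starting from any vertex (equivalent to a zero-weight super-source).
    d = [[inf] * n for _ in range(n + 1)]
    d[0] = [0.0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w

    best = None
    for v in range(n):
        if d[n][v] == inf:
            continue  # no walk of n edges ends here
        worst = max(
            (d[n][v] - d[k][v]) / (n - k)
            for k in range(n)
            if d[k][v] < inf
        )
        if best is None or worst < best:
            best = worst
    return best


# Two cycles: 0 -> 1 -> 0 (mean 2.0) and 1 -> 2 -> 1 (mean 0.5).
example = [(0, 1, 1.0), (1, 0, 3.0), (1, 2, 1.0), (2, 1, 0.0)]
result = min_cycle_mean(3, example)
```

    The table `d` is what makes this version memory-hungry on large graphs, which is the cost the cited paper sets out to reduce.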

  • TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia

    Research Data

    The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label-matrix]. Label scores for each document are also provided for enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...
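
    As a rough illustration of what a TF-IDF weighted bag-of-words feature matrix looks like, here is a minimal sketch. The weighting variant (relative term frequency times log(N/df)), the whitespace tokenizer, and the function name are assumptions; the dataset description does not say which preprocessing or TF-IDF variant was used.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a dense TF-IDF matrix from raw strings.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Returns (vocabulary, rows), where rows[i][j] is the weight of
    vocabulary[j] in docs[i].
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        rows.append(
            [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, rows


vocab, rows = tfidf_matrix(["wiki text text", "wiki graph"])
```

    A term that occurs in every document (here `wiki`) gets weight zero under this variant, which is why library implementations often smooth the IDF term.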

  • Ontology-based text convolution neural network (TextCNN) for prediction of construction accidents

    • S. Donghui
    • L. Zhigang
    • J. Zurada
    • A. Manikas
    • J. Guan
    • P. Weichbroth


    The construction industry suffers from workplace accidents, including injuries and fatalities, which represent a significant economic and social burden for employers, workers, and society as a whole. The existing research on construction accidents heavily relies on expert evaluations, which often suffer from issues such as low efficiency, insufficient intelligence, and subjectivity. However, expert opinions provided in construction...

    Full text available for download in the portal

  • Time frequency representation of Doppler blood flow recordings

    Research Data
    open access

    Vital signals registration plays a great role in biomedical engineering and the education process. Well-acquired data allow future engineers to observe certain physical phenomena as well as learn how to correctly process and interpret the data. This data set was designed for students to learn about Doppler phenomena and to demonstrate correctly and incorrectly...

  • Data from environmental sensors installed in two locations

    Research Data

    The dataset contains data gathered from environmental sensors installed in two locations:

  • Marek Czachor prof. dr hab.

  • The Chow Ring of flag manifolds

    Research Data
    open access

    Schubert calculus is the intersection theory of the 19th century. Justifying this calculus is the content of Hilbert's 15th problem. In the course of establishing the foundations of algebraic geometry, van der Waerden and A. Weil attributed the problem to the determination of the Chow ring of flag manifolds G/P, where G is a compact Lie group and P is...

  • Architektura a dekonstrukcja. Przypadek Petera Eisenmana i Bernarda Tschumiego


    - Year 2015

    Architecture and Deconstruction. Case of Peter Eisenman and Bernard Tschumi. Introduction: Towards deconstruction in architecture. Intensive relations between philosophical deconstruction and architecture, which were present in the late 1980s and early 1990s, belong to the past and therefore may be described from a greater than...

    Full text available for download in the portal

  • LDRAW based renders of LEGO bricks moving on a conveyor belt

    Research Data
    open access
    • T. Boiński
    • K. Zawora
    • S. Zaraziński
    • B. Śledź
    • B. Łobacz
    - series: LEGO

    The set contains renders of 5237 LEGO bricks moving on a white conveyor belt. The images were prepared for training a neural network to recognize LEGO bricks. For each brick, a starting position, alignment and color were selected (simulating the brick falling down on the conveyor belt), and then 10 images were created while the brick was moved across...

  • SkinDepth - synthetic 3D skin lesion database

    Research Data
    version 1.0 open access

    SkinDepth is the first synthetic 3D skin lesion database. The release of SkinDepth dataset intends to contribute to the development of algorithms for:

  • LDRAW based renders of LEGO bricks moving on a conveyor belt with extracted models

    Research Data
    version 3.0 open access - series: LEGO

    The set contains renders of LEGO bricks moving on a white conveyor belt. The images were prepared for training a neural network to recognize LEGO bricks. For each brick, a starting position, alignment and color were selected (simulating the brick falling down on the conveyor belt), and then 10 images were created while the brick was moved across the...

  • Collaborative approach to WordNet and Wikipedia integration


    In this article we present a collaborative approach to creating mappings between WordNet and Wikipedia. Wikipedia articles have been first matched with WordNet synsets in an automatic way. Then such associations have been evaluated and complemented in a collaborative way using a web application. We describe algorithms used for creating automatic mappings as well as a system for their collaborative development. The outcome enables further...