Search results for: text representation, documents categorization - Bridge of Knowledge


  • TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia

    Open Research Data

    The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words representations of preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label-matrix]. Label scores for each document are also provided for enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...

  • Internal legal acts of technical and medical universities in Poland regulating classes conducted in-person during the Covid-19 pandemic

    Open Research Data
    open access

    A database of legal acts and other internal documents of medical and technical universities in Poland regulating how in-person or hybrid classes were organized during the COVID-19 pandemic, from the summer semester 2019/2020 to the winter semester 2020/2021. Documents were encoded in two separate coding systems using the MAXQDA program for qualitative...

  • A collection of directed graphs for the minimum cycle mean weight computation

    Open Research Data
    open access

    This dataset contains definitions of the 16 directed graphs with weighted edges that were described in the following paper: Paweł Pilarczyk, A space-efficient algorithm for computing the minimum cycle mean in a directed graph, Journal of Mathematics and Computer Science, 20 (2020), no. 4, 349-355, DOI: 10.22436/jmcs.020.04.08, URL: http://dx.doi.org/10.22436/jmcs.020.04.08 These...

  • Elgold partial: News

    Open Research Data

    The dataset contains 37 English texts scraped from news websites. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article where possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking...

  • Elgold partial: Automotive blogs

    Open Research Data

    The dataset contains 34 English texts scraped from automotive blogs. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article where possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and...

  • Elgold partial: Movie reviews

    Open Research Data

    The dataset contains 37 English texts with movie reviews. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article where possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.

  • Elgold partial: Job offers

    Open Research Data

    The dataset contains 34 English texts scraped from web portals that list job offers. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article where possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity...