Search results for: text representation, documents categorization


TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia
Research Data
open access
The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label matrix]. Label scores for each document are also provided for an enhanced multi-label KNN [1] classifier and a LEML [2] classifier. The aim of the dataset is to establish a benchmark...
Internal legal acts of technical and medical universities in Poland regulating classes conducted in-person during the Covid-19 pandemic
Research Data
open access
- K. Górak-Sosnowska
- L. Tomaszewska
A database of legal acts and other internal documents of medical and technical universities in Poland regulating how in-person or hybrid classes were organized during the COVID-19 pandemic, from the summer semester 2019/2020 to the winter semester 2020/2021. Documents were encoded in two separate coding systems using the MAXQDA program for qualitative...
A collection of directed graphs for the minimum cycle mean weight computation
Research Data
open access
- P. Pilarczyk
- G. Graff
This dataset contains definitions of the 16 directed graphs with weighted edges that were described in the following paper: Paweł Pilarczyk, A space-efficient algorithm for computing the minimum cycle mean in a directed graph, Journal of Mathematics and Computer Science, 20 (2020), no. 4, 349--355, DOI: 10.22436/jmcs.020.04.08, URL: http://dx.doi.org/10.22436/jmcs.020.04.08 These...
Elgold partial: News
Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 37 English texts scraped from news websites. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking...
Elgold partial: Automotive blogs
Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 English texts scraped from automotive blogs. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and...
Elgold partial: Movie reviews
Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 37 English texts with movie reviews. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: Job offers
Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 English texts scraped from web portals with job offers. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity...
Elgold partial: Scientific papers' abstracts
Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 87 abstracts of scientific papers in English, randomly chosen from the following scientific disciplines: Biomedicine, Life Sciences, Mathematics, Medicine, Science, Humanities, Social Science.
Elgold partial: Amazon product reviews
Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 Amazon product reviews in English. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: History blogs
Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 13 texts from English history blogs. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.


