Szymon Olewniczak - Research data

Szymon Olewniczak, MSc Eng.

Employment

Deputy Head of Department, Katedra Architektury Systemów Komputerowych (Department of Computer Systems Architecture)
Assistant, Katedra Architektury Systemów Komputerowych (Department of Computer Systems Architecture)

Research areas

series: Elgold - partial, count: 8

Elgold partial: News
Research Data
version 1.1
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 37 English texts scraped from news websites. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking...
Elgold intermediate: annotated raw
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains a subset of texts from "Elgold intermediate: raw texts", with named entities marked and linked to the corresponding Wikipedia articles. The texts were annotated by 31 participants during a 1.5-hour session.
Elgold partial: History blogs
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 13 texts from English history blogs. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: Scientific papers' abstracts
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 87 abstracts of scientific papers in English, randomly chosen from the following scientific disciplines: Biomedicine, Life Sciences, Mathematics, Medicine, Science, Humanities, Social Science.
Elgold partial: Amazon product reviews
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 Amazon product reviews in English. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: Automotive blogs
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 English texts scraped from automotive blogs. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and...
Elgold partial: Movie reviews
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 37 English texts with movie reviews. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: Job offers
Research Data
version 1.0
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 English texts scraped from web portals with job offers. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity...

series: Polish-Kashubian translation, count: 2

Remus: Polish-Kashubian parallel translation corpus
Research Data
- S. Olewniczak
- M. Nowak
- F. Szweda
- J. Żęgota
- K. Kulpiński
- M. Wrzosek
- J. Grzybowski
- K. Czepiel
- series: Polish-Kashubian translation
The dataset contains 10,825 sentences from the Kashubian book "Life and Adventures of Remus" (Żëcé i przigòdë Remùsa) with parallel Polish translations. Aleksander Majkowski's book is considered the most important book in Kashubian literature, making it a valuable source of high-quality translation data.
Polish-Kashubian parallel translation corpus
Research Data
version 2.0
- S. Olewniczak
- M. Nowak
- F. Szweda
- J. Żęgota
- K. Kulpiński
- M. Wrzosek
- J. Grzybowski
- K. Czepiel
- series: Polish-Kashubian translation
The dataset contains Polish words and sentences and their translations into Kashubian. The dataset consists of train and test subsets. The train subset contains about 100,000 parallel translations. It was created using two types of sources. The first one is the online dictionaries:

series: Elgold intermediate, count: 3

Elgold intermediate: verified by the authors
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold intermediate
The dataset contains the texts from "Elgold intermediate: verified by verification team", additionally verified by the dataset authors, but before the final validation step with the elgold toolset.
Elgold intermediate: verified by verification team
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold intermediate
The dataset contains the texts from "Elgold intermediate: annotated raw", additionally verified by the five-person verification team. Nearly 25% of the mentions were corrected in some aspect.
Elgold intermediate: raw texts
Research Data
- S. Olewniczak
- J. Szymański
- series: Elgold intermediate
The dataset contains raw texts scraped from various internet sources, which were used for creating the Elgold dataset.

OntoValidate: OntoNotes 5.0 NER validation dataset
Research Data
version 1.11
- S. Olewniczak
The OntoValidate dataset consists of 603 randomly chosen raw texts from the original OntoNotes 5.0 dataset (3637 raw texts in total).
Elgold: gold standard, multi-genre dataset for named entity recognition and linking
Research Data
version 1.1
- S. Olewniczak
- J. Szymański
The dataset contains 276 multi-genre texts with marked named entities, which are linked to corresponding Wikipedia articles if available. Each entity was manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
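As an illustration of what evaluating against such a gold standard could look like, here is a minimal sketch of strict-match precision, recall and F1. The annotation layout used here, tuples of (text id, start offset, end offset, Wikipedia title or None), is an assumption made for this example and is not the format in which Elgold is distributed; the sample annotations are invented.

```python
def precision_recall_f1(gold, predicted):
    """Strict-match micro precision, recall and F1 over two collections of annotations.

    An annotation counts as correct only if it is identical in the gold set,
    i.e. same text, same span boundaries and same link target.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (text_id, start, end, wikipedia_title or None).
gold = {
    ("t1", 0, 6, "Gdańsk"),
    ("t1", 20, 26, None),          # a mention with no Wikipedia article
    ("t2", 5, 12, "Poland"),
}
predicted = {
    ("t1", 0, 6, "Gdańsk"),           # correct span and link
    ("t2", 5, 12, "Polish language"),  # correct span, wrong link
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Dropping the last tuple element before calling the function would score recognition alone, without linking.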
Single Bit Errors in Ethernet II frames
Research Data
- M. Nurczyński
- M. Szymański
- S. Olewniczak
Check our final report for a detailed summary of how the data was gathered and processed (the "Methods" section of the report.pdf file). The report mentions 7 different datasets. Below you can find specific information on how to navigate the folders and construct those datasets from multiple files.
The American Sign Language alphabet
Research Data
- S. Olewniczak
- K. Witczak
- I. Czartowski
- H. Wołek
The American Sign Language dataset contains all static letters of the American alphabet, meaning those that do not require movement to perform (the entire alphabet except for the letters 'J' and 'Z', which are dynamic and require hand movement).
Rust QA: question answering dataset for "The Rust Programming Language" in SQuAD 2.0 format
Research Data
- S. Olewniczak
- M. Maciszka
- K. Paluszewski
- G. Pozorski
- W. Rosenthal
- Ł. Zaleski
Rust QA is a dataset for training and evaluating QA systems. The dataset consists of 1068 questions about "The Rust Programming Language" book (https://doc.rust-lang.org/stable/book/), with the answers provided as text spans from the book. The dataset is released in SQuAD 2.0 format.
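For readers unfamiliar with SQuAD 2.0, the following sketch walks the standard nesting of that format (data, paragraphs, qas, answers) and checks that each answer's character offset matches its text. It runs on a small invented sample. The file name in the closing comment is hypothetical, and nothing here is taken from the Rust QA files themselves.

```python
import json


def iter_questions(squad):
    """Yield (context, qa) pairs from a SQuAD 2.0 style dictionary."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


def check_answer_spans(squad):
    """Return (answerable, unanswerable, ids of questions with misaligned spans)."""
    answerable = unanswerable = 0
    misaligned = []
    for context, qa in iter_questions(squad):
        if qa.get("is_impossible", False):
            unanswerable += 1
            continue
        answerable += 1
        for answer in qa["answers"]:
            start = answer["answer_start"]
            end = start + len(answer["text"])
            if context[start:end] != answer["text"]:
                misaligned.append(qa["id"])
                break
    return answerable, unanswerable, misaligned


# Invented sample in SQuAD 2.0 layout, not taken from Rust QA.
context = "Rust is a systems programming language."
answer_text = "a systems programming language"
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Sample",
        "paragraphs": [{
            "context": context,
            "qas": [
                {"id": "q1", "question": "What is Rust?", "is_impossible": False,
                 "answers": [{"text": answer_text,
                              "answer_start": context.index(answer_text)}]},
                {"id": "q2", "question": "Who maintains the compiler?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

stats = check_answer_spans(sample)

# For a real file (hypothetical name):
# with open("rust_qa.json", encoding="utf-8") as f:
#     stats = check_answer_spans(json.load(f))
```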
