Towards Facts Extraction From Texts in Polish Language

Tomasz Maria Boiński; Adam Brzeski

Abstract

The Polish language differs from English in many ways. In particular, it has more complex conjugation and declension, which makes automatic extraction of facts from text difficult. In this paper we present the basic differences between the two languages and an algorithm for extracting facts from Polish Wikipedia articles. The algorithm is based on 7 proposed fact schemes that are searched for in the analyzed text. The analysis includes morphosyntactic tagging, named entity extraction and relation identification. The results obtained for an example Wikipedia text are presented. We indicate the free word formation principle as the main difficulty in the analysis of Polish texts. At the same time, the conducted experiment confirmed satisfactory performance of the tagging and analysis tools for the Polish language.
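To give a rough idea of what searching for a fact scheme in tagged text could look like, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not list the 7 schemes, so the single scheme below (named entity, verb, named entity), the simplified tags `NE`, `VERB` and `OTHER`, and the names `Token`, `Fact` and `match_ne_verb_ne` are all assumptions made for illustration. The tokens are tagged by hand; in practice they would come from a morphosyntactic tagger and a named entity recognizer.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    form: str   # surface form as it appears in the text
    lemma: str  # base (dictionary) form
    tag: str    # simplified, hypothetical tag: "NE", "VERB" or "OTHER"


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str


def match_ne_verb_ne(tokens: List[Token]) -> Optional[Fact]:
    """Hypothetical scheme: a named entity, then a verb, then a named entity.

    Lemmas are returned instead of surface forms so that inflected
    variants of the same entity or verb map to one representation.
    """
    subject = None
    relation = None
    for tok in tokens:
        if tok.tag == "NE" and subject is None:
            subject = tok.lemma
        elif tok.tag == "VERB" and subject is not None and relation is None:
            relation = tok.lemma
        elif tok.tag == "NE" and relation is not None:
            return Fact(subject, relation, tok.lemma)
    return None


# Hand-tagged example sentence: "Mikołaj Kopernik urodził się w Toruniu"
sentence = [
    Token("Mikołaj Kopernik", "Mikołaj Kopernik", "NE"),
    Token("urodził", "urodzić", "VERB"),
    Token("się", "się", "OTHER"),
    Token("w", "w", "OTHER"),
    Token("Toruniu", "Toruń", "NE"),
]
fact = match_ne_verb_ne(sentence)
```

The sketch works on lemmas because Polish declension changes the surface form of a name ("Toruniu" versus "Toruń"). A scheme that fixes the order of its elements, as this one does, would miss sentences that use a different word order.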

Details

Category: Journal publication
Type: articles in peer-reviewed journals and other serial publications
Published in: International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, iss. 8, pp. 5231-5234
ISSN: 2320-9798
Language: English
Year of publication: 2014
Bibliographic description: Boiński T., Brzeski A.: Towards Facts Extraction From Texts in Polish Language // International Journal of Innovative Research in Computer and Communication Engineering. Vol. 2, iss. 8 (2014), pp. 5231-5234
Verified by: Politechnika Gdańska
