Towards Facts Extraction From Texts in Polish Language - Publication - Bridge of Knowledge

Towards Facts Extraction From Texts in Polish Language

Abstract

The Polish language differs from English in many ways. It has more complicated conjugation and declension, which makes automatic extraction of facts from texts difficult. In this paper we present the basic differences between the two languages. The paper presents an algorithm for extracting facts from Polish Wikipedia articles. The algorithm is based on seven proposed fact schemes that are searched for in the analyzed text. The analysis includes morphosyntactic tagging, named entity extraction and relation identification. The results acquired for an exemplary Wikipedia text are presented. We indicate the free word formation principle as the main difficulty in the analysis of Polish texts. At the same time, the conducted experiment confirmed satisfactory performance of the tagging and analysis tools for the Polish language.

Full text

Publication version
Accepted or Published Version
License
Creative Commons: CC-BY

Details

Category:
Articles
Type:
articles in peer-reviewed journals and other serial publications
Published in:
International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, iss. 8, pages 5231-5234
ISSN: 2320-9798
Language:
English
Publication year:
2014
Bibliographic description:
Boiński T., Brzeski A.: Towards Facts Extraction From Texts in Polish Language. International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, iss. 8 (2014), pp. 5231-5234
Verified by:
Gdańsk University of Technology
