A Parallel Corpus-Based Approach to the Crime Event Extraction for Low-Resource Languages - Publication - MOST Wiedzy


Abstract

Crime-related events take place all over the world every day, and most of them are reported on news portals and social media. Extracting crime-related events from published texts enables monitoring, analysis, and comparison of police or criminal activities across countries and regions. Existing approaches to event extraction mainly target texts in English, French, Chinese, and other resource-rich, well-annotated languages. This paper presents a parallel corpus-based approach that follows a closed-domain event extraction methodology to extract events from web news articles in low-resource languages. To identify the event, its arguments, and the arguments’ roles in the source-language part of the corpus, we utilize an enhanced pattern-based method that involves a multilingual synonyms dictionary with knowledge about crime-related concepts and logic-linguistic equations. Event extraction from the target-language part of the corpus uses a cross-lingual crime-related event extraction transfer technique based on supplementary knowledge about the semantic similarity patterns of the considered pair of languages. The presented approach does not require a preliminarily annotated corpus for training, which makes it more attractive for low-resource languages, and it allows extracting TRANSFER, CRIME, and POLICE event types and their seven subtypes from news articles on various topics simultaneously. Applying the approach to a Russian-Kazakh parallel corpus of news portal articles yielded an F1-measure for crime-related event extraction of over 82% for the source language and 63% for the target language.
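As a rough illustration of two ideas mentioned in the abstract (dictionary-driven trigger detection and the F1-measure), the following is a minimal sketch and not the authors' implementation. The dictionary entries, the function names `find_triggers` and `f1_score`, and the exact-token matching rule are all assumptions made for this example; the paper's method additionally relies on patterns, argument roles, and logic-linguistic equations that are not modeled here.

```python
import re

# Hypothetical toy dictionary (an assumption for illustration only):
# event type -> surface forms in English, Russian, and Kazakh.
SYNONYMS = {
    "CRIME": {"stole", "robbed", "украли", "ұрлады"},
    "POLICE": {"arrested", "detained", "задержали", "ұстады"},
    "TRANSFER": {"extradited", "transported"},
}

# Invert the dictionary so each surface form points to its event type.
TRIGGER_TO_TYPE = {
    form: event_type
    for event_type, forms in SYNONYMS.items()
    for form in forms
}


def find_triggers(sentence):
    """Return (token, event_type) pairs for tokens found in the dictionary."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [(tok, TRIGGER_TO_TYPE[tok]) for tok in tokens if tok in TRIGGER_TO_TYPE]


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(find_triggers("Police detained two men who stole a car."))
    print(find_triggers("Полицейские задержали подозреваемого"))
    print(f1_score({"e1", "e2", "e3"}, {"e2", "e3", "e4"}))
```

Exact token matching is the simplest possible choice; for morphologically rich languages such as Russian and Kazakh, a real system would more plausibly match lemmas or stems rather than raw surface forms.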

Citations

  • CrossRef: 2
  • Web of Science: 0
  • Scopus: 2

Authors (5)

  • Nina Khairova, Professor

    • Umeå University, Sweden
  • Orken Mamyrbayev, Associate Professor

    • Institute of Information and Computational Technologies, Kazakhstan
  • Mariia Razno, M.S.

    • Institut für Slawistik und Kaukasusstudien, Friedrich Schiller University Jena, Germany
  • Galiya Ybytayeva, M.S.

    • Department of Cybersecurity, Information Processing and Storage, Satbayev University, Kazakhstan


Details

Category:
Journal publication
Type:
journal articles
Published in:
IEEE Access, vol. 11, pages 54093-54111
ISSN: 2169-3536
Language:
English
Year of publication:
2023
Bibliographic description:
Khairova N., Mamyrbayev O., Rizun N., Razno M., Ybytayeva G.: A Parallel Corpus-Based Approach to the Crime Event Extraction for Low-Resource Languages // IEEE Access, Vol. 11 (2023), pp. 54093-54111
DOI:
10.1109/access.2023.3281680
Funding sources:
  • Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan under Grant AP09259309
Verification:
Politechnika Gdańska
