Description
The dataset contains about 120,000 Polish words and sentences together with their Kashubian translations. It was created from two types of sources. The first type is online dictionaries:
The second type is an existing dataset that was incorporated into this one:
The dataset was pre-cleaned and duplicates were removed.
Dataset file
Polish-Kashubian parallel translation corpus.zip
814.0 kB
S3 ETag: 44810ca14f445862b0bbd85c3fa03ec7-1
The file hash is calculated from the formula
hexmd5(md5(part1)+md5(part2)+...)-{parts_count}
where a single part of the file is 512 MB in size. Example script for calculation:
https://github.com/antespi/s3md5
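As a rough illustration of the formula above, the following Python sketch recomputes this kind of ETag locally so that a downloaded copy can be checked against the value listed for the file. It is not the linked script. The function names and the `block_size` parameter are hypothetical, and the sketch assumes that "512 MB" means 512 × 1024 × 1024 bytes. Because the file here is far smaller than one part, the result has a `-1` suffix, matching the listed ETag.

```python
import hashlib
from typing import BinaryIO

# Assumption: "512 MB" is interpreted as 512 * 1024 * 1024 bytes.
PART_SIZE = 512 * 1024 * 1024


def s3_multipart_etag_stream(f: BinaryIO, part_size: int = PART_SIZE,
                             block_size: int = 8 * 1024 * 1024) -> str:
    """Return hexmd5(md5(part1)+md5(part2)+...)-{parts_count} for a binary stream.

    Each part is hashed incrementally in blocks of `block_size` bytes
    (a hypothetical tuning parameter), so a 512 MB part never sits in memory.
    """
    part_digests = []
    while True:
        part_md5 = hashlib.md5()
        remaining = part_size
        while remaining > 0:
            block = f.read(min(block_size, remaining))
            if not block:
                break
            part_md5.update(block)
            remaining -= len(block)
        if remaining == part_size:
            # Nothing was read for this part: end of stream.
            break
        part_digests.append(part_md5.digest())  # raw 16-byte digest, not hex
        if remaining > 0:
            # Short final part: end of stream reached.
            break
    # MD5 over the concatenated raw part digests, then append the part count.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def s3_multipart_etag(path: str, part_size: int = PART_SIZE) -> str:
    """Hypothetical convenience wrapper that opens a file by path."""
    with open(path, "rb") as f:
        return s3_multipart_etag_stream(f, part_size)
```

For example, `s3_multipart_etag("Polish-Kashubian parallel translation corpus.zip")` can be compared with the ETag listed above. Assuming the ETag was produced exactly as the formula states, a mismatch points to a corrupted or different download.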
File details
- License:
- CC0 Public Domain Dedication
Details
- Year of publication:
- 2024
- Verification date:
- 2025-02-01
- Dataset language:
- Polish
- Fields of science:
- information and communication technology (Engineering and Technology)
- DOI:
- 10.34808/5whb-dk74
- Series:
- Verified by:
- Gdańsk University of Technology
Versions
This document has several versions:
- Version 2.0 (current), release date 2025-02-01
- Version 1.0, release date 2024-09-30
DOI 10.34808/4sbd-2v21 represents the latest version of the data.