Clinical situations text database for Polish language - Open Research Data - Bridge of Knowledge


Clinical situations text database for Polish language

Description

The dataset contains anonymized Polish-language texts intended for building a medical speech corpus. The texts cover clinical situations in the following areas: medical interview, interview and description of the result of an oncological examination, description of a radiological examination, description of a pathomorphological examination, description of a cardiological examination, description of a surgical procedure, description of a reanimation procedure, medical recommendations, and prescriptions (including lists of drug names).

Illustration: example content of the text file

The texts in the database are divided into 10 clinical situations: 

  1. Medical interview.
  2. Radiological examination.
  3. Oncology examination.
  4. Pathomorphological examination.
  5. Cardiology examination.
  6. Course of surgical procedure.
  7. Course of reanimation procedure.
  8. Recommendations.
  9. Referral to treatment.
  10. Prescriptions with pharmaceutical names.

The texts are saved in CSV format in the file phrases.csv.

The first row of the file serves as the header row and contains information about the contents of each column:

  • id - unique number of the phrase;
  • phrase - text of the phrase (a sentence or several related sentences);
  • CategoryID - number of the clinical situation;
  • SubCategoryID - subcategory number (present only for some CategoryID values).
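The column layout described above could be read roughly as follows. This is a minimal sketch, not part of the dataset: it assumes a comma delimiter, UTF-8 encoding, and integer-valued id, CategoryID and SubCategoryID fields, none of which the description confirms. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def load_phrases(source):
    """Read rows from an open text file laid out like phrases.csv.

    Assumes the header names id, phrase, CategoryID, SubCategoryID
    and a comma delimiter (an assumption, not stated in the description).
    """
    rows = []
    for record in csv.DictReader(source):
        # SubCategoryID only appears for some categories, so it may be empty.
        sub = (record.get("SubCategoryID") or "").strip()
        rows.append({
            "id": int(record["id"]),
            "phrase": record["phrase"],
            "category_id": int(record["CategoryID"]),
            "subcategory_id": int(sub) if sub else None,
        })
    return rows


def count_by_category(rows):
    """Count how many phrases fall into each clinical situation."""
    return Counter(row["category_id"] for row in rows)


# Made-up sample rows for illustration only; these are not taken from the dataset.
SAMPLE = (
    "id,phrase,CategoryID,SubCategoryID\n"
    "1,Przykładowe zdanie.,1,\n"
    "2,Inne zdanie.,10,2\n"
)

rows = load_phrases(io.StringIO(SAMPLE))
counts = count_by_category(rows)
```

For the real file, the same function could be called as `load_phrases(open("phrases.csv", encoding="utf-8", newline=""))`, adjusting the encoding or delimiter if the file turns out to differ from these assumptions.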

The classification of the clinical situations (categories) is provided in the file situations.csv.

Dataset file

Clinical situations text database for Polish language.zip
159.5 kB, S3 ETag 12aa6e3d256b1319ea0172462142a23e-1
The file hash is calculated with the formula
hexmd5(md5(part1)+md5(part2)+...)-{parts_count}, where each part of the file is 512 MB in size.

Example script for calculation:
https://github.com/antespi/s3md5
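As an alternative to the linked script, the formula could be sketched in Python to check a downloaded file against the published ETag. This is a minimal sketch under stated assumptions: "512 MB" is taken to mean 512 × 1024 × 1024 bytes, and the per-part MD5 values are concatenated as raw binary digests before the final MD5, as in the usual S3 multipart scheme. The function name `s3_etag` and its parameters are hypothetical.

```python
import hashlib

# Assumption: "512 MB" is interpreted as 512 MiB.
PART_SIZE = 512 * 1024 * 1024


def s3_etag(path, part_size=PART_SIZE, block_size=1024 * 1024):
    """Compute hexmd5(md5(part1)+md5(part2)+...)-{parts_count} for a file."""
    part_digests = []
    with open(path, "rb") as fh:
        while True:
            md5 = hashlib.md5()
            remaining = part_size
            # Stream the current part in small blocks to keep memory use low.
            while remaining > 0:
                block = fh.read(min(block_size, remaining))
                if not block:
                    break
                md5.update(block)
                remaining -= len(block)
            if remaining == part_size:
                # Nothing was read for this part: end of file.
                break
            part_digests.append(md5.digest())
    # Hash the concatenated binary digests, then append the part count.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

The archive is 159.5 kB, well below one part, so the expected suffix is `-1`, which matches the ETag listed above. Calling `s3_etag` on the downloaded zip file and comparing the result with that ETag would then confirm the download, provided the assumptions hold.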

File details

License:
Creative Commons: by-nc-sa 4.0
CC BY-NC-SA
Non-commercial - Share-alike

Details

Year of publication:
2024
Verification date:
2024-07-19
Dataset language:
Polish
Fields of science:
  • medical sciences (Medical and Health Sciences)
  • pharmacology and pharmacy (Medical and Health Sciences)
  • information and communication technology (Engineering and Technology)
DOI:
10.34808/0pg7-2b80
Verified by:
Gdańsk University of Technology
