Description
The dataset contains a database of anonymized Polish texts intended for building a medical speech corpus. The texts cover clinical situations in the following areas: medical interviews; interviews and descriptions of oncological examination results; descriptions of radiological, pathomorphological and cardiological examinations; descriptions of surgical and reanimation procedures; medical recommendations; and prescriptions (including lists of drug names).
Example content of the text file
The texts in the database are divided into 10 clinical situations:
- Medical interview.
- Radiological examination.
- Oncology examination.
- Pathomorphological examination.
- Cardiology examination.
- Course of surgical procedure.
- Course of reanimation procedure.
- Recommendations.
- Referral to treatment.
- Prescriptions with pharmaceutical names.
The texts are saved in CSV format in the file phrases.csv.
The first row of the file is a header row that names the columns:
- id - unique number of the phrase;
- phrase - the phrase itself (a sentence or several related sentences);
- CategoryID - number of the clinical situation;
- SubCategoryID - subcategory number (present only for some CategoryIDs).
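A minimal Python sketch of reading phrases.csv with the standard csv module and counting phrases per clinical situation. It assumes a comma-separated, UTF-8 file whose header names match the columns listed above and whose numeric columns hold plain integers; the function names are illustrative and not part of the dataset.

```python
import csv
from collections import Counter
from typing import Iterable, Optional


def read_phrases(lines: Iterable[str]) -> list[dict]:
    """Parse rows of phrases.csv into dicts with typed fields.

    Assumes comma-separated values and the header names id, phrase,
    CategoryID and SubCategoryID; adjust if the actual file differs.
    """
    rows = []
    for row in csv.DictReader(lines):
        # SubCategoryID is filled in only for some categories; treat blanks as None.
        sub = (row.get("SubCategoryID") or "").strip()
        subcategory: Optional[int] = int(sub) if sub else None
        rows.append({
            "id": int(row["id"]),
            "phrase": row["phrase"],
            "category": int(row["CategoryID"]),
            "subcategory": subcategory,
        })
    return rows


def count_by_category(rows: Iterable[dict]) -> Counter:
    """Count phrases per clinical situation (CategoryID)."""
    return Counter(row["category"] for row in rows)


def load_phrases(path: str = "phrases.csv") -> list[dict]:
    """Read phrases.csv from disk (assumed UTF-8; "utf-8-sig" also drops a BOM)."""
    # newline="" lets the csv module handle quoted phrases that span lines.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_phrases(f)
```

For example, `count_by_category(load_phrases("phrases.csv"))` gives the number of phrases per CategoryID; the names of the categories themselves are listed in situations.csv.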
The classification of the clinical situations (categories) is provided in the file situations.csv.
Dataset file
The checksum of the dataset file is calculated as:
hexmd5(md5(part1)+md5(part2)+...)-{parts_count}
where a single part of the file is 512 MB in size. Example script for the calculation:
https://github.com/antespi/s3md5
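As an alternative to the linked script, the checksum formula can be reproduced with Python's hashlib; this is a rough sketch, not the repository's own tool. Here md5(partN) is taken to be the raw 16-byte digest of each part, the digests are concatenated in order, and the hex MD5 of that concatenation is suffixed with the number of parts. "512 MB" is assumed to mean 512 * 1024 * 1024 bytes; if the result does not match the published checksum, try 512 * 1000 * 1000 instead. The helper names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

# Assumption: "512 MB" means 512 MiB (512 * 1024 * 1024 bytes).
PART_SIZE = 512 * 1024 * 1024
BLOCK_SIZE = 8 * 1024 * 1024  # read each part in 8 MiB blocks to bound memory use


def multipart_md5(stream: BinaryIO, part_size: int = PART_SIZE,
                  block_size: int = BLOCK_SIZE) -> str:
    """Compute hexmd5(md5(part1)+md5(part2)+...)-{parts_count} over a binary stream."""
    concatenated = b""  # raw MD5 digests of the parts, in order
    parts = 0
    while True:
        part_hash = hashlib.md5()
        remaining = part_size
        while remaining > 0:
            block = stream.read(min(block_size, remaining))
            if not block:  # end of stream
                break
            part_hash.update(block)
            remaining -= len(block)
        if remaining == part_size:  # no bytes left to start a new part
            break
        concatenated += part_hash.digest()
        parts += 1
    return f"{hashlib.md5(concatenated).hexdigest()}-{parts}"


def multipart_md5_bytes(data: bytes, part_size: int = PART_SIZE,
                        block_size: int = BLOCK_SIZE) -> str:
    """Same checksum for in-memory data (handy for testing)."""
    return multipart_md5(io.BytesIO(data), part_size, block_size)


def file_multipart_md5(path: str, part_size: int = PART_SIZE) -> str:
    """Same checksum for a file on disk."""
    with open(path, "rb") as f:
        return multipart_md5(f, part_size)
```

For instance, `file_multipart_md5(path)` on the downloaded file can be compared with the published checksum. Hashing each part in smaller blocks avoids loading a full 512 MB part into memory at once. The formula is applied literally here, so a file shorter than one part still gets the `-1` suffix.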
File details
- License: CC BY-NC-SA (Non-commercial, Share-alike)
Details
- Year of publication: 2024
- Verification date: 2024-07-19
- Dataset language: Polish
- Fields of science:
  - medical sciences (Medical and Health Sciences)
  - pharmacology and pharmacy (Medical and Health Sciences)
  - information and communication technology (Engineering and Technology)
- DOI: 10.34808/0pg7-2b80
- Verified by: Gdańsk University of Technology