Active Annotation in Evaluating the Credibility of Web-Based Medical Information: Guidelines for Creating Training Data Sets for Machine Learning

Aleksandra Nabożny; Bartłomiej Balcerzak; Adam Wierzbicki; Mikołaj Morzy; Małgorzata Chlabicz

doi:10.2196/26065

Abstract

Background: The spread of false medical information on the web is rapidly accelerating. Establishing the credibility of web-based medical information has become a pressing necessity. Machine learning offers a solution that, when properly deployed, can be an effective tool in fighting medical misinformation on the web.

Objective: The aim of this study is to present a comprehensive framework for designing and curating machine learning training data sets for web-based medical information credibility assessment. We show how to construct the annotation process. Our main objective is to support researchers from the medical and computer science communities. We offer guidelines on the preparation of data sets for machine learning models that can fight medical misinformation.

Methods: We begin by providing the annotation protocol for medical experts involved in medical sentence credibility evaluation. The protocol is based on a qualitative study of our experimental data. To address the problem of insufficient initial labels, we propose a preprocessing pipeline for the batch of sentences to be assessed. It consists of representation learning, clustering, and reranking. We call this process active annotation.

Results: We collected more than 10,000 annotations of statements related to selected medical subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, and food allergy testing) for less than US $7000 by employing 9 highly qualified annotators (certified medical professionals), and we release this data set to the general public. We developed an active annotation framework for more efficient annotation of noncredible medical statements. The application of qualitative analysis resulted in a better annotation protocol for our future efforts in data set creation.

Conclusions: The results of the qualitative analysis support our claims of the efficacy of the presented method.
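The three preprocessing stages named in the abstract (representation learning, clustering, and reranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words vectors stand in for a learned representation, the greedy similarity clustering and the cluster scoring rule are assumptions chosen for brevity, and all function names, the `threshold` parameter, and the `seed_labels` mapping (sentence index to 1 for noncredible, 0 for credible) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Bag-of-words vector; an assumed stand-in for a learned sentence representation."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy single-pass clustering (assumption): a sentence joins the first
    cluster whose first member is at least `threshold` similar to it."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:  # no sufficiently similar cluster found
            clusters.append([i])
    return clusters


def rerank(clusters, seed_labels):
    """Sort clusters by the share of seed-labeled members marked noncredible.
    Clusters without any seed label get a neutral score of 0.5 (assumption)."""
    def score(members):
        labeled = [seed_labels[i] for i in members if i in seed_labels]
        return sum(labeled) / len(labeled) if labeled else 0.5
    return sorted(clusters, key=score, reverse=True)


def active_annotation_queue(sentences, seed_labels, threshold=0.3):
    """Return indices of unlabeled sentences in the order they would be
    shown to annotators, most likely noncredible clusters first."""
    vectors = [embed(s) for s in sentences]
    ranked = rerank(cluster(vectors, threshold), seed_labels)
    return [i for members in ranked for i in members if i not in seed_labels]


# Toy data, invented for illustration only.
sentences = [
    "Vaccines cause autism in children",                   # seed: noncredible
    "Antibiotics treat bacterial infections",              # seed: credible
    "Vaccines cause autism and allergies",
    "Antibiotics treat bacterial infections effectively",
    "Statins lower cholesterol levels",
]
seed_labels = {0: 1, 1: 0}
print(active_annotation_queue(sentences, seed_labels))  # [2, 4, 3]
```

In this toy run, the unlabeled sentence that clusters with the noncredible seed is queued first, the sentence with no labeled neighbors second, and the one that clusters with the credible seed last. A real pipeline would likely replace `embed` with a trained sentence encoder and `cluster` with a standard clustering algorithm.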

Citations

- CrossRef: 3
- Web of Science: 0
- Scopus: 5

Authors (5)

Aleksandra Nabożny dr inż.
Bartłomiej Balcerzak dr
- Polsko-Japońska Akademia Technik Komputerowych
Adam Wierzbicki prof. dr hab. inż.
- Polsko-Japońska Akademia Technik Komputerowych
Mikołaj Morzy dr hab. inż.
- Politechnika Poznańska
Małgorzata Chlabicz
- Uniwersytet Medyczny w Białymstoku

Full text

Publication version: Accepted or Published Version

Details

Category: Articles
Type: journal articles
Published in: JMIR Medical Informatics, vol. 9, iss. 11
ISSN: 2291-9694
Language: English
Publication year: 2021
Bibliographic description: Nabożny A., Balcerzak B., Wierzbicki A., Morzy M., Chlabicz M.: Active Annotation in Evaluating the Credibility of Web-Based Medical Information: Guidelines for Creating Training Data Sets for Machine Learning. JMIR Medical Informatics, vol. 9, iss. 11 (2021), p. e26065
DOI: 10.2196/26065
Verified by: Gdańsk University of Technology
