Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning - Publication - MOST Wiedzy

Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning

Abstract

Text-to-Speech (TTS) synthesis can be used to generate training data for building Automatic Speech Recognition (ASR) models. Access to medical speech data is limited because it is sensitive and difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized to mimic the different accents, dialects, and speaking styles that may occur in medical language. Reinforcement Learning (RL), in the context of ASR, can be used to optimize a model toward specific goals. A model can be trained to minimize errors in speech-to-text transcription, especially for technical medical terminology. In this case, the "reward" given to the RL model can be negatively proportional to the number of transcription errors. The paper presents a method and an experimental study, from which it is concluded that combining TTS and RL can enable the creation of a speech recognition model better suited to the specific needs of medical personnel, by both expanding the training data and optimizing the model to minimize transcription errors. The learning process used reward functions based on Mean Opinion Score (MOS), a subjective metric for assessing speech quality, and Word Error Rate (WER), which evaluates the quality of speech-to-text transcription.
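
The reward idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual reward formulas, so everything below is an assumption made for illustration: the function names `word_error_rate` and `reward`, the weights `wer_weight` and `mos_weight`, and the way the MOS term is rescaled and added are all hypothetical. WER is computed in the standard way, as the word-level edit distance divided by the number of reference words, so the reward here is negatively proportional to the error count normalized by reference length rather than to the raw count.

```python
from typing import Optional


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def reward(
    reference: str,
    hypothesis: str,
    mos: Optional[float] = None,
    wer_weight: float = 1.0,   # hypothetical weight, not from the paper
    mos_weight: float = 0.0,   # hypothetical weight, not from the paper
) -> float:
    """Illustrative reward: fewer transcription errors give a higher reward.

    The optional MOS term (1 to 5 scale, rescaled to 0..1) is an assumed
    way of combining the two metrics, not the method used in the paper.
    """
    value = -wer_weight * word_error_rate(reference, hypothesis)
    if mos is not None:
        value += mos_weight * (mos - 1.0) / 4.0
    return value
```

For example, under these assumptions a hypothesis that substitutes one word and drops another from a four-word reference has a WER of 0.5 and a reward of -0.5, while a perfect transcription has a reward of zero.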

Citations

  • CrossRef: 1
  • Web of Science: 0
  • Scopus: 0

Full text

Publication version:
Accepted or Published Version
DOI:
10.1121/10.0023271
License:
Copyright (2023 Acoustical Society of America)

Details

Category:
Journal publication
Type:
journal articles
Published in:
Journal of the Acoustical Society of America, vol. 154, pages A202-A203
ISSN: 0001-4966
Language:
English
Publication year:
2023
Bibliographic description:
Czyżewski A.: Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning // Journal of the Acoustical Society of America. - Vol. 154, iss. 4 suppl. (2023), pp. A202-A203
Funding sources:
  • No-cost publication
Verified by:
Politechnika Gdańska (Gdańsk University of Technology)
