Voice command recognition using hybrid genetic algorithm

Marta Wroniszewska; Jacek Dziedzic

Abstract: Speech recognition is the process of converting an acoustic signal into a set of words, whereas voice command recognition consists in correctly identifying voice commands, usually single words. Voice command recognition systems are widely used in the military, in control systems, in electronic devices such as cellular phones, and by people with disabilities (e.g., for controlling a wheelchair or operating a computer system). This paper describes the construction of a model for a voice command recognition system based on a combination of genetic algorithms (GAs) and the K-nearest neighbour (KNN) classifier. The model consists of two parts. The first concerns the creation of feature patterns from spoken words, which is done by means of the discrete Fourier transform and frequency analysis. The second part constitutes the essence of the model, namely the design of the supervised learning and classification system. Classification is based on the simplest classifier, the K-nearest neighbour algorithm. GAs, which have proven to be a good optimization and machine learning technique, are applied to the feature extraction process for the pattern vectors. The purpose of this work is to adapt such a hybrid approach to the task of voice command recognition, to develop an implementation, and to assess its performance. The complete model of the system was implemented in C++; the implementation was subsequently used to determine the relevant parameters of the method and to improve the approach in order to obtain the desired accuracy. Different variants of GAs were surveyed, and the influence of particular operators on the classification success rate was examined. The main finding of the numerical experiments is that genetic algorithms are necessary for the learning process.
As a result, a highly accurate recognition system was obtained, classifying 94.2% of patterns correctly. The hybrid GA/KNN approach constituted a significant improvement over the simple KNN classifier. Moreover, the training time required for the GA to learn the given set of words was found to be acceptable for the efficient functioning of a voice command recognition system.
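
To illustrate how a GA and a KNN classifier might be coupled, the following C++ fragment is a minimal sketch and not the implementation described in the paper. It assumes that a GA individual is a vector of non-negative per-feature weights and that its fitness is the classification success rate of a weighted KNN classifier on a validation set. The abstract specifies neither the encoding nor the value of K, and the names Pattern, weightedDistance, classifyKnn and fitness are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One labelled feature pattern, e.g. spectral features of a spoken command.
struct Pattern {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance with per-feature weights.
// ASSUMPTION: the weight vector plays the role of a GA individual.
double weightedDistance(const std::vector<double>& a,
                        const std::vector<double>& b,
                        const std::vector<double>& w) {
    assert(a.size() == b.size() && a.size() == w.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += w[i] * d * d;
    }
    return sum;
}

// Classify a query by majority vote among its k nearest training patterns.
// Vote ties are resolved in favour of the smallest label.
int classifyKnn(const std::vector<Pattern>& train,
                const std::vector<double>& query,
                const std::vector<double>& w, std::size_t k) {
    assert(!train.empty() && k > 0);
    std::vector<std::pair<double, int>> dist;  // (distance, label)
    dist.reserve(train.size());
    for (const Pattern& p : train)
        dist.emplace_back(weightedDistance(p.features, query, w), p.label);
    k = std::min(k, dist.size());
    // Move the k smallest distances to the front of the vector.
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());
    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second];
    int best = dist[0].second;
    int bestCount = 0;
    for (const auto& v : votes) {
        if (v.second > bestCount) {
            best = v.first;
            bestCount = v.second;
        }
    }
    return best;
}

// Fitness of a weight vector: the fraction of validation patterns
// that the weighted KNN classifier labels correctly.
double fitness(const std::vector<Pattern>& train,
               const std::vector<Pattern>& validation,
               const std::vector<double>& w, std::size_t k) {
    assert(!validation.empty());
    std::size_t correct = 0;
    for (const Pattern& p : validation)
        if (classifyKnn(train, p.features, w, k) == p.label) ++correct;
    return static_cast<double>(correct) /
           static_cast<double>(validation.size());
}
```

Under these assumptions, a GA would evolve a population of weight vectors, score each one with fitness, and keep the vectors that yield the highest success rate. Setting every weight to 1 reduces the classifier to plain KNN, which gives a natural baseline for comparison.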