Non-invasive pathological voice classifications using linear and non-linear classifiers
Abstract
In this research work, a non-invasive method is presented for diagnosing voice diseases
through acoustic analysis of the voice signal. Three feature extraction methods are proposed, based
on time-domain energy variations, Mel-frequency cepstral coefficients (MFCCs) combined with singular
value decomposition (SVD), and wavelet packet and entropy features. A linear classifier, namely an LDA
based classifier, and non-linear classifiers, namely the k-NN classifier, MLP network, PNN, and GRNN,
are used to discriminate pathological voices from normal voices. In this research work, three
databases, namely the MEEI voice disorders database, the MAPACI Speech Pathology database, and
dataset-III (collected at Hospital Tuanku Fauziah, Kangar, Perlis), are used to test whether the
algorithms are independent of the database. The proposed feature extraction algorithms
are also tested under noisy conditions at a signal-to-noise ratio of 30 dB. Two types of experiments are
conducted using the proposed feature extraction and classification algorithms. In the first
experiment, the classification of normal and pathological voices is investigated. In the second
experiment, the detection of specific types of voice disorders is carried out as two-class
pattern classification problems. The voice disorders considered, such as
AP squeezing, vocal fold edema, and vocal fold paralysis, are selected based on previous research works.
The experimental results show that the proposed feature extraction algorithms give very
promising classification accuracy for the classification of normal and pathological voices under
controlled and noisy environments. For the detection of specific disorders, wavelet packet
and entropy features perform better than the time-domain energy variation based features
and the MFCC and SVD based features. Positive
predictivity, specificity, sensitivity, and overall accuracy are used as performance measures to test the
reliability and effectiveness of the linear and non-linear classifiers. For the MEEI voice disorders
database, the success rate of the classifiers is above 98% for the classification of normal and
pathological voices, and the best classification accuracy for the detection of specific disorders is
100%. The experiments are also repeated for the MAPACI Speech Pathology
database and dataset-III under controlled and noisy environments. The results indicate that the
wavelet packet and entropy based features provide better classification accuracy than the
time-domain energy based features and the MFCC and SVD based features for these two
databases as well. It is concluded that the proposed feature extraction and classification algorithms can be
employed to help medical professionals in the early investigation of voice disorders.
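
As an illustration of the four performance measures named above, the following minimal sketch computes them from the counts of a two-class confusion matrix, treating "pathological" as the positive class. It uses the standard textbook definitions and is not taken from this work; the function name classification_measures and the example counts are hypothetical and do not correspond to any reported result.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_measures(tp, fp, tn, fn):
    """Standard two-class measures; 'positive' is assumed to mean pathological.

    tp: pathological voices classified as pathological
    fp: normal voices classified as pathological
    tn: normal voices classified as normal
    fn: pathological voices classified as normal
    """
    return {
        # Fraction of pathological voices that are detected.
        "sensitivity": _ratio(tp, tp + fn),
        # Fraction of normal voices that are recognised as normal.
        "specificity": _ratio(tn, tn + fp),
        # Fraction of 'pathological' decisions that are correct.
        "positive_predictivity": _ratio(tp, tp + fp),
        # Fraction of all decisions that are correct.
        "overall_accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    measures = classification_measures(tp=40, fp=5, tn=45, fn=10)
    for name, value in measures.items():
        print(f"{name}: {value:.3f}")
```

Reporting all four measures together is useful because overall accuracy alone can hide a classifier that misses pathological voices when the two classes are unevenly represented.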