Feature extraction and classification of Malay speech vowels
Abstract
In human language, a phoneme is the smallest structural unit that distinguishes meaning.
Languages such as English combine phonemes to form words. In many
languages, the Consonant-Vowel (CV) units have the highest frequency of occurrence among
different forms of subword units. Therefore, recognising CV units with good accuracy is
crucial for the development of a speech recognition system. Many applications also rely on
vowel phonemes. Among them are speech therapy systems that help improve word
pronunciation, especially in children, and systems that teach hearing-impaired people to
speak with a good degree of intelligibility. All of these systems require a high degree of
vowel recognition capability, which is the focus of this study.
This thesis contributes five modified feature extraction methods for vowel recognition
based on intensities of the Frequency Filter Bands. They are First Formant Bandwidth
(F1BW), Fixed Formant Frequency Band (FFB), Spectral Delta (SpD), Bark Intensity (BrKI)
and Formant Frequency Difference (FFD). The performance of these five proposed methods
is compared with that of three conventional feature extraction methods: single-frame
Mel-frequency cepstral coefficients (MFCCs), multiple-frame Mel-frequency cepstral
coefficients (MFCCf) and the first three formant features. The classifiers analysed in this study
were Multinomial Logistic Regression (MLR), Levenberg-Marquardt (LM) network, k-Nearest
Neighbors (KNN) and Linear Discriminant Analysis (LDA). This thesis makes four main
contributions. The first is a new vowel corpus consisting of more than 1300 recorded vowels from
100 Malaysian speakers. The second is the set of five improved feature extraction methods, which
perform better than MFCC in single-frame analysis. The third is the performance and
robustness analysis using different classifiers and different Gaussian noise levels. The fourth
is a set of frame analysis criteria for isolated vowel analysis.
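
As a rough illustration of the two ideas the abstract relies on, features based on frequency band intensities from a single frame and robustness testing under Gaussian noise, the following minimal sketch may help. It is not one of the thesis's methods (F1BW, FFB, SpD, BrKI, FFD). The function names `band_intensities` and `add_gaussian_noise`, the band edges, the 16 kHz sample rate, the Hamming window and the synthetic two-tone frame are all assumptions made for this example.

```python
import numpy as np


def band_intensities(frame, sample_rate, band_edges_hz):
    """Mean spectral power (dB) of one frame in each (low, high) band.

    Hypothetical helper for illustration only; bands are assumed to
    contain at least one FFT bin.
    """
    windowed = frame * np.hamming(len(frame))      # reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2     # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    feats = []
    for low, high in band_edges_hz:
        mask = (freqs >= low) & (freqs < high)
        feats.append(10.0 * np.log10(power[mask].mean() + 1e-12))
    return np.array(feats)


def add_gaussian_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Synthetic stand-in for a vowel frame: two tones near assumed formant positions.
sample_rate = 16000                                # assumed sampling rate
t = np.arange(512) / sample_rate
frame = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

bands = [(200, 1000), (1000, 2000), (2000, 3000)]  # assumed band edges in Hz
rng = np.random.default_rng(0)

noisy_frame = add_gaussian_noise(frame, snr_db=10.0, rng=rng)
clean = band_intensities(frame, sample_rate, bands)
noisy = band_intensities(noisy_frame, sample_rate, bands)
print("clean band intensities (dB):", np.round(clean, 1))
print("noisy band intensities (dB):", np.round(noisy, 1))
```

Feature vectors of this kind, computed per frame, would then be passed to a classifier such as KNN or LDA. Repeating the extraction at several noise levels gives a simple picture of how much each band feature shifts as noise increases.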