Implementation of feature extraction and classification for speech dysfluencies
Abstract
Speech is prone to disruption by involuntary dysfluent events, especially repetitions and
prolongations of sounds, syllables and words, which lead to dysfluency in
communication. Traditionally, speech language pathologists count and classify
occurrences of dysfluencies in the flow of speech manually. However, these types of
assessment are subjective, inconsistent, time-consuming and prone to error. In the last
three decades, many studies have sought to automate the conventional
assessments using various approaches such as speech signal analysis, personal
variables, acoustic analysis of the speech signal and artificial intelligence techniques.
From the previous works, it can be concluded that feature extraction methods and
classification techniques play important roles in this research field. Therefore, in this
work, three feature extraction methods, namely Short Time Fourier Transform
(STFT), Mel-frequency Cepstral Coefficient (MFCC) and Linear Predictive Coding
(LPC) based parameterization, were proposed to extract the salient features of the two
types of dysfluencies. Applying these feature extraction methods to each signal yields
a total of seven acoustical features: STFT, MFCC and five acoustical
features from Linear Predictive Coding based parameterization, namely Linear
Predictive Coefficient (LPC), Linear Predictive Cepstral Coefficient (LPCC), Weighted
Linear Predictive Cepstral Coefficient (WLPCC), First Order Temporal Derivatives
(FOTD) and Second Order Temporal Derivatives (SOTD). The acoustical features
extracted from the signal are used as input parameters for the classifiers. Both linear and
nonlinear classifiers, namely Linear Discriminant Analysis (LDA), k-Nearest Neighbor
(kNN) and Least-Squares Support Vector Machines (LSSVM) with linear kernel (SLIN)
and Radial Basis Function kernel (SRBF), were suggested to classify the two types of
dysfluencies. In order to evaluate the effectiveness of the different feature extraction
methods and classification techniques, a standard database, the University
College London’s Archive of Stuttered Speech (UCLASS), is used. The reliability of the
classification accuracy is ensured by adopting two validation schemes, namely,
conventional validation and ten-fold cross-validation. For further analysis, the parameter
selection of the respective classifiers and parameter variations, namely the order of the Linear
Predictive Coding based parameterization, the parameter used to control the degree of
pre-emphasis filtering, and the frame length and overlap percentage used in the signal
pre-processing techniques, are investigated. The results show that the highest classification
accuracy is achieved by the STFT features with the SLIN classifier. From the
classification accuracies obtained with the different acoustical features and classifiers, it can
be concluded that it is necessary to evaluate the correlation between acoustical features
and different classifiers in order to achieve the best classification accuracy. In
conclusion, the proposed feature extraction methods and classifiers can be used for
speech dysfluency classification. Finally, a Graphical User Interface for this work was
developed using MATLAB® based on the results achieved in the experiments.
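
As an illustration of the Linear Predictive Coding based parameterization summarised above, the following is a minimal sketch in Python (standard library only), not the implementation used in this work. It assumes first-order pre-emphasis, fixed-length overlapping frames, the autocorrelation method with the Levinson-Durbin recursion for LPC, the standard LPC-to-cepstrum recursion for LPCC, and a raised-sine lifter for WLPCC. The function names, the default pre-emphasis value of 0.9375 and the choice of lifter are assumptions made for this example only.

```python
import math


def pre_emphasize(signal, alpha=0.9375):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha controls the degree of pre-emphasis (the default is an assumption).
    """
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_signal(signal, frame_len, overlap):
    """Split a signal into frames of frame_len samples.

    overlap is the fraction shared by consecutive frames (0 <= overlap < 1).
    """
    step = max(1, int(round(frame_len * (1.0 - overlap))))
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, step)]


def lpc(frame, order):
    """LPC by the autocorrelation method and Levinson-Durbin recursion.

    Returns a[1..order] for the predictor x[n] ~ sum_k a[k] * x[n-k].
    """
    r = [sum(frame[i] * frame[i + k] for i in range(len(frame) - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lpcc(a):
    """Cepstral coefficients c[1..p] from LPC coefficients a[1..p]."""
    c = []
    for m in range(1, len(a) + 1):
        acc = a[m - 1]
        for k in range(1, m):
            acc += (k / m) * c[k - 1] * a[m - k - 1]
        c.append(acc)
    return c


def wlpcc(c):
    """Weight LPCC with w(m) = 1 + (Q/2) * sin(pi * m / Q), Q = len(c).

    This raised-sine lifter is a common choice and is assumed here.
    """
    q = len(c)
    return [cm * (1.0 + (q / 2.0) * math.sin(math.pi * m / q))
            for m, cm in enumerate(c, start=1)]
```

A typical call chain for one recording would be `pre_emphasize`, then `frame_signal`, then `lpc`, `lpcc` and `wlpcc` on each frame. The LPC order, pre-emphasis value, frame length and overlap are exactly the parameters varied in the analysis described above, so they are left as arguments rather than fixed.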