Speaker accent recognition through statistical descriptors of Mel-bands spectral energy and neural network model

Yusnita, Mohd Ali; Pandiyan, Paulraj Murugesa, Prof. Dr.; Sazali, Yaacob, Prof. Dr.; Shahriman, Abu Bakar, Dr.; Nataraj, Sathees Kumar

Please use this identifier to cite or link to this item: http://dspace.unimap.edu.my:80/xmlui/handle/123456789/35449

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yusnita, Mohd Ali	-
dc.contributor.author	Pandiyan, Paulraj Murugesa, Prof. Dr.	-
dc.contributor.author	Sazali, Yaacob, Prof. Dr.	-
dc.contributor.author	Shahriman, Abu Bakar, Dr.	-
dc.contributor.author	Nataraj, Sathees Kumar	-
dc.date.accessioned	2014-06-12T16:40:25Z	-
dc.date.available	2014-06-12T16:40:25Z	-
dc.date.issued	2012-10	-
dc.identifier.citation	p. 262-267	en_US
dc.identifier.isbn	978-1-4673-1649-1 (Print)	-
dc.identifier.isbn	978-1-4673-1704-7 (Online)	-
dc.identifier.issn	1985-5753	-
dc.identifier.uri	http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6408416	-
dc.identifier.uri	http://dspace.unimap.edu.my:80/dspace/handle/123456789/35449	-
dc.description	Proceeding of the 3rd IEEE Conference on Sustainable Utilization and Development in Engineering and Technology (STUDENT) 2012 at Kuala Lumpur, Malaysia on 6 October 2012 through 9 October 2012. Link to publisher's homepage at http://ezproxy.unimap.edu.my:2080/Xplore/dynhome.jsp	en_US
dc.description.abstract	Accent recognition has been one of the most important topics in automatic speaker and speaker-independent speech recognition (SI-ASR) systems in recent years. Voice-controlled technologies have become part of daily life; nevertheless, variability in speech makes these spoken language technologies relatively difficult to build. One of the most profound sources of variability is accent. By classifying accent types, different models can be developed to handle SI-ASR. In this paper, we classify three accents of English recorded from the three main ethnic groups in Malaysia, namely Malay, Chinese and Indian, using an artificial neural network model. All experiments were performed in speaker-independent mode and in word-independent mode over the three most accent-sensitive words. Mel-band spectral energy was extracted from eighteen bands, and the statistical values of each speech sample, i.e. mean, standard deviation, kurtosis and the ratio of standard deviation to kurtosis, were taken to characterize the spectral energy distribution. The system was evaluated using an independent test dataset, a partially independent test dataset and the training dataset. The best three-class accuracy of 99.01% was obtained with the independent test dataset. The overall accuracy, averaged over several trials, was 96.79%, with an average learning time of 49 epochs.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE Conference Publications	en_US
dc.relation.ispartofseries	Proceeding of The 3rd IEEE Conference on Sustainable Utilization and Development in Engineering and Technology (STUDENT 2012);	-
dc.subject	Accent recognition	en_US
dc.subject	Mel-bands	en_US
dc.subject	Neural network	en_US
dc.subject	Spectral energy	en_US
dc.subject	Statistical analysis	en_US
dc.title	Speaker accent recognition through statistical descriptors of Mel-bands spectral energy and neural network model	en_US
dc.type	Working Paper	en_US
dc.identifier.url	http://dx.doi.org/10.1109/STUDENT.2012.6408416	-
dc.contributor.url	yusnita082@ppinang.uitm.edu.my	en_US
dc.contributor.url	paul@unimap.edu.my	en_US
dc.contributor.url	s.yaacob@unimap.edu.my	en_US
dc.contributor.url	shahriman@unimap.edu.my	en_US
Appears in Collections:	Shahriman Abu Bakar, Assoc. Prof. Ir. Ts. Dr.; Paulraj Murugesa Pandiyan, Assoc. Prof. Dr.; Sazali Yaacob, Prof. Dr.
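
The abstract describes each speech sample by four statistics of its Mel-band spectral energy (mean, standard deviation, kurtosis, and the ratio of standard deviation to kurtosis) over eighteen bands. The following is a minimal sketch of that descriptor step only, not the authors' implementation. It assumes the per-frame band energies have already been computed, that the statistics are taken across frames within each band, and that kurtosis is the non-excess (Pearson) population form; the record states none of these. The function names `band_descriptors` and `utterance_features` are hypothetical.

```python
import math
from typing import List, Sequence


def band_descriptors(energies: Sequence[float]) -> List[float]:
    """Return [mean, std, kurtosis, std / kurtosis] for one band.

    Assumption: population (divide-by-n) moments and Pearson kurtosis
    m4 / m2**2, which equals 3.0 for a normal distribution.
    """
    n = len(energies)
    if n < 2:
        raise ValueError("need at least two frames per band")
    mean = sum(energies) / n
    m2 = sum((e - mean) ** 2 for e in energies) / n  # second central moment
    m4 = sum((e - mean) ** 4 for e in energies) / n  # fourth central moment
    if m2 == 0.0:
        raise ValueError("constant band energy: kurtosis is undefined")
    std = math.sqrt(m2)
    kurtosis = m4 / (m2 * m2)  # always >= 1 when m2 > 0, so the ratio is safe
    return [mean, std, kurtosis, std / kurtosis]


def utterance_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the four descriptors of every band into one feature vector.

    frames[t][b] is the energy of Mel band b in frame t (assumed layout).
    """
    n_bands = len(frames[0])
    features: List[float] = []
    for b in range(n_bands):
        features.extend(band_descriptors([frame[b] for frame in frames]))
    return features


# Synthetic energies for illustration only: 5 frames x 18 bands.
frames = [[float(t + b) for b in range(18)] for t in range(1, 6)]
features = utterance_features(frames)
print(len(features))  # 72 = 18 bands x 4 descriptors
```

Under these assumptions, eighteen bands give a 72-dimensional vector per speech sample, which could then serve as the input to a neural network classifier.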

Files in This Item:

File	Description	Size	Format
Speaker accent recognition through statistical descriptors of Mel-bands spectral energy and neural network model-abstract.pdf		58.82 kB	Adobe PDF
