Evaluating deep learning architectures for Speech Emotion Recognition.

Citation data:

Neural Networks: the official journal of the International Neural Network Society, ISSN: 1879-2782, Vol: 92, Pages: 60-68

Publication Year:
2017
PMID:
28396068
DOI:
10.1016/j.neunet.2017.02.013
Author(s):
Fayek, Haytham M.; Lech, Margaret; Cavedon, Lawrence
Publisher(s):
Elsevier BV
Tags:
Neuroscience, Computer Science
Article description:
Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances.
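
To make the frame-based formulation concrete, here is a minimal sketch of classifying one utterance frame by frame. It is not the authors' system. The abstract does not specify the features, the network, or how frame-level outputs are combined into an utterance-level decision, so everything below is an assumption chosen for illustration: the 40-dimensional features, the context window of 2 frames, the one-hidden-layer feed-forward network with random weights, the four emotion classes, and the averaging of frame posteriors. The function names are hypothetical.

```python
import numpy as np


def stack_context(features: np.ndarray, context: int) -> np.ndarray:
    """Concatenate each frame with `context` neighbours on each side.

    Edges are handled by repeating the first and last frame (an assumption).
    Input shape: (n_frames, n_feats). Output: (n_frames, n_feats * (2*context + 1)).
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    windows = [padded[i:i + n_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def frame_posteriors(x, w1, b1, w2, b2):
    """One-hidden-layer feed-forward network (ReLU), applied to every frame."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return softmax(hidden @ w2 + b2)


def classify_utterance(features, params, context=2):
    """Average frame-level posteriors to get an utterance-level label.

    Averaging is an illustrative aggregation rule, not one taken from the paper.
    """
    x = stack_context(features, context)
    mean_post = frame_posteriors(x, *params).mean(axis=0)
    return int(mean_post.argmax()), mean_post


# Toy data: 50 frames of 40 hypothetical features, 4 hypothetical emotion classes.
rng = np.random.default_rng(0)
n_frames, n_feats, n_hidden, n_classes, context = 50, 40, 16, 4, 2
in_dim = n_feats * (2 * context + 1)
params = (
    rng.normal(scale=0.1, size=(in_dim, n_hidden)),
    np.zeros(n_hidden),
    rng.normal(scale=0.1, size=(n_hidden, n_classes)),
    np.zeros(n_classes),
)
features = rng.normal(size=(n_frames, n_feats))
label, mean_post = classify_utterance(features, params, context)
print(label, mean_post)
```

In this sketch, a recurrent variant would replace `frame_posteriors` with a model that carries state across frames, which is the kind of static versus dynamic comparison the abstract refers to.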
