Emotion recognition at a distance: The robustness of machine learning based on hand-crafted facial features vs deep learning models
Image and Vision Computing, ISSN: 0262-8856, Vol: 136, Page: 104724
2023
- 14 Citations
- 57 Captures
- 1 Mention
Metric Options: selecting the 1-year or 3-year option changes the metrics count to percentiles, illustrating how an article or review compares to other articles or reviews within the selected time period in the same journal. The 1-year option compares the metrics against other articles/reviews published in the same calendar year; the 3-year option compares them against other articles/reviews published in the same calendar year plus the two years prior.
Example: if you select the 1-year option for an article published in 2019 and a metric category shows 90%, that means that the article or review is performing better than 90% of the other articles/reviews published in that journal in 2019. If you select the 3-year option for the same article published in 2019 and the metric category shows 90%, that means that the article or review is performing better than 90% of the other articles/reviews published in that journal in 2019, 2018 and 2017.
Citation Benchmarking is provided by Scopus and SciVal and is different from the metrics context provided by PlumX Metrics.
Most Recent News
Recent Studies from University of Salerno Add New Data to Machine Learning (Emotion Recognition at a Distance: The Robustness of Machine Learning Based on Hand-crafted Facial Features vs Deep Learning Models)
2023 AUG 14 (NewsRx) -- By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News
Article Description
Emotion estimation from facial expression analysis is now a widely explored computer vision task. In turn, the classification of expressions relies on relevant facial features and their dynamics. Despite the promising accuracy achieved in controlled and favorable conditions, the processing of faces acquired at a distance, which entails low-quality images, still suffers a significant performance decrease. In particular, most approaches and related computational models become extremely unstable given the very small number of useful pixels typical of these conditions. Therefore, their behavior should be investigated more carefully. On the other hand, real-time emotion recognition at a distance may play a critical role in smart video surveillance, especially when monitoring particular kinds of events, e.g., political meetings, in order to prevent adverse actions. This work compares facial expression recognition at a distance by: 1) a deep learning architecture based on state-of-the-art (SOTA) proposals, which exploits whole images to autonomously learn the relevant embeddings; 2) a machine learning approach that relies on hand-crafted features, namely facial landmarks preliminarily extracted with the popular Mediapipe framework. Instead of using either the complete sequence of frames or only the final still image of the expression, as current SOTA approaches do, the two proposed methods are designed to use rich temporal information to identify three different stages of emotion. Expressions are accordingly time-split into four phases to better exploit their time-dependent dynamics. Experiments were conducted on the popular Extended Cohn-Kanade (CK+) dataset, chosen for its wide use in the related literature and because it includes videos of facial expressions rather than only still images.
The results show that the approach relying on machine learning via hand-crafted features is more suitable for classifying the initial phases of the expression and does not decay in accuracy when images are taken at a distance (only a 0.08% accuracy decay). On the contrary, deep learning not only struggles to classify the initial phases of the expressions but also suffers a substantial performance drop on images at a distance (a 52.68% accuracy decay).
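The article description mentions time-splitting each expression sequence into four phases to exploit its temporal dynamics, but does not reproduce the splitting procedure. As a minimal, hypothetical sketch (plain Python, no Mediapipe dependency; the function name and equal-length splitting policy are assumptions, not the authors' published method), one way to partition an ordered frame sequence into contiguous temporal phases is:

```python
def split_into_phases(frames, n_phases=4):
    """Split an ordered frame sequence into n_phases contiguous
    temporal segments of (near-)equal length.

    Any remainder frames are distributed one each to the earliest
    phases, so segment lengths differ by at most one frame.
    """
    n = len(frames)
    if n < n_phases:
        raise ValueError("need at least one frame per phase")
    base, rem = divmod(n, n_phases)
    phases, start = [], 0
    for i in range(n_phases):
        size = base + (1 if i < rem else 0)  # earlier phases absorb the remainder
        phases.append(frames[start:start + size])
        start += size
    return phases
```

For a 10-frame clip this yields four segments of lengths 3, 3, 2, and 2, so per-phase features (e.g., landmark trajectories) can be computed over each segment independently.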
Bibliographic Details
- http://www.sciencedirect.com/science/article/pii/S0262885623000987
- http://dx.doi.org/10.1016/j.imavis.2023.104724
- http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85162122662&origin=inward
- https://linkinghub.elsevier.com/retrieve/pii/S0262885623000987
- https://dx.doi.org/10.1016/j.imavis.2023.104724
Elsevier BV