Self-Attention Pooling-Based Long-Term Temporal Network for Action Recognition

IEEE Transactions on Cognitive and Developmental Systems, ISSN: 2379-8939, Vol: 15, Issue: 1, Page: 65-77
2023
  • Citations: 14
  • Usage: 0
  • Captures: 7
  • Mentions: 1
  • Social Media: 0

Most Recent Blog

IEEE Transactions on Cognitive and Developmental Systems, Volume 15, Issue 1, March 2023

1) Editorial IEEE Transactions on Cognitive and Developmental Systems. Author(s): Huajin Tang. Pages: 2
2) Vision-and-Language Navigation Based on Cross-Modal Feature Fusion in Indoor Environment. Author(s): Shuhuan Wen, Xiaohan Lv,

Article Description

With the development of the Internet of Things (IoT), self-driving technology has been successful. Yet safe driving still faces challenges in cases such as pedestrians crossing roads, so sensing pedestrians' movements and identifying their behaviors from video data is important. Most existing methods fail to 1) capture long-term temporal relationships well, owing to their limited temporal coverage, and 2) aggregate discriminative representations effectively, for example because they pay little or no attention to differences among representations. To address these issues, this work presents a new architecture called the self-attention pooling-based long-term temporal network (SP-LTN), which learns long-term temporal representations and aggregates the discriminative ones in an end-to-end manner. It first conducts long-term representation learning on a given video by capturing spatial information and mining temporal patterns. Next, it develops a self-attention pooling method that predicts importance scores for the obtained representations to distinguish them from one another, and then weights them together to highlight the contributions of the discriminative representations to action recognition. Finally, it designs a new loss function that combines a standard cross-entropy loss with a regularization term, to focus further on the discriminative representations while restraining the impact of distracting ones on activity classification. Experimental results on two data sets show that SP-LTN, fed only red-green-blue (RGB) frames, outperforms the state-of-the-art methods.
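
The pooling and loss steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the scoring vector `w`, the function names, the weight `lam`, and above all the choice of an entropy penalty on the attention weights as the regularization term are assumptions made here for illustration only.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, w):
    """Pool T feature vectors (each of length D) into one vector.

    `w` is a hypothetical learned scoring vector of length D. Each
    representation gets an importance score w . h_t, the scores are
    normalized with softmax, and the vectors are summed with those weights.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in features]
    alphas = softmax(scores)
    dim = len(features[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, features)) for d in range(dim)]
    return pooled, alphas

def cross_entropy(logits, label):
    """Standard cross-entropy loss for a single example."""
    return -math.log(softmax(logits)[label])

def attention_entropy(alphas):
    """Entropy of the attention weights (assumed regularizer, not the paper's)."""
    return -sum(a * math.log(a) for a in alphas if a > 0.0)

def total_loss(logits, label, alphas, lam=0.1):
    """Cross-entropy plus a weighted regularization term (lam is an assumption)."""
    return cross_entropy(logits, label) + lam * attention_entropy(alphas)

# Three toy segment-level representations of dimension 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, alphas = attention_pool(features, w=[2.0, 0.0])
loss = total_loss(logits=[1.5, 0.2], label=0, alphas=alphas)
```

With this toy `w`, the first and third representations receive larger weights than the second, so they dominate the pooled vector. Penalizing the entropy of the weights pushes the model toward peaked attention, which is one plausible way to emphasize discriminative representations and suppress distracting ones; the regularizer actually used in the paper may differ.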

Bibliographic Details

Huifang Li; Jingwei Huang; Qisong Shi; Qing Fei; Mengchu Zhou

Institute of Electrical and Electronics Engineers (IEEE)

Computer Science