Transformer-CNN for small image object detection

Citation Data: Signal Processing: Image Communication, ISSN: 0923-5965, Vol: 129, Page: 117194

Publication Year: 2024

Citations: 3
Usage: 0
Captures: 9
Mentions: 0
Social Media: 0


Metrics Details

Citations: 3
- Citation Indexes: 3
Captures: 9
- Readers: 9

Article Description

Object recognition has been a popular research field in computer vision in recent years. Although detection of regular-sized objects has achieved impressive results, small object detection (SOD) remains a challenging problem: on the Microsoft Common Objects in Context (MS COCO) public dataset, the detection rate for small objects is typically half that for regular-sized objects. The main reason is that repeated convolution and pooling layers erode the detail of small objects, leaving too little information to distinguish them from the background or from similar objects; this results in poor recognition rates or no detections at all. This paper presents a network architecture, Transformer-CNN, that combines a transformer based on the self-attention mechanism with a convolutional neural network (CNN) to improve the recognition rate of SOD. It captures global information through the transformer and exploits the translation invariance and translation equivariance of the CNN, maximizing the retention of both global and local features while improving the reliability and robustness of SOD. Our experiments show that the proposed model improves the small object recognition rate by 2 to 5% compared with general transformer architectures.
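The abstract does not say how the transformer and CNN parts are wired together. As a toy illustration only, the pure-Python sketch below pairs a depthwise 1-D convolution (local features; translation-equivariant because one shared kernel slides over every position) with single-head scaled dot-product self-attention (global features; every token attends to every other token) and fuses the two by elementwise addition. The function names, the 1-D token layout, the identity Q/K/V projections and the sum fusion are all assumptions made for illustration, not the Transformer-CNN architecture described in the paper.

```python
import math
from typing import List

Vector = List[float]


def conv_branch(tokens: List[Vector], kernel: List[float]) -> List[Vector]:
    """Depthwise 1-D convolution with zero padding (local features).

    The same kernel is applied at every position and to every channel,
    which is what makes this branch translation-equivariant.
    """
    k = len(kernel)
    pad = k // 2
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        row = []
        for c in range(d):
            acc = 0.0
            for j in range(k):
                src = i + j - pad
                if 0 <= src < n:  # positions outside the sequence count as zero
                    acc += kernel[j] * tokens[src][c]
            row.append(acc)
        out.append(row)
    return out


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens: List[Vector]) -> List[Vector]:
    """Row i holds softmax(q_i . k_j / sqrt(d)) over all tokens j."""
    scale = 1.0 / math.sqrt(len(tokens[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        for q in tokens
    ]


def attention_branch(tokens: List[Vector]) -> List[Vector]:
    """Single-head self-attention (global features).

    Hypothetical simplification: Q, K and V are the tokens themselves;
    a trained layer would learn separate projection matrices.
    """
    weights = attention_weights(tokens)
    d = len(tokens[0])
    return [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]


def hybrid_block(tokens: List[Vector], kernel: List[float]) -> List[Vector]:
    """Fuse local (conv) and global (attention) features by elementwise sum."""
    local = conv_branch(tokens, kernel)
    global_ = attention_branch(tokens)
    return [[a + b for a, b in zip(lo, gl)] for lo, gl in zip(local, global_)]


if __name__ == "__main__":
    tokens = [[0.0, 0.0], [1.0, 0.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    kernel = [0.25, 0.5, 0.25]
    out = hybrid_block(tokens, kernel)
    print(len(out), len(out[0]))  # 5 2
```

Shifting the input tokens by one position shifts the convolution output by one position (away from the zero-padded edges), which is the equivariance property the abstract attributes to the CNN; the attention branch, by contrast, mixes information from the whole sequence in a single step.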

Bibliographic Details

DOI: 10.1016/j.image.2024.117194

URL ID: http://www.sciencedirect.com/science/article/pii/S092359652400095X; http://dx.doi.org/10.1016/j.image.2024.117194; http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85201746847&origin=inward; https://linkinghub.elsevier.com/retrieve/pii/S092359652400095X

AUTHOR(S)

Yan-Lin Chen; Chun-Liang Lin; Yu-Chen Lin; Tzu-Chun Chen

PUBLISHER(S)

Elsevier BV

TAG(S)

Computer Science; Engineering
