PlumX Metrics

Research on High-Performance Fourier Transform Algorithms Based on the NPU

Applied Sciences (Switzerland), ISSN: 2076-3417, Vol: 14, Issue: 1
2024
  • Citations: 1
  • Usage: 0
  • Captures: 5
  • Mentions: 2
  • Social Media: 0

Metrics Details

  • Citations: 1
  • Captures: 5
  • Mentions: 2
    • Blog Mentions: 1
    • News Mentions: 1

Most Recent Blog

Applied Sciences, Vol. 14, Pages 405: Research on High-Performance Fourier Transform Algorithms Based on the NPU

Applied Sciences, doi: 10.3390/app14010405. Authors: Qing Li, Decheng Zuo

Most Recent News

Reports from Harbin Institute of Technology Describe Recent Advances in Applied Sciences (Research on High-Performance Fourier Transform Algorithms Based on the NPU)

2024 JAN 16 (NewsRx) -- By a News Reporter-Staff News Editor at NewsRx Science Daily -- Researchers detail new data in applied sciences. According to

Article Description

Backpack computers for field wearables require powerful, intelligent computing capabilities while keeping energy consumption under careful consideration. A recommended solution for this demand is a CPU + NPU-based SoC. In many wearable intelligence applications, the Fourier Transform is an essential, computationally intensive preprocessing task. However, due to the unique structure of the NPU, conventional Fourier Transform algorithms cannot be applied to it directly. This paper proposes two NPU-accelerated Fourier Transform algorithms that leverage the unique hardware structure of the NPU and provides three implementations of those algorithms, namely MM-2DFT, MV-2FFTm, and MV-2FFTv. We then benchmark the speed and energy efficiency of our algorithms on a gray-image edge filtering task, running on Huawei Atlas200I-DK-A2 development kits, against the Cooley-Tukey algorithm running on CPU and GPU platforms. The experimental results show that MM-2DFT outperforms OpenCL-based FFT on the NVIDIA Tegra X2 GPU for small input sizes, with a 4x to 8x speedup. Once the input image resolution exceeds 2048, MV-2FFTv approaches GPU computation speed. Additionally, two scenarios were tested and analyzed for energy efficiency, revealing that the cube units of the NPU are more energy efficient. The vector and CPU units are better suited for sparse matrix multiplication and small-scale inputs, respectively.
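The description does not spell out how the algorithms are built, but the name MM-2DFT suggests a 2D DFT expressed as matrix multiplication, which is the kind of operation a matrix ("cube") unit executes natively. The sketch below illustrates only that general identity, F = W_M · X · W_N, in plain Python and checks it against the double-sum definition. It is an assumption-based illustration, not the authors' implementation: the function names (`dft_matrix`, `matmul`, `dft2_matmul`, `dft2_direct`, `max_abs_diff`) are hypothetical, and nothing here targets the Atlas NPU or its toolchain.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix W with W[j][k] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def matmul(a, b):
    """Plain dense matrix product (the step a matrix unit would accelerate)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dft2_matmul(image):
    """2D DFT of an M x N image computed as W_M @ image @ W_N."""
    m, n = len(image), len(image[0])
    return matmul(matmul(dft_matrix(m), image), dft_matrix(n))


def dft2_direct(image):
    """Reference: the 2D DFT evaluated straight from its double-sum definition."""
    m, n = len(image), len(image[0])
    return [[sum(image[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)] for u in range(m)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6]]  # hypothetical 2 x 3 "gray image"
    error = max_abs_diff(dft2_matmul(image), dft2_direct(image))
    print("matrix form matches definition:", error < 1e-9)
```

The matrix form costs O(N³) operations for an N x N input, compared with O(N² log N) for an FFT, so it only pays off when the hardware multiplies matrices very efficiently. That is at least consistent with the reported advantage being confined to small input sizes.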

Bibliographic Details

Qing Li; Decheng Zuo; Yi Feng; Dongxin Wen

MDPI AG

Materials Science; Physics and Astronomy; Engineering; Chemical Engineering; Computer Science
