Using Efficient Vision Transformers to Improve Perception Systems in Autonomous Off-Road Vehicles

Publication Year: 2024

Thesis / Dissertation Description

The development of autonomous vehicles has become one of the greatest research endeavors in recent years. These vehicles rely on many complex systems working in tandem to make decisions. For practical use and safety reasons, these systems must be not only accurate but also quick to make decisions. In autonomous vehicle research, the environment perception system is one of the key components of development. The environment perception system allows the vehicle to understand its surroundings using cameras, light detection and ranging (LiDAR), and other sensor systems or modalities. Deep learning computer vision algorithms have proven to be the strongest tool for translating this data into accurate and safe decisions about a vehicle’s environment. This understanding of the environment allows the vehicle to make real-time decisions on steering, velocity, and path planning. For a vehicle to safely traverse an area in real time, these computer vision algorithms must be accurate and have low latency.

While much research has studied autonomous driving in urban environments, minimal research exists for off-road settings. Autonomous unmanned ground vehicles (UGVs), typically deployed for rescue missions, terrain exploration, and military deployments in off-road settings, must learn what terrain is traversable without the defined structure of urban environments. While urban scenes typically have signs or defined roads as cues for vehicles, off-road environments do not have well-defined boundaries. In perception systems, semantic segmentation using deep learning techniques provides a strong understanding of a vehicle’s surrounding environment. However, this comes at a higher computational cost than other computer vision methods such as object detection. Further, accurate semantic segmentation is challenging in the unstructured environments that are typical of off-road settings.

Convolutional neural networks (CNNs) have been the most popular architecture for computer vision tasks in recent years. However, with recent advances in deep learning, Vision Transformers (ViTs) are gaining consideration as state-of-the-art architectures for computer vision tasks. Although much work exists using ViTs for research-level computer vision tasks, there are fewer real-time studies where latency and memory constraints are a concern.

This research aims to investigate new methods in semantic segmentation with Vision Transformer concepts and study their viability in off-road environments for UGVs. In this work, new architectures are explored that strive to maintain accuracy while improving inference speed when compared to CNN-based architectures.

Bibliographic Details

REPOSITORY URL: https://open.clemson.edu/all_theses/4383

URL ID: https://open.clemson.edu/all_theses/4383; https://open.clemson.edu/cgi/viewcontent.cgi?article=5411&context=all_theses

AUTHOR(S)

Adam S Pickeral
