Adaptive Sensor Fusion

Robust Perception for Autonomous Vehicles

Category: Autonomous Vehicles

Overview

This project implements and evaluates three advanced sensor fusion architectures for autonomous vehicles, focusing on robust perception in varying environmental conditions. The research compares Spatial-IL Fusion, Fusion-DETR, and Hierarchical Fusion Net (HFN) approaches for multi-modal data integration.

These fusion architectures address the challenge of maintaining accurate perception across diverse driving scenarios, particularly under adversarial conditions.

Technologies Used

Python, PyTorch, YOLOv8, DETR, PointNet

Fusion Architectures

1. Spatial-IL Fusion

  • Data-Data Fusion: RGB + Depth + Reflectance from LiDAR
  • Feature-Feature Fusion: spatial-aware, attention-based fusion
  • Grid projection of LiDAR points using calibration matrices (see the sketch after this list)
  • Robust performance under moderately noisy conditions (mAP: 0.277)
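
A minimal sketch of the grid-projection step, assuming KITTI-style calibration matrices (P2, R0_rect, Tr_velo_to_cam); the function name and the depth cut-off are illustrative, not the project's exact code.

```python
import numpy as np

def project_lidar_to_image(points_xyz, P2, R0_rect, Tr_velo_to_cam):
    """Project LiDAR points (N, 3) to pixel coordinates using KITTI calibration.

    P2:             (3, 4) camera projection matrix
    R0_rect:        (3, 3) rectification rotation
    Tr_velo_to_cam: (3, 4) LiDAR-to-camera extrinsics
    """
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])          # homogeneous (N, 4)
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)                 # rectified camera frame (3, N)
    in_front = cam[2, :] > 0.1                                 # keep points in front of the camera
    cam = cam[:, in_front]
    pix = P2 @ np.vstack([cam, np.ones((1, cam.shape[1]))])   # pixel coordinates (3, M)
    pix = (pix[:2, :] / pix[2:3, :]).T                         # (M, 2) as (u, v)
    return pix, cam[2, :]                                      # pixels and their depths
```

Each projected point can then be assigned to the image grid cell containing its (u, v) coordinate, which is where the LiDAR features join the image features.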

2. Fusion-DETR

  • Feature Extraction: ResNet50 (2048D) + PointNet (1024D)
  • Cross-attention fusion with bidirectional information flow (sketched after this list)
  • DETR Detection Head with Transformer architecture
  • Excelled in high-precision pedestrian detection (mAP: 0.6597)
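
The bidirectional cross-attention step could look roughly like the PyTorch sketch below. Only the 2048-D ResNet50 and 1024-D PointNet feature sizes come from the description above; the shared dimension, head count, and module names are assumptions.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Image tokens attend to point tokens and vice versa; the fused token
    streams can then feed a DETR-style transformer decoder."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.img_proj = nn.Linear(2048, d_model)   # ResNet50 features
        self.pts_proj = nn.Linear(1024, d_model)   # PointNet features
        self.pts_to_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img_to_pts = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_img = nn.LayerNorm(d_model)
        self.norm_pts = nn.LayerNorm(d_model)

    def forward(self, img_feats, pts_feats):
        # img_feats: (B, N_img, 2048), pts_feats: (B, N_pts, 1024)
        img = self.img_proj(img_feats)
        pts = self.pts_proj(pts_feats)
        # Information flows in both directions between the modalities
        img_attn, _ = self.pts_to_img(query=img, key=pts, value=pts)
        pts_attn, _ = self.img_to_pts(query=pts, key=img, value=img)
        img = self.norm_img(img + img_attn)
        pts = self.norm_pts(pts + pts_attn)
        # Concatenated token sequence serves as memory for the detection head
        return torch.cat([img, pts], dim=1)
```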

3. Hierarchical Fusion Net (HFN)

  • YOLOv8n for multi-scale spatial features from RGB
  • PointNet for point cloud processing
  • Element-wise and adaptive fusion with channel attention (see the sketch after this list)
  • Feature Pyramid Network with top-down pathway
  • Generalized performance across Vehicle, Pedestrian, and Cyclist (mAP: 0.471)
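
A hedged sketch of the adaptive fusion step, modelled as an element-wise sum re-weighted by a squeeze-and-excitation-style channel gate; the channel count and reduction ratio are assumptions, not the project's values.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuse RGB and LiDAR feature maps of matching shape (B, C, H, W)."""

    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze spatial dimensions
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # per-channel weights in [0, 1]
        )

    def forward(self, rgb_feat, lidar_feat):
        fused = rgb_feat + lidar_feat                       # simple element-wise fusion
        w = self.gate(fused)                                # (B, C, 1, 1) channel attention
        # Adaptive fusion: the gate decides, per channel, how much of each modality to keep
        return w * rgb_feat + (1.0 - w) * lidar_feat
```

The fused maps can then be passed through the Feature Pyramid Network's top-down pathway as usual.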

Experimental Setup

Dataset: KITTI

  • Image Dimensions: 1242 × 375 pixels
  • Training Samples: 5,611
  • Testing Samples: 1,870

Input Data

  • RGB Images: High-resolution real-world scenarios
  • Voxelized LiDAR Point Clouds: 3D spatial data (voxelization sketched after this list)
  • Sensor Calibration Matrix: For precise alignment
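
For reference, a simple voxelization pass might look like the following; the voxel size and spatial range are illustrative defaults, not the settings used in this project.

```python
import numpy as np

def voxelize(points_xyz, voxel_size=(0.2, 0.2, 0.2),
             range_min=(0.0, -40.0, -3.0), range_max=(70.4, 40.0, 1.0)):
    """Bucket raw LiDAR points into a regular 3D grid.

    Returns the occupied voxel coordinates, a point-to-voxel index, and the
    points that fall inside the region of interest."""
    pts = np.asarray(points_xyz, dtype=np.float32)
    lo, hi = np.asarray(range_min), np.asarray(range_max)
    mask = np.all((pts >= lo) & (pts < hi), axis=1)              # drop out-of-range points
    pts = pts[mask]
    coords = np.floor((pts - lo) / np.asarray(voxel_size)).astype(np.int32)
    voxels, point_to_voxel = np.unique(coords, axis=0, return_inverse=True)
    return voxels, point_to_voxel, pts
```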

Adversarial Testing

Robustness was evaluated under salt-and-pepper noise and pixelation attacks; a sketch of both corruptions follows.
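
Both corruptions can be reproduced roughly as below; the noise probability and pixelation factor are illustrative, not the exact values used in the experiments.

```python
import torch
import torch.nn.functional as F

def salt_and_pepper(img, prob=0.02):
    """Set random pixels of a (C, H, W) image in [0, 1] to pure black or white."""
    mask = torch.rand(img.shape[1:])                 # one noise mask shared across channels
    out = img.clone()
    out[:, mask < prob / 2] = 0.0                    # pepper
    out[:, mask > 1 - prob / 2] = 1.0                # salt
    return out

def pixelate(img, factor=8):
    """Downsample then upsample with nearest-neighbour interpolation,
    destroying high-frequency detail in a (C, H, W) image."""
    c, h, w = img.shape
    small = F.interpolate(img.unsqueeze(0), size=(h // factor, w // factor), mode="nearest")
    return F.interpolate(small, size=(h, w), mode="nearest").squeeze(0)
```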

Limitations & Future Work

Limitations

Spatial-IL

  • YOLOv3 pre-training limitations
  • PointNet pre-training drawbacks
  • LiDAR feature re-use across YOLOv3 grids

Fusion-DETR

  • Fixed number of maximum object detections
  • Computationally expensive for AVs

Hierarchical Fusion Net

  • Lack of dynamic adaptation to scene complexity
  • Fusion expressiveness limited by simple element-wise operations

Future Work

Contrastive Learning Pre-training

  • Generate paired data samples (clear, clear) and (clear, noisy)
  • Training objective: learn representations that distinguish matched from unmatched pairs (see the loss sketch after this list)
  • Goal: Improve robustness to noisy inputs
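
One concrete way to realise this objective is an InfoNCE-style loss over paired embeddings, sketched below under the assumption that each fusion backbone ends in a projection head producing one feature vector per scene.

```python
import torch
import torch.nn.functional as F

def paired_contrastive_loss(clean_emb, noisy_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of (clear, noisy) embedding pairs.

    clean_emb, noisy_emb: (B, D) features of the same scenes under clear and
    corrupted inputs. The paired view is the positive; every other scene in
    the batch acts as a negative, so matched pairs are pulled together while
    different scenes stay distinct."""
    clean = F.normalize(clean_emb, dim=1)
    noisy = F.normalize(noisy_emb, dim=1)
    logits = clean @ noisy.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(clean.size(0), device=clean.device)
    # Symmetric loss: clean -> noisy and noisy -> clean directions
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```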

Uncertainty-Aware Sensor Fusion

  • Implement Evidential Deep Learning
  • Quantify uncertainty for each sensor
  • Compute adaptive fusion weights from the per-sensor uncertainties (see the sketch after this list)
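
A minimal sketch of the fusion-weighting step, assuming each sensor branch already outputs a positive uncertainty estimate (for example from an evidential head); the function name and shapes are illustrative.

```python
import torch

def uncertainty_weighted_fusion(feats, uncertainties, eps=1e-6):
    """Fuse per-sensor features by inverse-uncertainty weighting.

    feats:         list of (B, D) feature tensors, one per sensor
    uncertainties: list of (B, 1) positive uncertainty estimates
    """
    inv = torch.stack([1.0 / (u + eps) for u in uncertainties])  # (S, B, 1)
    weights = inv / inv.sum(dim=0, keepdim=True)                 # normalise across sensors
    stacked = torch.stack(feats)                                  # (S, B, D)
    return (weights * stacked).sum(dim=0)                         # (B, D) fused feature
```

A sensor whose predicted uncertainty rises (for example the camera under heavy noise) then contributes proportionally less to the fused representation.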
