Adaptive Sensor Fusion
Robust Perception for Autonomous Vehicles
Overview
This project implements and evaluates three advanced sensor fusion architectures for autonomous vehicles, focusing on robust perception in varying environmental conditions. The research compares Spatial-IL Fusion, Fusion-DETR, and Hierarchical Fusion Net (HFN) approaches for multi-modal data integration.
By integrating complementary camera and LiDAR data, the project addresses the challenge of maintaining accurate perception across diverse driving scenarios, particularly under adversarial input corruptions such as image noise and pixelation.
Fusion Architectures
1. Spatial-IL Fusion
- Data-Data Fusion: RGB + Depth + Reflectance from LiDAR
- Feature-Feature Fusion: spatially aware, attention-based fusion
- Grid Projection using calibration matrices (see the projection sketch after this list)
- Robust performance under moderately noisy conditions (mAP: 0.277)
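The grid projection step follows the standard KITTI calibration pipeline. The sketch below is illustrative (the function and variable names are assumptions, not the project's code) and shows how LiDAR points can be mapped into the image plane using the Tr_velo_to_cam, R0_rect, and P2 matrices before depth and reflectance channels are rasterized onto the RGB grid.

```python
import numpy as np

def project_lidar_to_image(points, Tr_velo_to_cam, R0_rect, P2):
    """Project LiDAR points (N, 3) into pixel coordinates.

    Tr_velo_to_cam: (3, 4) LiDAR-to-camera extrinsics
    R0_rect:        (3, 3) rectification matrix
    P2:             (3, 4) left color camera projection matrix
    (Matrix names follow the KITTI calibration files; this is a sketch, not the project's code.)
    """
    n = points.shape[0]
    pts_hom = np.hstack([points, np.ones((n, 1))])        # (N, 4) homogeneous LiDAR points
    pts_cam = R0_rect @ (Tr_velo_to_cam @ pts_hom.T)      # (3, N) rectified camera frame
    pts_cam_hom = np.vstack([pts_cam, np.ones((1, n))])   # (4, N)
    pts_img = P2 @ pts_cam_hom                            # (3, N) image-plane coordinates
    uv = pts_img[:2] / pts_img[2]                         # perspective divide -> (2, N)
    in_front = pts_cam[2] > 0                             # keep only points in front of the camera
    return uv.T[in_front], pts_cam[2][in_front]           # pixel coords and their depths
```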
2. Fusion-DETR
- Feature Extraction: ResNet50 (2048D) + PointNet (1024D)
- Cross-attention fusion with bidirectional information flow (see the sketch after this list)
- DETR Detection Head with Transformer architecture
- Excelled in high-precision pedestrian detection (mAP: 0.6597)
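A minimal sketch of the bidirectional cross-attention idea is shown below, assuming the ResNet50 and PointNet features are flattened into token sequences. The class name, projection width, and head count are illustrative assumptions rather than the project's exact configuration.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Illustrative cross-attention fusion of image and point-cloud tokens.

    Feature dimensions follow the README (ResNet50: 2048-D, PointNet: 1024-D);
    the fusion width and head count are assumptions.
    """
    def __init__(self, d_img=2048, d_pts=1024, d_model=256, n_heads=8):
        super().__init__()
        self.img_proj = nn.Linear(d_img, d_model)
        self.pts_proj = nn.Linear(d_pts, d_model)
        self.pts_to_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img_to_pts = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, img_tokens, pts_tokens):
        # img_tokens: (B, N_img, 2048), pts_tokens: (B, N_pts, 1024)
        q_img = self.img_proj(img_tokens)
        q_pts = self.pts_proj(pts_tokens)
        # Image tokens attend to LiDAR tokens, and vice versa (bidirectional flow).
        img_fused, _ = self.pts_to_img(q_img, q_pts, q_pts)
        pts_fused, _ = self.img_to_pts(q_pts, q_img, q_img)
        # Concatenate both fused streams as memory for a DETR-style decoder.
        return torch.cat([img_fused, pts_fused], dim=1)   # (B, N_img + N_pts, d_model)
```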
3. Hierarchical Fusion Net (HFN)
- YOLOv8n for multi-scale spatial features from RGB
- PointNet for point cloud processing
- Element-wise and adaptive fusion with channel attention (see the fusion sketch after this list)
- Feature Pyramid Network with top-down pathway
- Generalizes well across the Vehicle, Pedestrian, and Cyclist classes (mAP: 0.471)
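The element-wise and adaptive fusion step can be sketched as a squeeze-and-excitation style gate computed from the summed modalities. The module below is a simplified illustration under assumed channel counts and reduction ratio, not the project's exact HFN block.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Sketch of element-wise plus adaptive fusion with channel attention.

    Assumes the RGB (YOLOv8n) and point-cloud (PointNet) feature maps have already
    been projected to the same spatial size and channel count; values are placeholders.
    """
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: global context per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # excitation: per-channel weights in [0, 1]
        )

    def forward(self, rgb_feat, lidar_feat):
        # Element-wise sum provides the joint context used to compute the gate.
        fused = rgb_feat + lidar_feat
        # Channel attention adaptively weights each modality's contribution.
        weights = self.gate(fused)
        return weights * rgb_feat + (1.0 - weights) * lidar_feat
```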
Experimental Setup
Dataset: KITTI
- Image Dimensions: 1242 × 375 pixels
- Training Samples: 5,611
- Testing Samples: 1,870
Input Data
- RGB Images: high-resolution frames of real-world driving scenes
- Voxelized LiDAR Point Clouds: 3D spatial structure of the scene (see the voxelization sketch after this list)
- Sensor Calibration Matrices: for precise cross-sensor alignment
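For context, a minimal voxelization sketch is shown below; the voxel size and point-cloud range are illustrative placeholders, not the project's configuration.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.2), pc_range=(0, -40, -3, 70.4, 40, 1)):
    """Minimal voxelization sketch: bin LiDAR points into a regular 3-D grid.

    points: (N, 4) array of x, y, z, reflectance. Parameters are illustrative.
    """
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    # Keep only points inside the chosen range.
    mask = ((points[:, 0] >= x_min) & (points[:, 0] < x_max) &
            (points[:, 1] >= y_min) & (points[:, 1] < y_max) &
            (points[:, 2] >= z_min) & (points[:, 2] < z_max))
    pts = points[mask]
    # Integer voxel index for each remaining point.
    coords = np.floor((pts[:, :3] - np.array([x_min, y_min, z_min])) /
                      np.array(voxel_size)).astype(np.int32)
    # Group points by voxel index; each voxel keeps the points that fall inside it.
    voxels = {}
    for pt, idx in zip(pts, map(tuple, coords)):
        voxels.setdefault(idx, []).append(pt)
    return voxels
```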
Adversarial Testing
Robustness was evaluated against salt-and-pepper noise and pixelation attacks applied to the RGB input; both corruptions are sketched below.
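A minimal sketch of the two corruptions, assuming uint8 NumPy image arrays; the noise amount and block size are illustrative parameters, not the values used in the experiments.

```python
import numpy as np

def salt_and_pepper(img, amount=0.02, rng=None):
    """Corrupt a fraction of pixels with pure black/white noise (amount is illustrative)."""
    rng = rng or np.random.default_rng()
    noisy = img.copy()
    mask = rng.random(img.shape[:2])
    noisy[mask < amount / 2] = 0           # pepper: black pixels
    noisy[mask > 1 - amount / 2] = 255     # salt: white pixels
    return noisy

def pixelate(img, block=8):
    """Downsample then repeat each pixel in a block to simulate pixelation."""
    h, w = img.shape[:2]
    small = img[::block, ::block]
    return np.repeat(np.repeat(small, block, axis=0), block, axis=1)[:h, :w]
```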
Limitations & Future Work
Limitations
Spatial-IL
- YOLOv3 pre-training limitations
- PointNet pre-training drawbacks
- LiDAR feature re-use across YOLOv3 grids
Fusion-DETR
- Fixed upper bound on the number of object detections per frame
- Computationally expensive for real-time deployment on autonomous vehicles
Hierarchical Fusion Net
- No dynamic adaptation to scene complexity
- Simple element-wise operations limit the expressiveness of the fusion
Future Work
Contrastive Learning Pre-training
- Generate paired data samples (clear, clear) and (clear, noisy)
- Training objective: Learn representations in which views of the same scene remain close while different scenes stay distinct
- Goal: Improve robustness to noisy inputs (see the loss sketch below)
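A minimal sketch of an InfoNCE-style objective over such pairs is shown below; the temperature and function name are illustrative assumptions, and the exact loss adopted in future work may differ.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_clear, z_other, temperature=0.1):
    """InfoNCE-style contrastive loss sketch (temperature is illustrative).

    z_clear: embeddings of clean samples; z_other: embeddings of the paired view,
    either another clean view or a noise-augmented view of the same scene.
    Matched rows are positives; all other rows in the batch serve as negatives.
    """
    z_clear = F.normalize(z_clear, dim=1)
    z_other = F.normalize(z_other, dim=1)
    logits = z_clear @ z_other.t() / temperature                     # (B, B) cosine similarities
    targets = torch.arange(z_clear.size(0), device=z_clear.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)
```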
Uncertainty-Aware Sensor Fusion
- Implement Evidential Deep Learning
- Quantify uncertainty for each sensor
- Compute adaptive fusion weights from the per-sensor uncertainties (a minimal sketch follows)
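A minimal sketch of the idea, assuming a Dirichlet-based evidential classification head per sensor; the evidence activation and the weighting scheme are assumptions, not a settled design.

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits):
    """Dirichlet-based uncertainty from one sensor's classification head (EDL sketch).

    evidence = softplus(logits); alpha = evidence + 1; u = K / sum(alpha),
    where K is the number of classes. u approaches 1 when the sensor has little evidence.
    """
    evidence = F.softplus(logits)
    alpha = evidence + 1.0
    strength = alpha.sum(dim=-1, keepdim=True)
    uncertainty = logits.size(-1) / strength      # (B, 1), higher = less trustworthy
    prob = alpha / strength                       # expected class probabilities
    return prob, uncertainty

def adaptive_fusion(feat_cam, feat_lidar, u_cam, u_lidar):
    """Weight each modality by its confidence (1 - uncertainty); an assumed weighting scheme."""
    w_cam = (1.0 - u_cam) / ((1.0 - u_cam) + (1.0 - u_lidar) + 1e-6)
    return w_cam * feat_cam + (1.0 - w_cam) * feat_lidar
```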