JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention

Brian Cheong, Jiachen Zhou, Steven Waslander

University of Toronto

JDT3D is a novel LiDAR-based tracking method designed for end-to-end multi-object tracking in autonomous driving.

Key Features

Tracking-by-Attention (TBA): JDT3D uses a TBA approach, which represents objects as vector embeddings or "queries" to detect them across multiple frames.
Joint Detection and Tracking (JDT): The model performs detection and tracking jointly in an end-to-end manner, enabling the exchange of information between the detector and tracker.
Track Sampling Augmentation: JDT3D employs a novel data augmentation method that injects consistent objects over multiple LiDAR frames to enrich supervision signals while maintaining temporal consistency.
Confidence-based Query Propagation: The model uses a confidence threshold for query propagation during both training and inference to prevent over-trusting false positive queries.

Introduction

Tracking-by-attention performs online multi-object tracking by representing each unique object with an embedding, or "query", that is propagated across frames. These object queries are passed from one time step to the next and used to detect objects in the scene. A query that outputs high-confidence predictions in multiple frames is considered to be associated with the same object in each of them, so no separate association step is needed.
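The propagation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the JDT3D implementation: the `Query` class, the `track` function, the `decode` callable (a stand-in for the transformer decoder that returns one confidence per query), the number of detection queries, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List, Optional, Sequence


@dataclass
class Query:
    slot: int                       # stand-in for a learned embedding vector
    track_id: Optional[int] = None  # assigned the first time the query fires


# Hypothetical decoder signature: (queries, frame) -> one confidence per query.
Decoder = Callable[[Sequence[Query], object], List[float]]


def track(frames: Sequence[object], decode: Decoder,
          num_detect_queries: int = 2, threshold: float = 0.5) -> List[List[int]]:
    """Return, for each frame, the track IDs of the objects reported in it."""
    next_id = count()
    track_queries: List[Query] = []
    history: List[List[int]] = []
    for frame in frames:
        # Track queries carried over from the previous frame come first,
        # followed by fresh detection queries that can start new tracks.
        fresh = [Query(slot=i) for i in range(num_detect_queries)]
        queries = track_queries + fresh
        scores = decode(queries, frame)

        track_queries = []
        for query, score in zip(queries, scores):
            if score < threshold:
                continue  # low-confidence queries are not carried forward
            if query.track_id is None:
                query.track_id = next(next_id)  # a new track starts here
            # Reusing the same query in the next frame is what links detections.
            track_queries.append(query)
        history.append([q.track_id for q in track_queries])
    return history
```

Association here is implicit: a track keeps its ID for as long as its query keeps producing confident predictions, and it ends when the query falls below the threshold.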

This approach has shown promising results in the 2D and vision-based tracking domains. However, the performance of TBA methods in the 3D LiDAR tracking domain has yet to match that of tracking-by-detection (TBD) methods. JDT3D explores this performance gap, proposing a novel LiDAR-based tracking method that addresses the limitations of existing LiDAR-based TBA methods.

Track Sampling Augmentation

JDT3D introduces a novel data augmentation method called Track Sampling Augmentation (TSA). TSA injects the same objects consistently across multiple LiDAR frames, which enriches the supervision signal while preserving temporal consistency. The model therefore learns from a more diverse set of object configurations and becomes more robust to occlusions and object interactions.
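As a rough sketch of the idea, the following Python shows one way a sampled track could be pasted into a clip of consecutive frames so that the same object, with the same ID, appears in every frame. The data layout (`Frame`, `TrackSample`), the axis-aligned bird's-eye-view boxes, the all-or-nothing collision check, and the function names are assumptions for illustration and are not taken from the JDT3D code.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max (bird's-eye view)


@dataclass
class Frame:
    points: List[Point]
    boxes: List[Box] = field(default_factory=list)
    track_ids: List[int] = field(default_factory=list)


@dataclass
class TrackSample:
    boxes: List[Box]           # one box per frame of the clip
    points: List[List[Point]]  # the object's LiDAR points in each frame


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def inject_track(clip: List[Frame], sample: TrackSample, new_id: int) -> bool:
    """Insert a sampled track into every frame of the clip, or into none."""
    if len(sample.boxes) != len(clip):
        return False
    # Reject the whole track if it collides with an existing box in any frame;
    # inserting it in only some frames would break temporal consistency.
    for frame, box in zip(clip, sample.boxes):
        if any(overlaps(box, other) for other in frame.boxes):
            return False
    for frame, box, pts in zip(clip, sample.boxes, sample.points):
        frame.points.extend(pts)
        frame.boxes.append(box)
        frame.track_ids.append(new_id)  # same ID in every frame
    return True


def track_sampling_augmentation(clip: List[Frame], database: List[TrackSample],
                                num_samples: int, rng: random.Random) -> List[Frame]:
    """Paste up to num_samples tracks from the database into the clip, in place."""
    next_id = max((i for f in clip for i in f.track_ids), default=-1) + 1
    for sample in rng.sample(database, min(num_samples, len(database))):
        if inject_track(clip, sample, next_id):
            next_id += 1
    return clip
```

A fuller version would likely also remove scene points that fall inside an inserted box; that step is omitted here to keep the sketch short.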

Confidence-based Query Propagation

Previous methods use the ground-truth matching to decide which queries are passed to the next frame during training. JDT3D instead applies a confidence threshold for query propagation during both training and inference, so the query passing criterion is consistent between the two. Because only high-confidence queries are propagated, false positive and false negative track queries arise naturally during training, which keeps the model from over-trusting propagated queries and improves overall tracking performance.
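The difference between the two propagation rules can be illustrated with a small Python sketch. The function names, the list-based inputs, and the 0.4 threshold are hypothetical choices for this example and are not values from the paper.

```python
from typing import List, Optional, Tuple


def propagate_by_gt_matching(matched_gt: List[Optional[int]]) -> List[int]:
    """Training-only rule: keep the queries that were matched to a ground-truth object."""
    return [i for i, gt in enumerate(matched_gt) if gt is not None]


def propagate_by_confidence(scores: List[float], threshold: float = 0.4) -> List[int]:
    """Rule usable in training and inference: keep queries at or above the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def propagation_errors(kept: List[int],
                       matched_gt: List[Optional[int]]) -> Tuple[List[int], List[int]]:
    """Return (false positive, false negative) query indices among the propagated set."""
    false_pos = [i for i in kept if matched_gt[i] is None]
    false_neg = [i for i, gt in enumerate(matched_gt) if gt is not None and i not in kept]
    return false_pos, false_neg


# Four queries: the first two are matched to ground-truth objects 0 and 1.
scores = [0.9, 0.2, 0.7, 0.1]
matched_gt = [0, 1, None, None]

gt_kept = propagate_by_gt_matching(matched_gt)  # [0, 1]
conf_kept = propagate_by_confidence(scores)     # [0, 2]

# Ground-truth matching never produces errors in the propagated set,
# so the model never sees them during training.
print(propagation_errors(gt_kept, matched_gt))    # ([], [])
# Confidence thresholding propagates query 2 (a false positive)
# and drops query 1 (a false negative).
print(propagation_errors(conf_kept, matched_gt))  # ([2], [1])
```

With the confidence rule, the next frame's track queries contain the same kinds of mistakes that occur at inference, which is the consistency the section describes.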

BibTeX

@article{cheongJDT3DAddressingGaps2024,
  author    = {Cheong, Brian and Zhou, Jiachen and Waslander, Steven},
  title     = {JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention},
  journal   = {ECCV},
  year      = {2024},
}