Conference Papers

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
AAAI 2025

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
ICLR 2025

Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception
ICLR 2025

Generative Map Priors for Collaborative BEV Semantic Segmentation
CVPR 2025

FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering
CVPR 2025

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
CVPR 2025

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
CVPR 2025