TIM: A Time Interval Machine for Audio-Visual Action Recognition
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
We propose the Time Interval Machine (TIM), in which a modality-specific time interval serves as a query to a transformer encoder that ingests a long video input.
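To make the querying idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a single self-attention layer with identity projections (no learned weights), a hypothetical interval encoding of normalised start and end times, and a one-hot modality tag. The names `encode_interval`, `make_token`, and `query_interval` are illustrative only. The query token is appended to the feature tokens of the long input, and the encoder output at the query position is read back.

```python
import math

# Hypothetical one-hot modality tags (an assumption, not taken from the paper).
MODALITY = {"visual": [1.0, 0.0], "audio": [0.0, 1.0]}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_interval(start, end, duration):
    """Assumed encoding: start and end times normalised to [0, 1]."""
    return [start / duration, end / duration]


def make_token(modality, start, end, duration, content):
    """Concatenate interval encoding, modality tag, and content features."""
    return encode_interval(start, end, duration) + MODALITY[modality] + list(content)


def self_attention(tokens):
    """Single-head self-attention with identity projections."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(dim)])
    return outputs, weights


def query_interval(features, modality, start, end, duration, content_dim):
    """Append an interval query token and return its output and attention row."""
    tokens = [make_token(m, s, e, duration, c) for (m, s, e, c) in features]
    query = make_token(modality, start, end, duration, [0.0] * content_dim)
    outputs, weights = self_attention(tokens + [query])
    return outputs[-1], weights[-1]


# Toy "long video": two visual and two audio feature windows (made-up values).
features = [
    ("visual", 0.0, 1.0, [0.2, 0.1]),
    ("visual", 1.0, 2.0, [0.9, 0.4]),
    ("audio", 0.0, 1.0, [0.3, 0.7]),
    ("audio", 1.0, 2.0, [0.5, 0.5]),
]
out, attn = query_interval(features, "audio", 0.5, 1.5, duration=2.0, content_dim=2)
```

Here `out` is the representation read at the query position and `attn` holds the query's attention weights over the four feature tokens plus itself. In a trained model a classifier would sit on top of `out`; this sketch leaves that step out.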