Depth-Aware Audio Visual Segmentation with Geometry-Heuristic Cross Attention

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

State-of-the-art Audio Visual Segmentation (AVS) methods have reached important milestones in pixel-level segmentation of sounding objects. However, they suffer from misalignment and performance degradation in complex settings, such as poor lighting conditions and cluttered environments. To address these challenges, we introduce Depth-aware Audio Visual Segmentation (Depth AVS), which extends current transformer-based AVS models. This paper makes three main contributions. First, Geometry-Heuristic Cross Attention (GHCA), a novel method that suppresses irrelevant features far from the main object of interest, making audio-visual cross-attention fusion more robust in cluttered and poorly lit scenes. Second, an Intermediate Fusion module that integrates depth and RGB features, enriching the model's learning with non-redundant visual features. Third, a depth-aware segmentor that outputs not only a binary mask but also a segmented depth mask. We evaluated our method on the S4 setting of the AVSBench-Object dataset against AVSegFormer, the current state-of-the-art in AVS. Our experiments show that Depth AVS outperforms this AVS baseline while using only small input sizes. Depth AVS also extends AVS with distance estimation, achieving a small error rate.

Original language: English
Title of host publication: AI 2025
Subtitle of host publication: Advances in Artificial Intelligence - 38th Australasian Joint Conference on Artificial Intelligence, AI 2025, Proceedings
Editors: Miaomiao Liu, Xin Yu, Chang Xu, Yiliao Song
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 187-199
Number of pages: 13
ISBN (Print): 9789819549719
DOI:
State: Published - 2026
Event: 38th Australasian Joint Conference on Artificial Intelligence, AI 2025 - Canberra, Australia
Duration: 1 Dec 2025 to 5 Dec 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16371 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 38th Australasian Joint Conference on Artificial Intelligence, AI 2025
Country/Territory: Australia
City: Canberra
Period: 1 Dec 2025 to 5 Dec 2025

