An Audio-Visual System for Object-Based Audio: From Recording to Listening

doi:10.60692/833js-pkj32

Published August 1, 2018 | Version v1

Publication | Open

1. University of Surrey
2. University of Southampton
3. British Broadcasting Corporation (United Kingdom)
4. Universidade de Brasília
5. University of Salford

Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audio-visual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated with perceptually motivated objective and subjective experiments. These experiments demonstrate that the novel components of the system add capabilities beyond the state of the art. Finally, we discuss challenges and future perspectives for object-based audio workflows.
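The idea of producing content once and rendering it for different devices can be illustrated with a minimal sketch. It is not the system described in the paper: the field names (`id`, `azimuth_deg`, `gain`), the JSON encoding, the function names, and the constant-power stereo panner are all assumptions chosen for illustration, as is the convention that positive azimuth points to the listener's left. The sketch shows one text-based scene description being rendered to two different layouts, with unknown metadata keys simply ignored by the renderer (a crude stand-in for extensibility).

```python
import json
import math


def make_object(obj_id, azimuth_deg, gain=1.0, **extra):
    """Describe one audio object; extra keys stand in for extensible metadata."""
    return {"id": obj_id, "azimuth_deg": azimuth_deg, "gain": gain, **extra}


def stereo_gains(obj, half_width_deg=30.0):
    """Constant-power pan between loudspeakers at +/- half_width_deg.

    Assumed convention: positive azimuth is to the left.
    Returns (left_gain, right_gain).
    """
    az = max(-half_width_deg, min(half_width_deg, obj["azimuth_deg"]))
    # Map azimuth to an angle in [0, pi/2]: 0 is fully right, pi/2 fully left.
    theta = (az + half_width_deg) / (2.0 * half_width_deg) * (math.pi / 2.0)
    return obj["gain"] * math.sin(theta), obj["gain"] * math.cos(theta)


def render(scene_json, samples_by_id, layout):
    """Mix every object in the scene into the requested layout.

    scene_json: text metadata, {"objects": [...]}
    samples_by_id: object id -> list of float samples (equal lengths assumed)
    layout: "mono" or "stereo"
    """
    if layout not in ("mono", "stereo"):
        raise ValueError("unsupported layout: " + layout)
    scene = json.loads(scene_json)
    n = len(next(iter(samples_by_id.values())))
    channels = 1 if layout == "mono" else 2
    out = [[0.0] * n for _ in range(channels)]
    for obj in scene["objects"]:
        signal = samples_by_id[obj["id"]]
        # The layout-specific step is confined to the choice of gains.
        gains = (obj["gain"],) if layout == "mono" else stereo_gains(obj)
        for ch, g in enumerate(gains):
            for i, s in enumerate(signal):
                out[ch][i] += g * s
    return out
```

Calling `render` twice on the same scene text, once with `"mono"` and once with `"stereo"`, yields two mixes from a single description; the object audio and metadata are untouched, and only the renderer knows about the loudspeaker layout.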

Files

Name	Size
Coleman_et_al_An_Audio_Visual_System_for_Object_Based_Audio.pdf.pdf (md5:38f25290fbd70a61ef2b7846ddc1ed2b)	3.3 MB

Additional details

Other: https://openalex.org/W2784500888
DOI: 10.1109/tmm.2018.2794780

Is Global South Knowledge: Yes
Country: Brazil

