Published May 13, 2011 | Version v1
Publication Open

Utterance independent bimodal emotion recognition in spontaneous communication

  • 1. Institute of Automation, Chinese Academy of Sciences

Description

Emotion expressions are sometimes mixed with utterance-related facial movements in spontaneous face-to-face communication, which makes emotion recognition difficult. This article introduces methods for reducing utterance influences in the visual parameters used for audio-visual emotion recognition. The audio and visual channels are first combined under a Multistream Hidden Markov Model (MHMM). Utterance reduction is then performed by computing the residual between the observed visual parameters and the utterance-related visual parameters predicted from the audio. To this end, the article introduces a Fused Hidden Markov Model Inversion method, trained on a neutrally expressed audio-visual corpus. To reduce computational complexity, the inversion model is further simplified to a Gaussian Mixture Model (GMM) mapping. Compared with traditional bimodal emotion recognition methods (e.g., SVM, CART, Boosting), the utterance reduction method yields better emotion recognition results. Experiments also show the effectiveness of the emotion recognition system when used in a live environment.
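The residual idea in the abstract can be illustrated in the single-Gaussian special case of a GMM mapping: fit a joint Gaussian over paired audio and visual features from neutral speech, predict the utterance-driven visual component from the audio via the conditional mean, and keep the residual as the emotion-related cue. A minimal NumPy sketch follows; the function names and feature dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_joint_gaussian(audio, visual):
    # Fit a joint Gaussian over stacked [audio | visual] features
    # from a neutrally expressed training corpus.
    X = np.hstack([audio, visual])
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, cov

def predict_visual(audio_frame, mu, cov, da):
    # Conditional mean of visual given audio:
    #   mu_v + S_va @ S_aa^{-1} @ (a - mu_a)
    # where da is the audio feature dimension.
    mu_a, mu_v = mu[:da], mu[da:]
    S_aa = cov[:da, :da]
    S_va = cov[da:, :da]
    return mu_v + S_va @ np.linalg.solve(S_aa, audio_frame - mu_a)

def utterance_residual(audio_frame, visual_frame, mu, cov, da):
    # The residual is what remains of the facial motion after the
    # utterance-driven (speech-articulation) component is removed;
    # it is fed to the downstream emotion classifier.
    return visual_frame - predict_visual(audio_frame, mu, cov, da)
```

In the paper's full method the mapping has multiple mixture components (and, before simplification, a Fused HMM Inversion), so the prediction is a responsibility-weighted sum of per-component conditional means rather than this single conditional mean.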



Files

1687-6180-2011-4.pdf

Files (933.0 kB)

md5:b0e371681298b7a6d8cbd368e3cae0c4

Additional details


Identifiers

Other
https://openalex.org/W2164643209
DOI
10.1186/1687-6180-2011-4

GreSIS Basics Section

Is Global South Knowledge
Yes
Country
China

References

  • https://openalex.org/W1509031088
  • https://openalex.org/W1552278919
  • https://openalex.org/W1581153084
  • https://openalex.org/W1815942593
  • https://openalex.org/W1923034539
  • https://openalex.org/W1971063881
  • https://openalex.org/W1978649172
  • https://openalex.org/W2020944977
  • https://openalex.org/W2021127571
  • https://openalex.org/W2033773055
  • https://openalex.org/W2058787788
  • https://openalex.org/W2059348974
  • https://openalex.org/W2070726616
  • https://openalex.org/W2098790470
  • https://openalex.org/W2103743127
  • https://openalex.org/W2106115875
  • https://openalex.org/W2106390385
  • https://openalex.org/W2109138290
  • https://openalex.org/W2118640726
  • https://openalex.org/W2120157855
  • https://openalex.org/W2122609807
  • https://openalex.org/W2127429655
  • https://openalex.org/W2127462305
  • https://openalex.org/W2127531292
  • https://openalex.org/W2148071321
  • https://openalex.org/W2156503193
  • https://openalex.org/W2159017231
  • https://openalex.org/W2168053878
  • https://openalex.org/W2171939880
  • https://openalex.org/W2173219554
  • https://openalex.org/W3097096317
  • https://openalex.org/W4244952642