
Open Access ePrints

M2ER: Multimodal Emotion Recognition Based on Multi-Party Dialogue Scenarios

Lookup NU author(s): Rui Sun


Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2023 by the authors. Researchers have recently focused on multimodal emotion recognition, but recognizing emotions in multi-party dialogue scenarios remains challenging. Most studies use only the text and audio modalities, ignoring video. To address this, we propose M2ER, a multimodal emotion recognition scheme for multi-party dialogue scenarios. To handle multiple faces appearing in the same video frame, M2ER introduces a speaker recognition method based on multi-face localization that eliminates interference from non-speakers. An attention mechanism is used to fuse and classify the different modalities. We conducted extensive experiments on unimodal and multimodal fusion using the multi-party dialogue dataset MELD. The results show that M2ER achieves superior emotion recognition in both the text and audio modalities compared to the baseline model. In the video modality, the proposed speaker recognition method improves emotion recognition performance by 6.58% compared to the method without speaker recognition. The attention-based multimodal fusion also outperforms the baseline fusion model.
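The abstract does not specify the fusion architecture, so the following is only a minimal sketch of what scalar attention over per-modality embeddings might look like; all names, shapes, and the learned scoring vector are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fusion(modal_feats, w_attn):
    """Fuse per-modality feature vectors with scalar attention weights.

    modal_feats: (M, D) array, one D-dimensional embedding per modality
                 (e.g. text, audio, video).
    w_attn:      (D,) scoring vector; in a real model this would be learned.
    Returns the fused (D,) vector and the (M,) attention weights.
    """
    scores = modal_feats @ w_attn   # one relevance score per modality
    weights = softmax(scores)       # normalised attention weights
    fused = weights @ modal_feats   # attention-weighted sum of modalities
    return fused, weights

# Toy example: three modality embeddings of dimension 8.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))    # rows: text, audio, video (toy values)
w = rng.normal(size=8)
fused, weights = attention_fusion(feats, w)
```

The fused vector would then feed a classifier over the emotion labels; the weights indicate how much each modality contributes to the final decision.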


Publication metadata

Author(s): Zhang B, Yang X, Wang G, Wang Y, Sun R

Publication type: Article

Publication status: Published

Journal: Applied Sciences

Year: 2023

Volume: 13

Issue: 20

Online publication date: 16/10/2023

Acceptance date: 11/10/2023

Date deposited: 20/05/2024

ISSN (electronic): 2076-3417

Publisher: MDPI

URL: https://doi.org/10.3390/app132011340

DOI: 10.3390/app132011340

Data Access Statement: Not applicable.



