Abstract
This paper describes experiments in emotive spoken language user interfaces. The results show that when multimodal information in both the audio and video modalities is exploited optimally for recognition and synthesis, recognition accuracy and synthesis quality both improve. Topics covered include: human speech and emotion recognition; automatic audiovisual speech and emotion recognition; audiovisual speech synthesis; emotive prosody; and emotionally nuanced audiovisual speech.