Vision-based emotion recognition

How many times have you wished you knew what the person in front of you was feeling? Perhaps you were criticizing a specific brand in an interview, and the person across from you was a devoted user of that brand. Or the other way around: the interviewer wanted to know how you felt about a specific issue during the interview. How useful would an emotion recognition system have been in those situations?


Over the last few decades, private institutions and governments have been interested in developing systems for emotion recognition. Besides the scenarios described above, this type of system has many other applications: stress coaching (for civilian and military purposes), assistance in psychological treatment, discussion control, lie detection (useful during interrogations) and user feedback, among others.

In recent years, emotion recognition has attracted a lot of interest in the computer vision research community. A vision-based emotion recognition system has two main components: an acquisition system, usually based on one or more sensors (e.g. a camera), and a processing system (e.g. a computer), which processes and evaluates the acquired data to yield one or more outputs.

In 1969, Carl-Herman Hjortsjö defined the first taxonomy of facial muscle movements. The coding system consisted of a list of facial muscle movements, a.k.a. Action Units (AUs). It has been widely used by psychologists since then and, more recently, in computer vision. Some examples of AUs are turning your head left, pulling up the corner of your lip, stretching your mouth or raising your eyebrow.

As my colleague Anton described in his previous post, there are two different approaches to formalising emotions: the dimensional approach, which was introduced in 1980 by James A. Russell and is based on the hypothesis that emotions are interrelated, and the taxonomy approach, in which emotions are considered discrete and independent.

The most promising vision-based methods usually focus on basic emotion recognition (the taxonomy approach). In particular, given a subset of detected AUs, this type of method outputs a basic emotion (or a score for each emotion type). The basic emotions normally used in the literature are happiness, sadness, anger, surprise, fear and disgust. AUs are useful for determining not just basic emotions but also mental states such as agreeing, thinking or being unsure. Both basic emotions and other mental states are characterized by a specific subset of AUs.
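To make the idea of mapping detected AUs to emotion scores more concrete, here is a minimal sketch in Python. It is only an illustration: the AU numbers in the table are assumptions based on commonly cited FACS-style prototypes (they do not come from this post), and the scoring rule (set overlap between detected AUs and each prototype) is a deliberately simple stand-in for what a real classifier would learn. The names `EMOTION_PROTOTYPES`, `score_emotions` and `classify` are hypothetical.

```python
# Illustrative AU prototypes per basic emotion.
# ASSUMPTION: these AU numbers follow commonly cited FACS-style descriptions;
# they are not taken from this post and real systems may use different sets.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},
    "sadness": {1, 4, 15},
    "surprise": {1, 2, 5, 26},
    "fear": {1, 2, 4, 5, 7, 20, 26},
    "anger": {4, 5, 7, 23},
    "disgust": {9, 15, 16},
}


def score_emotions(detected_aus):
    """Score each emotion in [0, 1] as the Jaccard overlap between
    the detected AUs and that emotion's prototype AU set."""
    detected = set(detected_aus)
    scores = {}
    for emotion, prototype in EMOTION_PROTOTYPES.items():
        union = detected | prototype
        scores[emotion] = len(detected & prototype) / len(union) if union else 0.0
    return scores


def classify(detected_aus):
    """Return the best-scoring emotion, or None if no prototype overlaps."""
    scores = score_emotions(detected_aus)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # Hypothetical detector output: cheek raiser (6) and lip corner puller (12).
    print(classify([6, 12]))
    print(score_emotions([1, 2, 5, 26]))
```

The function returns a score for every emotion rather than a single label, which matches the "score for each emotion type" output described above; `classify` simply picks the highest one.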

At Neurokai, we have been working on both vision-based and EEG-based approaches to emotion recognition. Our main goal is to offer a multimodal system that combines both modalities: facial recognition and EEG. For our visual approach, we have been working on a method that outputs not only a basic emotion but also valence and arousal states (similar to [1]). Our method, though, does not rely on a predefined set of AUs for each emotion, but on relative changes between facial features (e.g. the distance between the tip of the nose and the left corner of the mouth). This way, during training, the method learns which features are relevant for each emotion. The main reason for using such an approach is that not everyone follows the same AU pattern when showing a specific emotion (people from different cultures may react in different ways). If you want to learn a little more about our recognition system, please leave a comment and I'll be happy to answer or elaborate in my next post.
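For readers who prefer code, here is a rough sketch of what "relative changes between facial features" could look like. This is not our actual implementation: the landmark names, the choice of a neutral reference frame, and the normalization by the distance between the eyes are all assumptions made for illustration. The resulting feature vector is the kind of input a classifier could be trained on.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pairwise_distances(landmarks, scale_pair=("left_eye", "right_eye")):
    """Distances between every pair of landmarks, divided by the distance
    between the two eyes so the features do not depend on face size.
    ASSUMPTION: landmark names and this normalization are illustrative."""
    scale = distance(landmarks[scale_pair[0]], landmarks[scale_pair[1]])
    names = sorted(landmarks)
    features = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            features[(a, b)] = distance(landmarks[a], landmarks[b]) / scale
    return features


def relative_changes(neutral, current):
    """Relative change of each normalized distance with respect to a
    neutral face. Assumes no two landmarks coincide in the neutral face."""
    base = pairwise_distances(neutral)
    now = pairwise_distances(current)
    return {pair: (now[pair] - base[pair]) / base[pair] for pair in base}


if __name__ == "__main__":
    # Hypothetical landmark positions in pixels.
    neutral = {"left_eye": (30, 40), "right_eye": (70, 40),
               "nose_tip": (50, 60), "mouth_left": (35, 80)}
    smiling = dict(neutral, mouth_left=(32, 74))
    for pair, change in relative_changes(neutral, smiling).items():
        print(pair, round(change, 3))
```

In this toy example, the left corner of the mouth moves up and outwards, so its normalized distance to the tip of the nose shrinks relative to the neutral face, while the distance between the eyes stays unchanged.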
