by M Wimmer, C Mayer and B Radig

Abstract:

Efficient and accurate localization of the components of human faces, such as skin, lips, eyes, and brows, provides benefit to various real-world applications. However, high intra-class and small inter-class variations in color prevent simple but quick pixel classifiers from yielding robust results. In contrast, more elaborate classifiers consider shape or region features but they do not achieve real-time performance. In this paper, we show that it definitely is possible to robustly determine the facial components and achieve far more than real-time performance. We also use quick pixel-level classifiers and provide them with a set of pixel features that are adapted to the image characteristics beforehand. We do not manually select the pixel features and specify the calculation rules. In contrast, our idea is to provide a multitude of features and let the Machine Learning algorithm decide which of them are important. The evaluation draws a comparison to fixed approaches that do not adapt the computation of the features to the image content in any way. The obtained accuracy is precise enough to be used for real-world applications such as for model-based interpretation of human faces.

Reference:

Robustly Classifying Facial Components Using a Set of Adjusted Pixel Features (M Wimmer, C Mayer and B Radig), In Proc. of the International Conference on Face and Gesture Recognition (FGR08), 2008.

Bibtex Entry:

@inproceedings{wimmer_robustly_2008,
 author = {M Wimmer and C Mayer and B Radig},
 title = {Robustly Classifying Facial Components Using a Set of Adjusted Pixel
	Features},
 booktitle = {Proc. of the International Conference on Face and Gesture Recognition
	({FGR08})},
 year = {2008},
 address = {Amsterdam, Netherlands},
 month = {sep},
 abstract = {Efficient and accurate localization of the components of human faces,
	such as skin, lips, eyes, and brows, provides benefit to various
	real-world applications. However, high intra-class and small inter-class
	variations in color prevent simple but quick pixel classifiers from
	yielding robust results. In contrast, more elaborate classifiers
	consider shape or region features but they do not achieve real-time
	performance. In this paper, we show that it definitely is possible
	to robustly determine the facial components and achieve far more
	than real-time performance. We also use quick pixel-level classifiers
	and provide them with a set of pixel features that are adapted to
	the image characteristics beforehand. We do not manually select the
	pixel features and specify the calculation rules. In contrast, our
	idea is to provide a multitude of features and let the Machine Learning
	algorithm decide which of them are important. The evaluation draws
	a comparison to fixed approaches that do not adapt the computation
	of the features to the image content in any way. The obtained accuracy
	is precise enough to be used for real-world applications such as
	for model-based interpretation of human faces.},
 keywords = {facial expressions},
}

Analysis of Facial Expressions

As robots emerge from their classical domain, the factory, and enter everyday life, they need abilities beyond those required in manufacturing. They must not only support humans but also socialize with their users to enhance the interaction experience and allow for social bonding. Recent progress in Computer Vision allows intuitive interaction between humans and technical systems, and current research aims at enabling machines to use communication channels natural to human beings, such as gestures or facial expressions. Humans interpret emotion from visual and audio information and rely heavily on it during everyday communication. Therefore, knowledge about human behavior, intention, and emotion is necessary to construct convenient human-machine interaction mechanisms. The human face provides much of the information that is passed between humans in everyday communication. Although most of this information is passed on a subconscious level, we still rely on the interaction partner's facial expression to determine his or her emotional state or attention and to predict his or her reaction.

Project details

This project aims at determining facial expressions from camera images in real time. Model-based image interpretation techniques have proven successful for extracting such high-level information from single images and image sequences. We rely on a model-based technique to determine the exact location of facial components such as eyes or eyebrows in the image. Geometric models form an abstraction of real-world objects and contain knowledge about their properties, such as position, shape, or texture. This representation of the image content facilitates and accelerates the subsequent interpretation task. To extract high-level information, the model parameters that best describe the face within a given image have to be estimated. Correctly estimated model parameters also form the basis of various further applications such as gaze detection or gender estimation.
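
To illustrate what estimating model parameters means, the following is a minimal sketch and not the project's method: it fits a flat 2D point template to observed landmark positions by least squares and recovers only scale, rotation, and translation. The function name fit_similarity and the example coordinates are hypothetical, and a real 3D face model has many more parameters (for example for shape and expression).

```python
import cmath


def fit_similarity(model_pts, image_pts):
    """Least-squares fit of scale, rotation and translation (hypothetical helper).

    Finds the 2D similarity transform that maps model_pts onto image_pts,
    treating each point (x, y) as the complex number x + iy so that the
    transform becomes w = a * z + b with complex a and b.
    """
    if len(model_pts) != len(image_pts) or len(model_pts) < 2:
        raise ValueError("need two equally long lists with at least 2 points")

    z = [complex(x, y) for x, y in model_pts]
    w = [complex(x, y) for x, y in image_pts]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n

    # Closed-form solution of min sum |a * z_i + b - w_i|^2 on centered points.
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("model points must not all coincide")

    a = num / den
    b = w_mean - a * z_mean
    # |a| is the scale, arg(a) the rotation in radians, b the translation.
    return abs(a), cmath.phase(a), (b.real, b.imag)


# Made-up template (e.g. two eye corners and a mouth point) and its
# observed position in an image.
template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
observed = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
scale, angle, translation = fit_similarity(template, observed)
```

The three returned values play the role of the model parameters here: once they are known, every template point can be placed in the image, which is the sense in which the fitted model describes the image content.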

Our demonstrator for facial expression recognition has been presented at several events with political audiences and on TV. The face is detected and a 3D face model is fitted in real time to extract the facial expression currently visible. We integrate the publicly available Candide-III face model and also rely on publicly available databases to train and evaluate classifiers for facial expression recognition. This makes our approach comparable with those of other research groups. Ekman and Friesen identified six universal facial expressions that are expressed and interpreted all over the world independently of cultural background, age, or country of origin. The Facial Action Coding System (FACS) precisely describes the muscle activity within a human face that appears during the display of facial expressions. The Candide-III face model integrates FACS in its model parameters.

Evidence suggests that feeling empathy for others is connected to the mirror neuron system and that emotional empathy, which is triggered by deriving the emotional state from facial expressions, involves neural activity in the thalamus and in the cortical areas responsible for the face. Perception and display of facial expressions form a closed loop in human-human communication, where the perception of the interaction partner's facial expression influences the display of one's own facial expression. To investigate this at the human-machine interface as well, we integrate our demonstrator into the Multi-Joint Action Scenario in the CoTeSys Central Robotics Lab. It is combined with the robot head EDDIE, provided by the Institute of Automatic Control Engineering, to form a closed-loop human-machine interaction scenario based on facial expression analysis and synthesis. In its current, preliminary state, the facial expression is merely mirrored, but future plans involve integrating a more complex emotional model on the robotic side.

Related Publications