Face Model Fitting with Generic, Group-specific, and Person-specific Objective Functions (bibtex)
by S Pietzsch, M Wimmer, F Stulp and B Radig
Abstract:
In model-based fitting, the model parameters that best fit the image are determined by searching for the optimum of an objective function. Often, this function is designed manually, based on implicit and domain-dependent knowledge. We acquire more robust objective functions by learning them from annotated images, in which many critical decisions are automated, and the remaining manual steps do not require domain knowledge. Still, the trade-off between generality and accuracy remains. General functions can be applied to a large range of objects, whereas specific functions describe a subset of objects more accurately. Gross et al. have demonstrated this principle by comparing generic to person-specific Active Appearance Models. As it is impossible to learn a person-specific objective function for the entire human population, we automatically partition the training images and then learn partition-specific functions. The number of groups influences the specificity of the learned functions. We automatically determine the optimal partitioning given the number of groups, by minimizing the expected fitting error. Our empirical evaluation demonstrates that the group-specific objective functions more accurately describe the images of the corresponding group. The results of this paper are especially relevant to face model tracking, as individual faces will not change throughout an image sequence.
Reference:
Face Model Fitting with Generic, Group-specific, and Person-specific Objective Functions (S Pietzsch, M Wimmer, F Stulp and B Radig), In 3rd International Conference on Computer Vision Theory and Applications (VISAPP), volume 2, 2008. 
Bibtex Entry:
@inproceedings{pietzsch_face_2008,
 author = {S Pietzsch and M Wimmer and F Stulp and B Radig},
 title = {Face Model Fitting with Generic, Group-specific, and Person-specific
	Objective Functions},
 booktitle = {3rd International Conference on Computer Vision Theory and Applications
	({VISAPP})},
 year = {2008},
 volume = {2},
 pages = {5--12},
 address = {Madeira, Portugal},
 month = {jan},
 abstract = {In model-based fitting, the model parameters that best fit the image
	are determined by searching for the optimum of an objective function.
	Often, this function is designed manually, based on implicit and
	domain-dependent knowledge. We acquire more robust objective functions
	by learning them from annotated images, in which many critical decisions
	are automated, and the remaining manual steps do not require domain
	knowledge. Still, the trade-off between generality and accuracy remains.
	General functions can be applied to a large range of objects, whereas
	specific functions describe a subset of objects more accurately.
	Gross et al. have demonstrated this principle by comparing generic
	to person-specific Active Appearance Models. As it is impossible
	to learn a person-specific objective function for the entire human
	population, we automatically partition the training images and then
	learn partition-specific functions. The number of groups influences
	the specificity of the learned functions. We automatically determine
	the optimal partitioning given the number of groups, by minimizing
	the expected fitting error. Our empirical evaluation demonstrates
	that the group-specific objective functions more accurately describe
	the images of the corresponding group. The results of this paper
	are especially relevant to face model tracking, as individual faces
	will not change throughout an image sequence.},
 keywords = {facial expressions},
}
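
The grouping idea from the abstract can be sketched in a few lines: cluster the annotated training data into k groups, learn one objective-function approximator per group, and keep the partitioning whose expected fitting error on held-out data is lowest. The snippet below is a minimal sketch under stated assumptions (Python with scikit-learn, k-means grouping on appearance vectors, a random-forest regressor as the learned objective function); it is not the authors' implementation.

```python
# Sketch of group-specific objective-function learning (assumptions, not the paper's code):
#   features   : (n_samples, n_feat) local image features for each training sample
#   targets    : (n_samples,)        ideal objective value (0 at the annotated position, larger elsewhere)
#   appearance : (n_samples, n_app)  appearance vector of the source image, used only for grouping
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

def learn_group_specific_functions(features, targets, appearance, n_groups):
    """Partition the training data and learn one objective function per group."""
    grouping = KMeans(n_clusters=n_groups, n_init=10).fit(appearance)
    models = []
    for g in range(n_groups):
        mask = grouping.labels_ == g
        reg = RandomForestRegressor(n_estimators=50)
        reg.fit(features[mask], targets[mask])   # group-specific objective function
        models.append(reg)
    return grouping, models

def expected_fitting_error(grouping, models, features, targets, appearance):
    """Average held-out error when each sample is scored by its group's function."""
    labels = grouping.predict(appearance)
    errors = [np.mean(np.abs(models[g].predict(features[labels == g])
                             - targets[labels == g]))
              for g in range(len(models)) if np.any(labels == g)]
    return float(np.mean(errors))

# Model selection: try several group counts and keep the one with the lowest
# expected fitting error on a validation split (all data arrays hypothetical).
# best_k = min(range(1, 9), key=lambda k: expected_fitting_error(
#     *learn_group_specific_functions(F_tr, y_tr, A_tr, k), F_val, y_val, A_val))
```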

Analysis of Facial Expressions

As robots emerge from their classical domain, the factory, into everyday life, they need abilities beyond those required for manufacturing. They must not only support humans, but also socialize with their users to enhance the interaction experience and allow for social bonding. Recent progress in Computer Vision enables intuitive interaction between humans and technical systems via communication channels natural to human beings, such as gestures or facial expressions. Humans interpret emotion from visual and auditory cues and rely heavily on this information in everyday communication. Knowledge about human behavior, intention, and emotion is therefore necessary to construct convenient human-machine interaction mechanisms. The human face conveys much of the information exchanged between humans in everyday communication. Although most of this information is passed on a subconscious level, we rely on the interaction partner's facial expression to determine their emotional state or attention and to predict their reaction.

Project details

This project aims at determining facial expressions from camera images in real time. Model-based image interpretation techniques have proven successful for extracting such high-level information from single images and image sequences. We rely on a model-based technique to determine the exact location of facial components such as the eyes or eyebrows in the image. Geometric models form an abstraction of real-world objects and contain knowledge about their properties, such as position, shape, or texture. This representation of the image content facilitates and accelerates the subsequent interpretation task. To extract high-level information, the model parameters that best describe the face within a given image have to be estimated. Correctly estimated model parameters also form the basis of further applications such as gaze detection or gender estimation.
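
To make the fitting step concrete, the following sketch shows one very simple way to search for the model parameters that minimize a learned objective function: a greedy random local search over the parameter vector (position, scale, rotation, deformation). The parameterization and the `objective` callable are illustrative assumptions, not the project's actual fitting algorithm.

```python
# Illustrative local search for model-based fitting (not the project's optimizer).
# `objective(image, params)` is assumed to return a scalar that is small when the
# projected model landmarks lie on the true facial features.
import numpy as np

def fit_model(image, objective, initial_params, n_iters=200, step=0.05, seed=0):
    """Greedy random search over the model parameter vector.

    initial_params : 1-D array, e.g. [tx, ty, scale, rotation, *deformation]
    """
    rng = np.random.default_rng(seed)
    params = np.asarray(initial_params, dtype=float)
    best_value = objective(image, params)
    for _ in range(n_iters):
        candidate = params + step * rng.standard_normal(params.shape)
        value = objective(image, candidate)
        if value < best_value:          # keep the candidate only if it fits better
            params, best_value = candidate, value
    return params, best_value
```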

Our demonstrator for facial expression recognition has been presented at several events with a political audience and on TV. The face is detected and a 3D face model is fitted in real time to extract the currently visible facial expression. We integrate the publicly available Candide-III face model and also rely on publicly available databases to train and evaluate classifiers for facial expression recognition, which makes our approach comparable with that of other research groups. Ekman and Friesen identified six universal facial expressions that are expressed and interpreted independently of cultural background, age, or country of origin all over the world. The Facial Action Coding System (FACS) precisely describes the muscle activity within a human face that appears during the display of facial expressions. The Candide-III face model incorporates the FACS action units into its model parameters.
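
Given a fitted Candide-III model, the expression label can be obtained by feeding its FACS-related action-unit parameters to a classifier trained on a labeled database. The snippet below is a hedged sketch of that final classification step using scikit-learn; the six-label set follows Ekman and Friesen's universal expressions, but the feature layout and the classifier choice are assumptions, not the project's actual pipeline.

```python
# Sketch of the expression-classification step (assumed, not the project's code):
# action-unit activations from the fitted Candide-III model are the features,
# the six universal expressions of Ekman and Friesen are the labels.
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def train_expression_classifier(action_units, labels):
    """action_units: (n_samples, n_aus) array of fitted model parameters,
    labels: expression names from EXPRESSIONS (e.g. from a public database)."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(action_units, labels)
    return clf

# At run time: fit the face model to the current frame, read out its
# action-unit parameters, and classify:
# expression = clf.predict(current_action_units.reshape(1, -1))[0]
```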

Evidence suggests that feeling empathy for others is connected to the mirror neuron system, and that emotional empathy, which is triggered by deriving the emotional state from facial expressions, involves neural activity in the thalamus and in cortical areas responsible for the face. Perception and display of facial expressions form a closed loop in human-human communication, where the perception of the interaction partner's facial expression influences the display of one's own facial expression. To investigate this at the human-machine interface as well, we integrate our demonstrator into the Multi-Joint Action Scenario in the CoTeSys Central Robotics Lab. It is combined with the robot head EDDIE, provided by the Institute of Automatic Control Engineering, to form a closed-loop human-machine interaction scenario based on facial expression analysis and synthesis. In its current, preliminary state, the facial expression is merely mirrored, but future plans involve integrating a more complex emotional model on the robotic side.
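
In its current mirroring form, the closed loop described above reduces to a simple perception-action cycle per camera frame. The following sketch only illustrates that loop structure; the camera, fitting, classification, and EDDIE interfaces are hypothetical placeholders, not real APIs.

```python
# Closed-loop mirroring sketch. All interfaces (camera.grab, fit_model,
# classify, eddie.show_expression) are hypothetical placeholders.
def mirroring_loop(camera, fit_model, classify, eddie):
    params = None
    while True:
        frame = camera.grab()               # acquire the current image
        params = fit_model(frame, params)   # fit the Candide-III model (tracking)
        expression = classify(params)       # map action units to an expression label
        eddie.show_expression(expression)   # mirror the expression on the robot head
```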