Facial Emotion Recognition by Humans and Deep Neural Networks

Posted: August 26th, 2021

Student’s Name

Instructor’s Name

Course

Date

Facial Emotion Recognition by Humans and Deep Neural Networks

Abstract

Recognizing and detecting human emotion is a great challenge in artificial intelligence and computer vision. In human communication, emotions make the most considerable contribution. Facial expression is the easiest way for humans to convey feelings and communicate emotions; thus, recognizing it might improve the client experience. This paper aims to develop a system for recognizing and detecting human emotions, such as neutral, fear, happy, surprise, sad, angry, and disgust, from a live feed. This study uses convolutional neural networks (CNNs) trained over grayscale images obtained from the FER2013 dataset. Although there are strategies to identify expressions using artificial intelligence, this work endeavors to use image classification and a deep learning approach to recognize and classify the expressions in the images. In this paper, different datasets are examined and investigated for training the expression recognition model. For instance, Inception Net is used to detect and classify facial expressions with the Kaggle and Karolinska Directed Emotional Faces datasets.

Keywords: deep neural networks, facial emotion recognition, explainable artificial intelligence

Contents

Abstract

Introduction

Facial Emotion Recognition

Face Detection

Pre-Processing

Face Recognition and Feature Extraction

Classification

Related Work

Proposed System

Facial Expression Databases

Dataset

Datasets and Target Emotions

Neural Network Classification

Convolutional Neural System

Facial Expressions Recognition Methods

Conclusion

Works Cited

Appendices

Appendix 1: Sample System Output

Introduction

One of the critical aspects of human emotion recognition is facial expression. People form impressions of emotions in others by observing their faces. Facial emotion recognition is challenging since it is associated with many applications in human-computer interaction, non-verbal human behavior, and computer vision (Barros, 140). In images, facial emotion recognition has attracted growing attention but is complicated by low-resolution and background faces. Martinez, Aleix, and Shichuan relate emotion to dispositional, energy, and feeling impacts (1589). This work characterizes facial emotion as disgust, neutral, fear, happy, angry, sad, and surprise. Facial expression is one of the most remarkable characteristics and basic signs humans use to convey their intentions and emotional states.

Facial emotion recognition aims to empower machines to appraise the emotional content of human faces automatically (Bartlett, 234). Facial expressions aid different intellectual undertakings by providing the most potent and natural means for communicating human intentions, opinions, and emotions. Barros assumed that only around 7% of messages are conveyed verbally, 38% by paralanguage, and 55% through facial expression (145). Therefore, facial expressions tend to be the most critical communication channel in face-to-face communication.

Understanding the expressions of a human face and the underlying emotional state can help discover various applications across comprehensive domains (Shan, Caifeng, Shaogang, and Peter, 701). On the same note, Perveen explains that human-computer interaction (HCI) would be significantly clearer and more natural if computer systems could adapt to and recognize the human emotional state (89). Embodied conversational agents can incredibly benefit from understanding and spotting participants’ emotional states, thus accomplishing more reasonable connections at an emotional level.

Sharma, Jayapradha, and Yash suggested that extracting facial recognition features is no simple task since faces differ from one individual to another (6472). Furthermore, many aspects, such as age, genes, sex, and physical characteristics, influence facial features and create intense variability. Therefore, for an emotion recognition system to be efficient, many factors need to be considered. Martinez, Aleix, and Shichuan claimed that the central aspect of a suitable face processing framework is to detect and recognize faces precisely and classify them accordingly (1603). Equally, the facial emotion recognition framework should work in diverse conditions such as facial hair, use of spectacles, different illumination issues, and changes in ambient light. Therefore, these are a few issues that the framework ought to overcome to make a perfect system.

Classification of human facial expressions into the best possible emotional categories is crucial (Perikos, 56). The goal is to locate and extract the essential facial features that distinguish an expression’s emotion. In this work, a facial emotion recognition system is designed to capture the emotional condition of human facial expressions. The system examines the facial picture, locating and measuring specific deformations and attributes such as the mouth, eyes, and eyebrows, among others. Subsequently, every part of the face is profoundly studied and its features extracted. Afterward, these are presented as feature vectors. Equally, the feature vectors are classified according to identified emotions through a trained multilayer neural network used for facial expression classification.

Facial Emotion Recognition

Facial emotion recognition attempts to distinguish the emotion from human facial expression. Pathar argued that improvements in emotion recognition make complex systems less complicated (170). Emotion recognition is a difficult task since emotions may fluctuate depending on face reaction, culture, appearance, and environment, leading to ambiguous information. Perikos claims that facial expressions help a great deal in investigating emotion recognition (61). Therefore, the following sections expound on facial emotion recognition by assessing various aspects undertaken in improving the technology.

Face Detection

Face detection is the initial step in the face processing approach. Perveen believes that the critical aspect of this step is detecting faces in dataset images (90). In this step, images are taken from the dataset, verified, and scanned to determine whether they contain only background or a face. The face detection framework decides whether the image input is a face. The outcome is sent for pre-processing so that facial features can be extracted from the face image.
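The scan described above can be sketched as a sliding window over the image with a classifier deciding per window. The classifier below is a deliberately simplified stand-in (a brightness threshold); a real detector would use a trained cascade of features, not this toy rule.

```python
# Hypothetical sketch of the face-detection scan: a fixed-size window slides
# over a grayscale image (a list of lists) and a stub classifier accepts or
# rejects each window. The threshold rule is illustrative only.

def window_mean(image, top, left, size):
    """Mean intensity of a size x size window with corner (top, left)."""
    total = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            total += image[r][c]
    return total / (size * size)

def detect_faces(image, size=2, threshold=128, stride=1):
    """Return (top, left) corners of windows the stub classifier accepts."""
    hits = []
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            if window_mean(image, top, left, size) >= threshold:
                hits.append((top, left))
    return hits

# Toy 4x4 "image": a bright 2x2 patch stands in for a face region.
toy = [
    [10, 10, 10, 10],
    [10, 200, 210, 10],
    [10, 220, 205, 10],
    [10, 10, 10, 10],
]
print(detect_faces(toy))  # → [(1, 1)], the bright patch
```

Only the windows accepted here would be passed on to the pre-processing stage.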

Pre-Processing

Pre-processing is performed with the aim of obtaining a smooth face image by removing shadowing effects, image blurring, and undesirable noise (Bartlett, 237). Many approaches are available for pre-processing images, which depend on pixel transformations, such as image restoration, geometric change, and pixel brightness transformation. Without high-quality pre-processing, images cannot be obtained that support a high-accuracy detection framework. Therefore, the resultant images are utilized for the extraction of human facial expressions.
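One standard pixel-brightness transformation of the kind mentioned above is histogram equalization, which spreads intensities to counter uneven illumination. A minimal sketch on an 8-bit grayscale image follows; a real pipeline would also denoise and geometrically normalize the face.

```python
# Minimal histogram equalization for an 8-bit grayscale image stored as a
# list of lists. Uses the standard CDF-remapping formula; no error handling
# for degenerate (constant) images, since this is only a sketch.

def equalize(image, levels=256):
    """Return a histogram-equalized copy of a grayscale image."""
    flat = [p for row in image for p in row]
    n = len(flat)
    # Histogram and cumulative distribution of intensities.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [0] * levels, 0
    for i in range(levels):
        running += hist[i]
        cdf[i] = running
    cdf_min = min(c for c in cdf if c > 0)
    # Standard remapping: scale the CDF to the full intensity range.
    def remap(p):
        return round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
    return [[remap(p) for p in row] for row in image]

dark = [[50, 50], [51, 52]]     # a low-contrast patch
print(equalize(dark))           # → [[0, 0], [128, 255]]
```

The low-contrast patch is stretched over the full 0-255 range, which is what makes the later feature extraction less sensitive to lighting.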

Face Recognition and Feature Extraction

Face recognition and feature extraction perform information compression and remove or reduce irrelevant features (González-Hernández, 3327). The facial features are then changed into a vector of a given dimension that corresponds appropriately. Feature analysis is then conducted, and the face recognition part is obtained. The system is trained to correlate with the database for image testing. Therefore, if it succeeds, the system is ready to determine the individual’s identity without personal consent.
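The conversion of a face patch into a fixed-length vector can be sketched as below. Raw-pixel flattening with mean-centering is used here purely for illustration; a real system would use learned or hand-crafted descriptors rather than raw pixels.

```python
# Illustrative feature extraction: flatten a 2-D face patch into a vector
# and mean-center it, a crude form of the "information compression" step.

def to_feature_vector(patch):
    """Flatten a 2-D face patch and subtract its mean intensity."""
    flat = [float(p) for row in patch for p in row]
    mean = sum(flat) / len(flat)
    return [p - mean for p in flat]

patch = [[100, 110], [90, 100]]
print(to_feature_vector(patch))  # → [0.0, 10.0, -10.0, 0.0]
```

Mean-centering removes the overall brightness of the patch, so the vector encodes only the relative structure that the classifier compares against the database.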

Classification

Classification of images is composed of many techniques and methods, such as the neural network (Shan, Caifeng, Shaogang, and Peter, 701). The neural network works for both nonlinear and linear datasets. Additionally, the neural network works for images that are not in the dataset, since it is a self-learning model composed of many hidden layers. Recently, Sun’s studies suggested that many models of artificial neural networks have been used for classification (1105). The notable ones include bilinear, radial basis function, and deep convolutional neural networks. Therefore, classification stands out as a key aspect in realizing the appropriate functionalities of the recognition system.

Related Work

Emotion recognition and detection have illustrated notable growth in the field of artificial intelligence and computer vision. Some expression recognition systems identify emotions like anger, sadness, and joy. Other systems can detect the movement of muscles on an individual’s face. The Facial Action Coding System (FACS) is unmatched compared with other psychological systems in describing the muscle movements the face can make (Dehghan, Afshin, et al., 501). Convolutional neural networks showed better performance in Arriaga’s previous work on emotion (107). Lee used an ensemble of convolutional neural networks with five convolutional layers (3501). They achieved remarkable recognition performance and proposed data perturbation and a voting procedure, which further improved the classifier’s performance. Sharma, Jayapradha, and Yash proposed a novel method for transferring people’s expressions to various stylized characters (6480). To autonomously understand human emotions and adapted aspects, Liu trains two convolutional neural networks and uses a transfer learning technique to learn how to map human attributes into a feature space (804). Therefore, this created an effect on emotion expression, in which facial expression has contributed to social interactions.

Proposed System

The proposed system aims to solve the problem of automated and robust face detection, which needs a critical analysis of meaningful facial expressions and captured images (Bonn-Rhein-Sieg, 909). Thus, this article designs and implements well-fitted classifiers, ensuring an understanding of the underlying classifiers, creating testing and learning data sets, and learning facial descriptor vectors. The fitted classifiers become familiar with the vectors of the facial descriptors. Reddy claims that the proposed model design is capable of perceiving up to six emotions: surprise, disgust, anger, happiness, sadness, and fear (308). Therefore, our proposed system is to comprehend a facial expression and its features and then infer the individual’s emotional state.

Facial Expression Databases

The facial expression databases are image sets that can express experiences, situations, and emotions. Cohn-Kanade and CK+ are databases representing six fundamental emotions, and they incorporate Action Unit (AU) annotations (Santhoshkumar and Kalaiselvi, 160). Equally, Li, Shan, and Weihong claim that the images in these databases illustrate image classification within the Facial Action Coding System (FACS). Every expression starts as a neutral expression and afterward moves to a peak (more intense) expression. Every expression can receive an emotion label. In an extended version, spontaneous expressions were recorded from eighty-four different subjects while they were distracted between photo sessions. The Radboud Faces Database (RaFD) integrates photographs of eight fundamental emotions. Hence, the members performed the facial expressions with three gaze directions and five camera angles.

Dataset

Deep networks and neural networks are known for their dependence on training information. Additionally, the selection of images used for training impacts the proposed system (Li, Junnan, and Lam, 301). Therefore, this requires a large and carefully chosen volume of data. There are several well-reputed and standardized datasets for emotion recognition that store thousands of images in excellent resolution (Samsani, Surekha, and Vineel, 438). The datasets vary mostly in the cleanliness, quality, and quantity of the images. For this system, the FER2013 dataset is appropriate, as it contains tens of thousands of faces with a wide range of emotions.
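Loading FER2013-style data can be sketched as below. In the real CSV each row is "emotion,pixels,Usage", where pixels is a space-separated string of 48 × 48 = 2304 grayscale values and the emotion field is an integer code; the toy rows here use a 2 × 2 image so the example stays small, and the 0-6 label ordering follows the dataset's usual coding.

```python
# Sketch of parsing FER2013-style CSV rows into (label, image) pairs.
# The toy rows use a 2x2 image; real FER2013 rows carry 2304 pixel values.
import csv
import io

# Assumed FER2013 label coding (0-6).
LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def load_fer_rows(csv_text, side):
    """Parse FER2013-style CSV text into (label, image-matrix) pairs."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = [int(v) for v in row["pixels"].split()]
        # Reshape the flat pixel list into a side x side matrix.
        image = [values[r * side:(r + 1) * side] for r in range(side)]
        samples.append((LABELS[int(row["emotion"])], image))
    return samples

toy_csv = "emotion,pixels,Usage\n3,10 20 30 40,Training\n0,5 5 5 5,PublicTest\n"
print(load_fer_rows(toy_csv, side=2))
```

The Usage column (Training / PublicTest / PrivateTest) is what the dataset's standard train/test split is based on, so a fuller loader would also filter on it.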

Figure 1: Statistical distribution of images in the dataset: 1,200 anger, 250 disgust, 1,200 fear, 2,600 happy, 1,100 sad, 1,100 surprise, and 1,800 neutral images.

Datasets and Target Emotions

The selection of a dataset must be directed by the set of target emotions to classify. Some emotional expressions resemble one another. Additionally, unobtrusive feelings like contempt can be challenging to notice (Jain, 72). In this manner, some datasets will outperform others for certain emotion sets. Neural networks trained on a constrained set of emotions will yield higher rates of precise classification. As an extreme example, training on a dataset containing just instances of happiness and anger will produce high precision, since two such distinct target groupings imply that the overlap in the expressive facial range is limited.

More so, when neural networks are trained on datasets containing underrepresented emotions, for example, contempt, they tend to misclassify those emotions (Ko, 401). This happens because various emotions can have comparable facial features, and the dataset does not contain large enough example sets of one emotion for discovering distinctive defining patterns.

Neural Network Classification

The facial expression classification ensures that emotions are accurately categorized using a Multilayer Perceptron Neural Network (MLPNN) (Ferreira, 53930). The Multilayer Perceptron Neural Network empowers higher adaptability, providing an excellent adaptation to this sensitive classification problem. The idea of neural networks is an inspired paradigm that permits an artificial intelligence system to learn from the given information. An artificial neural network model is based on a collection of connected nodes or units called artificial neurons (Ferreira, 53932). Every connection between the neurons can send a signal to the others. A node receives a signal, processes it, and then signals the interconnected nodes, and the cycle continues. Therefore, the units, otherwise called single perceptrons, are joined into the different groups of layers.

Figure 2: The system work flow chart

The MLPNNs are feed-forward neural networks and can be utilized to classify multiclass, nonlinear problems (Guo, 230). Additionally, the MLPNN used for classifying the facial expressions has three hidden layers containing thirteen, nine, and five neurons. Since there is no standard technique for choosing the number of hidden layers and neurons, the selection is frequently, as in this situation, the architecture with the best performance. Abbas, Asad, and Stephan claimed that the input layer has 25 neurons to match the data vectors (107). The output layer has three neurons to encode the seven distinct classes: the output neurons combine to produce a binary 3-digit sequence, and each sequence is mapped to a class. Therefore, the MLPNN is trained utilizing the backpropagation supervised learning procedure, which provides an efficient method to update the synaptic weights of multilayer perceptron networks.
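A forward pass through the architecture just described (25 inputs, hidden layers of 13, 9, and 5 neurons, 3 sigmoid outputs whose binary pattern codes the class) can be sketched as follows. The weights are random toy values under a fixed seed; training them would use the backpropagation procedure mentioned above.

```python
# Sketch of a feed-forward pass through dense sigmoid layers with the
# 25-13-9-5-3 layout described in the text. Weights are random stand-ins.
import math
import random

def forward(vector, layer_sizes, rng):
    """Propagate a vector through randomly initialized dense sigmoid layers."""
    activations = vector
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [rng.uniform(-1, 1) for _ in range(fan_out)]
        # Each neuron: sigmoid(weighted sum of previous layer + bias).
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations))
                                    + b)))
            for row, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)                        # fixed seed for repeatability
features = [0.1] * 25                         # stand-in 25-element face vector
output = forward(features, [25, 13, 9, 5, 3], rng)
code = [1 if o >= 0.5 else 0 for o in output]  # 3-digit class code
print(len(output), code)
```

Thresholding the three sigmoid outputs at 0.5 yields the binary 3-digit sequence that the text maps to one of the seven emotion classes.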

Convolutional Neural System

In a Convolutional Neural Network, each layer is made of two operations: convolution and max-pooling. These two operations mimic the responses of the simple and complex cell layers found in visual area V1 by Hubel and Wiesel (1959). Jain suggested the Neocognitron was the first deep neuro-computational model based on simple and complex cells (105). In a convolutional neural network, the simple-cell abstraction is represented by the convolution operations that use local filters to compute higher-order features from input images. The complex-cell abstraction builds invariance by pooling simple-cell units from the equivalent receptive field in previous layers.

In each layer, Chen suggests that building the simple cells’ capacity is essential for extracting features, whereby a series of different filters is applied to the image (50). The operation creates multiple images, or feature maps, one for each filter, which run through all the network layers. Hence, the complex cells act on each of these maps, producing spatial invariance for each filter.
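The two per-layer operations described above can be sketched in a few lines: a valid 2-D convolution (the "simple cell" filtering) followed by non-overlapping 2 × 2 max-pooling (the "complex cell" invariance). The filter here is a toy horizontal-difference kernel chosen only for illustration.

```python
# Pure-Python sketch of one CNN layer: convolution then max-pooling.

def conv2d(image, kernel):
    """Valid convolution of a 2-D image with a small kernel (no flipping,
    i.e. cross-correlation, as is conventional in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max-pooling."""
    out = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out

image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 0, 1, 0],
         [1, 1, 0, 2]]
edge = [[1, -1]]                       # toy horizontal-difference filter
pooled = max_pool(conv2d(image, edge))
print(pooled)                          # → [[2], [2]]
```

A real layer would apply a bank of such filters, producing one pooled feature map per filter, exactly as the text describes.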

Figure 3: Convolutional Neural Structure

Facial Expressions Recognition Methods

Appearance-based strategies apply filters and operators over image pixels to acquire representative features of the face. The Local Binary Pattern is a technique that takes the value of the center pixel as a threshold for its neighborhood (Wang, Yilun, and Michal Kosinski, 246). Every neighboring pixel value is compared against this threshold; if it is greater than or equal to the center value, the outcome is 1, otherwise it is 0. This procedure has been applied to distinguish facial expressions, and the results were satisfactory.
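The thresholding just described can be computed directly for a 3 × 3 patch: the eight neighbors are compared against the center and read off as one 8-bit code. The clockwise reading order below is one common convention.

```python
# Sketch of the basic Local Binary Pattern code for one pixel: threshold
# the eight neighbors of a 3x3 patch against the center value (>= gives 1)
# and read the bits clockwise from the top-left corner.

def lbp_code(patch):
    """8-bit LBP code for the center pixel of a 3x3 patch."""
    center = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2],
                 patch[1][2], patch[2][2], patch[2][1],
                 patch[2][0], patch[1][0]]
    bits = ["1" if n >= center else "0" for n in neighbors]
    return int("".join(bits), 2)

patch = [[80, 120, 60],
         [110, 100, 90],
         [130, 140, 50]]
print(lbp_code(patch))  # → 71, i.e. bit pattern 01000111
```

Sliding this over every pixel and histogramming the resulting codes per face region yields the LBP descriptor used for expression classification.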

Local Phase Quantization performs blur-insensitive texture classification through the local Fourier transform computed over pixel neighborhoods (Mollahosseini, David, and Mohammad, 314). The procedure creates Local Phase Quantization codes, which are collected into a histogram. This descriptor is robust to image blurring. Several works have demonstrated that Local Phase Quantization can be utilized for expression recognition with the Facial Action Coding System (FACS). Thus, histograms can reach up to twenty-five features, indicating that Local Phase Quantization covers an extended region of the face.

A Gabor filter representation is the convolution of an input image with a set of Gabor filters at different scales and orientations (Zadeh, Milad, Maryam, and Babak, 120). Gabor filters encode componential information and, depending on the registration scheme, a filter may also convey configural information. This procedure can be utilized with simple dimensionality-reduction strategies such as mean, maximum, and minimum pooling. Therefore, the representation is robust to registration errors to a degree, as the filters are smooth, and the magnitude of the filtered images is robust to small rotations and translations.
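Building one kernel of such a filter bank can be sketched from the standard real-part Gabor formula. Parameter names follow the usual convention (wavelength lam, orientation theta, phase psi, Gaussian envelope sigma, aspect ratio gamma); the specific values below are illustrative, not taken from the cited work.

```python
# Sketch of one real Gabor kernel: a Gaussian envelope multiplied by a
# cosine carrier along the rotated x-axis. A filter bank varies theta and
# lam to get multiple orientations and scales.
import math

def gabor_kernel(size, lam, theta, psi=0.0, sigma=2.0, gamma=0.5):
    """size x size real Gabor kernel centered on the origin."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation.
            xp = x * math.cos(theta) + y * math.sin(theta)
            yp = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xp * xp + gamma * gamma * yp * yp)
                                / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xp / lam + psi))
        kernel.append(row)
    return kernel

k = gabor_kernel(size=5, lam=4.0, theta=0.0)
print(round(k[2][2], 3))  # → 1.0, the peak at the kernel center
```

Convolving the face image with each kernel in the bank, then pooling each response map (mean, maximum, or minimum), gives the reduced Gabor representation described above.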

Conclusion

A programmed Facial emotion recognition framework aims to recognize and analyze human emotions and facial expressions. Initially, the model inspects the facial image and measures before locating and changing unmistakable human facial deformations using proper features. The features extricated attempt to model the geometric qualities of the human face. There isa relatively smaller number of images for specific emotions, like disturbingthe FER2013 dataset, that upshots in the normal execution of the model in perceiving the nauseating emotion. Finally, for the order, a multilayer perceptron neural system is prepared and utilized.

The face analysis assumes frontal images of the human face; thus, a challenging extension will be to identify and analyze facial expressions from various poses. Moreover, the framework’s classification approach depends on a multilayer perceptron neural network. Utilizing a neuro-fuzzy methodology to broaden the classification technique could improve the framework’s accuracy in perceiving the emotional state. Hence, investigating this direction is a crucial part of future work.

Works Cited

Abbas, Asad, and Stephan K. Chalup. “Group emotion recognition in the wild by combining deep neural networks for facial expression classification and scene-context analysis.” Proceedings of the 19th ACM International Conference on Multimodal Interaction. 2017.

Arriaga, Octavio, Matias Valdenegro-Toro, and Paul Plöger. “Real-time convolutional neural networks for emotion and gender classification.” arXiv preprint arXiv:1710.07557 (2017).

Barros, Pablo, et al. “Multimodal emotional state recognition using sequence-dependent deep hierarchical features.” Neural Networks 72 (2015): 140-151.

Bartlett, Marian Stewart, et al. “Recognizing facial expression: machine learning and application to spontaneous behavior.” 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). Vol. 2. IEEE, 2005.

Bonn-Rhein-Sieg, Octavio Arriaga Hochschule, et al. “Real-time Convolutional Neural Networks for Emotion and Gender Classification.”

Chen, Luefeng, et al. “Softmax regression-based deep sparse autoencoder network for facial emotion recognition in human-robot interaction.” Information Sciences 428 (2018): 49-61.

Dehghan, Afshin, et al. “Dager: Deep age, gender and emotion recognition using convolutional neural network.” arXiv preprint arXiv:1702.04280 (2017).

Ferreira, Pedro M., et al. “Physiological inspired deep neural networks for emotion recognition.” IEEE Access 6 (2018): 53930-53943.

González-Hernández, Francisco, et al. “Recognition of learning-centered emotions using a convolutional neural network.” Journal of Intelligent & Fuzzy Systems 34.5 (2018): 3325-3336.

Guo, Yanan, et al. “Deep neural networks with relativity learning for facial expression recognition.” 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE, 2016.

Jain, Deepak Kumar, Pourya Shamsolmoali, and Paramjit Sehdev. “Extended deep neural network for facial emotion recognition.” Pattern Recognition Letters 120 (2019): 69-74.

Jain, Neha, et al. “Hybrid deep neural networks for facial emotion recognition.” Pattern Recognition Letters 115 (2018): 101-106.

Ko, Byoung Chul. “A brief review of facial emotion recognition based on visual information.” sensors 18.2 (2018): 401.

Lakomkin, Egor, et al. “On the robustness of speech emotion recognition for human-robot interaction with deep neural networks.” 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018.

Lee, MinSeop, et al. “Emotion Recognition Using Convolutional Neural Network with Selected Statistical Photoplethysmogram Features.” Applied Sciences 10.10 (2020): 3501.

Li, Junnan, and Edmund Y. Lam. “Facial expression recognition using deep neural networks.” 2015 IEEE International Conference on Imaging Systems and Techniques (IST). IEEE, 2015.

Li, Shan, and Weihong Deng. “Deep facial expression recognition: A survey.” IEEE Transactions on Affective Computing (2020).

Li, Yuqing. “Deep Learning of Human Emotion Recognition in Videos.” (2017).

Liu, Xuan, et al. “Deep convolutional neural networks-based age and gender classification with facial images.” 2017 First International Conference on Electronics Instrumentation & Information Systems (EIIS). IEEE, 2017.

Martinez, Aleix, and Shichuan Du. “A model of the perception of facial expressions of emotion by humans: Research overview and perspectives.” The Journal of Machine Learning Research 13 (2012): 1589-1608.

Mollahosseini, Ali, David Chan, and Mohammad H. Mahoor. “Going deeper into facial expression recognition using deep neural networks.” 2016 IEEE Winter conference on applications of computer vision (WACV). IEEE, 2016.

Pathar, Rohit, et al. “Human Emotion Recognition using Convolutional Neural Network in Real-Time.” 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT). IEEE, 2019.

Perikos, Isidoros, Epaminondas Ziakopoulos, and Ioannis Hatzilygeroudis. “Recognizing emotions from facial expressions using a neural network.” IFIP International Conference on Artificial Intelligence Applications and Innovations. Springer, Berlin, Heidelberg, 2014.

Perveen, Nazia, et al. “Facial Expression Recognition through Machine Learning.” International Journal of Scientific & Technology Research 5.03 (2016).

Reddy, Bhargava, et al. “Real-time driver drowsiness detection for an embedded system using model compression of deep neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.

Samsani, Surekha, and Vineel Abhinav Gottala. “A Real-Time Automatic Human Facial Expression Recognition System Using Deep Neural Networks.” Information and Communication Technology for Sustainable Development. Springer, Singapore, 2020. 431-441.

Santhoshkumar, R., and M. Kalaiselvi Geetha. “Deep Learning Approach for Emotion Recognition from Human Body Movements with Feedforward Deep Convolution Neural Networks.” Procedia Computer Science 152 (2019): 158-165.

Shan, Caifeng, Shaogang Gong, and Peter W. McOwan. “Beyond Facial Expressions: Learning Human Emotion from Body Gestures.” BMVC. 2007.

Sharma, J. Jayapradha Soumya, and Yash Dugar. “Detection and Recognition of Human Emotion using Neural Network.” International Journal of Applied Engineering Research 13.8 (2018): 6472-6477.

Sun, Yafei. “Neural Networks for Emotion Classification.” arXiv preprint arXiv:1105.6014 (2011).

Wang, Yilun, and Michal Kosinski. “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images.” Journal of personality and social psychology 114.2 (2018): 246.

Zadeh, Milad Mohammad Taghi, Maryam Imani, and Babak Majidi. “Fast facial emotion recognition using convolutional neural networks and Gabor filters.” 2019 5th Conference on Knowledge-Based Engineering and Innovation (KBEI). IEEE, 2019.

Appendices

Appendix 1: Sample System Output
