Deep Learning Based Mobilenet and Multi-Head Attention Model for Facial Expression Recognition
Facial expressions are an intuitive reflection of a person’s emotional state and one of the most important forms of interpersonal communication. Because human facial expressions are complex and variable, traditional methods based on handcrafted feature extraction have shown insufficient performance. To address this, we propose a new facial expression recognition system based on the MobileNet model, with skip connections added to prevent performance degradation in deeper architectures. In addition, a multi-head attention mechanism is applied to concentrate processing on the most relevant parts of the image. The experiments were conducted on the FER2013 database, which is imbalanced and includes some ambiguous images containing synthetic faces. We applied a face detection pre-processing step to discard invalid images, and we implemented both the SMOTE and Near-Miss algorithms to obtain a balanced dataset and prevent the model from being biased. The experimental results demonstrate the effectiveness of the proposed framework, which achieved a recognition rate of 96.02% when the multi-head attention mechanism was applied.
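The balancing step mentioned above can be illustrated with a minimal sketch in pure Python. It is not the implementation used in this work: the function names, the toy 2-D points, the neighbour count `k`, and the way the two methods are combined (undersampling the larger class with NearMiss version 1 and oversampling the smaller class with a simplified SMOTE to a common target size) are all assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic points, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def near_miss_1(majority, minority, n_keep, k=3):
    """NearMiss version 1: keep the n_keep majority points whose mean
    distance to their k nearest minority points is smallest."""
    def score(point):
        nearest = sorted(math.dist(point, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    return sorted(majority, key=score)[:n_keep]


# Toy data (hypothetical): 12 majority points and 4 minority points.
majority = [[float(i), float(i % 3)] for i in range(12)]
minority = [[0.0, 5.0], [1.0, 6.0], [2.0, 5.5], [1.5, 4.5]]

target = 8  # assumed common class size after balancing
kept = near_miss_1(majority, minority, n_keep=target)
minority_balanced = minority + smote(minority, n_new=target - len(minority))
print(len(kept), len(minority_balanced))
```

In practice the feature vectors would be flattened face images or learned embeddings rather than 2-D points, and a maintained library implementation would normally be preferred over a hand-written one.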