Research on Multi-Face Recognition Based on MTCNN Algorithm.pdf

Printing and Digital Media Technology Study, Tol. 229, No. 2, 2024.04
RESEARCH PAPERS

Research on Multi-Face Recognition Based on MTCNN Algorithm

YANG Wen-peng1, SI Zhan-jun1,2*
(1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China; 2. College of Light Industry Science and Engineering, Tianjin University of Science and Technology, Tianjin 300457, China)

Abstract With the rapid development of artificial intelligence in the field of computer vision, more and more classical artificial intelligence algorithms are being applied to multi-face recognition research. Among them, the MTCNN algorithm performs well in multi-face recognition, but its recognition accuracy still leaves considerable room for improvement. In this study, based on the classical MTCNN algorithm framework, its NMS sub-algorithm was evaluated and improved. The performance differences between the NMS algorithm and the improved NMS algorithm were compared and theoretically analyzed in each of the cascaded networks P-Net, R-Net, and O-Net. The improved algorithm was evaluated along several dimensions, combining subjective and objective assessment with horizontal and longitudinal comparison. The results showed that the model designed in this paper achieves a face recognition accuracy of 94.56% on the LFW dataset. It can provide a reference for multi-face recognition.

Key words Multi-face recognition; MTCNN algorithm; Algorithm optimization

CLC number: TP39  Document code: A  Article ID: 2097-2474(2024)02-116-07
DOI: 10.19370/10-1886/ts.2024.02.013
Received: 2023-06-25  Revised: 2023-08-31  * Corresponding author
Citation format: YANG Wen-peng, SI Zhan-jun. Research on Multi-Face Recognition Based on MTCNN Algorithm [J]. Printing and Digital Media Technology Study, 2024, (2): 116-122.

0 Introduction

Face recognition has been a direction of
research in both academia and industry. Multi-face recognition faces more challenges, such as overlap caused by densely packed faces, variable face size and resolution, and differing poses [1-3]. Rowley [4-5] proposed a solution that trains a neural network as a binary classification model to detect whether an image contains a face. Viola-Jones [6] proposed a component-based face recognition algorithm named DPM (Deformable Parts Model), which was outstanding at handling faces with complex poses. Influenced by this, subsequent works [7-8] focused on combining multiple models to obtain diverse features and improve face recognition performance. However, these face recognition algorithms all train classifiers with the help of a set of manually labeled features and rely on local feature extraction of the face, so they still cannot deal with multi-face recognition
in complex scenes.

In recent years, deep learning has been widely used in face recognition due to its superior performance. Yan [9] proposed AlexNet, a network with five convolutional layers and three fully connected layers, which achieved good recognition results in related competitions. Krizhevsky [10-11], based on
the Viola-Jones method, proposed the idea of cascading to train CNNs, which gave the overall model strong discriminative ability and recognition performance. Subsequently, the more efficient R-CNN [12] series of target recognition networks was proposed, including Fast R-CNN [13] and Faster R-CNN [6], which further improved target recognition performance by extracting the feature vectors of candidate regions with a CNN. With the public release of the WIDER FACE dataset, a large number of multi-face scenarios appeared, on which conventional target recognition networks often fail to achieve good recognition results. Therefore, more networks optimized for face recognition have been proposed, focusing on the small faces and multi-angle faces that multi-face scenarios bring. Girshick R et al [14] proposed Densebox, which uses a fully convolutional network to predict the target position coordinates and the target category simultaneously, and employs a multi-scale fusion strategy to recognize small faces well.

MTCNN (Multi-task Convolutional Neural Network) is a face recognition algorithm that utilizes a multi-task convolutional neural network. It uses a cascade approach with image pyramids to detect faces of varying sizes, and incorporates three sub-networks to form a deep convolutional network that predicts faces and the locations of their features from coarse to fine. The MTCNN algorithm is a popular face detection and alignment algorithm known for its high accuracy and efficiency. However, MTCNN performance may degrade when dealing with low-resolution or noisy images, as well as under challenging lighting conditions or occlusions. The algorithm may struggle to accurately detect faces in
such scenarios. Also, MTCNN may produce false positive detections, where non-face regions are incorrectly identified as faces. This can impact the overall accuracy of the algorithm. In this study, aiming at the shortcomings of MTCNN, some algorithms within MTCNN were optimized for overall enhancement, and
the rationality and effectiveness of the optimization were demonstrated through experiments. The algorithm of this study can provide a reference for multi-face recognition.

1 Research Method

1.1 MTCNN Algorithm

MTCNN is a multi-task neural network model for face recognition tasks proposed by the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences in 2016. It mainly employs three cascaded networks for fast and efficient face recognition, using the idea of a candidate box plus a classifier. Its sub-algorithm is NMS (Non-Maximum Suppression). The three cascaded networks are P-Net for fast candidate window generation, R-Net for high-precision candidate window filtering, and O-Net for generating the final bounding box together with the key points of the face. The model also uses techniques such as image pyramids, bounding-box regression, and non-maximum suppression. The technology roadmap of this study is shown in Fig. 1.

Fig. 1 Technology flow chart (blocks: photos with faces; normalization; image pyramid; Improved-MTCNN, with P-Net, R-Net, and O-Net each paired with Soft-NMS; photos with faces and locations)

P-Net stands for Proposal Network, and its basic structure is a fully convolutional network. For the image pyramid constructed in the previous step (all images in MTCNN should first be normalized and expanded into an image pyramid), the FCN (Fully Convolutional Network) is used for preliminary feature extraction and border calibration, bounding-box regression is used to adjust the windows, and NMS is used to filter out most of the windows. P-Net is a region proposal network for face regions, which uses a face classifier to determine whether the region is
a face after the input has passed through three convolutional layers of feature extraction, while using bounding-box regression and a facial key point locator to make initial proposals of face regions. This stage eventually outputs many candidate regions that may contain faces and feeds them into R-Net for further processing. The loss function $L_i^{det}$ of P-Net is shown in Formula (1).

$$L_i^{det} = -\left( y_i^{det} \log(p_i) + (1 - y_i^{det}) \log(1 - p_i) \right) \tag{1}$$

where $p_i$ is the probability, predicted by the network, that the region is a face, and $y_i^{det}$ is the true label of the region.

R-Net stands for Refine Network. Its basic structure is a convolutional neural network; compared with the first-stage P-Net, a fully connected layer is added, so the screening of the input data is stricter. After the image has passed through P-Net, all the prediction windows are fed into R-Net, which filters out a large number of poor candidate boxes and finally applies bounding-box regression and NMS to the selected candidate boxes to further optimize the prediction results. Because the output of P-Net is only a set of possible face regions with some confidence, R-Net refines the selection of its inputs and discards most of the wrong ones, applies bounding-box regression and the facial key point locator to the face regions again, and finally outputs the more credible face regions for use by O-Net. Compared with the 1×1×32 features output by the fully convolutional P-Net, R-Net uses a 128-unit fully connected layer after the last convolutional layer, which retains more image features, and its accuracy is also better than that of P-Net. The loss function $L_i^{box}$ of R-Net is shown in Formula (2).

$$L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2 \tag{2}$$

where $\hat{y}_i^{box}$ is the border coordinates obtained by network prediction, and $y_i^{box}$ is the actual border coordinates (a 4-tuple (X_left, Y_left, Width, Height) representing a rectangular region).

O-Net stands for Output Network. The basic
structure is a more complex convolutional neural network with one more convolutional layer than R-Net. The difference between O-Net and R-Net is that this stage identifies the face region under more supervision and regresses the facial feature points, finally outputting five facial feature points. O-Net has more input features, and the end of the network is a larger 256-unit fully connected layer, which retains more image features; at the same time it performs face classification, face region border regression, and facial feature localization. Finally, the upper-left and lower-right coordinates of the face region and the five facial feature points are output. With more input features and a more complex network structure, O-Net also has better performance. The output of this stage is used as the final output of the network model. The loss function of O-Net is shown in Formula (3).

$$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2 \tag{3}$$

where $\hat{y}_i^{landmark}$ is the prediction result, and $y_i^{landmark}$ is the actual key point location. Since a total of 5 facial key points need to be predicted, with 2 coordinate values for each point, $y_i^{landmark}$ is a 10-tuple.

1.2 Improved MTCNN Algorithm

In the MTCNN algorithm, the NMS (Non-Maximum Suppression) sub-algorithm is used to address overlapping bounding boxes in tasks such as object detection and bounding-box regression.
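
As a brief illustration of what "overlapping" means here: overlap between two boxes is commonly quantified by intersection over union (IoU), the quantity that NMS compares against its overlap threshold. The block below states the usual definition and works one small example. The boxes A and B, and the (x1, y1, x2, y2) corner layout, are assumptions chosen for illustration and are not taken from the paper.

```latex
\[
\mathrm{iou}(A, B) = \frac{|A \cap B|}{|A \cup B|}
\]
% Worked example (boxes chosen for illustration only):
% A = (0, 0, 2, 2), B = (1, 1, 3, 3), given as (x1, y1, x2, y2)
\[
|A \cap B| = 1 \times 1 = 1, \qquad
|A \cup B| = 4 + 4 - 1 = 7, \qquad
\mathrm{iou}(A, B) = \tfrac{1}{7} \approx 0.143
\]
```

Two boxes that barely touch therefore score close to 0, while two near-duplicate detections of the same face score close to 1.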

Specifically, in object detection tasks, the model generates multiple candidate boxes to represent regions where objects may exist. However, these candidate boxes may overlap, leading to problems such as duplicate detections or multiple detections of the same object. The role of NMS is to select the most
representative candidate box from all candidates and filter out the other candidate boxes that overlap heavily with it. This ensures that the final detection results are accurate, reduces redundant detections, and improves detection efficiency and precision. In MTCNN, NMS is typically applied to the candidate boxes output by the final layer of the network. The candidate boxes generated by the model come with their respective confidence scores, and NMS filters the candidate boxes based on these scores, selecting the most suitable bounding boxes as the final detection results while eliminating redundant ones. The role of the traditional NMS algorithm is shown in Fig. 2.

Fig. 2 Role of traditional NMS algorithms (labels: Original, NMS; box scores 0.8, 0.75, 0.9, 0.9)

The NMS algorithm starts from a set of recognition frames B generated in the detected image and the corresponding scores S. The recognition frame M with the largest score is removed from set B and placed in the final result set D. At the same time, any recognition frames in set B whose overlap with M is greater than the overlap threshold Nt are also removed. This process is repeated until B is empty. The biggest problem with the NMS algorithm is that it forces the scores of all neighboring recognition frames to zero. In this case, if a real object is present in the overlapping region, it will not be detected, which reduces the average recognition rate of the algorithm. To address this issue, the Soft NMS algorithm was introduced into the MTCNN network in this study. Although it is also a greedy algorithm, it handles candidate boxes more gently than NMS. The pseudo-code of the NMS algorithm and the Soft NMS algorithm is shown in Fig. 3.
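
The greedy procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. The names `iou`, `weight`, and `suppress`, the (x1, y1, x2, y2) box layout, and the default values of `nt`, `sigma`, and `min_score` are all assumptions made for this sketch. The linear and Gaussian decay rules follow the standard Soft-NMS formulation and are assumed to correspond to the linear and Gaussian weightings the paper refers to.

```python
import math


def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form (layout assumed for this sketch)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weight(overlap, nt, method, sigma):
    """Score multiplier f(iou) applied to one neighbor of the selected box M."""
    if method == "hard":      # traditional NMS: suppress outright
        return 0.0 if overlap >= nt else 1.0
    if method == "linear":    # Soft NMS, linear decay above the threshold
        return 1.0 - overlap if overlap >= nt else 1.0
    if method == "gaussian":  # Soft NMS, Gaussian decay for every neighbor
        return math.exp(-(overlap ** 2) / sigma)
    raise ValueError(f"unknown method: {method}")


def suppress(boxes, scores, nt=0.3, method="hard", sigma=0.5, min_score=0.001):
    """Greedy NMS / Soft NMS. Returns (kept_boxes, kept_scores).

    The defaults for nt, sigma and min_score are illustrative, not the paper's.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Move the highest-scoring box M from B into the result set D.
        m = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(m)
        kept.append((best_box, best_score))
        # Rescale the remaining scores by their overlap with M, then drop
        # boxes whose score has fallen below min_score.
        rescored = []
        for box, score in remaining:
            new_score = score * weight(iou(best_box, box), nt, method, sigma)
            if new_score >= min_score:
                rescored.append((box, new_score))
        remaining = rescored
    return [b for b, _ in kept], [s for _, s in kept]
```

With two heavily overlapping boxes and one distant box, `method="hard"` keeps only two boxes, whereas `method="linear"` or `method="gaussian"` keeps all three and merely lowers the score of the overlapped one. This is the behavior that lets a genuinely separate face in a dense scene survive suppression.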

Input: B = {b1, ..., bN}, S = {s1, ..., sN}, Nt
    B is the list of initial detection boxes
    S contains corresponding detection scores
    Nt is the NMS threshold
begin
    D ← {}
    while B ≠ empty do
        m ← argmax S
        M ← bm
        D ← D ∪ M; B ← B − M
        for bi in B do
            if iou(M, bi) ≥ Nt then          (NMS)
                B ← B − bi; S ← S − si
            end
            si ← si f(iou(M, bi))            (Soft-NMS)
        end
    end
    return D, S
end

Fig. 3 Pseudo-code of NMS algorithm and Soft NMS algorithm (the lines marked NMS are replaced by the line marked Soft-NMS)

The NMS algorithm differs from Soft NMS both in how it treats prediction frames and in how it treats confidence scores. NMS takes the box with the largest score, computes its IoU with the other boxes of the same category in the current region, deletes a box if its IoU exceeds the threshold, and otherwise retains it. Soft NMS does not simply delete or retain boxes by comparing IoU with the threshold; instead, it filters and retains boxes according to their confidence scores. For the confidence, NMS directly sets the score to zero for boxes whose IoU exceeds the threshold and otherwise leaves it unchanged. Soft NMS rescales the scores with a weight, then picks an appropriate score threshold, keeps the predicted boxes whose scores exceed it, and deletes those below it. The score weighting of NMS is shown in Formula (4), and the linear weighting and Gaussian weighting of Soft NMS are shown in Formula (5) and Formula (6).

$$s_i = \begin{cases} s_i, & \mathrm{iou}(M, b_i) < N_t \\ 0, & \mathrm{iou}(M, b_i) \ge N_t \end{cases} \tag{4}$$

$$s_i = \begin{cases} s_i, & \mathrm{iou}(M, b_i) < N_t \\ s_i \left(1 - \mathrm{iou}(M, b_i)\right), & \mathrm{iou}(M, b_i) \ge N_t \end{cases} \tag{5}$$

$$s_i = s_i \, e^{-\mathrm{iou}(M, b_i)^2 / \sigma}, \quad \forall b_i \notin D \tag{6}$$

where $s_i$ is the confidence score of the box being processed, $M$ is the current highest-scoring box, $b_i$ is the box being processed, $\mathrm{iou}(M, b_i)$ is the intersection over union of $M$ and $b_i$, $N_t$ is the overlap threshold, and $\sigma$ is the parameter of the Gaussian weighting.

2 Results and Discussion

2.1 Dataset

WIDER FACE was selected in this study as the face recognition dataset. WIDER FACE [15] is one of the most commonly used open-source face benchmark datasets in face recognition research and involves 61 event categories. For each event category, training data accounts for 40%, validation data accounts for 10%, and test data accounts f
