Article ID: 1002-2082(2024)01-0079-10

Real-time detection algorithm for low-slow-small UAVs based on improved YOLOv4
WU Xuan, ZHANG Haiyang, ZHAO Changming, LI Zhipeng, WANG Yuanze
(School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China)

Abstract: To address the low detection accuracy and poor real-time performance of embedded deployment in low-slow-small UAV detection tasks, a small-UAV target detection algorithm based on an improved YOLOv4 is proposed. Adding a shallow feature map, refining the anchor boxes and augmenting small targets improve the network's detection performance on small targets, while sparse training and model pruning greatly shorten the model's running time. The model reaches a mean average precision (mAP) of 85.8% and a frame rate of 75 frame/s on a 1080Ti, realizing a lightweight network. Deployed on a Xavier edge-computing platform, it detects UAV targets at 60 frame/s. Experimental results show that, compared with YOLOv4 and YOLOv4-tiny, the algorithm balances running speed and detection accuracy and can effectively solve the problem of UAV target detection on embedded platforms.
Keywords: low-slow-small UAV; target detection; YOLOv4; pruning; embedded
CLC number: TN201    Document code: A    DOI: 10.5768/JAO202445.0102002

Improved YOLOv4 for real-time detection algorithm of low-slow-small unmanned aerial vehicles
WU Xuan, ZHANG Haiyang, ZHAO Changming, LI Zhipeng, WANG Yuanze
(School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China)

Abstract: In order to solve the low accuracy of low-slow-small unmanned aerial vehicle (UAV) detection missions and the poor real-time performance when deployed on an embedded platform, a small-UAV target detection algorithm based on
improved YOLOv4 was proposed. By adding a shallow feature map, improving the anchors and enhancing small targets, the detection performance of the network for small targets was improved; through sparse training and model pruning, the model running time was greatly reduced. The mean average precision (mAP) reaches 85.8% on the 1080Ti and the frame rate (FPS) reaches 75 frame/s, which achieves a lightweight network. This lightweight model was deployed on the Xavier edge computing platform, where it achieves a UAV target detection speed of 60 frame/s. Experimental results show that, compared with YOLOv4 and YOLOv4-tiny, this algorithm achieves a balance between running speed and detection accuracy, and can effectively solve the problem of UAV target detection on embedded platforms.
Key words: low-slow-small unmanned aerial vehicles; target detection; YOLOv4; pruning; embedded

Received: 2022-12-30; Revised: 2023-03-06
Foundation item: Three-dimensional perception and reconstruction technology for winter-sport scenes (2018YFF0300802)
First author: WU Xuan (1999-), female, master's student, engaged in research on target detection and recognition.
Corresponding author: ZHANG Haiyang (1981-), male, Ph.D., associate professor, engaged in research on photoelectric imaging and intelligent target perception.
Journal of Applied Optics (应用光学), Vol. 45, No. 1, Jan. 2024

Introduction
With the rapid development of aviation technology and the upgrading of communication technology, unmanned aerial vehicles (UAVs) have been widely used in fire fighting [1], agricultural monitoring [2] and other fields. UAVs have a low flying altitude, uncertain flight trajectories and high flexibility [3], which pose a threat to public security and privacy when they are used by criminals. It is necessary to take countermeasures against UAVs, and UAV target detection is the key to interfering with and striking them.
Common UAV target detection methods include classical moving-target detection based on the optical flow method and the frame difference method [4]. Since the AlexNet network was proposed, deep learning has gradually been applied to object detection [5]. Although the detection accuracy of two-stage algorithms such as R-CNN (region with convolutional neural network features) [6], Fast
R-CNN [7], Faster R-CNN [8] and SPP-Net (spatial pyramid pooling network) [9] is significantly improved compared with traditional algorithms, they cannot meet the real-time requirements of engineering. Among one-stage algorithms, the SSD (single shot multibox detector) algorithm [10] adopts multi-scale feature maps combined with an anchor mechanism to improve the detection accuracy as much as possible while ensuring speed. For small-target detection, WANG Ruoxiao et al. reduced the channels of VGG16 to meet the real-time detection of UAVs on an embedded platform [11]. LIN T Y et al. proposed RetinaNet, which uses focal loss to overcome the class imbalance caused by the high foreground-to-background ratio [12]. RAZA M A et al. proposed BV-RNet, which can effectively detect small-scale targets by extracting dense features and optimizing predefined anchor points [13]. SUN Han et al. proposed a lightweight detection network for UAVs: TIB-Net [14]. In view of the lack of texture and shape features of infrared UAVs, DING Lianghui et al. enhanced the high-resolution network layer and adopted an adaptive pipeline filter (APF) based on temporal correlation and motion information to correct the results [15]. FANG H et al. transformed infrared small-UAV target detection into a nonlinear mapping from infrared image space to residual image and obtained better detection performance in complex backgrounds [16].
The YOLO (you only look once) algorithm uses whole-process convolution for target discrimination and candidate-box prediction [17], which gives it high detection accuracy and fast detection speed. HU Y et al. used feature maps of 4 scales to predict bounding boxes in YOLOv3 to obtain more texture and contour information, increasing the mAP by about 4.16% [18]. LI Zhipeng et al. used a super-resolution algorithm to reconstruct high-resolution UAV images and used YOLOv3 to realize effective detection of low-slow-small UAVs [19].
The lack of semantic information in small-UAV target imaging reduces detection accuracy, and the limited memory and computing power of embedded platforms cannot meet the real-time requirements of UAV detection tasks, so high-precision real-time target detection algorithms for small UAVs are lacking. Aiming at the above problems, this paper improves the mAP (mean average precision) by 6.2% and the FPS (frames per second) by 22 frame/s on the basis of YOLOv4 through model improvement and pruning, and achieves 85.6% mAP and nearly 60 frame/s detection performance with half-precision deployment on the embedded platform. Experiments have verified the effectiveness of this method for high-precision real-time detection of low-slow-small UAV targets.

1 Algorithm design for low-slow-small UAV target detection
The YOLOv4 algorithm was proposed in 2020. Compared with YOLOv3, it has been optimized in the backbone network, multi-scale fusion, activation function, loss function and other aspects [20]; its structure is shown in Figure 1. The backbone network borrows the skip-connection idea of CSPNet [21] and forms CSPDarkNet53 on the basis of DarkNet53 (as shown in the residual part of Fig. 1), which enhances the network's feature extraction ability and speeds up network training. The neck uses the SPP structure (see the SPP structure diagram in Fig. 1) to enlarge the receptive field, and then PANet is used to achieve the fusion of
feature maps of different scales and sizes. Through repeated feature extraction, the feature extraction capability of the network for objects of different sizes is effectively enhanced. In the position loss function, CIoU (complete intersection over union) is used to comprehensively evaluate the overlap area, the aspect ratio, the distance between the center positions and other factors relating the ground-truth box and the predicted box. The Mish activation function is used to avoid gradient saturation.

Fig. 1 Network structure diagram of YOLOv4 (416×416 to 608×608 input; backbone with CBM/CBL and CSP residual blocks, SPP module and PANet-style feature fusion; detection outputs at 76×76, 38×38 and 19×19)

Since YOLOv4 performs well in the field of traditional target detection and already makes some optimizations for small-target detection, this paper improves the YOLOv4 algorithm according to the characteristics of low-slow-small UAV targets.

1.1 Improvements to YOLOv4
There are still some problems in the YOLOv4 algorithm for the detection of low-slow-small UAV targets: the feature maps extracted by YOLOv4 contain few small-target features; the deep feature extraction network makes UAV features easy to lose; and the generalization ability of the anchors adopted by YOLOv4 is weak for small targets [22]. This paper improves YOLOv4 in terms of network structure, small-target enhancement and candidate-box adjustment.

1.1.1 Network structure improvement
As shown in Figure 2, this work improves the feature fusion part of YOLOv4 by up-sampling the shallow feature map and splicing it with the shallow UAV feature image, adding an output branch with a scale of 104×104 pixels.

Fig. 2 Comparison of YOLOv4 network structure before and after improvement: (a) network structure of YOLOv4; (b) network structure of improved YOLOv4

Figure 3 shows the feature maps output from the neck and head of the improved YOLOv4. More details of the UAV are obtained in the newly added scale, which is conducive to the improvement of UAV detection accuracy. The improved network makes full use of low-level and high-level information, and achieves the detection
of small object scale through the new detection layer.

Fig. 3 Feature maps of improved YOLOv4: (a) partial feature maps output from the neck module; (b) four-scale feature maps output from the head module

1.1.2 Adjustment of anchor boxes
YOLOv4 adopts k-means clustering, where k is the number of clusters; the higher the value of k, the better the quality of the preset anchor boxes, which is conducive to the convergence of the model during training [23]. YOLOv4 allocates 3 anchor boxes to each scale, giving 9 anchor boxes in total. k-means randomly selects the k initial cluster centers, which can greatly affect the results when they are not initialized properly. The improved YOLOv4 adopts k-means++ to cluster the UAV samples: k-means++ randomly selects one cluster center and calculates the distances from it to the other samples; a sample with a larger distance is more likely to become the next cluster center, and this repeats until k cluster centers are obtained. The Euclidean distance is used to measure the distance between a sample and a cluster center, and the objective function of the clustering is expressed as follows:

f = min Σ_{i=1}^{k} Σ_{x∈k_i} dist(c_i, x)²  (1)

where k is the number of clusters, k_i is the ith cluster, and dist(c_i, x)² is the squared distance from sample x to the ith cluster center c_i. For the improved YOLOv4, the image is resized to 416×416 pixels and 3 anchor boxes are assigned to the feature maps of each scale, resulting in a total of 12 anchor boxes. The clustering process for the anchor boxes is shown in Fig. 4, and the clustering results are shown in Table 1. k-means++ makes the clustered anchor boxes pay more attention to small targets, and the clustering result is more consistent with the real labels.
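The k-means++ seeding and clustering loop described above can be sketched in plain Python. The squared-Euclidean objective follows Formula (1); the function names and the toy boxes in the usage note are illustrative, not from the paper.

```python
import random

def kmeans_pp_init(boxes, k, rng):
    """k-means++ seeding: a box far from all chosen centers is more
    likely to be picked as the next center (distance-weighted draw)."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # squared Euclidean distance to the nearest already-chosen center
        d2 = [min((w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers)
              for w, h in boxes]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for box, d in zip(boxes, d2):
            acc += d
            if acc >= r:
                centers.append(box)
                break
    return centers

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor boxes, minimizing the
    summed squared distance to the cluster centers as in Formula (1)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(boxes, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:  # assign each sample to its nearest center
            j = min(range(k),
                    key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2)
            clusters[j].append((w, h))
        # recompute each center as the mean of its cluster
        new = [(sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
               if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:  # converged
            break
        centers = new
    return sorted(centers)
```

For the improved network the paper clusters with k = 12 and assigns three of the resulting anchors to each of the four detection scales, as listed in Table 1.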
Fig. 4 Process of obtaining anchor boxes by k-means++ clustering (width/pixel vs. height/pixel): (a) cluster=1; (b) cluster=5; (c) cluster=9; (d) cluster=12

Table 1 Clustering results of different clustering methods on the training set (anchor width, height)/pixel

Algorithm                    | Layer 1                       | Layer 2                       | Layer 3                 | Layer 4
YOLOv4                       | (282,242) (203,160) (110,95)  | (73,58) (38,32) (21,16)       | (15,24) (11,15) (8,11)  | -
Improved YOLOv4 (k-means)    | (331,301) (292,238) (252,186) | (227,250) (195,159) (144,118) | (99,86) (74,57) (40,36) | (20,24) (12,17) (8,11)
Improved YOLOv4 (k-means++)  | (298,269) (220,190) (178,135) | (130,108) (103,60) (74,85)    | (55,48) (35,19) (26,31) | (18,20) (12,17) (8,11)

1.1.3 Data augmentation for small UAV targets
The mosaic data augmentation used in YOLOv4 randomly scales the target, possibly resulting in serious loss of drone target information. This paper adopts the method of copying multiple UAVs into one image to increase the number of UAVs (as shown in Figure 5), so that the model pays more attention to small UAVs, which improves the contribution of small UAVs to the loss function [24].

Fig. 5 UAV data augmentation

1.2 Model pruning of the improved YOLOv4 algorithm
Network pruning reduces the network parameters and computational complexity by removing a large number of unimportant channels to improve inference speed; its general process includes sparse training, network pruning and model fine-tuning [25].

1.2.1 Sparse training
The scale factor γ of the batch normalization (BN) layer is used as the index to evaluate the importance of each channel, and L1 regularization is applied during training. The loss function is expressed as:

L(γ) = l(γ)_YOLOv4 + λ Σ_{γ∈Γ} |γ|  (2)

where L(γ) is the total loss function, l(γ)_YOLOv4 is the loss function of YOLOv4, λ Σ_{γ∈Γ} |γ| is the penalty term, and λ is the penalty factor of the L1 norm.

1.2.2 Network pruning
Channel pruning is carried out according to the sparsely trained γ values; a channel with a small γ value contributes little to the network inference results. The γ values are sorted and a pruning rate is set to remove the unimportant channels of the network. The channel pruning of the shortcut structure follows the practice of SlimYOLOv3 [26], as shown in Figure 6.
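The channel-selection step can be sketched as follows. This is a minimal, framework-free illustration of the network-slimming idea the paper follows (a global threshold derived from a pruning rate over the sorted BN scale factors γ, and a union of retained channels across shortcut-connected layers); the function names are illustrative.

```python
def prune_mask(gammas, prune_rate):
    """Keep the channels whose BN scale factor |gamma| survives a global
    threshold chosen so that about prune_rate of the channels are cut."""
    scores = sorted(abs(g) for g in gammas)
    n_prune = int(prune_rate * len(scores))
    if n_prune <= 0:
        return [True] * len(gammas)
    if n_prune >= len(scores):
        return [False] * len(gammas)
    threshold = scores[n_prune]  # smallest surviving |gamma|
    # ties at the threshold are kept, so the realized rate can be lower
    return [abs(g) >= threshold for g in gammas]

def merge_shortcut_masks(masks):
    """Layers joined by a shortcut must share one channel set:
    retain the union of the channels each connected layer keeps."""
    return [any(keep) for keep in zip(*masks)]
```

With the example from the text, layers retaining channels {1, 2}, {2, 3} and {2, 4} are merged so that all shortcut-connected layers retain channels 1 through 4.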
Assuming that layer A retains channels 1 and 2, layer C retains channels 2 and 3, and layer F retains channels 2 and 4, then layers A, C, D, F and G all retain channels 1, 2, 3 and 4. Layer pruning is based on the γ values of the convolution module before the shortcut layer; the two convolution modules before the shortcut layer are pruned together with it. As shown in the red box of Fig. 6, when layer D is cut, layer B and layer C are also cut.

Fig. 6 Structure diagram of the shortcut layer (layers A to G)

2 Experiments
2.1 Experimental setting
A large number of UAV images (with a size of 1920×1080 pixels) collected by camera were combined with the UAV Dataset (Drone Dataset, Drone-data-2021) to form an experimental dataset containing 20 000 UAV images, of which 80% were used as the training set and the rest as the testing set.
The comparison experiments on model improvement and pruning were carried out on a Windows 10 operating system equipped with an i7-7700 processor and an NVIDIA GeForce GTX 1080Ti. The network was implemented with PyTorch 1.6-GPU. The input image was resized to 416×416 pixels, the batch size was set to 8, the initial learning rate was set to 0.002324, and the Adam optimization strategy was used. The network was trained with a fine-tuning approach to reduce training time: first on the COCO dataset and then on the UAV training set. Finally, the embedded computing performance was verified on a Jetson AGX Xavier (16 GB).

2.2 Evaluation index
In object detection, mAP and FPS are commonly used for evaluation, where FPS represents the model inference speed and mAP is calculated from the confusion matrix (see Table 2).

Table 2 Confusion matrix

Annotated \ Predicted | True | False
True                  | TP   | FN
False                 | FP   | TN

The average precision (AP) is the area enclosed by the PR curve plotted with precision (P) and recall (R). See Formula (3) and Formula (4) for the calculation of precision and recall; AP is calculated by Formula (5):

P = TP/(TP+FP)  (3)
R = TP/(FN+TP)  (4)
AP = Σ_{i=1}^{n} P(i)·ΔR(i)  (5)
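Formulas (3) through (5) translate directly into a small computation. The sketch below assumes TP/FP/FN counts (or precision-recall points swept over confidence thresholds) are already available; the function names are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Formulas (3) and (4): P = TP/(TP+FP), R = TP/(FN+TP)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (fn + tp) if (fn + tp) else 0.0
    return p, r

def average_precision(pr_points):
    """Formula (5): AP as the area enclosed by the PR curve, summed
    as P(i) * delta-R(i) over points sorted by increasing recall."""
    ap, prev_r = 0.0, 0.0
    for p, r in sorted(pr_points, key=lambda pr: pr[1]):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

mAP (Formula (6)) is then the mean of the per-class AP values; with a single UAV class, mAP equals AP.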
mAP is the average accuracy over all categories, which can be calculated by Formula (6):

mAP = (Σ_c AP_c)/C  (6)

where C is the number of categories.

2.3 Performance comparison before and after algorithm improvement
The IoU (intersection over union) threshold is set to 0.5 to test the algorithm before and after improvement. Fig. 7 shows the loss curves: the improved YOLOv4 has a better convergence effect on the UAV dataset, and the loss is reduced to below 0.6 after training. The PR curve plotted against recall and precision is shown in Fig. 8, where a curve closer to the top-right corner indicates better detection performance. The PR curve of the improved YOLOv4 completely envelops the curve of the original YOLOv4, proving its stronger detection ability. Fig. 9 shows the detection results for low-slow-small UAVs. Compared with the original YOLOv4, the improved YOLOv4 adds a small-UAV target prediction branch and adjusts the candidate boxes, which reduces missed detections and false detections and improves the prediction accuracy of the size and position of the bounding box.

Fig. 7 Comparison of loss curves during training of YOLOv4 and improved YOLOv4 (train loss vs. epoch): (a) loss curve of YOLOv4; (b) loss curve of improved YOLOv4

The comparison results of mAP and FPS of different alg