Robustness benchmark for unsupervised anomaly detection models

Pei Wang¹, Wei Zhai¹, and Yang Cao¹,²
¹Department of Automation, University of Science and Technology of China, Hefei 230027, China; ²Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China
Correspondence: Yang Cao, E-mail:
© 2024 The Author(s). This is an open access article under the CC BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Cite This: JUSTC, 2024, 54(1): 0103 (10pp)

Abstract: Due to the complexity and diversity of production environments, it is essential to understand the robustness of unsupervised anomaly detection models to common corruptions. To explore this issue systematically, we propose a dataset named MVTec-C to evaluate the robustness of unsupervised anomaly detection models. Based on this dataset, we explore the robustness of approaches in five paradigms, namely, reconstruction-based, representation similarity-based, normalizing flow-based, self-supervised representation learning-based, and knowledge distillation-based paradigms. Furthermore, we explore the impact of different modules within the two best-performing methods on robustness and accuracy. These modules include the multi-scale features, the neighborhood size, and the sampling ratio in the PatchCore method, as well as the
multi-scale features, the MMF module, the OCE module, and the multi-scale distillation in the Reverse Distillation method. Finally, we propose a feature alignment module (FAM) to reduce the feature drift caused by corruptions and combine PatchCore with the FAM to obtain a model with both high robustness and high accuracy. We hope this work will serve as an evaluation method and provide guidance for building robust anomaly detection models in the future.

Keywords: robustness benchmark; anomaly detection; unsupervised learning; automated optical inspection

CLC number: TP391 Document code: A

1 Introduction

With increasing demands on product quality, it is important to detect surface defects in products during production. In scenarios such as industrial defect detection, there is a high demand for unsupervised anomaly detection models due to the lack of defect samples. Additionally, since the models must handle image degradation at test time, their robustness needs to be evaluated. Many unsupervised anomaly detection methods have been proposed; these methods can be divided into five categories: reconstruction-based[1–3], representation similarity-based[4–6], normalizing flow-based[7, 8], self-supervised representation learning-based[9], and knowledge distillation-based[10–12]. Existing methods achieve high accuracy on MVTec[13], which is a relatively simple dataset. However, the robustness of these models is unknown. Because of the temporal and spatial differences between the training and deployment phases, there is no guarantee that the images processed in actual production will have the same quality as the training images, and unpredictable degradation of image quality may occur. For example, devices that run for long periods may wear out, causing the position of the camera relative to the product to deviate from the preset value, which can lead to defocus blur and geometric shifts in the images. Therefore, a means of evaluating model robustness is urgently needed to ensure production safety.

Past studies on the robustness of CNN models have covered image classification[14], object detection[15], semantic segmentation[16], human pose estimation[17], and other domains. Since defect samples are unavailable to unsupervised anomaly detection models during training, dedicated research on the robustness of these models is necessary.

To address these needs, we propose a framework that includes a robustness evaluation dataset, evaluation metrics, and a method to improve model robustness. It is worth noting that the dataset and evaluation metrics are also applicable to evaluating the robustness of supervised models. Additionally, based on this framework, we evaluate and analyze the robustness of mainstream unsupervised defect detection models and draw conclusions about the factors affecting robustness.

Inspired by Hendrycks et al.[14], we propose a dataset named MVTec-C with different types of corruptions to investigate
the robustness of unsupervised anomaly detection methods systematically (see Fig. 1). We select corruption types that cover as many of the uncertainties in the production scenario as possible, including Gaussian noise, Poisson noise, motion blur, defocus blur, light, contrast, JPEG compression, and geometry, with each corruption corresponding to five severity levels. We take robustness to mean how well the performance on the original data is maintained on corrupted data; accordingly, we define the relative corruption performance to measure the robustness of a model.

Received: January 01, 2023; Accepted: March 22, 2023. DOI: 10.52396/JUSTC-2022-0165

To understand the robustness of existing unsupervised anomaly detection models, we select one or two representative methods for evaluation from each of the five classes of anomaly detection methods. We find that the representation similarity-based and knowledge distillation-based methods achieve the
best balance between performance and robustness, whereas the reconstruction-based method shows good robustness but the worst performance. Since PatchCore and Reverse Distillation both achieve high performance and high robustness, studying their robustness is valuable for practical applications. Therefore, we investigate the effects of
different factors on the robustness of these two methods through exhaustive ablation experiments. We draw several conclusions based on the experimental results:
(i) High-level features can help construct robust feature representations.
(ii) Multi-scale features are beneficial for improving both performance and robustness.
(iii) Neighborhood aggregation cannot improve the generalization capability of features.
(iv) The memory bank maintains good robustness even with a small sampling ratio, so it is suitable for modeling the normal sample distribution.
(v) Information bottlenecks make the features more compact and thus improve the robustness of Reverse Distillation.
(vi) Multi-scale distillation can account for different defect sizes and hence improve robustness.

To construct a robust anomaly detection model, we focus on representation similarity-based methods (which perform best when robustness and performance are considered simultaneously) and attempt to optimize the feature representation. Defects cause a drift in the features on which we rely to discriminate defects. However, corruptions introduce an additional drift that decreases the accuracy of defect detection. Considering the different characteristics of corruptions and defects, we hypothesize that corruptions lead to a globally consistent drift in features, whereas defects lead to a local drift. Based on this hypothesis, we propose a feature alignment module (FAM) that reduces the globally consistent drift while preserving the local drift caused by defects, thus obtaining a feature representation that is robust to corruptions. We apply the FAM to PatchCore and significantly improve its robustness while maintaining high performance.

Overall, our contributions are as follows:
(i) We construct a robustness benchmark for unsupervised anomaly detection methods, including a dataset with eight corruption types and five severity levels, as well as metrics to assess robustness.
(ii) We evaluate the performance and robustness of mainstream unsupervised anomaly detection methods and find that representation similarity-based and knowledge distillation-based approaches are the best paradigms in terms of performance and robustness.
(iii) We conduct ablation studies on the components of the two best-performing methods, which helps in understanding the impact of different factors on robustness.
(iv) We propose a feature alignment module to rectify corrupted features. Combining the proposed module with PatchCore yields a robust model that maintains high performance.

2 Related work

2.1 Unsupervised anomaly detection

In recent years, researchers' interest in anomaly detection, especially unsupervised anomaly detection, has increased rapidly, and
many datasets and unsupervised anomaly detection algorithms have been proposed. The most frequently used dataset is MVTec[13], which contains five texture and ten object classes, totaling 5354 images. To identify defects in images, unsupervised anomaly detection algorithms should produce the image-level anomaly score required
for anomaly detection, the pixel-level anomaly map required for anomaly segmentation, or both. The mainstream unsupervised detection algorithms can be classified into five categories: reconstruction-based, representation similarity-based, self-supervised representation learning-based, normalizing flow-based, and knowledge distillation-based.

Reconstruction. Reconstruction-based approaches typically use generative adversarial networks, autoencoders, or variational autoencoders to reconstruct the input image, under the assumption that a model trained on normal samples can correctly reconstruct only normal regions and not defective regions. AnoGAN[1] used a generative adversarial network to learn the distribution of normal images; in the training phase, a generator of normal image patches is trained. In the test phase, for each patch in the test image, the latent vector is iteratively adjusted using the discriminator score and the intermediate features of the discriminator as a guide. Finally, the residual between the test image and the generator output is combined with the residual loss on the discriminator's intermediate features to identify defects. f-AnoGAN[18] used an autoencoder to generate the latent vector, avoiding the time-consuming iterative optimization of AnoGAN. GANomaly[2] detected defects with an encoder-decoder-encoder structure by comparing the encoding produced by the first encoder with the encoding of the reconstructed image produced by the second encoder. AE-SSIM[3] introduced the structural similarity index measure (SSIM) as the reconstruction loss of the autoencoder. To address the problem that defects are unexpectedly well reconstructed in autoencoder-based approaches, RIAD[19] proposed training the autoencoder to recover partially erased images.

Fig. 1. Samples with different corruption types (panels: Original, Brightness, Contrast, Defocus blur, Motion blur, Gaussian noise, Impulse noise, JPEG compression, Geometry). The first image is the original image in MVTec.

Representation similarity. In representation similarity-based methods, the normal feature distribution is first explicitly modeled during training. During testing, k-nearest-neighbor search or the Mahalanobis distance is used to calculate the similarity between test features and normal features, and samples with low similarity are considered defective. GaussianAD[20] proposed using a multivariate Gaussian model to describe the distribution of pre-trained features of normal samples. SPADE[4] stored patch-level and image-level features extracted from ResNet; the k nearest neighbors retrieved from the image-level normal features are used to compute anomaly
scores, and the patch features of these selected images are further used to compute patch-level anomaly maps. PaDiM[5] estimated a multivariate Gaussian model at each patch position and used a random dimension selection strategy to reduce the size of the patch features. PatchCore[6] employed a position-independent feature memory bank and used a greedy selection strategy to reduce the size of the memory bank.

Self-supervised representation learning. Several methods[21–23] are dedicated to learning discriminative representations through proxy tasks such as predicting the geometric transformation of an image or contrastive learning[24]. However, these methods are limited to learning high-level semantic information. CutPaste[9] constructed anomalies by randomly cropping and pasting image patches and demonstrated that the network's perception of such anomalies generalizes to actual defects.

Normalizing flow. Normalizing flow-based approaches use normalizing flows to transform the normal feature distribution into
a Gaussian distribution. The network outputs a probability indicating whether the input lies within the normal distribution, and a low probability is expected when the input is defective. DifferNet[7] applied a normalizing flow to image-level pre-trained features to obtain the image-level anomaly score. Cflow[8] formed a conditional normalizing flow by incorporating positional encoding into the flow; the anomaly map is then obtained by sliding the conditional normalizing flow over patches.

Knowledge distillation. In knowledge distillation-based approaches, the teacher and student models are expected to produce different features for defective images[10–12, 25]. Some of these approaches use a set of teacher-student model pairs with different receptive fields to compute residual maps at multiple scales. Salehi et al.[11] proposed that distilling features from an expert network at various layers is preferable to distilling from a single layer. In these approaches, the teacher and student models use similar structures, which causes the teacher and student representations of defects to converge. Deng et al.[12] proposed a reverse distillation paradigm to solve this problem.

2.2 Evaluating the robustness of anomaly detection models

In recent years, many works have focused on evaluating the robustness of anomaly-based intrusion detection systems. In intrusion detection scenarios, attacks are applied to anomaly detection models, and the resulting changes in model performance serve as robustness criteria. Researchers have focused on designing better attacks to expose model weaknesses. Goodge et al.[26] designed different types of attacks to evaluate model robustness and improved the
robustness to adversarial attacks by optimizing the latent representation. Schneider et al.[27] proposed using different feature robustness metrics, and Han et al.[28] proposed using gray/black-box traffic-space adversarial attacks to evaluate model robustness. To better determine whether attacked samples are anomalies, Gómez et al.[29] proposed using multiple supporting models and found that the 1D CNN is more robust than the LSTM according to their evaluation method. The impact of data poisoning on anomaly detection systems has also been studied[30–32].

However, these studies on the robustness of anomaly detection have focused mainly on intrusion detection scenarios and are difficult to apply to other scenarios for the following reasons. First, most of these works have evaluated robustness against adversarial attacks or data pollution, but robustness against common corruptions is also important, especially in industrial defect detection scenarios. Second, anomaly detection models in intrusion detection scenarios mainly address one-dimensional time-series data, whereas image anomaly detection models process multi-dimensional image data, and robust models need to adapt to more complex environmental changes. To the best of our knowledge, research on the robustness of image-based anomaly detection in scenarios such as industrial defect detection is lacking.

2.3 Benchmarking robustness to common corruptions

Many methods have been proposed for studying the robustness of CNNs against common image corruptions[14–17, 33, 34]. To investigate the robustness of different classification networks against common image corruptions and perturbations, Hendrycks et al.[14] proposed two datasets, ImageNet-C and ImageNet-P. Michaelis et al.[15] provided a benchmark for evaluating the performance of object detection models under image corruptions. Similarly, Kamann et al.[16] assessed the robustness of semantic segmentation models to real-world image corruptions. Altindis et al.[33] evaluated instance segmentation models with real-world image corruptions and out-of-domain images. Wang et al.[17] constructed three robustness benchmarks to study the drawbacks of human pose estimation models.

3 Methods

3.1 Robustness benchmark

We construct the robustness benchmark dataset MVTec-C by adding eight corruption types with five severity levels to each image in the