GAN Image Quality Assessment Method Based on Knowledge Distillation

YAN Jia-kuo1, SI Zhan-jun1,2*

(1. College of Light Industry Science and Engineering, Tianjin University of Science and Technology, Tianjin 300457, China; 2. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China)
Abstract  Aiming to enhance the prediction accuracy for the quality of images generated by GAN models, so that it better aligns with human subjective assessment of image quality, a semi-supervised Image Quality Assessment (IQA) method based on knowledge distillation was introduced in this study. The method combines CNN and ViT models to fully capture global and local information: the distribution differences between features of high-quality images and distorted images are learned, and high-level feature information is transferred through knowledge distillation. Image quality assessment scores are then obtained through forward propagation. To increase the diversity of input features and improve the processing speed of the model, a Cascaded Group Attention (CGA) mechanism was employed to process the input features. Experimental results on multiple public datasets demonstrate that the proposed method outperforms existing evaluation methods, yields overall favorable outcomes, exhibits relatively robust performance, and achieves IQA results that better align with human visual perception.
Key words  Knowledge distillation; GAN model; Image quality assessment; Cascaded Group Attention

Received: 2023-06-10; Revised: 2023-09-02. *Corresponding author.
CLC number: TP391.41  Document code: A  Article ID: 2097-2474(2024)01-51-09  DOI: 10.19370/10-1886/ts.2024.01.006
Cite this article as: YAN Jia-kuo, SI Zhan-jun. GAN Image Quality Assessment Method Based on Knowledge Distillation[J]. Printing and Digital Media Technology Study, 2024, (1): 51-59.

0 Introduction
In recent years, with the continuous development of image processing and computer vision technology, image quality assessment has become an important research field. Currently, models for image restoration, enhancement, and generation, such as those using Generative Adversarial Networks (GAN) in image processing, have significantly improved the quality of generated images [1-4, 5]. However, traditional image quality evaluation methods often yield low quality scores for these generated images [3-4]. Therefore, new evaluation methods are needed that better align with the Mean Opinion Score (MOS) of human perception. In this regard, combining image quality assessment with deep learning techniques is an effective approach that can help achieve more accurate and reliable image quality evaluation.

Deep learning-based image quality assessment methods typically require a dataset of distorted images and corresponding subjective scores to obtain evaluation results. However, it is often difficult to obtain the original reference images, and only the distorted or generated images are available [6]. In this study, a no-reference image quality assessment method for GAN-generated images that uses knowledge distillation was introduced. The method uses knowledge distillation to transfer the feature-difference information learned by the teacher model to the student model for sufficient training. To extract feature information that is more in line with human visual characteristics, a CNN + Vision Transformer (ViT) model is used to extract global and local features of the image, and a Cascaded Group Attention (CGA) mechanism is added to process the input features, thereby enhancing the diversity of input features and improving the accuracy and reliability of image quality evaluation. The proposed method outperforms existing methods on several relevant experimental indicators, which indicates that our knowledge distillation-based GAN-generated image quality assessment method has higher accuracy and reliability and can provide better solutions in practical application scenarios.

1 Related Work

1.1 Image Quality Assessment
Image Quality Assessment (IQA) quantifies the degree of quality degradation that images undergo after a series of operations such as acquisition, compression, and generation. IQA therefore plays a crucial role in most image processing tasks, and evaluation methods mainly fall into subjective quality evaluation and objective quality evaluation [7]. Subjective quality evaluation obtains the visual quality of images through subjective human judgment and is the most accurate way to measure image quality. It is usually reported as the MOS of the subjective ratings, but it is very labor-intensive and time-consuming. Objective quality evaluation predicts the visual quality of images through designed IQA algorithms and can be divided into three categories: Full-Reference IQA (FR IQA), Reduced-Reference IQA (RR IQA), and No-Reference IQA (NR IQA).
FR IQA methods require the original image as a reference, and commonly used evaluation metrics include Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and the Structural Similarity Index Measurement (SSIM). These metrics are calculated from the pixel differences between the reference image and the tested image and can provide relatively accurate evaluation results. However, because FR IQA methods require the original image as a reference, they do not apply to situations where the original image is unavailable.
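For illustration, the FR IQA metrics named above can be computed directly from the two aligned images. The sketch below uses NumPy and scikit-image and assumes 8-bit grayscale inputs; it is a generic example, not part of the proposed method.

```python
# Minimal FR IQA sketch: MSE, PSNR and SSIM for two aligned 8-bit grayscale images.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def fr_iqa_scores(reference: np.ndarray, distorted: np.ndarray):
    """Return (MSE, PSNR, SSIM) of a distorted image against its reference."""
    ref = reference.astype(np.float64)
    dst = distorted.astype(np.float64)
    mse = np.mean((ref - dst) ** 2)                                    # pixel-wise Mean Square Error
    psnr = peak_signal_noise_ratio(reference, distorted, data_range=255)
    ssim = structural_similarity(reference, distorted, data_range=255)
    return mse, psnr, ssim
```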
RR IQA methods use partial reference-image information for evaluation; commonly used approaches include those based on original-image features, wavelet-domain statistical models, and multi-scale geometric analysis. These methods can evaluate image quality without a complete reference image, but the evaluation results may not be accurate enough because of the limited information used. NR IQA methods require no reference image and evaluate quality from the statistical characteristics of the tested image alone. These methods have the widest range of applicability, but their results may be less accurate due to the lack of comparison with a reference image.
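As one concrete example of such statistical characteristics, many classic NR IQA models (e.g., BRISQUE) start from mean-subtracted contrast-normalized (MSCN) coefficients, whose distribution is disturbed by most distortions. The sketch below is a generic illustration, not the method proposed in this paper.

```python
# Sketch of a classic NR IQA statistic: MSCN coefficients of a grayscale image.
# Distortion tends to change the (roughly Gaussian) distribution of these values.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(gray: np.ndarray, sigma: float = 7 / 6, c: float = 1.0) -> np.ndarray:
    img = gray.astype(np.float64)
    mu = gaussian_filter(img, sigma)                    # local mean
    var = gaussian_filter(img * img, sigma) - mu * mu   # local variance
    std = np.sqrt(np.maximum(var, 0.0))                 # local standard deviation
    return (img - mu) / (std + c)                       # normalized coefficients

# An NR IQA model would then regress quality from statistics of these
# coefficients, e.g. their variance, skewness, and kurtosis.
```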
1.2 Generated Image Quality Assessment

Generated image quality assessment focuses specifically on evaluating the quality of images produced by GAN models [5]. Unlike other evaluation settings, generated images may contain GAN-specific distortions, such as checkerboard patterns and unreasonable structures. GAN-based algorithms also often fail to reproduce fine details faithfully, as shown in the GAN-based distortion examples in Fig.1. Therefore, the quality of generated images needs to be evaluated based on the characteristics of GAN models rather than those of natural images.

[Fig.1 Distortion examples based on GAN]

Image quality assessment is a fundamental problem in image processing and computer vision, aiming to measure the perceived quality of images with computational models. From the early image quality databases with limited image data and single distortion types to current databases with diverse distortion types and large amounts of image data, from the combination of feature engineering and traditional machine learning algorithms to today's end-to-end deep learning models, and from relatively simple evaluation metrics to the diverse application of related metrics, all of these have witnessed the rapid development of the IQA research field. With the advancement of deep learning technology, a series of image generation models, such as GAN-based image generation algorithms and VAE-based image generation techniques, have also achieved significant progress in generating images that better match human visual perception. As GAN continues to progress, the quality of images generated by GAN and the complexity of the models are developing rapidly, as shown in Fig.2.

[Fig.2 Flowchart of the changes in generated image quality, diversity, and model complexity]

However, images generated by deep learning models exhibit completely different characteristics from traditional image distortions such as blur, noise, and compression artifacts, so new IQA methods need to be proposed to better reflect perceived image quality.
1.3 Knowledge Distillation

Knowledge distillation is a model compression technique that aims to approximate a large, complex model by training a smaller model. As shown in Fig.3, the basic idea is to transfer the knowledge of the complex model (such as its output probability distribution) to the small model, thereby improving the small model's performance [8]. The process of knowledge distillation consists of two stages. In the first stage, the pre-trained complex model performs forward inference on the training set to obtain the output probability distribution of each training sample. In the second stage, these probability distributions are used as "soft labels" to train the small model, so that it outputs probability distributions similar to those of the complex model. Specifically, a cross-entropy loss can be used to train the small model while considering both the original "hard labels" (i.e., the class labels) and the "soft labels" (i.e., the output probability distributions of the complex model).
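This combined objective is commonly written as a weighted sum of a hard-label cross-entropy term and a temperature-softened soft-label term. The PyTorch sketch below is a generic formulation of that idea; the temperature and weighting are illustrative assumptions, not values reported in this paper.

```python
# Generic knowledge-distillation loss: hard-label cross-entropy plus
# KL divergence between temperature-softened student and teacher outputs.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Hard-label term: standard cross-entropy with the true class labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: match the teacher's softened probability distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)   # rescale so gradients stay comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft
```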
The advantage of knowledge distillation is that it can significantly reduce the size and computational complexity of a model without sacrificing performance. In addition, knowledge distillation can help train more robust and generalizable models, because the knowledge of the complex model compensates for the deficiencies of the small model.

[Fig.3 Knowledge Distillation model framework]

2 IQA Based on Knowledge Distillation

Our method uses two deep convolutional neural network models: a main teacher model and an auxiliary student model. The main teacher model uses Inception-ResNet-v2 as the pre-trained backbone to extract features from the reference and distorted images, compare them, and generate quality scores. The auxiliary student model is a lightweight network that learns the relevant features from the main teacher model through knowledge distillation, introduces the CGA attention mechanism, and adds noise to improve its performance.
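A rough sketch of this teacher-student setup is given below. The module names, tensor shapes, and loss weighting are illustrative assumptions rather than the paper's exact implementation, and in practice a projection layer may be needed so that student and teacher features have matching shapes.

```python
# Sketch of feature-level distillation for IQA: a lightweight student mimics the
# teacher's feature representation while regressing a quality score.
# All names (teacher, student, head, mos) are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(teacher: nn.Module, student: nn.Module, head: nn.Module,
                      ref: torch.Tensor, dist: torch.Tensor, mos: torch.Tensor,
                      beta: float = 1.0) -> torch.Tensor:
    with torch.no_grad():                        # the teacher is pre-trained and frozen
        t_feat = teacher(ref) - teacher(dist)    # teacher's feature-difference representation
    s_feat = student(dist)                       # the student sees only the distorted image
    score = head(s_feat).squeeze(-1)             # predicted quality score
    loss_kd = F.mse_loss(s_feat, t_feat)         # imitate the teacher's features
    loss_q = F.mse_loss(score, mos)              # regress the subjective MOS
    return loss_q + beta * loss_kd               # combined training objective
```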
Specifically, our proposed image quality assessment method based on knowledge distillation can be roughly divided into three parts: the input layer (for image preprocessing), the feature extraction layer (for obtaining high-level feature representations of the image), and the quality prediction network (for obtaining the image quality score). The overall model structure is shown in Fig.4.

[Fig.4 Flowchart of GAN-generated Image Quality Assessment]

In the image preprocessing stage, we obtain the minimum just-noticeable-difference (JND) image, which effectively expresses the distortion sensitivity characteristics and visual masking effects of the human visual system.
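A minimal sketch of a luminance-adaptation JND map is shown below. It follows a commonly used piecewise approximation from classical spatial-domain JND models and is only an illustration, not necessarily the exact JND formulation used in this work.

```python
# Rough sketch of a luminance-adaptation JND map: visibility thresholds grow in
# very dark and very bright regions. Constants follow a common approximation.
import numpy as np
from scipy.ndimage import uniform_filter

def luminance_jnd(gray: np.ndarray) -> np.ndarray:
    bg = uniform_filter(gray.astype(np.float64), size=5)      # local background luminance
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0            # thresholds in dark regions
    bright = 3.0 / 128.0 * (bg - 127.0) + 3.0                  # thresholds in bright regions
    return np.where(bg <= 127.0, dark, bright)                 # per-pixel JND threshold map
```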
The feature extraction layer captures image features: the feature extraction network uses Inception-ResNet-v2 pre-trained on ImageNet to extract feature maps from the reference and distorted images, achieving a more reasonable and comprehensive acquisition of image information. To better conform to human visual characteristics, we add the CGA attention mechanism to improve the expressive power of the model. Finally, the quality prediction network produces the image quality score. It is based on fully connected layers and introduces a semantic-information branch that perceives overall content changes in the image, so as to simulate the human subjective evaluation process and improve the rationality of the evaluation results.

2.1 Image Feature Extraction

We use the Inception-ResNet-v2 network pre-trained on ImageNet to extract feature maps from the reference and distorted images, and incorporate a Transformer model to encode the input image features into a high-dimensional feature vector that contains rich feature information.
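A minimal sketch of this backbone feature-extraction step is shown below, using the timm implementation of Inception-ResNet-v2. The model name, input size, and use of forward_features are assumptions about one convenient way to obtain the feature maps, not the paper's exact code.

```python
# Sketch: extract deep feature maps from reference/distorted images with a
# pre-trained Inception-ResNet-v2 backbone (via the timm library).
import timm
import torch

backbone = timm.create_model("inception_resnet_v2", pretrained=True)
backbone.eval()

@torch.no_grad()
def extract_features(img: torch.Tensor) -> torch.Tensor:
    """img: (N, 3, 299, 299) tensor normalized to the backbone's expected range."""
    return backbone.forward_features(img)   # (N, C, H, W) feature map before pooling
```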
To better conform to human visual characteristics, we add the Cascaded Group Attention (CGA) mechanism to enhance the diversity of the input features and improve the expressive power of the model.
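A simplified sketch of a cascaded group attention block is given below. It captures the general CGA idea of giving each attention head its own channel split and feeding the previous head's output into the next split; the layer choices and hyper-parameters are illustrative and do not reproduce the exact module used in this paper.

```python
# Simplified sketch of a Cascaded Group Attention (CGA) block: each head attends
# over its own channel split, and the previous head's output refines the next split.
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.ModuleList(
            [nn.Linear(self.head_dim, self.head_dim * 3) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (B, L, C)
        splits = x.chunk(self.num_heads, dim=-1)            # one channel split per head
        outs, carry = [], 0
        for i, blk in enumerate(self.qkv):
            feat = splits[i] + carry                         # cascade the previous head's output
            q, k, v = blk(feat).chunk(3, dim=-1)
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
            carry = attn @ v                                 # (B, L, head_dim)
            outs.append(carry)
        return self.proj(torch.cat(outs, dim=-1))            # fuse heads back to C channels
```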
Our method receives a tuple x containing two images as input, where the first image is the reference image and the second is the image to be evaluated. The method first uses the backbone to extract features from the reference and evaluated images, and enhances the diversity of the input features through the CGA attention mechanism. It then computes the difference between the two feature vectors, obtaining a feature vector that represents the difference between the two images. Next, convolutional layers process the reference image and the feature vector to obtain the corresponding feature tensors. The feature tensor is then added to a learnable embedding vector through an embedding layer to improve the model's expressive ability. Finally, the feature tensor is encoded by the encoder, and the reference image and the encoded feature tensor are decoded by the decoder to obtain the evaluation result.

2.