Deep Convolutional Models: Case Studies (深层卷积网络:实例探究)
Lecturer: 秦晓飞, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology

2.1 Why Look at Case Studies

CONV, POOL and FC layers are the building blocks of convolutional neural networks. Just as reading other people's code helps you write your own, studying well-validated models helps you design your own networks.

Outline
- Classic networks:
  - LeNet-5 (1990s)
  - AlexNet (2012, Geoffrey Hinton and his student Alex Krizhevsky, winner of ILSVRC 2012)
  - VGG (2014, Oxford Visual Geometry Group, ILSVRC 2014 runner-up, about 138M parameters)
- ResNet (CVPR 2016, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun; 34/50/101/152 layers)
- Inception (Google; V1 won ILSVRC 2014; V2, V3, V4; increases network width)

2.2 Classic Networks

LeNet-5 (LeCun et al., 1998, "Gradient-based learning applied to document recognition")
32×32×1 -> CONV -> 28×28×6 -> Avg Pool -> 14×14×6 -> CONV -> 10×10×16 -> Avg Pool -> 5×5×16 -> FC -> FC -> output
LeNet-5 has only about 60k parameters, far fewer than modern networks, but it already shows the same pattern as modern networks: the height and width shrink as you go deeper while the number of channels grows, and CONV/POOL stages are followed by fully connected layers and the output.

Details of the original LeNet-5 (optional):
- It used sigmoid/tanh rather than ReLU as the activation function.
- It applied the non-linearity after pooling rather than after the convolution.
- Because of the limited computing power available at the time, it used complicated tricks to reduce the number of parameters.
- Its last layer was not a softmax layer.

AlexNet (Krizhevsky et al., 2012, "ImageNet classification with deep convolutional neural networks")
227×227×3 -> CONV -> 55×55×96 -> Max Pool -> 27×27×96 -> CONV -> 27×27×256 -> Max Pool -> 13×13×256 -> CONV -> 13×13×384 -> CONV -> 13×13×384 -> CONV -> 13×13×256 -> Max Pool -> 6×6×256 = 9216 -> FC 4096 -> FC 4096 -> Softmax 1000

Details of AlexNet:
- AlexNet was the network that convinced a large number of computer-vision researchers to apply deep neural networks to CV.
- Compared with LeNet-5, AlexNet is much larger, with about 60M parameters.
- Compared with LeNet-5, AlexNet uses ReLU instead of sigmoid.
- (Optional) Because GPUs were slow at the time, the original AlexNet used complicated schemes to split the work across two GPUs and handle the communication between them.
- (Optional) The original AlexNet used local response normalization (LRN), which is rarely used today.

VGG-16 (Simonyan & Zisserman, 2015, "Very deep convolutional networks for large-scale image recognition")
VGG has few hyperparameter choices because it uses only two building blocks: CONV with 3×3 filters, s=1, "same" padding, and Max-Pool 2×2 with s=2.
224×224×3 -> [CONV 64]×2 -> 224×224×64 -> POOL -> 112×112×64 -> [CONV 128]×2 -> 112×112×128 -> POOL -> 56×56×128 -> [CONV 256]×3 -> 56×56×256 -> POOL -> 28×28×256 -> [CONV 512]×3 -> 28×28×512 -> POOL -> 14×14×512 -> [CONV 512]×3 -> 14×14×512 -> POOL -> 7×7×512 -> FC 4096 -> FC 4096 -> Softmax 1000
VGG-16 works almost as well as VGG-19, so VGG-16 is used much more often.
VGG-16 has about 138M parameters, which is very large even compared with today's more advanced networks. VGG is attractive because of its simple, regular structure: the number of channels doubles after each conv stage, and the resolution is halved by each pool.

2.3 Residual Networks (ResNets)

Residual block (He et al., 2015, "Deep Residual Learning for Image Recognition")
Because of exploding and vanishing gradients, very deep networks are difficult to train. A residual block adds a "shortcut" (skip connection) alongside the "main path": the activation a^[l] is carried around two layers and added in before the second non-linearity, so that a^[l+2] = g(z^[l+2] + a^[l]).

Residual network: a ResNet is built by stacking many residual blocks on a plain network. In theory, making a plain network deeper should keep lowering the training error; in reality the training error of a plain network eventually goes back up as layers are added, while the training error of a ResNet keeps decreasing even for very deep networks.

2.4 Why ResNets Work?

Key intuition: a residual block with a skip connection makes the two (or three) extra layers at least do no harm to the performance of the original big network; if the extra layers are lucky enough to learn something useful, they improve on it.

For example, with ReLU as the activation function, L2 regularization pushes W^[l+2] -> 0 and b^[l+2] -> 0, so that
a^[l+2] = g(z^[l+2] + a^[l]) = g(W^[l+2] a^[l+1] + b^[l+2] + a^[l]) = g(a^[l]) = a^[l]
(the last step holds because a^[l] >= 0 after a ReLU). This identity mapping means a network with a residual block tends to behave at least as well as the original network.

For the skip connection to make sense, ResNet uses many "same" convolutions so that dimensions match. If the dimensions differ, for example because of pooling, an adjustment matrix W_s is applied to a^[l]; W_s can be a learnable parameter matrix or a fixed matrix that zero-pads a^[l] to the right dimensions. A minimal code sketch of such a residual block is given below.
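To make the structure concrete, here is a minimal sketch of an identity residual block, assuming tensorflow.keras. The filter count, kernel size and input shape are illustrative placeholders, and batch normalization (which the original paper also uses) is omitted so the code stays close to the equations above; this is a sketch, not the exact block from the paper.

```python
# Minimal sketch of an identity residual block (assumes tensorflow.keras).
import tensorflow as tf
from tensorflow.keras import layers

def identity_residual_block(x, filters, kernel_size=3):
    """Main path of two 'same' convolutions plus a skip connection from the input."""
    shortcut = x                                                # a[l], carried around the main path
    y = layers.Conv2D(filters, kernel_size, padding="same",
                      activation="relu")(x)                     # a[l+1]
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)  # z[l+2], no activation yet
    y = layers.Add()([y, shortcut])                             # z[l+2] + a[l]
    return layers.Activation("relu")(y)                         # a[l+2] = g(z[l+2] + a[l])

# Usage: the input's channel count must equal `filters` so the addition is valid.
inputs = layers.Input(shape=(56, 56, 64))   # placeholder input size
outputs = identity_residual_block(inputs, filters=64)
block = tf.keras.Model(inputs, outputs)
```

If the shapes did not match (for example after a pooling layer or a stride-2 convolution), a 1×1 convolution applied to the shortcut is one common way to realize the adjustment matrix W_s described above.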
2.5 Network in Network and 1×1 Convolutions

What does a 1×1 convolution do? (Min Lin, Qiang Chen, Shuicheng Yan, 2013, "Network in Network")
On a single-channel input, a 1×1 convolution just scales every element: convolving a 6×6×1 input with a single 1×1 filter whose value is 2 simply doubles every entry. On a multi-channel input it is more interesting: a 1×1 convolution is called a "network in network" because, at every spatial position, it acts like a fully connected layer applied across the channels.

One application of 1×1 convolutions is to compress the number of channels. You can also use a 1×1 convolution to keep or even increase the number of channels; in that case its main function is to add extra non-linearity to the network.

2.6 Inception Network Motivation

(Szegedy, Wei Liu, Yangqing Jia, et al., 2014, "Going deeper with convolutions")
The key motivation of the Inception network is that you do not have to choose which filter to use (3×3, 5×5, or even a pooling filter): the network is allowed to learn for itself which ones to use.

One important problem with the Inception module is its large computation cost. Take a 5×5 "same" convolution as an example: computing one output value costs 5 × 5 × n_C multiplications (n_C is the number of input channels), the output has n_H × n_W × n_F values, and the total cost is the product of the two, which quickly becomes huge. Using a 1×1 convolution as a "bottleneck" to first shrink the number of channels compresses this computation cost dramatically.

2.7 Inception Network

Inception module (all dimensions are hypothetical, for illustration): the previous activation is fed in parallel through
- a 1×1 CONV,
- a 1×1 CONV followed by a 3×3 CONV,
- a 1×1 CONV followed by a 5×5 CONV,
- a 3×3 MAXPOOL (s=1, same) followed by a 1×1 CONV,
and the four outputs are concatenated along the channel dimension.

The full network (Szegedy et al., 2014, "Going deeper with convolutions") stacks many such modules. It also attaches auxiliary softmax branches to intermediate layers: intermediate-layer features are already useful for prediction, and using them in this way helps to avoid overfitting.

Naming: the Inception network is also called GoogLeNet (not GoogleNet) to pay respect to LeNet; the name "Inception" comes from the movie Inception.

2.8 Using Open-source Implementations

Many neural networks are complicated and full of details, so they are very difficult to re-implement just from the papers. Fortunately, many DL researchers publish their code on open-source sites such as GitHub; reference implementations of ResNets, for example, can be found there.

2.9 Transfer Learning

Transfer learning is very useful in DL, especially in CV. For large-scale datasets such as ImageNet, MS COCO and PASCAL, researchers have already spent weeks or even months training their networks on multiple GPUs. You can reuse these pre-trained networks by downloading them together with their weights and then retraining them on your own, smaller dataset (for example, moving from a generic image task to a cat classifier, or to radiology diagnosis).

Retraining options: first replace the output layer with a new, randomly initialized one, then
- retrain only the last layer's weights if your dataset is small;
- retrain only the last few layers' weights if your dataset is not very large;
- retrain all of the layers' weights if your dataset is large.
A minimal code sketch of the first option is given below.
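This sketch assumes tensorflow.keras, with an ImageNet-pretrained VGG16 from keras.applications standing in for the downloaded network; NUM_CLASSES, the input size and the commented-out training call are placeholders for your own task, not prescriptions.

```python
# Minimal transfer-learning sketch (assumes tensorflow.keras; VGG16 stands in
# for the downloaded pre-trained network, NUM_CLASSES is a placeholder).
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 2  # e.g. cat vs. not-cat for the cat-classifier task above

# 1. Download a pre-trained network together with its weights (softmax head removed).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# 2. Freeze the pre-trained layers so their weights are not updated.
base.trainable = False

# 3. Add a new, randomly initialized softmax layer for your own classes.
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)

# 4. Retrain only the new layer on your (small) dataset.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)   # train_ds: your own labeled data

# With more data, unfreeze the last few layers (or all of them) instead:
# for layer in base.layers[-4:]:
#     layer.trainable = True
```

The same pattern covers the other two retraining options: the more data you have, the more of the pre-trained layers you unfreeze before calling fit.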
2.10 Data Augmentation

CV tasks almost never have enough data to train complex models, so data augmentation is used almost universally to get more data. Common methods:
- Mirroring
- Random cropping
- Rotation
- Shearing
- Local warping

Color shifting makes the model more robust to color changes: add offsets to the R, G and B channels, for example (R+20, G-20, B+20), (R-20, G+20, B+20) or (R+5, G+0, B+50). The AlexNet paper used a method called PCA color augmentation, which handles color distortion in a more principled way.

CV datasets are often very large, so it is impossible to feed all of the data, including the augmented data, to the training algorithm at once. A common implementation is to stream data continuously from the data pool on the hard disk, use different CPU threads to apply different augmentation operations (distortion, mirroring, shearing, ...) and assemble the results into mini-batches, which are then passed on to the CPU/GPU training process.

2.11 The State of Computer Vision

Almost every DL problem sits somewhere on a data spectrum. The more data you have, the less hand-engineering you need: with little data you rely on more hand-engineering ("hacks"), with lots of data you can use simpler algorithms and less hand-engineering. Speech recognition has relatively plentiful data, image recognition (classification) sits in the middle, and object detection has comparatively little labeled data.

Two sources of knowledge:
- labeled data;
- hand-engineered features, network architectures and other components.
Transfer learning often helps when data is small.

Tips for doing well on benchmarks and winning competitions (although Andrew Ng does not think these tips are very useful in practical production systems):
- Ensembling: train several (e.g. 3-15) networks independently and average their outputs.
- Multi-crop at test time: run the classifier on multiple versions of the test image and average the results, e.g. the 10-crop method (a sketch is given below).

Use open-source code:
- use network architectures published in the literature;
- use open-source implementations if possible;
- use pre-trained models and fine-tune them on your dataset.
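As a rough illustration of the multi-crop tip, here is a sketch of 10-crop averaging at test time. It assumes a trained Keras `model` that takes CROP × CROP inputs and an image supplied as a NumPy array at least CROP pixels on each side; `CROP` and the crop layout (four corners plus the centre, each also mirrored) follow the usual 10-crop recipe and are placeholders, not part of the lecture material.

```python
# Rough sketch of 10-crop averaging at test time (assumes a trained Keras
# `model` and an image given as a NumPy HxWx3 array with H, W >= CROP).
import numpy as np

CROP = 224  # placeholder: the input size the classifier was trained on

def ten_crop_predict(model, image):
    """Average the classifier's softmax output over 10 crops of one image."""
    h, w, _ = image.shape
    corners = [(0, 0), (0, w - CROP), (h - CROP, 0), (h - CROP, w - CROP),
               ((h - CROP) // 2, (w - CROP) // 2)]        # four corners + centre
    crops = []
    for top, left in corners:
        crop = image[top:top + CROP, left:left + CROP]
        crops.append(crop)
        crops.append(crop[:, ::-1])                       # plus its mirror image
    batch = np.stack(crops).astype("float32")             # shape (10, CROP, CROP, 3)
    probs = model.predict(batch)                          # shape (10, num_classes)
    return probs.mean(axis=0)                             # averaged class probabilities
```

Ensembling follows the same pattern, except the averaging is over the outputs of several independently trained models rather than over crops; both tricks buy a little accuracy on benchmarks at a significant cost in test-time computation.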
Thank you!