Object Recognition Using Spatial Context.ppt

Resource description

Using Spatial Context to Find and Recognize Objects
  Original authors: Geremy Heitz, Daphne Koller
  Stanford University

Things vs. Stuff
  Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but with no specific or distinctive spatial extent or shape.
  Thing (n): An object with a specific size and shape.
  From: Forsyth et al., "Finding pictures of objects in large collections of images," Object Representation in Computer Vision, 1996.

Finding Things
  Context is key!

Outline
  What is Context?
  The Things and Stuff (TAS) model
  Results

Satellite Detection Example
  Figure labels: D(W) = 0.8, D(W) = 0.8

Error Analysis
  Typically:
    False positives are OUT OF CONTEXT
    True positives are IN CONTEXT
  We need to look outside the bounding box!

Types of Context
  Scene-Thing: gist; car "likely", keyboard "unlikely" [Torralba et al., LNCS 2005]
  Stuff-Stuff: [Gould et al., IJCV 2008]
  Thing-Thing: [Rabinovich et al., ICCV 2007]

Types of Context (continued)
  Stuff-Thing: based on spatial relationships
  Intuition:
    Trees = no cars
    Houses = cars nearby
    Road = cars here
  "Cars drive on roads"
  "Cows graze on grass"
  "Boats sail on water"
  Goal: Unsupervised

Things
  Detection "candidates"
  Low detector threshold, so the detector "over-detects"
  Each candidate has a detector score

Things (continued)
  Candidate detections: image window W_i + score
  Boolean random variable T_i
  T_i = 1: candidate i is a positive detection
  Thing model: T_i -> image window W_i

Stuff
  Coherent image regions
  Coarse "superpixels"
  Feature vector F_j in R^n
  Cluster label S_j in {1, ..., C}
  Stuff model: Naive Bayes, S_j -> F_j

Relationships
  Descriptive relations: "Near", "Above", "In front of", etc.
  Choose a set R = {r_1, ..., r_K}
  R_ijk = 1: detection i and region j have relation k
  Relationship model: R_ijk depends on T_i and S_j
  Figure labels: S_72 = Trees, S_4 = Houses, S_10 = Road, T_1, R_{1,10,in} = 1

The TAS Model
  W_i: Window
  T_i: Object Presence
  S_j: Region Label
  F_j: Region Features
  R_ijk: Relationship
  Plates: N, J, K
  Legend: Supervised in Training Set / Always Observed / Always Hidden

Unrolled Model
  Candidate windows: T_1, T_2, T_3
  Image regions: S_1, S_2, S_3, S_4, S_5
  R_{2,1,above} = 0
  R_{3,1,left} = 1
  R_{1,3,near} = 0
  R_{3,3,in} = 1
  R_{1,1,left} = 1

Learning the Parameters
  Assume we know R
  S_j is hidden
  Everything else is observed
  Expectation-Maximization
  "Contextual clustering"
  Parameters are readily interpretable

Learned Satellite Clusters
  [Figure]

Which Relationships to Use?
  R_ijk = spatial relationship between candidate i and region j
  R_ij1 = candidate in region
  R_ij2 = candidate closer than 2 bounding boxes (BBs) to region
  R_ij3 = candidate closer than 4 BBs to region
  R_ij4 = candidate farther than 8 BBs from region
  R_ij5 = candidate 2 BBs left of region
  R_ij6 = candidate 2 BBs right of region
  R_ij7 = candidate 2 BBs below region
  R_ij8 = candidate more than 2 and less than 4 BBs from region
  ...
  R_ijK = candidate near region boundary
  How do we avoid overfitting?

Learning the Relationships
  Intuition: a "detached" R_ijk is an inactive relationship
  Structural EM iterates:
    Learn parameters
    Decide which edge to toggle
  Evaluate with l(T | F, W, R)
    Requires inference
    Better results than using the standard E[l(T, S, F, W, R)]
  Figure labels: R_ij1, R_ij2, ..., R_ijK, T_i, S_j, F_j

Inference
  Goal: [formula lost in extraction]
  Block Gibbs sampling
  Easy to sample the T_i's given the S_j's, and vice versa

Base Detector - HOG
  [Dalal & Triggs, CVPR 2006]
  HOG detector: feature vector X -> SVM classifier

Results - Satellite
  Panels: Prior: Detector Only / Posterior: Detections / Posterior: Region Labels

Results - Satellite (continued)
  [Plot: Recall Rate vs. False Positives Per Image; curves: Base Detector, TAS Model]
  ~10% improvement in recall at 40 fppi

PASCAL VOC Challenge
  2005 Challenge
    2232 images split into {train, val, test}
    Cars, Bikes, People, and Motorbikes
  2006 Challenge
    5304 images split into {train, test}
    12 classes; we use Cows and Sheep

Base Detector Error Analysis
  Cows

Discovered Context - Bicycles
  Bicycles, Cluster #3

TAS Results - Bicycles
  Examples
  Discover "true positives"
  Remove "false positives"
  Figure labels: BIKE, ?, ?, ?

Results VOC 2005

Results VOC 2006

Conclusions
  Detectors can benefit from context
  The TAS model captures an important type of context
  We can improve any sliding window detector using TAS
  The TAS model can be interpreted and matches our intuitions
  We can learn which relationships to use

Thank you!

Object Detection
  Task: Find the things
  Example: Find all the cars in this image
  Return a "bounding box" for each
  Evaluation:
    Maximize true positives
    Minimize false positives

Sliding Window Detection
  Consider every bounding box
    All shifts
    All scales
    Possibly all rotations
  Each such window gets a score: D(W)
  Detections: local peaks in D(W)
  Pros:
    Covers the entire image
    Flexible to allow a variety of D(W)'s
  Cons:
    Brute force can be slow
    Only considers features in the box
  Figure labels: D = 1.5, D = -0.3

Sliding Window Results
  PASCAL Visual Object Classes Challenge, Cows 2006
  score(A, B) = |A ∩ B| / |A ∪ B|
  score(A, B) > 0.5: TRUE POSITIVE
  score(A, B) < 0.5: FALSE POSITIVE
  Detections: D(W) > T
  Recall(T) = TP / (TP + FN)
  Precision(T) = TP / (TP + FP)

Quantitative Evaluation
  [Plot: Recall Rate vs. False Positives Per Image]
  Panels: Prior: Detector Only / Posterior: TAS Model / Region Labels

Detections in Context
  Task: Identify all cars in the satellite image
  Idea: The surrounding context adds information to the local window detector
  Figure labels: Houses, Road

Equations

Features: Haar wavelets
  Haar filters and integral image
  Viola and Jones, ICCV 2001
  The average intensity in the block is computed with four sums, independently of the block size.
  BOOSTING!

Features: Edge fragments
  Weak detector = match of edge chain(s) from a training image to the edge map of a test image
  Opelt, Pinz, Zisserman, ECCV 2006
  BOOSTING!

Histograms of oriented gradients
  Dalal & Triggs, 2006
  SIFT: D. Lowe, ICCV 1999
  SVM!
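The evaluation rule on the "Sliding Window Results" slide (overlap score of a detection A against a ground-truth box B, counted as a true positive when the score exceeds 0.5, with precision and recall computed at a detector threshold T) can be sketched in Python as follows. This is a minimal illustration, not the authors' evaluation code. The box convention `(x_min, y_min, x_max, y_max)`, the names `overlap_score` and `precision_recall`, the greedy one-to-one matching of detections to ground truth, and the value returned when a denominator is zero are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def overlap_score(a: Box, b: Box) -> float:
    """score(A, B) = |A intersect B| / |A union B| for axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(
    detections: List[Tuple[Box, float]],  # (window W, detector score D(W))
    truths: List[Box],
    threshold: float,                      # T in D(W) > T
    min_overlap: float = 0.5,
) -> Tuple[float, float]:
    """Precision(T) and Recall(T) for detections with D(W) > T."""
    kept = sorted(
        (d for d in detections if d[1] > threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    matched = set()  # assumption: each ground-truth box is matched at most once
    tp = fp = 0
    for box, _score in kept:
        best_j, best = -1, min_overlap
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            s = overlap_score(box, truth)
            if s > best:
                best_j, best = j, s
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(truths) - tp
    # Assumed convention: report 1.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [
        ((0, 0, 10, 10), 0.9),    # exact hit on the first object
        ((1, 1, 11, 11), 0.8),    # duplicate of the first object
        ((50, 50, 60, 60), 0.7),  # background
        ((20, 20, 30, 31), 0.2),  # low-scoring hit on the second object
    ]
    print(precision_recall(detections, truths, threshold=0.5))
    print(precision_recall(detections, truths, threshold=0.1))
```

On this toy data, T = 0.5 gives precision 1/3 and recall 1/2, while lowering T to 0.1 admits the low-scoring hit and gives precision 1/2 and recall 1. This is the trade-off behind the "over-detect" setting used to generate TAS candidates: a low threshold raises recall at the cost of more false positives, which the context model is then meant to remove.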
