Appendix (Original Text and Translation)

The source text is taken from: Thomas David Heseltine, BSc. Hons., The University of York, Department of Computer Science, for the qualification of PhD, September 2005, "Face Recognition: Two-Dimensional and Three-Dimensional Techniques".

4 Two-dimensional Face Recognition

4.1 Feature Localization

Before discussing the methods of comparing two facial images, we take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localisation. Depending on the application,
if the position of the face within the image is known beforehand (for a cooperative subject in a door access system, for example) then the face detection stage can often be skipped, as the region of interest is already known. Therefore, we discuss eye localisation here, with a brief discussion of face detection in the literature review (section 3.1.1).

The eye localisation method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of the face recognition accuracy and not a product of the performance of the eye localisation routine, all image alignments are manually checked and any errors corrected prior to testing and evaluation.

We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned images of faces is taken,
and each image cropped to an area around both eyes. The average image is calculated and used as a template.

Figure 4-1 - The average eyes, used as a template for eye detection.

Both eyes are included in a single template, rather than individually searching for each eye in turn, as the characteristic symmetry of the eyes either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. However, this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in
the eye-sockets, but the area of skin below the eyes helps to distinguish the eyes from eyebrows (the area just below the eyebrows contains the eyes, whereas the area below the eyes contains only plain skin).

A window is passed over the test images and, at each position, the absolute difference from the average eye image shown above is computed. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.

This basic template-based method of eye localisation, although providing fairly precise localisations, often fails to locate the eyes altogether. However, we are able to improve performance by including a weighting scheme.

Eye localisation is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful, and those in which eye detection failed. Taking the set of successful localisations, we compute the average distance from the eye template (Figure 4-2 top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would
expect. However, bright points do occur near the whites of the eye, suggesting that this area is often inconsistent, varying greatly from the average eye template.

Figure 4-2 - Distance to the eye template for successful detections (top), indicating variance due to noise, and failed detections (bottom), showing credible variance due to mis-detected features.

In the lower image (Figure 4-2 bottom), we have taken the set of failed localisations (images of the forehead, nose, cheeks, background etc. falsely detected by the localisation routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasise the difference of the pupil regions for these failed matches and minimise the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.

Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.

4.2 The Direct Correlation Approach

We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred
to as template matching by Brunelli and Poggio [29]), involving the direct comparison of pixel intensity values taken from facial images. We use the term ‘Direct Correlation’ to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting
schemes or feature extraction, regardless of the distance metric used. Therefore, we do not imply that Pearson’s correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (it is inversely related to Pearson’s correlation and can be considered a scale- and translation-sensitive form of image correlation), as this is consistent with the contrast drawn between image space and subspace approaches in later sections.

Firstly, all facial images must
be aligned such that the eye centres are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as greyscale bitmaps of 65 by 82 pixels and, prior to recognition, converted into a vector of 5330 elements (each element containing
the corresponding pixel intensity value). Each such vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and, again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and
gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision.

$$d = \lVert q - g \rVert, \qquad (d \le \text{threshold} \Rightarrow \text{accept}) \land (d > \text{threshold} \Rightarrow \text{reject}) \tag{Equ. 4-1}$$

4.2.1 Verification Tests

The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system’s ability to perform these tasks, a variety of evaluation methodologies have arisen. Some of these analysis methods simulate a specific mode of operation (e.g. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition, the results generated from each analysis method may be presented in a variety of formats. Throughout the experiments in this thesis, we primarily use the verification test as our method of analysis and comparison, although we also use Fisher’s Linear Discriminant to analyse individual subspace components in section 7 and the identification test for the final evaluations described in section 8.

The verification test
measures a system’s ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject will present some form of identification at a point of entry, perhaps as a swipe card, proximity chip or PIN. This identifier is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image), which is compared with a live image captured at the point of entry (the query image). Access is then granted depending on the acceptance/rejection decision.

The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test we must first define our test set of face images. Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficiently large such that statistical anomalies become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test.
Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.

However, it should also be noted that if the results are to be representative of system performance in a real-world situation, then the test data should be captured under precisely
the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition, which may be applied to a range of application environments, then the test data should present the range of difficulties that are
to be overcome. This may mean including a greater percentage of ‘difficult’ images than would be expected in the perceived operating conditions, and hence higher error rates in the results produced.

Below we provide the algorithm for executing the verification test. The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: CompareFaces(FaceA, FaceB). This call is used to compare two facial images, returning a distance score indicating how dissimilar the two face images are: the lower the score, the more similar the two face images. Ideally, images of the same face should produce low scores, while images of different faces should produce high scores.

Every image is compared with every other image; no image is compared with itself and no pair is compared more than once (we assume that the relationship is symmetrical). Once
two images have been compared, producing a distance score, the ground-truth is used to determine if the images are of the same person or different people. In practical tests this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by application of a threshold. Any incorrect decision is recorded as
either a false acceptance or false rejection. The false rejection rate (FRR) is calculated as the percentage of scores from the same people that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of scores from different people that were classified as acceptances.

    For IndexA = 0 to length(TestSet) - 1
        For IndexB = IndexA + 1 to length(TestSet) - 1
            Score = CompareFaces(TestSet[IndexA], TestSet[IndexB])
            If IndexA and IndexB are the same person
                Append Score to AcceptScoresList
            Else
                Append Score to RejectScoresList

    For Threshold = Minimum Score to Maximum Score
        FalseAcceptCount, FalseRejectCount = 0
        For each Score in RejectScoresList
            If Score <= Threshold
                Increase FalseAcceptCount
        For each Score in AcceptScoresList
            If Score > Threshold
                Increase FalseRejectCount
        FalseAcceptRate = FalseAcceptCount / length(RejectScoresList)
        FalseRejectRate = FalseRejectCount / length(AcceptScoresList)
        Add plot to error curve at (FalseRejectRate, FalseAcceptRate)

These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the
FAR or FRR (by altering the threshold value) will inevitably result in increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the entire range of scores produced. The application of each threshold value produces an additional FAR, FRR pair, which when plotted on a graph produces the error rate curve shown below.

Figure 4-5 - Example Error Rate Curve produced by the verification test.

The equal error rate (EER) can be seen as the point at which FAR is equal to FRR. This EER value is often used as a single figure representing the general recognition performance of a biometric system and allows for easy visual comparison of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real-world application. It is unlikely that any real system would use a threshold value such that the percentage of false acceptances was equal to the percentage of false rejections. Secure site access systems would typically set the threshold such that false acceptances were significantly lower
than false rejections, accepting inconvenient access denials in order to keep intruders out. Surveillance systems, on the other hand, would require low false rejection rates to successfully identify people in a less controlled environment. Therefore we should bear in mind that a system with a
lower EER might not necessarily be the better performer towards the extremes of its operating capability.

There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves, also used in such experiments. Both graphs are simply two visualisations of the same results, in that the ROC format uses the True Acceptance Rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another visualisation of the verification test results is to display both the FRR and FAR as functions of the threshold value. This presentatio
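
The verification test of section 4.2.1, combined with the direct correlation comparison of Equ. 4-1, can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the thesis implementation: the function names (`compare_faces`, `verification_scores`, `error_rates`, `equal_error_rate`) and the four-image toy data are hypothetical, thresholds are taken only at the observed scores rather than swept continuously, and the EER is approximated by the operating point where FAR and FRR are closest.

```python
import numpy as np


def compare_faces(face_a, face_b):
    """Direct correlation: Euclidean distance between flattened images."""
    a = np.asarray(face_a, dtype=np.float64).ravel()
    b = np.asarray(face_b, dtype=np.float64).ravel()
    return float(np.linalg.norm(a - b))


def verification_scores(test_set, person_ids):
    """Compare every unordered pair once and split scores by ground truth."""
    accept_scores, reject_scores = [], []
    n = len(test_set)
    for i in range(n):
        for j in range(i + 1, n):
            score = compare_faces(test_set[i], test_set[j])
            if person_ids[i] == person_ids[j]:
                accept_scores.append(score)   # same person
            else:
                reject_scores.append(score)   # different people
    return np.array(accept_scores), np.array(reject_scores)


def error_rates(accept_scores, reject_scores):
    """Return FAR and FRR with each observed score used as the threshold."""
    thresholds = np.unique(np.concatenate([accept_scores, reject_scores]))
    # FAR: fraction of different-person scores accepted (score <= threshold).
    far = np.array([(reject_scores <= t).mean() for t in thresholds])
    # FRR: fraction of same-person scores rejected (score > threshold).
    frr = np.array([(accept_scores > t).mean() for t in thresholds])
    return thresholds, far, frr


def equal_error_rate(far, frr):
    """Approximate the EER at the point where |FAR - FRR| is smallest."""
    k = int(np.argmin(np.abs(far - frr)))
    return float((far[k] + frr[k]) / 2.0)


# Toy data (assumption): two "images" each of two people, as tiny vectors.
faces = [[0, 0], [0, 1], [10, 10], [10, 11]]
ids = ["a", "a", "b", "b"]

accept, reject = verification_scores(faces, ids)
thresholds, far, frr = error_rates(accept, reject)
tar = 1.0 - frr          # ROC form: True Acceptance Rate
eer = equal_error_rate(far, frr)
print(len(accept), len(reject), eer)
```

With real data, `faces` would hold the aligned 65 by 82 greyscale images and `ids` the person identifiers taken from the ground truth. Plotting `far` against `frr` gives the error rate curve, while plotting both against `thresholds` gives the alternative presentation mentioned above.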






