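Printed Letter Recognition Based on a BP Neural Network

1 Background

As society develops, English is used ever more widely as the international common language, so a large volume of English documents has to be organized, searched, and tallied. An English letter recognition system can readily take over much of this work, which was previously hard to imagine automating.

Intelligent control is an emerging interdisciplinary field that outperforms traditional control in many respects. Within it, artificial neural networks, which imitate the human nervous system and exhibit capabilities such as perception and recognition, learning, association, memory, and reasoning, have especially broad prospects.

Artificial neural network theory is applied mainly in artificial intelligence, automatic control, pattern recognition, robotics, information processing, CAD/CAM, and related areas. For example:

(1) Aerospace science. Autopilot and navigation systems for aircraft and automobiles, flight path simulation, aircraft guidance, and flight procedure optimization and management.
(2) Control and optimization. Robot motion control and the control of various industrial and manufacturing processes, such as integrated circuit routing design and production flow control.
(3) Pattern recognition and image processing. Face recognition, speech recognition, fingerprint recognition, signature recognition, handwritten and printed character recognition, object detection and recognition, image restoration, and image compression.
(4) Intelligent information management systems. Stock price prediction, real estate price prediction, price prediction for bulk commodities such as foreign exchange and gold, corporate financial analysis, and forecasting of earthquakes and other natural disasters.

The most central of these networks is the back propagation network (Back Propagation Network), or BP network for short. This paper describes how the MATLAB toolbox is used to determine the number of hidden-layer neurons and to construct a BP neural network, how the network is trained with two sets of samples, and how the trained network is then used to recognize letters.

2 Introduction to BP Networks

A BP neural network, also called an error back-propagation neural network, is a network model built by repeatedly adjusting the connection weights between nodes according to feedback values. Its architecture consists of an input layer, a hidden part, and an output layer; depending on the needs of the problem, the hidden part may be a single layer or several layers.

The basic idea of the BP algorithm is that learning consists of two processes: forward propagation of the signal and backward propagation of the error. In forward propagation, an input sample enters at the input layer, is processed layer by layer by the hidden layers, and is passed to the output layer. If the actual output of the output layer does not match the desired output (the teacher signal), the algorithm switches to the error back-propagation phase. Error back-propagation passes the output error, in some form, back through the hidden layers toward the input layer and apportions it among all units of each layer. This yields an error signal for every unit, and that error signal is the basis for correcting the unit's weights. This cycle of forward signal propagation and backward error propagation, with the weights of every layer adjusted each time, is repeated over and over. The continual adjustment of the weights is the learning (training) process of the network. It continues until the error of the network output has fallen to an acceptable level or until a preset number of learning iterations has been reached.

3 System Implementation

A letter recognition system generally has three parts: preprocessing, feature extraction, and a classifier. Preprocessing covers digitizing the analog image, binarization, normalization, and similar steps; feature extraction and classifier design are the core of the whole system. Each part is programmed separately as a callable function, and the functions are then called together, which keeps the program clear and convenient.

3.1 Overall Block Diagram of Letter Recognition

(a) BP neural network training process
(b) BP neural network recognition process
Figure 2.1 BP neural network recognition system

3.2 Preprocessing and Feature Extraction Method

This paper uses Fourier descriptors and their inverse transform to binarize the image and extract the letter outline. The result is then normalized into a 1×120 feature matrix, from which 60 points are selected to give a 1×60 matrix. Feature extraction program:

function FD=Feature_Building(RGB)
%RGB=imread('d:A.bmp');
%figure(1),imshow(RGB)
[B]=outline(RGB);
%figure(2)
%subplot(221),draw_outline(B);
%title('outline of object');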
[m,n]=size(B);
FD=fsd(B,30,m,4);

Here outline and fsd are the programs for the Fourier description and its inverse transform.

The outline program:

%%Function for extracting outline of object; Q.K., 2008.4.29
%%Department of Automation, Tsinghua Univ. Beijing 100084, China.
function [outline]=outline(RGB)
I=rgb2gray(RGB);
[junk threshold] = edge(I, 'sobel');
fudgeFactor = .5;
BWs = edge(I,'sobel', threshold * fudgeFactor);
%Step 3: Dilate the image
se90 = strel('line', 3, 90);
se0 = strel('line', 3, 0);
BWsdil = imdilate(BWs, [se90 se0]);
%Step 4: Fill interior gaps
BWdfill = imfill(BWsdil, 'holes');
%Step 5: Remove connected objects on border
BWnobord = imclearborder(BWdfill, 4);
%Step 6: Smoothen the object
seD = strel('diamond',1);
BWfinal = imerode(BWnobord,seD);
BWfinal = imerode(BWfinal,seD);
bw = bwareaopen(BWfinal,30);
% % fill a gap in the pen's cap
[B,L] = bwboundaries(bw,'noholes');
outline = B{1};

The fsd program is given in the program listing.

3.3 BP Neural Network Structure
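3.3.1 Number of Input-Layer Neurons

The feature vector of the image is the input to the neural network, so the number of input-layer neurons equals the dimension of the feature vector: 1×60 = 60 input neurons.

3.3.2 Number of Hidden-Layer Neurons

The number of hidden nodes has a very strong influence on the learning and computational properties of the network and is key to whether the network structure succeeds. With too few hidden nodes the network struggles with complex problems; with too many, training time rises sharply, and the network may also overfit and lose robustness to noise. Based on actual experiments, this paper uses 15 hidden-layer neurons.

3.3.3 Number of Output-Layer Neurons

Because 26 uppercase English letters are to be recognized, the output is chosen as a 26×1 matrix, so the output layer has 26 neurons. When one of the 26 letters is fed into the network, the output at the corresponding position should be 1 and the outputs at all other positions 0. During recognition, the position with the largest output value is taken as the recognized letter.

3.3.4 Constructing the BP Neural Network

A feed-forward BP neural network is created with the function newff:

net=newff(minmax(P),[S1,S2],{'logsig','logsig'},'trainlm');
net.LW{2,1}=net.LW{2,1}*0.01;
net.b{2}=net.b{2}*0.01;

Here minmax(P) gives the minimum and maximum bounds of each of the network's 60 input elements, and P is the training sample set. S1 and S2 are the numbers of neurons in the hidden layer and the output layer. {'logsig','logsig'} specifies the transfer function of each layer; both are set to the log-sigmoid activation function. The training function is 'trainlm'.

3.4 Training the BP Neural Network

3.4.1 Training Sample Set and Target Set

Each normalized letter image is a 60×1 matrix, and a 60×26 matrix forms one training sample set. The target vectors express the requirement that, when a letter is fed into the network, the output neuron at the corresponding position is 1 and all other positions are 0. The target matrix is therefore the 26×26 identity matrix, with ones on the diagonal, which in MATLAB is: T=eye(26);

3.4.2 Network Training

The hidden-layer neurons use the log-sigmoid transfer function logsig, and so do the output-layer neurons. The training function is trainlm, the performance function is sse, the maximum number of training epochs is 5000, and the performance goal is 0.05. BP training program:

net.performFcn='sse';        %performance function
net.trainParam.goal=0.05;    %performance goal
net.trainParam.show=20;      %display interval in epochs
net.trainParam.epochs=5000;  %maximum number of training epochs
net.trainParam.mc=0.95;      %momentum constant
[net,tr]=train(net,P,T);

BP network training flowchart:

Results of training with the first sample set:

TRAINLM, Epoch 0/5000, SSE 169.303/0.05, Gradient 39.0748/1e-010
TRAINLM, Epoch 20/5000, SSE 9.07917/0.05, Gradient 0.647529/1e-010
TRAINLM, Epoch 40/5000, SSE 5.45171/0.05, Gradient 0.465044/1e-010
TRAINLM, Epoch 60/5000, SSE 3.85999/0.05, Gradient 1.13736/1e-010
TRAINLM, Epoch 80/5000, SSE 3.37108/0.05, Gradient 0.970379/1e-010
TRAINLM, Epoch 100/5000, SSE 1.43394/0.05, Gradient 0.27961/1e-010
TRAINLM, Epoch 120/5000, SSE 1.13878/0.05, Gradient 0.661835/1e-010
TRAINLM, Epoch 140/5000, SSE 0.561939/0.05, Gradient 0.497918/1e-010
TRAINLM, Epoch 160/5000, SSE 0.537153/0.05, Gradient 0.0963243/1e-010
TRAINLM, Epoch 180/5000, SSE 0.518194/0.05, Gradient 0.00990168/1e-010
TRAINLM, Epoch 200/5000, SSE 0.461637/0.05, Gradient 11.4576/1e-010
TRAINLM, Epoch 206/5000, SSE 0.0350697/0.05, Gradient 0.265104/1e-010
TRAINLM, Performance goal met.

After 206 training epochs the network error meets the requirement. The error curve is shown in the figure below:

Results of training with the second sample set:

TRAINLM, Epoch 0/5000, SSE 168.635/0.05, Gradient 33.7987/1e-010
TRAINLM, Epoch 20/5000, SSE 3.28669/0.05, Gradient 40.5407/1e-010
TRAINLM, Epoch 32/5000, SSE 0.0441687/0.05, Gradient 0.0844925/1e-010
TRAINLM, Performance goal met.

After 32 training epochs the network error meets the requirement. The error curve is shown in the figure below:

3.5 Letter Recognition

Everything described so far is the learning phase of the network. Once learning is finished, the network enters the working phase and can be used to recognize letters.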
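To make the working phase concrete, the following is a minimal sketch of what a call such as sim(net, x) computes for a 60-15-26 network whose layers both use the log-sigmoid transfer function, and of how the largest output is decoded into a letter. The weights W1, b1, W2, b2 and the input x are random placeholders (hypothetical, not the trained values from this paper), logsigFn is a local stand-in for the toolbox function logsig, and any input or output preprocessing the toolbox might apply is ignored.

```matlab
% Layer sizes taken from Section 3.3: 60 inputs, 15 hidden, 26 outputs.
S0 = 60; S1 = 15; S2 = 26;

% Hypothetical placeholders: random (untrained) weights and a random input.
W1 = randn(S1, S0);  b1 = randn(S1, 1);   % hidden-layer weights and biases
W2 = randn(S2, S1);  b2 = randn(S2, 1);   % output-layer weights and biases
x  = rand(S0, 1);                         % stand-in for a 60x1 feature vector

% Local stand-in for the log-sigmoid transfer function.
logsigFn = @(n) 1 ./ (1 + exp(-n));

% Forward pass: each layer applies weights, adds biases, then squashes.
h = logsigFn(W1 * x + b1);   % hidden-layer output, 15x1
y = logsigFn(W2 * h + b2);   % output-layer output, 26x1

% Decoding rule from Section 3.3.3: the position of the largest output wins.
[a, b] = max(y);
letter = char('A' + b - 1);
fprintf('max output a = %.4f at position b = %d -> letter %s\n', a, b, letter);
```

With trained weights in place of the random ones, a and b play the same role as the values printed by the recognition programs in this section.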
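The program for recognizing a single letter is as follows:

RGB=imread('D:\Program Files\MATLAB71\work\新建文件夹 1\A11.bmp'); %working phase; A11 is an image of the capital letter A with slight noise
FDB=Feature_Building(RGB); %extract the letter outline features
FDB=reshape(FDB,1,120);
FDB=FDB(1:2:120); %normalization (keep every second element, 120 -> 60)
[a,b]=max(sim(net,(FDB)')) %letter recognition

Here a is the largest value produced by the output layer and b is the row index of the recognized letter: if the letter is recognized as A then b is 1, if it is recognized as B then b is 2, and so on. The recognition result is:

a=0.8316
b=1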
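The letter A is recognized correctly.

RGB=imread('D:\Program Files\MATLAB71\work\新建文件夹 1\B11.bmp'); %working phase; B11 is an image of the capital letter B with slight noise
FDB=Feature_Building(RGB); %extract the letter outline features
FDB=reshape(FDB,1,120);
FDB=FDB(1:2:120); %normalization (keep every second element, 120 -> 60)
[a,b]=max(sim(net,(FDB)')) %letter recognition

Here a is the largest value produced by the output layer and b is the row index of the recognized letter: if the letter is recognized as A then b is 1, if it is recognized as B then b is 2, and so on. The recognition result is:

a=0.9741
b=2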
The letter B is also recognized correctly.

This paper uses two sets of samples to train the BP neural network and one set of samples for recognition. The recognition program and its results are as follows:

load('D:\Program Files\MATLAB71\work\新建文件夹 1\index1.mat')
for i=1:26
    RGB{i}=imread(['D:\Program Files\MATLAB71\work\新建文件夹 1\',index1{i}]);
    FD{i}=Feature_Building(RGB{i});
    FD{i}=reshape(FD{i},1,120);
    FD{i}=FD{i}(1:2:120);
    [a,b]=max(sim(net,(FD{i})'))
end

Results:

a = 0.8316   b = 1
a = 0.9741   b = 2
a = 0.8805   b = 3
a = 0.9315   b = 4
a = 0.6114   b = 5
a = 0.9755   b = 6
a = 0.9715   b = 7
a = 0.9780   b = 8
a = 0.9770   b = 9
a = 0.9958   b = 10
a = 0.8759   b = 11
a = 0.9610   b = 12
a = 0.9695   b = 13
a = 0.8119   b = 8
a = 0.9718   b = 15
a = 0.9752   b = 5
a = 0.9039   b = 17
a = 0.5457   b = 18
a = 0.8177   b = 6
a = 0.9728   b = 20
a = 0.2953   b = 18
a = 0.8482   b = 22
a = 0.9092   b = 23
a = 0.9743   b = 24
a = 0.9534   b = 25
a = 0.9764   b = 26

These results show that 22 letters were recognized correctly and four were not (N, P, S, U). To raise the recognition accuracy, the network should be trained on more samples, preferably including images with slight noise, which makes recognition more accurate.
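The tally above can be checked directly from the reported b values. The short sketch below assumes, as the recognition loop suggests, that test image i shows the i-th letter of the alphabet; the variable names are illustrative and not part of the original programs.

```matlab
% b values reported by the recognition loop, in order i = 1..26
b = [1 2 3 4 5 6 7 8 9 10 11 12 13 8 15 5 17 18 6 20 18 22 23 24 25 26];
expected = 1:26;                           % assumption: image i shows letter i

wrongIdx   = find(b ~= expected);          % positions where the network erred
numCorrect = sum(b == expected);
missed     = char('A' + wrongIdx - 1);     % letters that were misrecognized
mistakenAs = char('A' + b(wrongIdx) - 1);  % what the network reported instead

fprintf('%d of %d correct (%.1f%%)\n', numCorrect, numel(b), 100 * numCorrect / numel(b));
for k = 1:numel(wrongIdx)
    fprintf('%s recognized as %s\n', missed(k), mistakenAs(k));
end
```

Under that assumption the script reports 22 of 26 correct and lists N, P, S, and U as the misrecognized letters (reported as H, E, F, and R respectively), which agrees with the count given in the text.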
Program Listing

BP network training program:

clc
clear
load('D:\Program Files\MATLAB71\work\新建文件夹 1\index.mat')
for i=1:52
    RGB{i}=imread(['D:\Program Files\MATLAB71\work\新建文件夹 1\',index{i}]);
    FD{i}=Feature_Building(RGB{i});
    FD{i}=reshape(FD{i},1,120);
    FD{i}=FD{i}(1:2:120);
end
P{1}=[(FD{1})' (FD{2})' (FD{3})' (FD{4})' (FD{5})' (FD{6})' (FD{7})' (FD{8})' (FD{9})' ...
      (FD{10})' (FD{11})' (FD{12})' (FD{13})' (FD{14})' (FD{15})' (FD{16})' (FD{17})' (FD{18})' ...
      (FD{19})' (FD{20})' (FD{21})' (FD{22})' (FD{23})' (FD{24})' (FD{25})' (FD{26})'];
P{2}=[(FD{27})' (FD{28})' (FD{29})' (FD{30})' (FD{31})' (FD{32})' (FD{33})' (FD{34})' (FD{35})' ...
      (FD{36})' (FD{37})' (FD{38})' (FD{39})' (FD{40})' (FD{41})' (FD{42})' (FD{43})' (FD{44})' ...
      (FD{45})' (FD{46})' (FD{47})' (FD{48})' (FD{49})' (FD{50})' (FD{51})' (FD{52})'];
%P=[P1;P2];
T=[eye(26)];
S1=15;S2=26;
for n=1:2 %learning phase
    net=newff(minmax(P{n}),[S1 S2],{'logsig' 'logsig'},'trainlm');
    net.LW{2,1}=net.LW{2,1}*0.01;
    net.b{2}=net.b{2}*0.01;
    net.performFcn='sse';
    net.trainParam.goal=0.05;
    net.trainParam.show=20;
    net.trainParam.epochs=5000;
    net.trainParam.mc=0.95;
    [net,tr]=train(net,P{n},T);
end

Recognition program:

load('D:\Program Files\MATLAB71\work\新建文件夹 1\index1.mat')
for i=1:26
    RGB{i}=imread(['D:\Program Files\MATLAB71\work\新建文件夹 1\',index1{i}]);
    FD{i}=Feature_Building(RGB{i});
    FD{i}=reshape(FD{i},1,120);
    FD{i}=FD{i}(1:2:120);
    [a,b]=max(sim(net,(FD{i})'))
end

Fourier transform program:

function FD=Feature_Building(RGB)
%RGB=imread('d:A.bmp');
%figure(1),imshow(RGB)
[B]=outline(RGB);
%figure(2)
%subplot(221),draw_outline(B);
%title('outline of object');
[m,n]=size(B);
FD=fsd(B,30,m,4);

%%Function for extracting outline of object; Q.K., 2008.4.29
%%Department of Automation, Tsinghua Univ. Beijing 100084, China.
function [outline]=outline(RGB)
I=rgb2gray(RGB);
[junk threshold] = edge(I, 'sobel');
fudgeFactor = .5;
BWs = edge(I,'sobel', threshold * fudgeFactor);
%Step 3: Dilate the image
se90 = strel('line', 3, 90);
se0 = strel('line', 3, 0);
BWsdil = imdilate(BWs, [se90 se0]);
%Step 4: Fill interior gaps
BWdfill = imfill(BWsdil, 'holes');
%Step 5: Remove connected objects on border
BWnobord = imclearborder(BWdfill, 4);
%Step 6: Smoothen the object
seD = strel('diamond',1);
BWfinal = imerode(BWnobord,seD);
BWfinal = imerode(BWfinal,seD);
bw = bwareaopen(BWfinal,30);
% % fill a gap in the pen's cap
[B,L] = bwboundaries(bw,'noholes');
outline = B{1};

function rFSDs = fsd(outline,H,b,bN)
% Forward elliptical Fourier transform - see Kuhl FP and Giardina CR
% "Elliptic Fourier features of a closed contour" Computer Graphics and
% Image Processing 18:236-258 1982 for theory.
% Returns a shape spectrum of input x,y data "outline" with
% iNoOfHarmonicsAnalyse elements.
% The output FSDs will be normalised for location, size and orientation
% if bNormaliseSizeState and bNormaliseOrientationState are TRUE

% Pre-calculate some constant arrays
% n * 2 * pi
% n^2 * 2* pi^2
% where n is the number of harmonics to be used in the analysis
%H = iNoOfHarmonicsAnalyse
%b = bNormaliseSizeState
%[m n] = size(outline), b = m;
%bN = bNormaliseOrientationState
rTwoNPi = (1:1:H)* 2 * pi;
rTwoNSqPiSq = (1:1:H) .* (1:1:H)* 2 * pi * pi;
iNoOfPoints = size(outline,1) - 1; % hence there is 1 more data point in outline than iNoOfPoints
rDeltaX = zeros(iNoOfPoints+1,1); % pre-allocate some arrays
rDeltaY = zeros(iNoOfPoints+1,1);
rDeltaT = zeros(iNoOfPoints+1,1);
for iCount = 2 : iNoOfPoints + 1
    rDeltaX(iCount-1) = outline(iCount,1) - outline(iCount-1,1);
    rDeltaY(iCount-1) = outline(iCount,2) - outline(iCount-1,2);
end
% Calculate 'time' differences from point to point - actually distances, but we are
% carrying on the fiction of a point running around the closed figure at constant speed.
% We are analysing the projections on to the x and y axes of this point's path around the figure
for iCount = 1 : iNoOfPoints
    rDeltaT(iCount) = sqrt((rDeltaX(iCount)^2) + (rDeltaY(iCount)^2));
end
check = (rDeltaT ~= 0);
% remove zeros from rDeltaT, rDeltaX...
rDeltaT = rDeltaT(check);
rDeltaX = rDeltaX(check);
rDeltaY = rDeltaY(check);
iNoOfPoints = size(rDeltaT,1) - 1; % we have removed duplicate points
% now sum the incremental times to get the time at any point
rTime(1) = 0;
for iCount = 2 : iNoOfPoints + 1
    rTime(iCount) = rTime(iCount - 1) + rDeltaT(iCount-1);
end
rPeriod = rTime(iNoOfPoints+1); % rPeriod defined for readability
% calculate the A-sub-0 coefficient
rSum1 = 0;
for iP = 2 : iNoOfPoints + 1
    rSum2 = 0;
    rSum3 = 0;
    rInnerDiff = 0;
    % calculate the partial sums - these are 0 for iCount = 1
    if iP > 1
        for iJ = 2 : iP-1
            rSum2 = rSum2 + rDeltaX(iJ-1);
            rSum3 = rSum3 + rDeltaT(iJ-1);
        end
        rInnerDiff = rSum2 - ((rDeltaX(iP-1) / rDeltaT(iP-1)) * rSum3);
    end
    rIncr1 = ((rDeltaX(iP-1) / (2*rDeltaT(iP-1)))*(rTime(iP)^2-rTime(iP-1)^2) + rInnerDiff*(rTime(iP)-rTime(iP-1)));
    rSum1 = rSum1 + rIncr1;
end
rFSDs(1,1) = ((1 / rPeriod) * rSum1) + outline(1,1);
% store A-sub-0 in output FSDs array - this array will be 4 x iNoOfHarmonicsAnalyse
% calculate the a-sub-n coefficients
for iHNo = 2 : H
    rSum1 = 0;
    for iP = 1 : iNoOfPoints
        rIncr1 = (rDeltaX(iP) / rDeltaT(iP))*((cos(rTwoNPi(iHNo-1)*rTime(iP+1)/rPeriod) - cos(rTwoNPi(iHNo-1)*rTime(iP)/rPeriod)));
        rSum1 = rSum1 + rIncr1;
    end
    rFSDs(1,iHNo) = (rPeriod / rTwoNSqPiSq(iHNo-1)) * rSum1;
end % "for iHNo = 1 :..."
rFSDs(2,1) = 0; % there is no 0th order sine coefficient
% calculate the b-sub-n coefficients
for iHNo = 2 : H
    rSum1 = 0;
    for iP = 1 : iNoOfPoints
        rIncr1 = (rDeltaX(iP) / rDeltaT(iP))*((sin(rTwoNPi(iHNo-1)*rTime(iP+1)/rPeriod) - sin(rTwoNPi(iHNo-1)*rTime(iP)/rPeriod)));
        rSum1 = rSum1 + rIncr1;
    end
    rFSDs(2,iHNo) = (rPeriod / rTwoNSqPiSq(iHNo-1)) * rSum1;
end % "for iHNo = 1 :..."
% calculate the C-sub-0 coefficient
rSum1 = 0;
for iP = 2 : iNoOfPoints + 1
    rSum2 = 0;
    rSum3 = 0;
    rInnerDiff = 0;
    % calculate the partial sums - these are 0 for iCount = 1
    if iP > 1
        for iJ = 2 : iP-1
            rSum2 = rSum2 + rDeltaY(iJ-1);
            rSum3 = rSum3 + rDeltaT(iJ-1);
        end
        rInnerDiff = rSum2 - ((rDeltaY(iP-1) / rDeltaT(iP-1)) * rSum3);
    end
    rIncr1 = ((rDeltaY(iP-1) / (2*rDeltaT(iP-1)))*(rTime(iP)^2-rTime(iP-1)^2) + rInnerDiff*(rTime(iP)-rTime(iP-1)));
    rSum1 = rSum1 + rIncr1;
end
rFSDs(3,1) = ((1 / rPeriod) * rSum1) + outline(1,2);
% store C-sub-0 in output FSDs array - this array will be 4 x iNoOfHarmonicsAnalyse
% calculate the C-sub-n coefficients
for iHNo = 2 : H
    rSum1 = 0;
    for iP = 1 : iNoOfPoints
        rIncr1 = (rDeltaY(iP) / rDeltaT(iP))*((cos(rTwoNPi(iHNo-1)*rTime(iP+1)/rPeriod) - cos(rTwoNPi(iHNo-1)*rTime(iP)/rPeriod)));
        rSum1 = rSum1 + rIncr1;
    end
    rFSDs(3,iHNo) = (rPeriod / rTwoNSqPiSq(iHNo-1)) * rSum1;
end % "for iHNo = 1 :..."
rFSDs(4,1) = 0; % there is no 0th order sine coefficient
% calculate the D-sub-n coefficients
for iHNo = 2 : H
    rSum1 = 0;
    for iP = 1 : iNoOfPoints
        rIncr1 = (rDeltaY(iP) / rDeltaT(iP))*((sin(rTwoNPi(iHNo-1)*rTime(iP+1)/rPeriod) - sin(rTwoNPi(iHNo-1)*rTime(iP)/rPeriod)));
        rSum1 = rSum1 + rIncr1;
    end
    rFSDs(4,iHNo) = (rPeriod / rTwoNSqPiSq(iHNo-1)) * rSum1;
end % "for iHNo = 1 :..."
% the non-normalised coefficients are now in rFSDs
% if we want the normalised ones, this is where it happens
if (b == 1) || (bN == 1)
    % rTheta1 is the angle through which the starting position of the first
    % harmonic phasor must be rotated to be aligned with the major axis of
    % the first harmonic ellipse
    rFSDsTemp = rFSDs;
    rTheta1 = 0.5 * atan(2 * (rFSDsTemp(1,2) * rFSDsTemp(2,2) + rFSDsTemp(3,2) * rFSDsTemp(4,2)) / ...
        (rFSDsTemp(1,2)^2 + rFSDsTemp(3,2)^2 - rFSDsTemp(2,2)^2 - rFSDsTemp(4,2)^2));
    % calculate the partially normalised coefficients - normalised for
    % starting point
    for iHNo = 1 : H
        rStarFSDs(1,iHNo) = cos((iHNo-1) * rTheta1) * rFSDsTemp(1,iHNo) + sin((iHNo-1) * rTheta1) * rFSDsTemp(2,iHNo);
        rStarFSDs(2,iHNo) = -sin((iHNo-1) * rTheta1) * rFSDsTemp(1,iHNo) + cos((iHNo-1) * rTheta1) * rFSDsTemp(2,iHNo);
        rStarFSDs(3,iHNo) = cos((iHNo-1) * rTheta1) * rFSDsTemp(3,iHNo) + sin((iHNo-1) * rTheta1) * rFSDsTemp(4,






