资源描述
单击此处编辑母版标题样式,单击此处编辑母版文本样式,第二级,第三级,第四级,第五级,*,LL1文法和其分析程序,5.1确定的自顶向下语法分析思想,1.,语法分析概念,2.,自上而下的语法分析的一般过程,3.,自上而下的语法分析面临的问题,4.开始符号集,5.后跟符号集,6.select集,7.LL(1)文法,2,1。语法分析,在语言的编译实现中,把,句子分析,的过程称为,语法分析,即,完成这个任务的程序称为语法分析,程序或称为,识别程序,。,分析算法又称识别算法。,从左到右的分析算法,,即,总是从,左,到,右,地,识别输入符号串,,首先识别符号串中的,最左,符号,进而,依次识别右边,的一个符号,,直到分析结束,。,3,(,上下文无关文法),句型的分析,句型分析,就是,识别,一个符号串是否为某文法的,句型的,过程,或者说是某个,推导,的构造过程。,4,语法树推导的几何表示,句型,a,a,bbaa的可能,推导序列和语法树,例:GS:,S,a,AS,A,S,b,A,A,SS,S,a,A,ba,S,a,A S,S,b,A,a,a,b,a,S,a,A,S,a,A,a,a,S,b,A,a,a,S,b,ba,a,a,a,bbaa,S,a,A,S,a,S,b,A,S,a,a,b,A,S,aab,ba,S,aabba,a,S,a,A,S,a,S,b,A,S,a,S,b,A,a,a,a,b,A,a,aab,ba,a,5,分析算法分类,分析算法可分为:,自上而下分析法,:,从文法的开始符号出发,,,寻找,与,输入符号串,匹配,的,推导,,或者说,为输入串寻找一个最左推导。,自下而上分析法,:,从,输入符号串,开始,,,逐步进行,归约,,直至,归约,到,文法的,开始符号,。,6,两种方法反映了语法树的两种构造过程。,自上而下方法,是从文法符号开始,将它做为语法树的根,向下逐步建立语法树,使语法树的结果正好是输入符号串,自下而上方法,则是从输入符号串开始,以它做为语法树的结果,自底向上的构造语法树,7,2。自上而下分析方法,对任何输入串,试图用一切可能的办法,从文法开始符号着手,自上而下地为输入串构造一棵语法树,或者说,为输入串寻找一个最左推导。本质上是一个试探过程,反复使用不同地产生式谋求匹配输入串的过程。,8,自上而下的语法分析的一般过程,例:文法G:S,c,A,d,A,ab,A,a,识别输入串w=,cabd,是否为该文法的,句子,SSS,c,A,d,c,A,d,a,b,推导过程:S,c,A,d,c,A,d,c,ab,d,9,自上而下的语法分析的一般过程,(1)S,c,A,d,(2)A,ab(3),A,a,识别输入串w=,cad,是否为,该文法的,句子,1.S,cAd,2.后选择(2)扩展A,得到推导S,cAd,cabd,这时,w的第二个符号可以与叶子结点a得以匹配,但第三个符号d却不能与下一叶子结点b匹配,怎么办?,-查看A有无另一个选择,有!,回溯,,把A为根的子树剪掉,扫描过的输入串中的a吐出来,再试探用产生式(3)构造推导S,cAd,cad,识别输入串w=,caa的,过程,:,1.S,cAd,2.选择(2)扩展A,得到推导S,cAd,cabd,3.,回溯回,到推导S,cAd,4.选择(3)扩展A,得到推导S,cAd,cad,5.A没有选择了!,回溯,到推导S,cAd,6.再,回溯,S有无另一个选择?没有!,宣告分析失败。,(请思考,若有,(4),S,cB,(5)B,aa 会怎样?,),10,自上而下,分析,的进一步讨论,自上而下分析,也称面向目标的分析方法,也就是从文法的开始符号出发企图推导出与输入的符号串完全匹配的句子,若能构造出推导则表明输入串是给定文法的句子,否则表明该输入不是给定文法的句子。,自上而下分析对文法的要求,文法不能含有左递归规则。,自上而下分析,又可分为,确定的,和,不确定的,两种,不确定的,分析方法称为带回溯的分析方法,这种方法实际上是一种穷举的试探方法,确定的分析方法,需对文法有一定的限制,11,3。自上而下的语法分析面临的问题-,实现考虑,回溯,文法的左递归性,SSa,12,自上而下分析对文法的要求,例文法G,0,S:,(1)SSa,(2)Sb,分析baa是不是文法的句子,按照自上而下的分析思想,选用产生式(1)来推导,S,Sa,语法树末端结点最左符号为非终结符,所以选用(1)继续推导,S,SaSaa,此时语法树末端结点最左符号仍为非终结符,所以选用(1)继续推导,S,SaSaa Saaa,问题试图用S匹配输入串时,出现:在没有读入任何输入符 号的情况下,又得重新要求S去进行新的匹配.,无法确定什么时候使用(2)产生式最适当,只能采用带回溯的不确定方法解决。,原因文法含有左递归。,13,回溯的原因,例GS:,SxAy,Aaba,若当前输入串为xay,首先构造的推导S,xAy,匹配 ,进一步推导对A可选择Aab替换,得S,xAy,xaby,xay xaby,匹配 ,xa都已匹配,当前面临输入符为y与b不能匹配,所以将输入串指针退回到a,对A的替换重新选用下一个产生式Aa进行试探,S,xAy,xay输入串中当前符a得到匹配,指针向前移动到y,与语法树中y匹配,匹配成功。,由于相同左部的产生式的右部开始符号相同而引起回溯,。,14,在自上而下的分析方法中,如何,选择,使用,哪个,产生式进行推导,?,假定要被代换的最左非终结符号是B,且有n条规则:B,A1|A2|An,那么如何确定用哪个右部去替代B?-什么信息用于Parser做正确选择?(输入串,文法特点),15,可预测的试探,推导过程,例 文法GS:,S,p,A|,q,B,A,c,A,d,|,a,B,d,B,|,c,识别输入串,w,=,pccadd,是否是G1S的句子,可预测的试探,推导过程:,S,p,A,pc,A,d,pcc,A,dd,pccadd,试探,成功。,16,4。开始符号集-FIRST集,设G=(V,T,V,N,P,S),是上下文无关文法,FIRST(,)=a|,=*,a,a,V,T,、,V,*,若,=*,则规定FRIST(,),17,FOLLOW(A)=aS,=*,A 且a,FRIST(,),,V,*,,,V,+,若S,=*,u A ,且,=*,,则#F,OLLOW(A),5。后跟符号集-FOLLOW集,18,6。SELECT集,给定上下文无关文法的产生式,A,V,N,V,*,若,*,,则SELECT(,)=FIRST(,),若,=*,则SELECT(,)=(FIRST(,)-)FOLLOW(A,),19,7。LL(1)文法,一个文法G是LL(1)的,当且仅当对于G的每一个非终结符的任何两个不同产生式,,下面的条件成立:,SELECT(,),SELECT(,)=,其中,和,不能同时,=*,20,书中例子,21,5.2 LL(1)文法的判别,判别步骤:,1)。求出能推出,的非终结符,22,2)。计算FIRST集,1.若X,V,则FIRST(X)=X,2.若X,V,N,且有产生式X,a,则把a加入到FIRST(X)中;若X,也是一条产生式,则把,也加到FIRST(X)中.,3.若X,Y是一个产生式且Y,V,N,则把FIRST(Y)中的所有非,元,素都加到FIRST(X)中;若X,Y,1,Y,2,Y,K,是一个产生式,Y,1,Y,2,Y,(i-1),都是非终结符,而且,对于任何j,1,j,i-,1,FIRST(Y,j,)都含有,(即Y,1,.Y,(i-1),=*,),则把FIRST(Y,j,)中的所有非,元素和FIRST(Y,i,)中的所有元素都加到FIRST(X)中;特别是,若所有的FIRST(Y,j,j=1,2,K)均含有,则把,加到FRIST(X)中.,23,3)。计算FOLLOW集,1.对于文法的开始符号S,置#于FOLLOW(S)中;,2.若,B是一个产生式,则把,FIRST()-,加至,FOLLOW(B)中;,3.若,B是一个产生式,或,B是,一个产生式而,=*,(即,FIRST()),,则把FOLLOW(A)加至FOLLOW(B)中,24,G E:(1)E TE (2)E +TE (3)E,(4)T FT (5)T *FT (6)T,(7)F (E)(8)F a,各非终结符的FIRST集合如下:,FIRST(E)=(,a,FIRST(E)=+,FIRST(T)=(,a,FIRST(T)=*,FIRST(F)=(,a,各非终结符的FOLLOW集合为:,FOLLOW(E)=),FOLLOW(E)=),,FOLLOW(T)=,),,FOLLOW(T)=,),#,FOLLOW(F)=*,,),#,25,4)。计算SELECT集,计算产生式的SELECT集,26,G E:(1)E TE (2)E +TE (3)E,(4)T FT (5)T *FT (6)T,(7)F (E)(8)F a,E +TE|,FIRST(+TE)=+,FOLLOW(E)=),,T *FT|,FIRST(*FT)=*,FOLLOW(T)=+,),,F (E)|a,FIRST(E)=(,FIRST(a)=a,所以GE是LL(1)的,27,5)。判断文法是否LL(1)文法,若文法所有具有相同左部产生式的SELECT集两两不相交,则文法是LL(1)文法。,28,LL(1)文法的性质,:,LL(1)文法是无二义的,LL(1)文法不含左递归,29,5.3 某些,非LL(1)文法的改造,1。提取左公共因子,提左公因子:,将产生式,|,变换为:,B,B,|,30,一般形式:,1,|,2,|,n,提取左公共因子后:,AA,A,1,|,2,|,n,31,2。消除左递归,左递归 关于非终结符P的规则,直接左递归:,P,P,|,、V,*,且、不以,P,开头,一般 左递归:P,=*,P,例:,SAa,A,Sb,32,消除文法中左递归规则,1)消除直接左递归:,形如:P,P,|,非,,,,不以P打头,改写为:P,Q,Q,Q|,其中,Q为新增加的非终结符,33,消除文法中左递归规则举例,例:E,E+T,|T,T,T*F|F,F,(E)|a,G E:(1)E,TE (2)E,+TE,(3)E,(4)T,FT,(5)T,*FT (6)T,(7)F,(E)(8)F,a,34,消除一般左递归对,文法要求:1.无回路(A,(,=+,(,A)2.无空产生式,2)消除一般左递归的方法,:,(1),.以某种顺序将文法非终结符排列A,1,A,2,A,n,(2),for i:=1 to n do,begin,for j:=1 to i-1 do,用A,j,-,1,|,2,|,k,替代形如,A,i,-A,j,r的规则,其中A,j,-,1,|,2,|,k,是关于A,j,的全部产生式;,消除,A,i,规则的直接左递归;,end;,(3),化简由(2)得到的文法:去掉无用产生式,35,例P90,36,消除左递归和提左公因子并不一定都能将,非LL(1)文法改造为LL(1)的,S,if,C t S|,if,C t S e S,C b,提左因子,S,if,C t S A,A e S|,First集 Follow集,S,if,#,e,A e,#,e,C b t,Select(,AeS,),Select(,A,)=e,#,e,改造后文法不是LL(1)文法,37,5.5 确定的自顶向下分析方法,特征根据下一个(几个)输入符号为当前要处理,的非终结符选择产生式,要求文法是LL(1)的,第一个L 从左到右扫描输入串,第二个L 生成的是最左推导,1 向前看一个输入符号(lookahead),38,无,回溯,的自顶向下分析程序,预测分析程序的实现技术,1.递归(下降)子程序,2.表驱动分析程序,39,例:,递归下降子程序,ParseFunction(),BNF(Backus-Naur Form)描述,program function_list,function_list function function_list|,function FUNC identifier(parameter_list)statement,void ParseFunction(),MatchToken(T_FUNC);,ParseIdentifier();,MatchToken(T_LPAREN);,ParseParameterList();,MatchToken(T_RPAREN);,ParseStatement();,40,例:,递归下降子程序,ParseFunction()(续),void MatchToken(int expected),if(lookahead!=expected),printf(syntax error n);,exit(0);,else/if match,consume token and move on,lookahead=yylex();/读入一个单词,41,预测分析程序的实现,表驱动,预测分析程序模型,Input,#,总控程序,预测分析表,stack,42,预测分析表构造算法,1.对文法G的每个产生式,执行第二步,和第三步;,2.对每个终结符a,FIRST(,),把,加,至,A,a,中,,3.若,FIRST(,),,则对任何b,FOLLOW(A),把,加至,A,b,中,,4.把所有无定义的,A,a,标上“出错标志”。,可以证明,一个文法G的预测分析表不含多重入口,当且仅当该文法是LL(1)的,43,例:表驱动,予测分析程序,G E:(1)E,TE,(2)E,+TE,(3)E,(4)T,FT,(5)T,*FT,(6)T,(7)F,(E),(8)F,a,用预测分析表表示状态转换。,44,a,+,*,(,),#,E,(1),(1),E,(2),(3),(3),T,(4),(4),T,(6),(5),(6),(6),F,(8),(7),G E:(1)E,TE (2)E,+TE (3)E,(4)T,FT (5)T,*FT (6)T,(7)F,(E)(8)F,a,预测分析表,45,表驱动,预测分析程序分析算法,首先把#然后把文法开始符号推入栈;把第一个输入符,号读进b;,FLAG:=TRUE;,WHILE FLAG DO,BEGIN,把栈顶符号上托出去并放在,中;,IF X,Vt THEN IF X=b THEN,把下一,个输入符号读进b,ELSE ERROR,ELSE IF X=#THEN,IF b=#THEN FLAG:=FALSE,ELSE ERROR,ELSE IF,X,b=X,X,1,X,2,.X,K,THEN,把X,K,,X,K-1,.,X,1,一一推进栈,ELSEERROR,END OF WHILE;,STOP/*分析成功,过程完毕*,46,分析输入串#a+a#的步骤,栈内容 栈顶符号 当前输入 余留串 MX,b,1#E E a +a#E,TE,2#ET T a +a#T,FT,3#ETF F a +a#F,a,4#ETa a a +a#,5#ET T +a#,T,6#E E +a#E,+TE,7#ET+a#,8#ET T a#T,FT,9#ETF F a#F,a,10#ETa a a#,11#ET T#,T,12#E E#E,13#,47,LL(1)分析中的一种错误处理办法,发现错误,1栈顶的终结符与当前输入符不匹配,2非终结符A于栈顶,面临的输入符为a,但分析表M的MA,a为空,“应急”恢复策略,跳过输入串中的一些符号直至遇到“同步符号”为止。,同步符号的选择,1把FOLLOW(A)中的所有符号作为A的同步符号。跳过输入串中的一些符号直至遇到这些“同步符号”,把A从栈中弹出,可使分析继续,2把FIRST(A)中的符号加到A的同步符号集,当FIRST(A)中的符号在输入中出现时,可根据A恢复分析,48,review-parsing,The syntax analysis phase of a compiler verifies that the sequence of tokens returned from the scanner represent valid sentences in the grammar of the programming language.,There are two major parsing approaches:,top-down,and,bottom-up,.,In top-down parsing,you start with the start symbol and apply the productions until you arrive at the desired string.,In bottom-up parsing,you start with the string and reduce it to the start symbol by applying the productions backwards.,49,In the top-down parsing,we begin with the start symbol and at each step,expand one of the remaining nonterminals by replacing it with the right side of one its productions.,We repeat until only terminals remain.The top-down parse prints a leftmost derivation of the sentence.,A bottom-up parse works in reverse.We begin with the sentence of terminals and each step applies a production in reverse,replacing a substring that matches the right side with the nonterminal on the left.,We continue until we have substituted our way back to the start symbol.If you read from the bottom to top,the bottom-up parse prints out a rightmost derivation of the sentence.,50,lookahead symbol.,The,lookahead symbol,is the next symbol coming up in the input.,backtracking,.Based on the information the parser currently has about the input,a decision is made to go with one particular production.If this choice leads to a dead end,the parser would have to backtrack to that decision point,moving,backwards,through the input,and start again making a different choice and so on until it either found the production that was the appropriate one or ran out of choices.,51,predictive parser and LL(1)grammar,Predictive parser is a,non-backtracking top-down parser.A predictive parser is characterized by its ability to choose the production to apply solely on the basis of the next input symbol and the current nonterminal being processed.,To enable this,the grammar must take a particular form.We call such a grammar,LL(1).,The first“L”means we scan the input from left to right;the second“L”means we create a leftmost derivation;and the 1 means one input symbol of lookahead.,52,recursive-descent,The first technique for implementing a predictive parser is called,recursive-descent,.,A recursive descent parser consists of several small functions(procedures),one for each nonterminal in the grammar.As we parse a sentence,we call the functions(procedures)that correspond to the left side nonterminal of the productions we are applying.If these productions are recursive,we end up calling the functions recursively.,53,Table-driven LL(1)parsing,In a recursive-descent parser,the production information is embedded in the individual parse functions for each nonterminal and the run-time execution stack is keeping track of our progress through the parse.There is another method for implementing a predictive parser that uses a table to store that production along with an explicit stack to keep track of where we are in the parse.,54,How,a table-driven predictive parser works,We push the start symbol on the stack and read the first input token.As the parser works through the input,there are the following possibilities for the top stack symbol,X,and the input token nonterminal,a,:,1.If,X=a,and,a,=end of input(#):parser halts and parse completed successfully,2.If,X=a,and,a,!=#:successful match,pop,X,and advance to next input token.This is called a,match,action.,3.If,X!=a,and,X,is a nonterminal,pop,X,and consult table at,X,a,to see which production applies,push right side of production on stack.This is called a,predict,action.,4.If none of the preceding cases applies or the table entry from step 3 is blank,there has been a parse error,55,The,first set,of a sequence of symbols,u,written as,First(u),is the set of terminals which,start all the sequences of symbols derivable from,u,.A bit more formally,consider all,strings derivable from,u,by a leftmost derivation.If,u,=*,v,where,v,begins with some,terminal,that terminal is in,First(u).,If,u,=*,then,is in,First(u).,56,The,follow set,of a nonterminal,A,is the set of terminal symbols that can appear immediately,to the right of,A,in a valid sentential form.A bit more formally,for every valid sentential form,S,=*,uAv,where,v,begins with some terminal,that terminal is in,Follow(A).,57,Calculating,first set,To calculate,First(u),where,u,has the form,X1X2.Xn,do the following:,1.If,X1,is a terminal,then add,X1,to,First(u),otherwise add,First(X1)-,to,First(u),.,2.If,X1,is a nullable nonterminal,i.e.,X1=*,add,First(X2)-,to,First(u),.Furthermore,if,X2,can also go to,then add,First(X3)-,and so on,through all,Xn,until the first nonnullable one.,3.If,X1X2.Xn=*,add,to the first set.,58,Calculating follow sets.,For each nonterminal in the grammar,do the following,:,1.Place,#,in,Follow(S),where,S,is the start symbol and,#,is the inputs right endmarker.The endmarker might be end of file,it might be newline,it might be a special symbol,whatever is the expected end of input indication for this grammar.We will typically use,#,as the endmarker.,2.For every production,A uBv,where,u,and,v,are any string of grammar symbols and,B,is a nonterminal,everything in,First(v),except,is placed in,Follow(B).,3.For every production,A uB,or a production,A u Bv,where,First(v),contains,(i.e.,v,is nullable),then everything in,Follow(A),is added to,Follow(B).,59,Constructing the parse table,1.For each production,A u,of the grammar,do steps 2 and 3,2.For each terminal,a,in,First(u),add,A u,to M,A,a,3.If,in,First(u),(i.e.,A,is nullable)add,A u,to M,A,b,for each terminal,b,in,Follow(A),If,in,First(u),and,#,is in,Follow(A),add,A u,to M,A,#,4.All undefined entries are errors,60,LL(1)grammar,A grammar G is LL(1)iff whenever,A u|v,are two distinct productions of G,the following conditions hold:,-for no terminal,a,do both,u,and,v,derive strings beginning with,a,(i.e.first sets are disjoint),-at most one of,u,and,v,can derive the empty string,-if,v,=*,then u,does not derive any string beginning with a terminal in,Follow(A),61,Error-reporting and recovery,An error is detected in predictive parsing when the terminal on top of the stack does not match the next input symbol or,when nonterminal,A,is on top of the stack,a is the next input symbol and the parsing table entry M,A,a,is empty.,62,Panic-mode,error recovery,Panic-mode,error recovery is a simple technique that just bails out of the current construct,looking for a safe symbol at which to restart parsing.The parser just discards input tokens until it finds what is called a,synchronizing,token.The set of synchronizing tokens are those that we believe confirm the end of the invalid statement and allow us to pick up at the next piece of code.,63,
展开阅读全文