XML
Steven Holzner, Sams Teach Yourself XML in 21 Days, Third Edition, 2003.

1. XML Basics

Markup Languages
- Example HTML page text: "Hello From HTML", "An HTML Document", "This is an HTML document!"
- Markup describes and interprets the data in a document.

XML
- eXtensible Markup Language
- A metalanguage: a language for creating markup languages.

Example (annotated document)
- Elements are nested; the root element contains all others.
- Callouts: element (or tag) names, elements, root element, empty element, attributes, declaration.

More Terminology
- Example content: "John is a nice fellow", "21", "Main St."
- Opening tag and closing tag: what is open must be closed.
- Nested element: child of Person.
- Parent of Address, ancestor of number.
- "Standalone" text: not useful as data.
- Child of Address, descendant of Person.
- Content of Person.

Viewing an XML Document in IE

An XML Document Using a Style Sheet
- Example content: "John is a nice fellow", "21", "Main St."

ch01_04.css
    Person {display: block; font-size: 18pt; color: #0000ff; text-align: left}

Extracting Data with JavaScript
- Example document text: "Hello From XML", "This is an XML document!"
- Page title: "Retrieving data from an XML document"

    function getData()
    {
        // IE-specific: read the XML data island whose id is firstXML
        xmldoc = document.all("firstXML").XMLDocument;
        nodeDoc = xmldoc.documentElement;
        nodeHeading = nodeDoc.firstChild;
        outputMessage = "Heading: " + nodeHeading.firstChild.nodeValue;
        message.innerHTML = outputMessage;
    }

Extracting Data from an XML Document with Java

    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.io.*;

    public class ch01_06
    {
        static public void main(String[] argv)
        {
            try {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder db = null;
                try {
                    db = dbf.newDocumentBuilder();
                }
                catch (ParserConfigurationException pce) {
                    // Without a parser there is nothing to do
                    pce.printStackTrace();
                    return;
                }
                Document doc = null;
                doc = db.parse("ch01_02.xml");

                // Walk the children of the root element and look for <heading>
                for (Node node = doc.getDocumentElement().getFirstChild();
                     node != null; node = node.getNextSibling()) {
                    if (node instanceof Element) {
                        if (node.getNodeName().equals("heading")) {
                            // Collect the text nodes inside <heading>
                            StringBuffer buffer = new StringBuffer();
                            for (Node subnode = node.getFirstChild();
                                 subnode != null; subnode = subnode.getNextSibling()) {
                                if (subnode instanceof Text) {
                                    buffer.append(subnode.getNodeValue());
                                }
                            }
                            System.out.println(buffer.toString());
                        }
                    }
                }
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Running the program:
    java ch01_06
    Hello From XML

Well-formed XML Documents
- Must have a root element.
- Every opening tag must have a matching closing tag.
- Elements must be properly nested (overlapping elements are a no-no).
- An attribute name can occur at most once in an opening tag. If it occurs:
  - it must have a value (boolean attributes, as in HTML, are not allowed);
  - the value must be quoted (with " or ').
- XML processors are not supposed to try to fix ill-formed documents (unlike HTML browsers).

Valid XML Documents
- Example document text: "Hello From XML", "This is an XML document!"
- A valid XML document is defined by the W3C as a well-formed XML document that also conforms to the rules of a Document Type Definition (DTD) or an XML Schema (XSD). "Schema" is pronounced /ˈskiːmə/.

XML Applications
- XML is used to store, transmit, and structure data.
- Its plain-text format makes it easy to transmit over the Internet and to process in applications on different platforms.
- Over the past 5 years, hundreds of XML sublanguages have appeared.

Displaying 4x^2 - 5x + 6 = 0 with MathML

Displaying a MathML Document in the Amaya Browser

XHTML
- Extensible Hypertext Markup Language
- Stricter than HTML.
- Allows you to add your own tags.
- HTML 4.01 (the current version)

An XHTML Document
- Example page text: "An XHTML Page", "Welcome to XHTML!", "This is an XHTML document. Pretty cool, eh?"
- Displayed in IE.

An SVG Document
- Example title: "SVG Example"

Exercise
- Is a valid XML document necessarily well-formed?
- Is a well-formed XML document necessarily valid?

Editing XML Documents
- XML editors: XML Spy, Visual Studio XML Designer, XRay
- XML browsers: IE (the most powerful general browser), Jumbo (a CML browser)

XML Validators
- Make sure the document is well formed and valid.
- Scholarly Technology Group's validator
- Microsoft's Visual Studio .NET
- error.xml example text: "Hello From XML", "This is an XML document!"
- XML validation in Visual Studio .NET

Building a Complete XML Document
- Example document text: "Hello From XML", "This is an XML document!"

Parts of an XML Document
- Prologs
- XML declarations
- Processing instructions
- Elements and attributes
- Comments
- CDATA sections
- Entities

Character Encoding
- ASCII defines only 128 characters (256 in its 8-bit extensions).
- It cannot represent Chinese, Armenian, Hebrew, Thai, Tibetan, and so on.
- In both size and universality, ASCII is inadequate for the Web.
- Unicode (www.unicode.org): 65,536 characters in its original 16-bit design. The first 128 code points correspond to ASCII, and the first 256 to ISO 8859-1.
- Getting all software to switch to Unicode at once is too difficult, hence UCS Transformation Format-8 (UTF-8).

UTF-8
- All ASCII characters keep the same encoding (8 bits).
- Other Unicode characters use multi-byte encodings: 2 to 4 bytes (the original design allowed up to 6).
- The W3C requires all XML processors to support both UTF-8 and UTF-16.
- Most software supports UTF-8.

Character Entity References
- Example document text: "Hello From XML", "This text is inside a &lt;message&gt; element."
- Predefined references:
    &lt;     is replaced with <
    &amp;    is replaced with &
    &gt;     is replaced with >
    &quot;   is replaced with "
    &apos;   is replaced with '

Entities
- An entity is an alias for a block of text.
- Suppose you define an entity named lettersign for your letter signature, standing for the following text: Zhang San, Sales Department, a network company, No. 88 Zhongguancun, Haidian District, Beijing, 100000.
- Example:
    <mail>
        <recipient>Li Si</recipient>
        <subject>hello</subject>
        <body>Dinner tonight! &lettersign;</body>
    </mail>

General Entities and Parameter Entities
- General entity declaration:
    <!ENTITY lettersign "Zhang San, Sales Department, a network company, No. 88 Zhongguancun, Haidian District, Beijing, 100000">
- Parameter entity declaration:
    <!ENTITY % entity-name "text content">

Entity References
- Syntax: &entity-name;
- An entity must be declared in the XML file before it is referenced.
- An entity reference must not contain whitespace: forms such as "& name;" and "&name ;" are errors.
- An entity may reference other entities, but circular references are not allowed. An entity cannot reference itself, and entity A cannot reference entity B while entity B references entity A.
- Entity references cannot appear in the DOCTYPE declaration.
- The text an entity stands for must be well-formed XML.
- Example (contact list with a DOCTYPE): Zhang San, Company A, &CompanyAAddress;; Li Si, Company B, &CompanyBAddress;; Wang Wu, Company B, &CompanyBAddress;.
- When a company moves, you only need to change the entity declaration for that company's address, and the addresses of all contacts at that company change with it.

Whitespace
- Spaces, carriage returns, line feeds, and tabs are all treated as whitespace.
- Example: the document with heading "Hello From XML" and message "This is an XML document!" is shown twice, with and without extra whitespace between tags.

Prologs
- XML declarations
- XML comments
- Processing instructions
- Whitespace
- Doctype declarations

Annotated example
- Example content: Kelly, Grace, October 15, 2005, Printer
- Callouts: XML declaration, XML comments, processing instruction (defined by the processor), root element, element, attribute.

CDATA
- CDATA stands for character data.
- PCDATA stands for parsed character data.
- Example text inside a CDATA section: "Here's how the element starts:"
- Internet Explorer treats this CDATA section as unparsed text.

Exercise
- A text editor does not ask for a character set when saving an XML document. Is that editor usable?
- How can text that itself contains a tag (of the form "This is a <tag> element") be included in an element without confusing the XML processor?
- Which items can an XML prolog contain?

XML Namespaces
- A mechanism for resolving name conflicts within and between documents.
- Namespace declaration:
  - Namespace: a string, usually a URL.
  - Prefix: an abbreviation for the namespace, in effect an alias.
- Actual name (element or attribute): prefix:name
- Declarations and prefixes have a scope, similar to begin/end blocks.
- Example callouts: backpack, cyberpet, default namespace, toy namespace, reserved keyword.

Namespaces (continued)
- Scopes of declarations are color-coded.
- New default: overshadows the old default.
- Redeclaration of cde: overshadows the old declaration.

Namespaces (continued)
- xmlns="http://www.w3.org/2001/XMLSchema"
- xsl for http://www.w3.org/1999/XSL/Transform
- Etc.

Valid & DTD
Example data:
    Kelly, Grace    October 15, 2005    Printer 111 $111.00;   Laptop 222 $989.00
    Grant, Cary     October 20, 2005    Desktop 333 $2995.00;  Scanner 444 $200.00
    Gable, Clark    October 25, 2005    Keyboard 555 $129.00;  Mouse 666 $25.00

Creating a New XML Document in NetBeans
- Check XML:
    XML checking started.
    Checking file:/F:/work/DS2011/SimpleHttpServer/src/test.xml...
    XML checking finished.
- Validate XML:
    XML validation started.
    Checking file:/F:/work/DS2011/SimpleHttpServer/src/test.xml...
    XML validation finished.

Validating
- www.stg.brown.edu/service/xmlvalid
- Introduce an error into the document, then run the Validate XML command again.

Document Type Definition (DTD)
- A DTD defines the grammar of an XML document.
- A DTD is optional.
- If a document conforms to its DTD, the document is called valid.

DTD (continued)
- A DTD can be part of the document.
- A DTD can be a separate file, referenced in one of these ways:
  - absolute path;
  - relative path, with the DTD in the same directory as the XML document;
  - relative path, with the DTD at a path relative to the XML document.
- The SYSTEM keyword is mainly used to reference a DTD shared by many XML files written by one author or organization.
- There is another kind of external DTD: one drawn up by an authoritative body for a particular industry or for the public. Such publicly available DTDs are referenced with the PUBLIC keyword.
- Example callouts: root element; DTD name (owner/type/language); URL of the DTD.

DTD Components
- Note: ELEMENT is pronounced /ˈelɪmənt/, ATTRIBUTE is pronounced /ˈætrɪbjuːt/.
- Callout: optional.

Valid & DTD
- Callouts: root element; 0 or more child elements; parsed character data; sequence; choice; empty element.

Child Elements
    x+      x occurs 1 or more times
    x*      x occurs 0 or more times
    x?      x is optional: it does not occur or occurs once
    x, y    x is followed by y (sequence)
    x | y   x or y, but not both (choice)

Examples
- Repetition: Zhang San, zhang, zhang, Zhang San
- Grouping: Zhang San, zhang, zhang, Zhang San
- Or (exactly one must be chosen): Zhang San, zhang; Li Si, li
- Mixed content (an element containing both child elements and text): Zhang San, Zhang San, zhang, 86268438
- !DOCTYPE CONTACTS: Zhang San, (010)62345678, zhang, "This is information about Zhang San"
- Empty elements: Kelly, Grace

Notes on DTDs
- When defining elements, the order of the element type declarations (ETDs) does not matter: two DTDs that contain the same declarations in a different order define exactly the same document structure.
- The first character of an element name must be a letter, an underscore (_), or a colon (:).
- It is followed by any combination of letters, digits, periods (.), colons, underscores, and hyphens (-).
- An element name cannot contain whitespace.
- An element name cannot begin with "xml".

Attributes
- Default values
- Constraints on attribute values: REQUIRED, IMPLIED, FIXED, default

REQUIRED
- The REQUIRED keyword states that the XML file must give a value for this attribute.

IMPLIED
- With the IMPLIED keyword, the XML parser no longer insists that the XML file assign a value to the attribute, and the DTD does not need to provide a default value for it.

FIXED
- Use FIXED when you need to provide a default value for an attribute and do not want authors of XML files to override it.

Default
- An attribute declared with none of the keywords above belongs to this kind. You must provide a default value for it in the DTD. The XML file may give a new value that overrides the default, or may omit the attribute.

Attribute Types
- CDATA
- Enumerated
- ID
- IDREF
- IDREFS
- ENTITY
- ENTITIES
- NMTOKEN
- NMTOKENS
- NOTATION

CDATA
- CDATA means plain text: a string of characters in which the ampersand (&), the less-than sign (<), and the quotation mark (") are written as the entities &amp;, &lt;, and &quot;.

Enumerated
- Example: !DOCTYPE shopping basket

ID
- An ID uses an attribute value to give an element in the file a unique identifier. It works much like an internal link in an HTML file.
- Within one document, no two elements may have the same ID value.
- An element can have only one ID attribute.
- Example: !DOCTYPE contact list: Zhang San, zhang

IDREF
- The IDREF type lets an attribute of one element refer to another element in the file, by taking that element's ID value as its own value.

IDREFS
- Holds the ID values of several other elements, separated by whitespace.
- Example: !DOCTYPE contact list: Zhang San, zhang; Li Si, li; Li Si, libbb.org

Entities
- Entities play the role of macros or aliases in XML.
- Declaration: <!ENTITY entity-name "entity content">
- External entity, declared with SYSTEM: <!ENTITY entity-name SYSTEM "URI">
- Reference: &entity-name;
- Example: !DOCTYPE contact list: Zhang San, zhang

Limitations of DTDs
- No support for namespaces.
- Only the string data type is supported.
- Consistency constraints are very weak (ID/IDREF/IDREFS only).
- Unordered content cannot be expressed conveniently.
- All element names are global.

Exercises 1 to 3: find the errors
- Example data in each: October 15, 2005, Grace Kelly

Exercise 4: find the errors
- Example data: October 15, 2005, Grace Kelly, 555.8888; October 16, 2005, Myrna Loy, Muriel Blandings, 555.9999

Exercise
- !DOCTYPE document: Kelly, Grace, October 15, 2005

Attribute default values (examples)
- Immediate value: !DOCTYPE document ...
- #REQUIRED: Kelly, Grace
- #IMPLIED: Kelly, Grace ...
- #FIXED: Kelly, Grace, October 15, 2005

Attribute types (examples)
- CDATA: Kelly, Grace ...
- ID: ...
- IDREF: Kelly, Grace, October 15, 2005; Grant, Cary, October 20, 2005
- ENTITY: Kelly, Grace, October 15, 2005; Grant, Cary, October 20, 2005

Exercise
1. In an element, constrain the married attribute to the values yes or no, with default value no.
2. Use a DTD to define an optional CDATA attribute named date, whose value has the format 4/1/05; an attribute sex with the values male and female; and a required attribute name. Then build an instance document.

XML Schema Editing Tools
- XMLSpy
- XRay
- Microsoft Visual Studio .NET

XML Schema Validating
- Visual Studio .NET
- Internet Explorer
- Xerces
- XRay

Example
- Welcome.xml text: "Welcome to XML Schemas!"
- example.xsd
- Validating in XRay

Purpose of XML Schemas (and DTDs)
They define:
- the structure of instance documents: "this element contains these elements, which contain these other elements", and so on;
- the datatype of each element or attribute: "this element shall hold an integer within the range 0 to 12,000" (DTDs don't do too well at specifying datatypes like this).

Motivation for XML Schemas
People are dissatisfied with DTDs:
- The syntax differs from XML. You write your XML (instance) document using one syntax and the DTD using another: bad, inconsistent.
- Datatypes are limited. DTDs support a very limited capability for specifying datatypes. You can't, for example, express "I want the element to hold an integer with a range of 0 to 12,000".
- There is a desire for a set of datatypes compatible with those found in databases.
- DTD supports 10 datatypes; XML Schemas support 44+ datatypes.

Features of XML Schemas
- Enhanced datatypes:
  - 44+ versus 10.
  - You can create your own datatypes. Example: "This is a new type based on the string type, and elements of this type must follow this pattern: ddd-dddd, where d represents a digit."
- Same syntax as instance documents: less syntax to remember.
- Object-oriented: you can extend or restrict a type (derive new type definitions on the basis of old ones).
- Can express sets, i.e., can define the child elements to occur in any order.

BookStore.dtd
- DTD vocabulary: ATTLIST, ELEMENT, ID, #PCDATA, NMTOKEN, ENTITY, CDATA
- New vocabulary: BookStore, Book, Title, Author, Date, ISBN, Publisher
- This is the vocabulary that DTDs provide to define your new vocabulary.

XML Schema vocabulary
- http://www.w3.org/2001/XMLSchema: element, complexType, schema, sequence, string, integer, boolean
- http://www.books.org (targetNamespace): BookStore, Book, Title, Author, Date, ISBN, Publisher
- This is the vocabulary that XML Schemas provide to define your new vocabulary.

BookStore.xsd
- xsd = XML Schema Definition (explanations follow).
- All XML Schemas have schema as the root element.
- The elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the XMLSchema namespace.
- XMLSchema namespace (http://www.w3.org/2001/XMLSchema): element, complexType, schema, sequence, string, integer, boolean.
- targetNamespace indicates that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in the http://www.books.org namespace.
- Book namespace (targetNamespace), http://www.books.org: BookStore, Book, Title, Author, Date, ISBN, Publisher.
- Referencing a Book element declaration: the Book in what namespace? Since there is no namespace qualifier, it references the Book element in the default namespace, which is the targetNamespace. Thus this is a reference to the Book element declaration in this schema.
- The default namespace is http://www.books.org, which is the targetNamespace.
- Every element in an instance document must be namespace-qualified.

Referencing a Schema in an XML Instance Document
- Example content: My Life and Times, Paul McCartney, July, 1998, 94303-12021-43892, McMillin Publishing, ...
1. Declare the default namespace. This tells the schema validator that all elements in this instance document come from the http://www.books.org namespace.
2. schemaLocation tells the schema validator that the http://www.books.org namespace is defined by BookStore.xsd.
3. Tell the schema validator that the schemaLocation attribute is in the XMLSchema-instance namespace.

XMLSchema-instance Namespace
- http://www.w3.org/2001/XMLSchema-instance: schemaLocation, type, noNamespaceSchemaLocation, nil

Referencing a Schema in an XML Instance Document
- BookStore.xsd: targetNamespace="http://www.books.org"; defines elements in namespace http://www.books.org.
- BookStore.xml: schemaLocation="http://www.books.org BookStore.xsd"; uses elements from namespace http://www.books.org.
- A schema defines a new vocabulary. Instance documents use that new vocabulary.

Validation
- BookStore.xml against BookStore.xsd: validate that the XML document conforms to the rules described in BookStore.xsd.
- BookStore.xsd against XMLSchema.xsd (the schema-for-schemas): validate that BookStore.xsd is a valid schema document, i.e., it conforms to the rules described in the schema-for-schemas.

XMLSchema as the default namespace (see example02)
- Note that XMLSchema is the default namespace. Consequently, there are no namespace qualifiers on schema, element, complexType, sequence, string.
- Here we are referencing a Book element. Where is that Book element defined? In what namespace? The bk: prefix indicates what namespace this element is in. bk: has been set to be the same as the targetNamespace.
- bk: references the targetNamespace (http://www.books.org: BookStore, Book, Title, Author, Date, ISBN, Publisher). The default namespace is http://www.w3.org/2001/XMLSchema (element, complexType, schema, sequence, string, integer, boolean).
- Consequently, bk:Book refers to the Book element in the targetNamespace.

Inline Element Declarations
- Note that we have moved all the element declarations inline, and we are no longer ref'ing to the element declarations. This results in a much more compact schema.
- This way of designing the schema, by inlining everything, is called the Russian Doll design.
- Anonymous types (no name).

Named Types
- An equivalent definition using a named complexType.
- The advantage of splitting out Book's element declarations and wrapping them in a named type is that this type can now be reused by other elements.
- Please note that the two forms are equivalent:
  - Element A references the complexType foo.
  - Element A has the complexType definition inlined in the element declaration.

Summary of Declaring Elements (two ways)
- type: a simple type (e.g., xsd:string) or the name of a complexType (e.g., BookPublication).
- minOccurs: a nonnegative integer.
- maxOccurs: a nonnegative integer or unbounded.
- Note: minOccurs and maxOccurs can only be used in nested (local) element declarations.

Problem
- A date is not the same as a string.
- ISBN format: d-ddddd-ddd-d or d-ddd-ddddd-d or d-dd-dddddd-d

Datatypes

The gYear Datatype
- A built-in datatype (Gregorian calendar year).
- Elements declared to be of t
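The validation step described for BookStore.xml and BookStore.xsd can be sketched in Java with the standard javax.xml.validation API. This is only an illustration under stated assumptions: the schema text below is a hand-written Russian Doll version typed entirely with xsd:string, not the slides' actual BookStore.xsd; the namespace URI is written with an http:// scheme; and the class name BookStoreValidation and the helper bookStore are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class BookStoreValidation {
    // Assumed target namespace URI (hypothetical spelling with a scheme).
    static final String NS = "http://www.books.org";

    // Assumed schema: every element is declared inline (Russian Doll design).
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' xmlns='" + NS + "'"
      + " elementFormDefault='qualified'>"
      + "<xsd:element name='BookStore'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='Book' maxOccurs='unbounded'>"
      + "<xsd:complexType><xsd:sequence>"
      + "<xsd:element name='Title' type='xsd:string'/>"
      + "<xsd:element name='Author' type='xsd:string'/>"
      + "<xsd:element name='Date' type='xsd:string'/>"
      + "<xsd:element name='ISBN' type='xsd:string'/>"
      + "<xsd:element name='Publisher' type='xsd:string'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Children of one complete Book, using the sample values from the slides.
    static final String COMPLETE_BOOK =
        "<Title>My Life and Times</Title><Author>Paul McCartney</Author>"
      + "<Date>July, 1998</Date><ISBN>94303-12021-43892</ISBN>"
      + "<Publisher>McMillin Publishing</Publisher>";

    // Wraps Book children in an instance document whose default namespace is NS.
    static String bookStore(String bookChildren) {
        return "<BookStore xmlns='" + NS + "'><Book>" + bookChildren
             + "</Book></BookStore>";
    }

    // Returns true if the instance document conforms to XSD.
    // The default Validator error handling throws SAXException on the first error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(bookStore(COMPLETE_BOOK)));
        // A Book with only an Author is well-formed but not valid.
        System.out.println(isValid(bookStore("<Author>Paul McCartney</Author>")));
    }
}
```

The second document shows the distinction drawn earlier in the deck: it parses as well-formed XML, yet the validator rejects it because the sequence declared in the schema requires Title first.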
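The general-entity mechanism from the lettersign example can be tried out with the same JAXP DOM parser the deck uses in ch01_06. The sketch below is a minimal illustration, not the slides' own code: the element names mail, recipient, subject, and body, the shortened signature text, and the class name EntityExpansion are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityExpansion {
    // Assumed sample document: the entity is declared in the internal DTD subset.
    static final String MAIL =
        "<?xml version='1.0'?>"
      + "<!DOCTYPE mail [<!ENTITY lettersign 'Zhang San, Sales Department'>]>"
      + "<mail><recipient>Li Si</recipient><subject>hello</subject>"
      + "<body>Dinner tonight! &lettersign;</body></mail>";

    // Same content, but the entity is never declared.
    static final String UNDECLARED =
        "<mail><body>Dinner tonight! &lettersign;</body></mail>";

    // Parses the document and returns the text of its first <body> element,
    // with entity references already replaced by the parser.
    static String bodyText(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("body").item(0).getTextContent();
    }

    // Returns true if the parser accepts the document as well-formed.
    static boolean parses(String xml) {
        try {
            bodyText(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bodyText(MAIL));
        // Referencing an undeclared entity is a well-formedness error.
        System.out.println(parses(UNDECLARED));
    }
}
```

Changing the text in the ENTITY declaration changes every place that references &lettersign;, which is the same property the contact-list example relies on for company addresses.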