Machine Learning and Data Mining
Kwansei Gakuin University (情報教育センター) — TUT, 2000/06/07

Knowledge discovery from databases
- Intelligent data analysis and supporting tools
- Very large databases and data warehouses
- Diversifying targets: text on the Web

The knowledge discovery process
[Figure: extraction -> transformation -> integration of data, drawing on external DBs and leading to knowledge]

Mining techniques
- Statistics
- Neural networks
- Decision trees
- Rough sets
- Association rules
- Graph Based Induction
- Inductive logic programming
- Variable selection
- Visualization of data

What is supervised learning?
- Input instances contain class attributes and explanation attributes.
- Rules that describe the classes are generated inductively: IF conditions THEN class.
- Learning from examples; incorporation of background knowledge.
- cf. regression, discriminant analysis, neural networks, nearest neighbour.

Typical applications
- Knowledge acquisition for plant-operation expert systems
- Action prediction of opponent teams in sports matches
- Diagnosis from medical tests
- Discovery of active motifs in chemical compounds from structure-activity relationship datasets

Classification of problems
Type           | Output                             | Understanding | Example
Classification | definite answers to all questions  | unnecessary   | plant operation, character recognition
Guess          | probable answers to some questions | unnecessary   | sports action prediction, stock price prediction
Understanding  | probabilities for all questions    | necessary     | medical diagnosis, grammar acquisition
Streams in learning research I. Classification — pursuit of accuracy
- UCI Repository of machine learning databases: Merz, C.J. and Murphy, P.M. (1996), http://www.ics.uci.edu/~mlearn/MLRepository.html
- Standard program for comparison: Quinlan, J.R. (1993): C4.5: Programs for Machine Learning, Morgan Kaufmann; Japanese translation by 古川 (1995).
- Review (in Japanese): …・金田 (1998): Toward practical applications of learning from examples, 情報処理, Vol.39, No.2, pp.145-151; No.3, pp.245-251.

The decision tree method
[Table: toy examples described by eye colour (blue/brown) and hair colour (black/blond/red), each labelled + or -]

[Figure: a decision tree induced from the toy data, with + and - labels at the leaves]

Variable selection by average information
- Average information (entropy) of a node with p positive and n negative examples:
  I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
- For the toy data as a whole, I(p, n) = 0.954 bit.

Average information after a split, and the gain
- The gain of a split is the entropy of the node minus the weighted average entropy of its branches.
- Worked figures from the toy data: branch entropies of about 0.971 and 0.918 bit; one candidate split leaves a weighted entropy of 0.951 bit, a gain of only 0.003 bit, while the better splits give gains of roughly 0.454 bit (hair colour) and 0.347 bit (eye colour), so the hair-colour split is chosen.
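As a companion to the two slides above, the entropy and gain computations can be written out directly. This is only an illustrative sketch: the tiny dataset below is a made-up stand-in for the eye-colour/hair-colour example (whose exact rows are not legible), and the function names are ours.

    from math import log2

    def entropy(pos, neg):
        """Average information I(p, n) of a node with pos positive and neg negative examples."""
        total = pos + neg
        if pos == 0 or neg == 0:
            return 0.0
        p, n = pos / total, neg / total
        return -p * log2(p) - n * log2(n)

    def information_gain(examples, attribute, target="class"):
        """Entropy of the node minus the weighted entropy of the branches created by `attribute`."""
        pos = sum(1 for e in examples if e[target] == "+")
        parent = entropy(pos, len(examples) - pos)
        remainder = 0.0
        for value in {e[attribute] for e in examples}:
            subset = [e for e in examples if e[attribute] == value]
            sp = sum(1 for e in subset if e[target] == "+")
            remainder += len(subset) / len(examples) * entropy(sp, len(subset) - sp)
        return parent - remainder

    # Hypothetical stand-in for the eye-colour / hair-colour toy data.
    data = [
        {"eye": "blue",  "hair": "blond", "class": "+"},
        {"eye": "blue",  "hair": "dark",  "class": "+"},
        {"eye": "brown", "hair": "blond", "class": "-"},
        {"eye": "brown", "hair": "dark",  "class": "-"},
        {"eye": "blue",  "hair": "dark",  "class": "-"},
    ]
    print(entropy(2, 3))                   # about 0.971 bit for a 2:3 node
    print(information_gain(data, "eye"))   # gain of splitting on eye colour
    print(information_gain(data, "hair"))  # gain of splitting on hair colour

The attribute with the largest gain is the one the tree grows on at that node, exactly as in the worked figures above.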
Numeric attributes
- Example: a diabetes-diagnosis tree built over numeric attributes (SONAR, IBM Tokyo Research Laboratory): http://www.trl.ibm.com/projects/s7800/DBmining/index.htm

Progress in decision trees
- Variables with continuous values
- Entropy gain ratio, Gini index
- Sampling
- Pruning
- Bagging, boosting
- User interface: interactive expansion of a tree
- Visualization
- Rules

Gini index vs. entropy
- Gini index = Σ p_i (1 - p_i) = 1 - Σ p_i²
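For comparison, both impurity measures can be evaluated on the same class distribution; the snippet below (ours, not from the slide) shows that both are zero for a pure node and maximal at a 50/50 split, which is why they usually prefer similar splits.

    from math import log2

    def gini(probs):
        """Gini index: 1 - sum of squared class probabilities."""
        return 1.0 - sum(p * p for p in probs)

    def entropy(probs):
        """Shannon entropy in bits."""
        return -sum(p * log2(p) for p in probs if p > 0)

    for p in (0.0, 0.1, 0.3, 0.5):
        print(f"p={p:.1f}  gini={gini([p, 1 - p]):.3f}  entropy={entropy([p, 1 - p]):.3f}")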
Decision tree methods — references
- …・金田 (1998): Toward practical applications of learning from examples (in Japanese), 情報処理, Vol.39, No.2, pp.145-151; No.3, pp.245-251.
- Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J.: Classification and Regression Trees, Wadsworth & Brooks/Cole (1984). (CART)
- Quinlan, J.R.: C4.5: Programs for Machine Learning, Morgan Kaufmann (1993).
- 古川 (trans.): AIによるデータ解析 (Japanese edition of C4.5), 1995.
Streams in learning research III. Rough sets
- Characteristics: non-exploratory; a methodology for decision tables; analysis of variable dependencies; NP-hard in the numbers of attributes and values.
- References:
  - Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers (1991).
  - Ziarko, W.: Review of Basics of Rough Sets in the Context of Data Mining, Proc. Fourth International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery, pp.447-457, Tokyo (1996).
  - Datalogic/R: Reduct Systems Inc.

Rough set
[Figure: approximation of a concept by a positive region, a boundary region and a negative region]

Computation step 1: discretization
[Table: raw numeric measurements per object for the attributes Size, Height, Energy, Current and Temperature]
- The numeric table is discretized into a decision table.
- Reduct 1 = {Size, Height, Energy}; Reduct 2 = {Size, Height, Current}; Core = {Size, Height}.

[Table: the discretized decision table, with condition attributes S, H, E, C and decision attribute T]
- Explanation variables P = {Size, Height, Energy, Current}; target variable Q = {Temperature}.
- Reduct 1(P, Q) = {Height, Energy}; Reduct 2(P, Q) = {Height, Current}; Core(P, Q) = {Height}.
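The reduct and core computation sketched on the two slides above can be reproduced by brute force for small tables. The code below is a minimal sketch, not the Datalogic/R procedure: the five-object decision table is invented (only the attribute names follow the slide), but it is chosen so that the reducts {Height, Energy} and {Height, Current} and the core {Height} come out as quoted.

    from itertools import combinations

    CONDITIONS = ["Size", "Height", "Energy", "Current"]
    DECISION = "Temperature"
    TABLE = [  # hypothetical discretized decision table
        {"Size": 0, "Height": 0, "Energy": 0, "Current": 0, "Temperature": 0},
        {"Size": 0, "Height": 0, "Energy": 2, "Current": 1, "Temperature": 1},
        {"Size": 0, "Height": 2, "Energy": 0, "Current": 0, "Temperature": 1},
        {"Size": 1, "Height": 1, "Energy": 0, "Current": 1, "Temperature": 0},
        {"Size": 1, "Height": 1, "Energy": 2, "Current": 0, "Temperature": 1},
    ]

    def blocks(attrs):
        """Indiscernibility classes of the objects with respect to a set of attributes."""
        part = {}
        for i, row in enumerate(TABLE):
            part.setdefault(tuple(row[a] for a in attrs), []).append(i)
        return part.values()

    def positive_region(attrs):
        """Objects whose indiscernibility class has a unique decision value."""
        pos = set()
        for block in blocks(attrs):
            if len({TABLE[i][DECISION] for i in block}) == 1:
                pos.update(block)
        return pos

    FULL = positive_region(CONDITIONS)

    def is_reduct(attrs):
        """A reduct preserves the full positive region and has no redundant attribute."""
        if positive_region(attrs) != FULL:
            return False
        return all(positive_region([a for a in attrs if a != x]) != FULL for x in attrs)

    reducts = [set(c) for r in range(1, len(CONDITIONS) + 1)
               for c in combinations(CONDITIONS, r) if is_reduct(c)]
    print("reducts:", reducts)                  # [{'Height', 'Energy'}, {'Height', 'Current'}]
    print("core:", set.intersection(*reducts))  # {'Height'}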
Computation step 2: decision matrix and rule derivation
[Table: decision matrix — one row per object of the target class, one column per object outside it, each cell listing the attribute-value pairs that distinguish the two objects]
- Each row yields a Boolean function (one disjunct per discriminating pair, conjoined over the columns), which is then simplified, e.g.:
  B1 = (E,2) ∨ (C,1);  B2 = (H,2) ∨ (C,1);  B3 = (H,2);  B4 = (H,2) ∨ (E,2) ∨ (C,1);  B5 = (E,2) ∨ (C,1)
- Derived rules:
  (Energy = 2) → (Temperature = 1)
  (Current = 1) → (Temperature = 1)
  (Height = 2) → (Temperature = 1)
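The decision-matrix step itself can also be sketched in a few lines: for each object of the target class, record the attribute-value pairs that distinguish it from every object outside the class, then pick one pair per entry and keep the minimal conjunctions. This is a brute-force illustration over the same invented table as before, not the exact procedure behind the slide.

    from itertools import product

    CONDITIONS = ["Size", "Height", "Energy", "Current"]
    DECISION = "Temperature"
    TABLE = [  # same hypothetical toy table as in the previous sketch
        {"Size": 0, "Height": 0, "Energy": 0, "Current": 0, "Temperature": 0},
        {"Size": 0, "Height": 0, "Energy": 2, "Current": 1, "Temperature": 1},
        {"Size": 0, "Height": 2, "Energy": 0, "Current": 0, "Temperature": 1},
        {"Size": 1, "Height": 1, "Energy": 0, "Current": 1, "Temperature": 0},
        {"Size": 1, "Height": 1, "Energy": 2, "Current": 0, "Temperature": 1},
    ]

    def rules_for(target_value):
        """Derive rules 'conditions -> DECISION = target_value' via the decision matrix."""
        pos = [r for r in TABLE if r[DECISION] == target_value]
        neg = [r for r in TABLE if r[DECISION] != target_value]
        rules = set()
        for p in pos:
            # One matrix entry per negative object: the pairs on which p differs from it.
            row = [frozenset((a, p[a]) for a in CONDITIONS if p[a] != n[a]) for n in neg]
            # Choose one discriminating pair per entry (CNF -> DNF) to form a rule condition.
            for choice in product(*row):
                rules.add(frozenset(choice))
        # Absorption: drop any conjunction that contains another rule as a proper subset.
        return [r for r in rules if not any(other < r for other in rules)]

    for rule in rules_for(1):
        cond = " AND ".join(f"{a}={v}" for a, v in sorted(rule))
        print(f"IF {cond} THEN {DECISION}=1")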
Variable precision rough set model
[Figure: positive, boundary and negative regions under the variable precision model, where a block is assigned to the positive region once its confidence exceeds a threshold]

Variable dependency analysis
- Finding necessary and sufficient variable sets.
Cars example
No | Size       | Cyl | Turbo | Fuelsys | Displace | Comp   | Power  | Trans  | Weight | Mileage
 1 | compact    | 6   | yes   | EFI     | medium   | high   | high   | auto   | medium | medium
 2 | compact    | 6   | no    | EFI     | medium   | medium | high   | manual | medium | medium
 3 | compact    | 4   | no    | EFI     | medium   | high   | high   | manual | medium | medium
 4 | compact    | 6   | yes   | EFI     | medium   | high   | high   | manual | light  | high
 5 | compact    | 6   | no    | EFI     | medium   | medium | medium | manual | medium | medium
 6 | compact    | 6   | no    | 2-BBL   | medium   | medium | medium | auto   | heavy  | low
 7 | compact    | 6   | no    | EFI     | medium   | medium | high   | manual | heavy  | low
 8 | subcompact | 4   | no    | 2-BBL   | small    | high   | low    | manual | light  | high
 9 | compact    | 4   | no    | 2-BBL   | small    | high   | low    | manual | medium | medium
10 | compact    | 4   | no    | 2-BBL   | small    | high   | medium | auto   | medium | medium
11 | subcompact | 4   | no    | EFI     | small    | high   | low    | manual | light  | high
12 | subcompact | 4   | no    | EFI     | medium   | medium | medium | manual | medium | high
13 | compact    | 4   | no    | 2-BBL   | medium   | medium | medium | manual | medium | medium
14 | subcompact | 4   | yes   | EFI     | small    | high   | high   | manual | medium | high
15 | subcompact | 4   | no    | 2-BBL   | small    | medium | low    | manual | medium | high
16 | compact    | 4   | yes   | EFI     | medium   | medium | high   | manual | medium | medium
17 | compact    | 6   | no    | EFI     | medium   | medium | high   | auto   | medium | medium
18 | compact    | 4   | no    | EFI     | medium   | medium | high   | auto   | medium | medium
19 | subcompact | 4   | no    | EFI     | small    | high   | medium | manual | medium | high
20 | compact    | 4   | no    | EFI     | small    | high   | medium | manual | medium | high
21 | compact    | 4   | no    | 2-BBL   | small    | high   | medium | manual | medium | medium

Reducts
(1) cyl, fuelsys, comp, power, weight
(2) size, fuelsys, comp, power, weight
(3) size, fuelsys, displace, weight
(4) size, cyl, fuelsys, power, weight
(5) cyl, turbo, fuelsys, displace, comp, trans, weight
(6) size, cyl, fuelsys, comp, weight
(7) size, cyl, turbo, fuelsys, trans, weight
Core: fuelsys, weight
- Ziarko, W.: The discovery, analysis, and representation of data dependencies in databases, in Piatetsky-Shapiro & Frawley (eds.), Knowledge Discovery in Databases, pp.195-209, AAAI Press (1991).

Reduct & core effects on the sum of squares
[Figure: contribution to the sum of squares for the variables size, cyl, turbo, fuelsys, displace, comp, power, trans and weight]
The rough set method as a tool of data analysis
- Very good rules for understanding.
- Despite this: too many reducts; the number of reducts changes with the confidence value in the variable precision model (VPRSM); frequencies are disregarded.

Rough sets — references and characteristics
- Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers (1991).
- Ziarko, W.: Review of Basics of Rough Sets in the Context of Data Mining, Proc. Fourth International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery, pp.447-457, Tokyo (1996).
- Datalogic/R: Reduct Systems Inc.
- Characteristics of the methodology: a method for discretized (symbolic) representations; knowledge can be obtained from co-occurrence distributions; computational cost grows with the number of cases and roughly exponentially with the numbers of attributes and attribute values.
Streams in learning research II. Characteristic rules — evaluation by usefulness
- Patterns with accuracy & support; statistical estimation of generality and accuracy:
  鈴木 (1999): Simultaneous reliability evaluation of generality and accuracy for discovering characteristic rules from databases (in Japanese), 人工知能学会誌, 14, 139-147.
- Exceptions as interestingness:
  鈴木・志村 (1997): Discovery of exceptional knowledge from databases by an information-theoretic approach (in Japanese), 人工知能学会誌, 12, 305-312.
- Rating usefulness by human estimation; rule generation by a genetic algorithm:
  Terano, T. and Ishino, Y. (1996): Interactive knowledge discovery from marketing questionnaire using simulated breeding and inductive learning methods, Proc. KDD-96, 279-282.
- Market basket analysis.
Extraction of association rules (association rule mining)
[Table: eight sample sales transactions (ids 101-108), each listing the items bought]
[Table: association rules derived from the transactions with their support and confidence, e.g. support 37.5 % / confidence 100.0 %, 50.0 % / 80.0 %, 37.5 % / 75.0 %]

Apriori algorithm
- Candidate itemsets are generated level by level: first itemsets of size 1, then size 2, then size 3.
- At each level, candidates whose support is below the minimum support are excluded; only the remaining frequent itemsets are combined to form the candidates of the next level. (The example supports range from 62.5 % down to 0.0 %.)
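A compact sketch of the level-wise idea behind the two slides above: count candidate itemsets one size at a time, discard those below the minimum support, and read rules with their confidence off the surviving frequent itemsets. The transactions are invented placeholders (the item names in the original table are not legible) and the helper names are ours.

    from itertools import combinations

    transactions = [  # hypothetical market-basket data
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"milk", "juice"},
        {"bread", "milk", "juice"},
        {"butter", "milk"},
        {"bread", "butter", "milk", "juice"},
        {"juice"},
        {"bread", "milk"},
    ]
    MIN_SUPPORT = 0.30

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    # Level-wise candidate generation: (k+1)-candidates are unions of frequent k-itemsets.
    items = sorted({i for t in transactions for i in t})
    frequent = {1: [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]}
    k = 1
    while frequent[k]:
        candidates = {a | b for a in frequent[k] for b in frequent[k] if len(a | b) == k + 1}
        frequent[k + 1] = [c for c in candidates if support(c) >= MIN_SUPPORT]
        k += 1

    # Rules X -> Y from every frequent itemset, with their support and confidence.
    for level in range(2, k + 1):
        for itemset in frequent[level]:
            for r in range(1, len(itemset)):
                for lhs in map(frozenset, combinations(itemset, r)):
                    rhs = itemset - lhs
                    conf = support(itemset) / support(lhs)
                    print(f"{set(lhs)} -> {set(rhs)}: support {support(itemset):.0%}, confidence {conf:.0%}")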
Analysis of time-series (sequence) data
[Table: purchase records with customer name, transaction id and items — customers Yamada, Hirosawa, Mita, Yoshino and Haneda; items include beer, juice and cider]

- The records are grouped into one time-ordered sequence per customer, e.g. Yamada: (105 beer)(210 ...), Yoshino: (002 beer)(106 ... cider)(205 ...).
- Sequential patterns found with 40.0 % support, e.g. (beer) → (...) and (beer) → (... cider).
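Support counting for sequential patterns differs from the basket case: a customer supports a pattern such as (beer) -> (cider) only if its elements occur in that order across the customer's time-ordered transactions. A small sketch with invented sequences standing in for the partly legible table above:

    customer_sequences = {  # hypothetical, time-ordered itemsets per customer
        "Yamada":   [{"beer"}, {"juice", "cider"}],
        "Hirosawa": [{"juice"}, {"beer"}, {"bread", "cider"}],
        "Mita":     [{"beer", "cider"}],
        "Yoshino":  [{"beer"}, {"bread", "cider"}, {"juice"}],
        "Haneda":   [{"bread"}],
    }

    def supports(sequence, pattern):
        """True if the ordered pattern (a list of itemsets) is embedded in the sequence."""
        pos = 0
        for element in pattern:
            while pos < len(sequence) and not element <= sequence[pos]:
                pos += 1
            if pos == len(sequence):
                return False
            pos += 1
        return True

    def pattern_support(pattern):
        hits = sum(1 for seq in customer_sequences.values() if supports(seq, pattern))
        return hits / len(customer_sequences)

    print(pattern_support([{"beer"}, {"cider"}]))  # fraction of customers with (beer) -> (cider)
    print(pattern_support([{"juice"}, {"beer"}]))

Note that in this toy data Mita buys beer and cider in the same transaction, which does not count for the ordered pattern (beer) -> (cider).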
Introducing a taxonomy over items
[Figure: a class hierarchy (taxonomy) imposed on the items, so that rules can be mined at several levels of abstraction]

Quantitative (numeric) attributes
People
ID  | Age | Married | #Cars
100 | 23  | No      | 1
200 | 25  | Yes     | 1
300 | 29  | No      | 0
400 | 34  | Yes     | 2
500 | 38  | Yes     | 2

- Handling numeric attributes: Age is partitioned into the intervals 20-24, 25-29, 30-34 and 35-39, and each interval becomes an item; frequent itemsets are then counted as usual.
- Discretization: adjacent ranges are merged into wider ranges as long as the combined support does not exceed a max-support threshold.
- Rule derivation: rules are pruned with a rule-interest measure; the partial-completeness concept guarantees the soundness of the interval setting.
- Srikant, R. & Agrawal, R.: Mining Quantitative Association Rules in Large Relational Tables, Proc. ACM SIGMOD, pp.1-12 (1996).
- Example rules: support 40 % with confidence 100 %; support 60 % with confidence 66.6 %.
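The quantitative case reduces to the ordinary one once numeric values are treated as interval items. The sketch below (our helper names, the People table from the slide) evaluates an age condition as a range test; for the rule <Age: 30..39> and <Married: Yes> -> <NumCars: 2> it yields 40 % support and 100 % confidence, the same figures as quoted above.

    people = [  # the People table from the slide
        {"ID": 100, "Age": 23, "Married": "No",  "NumCars": 1},
        {"ID": 200, "Age": 25, "Married": "Yes", "NumCars": 1},
        {"ID": 300, "Age": 29, "Married": "No",  "NumCars": 0},
        {"ID": 400, "Age": 34, "Married": "Yes", "NumCars": 2},
        {"ID": 500, "Age": 38, "Married": "Yes", "NumCars": 2},
    ]

    def matches(row, condition):
        attribute, value = condition
        if isinstance(value, tuple):            # numeric attribute: (low, high) interval
            return value[0] <= row[attribute] <= value[1]
        return row[attribute] == value          # categorical attribute: exact match

    def rule_stats(lhs, rhs):
        n_lhs = sum(1 for r in people if all(matches(r, c) for c in lhs))
        n_both = sum(1 for r in people if all(matches(r, c) for c in lhs + rhs))
        return n_both / len(people), (n_both / n_lhs if n_lhs else 0.0)

    # <Age: 30..39> and <Married: Yes>  ->  <NumCars: 2>
    support, confidence = rule_stats([("Age", (30, 39)), ("Married", "Yes")], [("NumCars", 2)])
    print(f"support {support:.0%}, confidence {confidence:.0%}")   # 40 %, 100 %

The merged range 30..39 illustrates the interval-combination step: it is the union of the adjacent base intervals 30-34 and 35-39.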
Factor analysis by introducing virtual transactions
Customer # | Age | Sex / occupation
1          | 25  | male, student
2          | 32  | female, office worker (OL)

Customer # | Date     | Purchased item
1          | 00/00/00 | S-male  (virtual item)
1          | 00/00/00 | A-20s   (virtual item)
1          | 97/01/30 | CD-X
1          | 97/02/05 | CD-Y
2          | 00/00/00 | S-female (virtual item)
2          | 00/00/00 | A-30s    (virtual item)
2          | 97/01/15 | Video-A
2          | 97/03/03 | Video-B

- Customer attributes (sex, age bracket) are inserted into the purchase history as virtual transactions, so that ordinary association rule mining can relate customer factors to the goods bought.
- 沼尾・清水: Data mining in the distribution industry (in Japanese), 人工知能学会誌, Vol.12, No.4, pp.528-535 (1997).

A time-series data mining case
- Temperature and pressure time series are encoded symbolically (rising/falling patterns), and rules are induced from the resulting event sequences.
- Example rule: IF the pressure rises AND the temperature falls THEN an abnormality occurs (probability 80 %).
- 佐藤: report at 情報処理学会 西支部, 平成9年度 第1回 研究会 (1997).

Handling structure: analysis of WWW access histories
- Scale: about 8,300 URLs and 40,000 links; logs of roughly 19,000 users/day, 400 MB/day.
- Representation of a transaction: IP address, access time, URL.
- Conversion of each access sequence into URL pairs: (A,B), (B,C), (C,A), (A,D).
- Example rule: the ball-games page → the baseball page (球技 → 野球).
- 猪口他 (in Japanese), 人工知能学会 基礎研究会 SIG-FAI-9801-10, pp.55-60 (1998).
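The conversion step in the WWW example can be sketched in two lines: an ordered list of page requests becomes a set of consecutive URL pairs, which can then be mined like ordinary transactions. The session below is invented, and reading the pairs as consecutive visits is our interpretation of the slide.

    def to_url_pairs(visits):
        """Turn an ordered list of page visits into the set of consecutive URL pairs."""
        return {(a, b) for a, b in zip(visits, visits[1:]) if a != b}

    session = ["A", "B", "C", "A", "D"]      # one user's ordered page requests
    print(sorted(to_url_pairs(session)))     # [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'A')]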
Searching for association rules — references
- Agrawal, R. et al.: Database Mining: A Performance Perspective, IEEE Trans. on Knowledge and Data Engineering, Vol.5, No.6, pp.914-925 (1993).
- Agrawal, R. et al.: Fast Algorithms for Mining Association Rules, Proc. VLDB, pp.487-499 (1994).
- http://www.almaden.ibm.com/cs/
- 喜連川: Data mining: techniques for extracting association rules (in Japanese), 人工知能学会誌, Vol.12, No.4, pp.513-520 (1997).
- Washio, T. et al.: Mining Association Rules for Estimation and Prediction, Proc. PAKDD-98 (Lecture Notes in AI 1394), pp.417-419 (1998).
- Agrawal, R. et al.: Mining Sequential Patterns, Proc. Data Engineering, pp.3-14 (1995).
- 沼尾他: Data mining for factor analysis (in Japanese), 情報処理学会 第51回全国大会, 5E-1 (1995).
- Data mining in the distribution industry (in Japanese), 人工知能学会誌, Vol.12, No.4, pp.528-535 (1997).
- 猪口他: Extending basket analysis to structured data and applying it to communication-network data (in Japanese), 人工知能学会 SIG-FAI-9801, pp.55-60 (1998).
Graph Based Induction (GBI)
- Induction by stepwise pairwise chunking of graph nodes.
- 吉田・元田: Inductive inference based on stepwise pair merging (in Japanese), 人工知能学会誌, Vol.12, pp.58-67 (1997).

Application of GBI to the analysis of operation histories
[Table: predictive accuracy comparison of 1-NN, CART and GBI on operation-history data; reported figures include 22.6 %, 20.7 %, 22.6 %, 20.8 %, 34.6 % and 57.8 %]

Characteristics of Graph Based Induction
- Fast; can analyse structured objects.
- Applicable to concept acquisition, classification, rule learning and speeding up inference, and to sequences (DNA, proteins).
- Expressing negative conditions requires care.
- Restricted to ordered graphs; some structured objects remain difficult to handle.

Inductive logic programming: the grandparent example
- Background knowledge: parent(1,2). parent(1,3). ...
- Positive and negative examples: grandparent(1,4). grandparent(1,5). ...
- Result: grandparent(X,Y) :- parent(X,Z), parent(Z,Y).
Search in the version space
- Hypotheses lie between over-general and over-specific clauses; the search is guided by the positive and negative examples and by the covering set of grandparent(X,Y).
- Refinement operators: adding a new literal to the clause body; instantiating variables with constants.
- Hypotheses are selected by the minimum description length principle.
- FOIL (Quinlan, 1990): best-first search guided by an entropy measure.
- Progol (Muggleton, 1995): the search space is reduced by inverse entailment.
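What "coverage" means during this search can be shown with the grandparent example from the earlier slide: a candidate clause is scored by how many positive and how many negative examples it entails given the background facts (FOIL's gain is computed from exactly these counts). A minimal sketch; the family facts and the negative pairs are invented, and only the clause follows the slide.

    # Background knowledge: parent(X, Y) facts (an invented family tree).
    parents = {(1, 2), (1, 3), (2, 4), (3, 5)}

    positive = {(1, 4), (1, 5)}          # grandparent(1,4), grandparent(1,5)
    negative = {(1, 2), (4, 1), (2, 5)}  # pairs that are not grandparent pairs

    def clause_covers(x, y):
        """Candidate clause: grandparent(X, Y) :- parent(X, Z), parent(Z, Y)."""
        middles = {b for _, b in parents}
        return any((x, z) in parents and (z, y) in parents for z in middles)

    covered_pos = sum(1 for e in positive if clause_covers(*e))
    covered_neg = sum(1 for e in negative if clause_covers(*e))
    print(f"covers {covered_pos}/{len(positive)} positive and {covered_neg}/{len(negative)} negative examples")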
Discriminating mutagenic substances with Progol
- 230 nitroaromatic compounds: Ames test positive 138 / negative 92 (Debnath et al., J. Med. Chem. 34, 786-797, 1991).
- 188 of the compounds had been analysed by multiple regression; Progol was run on the 188 (12 h) and on the remaining 42 (6 h), evaluated by split analysis.
- Representation: atm(compound, atom, element, type, charge), bond(compound, atom1, atom2, bondtype).
- About nine rules, with classification accuracy comparable to the regression analysis.
- Indicator substructures (the phenanthrene skeleton, the exceptional acetylene) were found automatically.
- Difficulties: the system is hard to use and computationally heavy.
- King, R.D. et al.: Relating chemical activity to structure: an examination of ILP successes, New Generation Computing, Vol.13, pp.411-433 (1995).
Inductive logic programming — references
- 人工知能学会誌 mini special issue on inductive logic programming, Vol.12, No.5, pp.654-688 (1997).
- Lavrac, N. & Dzeroski, S.: Inductive Logic Programming: Techniques and Applications, Ellis Horwood, Hertfordshire (1994).
- Dzeroski, S.: Inductive Logic Programming and Knowledge Discovery in Databases, in Fayyad et al. (eds.), Advances in Knowledge Discovery and Data Mining, pp.117-152, AAAI Press (1996).
- Quinlan, J.R.: Learning Logical Definitions from Relations, Machine Learning, Vol.5, pp.239-266 (1990).
- Muggleton, S.: Inductive Logic Programming, New Generation Computing, Vol.8, pp.295-318 (1991); Inverse Entailment and Progol, ibid., Vol.13, pp.245-286 (1995).
- King, R.D. et al.: Relating Chemical Activity to Structure: an examination of ILP successes, ibid., Vol.13, pp.411-433 (1995).
- http://gruffle.comlab.ox.ac.uk/oucl/groups/machlearn/

Reference materials
- 人工知能学会誌 special issue: knowledge acquisition from very large databases, Vol.12, No.4 (1997).
- Mitchell, T.: Machine Learning, McGraw-Hill (1997).
- Michalski, R., Bratko, I. & Kubat, M.: Machine Learning and Data Mining: Methods and Applications, John Wiley & Sons (1998).
- Berry, M.J.A. & Linoff, G.: Data Mining Techniques for Marketing, Sales, and Customer Support, John Wiley & Sons (1997).
- Adriaans, P. & Zantinge: Data Mining, Addison-Wesley (1996); Japanese translation by 山本・梅村, 共立 (1998).
- Groth, R.: Data Mining: A Hands-On Approach for Business Professionals, Prentice Hall PTR (1997).
- Quinlan, J.R.: C4.5: Programs for Machine Learning, Morgan Kaufmann (1993); Japanese translation by 古川: AIによるデータ解析 (1995).
- Piatetsky-Shapiro: http://www.kdnuggets.com/ and its mailing list.
- IBM Intelligent Miner: http://www.software.ibm.com/data/iminer/index.html
- SAS Enterprise Miner: http://www.sas.com/software/components/miner.html
- SGI MineSet: http://www.sgi.com/Products/software/MineSet/
- 関西学院大学 情報処理研究センター (data mining pages): http://www.clab.kwansei.ac.jp/mining/index.html
[Section divider slide: title not recoverable]

Rule induction as a data analysis tool
- Rules accurate? Yes.
- Software available? Yes.
- Computing fast? Yes.
- Easy understanding? Yes.
- Popular? No.
Possible reasons
- Conservative users
- Unix environment
- No familiar examples
- Too many methods
- Too many rules; self-evident rules
- Impressions: ad hoc methods; merely exploratory

Response of users, judged from the expected results
- Regression by a few variables: TSS = ESS + RSS (100 % = 99 % + 1 %); the hypothesis is confirmed, so the user is satisfied — with a datascape.
- Rule induction: a few simple rules with average accuracy 99 % and total coverage 99 %; but the rules are self-evident, so the user is unsatisfied — without a datascape.
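The decomposition quoted for the regression case can be checked numerically; the data below are made up purely to illustrate the identity TSS = ESS + RSS and the "share of variation explained" that makes the regression user feel satisfied.

    import numpy as np

    # Made-up data: one explanatory variable, one response.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    a, b = np.polyfit(x, y, 1)               # least-squares fit y ~ a*x + b
    y_hat = a * x + b

    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    ess = np.sum((y_hat - y.mean()) ** 2)    # explained (regression) sum of squares
    rss = np.sum((y - y_hat) ** 2)           # residual sum of squares

    print(tss, ess + rss)                    # equal up to rounding: TSS = ESS + RSS
    print(ess / tss)                         # share of variation explained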
What is Datascape?
- Quantification: quantification of the problem and of the solution.
- Multiple data dependencies: explanation from plural viewpoints; correlation among explanation variables.
- Concise and levelwise-deepening descriptions.
- Views of the solution: inspection of individual data; the surroundings of the solution.

Answers to Datascape by the cascade model
- Quantification by SS (SS: sum of squares).
- Data dependencies: detection of local interactions.
- A unified mechanism for discrimination rules and characteristic rules.
- Levelwise creation of rule sets.

Problem in decision tree 1
- Heuristic search is used to get the best tree.
[Figure: alternative trees over attributes A (a1/a2), B (b1/b2) and C (c1/c2), with class labels p and n at the leaves]

Problem in