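Pervasive Machine Learning (普适机器学习)
Zhi-Hua Zhou (周志华)
http:/

Slide 4: The Importance of Machine Learning

Slide 5: The Importance of Machine Learning
- Scientists at NASA's JPL laboratory wrote in Science (September issue) that machine learning is playing an increasingly large supporting role across the entire process of scientific research, and that the field will see steady and rapid progress over the next several years.

Slide 6: Example 1: Network Security
Intrusion detection:
- Is this an intrusion? What kind of intrusion?
How can it be detected?
- Historical data: past normal access patterns and how they manifest; past intrusion patterns and how they manifest
- Classify the current access pattern
This is a typical machine learning problem.
Common techniques: neural networks, decision trees, support vector machines, Bayesian classifiers, k-nearest neighbor, sequence analysis, clustering

Slide 7: Example 2: Bioinformatics
Common techniques: neural networks, support vector machines, hidden Markov models, Bayesian classifiers, k-nearest neighbor, decision trees, sequence analysis, clustering

Slide 8: Example 3: Search Engines
Google's success has turned Internet search engines into a new industry.
Not only have many dedicated search-engine companies appeared (for Chinese-language search alone there are 慧聪 (Huicong), baidu, and others), but giants such as Microsoft have also begun investing heavily in research and development.
Google's first fortune originated from the PageRank algorithm proposed by its founders Larry Page and Sergey Brin.
Machine learning techniques are underpinning all kinds of search engines (Bayesian learning techniques in particular).

Slide 9: The US PAL Program
- DARPA launched the PAL (Perceptive Assistant that Learns) program
- A 5-year program; first-phase (1-1.5 years) investment of US$29 million
- A program with machine learning at its core (it also involves other branches of AI, such as knowledge representation and reasoning, and natural language processing); it contains 2 sub-programs
- Goals:
  - "is expected to yield new technology of significant value to the military, business, and academic sectors"
  - "develop software that will help decision-makers manage their complex worlds of multiple simultaneous tasks and unexpected events"

Slide 10: The US PAL Program: RADAR Sub-program
RADAR (Reflective Agents with Distributed Adaptive Reasoning), carried out by CMU, first phase US$7 million
Goals:
- "the system will help busy managers to cope with time-consuming tasks"
- "RADAR must learn by interacting with its human master and by accepting explicit advice and instruction"

Slide 11: The US PAL Program: CALO Sub-program (1)
CALO (Cognitive Agent that Learns and Observes), carried out by SRI, first phase US$22 million
Besides SRI, 20 organizations take part in this sub-program: Boeing, CMU, Dejima Inc., Fetch Tech Inc., GATech, MIT, Oregon HSU, Stanford, SUNY-Stony Brook, UC Berkeley, UMass, UMich, UPenn, Rochester, USC, UT Austin, UW, Yale,
CALO is clearly the more central part of PAL.

Slide 12: The US PAL Program: CALO Sub-program (2)
Goals:
- "the name CALO was inspired by the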
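  Latin word calonis, which means soldier's assistant"
- "the CALO software, which will learn by working with and being advised by its users, will handle a broad range of interrelated decision-making tasks ... It will have the capability to engage in and carry out routine tasks, and to assist when the unexpected happens"
Judging from CALO's goals, DARPA has begun to treat the importance of machine learning technology as a matter of national security.

Slide 13: The US PAL Program: CALO Sub-program (3)

Slide 14: Historical Review (1)
The following events (roughly) mark machine learning formally becoming a discipline:
- 1983: R.S. Michalski et al. publish the book Machine Learning: An Artificial Intelligence Approach
- 1986: the journal Machine Learning is founded
Compared with many other branches of artificial intelligence, and of computer science more broadly, machine learning is still very young and quite immature.
Take Tom Mitchell's classic textbook (McGraw Hill, 1997) as an example: it is hard to find in it the kind of coherent, unified system seen in textbooks of the foundational sciences (such as mathematics or physics), and a reader may come away feeling it is merely a pile of different methods and techniques.

Slide 15: Historical Review (2)
Development of the main paradigms:
- Before the mid-1980s: symbolism; representative: ILP
  Deeply influenced by traditional artificial intelligence research and grounded in logical reasoning
- Mid-1980s to early 1990s: connectionism; representative: NN
  The criticism of traditional artificial intelligence: "it looks elegant, but it cannot solve practical problems"
  Each branch of AI in fact gave its own response to this criticism; ML's response was that connectionism gained attention
  NN is not elegant (at least its theoretical framework is far less elegant than ILP's), but it solved many practical problems

Slide 16: Historical Review (3)
- Mid-1990s to the present: statistical learning; representative: SVM
  Although NN solved many problems, its trial-and-error nature drew the criticism of being a "trick"
  In response, statistical learning came to dominate. SVM still has a trial-and-error element, but its theoretical foundation is far more elegant than NN's (in fact, statistical learning and connectionism share the same lineage)
- Now: ?
  Statistical learning is not a cure-all; there are many problems it cannot solve (or cannot solve well), such as learning from structured data
  In response, combining logic-based symbolism with statistical learning is beginning to gain attention

Slide 17: An Apparent Trend: "Pervasive Machine Learning"
The development of the main paradigms shows that ML is in fact an application-driven discipline, whose fundamental driving force is "solving more practical problems, and solving them better".
Owing to rapid progress in recent years, machine learning has acquired a certain ability to solve practical problems, and it seems to be gradually becoming a fundamental, transparent "supporting technology, service technology".
- Fundamental: it is applied across many disciplines ("ubiquitous")
- Transparent: users do not see machine learning; what they see is the firewall, the bioinformatics tool, the search engine ("ubiquitous"); "the machine has become easier to use" (as some descriptions of CALO put it: "you won't leave home without it"; "embodied as a software environment that transcends workstations, PDAs, cell phones,")

Slide 18: Challenges and Opportunities
As a supporting and service technology, "pervasive machine learning" brings both challenges and opportunities:
- Many problems have emerged that traditional ML research overlooked, yet that are very important and still lack good solutions (a few examples follow, with healthcare and finance as representatives)
- The more disciplines ML supports and serves, the more new problems arise
- ML now intersects with many disciplines, and such interdisciplinary areas are exactly where much can be achieved

Slide 19: Example 1: Cost Sensitivity
- Healthcare: in breast cancer diagnosis, "the cost of misdiagnosing a patient as healthy" differs from "the cost of misdiagnosing a healthy person as a patient"
- Finance: in credit card fraud detection, "the cost of mistaking fraudulent use for normal use" differs from "the cost of mistaking normal use for fraudulent use"
- Traditional ML techniques essentially assume equal costs
How should cost sensitivity be handled? No ready-made answer can be found in the textbooks, for example:
- Tom Mitchell, Machine Learning, McGraw-Hill, 1997
- Nils J. Nilsson, Introduction to Machine Learning, draft 1996-

Slide 20: Example 2: Imbalanced Data
- Healthcare: in breast cancer diagnosis, "healthy" samples far outnumber "patient" samples
- Finance: in credit card fraud detection, "normal use" samples far outnumber "fraudulent use" samples
- Traditional ML techniques essentially assume balanced data
How should data imbalance be handled? No ready-made answer can be found in the textbooks.

Slide 21: Example 3: Comprehensibility
- Healthcare: in breast cancer diagnosis, one must explain to the patient "why this diagnosis was made"
- Finance: in credit card fraud detection, one must explain to the security department "why this card is being used fraudulently"
- Traditional ML techniques essentially consider only generalization, not comprehensibility
How should comprehensibility be handled? No ready-made answer can be found in the textbooks.

Slide 22: Toward Pervasive Machine Learning
- Treat machine learning truly as a supporting technology and a service technology: consider what different disciplines need from machine learning, identify the common problems that must be solved, and then set about studying them
- On the one hand, this can advance and enrich the development of ML itself; on the other, it can advance the disciplines that use ML techniques
- As an "applied foundation", it differs fundamentally from "ML applications":
  - Fundamental: the aim is not to build applications directly, but to develop the methods and techniques needed for "broader applications" or "more successful applications"
  - Broad: the focus is not on solving the problems faced by a single application, but on solving the common problems faced by many application areas

Slide 23: Acknowledgments
- Prof. Mingsheng Ying (应明生): the comparison with foundational-science textbooks
- Prof. Jue Wang (王珏): many inspiring discussions

Slide 24
Comments and corrections are welcome!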
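The cost-sensitivity and imbalanced-data examples from slides 19 and 20 can be made concrete with a small sketch. It is not a method taken from the slides: it assumes a classifier that already outputs a probability of the "positive" class (patient, fraudulent use), and it applies the standard minimum-expected-cost decision rule. The function names, the cost values, and the inverse-frequency class weighting are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def cost_threshold(cost_fn: float, cost_fp: float) -> float:
    """Probability above which predicting 'positive' has the lower expected cost.

    cost_fn: cost of missing a true positive (e.g. patient judged healthy).
    cost_fp: cost of a false alarm (e.g. healthy person judged a patient).
    Both are hypothetical, user-supplied numbers.
    """
    return cost_fp / (cost_fp + cost_fn)


def min_cost_decision(p_positive: float, cost_fn: float, cost_fp: float) -> int:
    """Return 1 (predict positive) if that choice has the lower expected cost."""
    expected_cost_if_negative = p_positive * cost_fn          # risk of a miss
    expected_cost_if_positive = (1.0 - p_positive) * cost_fp  # risk of a false alarm
    return 1 if expected_cost_if_positive < expected_cost_if_negative else 0


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights n / (k * count), a common heuristic for imbalance."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


if __name__ == "__main__":
    # With equal costs, a 20% fraud probability is treated as normal use.
    print(min_cost_decision(0.2, cost_fn=1.0, cost_fp=1.0))   # 0
    # If a miss is assumed to cost 10x a false alarm, the same case is flagged.
    print(min_cost_decision(0.2, cost_fn=10.0, cost_fp=1.0))  # 1
    # Nine "normal" samples and one "fraud" sample: the rare class is up-weighted.
    print(balanced_class_weights(["normal"] * 9 + ["fraud"]))
```

With equal costs the rule reduces to the usual 0.5 threshold, which is the "same cost" assumption the slides attribute to traditional techniques; unequal costs only move the threshold, and class weights only rescale how much each sample counts during training. Neither addresses comprehensibility.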