
Artificial Intelligence: Learning, Lecture Notes (人工智能学习讲义.ppt)

Uploader: a199****6536 · Document ID: 13308656 · Uploaded: 2026-02-26 · Format: PPT · Pages: 57 · Size: 1.76 MB
Artificial Intelligence: Learning

Supervised learning
- Formal (parametric) representation
- Classification
- Regression

Classification
We are given a set of N observations \((x_i, y_i)\), \(i = 1, \dots, N\).
We need to map \(x \in X\) to a label \(y \in Y\).
Examples: [figure not preserved]

Decision trees
Textbook, Section 18.3

Learning decision trees
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. Wait Estimate: estimated waiting time (0-10, 10-30, 30-60, >60 minutes)

Attribute-based representations
The following 12 examples are described by these 10 attributes; the attribute values are Boolean, discrete, or continuous.
E.g., situations where I will/won't wait for a table: [table not preserved]
The classification of each example is positive (T) or negative (F).

Decision trees
One possible representation for hypotheses.
E.g., here is the "true" tree for deciding whether to wait: [figure not preserved]

Decision tree learning

Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, each row of the truth table corresponds to one path from the root to a leaf: [figure not preserved]
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
We need to find a more compact decision tree.

Decision tree learning
Goal: find a small decision tree that is consistent with the training examples.
Idea: (recursively) choose the best attribute as the root of the (sub)tree.

Choosing an attribute
Idea: a good attribute splits the examples into subsets that are ideally "all positive" or "all negative". [figure not preserved]
Patrons? is a better choice.

Using information theory
Information theory is used to implement the Choose-Attribute function of the DTL algorithm.
Information content (entropy): for a training set containing p positive examples and n negative examples,
\[ I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} \]
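The entropy formula for a set with p positive and n negative examples can be checked numerically. This is a minimal sketch; the function name `information_content` is our own, and it assumes the usual convention \(0 \log_2 0 = 0\), so a pure set has zero entropy.

```python
import math


def information_content(p: int, n: int) -> float:
    """Entropy I(p/(p+n), n/(p+n)) in bits of a set with p positive and n negative examples."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count > 0:  # convention: 0 * log2(0) = 0, so an empty class contributes nothing
            q = count / total
            bits -= q * math.log2(q)
    return bits


if __name__ == "__main__":
    print(information_content(6, 6))  # 1.0 (evenly split, i.e. I(6/12, 6/12) = 1 bit)
    print(information_content(4, 0))  # 0.0 (a pure set needs no further information)
    print(round(information_content(2, 4), 4))  # 0.9183
```

An evenly split set needs a full bit to classify, and the entropy falls toward zero as one class dominates.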
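Information gain
Any attribute A divides the training set E into subsets \(E_1, \dots, E_v\) according to their values for A, where A can take v distinct values. If subset \(E_i\) contains \(p_i\) positive and \(n_i\) negative examples, the expected information still needed after testing A is
\[ \mathrm{remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right) \]
The information gain (IG) from the attribute test is the difference between the original information requirement and the new one:
\[ IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathrm{remainder}(A) \]
Choose the attribute with the largest IG.

Information gain: example
For the training set, p = n = 6, so \(I(6/12, 6/12) = 1\) bit.
Consider the attributes Patrons and Type (and others too): [figure not preserved]
Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.

Example, continued
Decision tree learned from the 12 examples: [figure not preserved]
It is substantially simpler than the "true" tree shown earlier.

Performance evaluation
How do we know that \(h \approx f\)?
1. Use theorems of computational/statistical learning theory.
2. Try h on a new test set of examples (drawn from the same distribution over the example space as the training set).
Learning curve = % correct on the test set as a function of training set size.

Remarks on decision-tree-based classification
Advantages:
- Inexpensive to construct
- Fast at classifying unknown records
- Easy to interpret for small trees
- On simple data sets, accuracy is comparable to other classification algorithms
Example: C4.5, a simple depth-first construction that uses information gain.

K-nearest-neighbor classifier
Textbook, Section 20.4

Linear predictions

Learning framework

Focus of this part
- Binary classification (e.g., predicting spam or not spam)
- Regression (e.g., predicting housing prices)

Classification
Classification = learning from data with finite, discrete labels. It is the dominant problem in machine learning.

Linear classifiers
Binary classification can be viewed as the task of separating classes in feature space: [figure not preserved]

Roadmap

Linear classifiers
\(h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^T\mathbf{x} + b)\)
We need to find a suitable \(\mathbf{w}\) (direction) and \(b\) (location) of the decision boundary.
We want to minimize the expected zero/one loss of the classifier \(h: X \to Y\), \(h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^T\mathbf{x} + b)\).
Ideally, the classes are separated perfectly.

Linear classifiers: loss minimization
Ideally, we would like to find a classifier \(h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^T\mathbf{x} + b)\) that minimizes the 0/1 loss.
Unfortunately, this is a hard problem.
Surrogate loss functions: [figure not preserved]

Learning as optimization

Least squares classification
Least squares loss function:
\[ L(\mathbf{w}, b) = \sum_{i=1}^{N} \left(\mathbf{w}^T\mathbf{x}_i + b - y_i\right)^2 \]
Goal: learn a classifier \(h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^T\mathbf{x} + b)\) that minimizes the least squares loss.

Least squares classification: solution

Solution for W

General linear classification

Regression
Regression = learning from continuously labeled data.

Linear regression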
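The information-gain comparison between Patrons and Type can be sketched in Python. The per-value (positive, negative) counts below are assumptions chosen to resemble a 12-example set with p = n = 6 like the restaurant data (the example table itself is not reproduced in these notes), and the helper names are hypothetical.

```python
import math

Counts = tuple[int, int]  # (positive examples, negative examples) in one subset E_i


def information_content(p: int, n: int) -> float:
    """Entropy I(p/(p+n), n/(p+n)) in bits; 0 * log2(0) is taken as 0."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count > 0:
            q = count / total
            bits -= q * math.log2(q)
    return bits


def remainder(subsets: list[Counts]) -> float:
    """Expected entropy after the split: sum_i (p_i+n_i)/(p+n) * I(p_i/(p_i+n_i), n_i/(p_i+n_i))."""
    total = sum(p_i + n_i for p_i, n_i in subsets)
    return sum(
        (p_i + n_i) / total * information_content(p_i, n_i)
        for p_i, n_i in subsets
        if p_i + n_i > 0
    )


def information_gain(subsets: list[Counts]) -> float:
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A), with p and n summed over the subsets."""
    p = sum(p_i for p_i, _ in subsets)
    n = sum(n_i for _, n_i in subsets)
    return information_content(p, n) - remainder(subsets)


# Hypothetical per-value counts for a 12-example set with p = n = 6 (assumed, see lead-in).
PATRONS = [(0, 2), (4, 0), (2, 4)]        # None, Some, Full
TYPE = [(1, 1), (1, 1), (2, 2), (2, 2)]   # French, Italian, Thai, Burger

if __name__ == "__main__":
    print(f"IG(Patrons) = {information_gain(PATRONS):.3f} bits")
    print(f"IG(Type)    = {information_gain(TYPE):.3f} bits")
```

With these assumed counts, Patrons gains about 0.541 bits while Type gains essentially nothing, because every Type value leaves the classes evenly mixed.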
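The recursive DTL idea (choose the attribute with the largest IG as the root, split, and recurse on each subset) can be sketched as an ID3-style learner. This is an illustration under our own assumptions rather than the textbook's pseudocode verbatim: examples are dicts with a `label` key, the hypothetical `domains` argument lists every value an attribute may take, and the six training examples are made up.

```python
import math
from collections import Counter
from typing import Any

Example = dict[str, Any]  # attribute name -> value, plus the class under the key "label"
Tree = Any                # a leaf label, or a tuple (attribute, {value: subtree})


def entropy(examples: list[Example]) -> float:
    """Entropy (bits) of the label distribution in a non-empty list of examples."""
    counts = Counter(e["label"] for e in examples)
    total = len(examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def information_gain(examples: list[Example], attribute: str) -> float:
    """Entropy before the split minus the weighted entropy of the non-empty subsets."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder


def plurality(examples: list[Example]) -> Any:
    """Most common label (ties broken by first occurrence)."""
    return Counter(e["label"] for e in examples).most_common(1)[0][0]


def dtl(examples: list[Example], attributes: list[str], default: Any,
        domains: dict[str, list[Any]]) -> Tree:
    """ID3-style recursive decision tree learning."""
    if not examples:                 # no data reaches this branch: use the parent's majority
        return default
    labels = {e["label"] for e in examples}
    if len(labels) == 1:             # all examples agree: make a leaf
        return labels.pop()
    if not attributes:               # attributes exhausted but labels still mixed
        return plurality(examples)
    best = max(attributes, key=lambda a: information_gain(examples, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in domains[best]:      # one branch per possible value, even if unseen
        subset = [e for e in examples if e[best] == value]
        branches[value] = dtl(subset, remaining, plurality(examples), domains)
    return (best, branches)


def classify(tree: Tree, example: Example) -> Any:
    """Follow attribute tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


# A tiny hypothetical training set (not the textbook's 12 examples).
DOMAINS = {"Patrons": ["None", "Some", "Full"], "Hungry": ["Yes", "No"]}
EXAMPLES = [
    {"Patrons": "None", "Hungry": "No", "label": False},
    {"Patrons": "None", "Hungry": "Yes", "label": False},
    {"Patrons": "Some", "Hungry": "No", "label": True},
    {"Patrons": "Some", "Hungry": "Yes", "label": True},
    {"Patrons": "Full", "Hungry": "Yes", "label": True},
    {"Patrons": "Full", "Hungry": "No", "label": False},
]

if __name__ == "__main__":
    tree = dtl(EXAMPLES, ["Patrons", "Hungry"], default=False, domains=DOMAINS)
    print(tree)
    # ('Patrons', {'None': False, 'Some': True, 'Full': ('Hungry', {'Yes': True, 'No': False})})
```

Because this toy set has no conflicting examples, the learned tree classifies every training example correctly.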
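The k-nearest-neighbor classifier listed above can be sketched in a few lines: keep the training set and label a query by majority vote among its k closest training points. Euclidean distance, the default k = 3, and the toy points are assumptions for illustration.

```python
import math
from collections import Counter
from collections.abc import Sequence

Point = Sequence[float]


def knn_predict(train_x: list[Point], train_y: list[str], query: Point, k: int = 3) -> str:
    """Majority label among the k training points closest to `query` (Euclidean distance)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort (distance, label) pairs; ties in distance fall back to comparing labels.
    neighbors = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated classes.
TRAIN_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
TRAIN_Y = ["A", "A", "A", "B", "B", "B"]

if __name__ == "__main__":
    print(knn_predict(TRAIN_X, TRAIN_Y, (0.5, 0.5)))  # A
    print(knn_predict(TRAIN_X, TRAIN_Y, (5.5, 5.2)))  # B
```

There is no training step beyond storing the data; all the work happens at prediction time.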
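The linear classifier \(h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^T\mathbf{x} + b)\) and its empirical 0/1 loss (the fraction of misclassified examples) can be written directly. The weights, bias, and points are made up, and sending a score of exactly 0 to +1 is an arbitrary tie-breaking assumption.

```python
from collections.abc import Sequence


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """h(x) = sign(w^T x + b); a score of exactly 0 is assigned to +1 (tie-breaking assumption)."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 1 if score >= 0 else -1


def zero_one_loss(w, b, xs, ys) -> float:
    """Empirical 0/1 loss: fraction of examples with h(x_i) != y_i."""
    mistakes = sum(1 for x, y in zip(xs, ys) if predict(w, b, x) != y)
    return mistakes / len(xs)


# Hypothetical boundary x1 + x2 = 1 and four labeled points (labels in {-1, +1}).
W, B = (1.0, 1.0), -1.0
XS = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (0.2, 0.2)]
YS = [-1, 1, 1, 1]

if __name__ == "__main__":
    print(zero_one_loss(W, B, XS, YS))  # 0.25: only (0.2, 0.2) lands on the wrong side
```

The 0/1 loss is piecewise constant in \(\mathbf{w}\) and \(b\), so it gives no gradient signal; this is why surrogate losses are used for optimization.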
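Because the least squares loss is quadratic in \((\mathbf{w}, b)\), its minimizer solves the normal equations \(\tilde{X}^T\tilde{X}\,\theta = \tilde{X}^T\mathbf{y}\), where each row of \(\tilde{X}\) is \((\mathbf{x}_i^T, 1)\) and \(\theta = (\mathbf{w}, b)\). The following sketch assumes labels in {-1, +1}, solves the system with plain Gaussian elimination (standard library only, no NumPy), and uses a hypothetical toy dataset. It is not necessarily how the slides derive the solution.

```python
def solve(a: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve the square system a @ t = rhs by Gaussian elimination with partial pivoting."""
    size = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix [a | rhs]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    t = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        t[r] = (m[r][size] - sum(m[r][c] * t[c] for c in range(r + 1, size))) / m[r][r]
    return t


def fit_least_squares(xs, ys):
    """(w, b) minimizing sum_i (w^T x_i + b - y_i)^2, via the normal equations."""
    rows = [list(x) + [1.0] for x in xs]  # augmented inputs (x_i, 1), so b is the last weight
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    moment = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    theta = solve(gram, moment)
    return theta[:-1], theta[-1]


def classify(w, b, x) -> int:
    """h(x) = sign(w^T x + b), with ties sent to +1."""
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) + b >= 0 else -1


# Hypothetical training data: labels are -1 near the origin and +1 near (3.5, 3.5).
XS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
YS = [-1, -1, -1, 1, 1, 1]

if __name__ == "__main__":
    w, b = fit_least_squares(XS, YS)
    print("w =", w, "b =", b)
    print([classify(w, b, x) for x in XS])  # [-1, -1, -1, 1, 1, 1]
```

Training is a single linear solve (the closed-form solution), which is the main appeal of least squares classification despite it not being the best loss for classification.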
General linear/polynomial regression

Model complexity and overfitting
- Underfitting: high bias
- Overfitting: high variance

Prediction errors
- Training errors (apparent errors): errors committed on the training set
- Test errors: errors committed on the test set
- Generalization errors: the expected error of a model over random selection of records from the same distribution (i.e., the expected error on unseen records)

Model complexity and overfitting
- Underfitting: when the model is too simple, both training and test errors are large.
- Overfitting: when the model is too complex, the training error is small but the test error is large.

Incorporating model complexity
Rationale: Ockham's razor.
Given two models with similar generalization errors, one should prefer the simpler model over the more complex one.
A complex model has a greater chance of being fitted accidentally to errors in the data, i.e., it is more easily misled by erroneous data.
Therefore, model complexity should be taken into account when evaluating a model.

Regularization
Intuition: small parameter values → "simpler" hypothesis → less prone to overfitting.

L2 and L1 regularization
- L2: easy to optimize; closed-form solution
- L1: sparsity

More than two classes?

Remarks on least squares classification
Not the best approach to classification.
But:
- easy to train, with a closed-form solution
- can be combined with many classical learning principles

Cross-validation
Basic idea: if a model overfits (is sensitive to the training data), then it is unstable, i.e., removing part of the data changes the fit significantly.
So we first hold out part of the data, fit the model on the remaining data, and then test it on the held-out data.

Learning framework

Model/parameter learning paradigm
- Choose a model class: NB, kNN, decision tree, loss/regularization combination
- Model selection: cross-validation
- Training: optimization
- Testing

Summary
Supervised learning
(1) Classification: naive Bayes model, decision tree, least squares classification
(2) Regression: least squares regression

Exercise
Prove that for any training set without conflicting data (i.e., examples with identical feature vectors but different labels), there must exist a decision tree consistent with the training set (i.e., with zero training error).
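L2 regularization keeps the closed-form solution: with polynomial features \(\phi(x) = (1, x, \dots, x^d)\), the penalized least squares minimizer solves \((\Phi^T\Phi + \lambda I')\,\theta = \Phi^T\mathbf{y}\). In this sketch \(I'\) is the identity with its intercept entry zeroed, i.e., we assume the constant term is not penalized (a common choice that the slides do not specify). The data are synthetic and the function names are hypothetical.

```python
def solve(a: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve the square system a @ t = rhs by Gaussian elimination with partial pivoting."""
    size = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix [a | rhs]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    t = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        t[r] = (m[r][size] - sum(m[r][c] * t[c] for c in range(r + 1, size))) / m[r][r]
    return t


def fit_ridge(xs: list[float], ys: list[float], degree: int, lam: float) -> list[float]:
    """Coefficients c_0..c_d minimizing sum_i (sum_p c_p x_i^p - y_i)^2 + lam * sum_{p>=1} c_p^2."""
    phi = [[x ** p for p in range(degree + 1)] for x in xs]  # row i = (1, x_i, ..., x_i^d)
    d = degree + 1
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(d)] for i in range(d)]
    for i in range(1, d):  # penalize every coefficient except the intercept c_0 (assumption)
        gram[i][i] += lam
    moment = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(d)]
    return solve(gram, moment)


def predict(coeffs: list[float], x: float) -> float:
    return sum(c * x ** p for p, c in enumerate(coeffs))


def mse(coeffs: list[float], xs: list[float], ys: list[float]) -> float:
    """Mean squared error of the polynomial on the given data."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic noise-free data on the line y = 1 + 2x (assumed for illustration).
XS = [i / 10 for i in range(11)]
YS = [1 + 2 * x for x in XS]

if __name__ == "__main__":
    for lam in (0.0, 1.0, 10.0):
        coeffs = fit_ridge(XS, YS, degree=1, lam=lam)
        print(f"lam={lam:<5} coeffs={[round(c, 3) for c in coeffs]} train MSE={mse(coeffs, XS, YS):.4f}")
```

Increasing `lam` shrinks the slope toward zero and raises the training error: on this line λ = 0 recovers the coefficients (1, 2) exactly, while λ = 1 gives a slope of 2.2/2.1 ≈ 1.048. This is the trade of a little bias for lower variance that the regularization slide describes.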
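Repeating the hold-out idea over k folds gives k-fold cross-validation. The sketch below shuffles the indices into k folds, fits on the other k-1 folds, and accumulates squared error on each held-out fold. Comparing a constant (mean) model with a straight line shows how this drives model selection. The fold scheme, the two model classes, and the data are assumptions for illustration.

```python
import random
from collections.abc import Callable, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(xs: Sequence[float], ys: Sequence[float],
                   fit: Callable, predict: Callable, k: int = 5) -> float:
    """Mean squared error on held-out data: each fold is tested once by a model fitted on the rest."""
    total_sq_error = 0.0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)  # fit on the remaining data only
        total_sq_error += sum((predict(model, xs[i]) - ys[i]) ** 2 for i in fold)
    return total_sq_error / len(xs)


# Two hypothetical model classes: a constant (the training mean) and a straight line.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope  # (intercept, slope)


def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x


# Synthetic data: y = 2x + 1 plus a small alternating perturbation.
XS = [float(i) for i in range(10)]
YS = [2 * x + 1 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(XS)]

if __name__ == "__main__":
    print("CV error, mean model:", cross_validate(XS, YS, fit_mean, predict_mean))
    print("CV error, line model:", cross_validate(XS, YS, fit_line, predict_line))
```

The model class with the lower cross-validated error (here the line) would be selected, then retrained on all the data before final testing.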