Lecture 6: Maximum Likelihood Estimation
The Likelihood Function and Identification of the Parameters

1. Representation of the likelihood function

In a random sample of $n$ observations, each observation has density function $f(y_i \mid \theta)$. Because the $n$ observations are independent, their joint density is
$$f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta).$$
This function is called the likelihood function, usually written $L(\theta \mid y)$ or $L(\theta; y)$.

Comparison with the definition in Greene's text:

The probability density function, or pdf, for a random variable $y$, conditioned on a set of parameters, $\theta$, is denoted $f(y \mid \theta)$. This function identifies the data generating process that underlies an observed sample of data and, at the same time, provides a mathematical description of the data that the process will produce. The joint density of $n$ independent and identically distributed (iid) observations from this process is the product of the individual densities;
$$f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta) = L(\theta \mid y). \quad (17\text{-}1)$$
This joint density is the likelihood function, defined as a function of the unknown parameter vector, $\theta$, where $y$ is used to indicate the collection of sample data. Note that we write the joint density as a function of the data conditioned on the parameters, whereas when we form the likelihood function, we write this function in reverse, as a function of the parameters, conditioned on the data. Though the two functions are the same, it is to be emphasized that the likelihood function is written in this fashion to highlight our interest in the parameters and the information about them that is contained in the observed data. However, it is understood that the likelihood function is not meant to represent a probability density for the parameters as it is in Section 16.2.2. In this classical estimation framework, the parameters are assumed to be fixed constants which we hope to learn about from the data.

It is usually simpler to work with the log of the likelihood function:
$$\ln L(\theta \mid y) = \sum_{i=1}^n \ln f(y_i \mid \theta). \quad (17\text{-}2)$$
Again, to emphasize our interest in the parameters, given the observed data, we denote this function $\ln L(\theta \mid y)$. The likelihood function and its logarithm, evaluated at $\hat\theta$, are sometimes denoted simply $L(\hat\theta)$ and $\ln L(\hat\theta)$, respectively, or, where no ambiguity can arise, just $L$ or $\ln L$.
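The sum-of-log-densities form (17-2) translates directly into code. A minimal sketch in Python, assuming a normal density and an invented five-point sample purely for illustration:

```python
import numpy as np

def norm_logpdf(y, mu, sigma):
    # ln f(y | mu, sigma) for the N(mu, sigma^2) density
    return -0.5 * (np.log(2 * np.pi) + 2 * np.log(sigma)
                   + ((y - mu) / sigma) ** 2)

def log_likelihood(mu, sigma, y):
    # eq. (17-2): ln L(theta | y) = sum_i ln f(y_i | theta)
    return np.sum(norm_logpdf(y, mu, sigma))

y = np.array([1.2, -0.4, 0.8, 2.1, 0.3])   # invented sample
print(log_likelihood(0.5, 1.0, y))
```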

It will usually be necessary to generalize the concept of the likelihood function to allow the density to depend on other conditioning variables. To jump immediately to one of our central applications, suppose the disturbance in the classical linear regression model is normally distributed. Then, conditioned on its specific $x_i$, $y_i$ is normally distributed with mean $\mu_i = x_i'\beta$ and variance $\sigma^2$. That means that the observed random variables are not iid; they have different means. Nonetheless, the observations are independent, and as we will examine in closer detail,
$$\ln L(\theta \mid y, X) = \sum_{i=1}^n \ln f(y_i \mid x_i, \theta) = -\frac{1}{2} \sum_{i=1}^n \left[ \ln \sigma^2 + \ln(2\pi) + \frac{(y_i - x_i'\beta)^2}{\sigma^2} \right], \quad (17\text{-}3)$$
where $X$ is the $n \times K$ matrix of data with $i$th row equal to $x_i'$.
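Equation (17-3) can likewise be evaluated directly. A minimal sketch, with simulated data and parameter values that are assumptions for the example only:

```python
import numpy as np

def regression_loglik(beta, sigma2, y, X):
    """Evaluate eq. (17-3): the log-likelihood of the classical linear
    regression model with normal disturbances."""
    e = y - X @ beta                    # residuals y_i - x_i'beta
    n = len(y)
    return -0.5 * (n * np.log(sigma2) + n * np.log(2 * np.pi)
                   + e @ e / sigma2)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=50)
print(regression_loglik(np.array([1.0, 2.0]), 0.25, y, X))
```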

2. The identification problem

The rest of this chapter will be concerned with obtaining estimates of the parameters and with testing hypotheses about them and about the data generating process. Before we begin that study, we consider the question of whether estimation of the parameters is possible at all: the question of identification. Identification is an issue related to the formulation of the model. The issue of identification must be resolved before estimation can even be considered. The question posed is essentially this: Suppose we had an infinitely large sample, that is, for current purposes, all the information there is to be had about the parameters. Could we uniquely determine the values of $\theta$ from such a sample? As will be clear shortly, the answer is sometimes no.
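A standard illustration of non-identification (added here; it is not in the original notes): suppose the model asserts $y_i \sim N(\mu_1 + \mu_2, \sigma^2)$. Then for any constant $c$,
$$L(\mu_1 + c,\; \mu_2 - c,\; \sigma^2 \mid y) = L(\mu_1,\; \mu_2,\; \sigma^2 \mid y),$$
so the likelihood is flat along every line $\mu_1 + \mu_2 = \text{const}$: only the sum $\mu_1 + \mu_2$ is identified, and no sample, however large, can separate $\mu_1$ from $\mu_2$.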

Note: You should be able to write down fluently the density functions of the standard distributions and the corresponding likelihood functions; this is a basic skill in microeconometrics. The normal and logistic distributions are especially important, as are, more generally, the densities of the exponential family.

17.3 Efficient Estimation: The Principle of Maximum Likelihood

The principle of maximum likelihood provides a means of choosing an asymptotically efficient estimator for a parameter or a set of parameters.

The logic of the technique is easily illustrated in the setting of a discrete distribution. Consider a random sample of the following 10 observations from a Poisson distribution: 5, 0, 1, 1, 0, 3, 2, 3, 4, and 1. The density for each observation is
$$f(y_i \mid \theta) = \frac{e^{-\theta}\,\theta^{y_i}}{y_i!}.$$
Since the observations are independent, their joint density, which is the likelihood for this sample, is
$$f(y_1, \ldots, y_{10} \mid \theta) = \prod_{i=1}^{10} f(y_i \mid \theta) = \frac{e^{-10\theta}\,\theta^{\sum_i y_i}}{\prod_i y_i!} = \frac{e^{-10\theta}\,\theta^{20}}{207{,}360}.$$
The last result gives the probability of observing this particular sample, assuming that a Poisson distribution with as yet unknown parameter $\theta$ generated the data. What value of $\theta$ would make this sample most probable? Figure 17.1 plots this function for various values of $\theta$. It has a single mode at $\theta = 2$, which would be the maximum likelihood estimate, or MLE, of $\theta$.

Consider maximizing $L(\theta \mid y)$ with respect to $\theta$. Since the log function is monotonically increasing and easier to work with, we usually maximize $\ln L(\theta \mid y)$ instead; in sampling from a Poisson population,
$$\ln L(\theta \mid y) = -n\theta + \ln\theta \sum_{i=1}^n y_i - \sum_{i=1}^n \ln(y_i!), \qquad \frac{\partial \ln L(\theta \mid y)}{\partial\theta} = -n + \frac{1}{\theta}\sum_{i=1}^n y_i = 0 \;\Longrightarrow\; \hat\theta_{ML} = \bar y_n.$$
For the assumed sample of observations,
$$\frac{\partial \ln L}{\partial\theta} = -10 + \frac{20}{\theta} = 0 \;\Longrightarrow\; \hat\theta = 2, \qquad \frac{\partial^2 \ln L}{\partial\theta^2} = -\frac{20}{\theta^2} < 0,$$
so this is indeed a maximum. The solution is the same as before. Figure 17.1 also plots the log of $L$ to illustrate the result.
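The calculation is easy to reproduce. The sketch below maximizes the Poisson log-likelihood for the ten observations numerically and confirms the closed-form answer $\hat\theta = \bar y = 2$:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

def neg_loglik(theta):
    # -ln L(theta | y) for a Poisson sample; gammaln(y+1) = ln(y!)
    return -np.sum(-theta + y * np.log(theta) - gammaln(y + 1))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10), method="bounded")
print(res.x, y.mean())   # both 2.0: the numerical MLE equals y-bar
```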

The reference to the probability of observing the given sample is not exact in a continuous distribution, since a particular sample has probability zero. Nonetheless, the principle is the same. The values of the parameters that maximize $L(\theta \mid y)$ or its log are the maximum likelihood estimates, denoted $\hat\theta$. Since the logarithm is a monotonic function, the values that maximize $L(\theta \mid y)$ are the same as those that maximize $\ln L(\theta \mid y)$. The necessary condition for maximizing $\ln L(\theta \mid y)$ is
$$\frac{\partial \ln L(\theta \mid y)}{\partial\theta} = 0. \quad (17\text{-}4)$$
This is called the likelihood equation. The general result then is that the MLE is a root of the likelihood equation (a numerical sketch of solving (17-4) directly follows below). The application to the parameters of the dgp for a discrete random variable is suggestive that maximum likelihood is a "good" use of the data. It remains to establish this as a general principle. We turn to that issue in the next section.
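Since the MLE is a root of (17-4), a root finder applied to the score gives the same answer as direct maximization; a minimal sketch for the Poisson sample above:

```python
import numpy as np
from scipy.optimize import brentq

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

def score(theta):
    # d ln L / d theta = -n + sum(y)/theta for the Poisson model
    return -len(y) + y.sum() / theta

theta_hat = brentq(score, 0.1, 10.0)  # root of the likelihood equation
print(theta_hat)                       # 2.0
```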

17.4 Properties of Maximum Likelihood Estimators

Maximum likelihood estimators (MLEs) are most attractive because of their large-sample, or asymptotic, properties. If certain regularity conditions are met, the MLE will have these properties. The finite-sample properties are sometimes less than optimal. For example, the MLE may be biased; the MLE of $\sigma^2$ in Example 17.2 is biased downward. The occasional statement that the properties of the MLE are only optimal in large samples is not true, however. It can be shown that when sampling is from an exponential family of distributions (see Definition 18.1), there will exist sufficient statistics. If so, MLEs will be functions of them, which means that when minimum variance unbiased estimators exist, they will be MLEs. [See Stuart and Ord (1989).] Most applications in econometrics do not involve exponential families, so the appeal of the MLE remains primarily its asymptotic properties.

We use the following notation: $\hat\theta$ is the maximum likelihood estimator; $\theta_0$ denotes the true value of the parameter vector; $\theta$ denotes another possible value of the parameter vector, not the MLE and not necessarily the true value. Expectation based on the true values of the parameters is denoted $E_0[\cdot]$. If we assume that the regularity conditions discussed below are met by $f(y_i \mid \theta_0)$, then we have the following theorem.

Theorem 4.2 (the Cramér-Rao lower bound; the information number and the information matrix). If the density function of $x$ satisfies certain regularity conditions, the variance of an unbiased estimator of the parameter $\theta$ is always at least as large as
$$[I(\theta)]^{-1} = \left\{ -E\!\left[ \frac{\partial^2 \ln L(\theta)}{\partial\theta^2} \right] \right\}^{-1} = \left\{ E\!\left[ \left( \frac{\partial \ln L(\theta)}{\partial\theta} \right)^{\!2} \right] \right\}^{-1},$$
where $I(\theta)$ is the information number for the sample. (Proof omitted.)

Definition 4.12 (asymptotic normality and asymptotic efficiency). If
$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0, V)$$
holds, the estimator is asymptotically normal; if the covariance matrix of any other consistent, asymptotically normally distributed estimator exceeds $(1/n)V$ by a nonnegative definite matrix, the estimator is asymptotically efficient. For most estimation problems, asymptotic normality and asymptotic efficiency are the usual criteria for choosing an estimator.

Asymptotic expectation. The asymptotic expectation and asymptotic variance of a random variable are the expectation and variance of its asymptotic distribution. Thus, for an estimator with limiting distribution $\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2)$, the asymptotic expectation is $\theta$ and the asymptotic variance is $\sigma^2/n$. This implies that the estimator is asymptotically unbiased.

The relation between consistency and asymptotic unbiasedness (three possible definitions of asymptotic unbiasedness):
(1) the mean of the limiting distribution of $\sqrt{n}\,(\hat\theta_n - \theta)$ is zero;
(2) $\lim_{n\to\infty} E[\hat\theta_n] = \theta$;
(3) $\operatorname{plim}\,\hat\theta_n = \theta$.
What is the significance of each of these definitions?

Asymptotic variance (one commonly used definition):
$$\operatorname{Asy.Var}[\hat\theta] = \frac{1}{n} \lim_{n\to\infty} E\!\left[ \left( \sqrt{n}\,\Big(\hat\theta - \lim_{n\to\infty} E[\hat\theta]\Big) \right)^{\!2} \right].$$

Properties of the ML estimator. Because of its large-sample, or asymptotic, properties, the ML estimator is very attractive. Provided the regularity conditions are satisfied, we have:

Theorem 4.18 (properties of the maximum likelihood estimator). If the likelihood function satisfies the regularity conditions, the maximum likelihood estimator has the following asymptotic properties:
M1. Consistency: $\operatorname{plim}\,\hat\theta = \theta_0$.
M2. Asymptotic normality: $\hat\theta \stackrel{a}{\sim} N\!\left(\theta_0,\; \{I(\theta_0)\}^{-1}\right)$, where $I(\theta_0) = -E_0\!\left[\partial^2 \ln L / \partial\theta_0\,\partial\theta_0'\right]$.
M3. Asymptotic efficiency: $\hat\theta$ is asymptotically efficient and attains the Cramér-Rao lower bound for consistent estimators (a worked check for the Poisson model follows below).
M4. Invariance: if $\hat\theta$ is the ML estimator of $\theta_0$ and $c(\theta_0)$ is a continuous function, then the ML estimator of $c(\theta_0)$ is $c(\hat\theta)$.
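As a worked check of Theorem 4.2 and property M3 (added here; it uses the Poisson model of Section 17.3): for a Poisson sample,
$$\frac{\partial^2 \ln L}{\partial\theta^2} = -\frac{\sum_i y_i}{\theta^2}, \qquad I(\theta) = -E\!\left[\frac{\partial^2 \ln L}{\partial\theta^2}\right] = \frac{n\theta}{\theta^2} = \frac{n}{\theta},$$
so the Cramér-Rao bound is $\theta/n$. Since $\operatorname{Var}[\bar y] = \theta/n$, the MLE $\hat\theta = \bar y$ attains the bound exactly in this model.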

Understanding these properties. These asymptotic properties explain why ML estimation is so widely used in econometrics:
The first establishes the limiting distribution of the estimator.
The second greatly facilitates the construction of hypothesis tests and interval estimates.
The third is a particularly powerful result: the MLE attains the smallest variance achievable by a consistent estimator.
The fourth makes it convenient to estimate functions of the parameters, in two senses: 1. if estimates of a set of parameters have already been obtained and an estimate of some function of them is required, the model need not be re-estimated; 2. the invariance principle implies that we may freely re-parameterize the likelihood function in whatever way simplifies estimation (a minimal sketch follows below).

All of these, however, are asymptotic properties. The finite-sample properties are usually unknown, and where they are known, we sometimes find that the MLE is not the best estimator in small samples.
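A minimal sketch of the invariance property M4, using the Poisson sample of Section 17.3; the function $c(\theta) = P(y = 0) = e^{-\theta}$ is chosen purely for illustration:

```python
import numpy as np

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

theta_hat = y.mean()            # MLE of theta in the Poisson model
p0_hat = np.exp(-theta_hat)     # by invariance (M4), the MLE of
                                # c(theta) = P(y = 0) = exp(-theta)
print(theta_hat, p0_hat)        # 2.0 and exp(-2) ~ 0.135
```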

To prove the properties above, we need several useful properties of probability density functions; the proofs are built on them.

17.4.1 Regularity Conditions

First the regularity conditions, then the useful properties. To sketch proofs of these results, we first obtain some useful properties of probability density functions. We assume that $(y_1, \ldots, y_n)$ is a random sample from the population with density function $f(y_i \mid \theta_0)$ and that the following regularity conditions hold. [Our statement of these is informal. A more rigorous treatment may be found in Stuart and Ord (1989) or Davidson and MacKinnon (1993).]

Let $(y_1, \ldots, y_n)$ be drawn from a univariate (or multivariate) population with density function $f(y_i \mid \theta)$ obeying the following regularity conditions:
R1. The first three derivatives of $\ln f(y_i \mid \theta)$ with respect to $\theta$ are continuous and finite for almost all $y_i$ and all $\theta$. (This ensures the existence of certain Taylor series approximations and the finite variance of the derivatives.)
R2. The conditions needed to obtain the expectations of the first and second derivatives of $\ln f(y_i \mid \theta)$ are met.
R3. For all values of $\theta$, $\left| \partial^3 \ln f(y_i \mid \theta) / \partial\theta_j\,\partial\theta_k\,\partial\theta_l \right|$ is less than a function that has a finite expectation. (This allows us to truncate the Taylor series.)

Understanding the regularity conditions:
What they are:
1. $\ln f$ has three continuous derivatives with respect to the parameters.
2. The conditions needed to obtain expectations of the derivatives are met (e.g., the range of the variable is not a function of the parameters).
3. The third derivative has a finite expectation.
What they mean:
- Moment conditions and convergence: we need to obtain expectations of derivatives.
- We need to be able to truncate Taylor series.
- We will use central limit theorems.

With these regularity conditions, we will obtain the following fundamental characteristics of $f(y_i \mid \theta)$:

D1. $\ln f(y_i \mid \theta)$, $g_i = \partial \ln f(y_i \mid \theta)/\partial\theta$, and $H_i = \partial^2 \ln f(y_i \mid \theta)/\partial\theta\,\partial\theta'$, $i = 1, \ldots, n$, are all random samples of random variables. (This property follows from our assumption of random sampling.)
D2. $E_0\!\left[\partial \ln f(y_i \mid \theta_0)/\partial\theta_0\right] = 0$. (Prerequisite: the order of differentiation and integration can be interchanged; below, $U(\theta)$ and $L(\theta)$ denote the upper and lower limits of the range of the random variable.)
D3. $\operatorname{Var}_0\!\left[\partial \ln f(y_i \mid \theta_0)/\partial\theta_0\right] = -E_0\!\left[\partial^2 \ln f(y_i \mid \theta_0)/\partial\theta_0\,\partial\theta_0'\right]$.

In words (be clear about what these mean): D2 says that the expectation of the first derivative is zero; D3 says that the negative of the expected matrix of second derivatives equals the variance of the first derivative.

D1 is simply a consequence of the definition of the likelihood function. D2 leads to the moment condition which defines the maximum likelihood estimator: on the one hand, the MLE is found as the maximizer of a function, which mandates finding the vector which equates the gradient to zero; on the other, D2 is a more fundamental relationship which places the MLE in the class of generalized method of moments estimators. D3 produces what is known as the information matrix equality, which shows how to obtain the asymptotic covariance matrix of the MLE.

Proof. First, consider the case in which the range of $y_i$ depends on the parameter. For every $\theta$,
$$\int_{L(\theta)}^{U(\theta)} f(y_i \mid \theta)\, dy_i = 1.$$
Differentiating both sides with respect to $\theta$ and applying Leibnitz's theorem gives
$$\frac{\partial}{\partial\theta} \int_{L(\theta)}^{U(\theta)} f(y_i \mid \theta)\, dy_i = \int_{L(\theta)}^{U(\theta)} \frac{\partial f(y_i \mid \theta)}{\partial\theta}\, dy_i + f\big(U(\theta) \mid \theta\big)\frac{\partial U(\theta)}{\partial\theta} - f\big(L(\theta) \mid \theta\big)\frac{\partial L(\theta)}{\partial\theta} = 0.$$
If the second and third terms go to zero, then we may interchange the operations of differentiation and integration on the first term. What are the conditions under which those two terms vanish? The necessary condition is that the density be zero at the terminal points of the range; sufficient conditions are that the range of the observed random variable, $y_i$, does not depend on the parameters, which means that $\partial U(\theta)/\partial\theta = \partial L(\theta)/\partial\theta = 0$, or that the density is zero at the terminal points. (Note that a uniform distribution on $[0, \theta]$ violates this condition.) This condition, then, is regularity condition R2. The latter is usually assumed, and we will assume it in what follows. So,
$$E\!\left[\frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\right] = \int \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\, f(y_i \mid \theta)\, dy_i = \int \frac{\partial f(y_i \mid \theta)}{\partial\theta}\, dy_i = \frac{\partial}{\partial\theta} \int f(y_i \mid \theta)\, dy_i = \frac{\partial(1)}{\partial\theta} = 0,$$
which proves D2.

Because the order of differentiation and integration can be interchanged, we may differentiate
$$\int \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\, f(y_i \mid \theta)\, dy_i = 0$$
once more with respect to $\theta'$, which gives
$$\int \left[ \frac{\partial^2 \ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'}\, f(y_i \mid \theta) + \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\, \frac{\partial f(y_i \mid \theta)}{\partial\theta'} \right] dy_i = 0.$$
But
$$\frac{\partial f(y_i \mid \theta)}{\partial\theta'} = f(y_i \mid \theta)\, \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta'},$$
and the integral of a sum equals the sum of the integrals. Therefore,
$$-E\!\left[\frac{\partial^2 \ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'}\right] = E\!\left[\frac{\partial \ln f(y_i \mid \theta)}{\partial\theta} \cdot \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta'}\right].$$
The left-hand side is the negative of the expected matrix of second derivatives, and the right-hand side is the expected outer product of the first derivative. By D2 (the first derivative has expectation zero), the right-hand side is the variance of the first derivative. Hence the negative of the expected second-derivatives matrix equals the variance of the first derivative, which proves D3:
$$\operatorname{Var}\!\left[\frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\right] = -E\!\left[\frac{\partial^2 \ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'}\right].$$
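D2 and D3 are easy to verify by simulation. The sketch below checks both for the Poisson model at an assumed true value $\theta_0 = 2$:

```python
import numpy as np

rng = np.random.default_rng(42)
theta0 = 2.0
y = rng.poisson(theta0, size=200_000)   # large sample at the true value

g = -1.0 + y / theta0        # per-observation score d ln f / d theta
h = -y / theta0**2           # per-observation second derivative

print(g.mean())              # ~ 0                        (D2)
print(g.var(), -h.mean())    # both ~ 1/theta0 = 0.5      (D3)
```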

17.4.3 The Likelihood Equation (building blocks for the asymptotic properties of the MLE)

Let the log-likelihood function be
$$\ln L(\theta \mid y) = \sum_{i=1}^n \ln f(y_i \mid \theta).$$
Then
$$\frac{\partial \ln L(\theta \mid y)}{\partial\theta} = \sum_{i=1}^n g_i \quad\text{and}\quad \frac{\partial^2 \ln L(\theta \mid y)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^n H_i. \quad (17\text{-}9)$$
From D1 and D2,
$$E_0\!\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = 0, \quad (17\text{-}10)$$
which is the likelihood equation mentioned earlier.

17.4.4 The Information Matrix Equality

Consider $\operatorname{Var}_0\!\left[\partial \ln L(\theta_0 \mid y)/\partial\theta_0\right]$. By D1 (the random sampling property), the terms with unequal subscripts drop out, since the scores $g_i$ and $g_j$, $i \ne j$, are independent with zero means. This gives
$$\operatorname{Var}_0\!\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = \sum_{i=1}^n E_0[g_i g_i'] = -\sum_{i=1}^n E_0[H_i].$$
Therefore,
$$\operatorname{Var}_0\!\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = -E_0\!\left[\frac{\partial^2 \ln L(\theta_0 \mid y)}{\partial\theta_0\,\partial\theta_0'}\right]. \quad (17\text{-}11)$$
This very useful result is known as the information matrix equality.

With these preliminaries in place, we can now prove M1, M2, M3, and M4. (The details are extensive and take some time; see pp. 477-480.)

Outline: we will sketch formal proofs of these results.
- The log-likelihood function, again.
- The likelihood equation and the information matrix.
- A linear Taylor series approximation to the first-order conditions (under regularity, higher-order terms will vanish in large samples). Our usual approach: the large-sample behavior of the left- and right-hand sides is the same.
- A proof of consistency (property M1).
- The limiting variance of the (scaled) score vector; we are using the central limit theorem here. This leads to asymptotic normality (property M2). We will derive the asymptotic variance of the MLE.
- Efficiency (we have not developed the tools to prove this). The Cramer-Rao lower bound for efficient estimation (an asymptotic version of Gauss-Markov).
- Estimating the variance of the maximum likelihood estimator.
- Invariance (a VERY handy result). Coupled with the Slutsky theorem and the delta method, the invariance property makes estimation of nonlinear functions of parameters very easy.

Deriving the Properties of the Maximum Likelihood Estimator

An example (the information matrix of the multivariate normal distribution): see Example 4.21.

Estimating the asymptotic variance of the ML estimator: the BHHH estimator. A concrete computation of a variance estimator for the ML estimator is sketched below.
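A minimal sketch of the BHHH idea for the Poisson example of Section 17.3 (the scalar case, so the "matrices" are numbers): the asymptotic variance is estimated by inverting the sum of outer products of the per-observation scores, evaluated at $\hat\theta$.

```python
import numpy as np

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])
theta_hat = y.mean()                 # Poisson MLE, = 2.0

# BHHH: invert the sum of outer products of per-observation scores.
g = -1.0 + y / theta_hat             # g_i evaluated at theta-hat
var_bhhh = 1.0 / np.sum(g**2)        # ~ 0.154 for this sample

# Hessian-based alternative: invert -sum_i H_i at theta-hat.
var_hess = 1.0 / np.sum(y / theta_hat**2)   # = theta_hat/n = 0.2

print(var_bhhh, var_hess)
```

In finite samples the BHHH and Hessian-based values differ, as they do here; the information matrix equality (17-11) guarantees that they estimate the same quantity asymptotically.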
