Lecture 6: Maximum Likelihood Estimation
The Likelihood Function and Identification of the Parameters

1. Representation of the likelihood function

In a random sample of $n$ observations, each observation has density function $f(y_i \mid \theta)$. Because the $n$ observations are independent, their joint density is
$$f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta).$$
This function is called the likelihood function, usually written $L(\theta \mid y)$ or $L(\theta; y)$.

Comparison with the definition in Greene's text:

The probability density function, or pdf, for a random variable $y$, conditioned on a set of parameters, $\theta$, is denoted $f(y \mid \theta)$. This function identifies the data generating process that underlies an observed sample of data and, at the same time, provides a mathematical description of the data that the process will produce. The joint density of $n$ independent and identically distributed (iid) observations from this process is the product of the individual densities;
$$f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta) = L(\theta \mid y). \quad (17\text{-}1)$$
This joint density is the likelihood function, defined as a function of the unknown parameter vector, $\theta$, where $y$ is used to indicate the collection of sample data. Note that we write the joint density as a function of the data conditioned on the parameters, whereas when we form the likelihood function, we write this function in reverse, as a function of the parameters, conditioned on the data. Though the two functions are the same, it is to be emphasized that the likelihood function is written in this fashion to highlight our interest in the parameters and the information about them that is contained in the observed data. However, it is understood that the likelihood function is not meant to represent a probability density for the parameters as it is in Section 16.2.2. In this classical estimation framework, the parameters are assumed to be fixed constants which we hope to learn about from the data.

It is usually simpler to work with the log of the likelihood function:
$$\ln L(\theta \mid y) = \sum_{i=1}^n \ln f(y_i \mid \theta). \quad (17\text{-}2)$$
Again, to emphasize our interest in the parameters, given the observed data, we denote this function $\ln L(\theta \mid y)$. The likelihood function and its logarithm, evaluated at $\hat\theta$, are sometimes denoted simply $L(\hat\theta)$ and $\ln L(\hat\theta)$, respectively, or, where no ambiguity can arise, just $L$ or $\ln L$.
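The sum-of-log-densities form (17-2) translates directly into code. A minimal sketch in Python, assuming a normal density and an invented five-point sample purely for illustration:

```python
import numpy as np

def norm_logpdf(y, mu, sigma):
    # ln f(y | mu, sigma) for the N(mu, sigma^2) density
    return -0.5 * (np.log(2 * np.pi) + 2 * np.log(sigma)
                   + ((y - mu) / sigma) ** 2)

def log_likelihood(mu, sigma, y):
    # eq. (17-2): ln L(theta | y) = sum_i ln f(y_i | theta)
    return np.sum(norm_logpdf(y, mu, sigma))

y = np.array([1.2, -0.4, 0.8, 2.1, 0.3])   # invented sample
print(log_likelihood(0.5, 1.0, y))
```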

It will usually be necessary to generalize the concept of the likelihood function to allow the density to depend on other conditioning variables. To jump immediately to one of our central applications, suppose the disturbance in the classical linear regression model is normally distributed. Then, conditioned on its specific $x_i$, $y_i$ is normally distributed with mean $\mu_i = x_i'\beta$ and variance $\sigma^2$. That means that the observed random variables are not iid; they have different means. Nonetheless, the observations are independent, and as we will examine in closer detail,
$$\ln L(\theta \mid y, X) = \sum_{i=1}^n \ln f(y_i \mid x_i, \theta) = -\frac{1}{2} \sum_{i=1}^n \left[ \ln \sigma^2 + \ln(2\pi) + \frac{(y_i - x_i'\beta)^2}{\sigma^2} \right], \quad (17\text{-}3)$$
where $X$ is the $n \times K$ matrix of data with $i$th row equal to $x_i'$.
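Equation (17-3) can likewise be evaluated directly. A minimal sketch, with simulated data and parameter values that are assumptions for the example only:

```python
import numpy as np

def regression_loglik(beta, sigma2, y, X):
    """Evaluate eq. (17-3): the log-likelihood of the classical linear
    regression model with normal disturbances."""
    e = y - X @ beta                    # residuals y_i - x_i'beta
    n = len(y)
    return -0.5 * (n * np.log(sigma2) + n * np.log(2 * np.pi)
                   + e @ e / sigma2)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=50)
print(regression_loglik(np.array([1.0, 2.0]), 0.25, y, X))
```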

2. The identification problem

The rest of this chapter will be concerned with obtaining estimates of the parameters and with testing hypotheses about them and about the data generating process. Before we begin that study, we consider the question of whether estimation of the parameters is possible at all: the question of identification. Identification is an issue related to the formulation of the model. The issue of identification must be resolved before estimation can even be considered. The question posed is essentially this: Suppose we had an infinitely large sample, that is, for current purposes, all the information there is to be had about the parameters. Could we uniquely determine the values of $\theta$ from such a sample? As will be clear shortly, the answer is sometimes no.
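A standard illustration of non-identification (added here; it is not in the original notes): suppose the model asserts $y_i \sim N(\mu_1 + \mu_2, \sigma^2)$. Then for any constant $c$,
$$L(\mu_1 + c,\; \mu_2 - c,\; \sigma^2 \mid y) = L(\mu_1,\; \mu_2,\; \sigma^2 \mid y),$$
so the likelihood is flat along every line $\mu_1 + \mu_2 = \text{const}$: only the sum $\mu_1 + \mu_2$ is identified, and no sample, however large, can separate $\mu_1$ from $\mu_2$.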

Note: You should be able to write down fluently the density functions of the standard distributions and the corresponding likelihood functions; this is a basic skill in microeconometrics. The normal and logistic distributions are especially important, as are, more generally, the densities of the exponential family.

17.3 Efficient Estimation: The Principle of Maximum Likelihood

The principle of maximum likelihood provides a means of choosing an asymptotically efficient estimator for a parameter or a set of parameters.

The logic of the technique is easily illustrated in the setting of a discrete distribution. Consider a random sample of the following 10 observations from a Poisson distribution: 5, 0, 1, 1, 0, 3, 2, 3, 4, and 1. The density for each observation is
$$f(y_i \mid \theta) = \frac{e^{-\theta}\,\theta^{y_i}}{y_i!}.$$
Since the observations are independent, their joint density, which is the likelihood for this sample, is
$$f(y_1, \ldots, y_{10} \mid \theta) = \prod_{i=1}^{10} f(y_i \mid \theta) = \frac{e^{-10\theta}\,\theta^{\sum_i y_i}}{\prod_i y_i!} = \frac{e^{-10\theta}\,\theta^{20}}{207{,}360}.$$
The last result gives the probability of observing this particular sample, assuming that a Poisson distribution with as yet unknown parameter $\theta$ generated the data. What value of $\theta$ would make this sample most probable? Figure 17.1 plots this function for various values of $\theta$. It has a single mode at $\theta = 2$, which would be the maximum likelihood estimate, or MLE, of $\theta$.

Consider maximizing $L(\theta \mid y)$ with respect to $\theta$. Since the log function is monotonically increasing and easier to work with, we usually maximize $\ln L(\theta \mid y)$ instead; in sampling from a Poisson population,
$$\ln L(\theta \mid y) = -n\theta + \ln\theta \sum_{i=1}^n y_i - \sum_{i=1}^n \ln(y_i!), \qquad \frac{\partial \ln L(\theta \mid y)}{\partial\theta} = -n + \frac{1}{\theta}\sum_{i=1}^n y_i = 0 \;\Longrightarrow\; \hat\theta_{ML} = \bar y_n.$$
For the assumed sample of observations,
$$\frac{\partial \ln L}{\partial\theta} = -10 + \frac{20}{\theta} = 0 \;\Longrightarrow\; \hat\theta = 2, \qquad \frac{\partial^2 \ln L}{\partial\theta^2} = -\frac{20}{\theta^2} < 0,$$
so this is indeed a maximum. The solution is the same as before. Figure 17.1 also plots the log of $L$ to illustrate the result.
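The calculation is easy to reproduce. The sketch below maximizes the Poisson log-likelihood for the ten observations numerically and confirms the closed-form answer $\hat\theta = \bar y = 2$:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

def neg_loglik(theta):
    # -ln L(theta | y) for a Poisson sample; gammaln(y+1) = ln(y!)
    return -np.sum(-theta + y * np.log(theta) - gammaln(y + 1))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10), method="bounded")
print(res.x, y.mean())   # both 2.0: the numerical MLE equals y-bar
```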

The reference to the probability of observing the given sample is not exact in a continuous distribution, since a particular sample has probability zero. Nonetheless, the principle is the same. The values of the parameters that maximize $L(\theta \mid y)$ or its log are the maximum likelihood estimates, denoted $\hat\theta$. Since the logarithm is a monotonic function, the values that maximize $L(\theta \mid y)$ are the same as those that maximize $\ln L(\theta \mid y)$. The necessary condition for maximizing $\ln L(\theta \mid y)$ is
$$\frac{\partial \ln L(\theta \mid y)}{\partial\theta} = 0. \quad (17\text{-}4)$$
This is called the likelihood equation. The general result then is that the MLE is a root of the likelihood equation (a numerical sketch of solving (17-4) directly follows below). The application to the parameters of the dgp for a discrete random variable is suggestive that maximum likelihood is a "good" use of the data. It remains to establish this as a general principle. We turn to that issue in the next section.
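Since the MLE is a root of (17-4), a root finder applied to the score gives the same answer as direct maximization; a minimal sketch for the Poisson sample above:

```python
import numpy as np
from scipy.optimize import brentq

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

def score(theta):
    # d ln L / d theta = -n + sum(y)/theta for the Poisson model
    return -len(y) + y.sum() / theta

theta_hat = brentq(score, 0.1, 10.0)  # root of the likelihood equation
print(theta_hat)                       # 2.0
```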

17.4 Properties of Maximum Likelihood Estimators

Maximum likelihood estimators (MLEs) are most attractive because of their large-sample, or asymptotic, properties. If certain regularity conditions are met, the MLE will have these properties. The finite-sample properties are sometimes less than optimal. For example, the MLE may be biased; the MLE of $\sigma^2$ in Example 17.2 is biased downward. The occasional statement that the properties of the MLE are only optimal in large samples is not true, however. It can be shown that when sampling is from an exponential family of distributions (see Definition 18.1), there will exist sufficient statistics. If so, MLEs will be functions of them, which means that when minimum variance unbiased estimators exist, they will be MLEs. [See Stuart and Ord (1989).] Most applications in econometrics do not involve exponential families, so the appeal of the MLE remains primarily its asymptotic properties.

We use the following notation: $\hat\theta$ is the maximum likelihood estimator; $\theta_0$ denotes the true value of the parameter vector; $\theta$ denotes another possible value of the parameter vector, not the MLE and not necessarily the true value. Expectation based on the true values of the parameters is denoted $E_0[\cdot]$. If we assume that the regularity conditions discussed below are met by $f(y_i \mid \theta_0)$, then we have the following theorem.

Theorem 4.2 (the Cramér-Rao lower bound; the information number and the information matrix). If the density function of $x$ satisfies certain regularity conditions, the variance of an unbiased estimator of the parameter $\theta$ is always at least as large as
$$[I(\theta)]^{-1} = \left\{ -E\!\left[ \frac{\partial^2 \ln L(\theta)}{\partial\theta^2} \right] \right\}^{-1} = \left\{ E\!\left[ \left( \frac{\partial \ln L(\theta)}{\partial\theta} \right)^{\!2} \right] \right\}^{-1},$$
where $I(\theta)$ is the information number for the sample. (Proof omitted.)

Definition 4.12 (asymptotic normality and asymptotic efficiency). If
$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0, V)$$
holds, the estimator is asymptotically normal; if the covariance matrix of any other consistent, asymptotically normally distributed estimator exceeds $(1/n)V$ by a nonnegative definite matrix, the estimator is asymptotically efficient. For most estimation problems, asymptotic normality and asymptotic efficiency are the usual criteria for choosing an estimator.

Asymptotic expectation. The asymptotic expectation and asymptotic variance of a random variable are the expectation and variance of its asymptotic distribution. Thus, for an estimator with limiting distribution $\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2)$, the asymptotic expectation is $\theta$ and the asymptotic variance is $\sigma^2/n$. This implies that the estimator is asymptotically unbiased.

The relation between consistency and asymptotic unbiasedness (three possible definitions of asymptotic unbiasedness):
(1) the mean of the limiting distribution of $\sqrt{n}\,(\hat\theta_n - \theta)$ is zero;
(2) $\lim_{n\to\infty} E[\hat\theta_n] = \theta$;
(3) $\operatorname{plim}\,\hat\theta_n = \theta$.
What is the significance of each of these definitions?

Asymptotic variance (one commonly used definition):
$$\operatorname{Asy.Var}[\hat\theta] = \frac{1}{n} \lim_{n\to\infty} E\!\left[ \left( \sqrt{n}\,\Big(\hat\theta - \lim_{n\to\infty} E[\hat\theta]\Big) \right)^{\!2} \right].$$

Properties of the ML estimator. Because of its large-sample, or asymptotic, properties, the ML estimator is very attractive. Provided the regularity conditions are satisfied, we have:

Theorem 4.18 (properties of the maximum likelihood estimator). If the likelihood function satisfies the regularity conditions, the maximum likelihood estimator has the following asymptotic properties:
M1. Consistency: $\operatorname{plim}\,\hat\theta = \theta_0$.
M2. Asymptotic normality: $\hat\theta \stackrel{a}{\sim} N\!\left(\theta_0,\; \{I(\theta_0)\}^{-1}\right)$, where $I(\theta_0) = -E_0\!\left[\partial^2 \ln L / \partial\theta_0\,\partial\theta_0'\right]$.
M3. Asymptotic efficiency: $\hat\theta$ is asymptotically efficient and attains the Cramér-Rao lower bound for consistent estimators (a worked check for the Poisson model follows below).
M4. Invariance: if $\hat\theta$ is the ML estimator of $\theta_0$ and $c(\theta_0)$ is a continuous function, then the ML estimator of $c(\theta_0)$ is $c(\hat\theta)$.
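As a worked check of Theorem 4.2 and property M3 (added here; it uses the Poisson model of Section 17.3): for a Poisson sample,
$$\frac{\partial^2 \ln L}{\partial\theta^2} = -\frac{\sum_i y_i}{\theta^2}, \qquad I(\theta) = -E\!\left[\frac{\partial^2 \ln L}{\partial\theta^2}\right] = \frac{n\theta}{\theta^2} = \frac{n}{\theta},$$
so the Cramér-Rao bound is $\theta/n$. Since $\operatorname{Var}[\bar y] = \theta/n$, the MLE $\hat\theta = \bar y$ attains the bound exactly in this model.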

Understanding these properties. These asymptotic properties explain why ML estimation is so widely used in econometrics:
The first establishes the limiting distribution of the estimator.
The second greatly facilitates the construction of hypothesis tests and interval estimates.
The third is a particularly powerful result: the MLE attains the smallest variance achievable by a consistent estimator.
The fourth makes it convenient to estimate functions of the parameters, in two senses: 1. if estimates of a set of parameters have already been obtained and an estimate of some function of them is required, the model need not be re-estimated; 2. the invariance principle implies that we may freely re-parameterize the likelihood function in whatever way simplifies estimation (a minimal sketch follows below).

All of these, however, are asymptotic properties. The finite-sample properties are usually unknown, and where they are known, we sometimes find that the MLE is not the best estimator in small samples.
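A minimal sketch of the invariance property M4, using the Poisson sample of Section 17.3; the function $c(\theta) = P(y = 0) = e^{-\theta}$ is chosen purely for illustration:

```python
import numpy as np

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

theta_hat = y.mean()            # MLE of theta in the Poisson model
p0_hat = np.exp(-theta_hat)     # by invariance (M4), the MLE of
                                # c(theta) = P(y = 0) = exp(-theta)
print(theta_hat, p0_hat)        # 2.0 and exp(-2) ~ 0.135
```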

To prove the properties above, we need several useful properties of probability density functions; the proofs are built on them.

17.4.1 Regularity Conditions

First the regularity conditions, then the useful properties. To sketch proofs of these results, we first obtain some useful properties of probability density functions. We assume that $(y_1, \ldots, y_n)$ is a random sample from the population with density function $f(y_i \mid \theta_0)$ and that the following regularity conditions hold. [Our statement of these is informal. A more rigorous treatment may be found in Stuart and Ord (1989) or Davidson and MacKinnon (1993).]

Let $(y_1, \ldots, y_n)$ be drawn from a univariate (or multivariate) population with density function $f(y_i \mid \theta)$ obeying the following regularity conditions:
R1. The first three derivatives of $\ln f(y_i \mid \theta)$ with respect to $\theta$ are continuous and finite for almost all $y_i$ and all $\theta$. (This ensures the existence of certain Taylor series approximations and the finite variance of the derivatives.)
R2. The conditions needed to obtain the expectations of the first and second derivatives of $\ln f(y_i \mid \theta)$ are met.
R3. For all values of $\theta$, $\left| \partial^3 \ln f(y_i \mid \theta) / \partial\theta_j\,\partial\theta_k\,\partial\theta_l \right|$ is less than a function that has a finite expectation. (This allows us to truncate the Taylor series.)

Understanding the regularity conditions:
What they are:
1. $\ln f$ has three continuous derivatives with respect to the parameters.
2. The conditions needed to obtain expectations of the derivatives are met (e.g., the range of the variable is not a function of the parameters).
3. The third derivative has a finite expectation.
What they mean:
- Moment conditions and convergence: we need to obtain expectations of derivatives.
- We need to be able to truncate Taylor series.
- We will use central limit theorems.

With these regularity conditions, we will obtain the following fundamental characteristics of $f(y_i \mid \theta)$:

D1. $\ln f(y_i \mid \theta)$, $g_i = \partial \ln f(y_i \mid \theta)/\partial\theta$, and $H_i = \partial^2 \ln f(y_i \mid \theta)/\partial\theta\,\partial\theta'$, $i = 1, \ldots, n$, are all random samples of random variables. (This property follows from our assumption of random sampling.)
D2. $E_0\!\left[\partial \ln f(y_i \mid \theta_0)/\partial\theta_0\right] = 0$. (Prerequisite: the order of differentiation and integration can be interchanged; below, $U(\theta)$ and $L(\theta)$ denote the upper and lower limits of the range of the random variable.)
D3. $\operatorname{Var}_0\!\left[\partial \ln f(y_i \mid \theta_0)/\partial\theta_0\right] = -E_0\!\left[\partial^2 \ln f(y_i \mid \theta_0)/\partial\theta_0\,\partial\theta_0'\right]$.

In words (be clear about what these mean): D2 says that the expectation of the first derivative is zero; D3 says that the negative of the expected matrix of second derivatives equals the variance of the first derivative.

D1 is simply a consequence of the definition of the likelihood function. D2 leads to the moment condition which defines the maximum likelihood estimator: on the one hand, the MLE is found as the maximizer of a function, which mandates finding the vector which equates the gradient to zero; on the other, D2 is a more fundamental relationship which places the MLE in the class of generalized method of moments estimators. D3 produces what is known as the information matrix equality, which shows how to obtain the asymptotic covariance matrix of the MLE.

Proof. First, consider the case in which the range of $y_i$ depends on the parameter. For every $\theta$,
$$\int_{L(\theta)}^{U(\theta)} f(y_i \mid \theta)\, dy_i = 1.$$
Differentiating both sides with respect to $\theta$ and applying Leibnitz's theorem gives
$$\frac{\partial}{\partial\theta} \int_{L(\theta)}^{U(\theta)} f(y_i \mid \theta)\, dy_i = \int_{L(\theta)}^{U(\theta)} \frac{\partial f(y_i \mid \theta)}{\partial\theta}\, dy_i + f\big(U(\theta) \mid \theta\big)\frac{\partial U(\theta)}{\partial\theta} - f\big(L(\theta) \mid \theta\big)\frac{\partial L(\theta)}{\partial\theta} = 0.$$
If the second and third terms go to zero, then we may interchange the operations of differentiation and integration on the first term. What are the conditions under which those two terms vanish? The necessary condition is that the density be zero at the terminal points of the range; sufficient conditions are that the range of the observed random variable, $y_i$, does not depend on the parameters, which means that $\partial U(\theta)/\partial\theta = \partial L(\theta)/\partial\theta = 0$, or that the density is zero at the terminal points. (Note that a uniform distribution on $[0, \theta]$ violates this condition.) This condition, then, is regularity condition R2. The latter is usually assumed, and we will assume it in what follows. So,
$$E\!\left[\frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\right] = \int \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\, f(y_i \mid \theta)\, dy_i = \int \frac{\partial f(y_i \mid \theta)}{\partial\theta}\, dy_i = \frac{\partial}{\partial\theta} \int f(y_i \mid \theta)\, dy_i = \frac{\partial(1)}{\partial\theta} = 0,$$
which proves D2.

Because the order of differentiation and integration can be interchanged, we may differentiate
$$\int \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\, f(y_i \mid \theta)\, dy_i = 0$$
once more with respect to $\theta'$, which gives
$$\int \left[ \frac{\partial^2 \ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'}\, f(y_i \mid \theta) + \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\, \frac{\partial f(y_i \mid \theta)}{\partial\theta'} \right] dy_i = 0.$$
But
$$\frac{\partial f(y_i \mid \theta)}{\partial\theta'} = f(y_i \mid \theta)\, \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta'},$$
and the integral of a sum equals the sum of the integrals. Therefore,
$$-E\!\left[\frac{\partial^2 \ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'}\right] = E\!\left[\frac{\partial \ln f(y_i \mid \theta)}{\partial\theta} \cdot \frac{\partial \ln f(y_i \mid \theta)}{\partial\theta'}\right].$$
The left-hand side is the negative of the expected matrix of second derivatives, and the right-hand side is the expected outer product of the first derivative. By D2 (the first derivative has expectation zero), the right-hand side is the variance of the first derivative. Hence the negative of the expected second-derivatives matrix equals the variance of the first derivative, which proves D3:
$$\operatorname{Var}\!\left[\frac{\partial \ln f(y_i \mid \theta)}{\partial\theta}\right] = -E\!\left[\frac{\partial^2 \ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'}\right].$$
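D2 and D3 are easy to verify by simulation. The sketch below checks both for the Poisson model at an assumed true value $\theta_0 = 2$:

```python
import numpy as np

rng = np.random.default_rng(42)
theta0 = 2.0
y = rng.poisson(theta0, size=200_000)   # large sample at the true value

g = -1.0 + y / theta0        # per-observation score d ln f / d theta
h = -y / theta0**2           # per-observation second derivative

print(g.mean())              # ~ 0                        (D2)
print(g.var(), -h.mean())    # both ~ 1/theta0 = 0.5      (D3)
```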

17.4.3 The Likelihood Equation (building blocks for the asymptotic properties of the MLE)

Let the log-likelihood function be
$$\ln L(\theta \mid y) = \sum_{i=1}^n \ln f(y_i \mid \theta).$$
Then
$$\frac{\partial \ln L(\theta \mid y)}{\partial\theta} = \sum_{i=1}^n g_i \quad\text{and}\quad \frac{\partial^2 \ln L(\theta \mid y)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^n H_i. \quad (17\text{-}9)$$
From D1 and D2,
$$E_0\!\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = 0, \quad (17\text{-}10)$$
which is the likelihood equation mentioned earlier.

17.4.4 The Information Matrix Equality

Consider $\operatorname{Var}_0\!\left[\partial \ln L(\theta_0 \mid y)/\partial\theta_0\right]$. By D1 (the random sampling property), the terms with unequal subscripts drop out, since the scores $g_i$ and $g_j$, $i \ne j$, are independent with zero means. This gives
$$\operatorname{Var}_0\!\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = \sum_{i=1}^n E_0[g_i g_i'] = -\sum_{i=1}^n E_0[H_i].$$
Therefore,
$$\operatorname{Var}_0\!\left[\frac{\partial \ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = -E_0\!\left[\frac{\partial^2 \ln L(\theta_0 \mid y)}{\partial\theta_0\,\partial\theta_0'}\right]. \quad (17\text{-}11)$$
This very useful result is known as the information matrix equality.

With these preliminaries in place, we can now prove M1, M2, M3, and M4. (The details are extensive and take some time; see pp. 477-480.)

Outline: we will sketch formal proofs of these results.
- The log-likelihood function, again.
- The likelihood equation and the information matrix.
- A linear Taylor series approximation to the first-order conditions (under regularity, higher-order terms will vanish in large samples). Our usual approach: the large-sample behavior of the left- and right-hand sides is the same.
- A proof of consistency (property M1).
- The limiting variance of the (scaled) score vector; we are using the central limit theorem here. This leads to asymptotic normality (property M2). We will derive the asymptotic variance of the MLE.
- Efficiency (we have not developed the tools to prove this). The Cramer-Rao lower bound for efficient estimation (an asymptotic version of Gauss-Markov).
- Estimating the variance of the maximum likelihood estimator.
- Invariance (a VERY handy result). Coupled with the Slutsky theorem and the delta method, the invariance property makes estimation of nonlinear functions of parameters very easy.

Deriving the Properties of the Maximum Likelihood Estimator

An example (the information matrix of the multivariate normal distribution): see Example 4.21.

Estimating the asymptotic variance of the ML estimator: the BHHH estimator. A concrete computation of a variance estimator for the ML estimator is sketched below.
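A minimal sketch of the BHHH idea for the Poisson example of Section 17.3 (the scalar case, so the "matrices" are numbers): the asymptotic variance is estimated by inverting the sum of outer products of the per-observation scores, evaluated at $\hat\theta$.

```python
import numpy as np

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])
theta_hat = y.mean()                 # Poisson MLE, = 2.0

# BHHH: invert the sum of outer products of per-observation scores.
g = -1.0 + y / theta_hat             # g_i evaluated at theta-hat
var_bhhh = 1.0 / np.sum(g**2)        # ~ 0.154 for this sample

# Hessian-based alternative: invert -sum_i H_i at theta-hat.
var_hess = 1.0 / np.sum(y / theta_hat**2)   # = theta_hat/n = 0.2

print(var_bhhh, var_hess)
```

In finite samples the BHHH and Hessian-based values differ, as they do here; the information matrix equality (17-11) guarantees that they estimate the same quantity asymptotically.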
