Econometrics: Instrumental Variables, IV (2SLS).ppt

Uploaded by: 1587****927 | Document ID: 1457606 | Uploaded: 2024-04-27 | Format: PPT | Pages: 81 | Size: 1.32 MB
Week 14: Instrumental Variable Regression Models
Simultaneous Equations Using 2SLS (Chapter 16); IV Estimation in Multiple Regression Models (15.1-3)
Econometrics (Graduate)

A New Approach to the Omitted Variable Problem
- We have talked about the problem of omitted variable bias (in Ch. 3) and have shown that it leads to inconsistent estimates.
- If we have a suitable proxy, we can reduce the bias to some degree (see Chapter 9).
- Furthermore, if the omitted variable is time invariant, we can use a panel data model without much hesitation.
- Without a suitable proxy or panel data, or if the omitted variable does change over time, we need a new approach.

Instrumental Variables Regression
- Three important threats to internal validity are:
1. omitted variable bias from a variable that is correlated with X but is unobserved, so cannot be included in the regression;
2. simultaneous causality bias (X causes Y, Y causes X);
3. errors-in-variables bias (X is measured with error).
- Instrumental variables regression can eliminate bias from these three sources.

Terminology: endogeneity and exogeneity
- An endogenous variable is one that is correlated with $u$.
- An exogenous variable is one that is uncorrelated with $u$.
- Historical note: "endogenous" literally means "determined within the system," that is, a variable that is jointly determined with $y$.
- In other words, it is a variable subject to simultaneous causality.
- However, this definition is narrow: IV regression can be used to address omitted variable bias and errors-in-variables bias, not just simultaneous causality bias.

What is Simultaneous Causality
- Suppose we have two endogenous variables $Y_1, Y_2$ and two exogenous variables $X_1, X_2$ such that
  $Y_{1i} = \beta_0 + \beta_1 X_{1i} + \beta_2 Y_{2i} + u_{1i}$   (1)
  $Y_{2i} = \gamma_0 + \gamma_1 Y_{1i} + \gamma_2 X_{2i} + u_{2i}$   (2)
- Let's see why $Y_2$ (or $Y_1$) is endogenous.
- Suppose $u_{1i} \neq 0$ and $u_{2i} = 0$; then $Y_{1i} \neq E(Y_{1i})$ from (1).
- But in (2), if the coefficient on $Y_{1i}$ is nonzero ($\gamma_1 \neq 0$), this causes a change in $Y_{2i}$, so $Y_{2i}$ is correlated with $u_{1i}$ through (2).
- The same is true for $Y_{1i}$ and $u_{2i}$ in (2) through (1).

Simultaneous Bias
Can we estimate these two equations consistently?
  $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$
  $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$
For consistency, we need $\operatorname{cov}(y_2, u_1) = 0$ and $\operatorname{cov}(y_1, u_2) = 0$.
However, a large $u_2$ means a larger $y_2$, which implies a larger $y_1$ (if $\alpha_1 > 0$), so $\operatorname{cov}(y_1, u_2) \neq 0$.
The same is true for $\operatorname{cov}(y_2, u_1)$, due to the circular effect of $u_1$.

The IV Estimator with a Single Regressor and a Single Instrument
  $y_i = \beta_0 + \beta_1 x_i + u_i$
- Loosely, IV regression breaks $x$ into two parts: a part that might be correlated with $u$, and a part that is not.
- By isolating the part that is not correlated with $u$, it is possible to estimate $\beta_1$.
- This is done using an instrumental variable, $z_i$, which is uncorrelated with $u_i$.
- The instrumental variable detects movements in $x_i$ that are uncorrelated with $u_i$, and uses these to estimate $\beta_1$.

Two conditions for a valid instrument
  $y_i = \beta_0 + \beta_1 x_i + u_i$
- For an instrumental variable (an "instrument") $z$ to be valid, it must satisfy two conditions:
1. Instrument relevance: $\operatorname{cov}(z_i, x_i) \neq 0$
2. Instrument exogeneity: $\operatorname{cov}(z_i, u_i) = 0$
- In other words, the instrument $z_i$ must be an exogenous variable that is correlated with $x$.
- Or: $z_i$'s effect on $y$ is only through $x$.
- Which condition can we test? A) 1  B) 2  C) Both  D) Neither  E) Don't know
- With a single instrument, we can test the first condition but have to assume the second.

Example: Labor Economics
Suppose $\log(wage) = \beta_0 + \beta_1\, educ + u$, where $u = \beta_2\, abil + v$.
- When $abil$ is unobserved, how can we estimate $\beta_1$ consistently if $\operatorname{cov}(educ, abil) \neq 0$?
- If we have a proxy for $abil$, such as IQ, and substitute it into our model, then we are fine.
- Otherwise, we need something that is correlated with $educ$ but not with $abil$.
- Parents' education, or number of siblings, might be an instrument for $educ$.
- Suppose we have: $y_i = \beta_0 + \beta_1 x_i + u_i$ with $\operatorname{cov}(x_i, u_i) \neq 0$.
- Our estimate of $\beta_1$ will be inconsistent.
- Either we find the omitted variable in $u_i$ and add it to our model to overcome the inconsistency,
- or we find an instrument $z_i$ for the included variable.
- Suppose for now that you have such a $z_i$ (we'll discuss how to find instrumental variables later).
- How can you use $z_i$ to estimate $\beta_1$?
- We will explain this in two ways.

Instrumental Variable Regression

The IV Estimator, one x and one z
Explanation #1: Two Stage Least Squares (TSLS)
- As it sounds, TSLS has two stages, that is, two regressions:
(1) First, isolate the part of $x$ that is uncorrelated with $u$: regress $x$ on $z$ using OLS
  $x_i = \pi_0 + \pi_1 z_i + v_i$   (1)
- Because $z_i$ is uncorrelated with $u_i$, $\pi_0 + \pi_1 z_i$ is uncorrelated with $u_i$.
- We don't know $\pi_0$ or $\pi_1$, but we have estimated them, so
- compute the predicted values of $x_i$, $\hat{x}_i = \hat{\pi}_0 + \hat{\pi}_1 z_i$, $i = 1, \dots, n$.
(2) Replace $x_i$ by $\hat{x}_i$ in the regression of interest: regress $y$ on $\hat{x}_i$ using OLS:
  $y_i = \beta_0 + \beta_1 \hat{x}_i + u_i$   (2)
- Because $\hat{x}_i$ is uncorrelated with $u_i$ in large samples, the first least squares assumption holds.
- Thus $\beta_1$ can be estimated by OLS using regression (2).
- This argument relies on large samples (so that $\pi_0$ and $\pi_1$ are well estimated using regression (1)).
- The resulting estimator is called the "Two Stage Least Squares" (TSLS) estimator, $\hat{\beta}_1^{TSLS}$.

The IV Estimator, one x and one z, ctd.
- Explanation #2: (only) a little algebra
  $y_i = \beta_0 + \beta_1 x_i + u_i$, with $x_i = \pi_0 + \pi_1 z_i + v_i$
- Thus,
  $\operatorname{cov}(y_i, z_i) = \operatorname{cov}(\beta_0 + \beta_1 x_i + u_i,\, z_i)$
  $= \operatorname{cov}(\beta_0, z_i) + \operatorname{cov}(\beta_1 x_i, z_i) + \operatorname{cov}(u_i, z_i)$
  $= 0 + \operatorname{cov}(\beta_1 x_i, z_i) + 0$
  $= \beta_1 \operatorname{cov}(x_i, z_i)$
- where $\operatorname{cov}(u_i, z_i) = 0$ (instrument exogeneity); thus
  $\beta_1 = \operatorname{cov}(y_i, z_i) / \operatorname{cov}(x_i, z_i)$,
  and replacing the population covariances with sample covariances gives an estimator that is consistent in large samples.
- The instrument relevance condition, $\operatorname{cov}(x, z) \neq 0$, ensures that you don't divide by zero.

Supply and Demand Example
- Start with an equation you'd like to estimate, say a supply function in a market:
  $q_s = \alpha_1 p + \beta_1 z + u_1$,
  where $p$ is the price and $z$ is a supply shifter.
- Call this a structural equation: it is derived from economic theory and has a causal interpretation in which $p$ directly affects $q_s$.

Example (cont.)
- The problem is that we can't just regress observed quantity on price, since observed quantities are determined by the equilibrium of supply and demand.
- Consider a second structural equation, in this case the demand function $q_d = \alpha_2 p + u_2$.
- So quantities are determined by a simultaneous equations model (SEM).

Example (cont.)
- Both $q$ and $p$ are endogenous because they are both determined by the equilibrium of supply and demand.
- $z$ is exogenous, and it is the availability of this exogenous supply shifter that allows us to identify the structural demand equation.
- With no observed demand shifters, supply is not identified and cannot be estimated.

Identification of Demand Equation
[Figure: price $p$ against quantity $q$, showing the demand curve D and supply curves S(z = z1), S(z = z2), S(z = z3)]

Using IV to Estimate Demand
- Given $q_s = \alpha_1 p + \beta_1 z + u_1$ and $q_d = \alpha_2 p + u_2$,
- we can estimate the structural demand equation, using $z$ as an instrument for $p$.
- First stage equation: $p = \pi_0 + \pi_1 z + v_2$
- Second stage equation: $q = \alpha_2 \hat{p} + u_2$, where $\hat{p}$ is the fitted value from the first stage
- Thus, 2SLS provides a consistent estimator of $\alpha_2$, the slope of the demand curve.
- We cannot estimate $\alpha_1$, the slope of the supply curve.

The General SEM
- Suppose our structural equations are:
  $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$
  $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$
- Thus, $y_2 = \alpha_2(\alpha_1 y_2 + \beta_1 z_1 + u_1) + \beta_2 z_2 + u_2$
- So, $(1 - \alpha_2\alpha_1)\, y_2 = \alpha_2\beta_1 z_1 + \beta_2 z_2 + \alpha_2 u_1 + u_2$, which can be rewritten (if $\alpha_2\alpha_1 \neq 1$) as
  $y_2 = \pi_1 z_1 + \pi_2 z_2 + v_2$, with $\pi_1 = \alpha_2\beta_1/(1 - \alpha_2\alpha_1)$, $\pi_2 = \beta_2/(1 - \alpha_2\alpha_1)$, and $v_2 = (\alpha_2 u_1 + u_2)/(1 - \alpha_2\alpha_1)$.
- This is the so-called "reduced form".
- However, the reduced form coefficients alone do not tell us the value of $\alpha_1$ or $\alpha_2$.

Example #1: Supply and demand for butter
- IV regression was originally developed to estimate demand elasticities for agricultural goods, for example butter:
- $\log(Q_i^{butter}) = \beta_0 + \beta_1 \log(P_i^{butter}) + u_i$
- $\beta_1$ = price elasticity of butter = percent change in quantity for a 1% change in price (recall the log-log specification discussion).
- Data: observations on price and quantity of butter for different years.
- The OLS regression of $\log(Q^{butter})$ on $\log(P^{butter})$ suffers from simultaneous causality bias (why?).
Simultaneous causality bias in the OLS regression of $\log(Q^{butter})$ on $\log(P^{butter})$ arises because price and quantity are determined by the interaction of demand and supply.

Aside note: What is the relationship between, say, the Marxian concept of the labor theory of value and the microeconomic theory of price formation? What is the long-term supply curve, and how is it determined?

A Quick Note on Marxian Economics
- At Q1, production is less than socially necessary, which causes a shortage.
- Competition will drive the price above its value, until more producers enter the market or more product is produced.
- This leads to an increase in the level of output, all the way to Q*.
- At Q2, production is more than socially necessary, which causes a surplus.
- Competition will drive the price below its value, until some producers leave the market or less product is produced.
- This leads to a drop in the level of output, all the way to Q*.
SL is the long-term supply curve that is consistent with the Marxian concept of socially necessary labor time.
Is it true that mainstream economics has no theory to explain why it is at SL rather than some other level?

Back to our supply and demand for butter
This interaction of demand and supply produces
[Figure: observed price-quantity data points for butter]
- Would a regression using these data produce the demand curve?
- A) Demand  B) Supply  C) Neither
What would you get if only supply shifted?
- TSLS estimates the demand curve by isolating shifts in price and quantity that arise from shifts in supply.
- Z is a variable that shifts supply but not demand.
- TSLS in the supply-demand example: $\log(Q_i^{butter}) = \beta_0 + \beta_1 \log(P_i^{butter}) + u_i$
- Let Z = rainfall in dairy-producing regions.
- Is Z a valid instrument? Let's check the two conditions.
(1) Exogenous? $\operatorname{corr}(rain_i, u_i) = 0$?
- A) Yes  B) No  C) Insufficient information
- Plausibly: whether it rains in dairy-producing regions shouldn't affect demand.
(2) Relevant? $\operatorname{corr}(rain_i, \log(P_i^{butter})) \neq 0$?
- A) Yes  B) No  C) Insufficient information
- Plausibly: insufficient rainfall means less grazing means less butter.
$\log(Q_i^{butter}) = \beta_0 + \beta_1 \log(P_i^{butter}) + u_i$, with $Z_i = rain_i$ = rainfall in dairy-producing regions.
- Stage 1: regress $\log(P_i^{butter})$ on $rain_i$ to get the fitted values $\widehat{\log(P_i^{butter})}$. The fitted values isolate changes in log price that arise from supply (part of supply, at least).
- Stage 2: regress $\log(Q_i^{butter})$ on $\widehat{\log(P_i^{butter})}$. This is the regression counterpart of using shifts in the supply curve to trace out the demand curve.

TSLS in the supply-demand example, ctd.
TSLS (two stage least squares) in EViews: everything is the same as in OLS except:
- In "Estimation Methods", select "TSLS - Two-Stage Least Squares (TSNLS and ARMA)".
- Provide a list of instrument variables; be sure to include all exogenous variables as well.
- Only the right-hand-side variables that are not in the list of instruments are considered endogenous.
- In Options, select "Heteroskedasticity consistent coefficient covariance".

Example 15.5 using 2SLS
Dependent Variable: LOG(WAGE)
Method: Two-Stage Least Squares
Sample: 1 753 IF INLF
Included observations: 428
Instrument list: EXPER EXPERSQ FATHEDUC MOTHEDUC

Variable    Coefficient    Std. Error    t-Statistic    Prob.
EDUC         0.061397      0.031437       1.953024      0.0515
EXPER        0.044170      0.013432       3.288329      0.0011
EXPERSQ     -0.000899      0.000402      -2.237993      0.0257
C            0.048100      0.400328       0.120152      0.9044

R-squared             0.135708    Mean dependent var    1.190173
Adjusted R-squared    0.129593    S.D. dependent var    0.723198
S.E. of regression    0.674712    Sum squared resid     193.0200
F-statistic           8.140709    Durbin-Watson stat    1.945659
Prob(F-statistic)     0.000028

Note: red are instruments, blue are exogenous, green is endogenous.

Example: Demand for Cigarettes
- How much will a hypothetical cigarette tax reduce cigarette consumption?
- To answer this, we need the elasticity of demand for cigarettes, that is, $\beta_1$ in the regression
- $\log(Q_i^{cigarettes}) = \beta_0 + \beta_1 \log(P_i^{cigarettes}) + u_i$
- Will the OLS estimator plausibly be unbiased?
- Why or why not?

Example: Cigarette demand, ctd.
- $\log(Q_i^{cigarettes}) = \beta_0 + \beta_1 \log(P_i^{cigarettes}) + u_i$
- Panel data: annual cigarette consumption and average prices paid (including tax), 48 continental US states, 1985-1995.
- Proposed instrumental variable: $Z_i$ = general sales tax per pack in the state = $GSTax_i$
Is this a valid instrument?
(1) Relevant? $\operatorname{corr}(GSTax_i, \log(P_i^{cigarettes})) \neq 0$?
(2) Exogenous? $\operatorname{corr}(GSTax_i, u_i) = 0$?

Example: Cigarette demand, two instruments
Dependent Variable: LOG(PACKPC)
Method: Two-Stage Least Squares
Sample: 1 528 IF YEAR=1995
Included observations: 48
White Heteroskedasticity-Consistent Standard Errors & Covariance
Instrument list: LOG(INCOME/POP) (TAX-TAXS)/CPI TAXS/CPI

Variable           Coefficient    Std. Error    t-Statistic    Prob.
LOG(INCOME/POP)     0.280405      0.253890       1.104436      0.2753
LOG(AVGPRS/CPI)    -1.277424      0.249610      -5.117680      0.0000
C                   9.776810      0.961763      10.16551       0.0000

R-squared             0.429422    Mean dependent var    4.538837
Adjusted R-squared    0.404063    S.D. dependent var    0.243346
S.E. of regression    0.187856    Sum squared resid     1.588044
F-statistic           13.28079    Durbin-Watson stat    1.946351
Prob(F-statistic)     0.000029

Identification

The general IV regression model, ctd.
  $Y_1 = \beta_0 + \beta_1 Y_2 + \dots + \beta_k Y_{k+1} + \beta_{k+1} Z_1 + \dots + \beta_{k+r} Z_r + u$
We need to introduce some new concepts and to extend some old concepts to the general IV regression model:
- Terminology: identification and overidentification
- TSLS with included exogenous variables
  - one endogenous regressor
  - multiple endogenous regressors
- Assumptions that underlie the normal sampling distribution of TSLS
  - Instrument validity (relevance and exogeneity)
  - General IV regression assumptions

Identification, ctd.
- The coefficients $\beta_1, \dots, \beta_k$ are said to be:
  - exactly identified if $m = k$. There are just enough instruments to estimate $\beta_1, \dots, \beta_k$.
  - overidentified if $m > k$. There are more than enough instruments to estimate $\beta_1, \dots, \beta_k$. If so, you can test whether the instruments are valid (a test of the "overidentifying restrictions"); we'll return to this later.
  - underidentified if $m < k$. There are too few instruments to estimate $\beta_1, \dots, \beta_k$. If so, you need to get more instruments!

Identification
- In general, a parameter is said to be identified if different values of the parameter would produce different distributions of the data.
- In IV regression, whether the coefficients are identified depends on the relation between the number of instruments ($m$) and the number of endogenous regressors ($k$).
- Intuitively, if there are fewer instruments than endogenous regressors, we can't estimate $\beta_1, \dots, \beta_k$.
- For example, suppose $k = 1$ but $m = 0$ (no instruments)!

Identification of General SEM
- Once again, our structural equations are:
  $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$
  $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$
- Let $z_1$ be all the exogenous variables in the first equation, and $z_2$ be all the exogenous variables in the second equation.
- It's okay for there to be overlap in $z_1$ and $z_2$.
- How are we able to identify which equation is which?
- We need to state the rank condition.

Identification of General SEM
- Given our two equations:
  $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$
  $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$
- To identify equation 1, there must be some variables (at least one) in $z_2$ that are not in $z_1$.
- To identify equation 2, there must be some variables (at least one) in $z_1$ that are not in $z_2$.
- We refer to this as the rank condition.
- We are able to identify the two equations if the rank condition is satisfied.

Example: Labor Market
- Suppose the structural equations for the labor market are:
1. $hours = \alpha_1 \log(wage) + \beta_{10} + \beta_{11}\, educ + \beta_{12}\, age + \beta_{13}\, kidslt6 + \beta_{14}\, nwifeinc + \beta_{15}\, exper + \beta_{16}\, exper^2 + u_1$
2. $\log(wage) = \alpha_2\, hours + \beta_{20} + \beta_{21}\, educ + \beta_{22}\, age + \beta_{23}\, kidslt6 + \beta_{24}\, nwifeinc + \beta_{25}\, exper + \beta_{26}\, exper^2 + u_2$
- Can we identify which is the supply equation and which is the demand equation for labor?
- No! Both equations contain exactly the same exogenous variables.
- That is the reason for the rank condition.

Example: Labor Market
- Suppose the structural equations for the labor market are instead as follows:
1. $hours = \alpha_1 \log(wage) + \beta_{10} + \beta_{11}\, educ + \beta_{12}\, age + \beta_{13}\, kidslt6 + \beta_{14}\, nwifeinc + u_1$
2. $\log(wage) = \alpha_2\, hours + \beta_{20} + \beta_{21}\, educ + \beta_{22}\, exper + \beta_{23}\, exper^2 + u_2$
- Which is the supply equation and which is the demand equation for labor?
- Equation 1 is the labor supply equation and equation 2 is the labor demand equation: age, kidslt6, and nwifeinc affect the supply of labor but not the demand for it, while experience affects demand but not supply.

Order Condition
- Note that the exogenous variable excluded from the first equation must have a nonzero coefficient in the second equation for the rank condition to hold.
- The order condition states that there must be at least as many exogenous variables excluded from the first equation as there are endogenous explanatory variables in the first equation (see page 529).
- Note that the order condition clearly holds if the rank condition does: there will be an exogenous variable for the endogenous one.
- Rank co
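
The two explanations of the IV estimator given earlier (two stage least squares, and the covariance ratio cov(y, z)/cov(x, z)) can be checked numerically. The following is a minimal sketch in Python with NumPy on simulated data. The data-generating process, the coefficient values, the sample size, and the helper names `simulate`, `ols`, `iv_ratio`, and `tsls` are all assumptions made for illustration; none of them come from the slides.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on X (X must already contain a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def simulate(n=20000, beta1=2.0, seed=0):
    """Simulated data with an endogenous regressor x and a valid instrument z.

    Assumed (hypothetical) design: z is independent of u, so it is exogenous;
    x depends on z, so z is relevant; x also depends on u, so x is endogenous.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)             # instrument
    u = rng.normal(size=n)             # structural error
    v = 0.8 * u + rng.normal(size=n)   # first-stage error, correlated with u
    x = 1.0 + 0.5 * z + v              # first stage: x = pi0 + pi1*z + v
    y = 3.0 + beta1 * x + u            # equation of interest
    return y, x, z


def iv_ratio(y, x, z):
    """Explanation #2: sample cov(y, z) divided by sample cov(x, z)."""
    return np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]


def tsls(y, x, z):
    """Explanation #1: regress x on z, then regress y on the fitted x."""
    const = np.ones_like(z)
    Z = np.column_stack([const, z])
    x_hat = Z @ ols(x, Z)                               # stage 1 fitted values
    return ols(y, np.column_stack([const, x_hat]))[1]   # stage 2 slope


y, x, z = simulate()
b_ols = ols(y, np.column_stack([np.ones_like(x), x]))[1]
b_iv = iv_ratio(y, x, z)
b_tsls = tsls(y, x, z)

print(f"true beta1 = 2.0, OLS = {b_ols:.3f}, IV ratio = {b_iv:.3f}, TSLS = {b_tsls:.3f}")
```

With one regressor and one instrument the two routes give the same slope, while OLS is pulled away from the true value because x is correlated with u. Note that the standard errors reported by a hand-run second-stage OLS are not valid for TSLS, which is one reason to use a built-in TSLS routine for inference.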