Intermediate Econometrics
Multiple Regression with Dummy Variables, Part 2
5/26/2025

Chapter Outline: Describing Qualitative Information; A Single Dummy Independent Variable; Using Dummy Variables for Multiple Categories; Interactions Involving Dummy Variables; A Binary Dependent Variable: The Linear Probability Model.

Interactions Among Dummy Variables

Interacting dummy variables with other dummy variables and/or with other explanatory variables makes regression analysis much more powerful. We start with the interaction of one dummy with another. Such interactions give another way to observe and test differences among groups.

Consider the wage model with a female-married interaction, where the base group is still single men. This model also allows us to obtain the estimated wage differential among all four groups, but we must be careful to plug in the correct combination of zeros and ones. Comment: this has no real advantage over using multiple dummy variables; it is just another possible approach.

Interacting Dummies with Other Explanatory Variables

Why interact a dummy with another explanatory variable? To allow for differences in slopes. Suppose we wish to test whether the return to education is the same for men and women, allowing for a constant wage differential between them. We need a model that allows for a constant wage differential as well as different returns to education.

To be more specific, we wish the model to be

$$\log(wage) = \beta_0 + \beta_1 educ + u \quad \text{for men},$$
$$\log(wage) = \alpha_0 + \alpha_1 educ + u \quad \text{for women},$$

where $\alpha_0 \neq \beta_0$, since we have already found evidence of a wage differential between men and women. Let $\alpha_0 = \beta_0 + \delta_0$ and $\alpha_1 = \beta_1 + \delta_1$; then we can combine the two equations using the dummy female:

$$\log(wage) = \beta_0 + \delta_0 female + (\beta_1 + \delta_1 female)\, educ + u.$$

Rearranging,

$$\log(wage) = \beta_0 + \delta_0 female + \beta_1 educ + \delta_1 female \cdot educ + u,$$

where we observe the dummy and the interaction of the dummy with
other explanatory variables.

The equation

$$\log(wage) = \beta_0 + \delta_0 female + \beta_1 educ + \delta_1 female \cdot educ + u$$

shows that $\delta_0$ measures the difference in intercepts between women and men, and $\delta_1$ measures the difference in the return to education between women and men. After estimating this equation by OLS, we can test hypotheses about the return to education and the wage differential. Holding other factors fixed, figures can show how different combinations of $\delta_0$ and $\delta_1$ affect log(wage).

Figure: (a) women earn less than men at all levels of education, and the gap increases as educ gets larger ($\delta_0 < 0$, $\delta_1 < 0$); (b) women earn less than men at low levels of education, but the gap narrows as education increases ($\delta_0 < 0$, $\delta_1 > 0$).

Hypothesis testing

$H_0$: the return to education is the same for women and men. This amounts to testing $H_0: \delta_1 = 0$; that is, the wage differential is the same at all levels of education.

$H_0$: average wages are identical for men and women who have the same level of education. This is the test of $H_0: \delta_0 = 0, \delta_1 = 0$, for which an F test can be used.

Example: Log hourly wage equation

The wage equation is estimated with tenure included. Interpretation: the estimated return to education for men is .082, or 8.2%. For women it is .082 - .0056 = .0764, or about 7.6%. The difference, -.56%, or just over one-half a percentage point less for women, is neither economically large nor statistically significant. There is no evidence against the hypothesis that the return to education is the same for men and women.

Caution: interacting a dummy with other explanatory variables can cause a multicollinearity problem. Evidence 1: the standard error on female is 0.036 without the interaction term and 0.168 with it, nearly five times larger. Evidence 2: the correlation matrix directly measures the degree of correlation.

What to do? Notice that the wage
differential between women and men is estimated at educ = 0, which is not an interesting case. More interesting would be to estimate the gender differential at, say, the average education level in the sample (about 12.5). To do this, we replace $female \cdot educ$ with $female \cdot (educ - 12.5)$ and rerun the regression; this changes only the coefficient on female and its standard error.

Example: Effects of race on baseball player salaries

The equation is estimated for the 330 major league baseball players for whom city racial composition statistics are available. The variables black and hispan are binary indicators for the individual players (the base group is white players). percblck is the percentage of the team's city that is black, and perchisp is the percentage that is Hispanic. Other variables measure aspects of player productivity and longevity. The interactions $black \cdot percblck$ and $hispan \cdot perchisp$ are added to capture race effects.

We focus on the race variables. The F test shows that all four race variables are jointly statistically significant at the 5% level. How should we interpret them? Consider what happens for black players, holding perchisp fixed. The coefficient -.198 on black literally means that, if a black player is in a city with no blacks (percblck = 0), then he earns about 19.8% less than a comparable white player. As percblck increases with perchisp held fixed, the salary of blacks increases relative to that of whites. When percblck = 20, blacks earn about 5.2% more than whites. The largest percentage of blacks in a city is about 74% (Detroit). Conclusion: we cannot simply claim discrimination; city composition matters.

Testing for Differences in Regression Functions Across Groups

Sometimes we wish to test the null hypothesis that two populations or groups follow the same regression function. For example, we want to test whether the same
regression model describes college GPA for male and female college athletes. If the regression function for men is

$$cumgpa = \beta_0 + \beta_1 sat + \beta_2 hsperc + \beta_3 tothrs + u$$

and that for women is

$$cumgpa = (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\, sat + (\beta_2 + \delta_2)\, hsperc + (\beta_3 + \delta_3)\, tothrs + u,$$

the null hypothesis is $H_0: \delta_0 = 0, \delta_1 = 0, \delta_2 = 0, \delta_3 = 0$. This hypothesis can be tested by combining the two equations into one:

$$cumgpa = \beta_0 + \delta_0 female + \beta_1 sat + \delta_1 female \cdot sat + \beta_2 hsperc + \delta_2 female \cdot hsperc + \beta_3 tothrs + \delta_3 female \cdot tothrs + u.$$

An F test can be performed after the regression. The F statistic computed from the restricted and unrestricted models is 8.14, which rejects the null at the 5% significance level. Again, we need to be careful when interpreting the parameters for women: it is reasonable to compare the gender differential at the mean values of the other explanatory variables.

The General Case: The Chow Test

In the general model with $k$ explanatory variables and an intercept, suppose we have two groups, $g = 1$ and $g = 2$. We would like to test whether the intercept and all slopes are the same across the two groups. Write the model as

$$y = \beta_{g,0} + \beta_{g,1} x_1 + \beta_{g,2} x_2 + \dots + \beta_{g,k} x_k + u, \quad g = 1, 2.$$

The hypothesis that each beta is the same across the two groups involves $k + 1$ restrictions. The unrestricted model, which has a group dummy variable and $k$ interaction terms in addition to the intercept and the variables themselves, has $n - 2(k + 1)$ degrees of freedom.

Let $SSR_1$ and $SSR_2$ be the sums of squared residuals from estimating the above equation for the first and the second group, respectively. Key insight: the sum of squared residuals for the unrestricted model is simply $SSR_{ur} = SSR_1 + SSR_2$. The restricted sum of squared residuals, $SSR_r$, is just the SSR from pooling the groups and estimating a single equation. Once we have these, we compute the F statistic, which is called the Chow statistic in econometrics.

The Chow statistic:

$$F = \frac{SSR_r - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k + 1)}{k + 1}$$

The degrees of freedom for the numerator are calculated
from $[n - (k+1)] - \{[n_1 - (k+1)] + [n_2 - (k+1)]\} = k + 1$, using $n_1 + n_2 = n$.

In the GPA example, $k = 3$, $n_1 = 90$, $n_2 = 276$, $n = 366$, the pooled $SSR_r = 85.515$, $SSR_1 = 19.603$, $SSR_2 = 58.752$, and $SSR_{ur} = 19.603 + 58.752 = 78.355$, so

$$F = \frac{85.515 - 78.355}{78.355} \cdot \frac{358}{4} \approx 8.18.$$

One important limitation of the Chow test is that the null hypothesis allows for no differences at all between the groups. In many cases it is more interesting to allow for an intercept difference between the groups and then test for slope differences. To do this, we must put the interactions directly in the equation and test the joint significance of all interactions.

The Linear Probability Model: A Binary Dependent Variable

We have just studied, through the use of binary independent variables, how to incorporate qualitative information as explanatory variables in a multiple regression model. What happens if we want to use multiple regression to explain a qualitative event? Examples: to marry or not; to work or not; to have a child or not.

In such a regression model, $\beta_j$ cannot be interpreted as the change in $y$ given a one-unit increase in $x_j$, holding all other factors fixed, since $y$ either changes from zero to one or from one to zero. Under the zero conditional mean assumption we have

$$E(y|x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,$$

and since $E(y|x) = P(y = 1|x)$, we get the important equation

$$P(y = 1|x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k.$$

We call $P(y = 1|x)$ the response probability. The above model is an example of a binary response model. The multiple linear regression model with a binary dependent variable is called the linear probability model (LPM) because the response probability is linear in the parameters $\beta_j$. In the LPM, $\beta_j$ measures the change in the probability of success when $x_j$ changes, holding other factors fixed:

$$\Delta P(y = 1|x) = \beta_j \Delta x_j.$$

Example: Female Labor Force Participation

Let inlf ("in the labor force") be a binary
variable indicating labor force participation by a married woman during 1975: inlf = 1 if the woman reports working for a wage outside the home at some point during the year, and 0 otherwise.

Interpreting the estimated coefficients

We must remember that a change in an independent variable changes the probability that inlf = 1. For example, the coefficient on educ means that, everything else held fixed, another year of education increases the probability of labor force participation by .038 (figure on the next page). The coefficient on nwifeinc implies that, if $\Delta nwifeinc = 10$ (an increase of 10,000 dollars), the probability that a woman is in the labor force falls by .034. Experience: holding other factors fixed, the estimated change in the probability is approximately .039 - 2(.0006) exper = .039 - .0012 exper. Having one additional child less than six years old reduces the probability of participation by .262, at given levels of the other variables.

Problems with the Linear Probability Model

First, we can get predicted probabilities that are either less than zero or greater than one. Second, it is unreasonable to assume that a probability is linearly related to the independent variables over all their possible values. The above example predicts that going from zero children to one young child reduces the probability of working by .262, and this is also the predicted drop if the woman goes from one young child to two. What about going from five to six? That is unreasonable.

What to do? One way is to use more complex models, such as the logit model or the probit model. The other way is to live with this problem and enjoy the LPM's simplicity and power when the above two problems are not central to our analysis.

Caution: Heteroskedasticity Is Present

When $y$ is a binary variable, its variance conditional on $x$ is

$$Var(y|x) = p(x)[1 - p(x)],$$

where $p(x)$ is shorthand for the probability of success. Heteroskedasticity (HSK) does not cause bias in the OLS estimators, but the t
and F statistics require the homoskedasticity assumption, so the standard errors in the LPM need to be interpreted with caution. We shall learn how to correct for heteroskedasticity in the coming chapter. Good news: in many applications the usual OLS statistics are not far off, and it is still acceptable in applied work to present a standard OLS analysis of a linear probability model.

Example: A Linear Probability Model of Arrests

Let arr86 be a binary variable equal to unity if a man was arrested during 1986, and zero otherwise. The population is a group of young men in California born in 1960 or 1961 who have at least one arrest prior to 1986. The explanatory variables are:

pcnv: the proportion of prior arrests that led to a conviction
avgsen: the average sentence served in the past (in months)
tottime: months spent in prison since age 18 prior to 1986
ptime86: months spent in prison in 1986
qemp86: the number of quarters (0 to 4) that the man was legally employed in 1986

Interpretations of the estimated equation:

Increasing pcnv by .5 decreases the probability of arrest by .081.

Since ptime86 is measured in months, six more months in prison reduces the probability of arrest by .022(6) = .132. But if ptime86 = 12, the man is in prison all year and the probability of arrest should be zero. This is another example showing that the probability cannot be linear over the whole range of the explanatory variables.

Employment reduces the probability of arrest significantly. All other factors fixed, a man employed in all four quarters is .172 less likely to be arrested than a man who was not employed at all.

Including Dummies in LPM

We can also include dummy independent variables in models with dummy dependent variables. The coefficient measures the predicted difference in probability when the dummy variable goes from zero to one. For example, two race dummies are added to the arrest equation.

Interpretation: The coefficient on black: the probability of arrest is 17 percentage points higher for blacks than
for whites. The difference is statistically significant as well. Similarly, Hispanic men have a .096 higher probability of being arrested than white men.
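
As a check on the Chow test arithmetic from the GPA example earlier in these slides, the statistic can be recomputed from the reported sums of squared residuals. This is only a minimal sketch: the function name `chow_statistic` and its argument names are illustrative choices, not part of the slides, and the inputs are the values quoted in the example (pooled SSR 85.515, group SSRs 19.603 and 58.752, n = 366, k = 3).

```python
def chow_statistic(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow F statistic for equal intercept and slopes across two groups.

    ssr_pooled: SSR from the single pooled (restricted) regression
    ssr_1, ssr_2: SSRs from the two separate group regressions
    n: total number of observations; k: number of explanatory variables
    """
    ssr_ur = ssr_1 + ssr_2        # unrestricted SSR is the sum of the group SSRs
    q = k + 1                     # number of restrictions (intercept + k slopes)
    df_ur = n - 2 * (k + 1)       # degrees of freedom of the unrestricted model
    return (ssr_pooled - ssr_ur) / ssr_ur * df_ur / q


# Values quoted in the GPA example on the slides.
f_gpa = chow_statistic(ssr_pooled=85.515, ssr_1=19.603, ssr_2=58.752, n=366, k=3)
print(round(f_gpa, 2))  # 8.18
```

The result matches the 8.18 reported in the example; the function does not compute a p-value, which would need the F distribution with (k + 1, n - 2(k + 1)) degrees of freedom.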
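
The two LPM caveats discussed in these slides (fitted probabilities outside the unit interval, and a conditional variance of p(x)[1 - p(x)] that changes with x) can be seen in a very small sketch. The data below are made up purely for illustration and do not come from any of the examples; `ols_simple` is a hypothetical helper that fits a one-regressor OLS line by hand.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from an OLS fit of y on a single regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up toy data (an assumption, not from the slides): y is 0 below x = 5, 1 from x = 5 on.
x = list(range(10))
y = [0 if xi < 5 else 1 for xi in x]

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]  # LPM estimate of P(y = 1 | x)

# Problem 1: some fitted "probabilities" fall outside [0, 1].
outside = [xi for xi, p in zip(x, fitted) if p < 0 or p > 1]

# Heteroskedasticity: p(1 - p) differs across x (computed only where 0 <= p <= 1).
variances = [round(p * (1 - p), 3) for p in fitted if 0 <= p <= 1]

print(outside)
print(variances)
```

In this toy sample the fitted values at x = 0, 1, 8 and 9 lie outside the unit interval, and the implied variances are not constant across the remaining points, which is why the usual OLS standard errors need care in an LPM.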