Main textbooks
- He Xiaoqun, Applied Regression Analysis (应用回归分析), 4th ed., China Renmin University Press, 2015.
- Wang Binhui, Multivariate Statistical Analysis and R Language Modeling (多元统计分析及R语言建模), 3rd ed., Jinan University Press, 2016.
- He Xiaoqun, Multivariate Statistical Analysis (多元统计分析), 4th ed., China Renmin University Press, 2015.

References
- Lei Ping (ed.), Probability Theory and Mathematical Statistics (概率论与数理统计), Lixin Accounting Publishing House.
- Statistics for Business and Economics (商务与经济统计), translated by Lei Ping, China Machine Press.
- Wang Binhui, Data Statistical Analysis and R Language Programming (数据统计分析及R语言编程), Peking University Press, August 2004.
- Montgomery, Introduction to Linear Regression Analysis, 5th ed., Wiley Press, 2013.

"All knowledge is, in final analysis, history. All sciences are, in the abstract, mathematics. All judgements are, in their rationale, statistics."
C. R. Rao, Statistics and Truth: Putting Chance to Work

Correlation vs. Regression
- A scatter diagram can be used to show the relationship between two variables.
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
  - Correlation is only concerned with the strength of the relationship.
  - No causal effect is implied with correlation.

TYPES OF REGRESSION MODEL
[Figure: types of regression model.]

THREE DEGREES OF CORRELATION
- r < 0
- r > 0
- r = 0

Types of Relationships
[Figure: scatter plots of Y against X showing strong relationships, weak relationships, and no relationship.]

Coefficient of Correlation
- Measures the relative strength of the linear relationship between two variables.
- Sample coefficient of correlation:

\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}} \]

Features of Correlation Coefficient, r
- Unit free
- Ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship

COEFFICIENT OF CORRELATION VALUES
[Figure: a scale from -1.0 (perfect negative correlation) through -.5, 0 (no correlation) and +.5 to +1.0 (perfect positive correlation). The degree of negative correlation increases toward -1.0 and the degree of positive correlation increases toward +1.0.]

Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X with r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0.]

Introduction to Regression Analysis
- Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
- Dependent variable: the variable we wish to explain
- Independent variable: the variable used to explain the dependent variable

Simple Linear Regression Model
- Only one independent variable, X
- Relationship between X and Y is described by a linear function
- Changes in Y are assumed to be caused by changes in X

The population regression model:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where \(Y_i\) is the dependent variable, \(X_i\) is the independent variable, \(\beta_0\) is the population Y intercept, \(\beta_1\) is the population slope coefficient, and \(\varepsilon_i\) is the random error term. \(\beta_0 + \beta_1 X_i\) is the linear component and \(\varepsilon_i\) is the random error component.

[Figure: the regression line with intercept \(\beta_0\) and slope \(\beta_1\). For a given \(X_i\) it marks the observed value of Y, the predicted value of Y, and the random error \(\varepsilon_i\) for this \(X_i\) value.]

Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:

\[ \hat{Y}_i = b_0 + b_1 X_i \]

where \(\hat{Y}_i\) is the estimated (or predicted) Y value for observation i, \(b_0\) is the estimate of the regression intercept, \(b_1\) is the estimate of the regression slope, and \(X_i\) is the value of X for observation i.
- The individual random error terms \(e_i\) have a mean of zero.

Least Squares Method
\(b_0\) and \(b_1\) are obtained by finding the values of \(b_0\) and \(b_1\) that minimize the sum of the squared differences between \(Y\) and \(\hat{Y}\):

\[ \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2 \]

Interpretation of the Slope and the Intercept
- \(b_0\) is the estimated average value of Y when the value of X is zero.
- \(b_1\) is the estimated change in the average value of Y as a result of a one-unit change in X.

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
- A random sample of 10 houses is selected.
  - Dependent variable (Y) = house price in $1000s
  - Independent variable (X) = square feet

Sample Data for House Price Model

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Graphical Presentation
[Figure: house price model, scatter plot.]

Least-squares Method
The least-squares criterion states that the sum of the squares of the errors (the sum of the squares of the residuals) should be made as small as possible. The values of \(b_0\) and \(b_1\) that minimise this sum of the squares of the residuals are given by:

\[ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]

where \(\bar{x}\) and \(\bar{y}\) are the sample means of x and y.

LEAST SQUARES ANALYSIS - EQUATIONS
- Sample regression line: \(\hat{y} = b_0 + b_1 x\)
- Slope of regression line: \(b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}\)
- Intercept of regression line: \(b_0 = \bar{y} - b_1 \bar{x}\)

COMPUTATIONAL PROCEDURE
- The expression in the numerator of the slope formula can be denoted as \(SS_{xy}\):

\[ SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n} \]

- The expression in the denominator of the slope formula can be denoted as \(SS_{xx}\):

\[ SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} \]

- Hence, the slope can be written as \(b_1 = SS_{xy} / SS_{xx}\).

Calculation of b0 & b1

\[ b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y}{n} - b_1 \frac{\sum x}{n} \]

Alternative formula for b1
Multiplying the numerator and denominator of \(SS_{xy} / SS_{xx}\) by n gives an equivalent form:

\[ b_1 = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} \]

Simple Linear Regression Example: Scatter Plot
[Figure: house price model, scatter plot.]
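The slope and intercept formulas above can be checked numerically on the house price sample. The following is a minimal sketch in plain Python, not the method used to produce the slides' Excel or Minitab output; the names `square_feet`, `price`, and `least_squares` are illustrative assumptions.

```python
# House price sample from the example:
# size in square feet (X) and selling price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SS_xy: numerator of the slope formula.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # SS_xx: denominator of the slope formula.
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(square_feet, price)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
```

For this sample the result should agree, to the printed precision, with the intercept and slope reported for the house price model.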
Simple Linear Regression Example: Excel Output

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

The regression equation is:

\[ \widehat{\text{house price}} = 98.24833 + 0.10977\,(\text{square feet}) \]

Graphical Presentation
[Figure: house price model, scatter plot and regression line, with slope = 0.10977 and intercept = 98.248.]

Interpretation of the Intercept, b0
- \(b_0\) is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
- Here, no houses had 0 square feet, so \(b_0 = 98.24833\) just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.

Interpretation of the Slope Coefficient, b1
- \(b_1\) measures the estimated change in the average value of Y as a result of a one-unit change in X.
- Here, \(b_1 = 0.10977\) tells us that the average value of a house increases by 0.10977 ($1000) = $109.77, on average, for each additional square foot of size.

Predictions using Regression Analysis
Predict the price for a house with 2000 square feet, using the coefficients rounded to 98.25 and 0.1098:

\[ \widehat{\text{house price}} = 98.25 + 0.1098\,(2000) = 317.85 \]

The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.

Interpolation vs. Extrapolation
- When using a regression model for prediction, only predict within the relevant range of data (the relevant range for interpolation).
- Do not try to extrapolate beyond the range of observed Xs.

Example: Produce Stores
Data for seven stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression model obtained:

\[ \hat{Y} = 1636.415 + 1.487 X \]

Interpretation of results:
- The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
- The model estimates that for each increase of one square foot in the size of the store, the expected annual sales are predicted to increase by $1487.

Predict the annual sales for a store with 2000 square feet:

\[ \hat{Y} = 1636.415 + 1.487\,(2000) = 4610.415 \ (\$000) \]

Using the regression line for prediction
- If the regression line is a poor fit of the data, the prediction will be of little use.
- Even if the regression line is a good fit of the data, it is dangerous to make a prediction of y for an x-value outside the limits (i.e. smallest and largest) of the x-values used in finding the equation of the line.

Measures of Variation
[Figure: for an observation \((X_i, Y_i)\), the distances that make up \(SST = \sum (Y_i - \bar{Y})^2\), \(SSE = \sum (Y_i - \hat{Y}_i)^2\), and \(SSR = \sum (\hat{Y}_i - \bar{Y})^2\), drawn relative to the regression line and the mean \(\bar{Y}\).]

Total variation is made up of two parts:

\[ SST = SSR + SSE \]

\[ SST = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum (Y_i - \hat{Y}_i)^2 \]

where:
- \(\bar{Y}\) = average value of the dependent variable
- \(Y_i\) = observed values of the dependent variable
- \(\hat{Y}_i\) = predicted value of Y for the given \(X_i\) value

Measures of Variation (continued)
- SST = total sum of squares (total variation): measures the variation of the \(Y_i\) values around their mean \(\bar{Y}\).
- SSR = regression sum of squares (explained variation): variation attributable to the relationship between X and Y.
- SSE = error sum of squares (unexplained variation): variation in Y attributable to factors other than X.

Coefficient of Determination, r²
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
- The coefficient of determination is also called r-squared and is denoted as \(r^2\):

\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]

- Note: \(0 \le r^2 \le 1\)

Goodness of fit
How can we determine how well a regression line fits the data?
- Choose the line that has the smallest sum of the squares of errors (the line of good fit), or
- use the coefficient of determination (\(r^2\)).

Coefficient of determination
- This quantity is defined as the square of the correlation coefficient r.
- Because the value of r always lies between -1 and 1, the value of \(r^2\) must always lie between 0 and 1.
- If the value of \(r^2\) is close to 1, a straight line fits the data well.
- If the value of \(r^2\) is close to 0, a straight line fits the data poorly.

COEFFICIENT OF DETERMINATION r²
- Proportion of variability of the dependent variable (y) accounted for, or explained by, the independent variable (x) in a regression model.
- Range of \(r^2\) is from 0 to 1: \(0 \le r^2 \le 1\).
- \(r^2\) of zero implies the predictor accounts for none of the variability of the dependent variable and that there is no regression prediction of y by x.
- \(r^2\) of one implies perfect prediction of y by x and that 100% of the variability of y is accounted for by x.

\[ SS_{yy} = \text{explained variation} + \text{unexplained variation} = SSR + SSE \]

\[ r^2 = \frac{SSR}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}, \qquad 0 \le r^2 \le 1 \]

COMPUTATIONAL FORMULA FOR r²
It can be shown through algebra that:

\[ SSR = b_1^2 \, SS_{xx} \]

From this equation, a computational formula for \(r^2\) can be developed:

\[ r^2 = \frac{b_1^2 \, SS_{xx}}{SS_{yy}} \]

This formula holds only for simple linear regression.

EXAMPLE
For the CD-Concert example, the coefficient of determination can be computed as follows: (computation shown on the original slide).

Examples of Approximate r² Values
- \(r^2 = 1\): perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
- \(0 < r^2 < 1\): weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.
- \(r^2 = 0\): no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
[Figure: scatter plots of Y against X illustrating each case.]

Simple Linear Regression Example: Coefficient of Determination, r², in Excel
From the Excel output above, R Square = 0.58082:

\[ r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]

58.08% of the variation in house prices is explained by variation in square feet.

Simple Linear Regression Example: Coefficient of Determination, r², in Minitab

The regression equation is
Price = 98.2 + 0.110 Square Feet

Predictor     Coef      SE Coef   T      P
Constant      98.25     58.03     1.69   0.129
Square Feet   0.10977   0.03297   3.33   0.010

S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%

Analysis of Variance
Source           DF   SS      MS      F       P
Regression       1    18935   18935   11.08   0.010
Residual Error   8    13666   1708
Total            9    32600

58.08% of the variation in house prices is explained by variation in square feet.

Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by:

\[ S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}} \]

where SSE = error sum of squares and n = sample size.

EXCEL OUTPUT FOR HOUSE PRICE MODEL
In the Excel output above, SSE is the Residual sum of squares, 13665.5652, and the Standard Error entry gives \(S_{YX} = 41.33032\).

Simple Linear Regression Example: Standard Error of Estimate in Minitab
In the Minitab output above, S = 41.3303 is the standard error of the estimate.

Comparing Standard Errors
[Figure: two Y-versus-X scatter plots comparing standard errors.]
- \(S_{YX}\) is a measure of the variation of observed Y values from the regression line.
- The magnitude of \(S_{YX}\) should always be judged relative to the size of the Y values in the sample data; i.e., \(S_{YX}\) = $41.33K is moderately small relative to house prices in the $200-$300K range.

Assumptions of Regression: L.I.N.E.
- Linearity: the relationship between X and Y is linear.
- Independence of errors: error values are statistically independent.
- Normality of error: error values are normally distributed for any given value of X.
- Equal variance (also called homoscedasticity): the probability distribution of the errors has constant variance.
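The measures of variation, the coefficient of determination, and the standard error of the estimate for the house price model can be recomputed directly from the ten sample points. The sketch below assumes plain Python with only the standard library; the variable names are illustrative and do not come from the slides.

```python
import math

# House price sample: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least-squares fit: b1 = SS_xy / SS_xx, b0 = y_bar - b1 * x_bar.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
ss_xx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in square_feet]

# Measures of variation.
sst = sum((y - y_bar) ** 2 for y in price)                    # total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)        # explained
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(price, predicted))  # unexplained

r_squared = ssr / sst                  # coefficient of determination
s_yx = math.sqrt(sse / (n - 2))        # standard error of the estimate
r = ss_xy / math.sqrt(ss_xx * sst)     # sample correlation coefficient

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r_squared:.5f}, S_YX = {s_yx:.5f}, r = {r:.5f}")
```

Under these assumptions the printed values should agree with the ANOVA sums of squares, R Square, Standard Error, and Multiple R entries in the regression output, up to rounding. The check that `r * r` equals `r_squared` holds only for simple linear regression.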
Residual Analysis
- The residual for observation i, \(e_i\), is the difference between its observed and predicted value:

\[ e_i = Y_i - \hat{Y}_i \]

- Check the assumptions of regression by examining the residuals:
  - Examine for linearity assumption
  - Evaluate independence assumption
  - Evaluate normal distribution assumption
  - Examine for constant variance for all levels of X (homoscedasticity)
- Graphical analysis of residuals:
  - Can plot residuals vs. X

Residual Analysis for Linearity
[Figure: Y versus x and residuals versus x, for a "Not Linear" case and a "Linear" case.]

Residual Analysis for Independence
[Figure: residuals versus X, for a "Not Independent" case and an "Independent" case.]
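The residual definition above can be applied to the house price sample to produce the values one would plot against X. This is a minimal sketch in plain Python; `fit_line` and the other names are illustrative assumptions, and the printed listing stands in for a residual plot.

```python
# House price sample: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Least-squares intercept and slope for y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(square_feet, price)
predicted = [b0 + b1 * x for x in square_feet]
# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [y - y_hat for y, y_hat in zip(price, predicted)]

print("obs  sq.ft.  predicted   residual")
for i, (x, y_hat, e) in enumerate(zip(square_feet, predicted, residuals), start=1):
    print(f"{i:>3}  {x:>6}  {y_hat:9.5f}  {e:9.5f}")

# For a line fitted with an intercept, the residuals sum to zero
# up to floating-point rounding.
print(f"sum of residuals = {sum(residuals):.6f}")
```

Plotting `residuals` against `square_feet` with any plotting tool gives the residual plot used to examine linearity, independence, and equal variance.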
Checking for Normality
- Examine the Stem-and-Leaf Display of the Residuals
- Examine the Boxplot of the Residuals
- Examine the Histogram of the Residuals
- Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
- When using a normal probability plot, normal errors will approximately display in a straight line.
[Figure: normal probability plot, Percent (0 to 100) against Residual (-3 to 3).]

Residual Analysis for Equal Variance
[Figure: Y versus x and residuals versus x, for a "Non-constant variance" case and a "Constant variance" case.]
[Figure: Nonconstant Variance.]
[Figure: Graphs of Nonindependent Error Terms.]
[Figure: Healthy Residual Plot.]

Simple Linear Regression Example: Excel Residual Output

RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348

Does not appear to violate any regression assumptions.

Simple Linear Regression Example: Minitab Residual Output
Does not appear to violate any regression assumptions.

- Used when data are collected over time, to detect if autocorrelation is present
- Autocorrelation exists if residua