
Business Data Analysis and Statistical Modeling: chap1.1 Simple Linear Regression and Related Issues.ppt

Uploaded by: 可**** | Document no.: 10290151 | Upload date: 2025-05-16 | Format: PPT | Pages: 114 | Size: 9.45 MB
Main textbooks and references
- He Xiaoqun, Applied Regression Analysis (应用回归分析), 4th ed., China Renmin University Press, 2015. (Main textbook)
- Wang Binhui, Multivariate Statistical Analysis and R Language Modeling (多元统计分析及R语言建模), 3rd ed., Jinan University Press, 2016.
- He Xiaoqun, Multivariate Statistical Analysis (多元统计分析), 4th ed., China Renmin University Press, 2015.
- Lei Ping (ed.), Probability Theory and Mathematical Statistics (概率论与数理统计), Lixin Accounting Publishing House.
- Lei Ping (trans.), Statistics for Business and Economics (商务与经济统计), China Machine Press.
- Wang Binhui, Data Statistical Analysis and R Language Programming (数据统计分析及R语言编程), Peking University Press, August 2004.
- Montgomery, Introduction to Linear Regression Analysis, 5th ed., Wiley, 2013.

"In the final analysis, all knowledge is history; in the abstract, all science is mathematics; on a rational basis, all judgment is statistics."
- C. R. Rao, Statistics and Truth: Putting Chance to Work

Correlation vs. Regression
- A scatter diagram can be used to show the relationship between two variables.
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
- Correlation is concerned only with the strength of the relationship.
- No causal effect is implied by correlation.

Types of Regression Models
[Figure: types of regression models]

Three Degrees of Correlation
[Figure: three scatter plots illustrating r > 0, r < 0 and r = 0]

Types of Relationships
[Figure: scatter plots of Y against X showing strong relationships, weak relationships, and no relationship]

Coefficient of Correlation
- Measures the relative strength of the linear relationship between two variables.
- Sample coefficient of correlation:
\[ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \]

Features of the Correlation Coefficient, r
- Unit free.
- Ranges between -1 and 1.
- The closer to -1, the stronger the negative linear relationship.
- The closer to 1, the stronger the positive linear relationship.
- The closer to 0, the weaker any linear relationship.

Coefficient of Correlation Values
[Figure: a scale from -1.0 to +1.0. At -1.0 is perfect negative correlation, at 0 no correlation, and at +1.0 perfect positive correlation. The degree of negative correlation increases from 0 toward -1.0 (midpoint -.5), and the degree of positive correlation increases from 0 toward +1.0 (midpoint +.5).]

Scatter Plots of Data with Various Correlation Coefficients
[Figure: scatter plots of Y against X for r = -1, r = -.6, r = 0, r = +.3, r = +1, and a second example with r = 0]

Introduction to Regression Analysis
Regression analysis is used to:
- Predict the value of a dependent variable based on the value of at least one independent variable.
- Explain the impact of changes in an independent variable on the dependent variable.
Dependent variable: the variable we wish to explain.
Independent variable: the variable used to explain the dependent variable.

Simple Linear Regression Model
- There is only one independent variable, X.
- The relationship between X and Y is described by a linear function.
- Changes in Y are assumed to be caused by changes in X.

The population regression model:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
In this model:
- \(Y_i\) is the dependent variable and \(X_i\) is the independent variable.
- \(\beta_0\) is the population Y intercept and \(\beta_1\) is the population slope coefficient.
- \(\varepsilon_i\) is the random error term.
- \(\beta_0 + \beta_1 X_i\) is the linear component and \(\varepsilon_i\) is the random error component.

[Figure: population regression line with intercept \(\beta_0\) and slope \(\beta_1\). For a given \(X_i\), the random error \(\varepsilon_i\) is the vertical distance between the observed value of Y and the predicted value of Y on the line.]

Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:
\[ \hat{Y}_i = b_0 + b_1 X_i \]
In this equation:
- \(\hat{Y}_i\) is the estimated (or predicted) Y value for observation i.
- \(b_0\) is the estimate of the regression intercept and \(b_1\) is the estimate of the regression slope.
- \(X_i\) is the value of X for observation i.
The individual random error terms \(e_i\) have a mean of zero.
[Figure: fitted line with the random error \(e_i\) for \(X_i\) shown as the vertical distance between the observed Y and \(\hat{Y}_i\)]

Least Squares Method
\(b_0\) and \(b_1\) are obtained by finding the values that minimize the sum of the squared differences between \(Y\) and \(\hat{Y}\):
\[ \min \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 = \min \sum_{i=1}^{n}\bigl(Y_i-(b_0+b_1X_i)\bigr)^2 \]

Interpretation of the Slope and the Intercept
- \(b_0\) is the estimated average value of Y when the value of X is zero.
- \(b_1\) is the estimated change in the average value of Y as a result of a one-unit change in X.

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
- A random sample of 10 houses is selected.
- Dependent variable (Y) = house price in $1000s.
- Independent variable (X) = square feet.

Sample Data for House Price Model
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Graphical Presentation
[Figure: house price model, scatter plot]

Least-Squares Method: Equations
The least-squares criterion states that the sum of the squares of the errors (the residuals) should be made as small as possible:
\[ SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 \]
The sample regression line is \(\hat{Y} = b_0 + b_1 X\). The slope and intercept that minimize the sum of squared residuals are:
\[ b_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}, \qquad b_0 = \bar{Y} - b_1\bar{X} \]
Here \(\bar{X}\) and \(\bar{Y}\) are the sample means of X and Y.

Computational Procedure
- The numerator of the slope formula is denoted \(SS_{xy}\):
\[ SS_{xy} = \sum(X-\bar{X})(Y-\bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n} \]
- The denominator of the slope formula is denoted \(SS_{xx}\):
\[ SS_{xx} = \sum(X-\bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n} \]
- Hence the slope can be written as:
\[ b_1 = \frac{SS_{xy}}{SS_{xx}} \]

Calculation of b0 and b1; Alternative Formula for b1
[Slide content not recoverable from the source.]

Simple Linear Regression Example: Excel Output
Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error        41.33032
Observations          10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

              Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept     98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet   0.10977         0.03297           3.32938    0.01039    0.03374      0.18580

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Graphical Presentation
[Figure: house price model, scatter plot with regression line; slope = 0.10977, intercept = 98.248]

Interpretation of the Intercept, b0
- \(b_0\) is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
- Here, no houses had 0 square feet, so \(b_0 = 98.24833\) just indicates the following. For houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.

Interpretation of the Slope Coefficient, b1
- \(b_1\) measures the estimated change in the average value of Y as a result of a one-unit change in X.
- Here, \(b_1 = 0.10977\) tells us that the average value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size.

Predictions Using Regression Analysis
Predict the price for a house with 2000 square feet:
\[ \hat{Y} = 98.25 + 0.1098(2000) = 317.85 \]
The predicted price for a house with 2000 square feet is 317.85($1000s) = $317,850.

Interpolation vs. Extrapolation
- When using a regression model for prediction, only predict within the relevant range of data.
- Do not try to extrapolate beyond the range of observed X values.
[Figure: the relevant range for interpolation, from the smallest to the largest observed X]

Example: Produce Stores
Data for seven stores:
Store    Square Feet    Annual Sales ($000)
1        1,726          3,681
2        1,542          3,395
3        2,816          6,653
4        5,555          9,543
5        1,292          3,318
6        2,208          5,563
7        1,313          3,760

Regression model obtained:
\[ \hat{Y} = 1636.415 + 1.487X \]
Predict the annual sales for a store with 2000 square feet:
\[ \hat{Y} = 1636.415 + 1.487(2000) = 4610.415 \]
That is, predicted annual sales are about $4,610,415.

Interpretation of Results
- The slope of 1.487 means that each one-unit increase in X raises the predicted average of Y by an estimated 1.487 units.
- The model estimates that each one-square-foot increase in store size raises the expected annual sales by $1487.

Using the Regression Line for Prediction
- If the regression line is a poor fit of the data, the prediction will be of little use.
- Even if the regression line fits the data well, predicting y is always dangerous for an x-value outside the limits of the x-values used to find the line. Those limits are the smallest and largest observed x.

Measures of Variation
[Figure: for an observation \((X_i, Y_i)\), the total deviation \(Y_i-\bar{Y}\) splits into the explained part \(\hat{Y}_i-\bar{Y}\) and the unexplained part \(Y_i-\hat{Y}_i\)]
Total variation is made up of two parts:
\[ SST = SSR + SSE \]
\[ SST = \sum_{i=1}^{n}(Y_i-\bar{Y})^2, \qquad SSR = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2, \qquad SSE = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 \]
(Total sum of squares = Regression sum of squares + Error sum of squares)
where:
- \(\bar{Y}\) = average value of the dependent variable
- \(Y_i\) = observed values of the dependent variable
- \(\hat{Y}_i\) = predicted value of Y for the given \(X_i\) value

- SST = total sum of squares (total variation): measures the variation of the \(Y_i\) values around their mean \(\bar{Y}\).
- SSR = regression sum of squares (explained variation): variation attributable to the relationship between X and Y.
- SSE = error sum of squares (unexplained variation): variation in Y attributable to factors other than X.

Coefficient of Determination, r²
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
- It is also called r-squared and is denoted \(r^2\):
\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]
- Note: \(0 \le r^2 \le 1\).

Goodness of Fit
How can we determine how well a regression line fits the data? Choose the line of good fit using either of these:
- the sum of the squares of errors (the line with the smallest sum), or
- the coefficient of determination, \(r^2\).

Coefficient of Determination as the Square of r
- This quantity is defined as the square of the correlation coefficient r.
- Because r always lies between -1 and 1, \(r^2\) must always lie between 0 and 1.
- If \(r^2\) is close to 1, a straight line fits the data well.
- If \(r^2\) is close to 0, a straight line fits the data poorly.

Coefficient of Determination, r² (continued)
- \(r^2\) is the proportion of the variability of the dependent variable (y) accounted for, or explained by, the independent variable (x) in a regression model.
- Range: \(0 \le r^2 \le 1\).
- An \(r^2\) of zero implies that the predictor accounts for none of the variability of the dependent variable, and that there is no regression prediction of y by x.
- An \(r^2\) of one implies perfect prediction of y by x: 100% of the variability of y is accounted for by x.
\[ SS_{yy} = \text{explained variation} + \text{unexplained variation} = SSR + SSE \]
\[ r^2 = \frac{SSR}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}, \qquad 0 \le r^2 \le 1 \]

Computational Formula for r²
It can be shown through algebra that
\[ SSR = b_1^2\, SS_{xx} \]
From this equation, a computational formula for \(r^2\) can be developed:
\[ r^2 = \frac{b_1^2\, SS_{xx}}{SS_{yy}} \]
This formula holds only for simple linear regression.

Example
For the CD-Concert example, the coefficient of determination can be computed as follows: [computation not recoverable from the source].

Examples of Approximate r² Values
- r² = 1: perfect linear relationship between X and Y. 100% of the variation in Y is explained by variation in X. [Figure: points lying exactly on an increasing line and on a decreasing line]
- 0 < r² < 1: weaker linear relationships between X and Y. Some but not all of the variation in Y is explained by variation in X. [Figure: points scattered around an increasing line and a decreasing line]
- r² = 0: no linear relationship between X and Y. The value of Y does not depend on X, and none of the variation in Y is explained by variation in X. [Figure: points scattered around a horizontal line]

Simple Linear Regression Example: Coefficient of Determination
From the Excel output above:
\[ r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]
58.08% of the variation in house prices is explained by variation in square feet.

r² in Minitab:
The regression equation is
Price = 98.2 + 0.110 Square Feet

Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010

S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%

Analysis of Variance
Source            DF    SS       MS       F        P
Regression        1     18935    18935    11.08    0.010
Residual Error    8     13666    1708
Total             9     32600

Standard Error of Estimate
The standard deviation of the observations around the regression line is estimated by
\[ S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{n-2}} \]
where SSE = error sum of squares and n = sample size.
For the house price model, Excel reports this as "Standard Error" and Minitab as "S":
\[ S_{YX} = \sqrt{13665.5652/8} = 41.33032 \]

Comparing Standard Errors
[Figure: two scatter plots of Y against X, one with small and one with large scatter around the regression line]
- \(S_{YX}\) is a measure of the variation of observed Y values from the regression line.
- The magnitude of \(S_{YX}\) should always be judged relative to the size of the Y values in the sample data. For example, \(S_{YX}\) = $41.33K is moderately small relative to house prices in the $200K-$400K range.

Assumptions of Regression: L.I.N.E.
- Linearity: the relationship between X and Y is linear.
- Independence of errors: error values are statistically independent.
- Normality of error: error values are normally distributed for any given value of X.
- Equal variance (also called homoscedasticity): the probability distribution of the errors has constant variance.

Residual Analysis
- The residual for observation i, \(e_i = Y_i - \hat{Y}_i\), is the difference between its observed and predicted value.
- Check the assumptions of regression by examining the residuals:
  - examine the linearity assumption;
  - evaluate the independence assumption;
  - evaluate the normal distribution assumption;
  - examine for constant variance at all levels of X (homoscedasticity).
- Graphical analysis of residuals: plot residuals vs. X.

Residual Analysis for Linearity
[Figure: Y vs. x and residuals vs. x for a nonlinear and a linear relationship]

Residual Analysis for Independence
[Figure: residuals vs. X for non-independent and independent errors]

Checking for Normality
- Examine the stem-and-leaf display of the residuals.
- Examine the boxplot of the residuals.
- Examine the histogram of the residuals.
- Construct a normal probability plot of the residuals.

Residual Analysis for Normality
On a normal probability plot, normally distributed errors fall approximately on a straight line.
[Figure: normal probability plot, Percent (0 to 100) vs. Residual (-3 to 3)]

Residual Analysis for Equal Variance
[Figure: Y vs. x and residuals vs. x for non-constant variance and constant variance]
[Figures: graphs of nonindependent error terms; a healthy residual plot]

Simple Linear Regression Example: Excel Residual Output
Observation    Predicted House Price    Residuals
1              251.92316                -6.923162
2              273.87671                38.12329
3              284.85348                -5.853484
4              304.06284                3.937162
5              218.99284                -19.99284
6              268.38832                -49.38832
7              356.20251                48.79749
8              367.17929                -43.17929
9              254.6674                 64.33264
10             284.85348                -29.85348

The residuals do not appear to violate any regression assumptions.

Simple Linear Regression Example: Minitab Residual Output
[Figure: Minitab residual plots]
The residuals do not appear to violate any regression assumptions.

[Next slide, truncated in the source]
Used when data are collected over time, to detect whether autocorrelation is present. Autocorrelation exists if residua…
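The "Coefficient of Correlation" slide gives the sample formula for r. The sketch below applies it to the ten-house sample. It uses only the standard library, and the helper name `sample_correlation` is ours, not from the slides. In simple regression, Excel's "Multiple R" equals |r|, so the result should match 0.76211.

```python
import math

# Ten-house sample from the slides: Y = price in $1000s, X = size in square feet
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def sample_correlation(x, y):
    """Sample correlation r = SS_xy / sqrt(SS_xx * SS_yy) (hypothetical helper name)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = sample_correlation(sqft, prices)
print(f"r = {r:.5f}")  # Excel reports Multiple R = 0.76211
```

On Python 3.10 or later, `statistics.correlation(sqft, prices)` computes the same Pearson coefficient.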
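The "Computational Procedure" slide writes the slope as \(b_1 = SS_{xy}/SS_{xx}\), with the intercept \(b_0 = \bar{Y} - b_1\bar{X}\). The sketch below fits the house price model that way, using the shortcut sums. The function name `least_squares` is a hypothetical helper, and the result should reproduce the Excel coefficients.

```python
# House price sample: Y = price ($1000s), X = square feet
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def least_squares(x, y):
    """Return (b0, b1) for Y-hat = b0 + b1*X (hypothetical helper name)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Shortcut forms from the slide: SS_xy = sum(XY) - sum(X)sum(Y)/n, SS_xx = sum(X^2) - (sum X)^2/n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n  # b0 = Y-bar - b1 * X-bar
    return b0, b1


b0, b1 = least_squares(sqft, prices)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")  # Excel: 98.24833 and 0.10977
```

The shortcut sums are convenient by hand. In floating point, the deviation form \(\sum(X-\bar{X})(Y-\bar{Y})\) is numerically safer when X values are large and close together.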

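The "Predictions Using Regression Analysis" and "Interpolation vs. Extrapolation" slides can be combined in one small sketch. It predicts a house price only when X lies inside the observed range of square feet and otherwise refuses to extrapolate. The helper names `in_relevant_range` and `predict` are hypothetical, and raising an error is one possible design choice, not something the slides prescribe.

```python
# House price sample: Y = price ($1000s), X = square feet
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar


def in_relevant_range(x, observed_x):
    """Hypothetical helper: True if x lies between the smallest and largest observed X."""
    return min(observed_x) <= x <= max(observed_x)


def predict(x, observed_x):
    """Hypothetical helper: Y-hat = b0 + b1*x, refusing to extrapolate."""
    if not in_relevant_range(x, observed_x):
        raise ValueError(f"x = {x} is outside the observed range "
                         f"[{min(observed_x)}, {max(observed_x)}]")
    return b0 + b1 * x


exact = predict(2000, sqft)        # full-precision coefficients
rounded = 98.25 + 0.1098 * 2000    # rounded coefficients used on the slide
print(f"full precision: {exact:.2f}, slide rounding: {rounded:.2f}")
```

The slide's 317.85 comes from rounding the coefficients to 98.25 and 0.1098. Full precision gives about 317.78 ($1000s). Calling `predict(5000, sqft)` raises `ValueError`, because 5000 square feet lies beyond the largest house in the sample (2450).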


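For the "Example: Produce Stores" slide, the sketch below refits the seven-store data. It then predicts annual sales for a 2000-square-foot store, which lies inside the observed range of 1,292 to 5,555 square feet. The name `fit_line` is a hypothetical helper.

```python
# Produce-store data from the slides: X = square feet, Y = annual sales ($000)
store_sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_line(x, y):
    """Hypothetical helper: least-squares (b0, b1) using deviations from the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(store_sqft, annual_sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")

sales_2000 = b0 + b1 * 2000  # 2000 sq ft is within the observed X range
print(f"predicted annual sales: {sales_2000:.1f} ($000)")
```

With full-precision coefficients the prediction is about 4609.7 ($000). The slide's rounded equation, 1636.415 + 1.487(2000), gives 4610.415; the gap is only coefficient rounding.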

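The "Measures of Variation" and "Coefficient of Determination" slides state that SST = SSR + SSE and \(r^2 = SSR/SST\). The computational formula \(r^2 = b_1^2 SS_{xx}/SS_{yy}\) should give the same value in simple regression. The sketch below checks all of this on the house price data. Variable names are ours.

```python
# House price sample: Y = price ($1000s), X = square feet
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
ss_xx = sum((x - x_bar) ** 2 for x in sqft)
ss_yy = sum((y - y_bar) ** 2 for y in prices)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in prices)                      # total variation
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)              # explained variation
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(prices, fitted))  # unexplained variation

r2 = ssr / sst                               # definition: SSR / SST
r2_computational = b1 ** 2 * ss_xx / ss_yy   # shortcut, valid for simple regression only

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
print(f"r^2 = {r2:.5f} (computational formula: {r2_computational:.5f})")
```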

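The "Standard Error of Estimate" slide defines \(S_{YX} = \sqrt{SSE/(n-2)}\). The sketch below computes it for the house price model and expresses it relative to the mean price. That ratio is one simple way to act on the advice to judge \(S_{YX}\) against the size of the Y values; the specific ratio is our illustration, not from the slides.

```python
import math

# House price sample: Y = price ($1000s), X = square feet
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

# SSE: squared vertical distances from the observations to the fitted line
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, prices))
# Divide by n - 2 because two parameters (b0 and b1) were estimated
s_yx = math.sqrt(sse / (n - 2))

print(f"S_YX = {s_yx:.5f} ($1000s)")  # Excel "Standard Error": 41.33032
print(f"S_YX relative to mean price: {s_yx / y_bar:.1%}")
```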

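Most remaining entries in the Excel output can be rebuilt from the sums of squares. The sketch below recomputes MS Residual, F, the coefficient standard errors and the t statistics, using the usual simple-regression formulas:
\[ SE(b_1) = \frac{S_{YX}}{\sqrt{SS_{xx}}}, \qquad SE(b_0) = S_{YX}\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{SS_{xx}}} \]
The P-value and Lower/Upper 95% columns need quantiles of the t distribution with n - 2 = 8 degrees of freedom. The standard library does not provide these, so they are omitted; a package such as SciPy (`scipy.stats.t`) can supply them.

```python
import math

# House price sample: Y = price ($1000s), X = square feet
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
ss_xx = sum((x - x_bar) ** 2 for x in sqft)
ss_yy = sum((y - y_bar) ** 2 for y in prices)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

ssr = b1 * ss_xy        # equals SS_xy**2 / SS_xx
sse = ss_yy - ssr
msr = ssr / 1           # regression df = 1 (one predictor)
mse = sse / (n - 2)     # residual df = n - 2
f_stat = msr / mse
s_yx = math.sqrt(mse)

# Standard errors of the coefficients and their t statistics
se_b1 = s_yx / math.sqrt(ss_xx)
se_b0 = s_yx * math.sqrt(1 / n + x_bar ** 2 / ss_xx)
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0

print(f"MS residual = {mse:.4f}, F = {f_stat:.4f}")
print(f"Intercept:   SE = {se_b0:.5f}, t = {t_b0:.5f}")
print(f"Square Feet: SE = {se_b1:.5f}, t = {t_b1:.5f}")
```

With a single predictor, the slope's t statistic squared equals F (3.32938² ≈ 11.0848). This is why Excel shows the same p-value, 0.01039, for both.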

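The "Excel Residual Output" table lists \(\hat{Y}_i\) and \(e_i = Y_i - \hat{Y}_i\) for each house. The sketch below regenerates that table. The resulting `(sqft, residuals)` pairs are what the "plot residuals vs. X" check on the "Residual Analysis" slide would graph.

```python
# House price sample: Y = price ($1000s), X = square feet
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(prices)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in sqft]
residuals = [y - y_hat for y, y_hat in zip(prices, predicted)]

print("Obs  Predicted  Residual")
for i, (y_hat, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>3}  {y_hat:9.5f}  {e:9.5f}")

# With an intercept in the model, least-squares residuals sum to zero (up to rounding)
print(f"sum of residuals = {sum(residuals):.2e}")
```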

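The "Residual Analysis for Normality" slide says normal errors fall roughly on a straight line in a normal probability plot. The sketch below computes that plot's coordinates for the ten Excel residuals and summarizes straightness by the correlation of the plotted points. It rests on two assumptions. First, it uses the plotting position (i - 0.5)/n, which is one common convention; Excel, Minitab and other packages may use slightly different positions. Second, it puts standard normal quantiles on one axis, whereas the slide's figure uses a Percent axis, which is a monotone relabeling of the same scale. The helper names are hypothetical.

```python
from statistics import NormalDist

# Residuals as printed in the Excel residual output (house price model)
residuals = [-6.923162, 38.12329, -5.853484, 3.937162, -19.99284,
             -49.38832, 48.79749, -43.17929, 64.33264, -29.85348]


def normal_probability_points(data):
    """Hypothetical helper: pair each ordered residual with a standard normal quantile.

    Assumes the plotting position (i - 0.5) / n; other conventions exist.
    """
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(data)))


def pearson(xs, ys):
    """Correlation of the plotted points: near 1 means close to a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


points = normal_probability_points(residuals)
for z, e in points:
    print(f"z = {z:6.3f}   residual = {e:9.3f}")

straightness = pearson([z for z, _ in points], [e for _, e in points])
print(f"correlation of plotted points: {straightness:.3f}")
```

With only ten residuals, this correlation is a rough visual aid, not a formal normality test.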

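The final slide in the source is truncated. It concerns detecting autocorrelation in residuals when data are collected over time. A standard statistic for this purpose is the Durbin-Watson statistic, \(D = \sum_{i=2}^{n}(e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2\); that the truncated slide presents exactly this statistic is an assumption. The sketch below computes D using the house price residuals. Those data are not time-ordered, so the value only demonstrates the arithmetic and says nothing about autocorrelation in house prices. The helper name `durbin_watson` is hypothetical.

```python
# Residuals in observation order (Excel residual output, house price model).
# Illustration only: these houses are not a time series.
residuals = [-6.923162, 38.12329, -5.853484, 3.937162, -19.99284,
             -49.38832, 48.79749, -43.17929, 64.33264, -29.85348]


def durbin_watson(e):
    """Hypothetical helper: D = sum_{i=2..n}(e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    numerator = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    denominator = sum(ei ** 2 for ei in e)
    return numerator / denominator


d = durbin_watson(residuals)
print(f"D = {d:.3f}")
```

D always lies between 0 and 4. Values near 2 are consistent with no first-order autocorrelation, and values well below 2 suggest positive autocorrelation. A formal decision compares D with tabulated lower and upper critical values for the given n and number of predictors.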


