Computer Lab Assignment 3
Name: Li Shuyi (李姝仪)
Student ID: 2012310320
Class: Auditing Class 2, 2012 cohort
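Part I: Data set DATA 4-7
1) State your a priori expectations about the relationship between CM and each of the other variables, and compute the sample correlation coefficients.
Answer: A priori, CM and FLR should be negatively related: the higher the female literacy rate, the lower child mortality. CM and PGNP should be negatively related: the higher per capita GNP, the lower child mortality. CM and TFR should be positively related: the higher the total fertility rate, the higher child mortality.
The sample correlation coefficients are shown in the table below; all three signs match these expectations: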
. correlate cm flr pgnp tfr
(obs=64)
| cm flr pgnp tfr
-------------+------------------------------------
cm | 1.0000
flr | -0.8183 1.0000
pgnp | -0.4077 0.2685 1.0000
tfr | 0.6711 -0.6260 -0.1857 1.0000
2) Regress CM on FLR.
Answer: Enter the Stata command:
. reg cm flr
Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 1, 62) = 125.65
Model | 243515.049 1 243515.049 Prob > F = 0.0000
Residual | 120162.951 62 1938.11211 R-squared = 0.6696
-------------+------------------------------ Adj R-squared = 0.6643
Total | 363678 63 5772.66667 Root MSE = 44.024
------------------------------------------------------------------------------
cm | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
flr | -2.390496 .2132625 -11.21 0.000 -2.816802 -1.96419
_cons | 263.8635 12.22499 21.58 0.000 239.4261 288.3009
Estimated regression: CM = 263.86 - 2.39 FLR.
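As a quick consistency check on this output, the following Python sketch recomputes two identities that hold in any simple (one-regressor) regression: R-squared equals the squared sample correlation, and the F statistic equals the squared t statistic of the slope. All numbers are copied from the Stata output above; the variable names are illustrative only.

```python
# Values copied from the Stata output above.
r_cm_flr = -0.8183                 # sample correlation between cm and flr
coef, se = -2.390496, 0.2132625    # slope estimate and its standard error

r_squared = r_cm_flr ** 2          # simple regression: R-squared = r^2
t_stat = coef / se                 # t statistic of the slope
f_stat = t_stat ** 2               # one regressor: F = t^2

print(round(r_squared, 4), round(t_stat, 2), round(f_stat, 2))
```

Up to rounding, the results agree with the reported R-squared (0.6696), t (-11.21), and F (125.65).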
3) Regress CM on FLR and PGNP.
Answer: Enter the Stata command:
. reg cm flr pgnp
Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 2, 61) = 73.83
Model | 257362.373 2 128681.187 Prob > F = 0.0000
Residual | 106315.627 61 1742.87913 R-squared = 0.7077
-------------+------------------------------ Adj R-squared = 0.6981
Total | 363678 63 5772.66667 Root MSE = 41.748
------------------------------------------------------------------------------
cm | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
flr | -2.231586 .2099472 -10.63 0.000 -2.651401 -1.81177
pgnp | -.0056466 .0020033 -2.82 0.006 -.0096524 -.0016408
_cons | 263.6416 11.59318 22.74 0.000 240.4596 286.8236
Estimated regression: CM = 263.64 - 2.23 FLR - 0.0056 PGNP.
4) Regress CM on FLR, PGNP, and TFR. Observe how the adjusted R-squared changes.
Answer: Enter the Stata command:
. reg cm flr pgnp tfr
Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 3, 60) = 59.17
Model | 271802.616 3 90600.8721 Prob > F = 0.0000
Residual | 91875.3836 60 1531.25639 R-squared = 0.7474
-------------+------------------------------ Adj R-squared = 0.7347
Total | 363678 63 5772.66667 Root MSE = 39.131
------------------------------------------------------------------------------
cm | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
flr | -1.768029 .2480169 -7.13 0.000 -2.264137 -1.271921
pgnp | -.0055112 .0018782 -2.93 0.005 -.0092682 -.0017542
tfr | 12.86864 4.190533 3.07 0.003 4.486323 21.25095
_cons | 168.3067 32.89166 5.12 0.000 102.5136 234.0998
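Estimated regression: CM = 168.31 - 1.77 FLR - 0.0055 PGNP + 12.87 TFR. Across the three regressions the adjusted R-squared rises with each added regressor (0.6643, 0.6981, 0.7347), and it is always smaller than the corresponding R-squared (0.6696, 0.7077, 0.7474).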
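The adjusted R-squared values can be reproduced from the sums of squares in the three ANOVA tables. The sketch below uses the standard formula 1 - (RSS/(n-k)) / (TSS/(n-1)), where k counts all estimated coefficients including the constant; the function name and dictionary layout are illustrative choices, not part of the original assignment.

```python
def adjusted_r2(rss, tss, n, k):
    """Adjusted R-squared; k counts all coefficients, including the constant."""
    return 1 - (rss / (n - k)) / (tss / (n - 1))

tss, n = 363678.0, 64  # total sum of squares and sample size (same in all three models)

# Residual sum of squares and number of coefficients, copied from the Stata output.
models = {
    "flr": (120162.951, 2),
    "flr pgnp": (106315.627, 3),
    "flr pgnp tfr": (91875.3836, 4),
}

results = {}
for name, (rss, k) in models.items():
    r2 = 1 - rss / tss
    results[name] = (round(r2, 4), round(adjusted_r2(rss, tss, n, k), 4))
    print(name, results[name])
```

Because (n-1)/(n-k) is greater than 1 whenever there is at least one regressor, the adjusted value is always below R-squared when R-squared is less than 1, which is the pattern noted above.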
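5) Based on the regression results, which model would you choose, and why?
Answer: Choose the model in 4). Its three slope coefficients are all individually significant at the 1% level and have the expected signs, and it has the highest adjusted R-squared of the three models (0.7347). Adjusted R-squared, rather than R-squared or the number of regressors, is the relevant criterion here, because R-squared never falls when a regressor is added.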
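One way to back up this choice is an incremental F test for adding TFR to the model in 3). The sketch below is a hand calculation from the residual sums of squares reported above, not a Stata command; with a single added regressor the F statistic should equal the squared t statistic of tfr.

```python
# Residual sums of squares copied from the Stata output above.
rss_restricted = 106315.627     # cm on flr, pgnp
rss_unrestricted = 91875.3836   # cm on flr, pgnp, tfr
n, k_unrestricted, q = 64, 4, 1  # sample size, coefficients in the larger model, restrictions

f_stat = ((rss_restricted - rss_unrestricted) / q) / (
    rss_unrestricted / (n - k_unrestricted)
)
t_tfr = 12.86864 / 4.190533     # coefficient / standard error of tfr

print(round(f_stat, 2), round(t_tfr ** 2, 2))
```

Both numbers come out at about 9.43, matching the reported t of 3.07 for tfr (p = 0.003), so adding TFR improves the model significantly.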
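6) For the regression in 3), test the joint significance of FLR and PGNP. (State the null hypothesis, the alternative hypothesis, and the test statistic.)
Answer: Enter the Stata commands:
. reg cm flr pgnp (output omitted)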
. test flr pgnp
( 1) flr = 0
( 2) pgnp = 0
F( 2, 61) = 73.83
Prob > F = 0.0000
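Here the null hypothesis is H0: β2 = β3 = 0, and the alternative hypothesis is H1: at least one of β2 and β3 is nonzero. The test statistic is F(2, 61) = 73.83 with Prob > F = 0.0000, so the null hypothesis is rejected: FLR and PGNP are jointly significant.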
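Since FLR and PGNP are the only regressors in 3), this joint test is the same as the overall F test of the regression. The Python sketch below recomputes it two ways from numbers in the Stata output for 3): from the sums of squares, and from R-squared. Variable names are illustrative.

```python
# Explained and residual sums of squares copied from the regression in 3).
ess, rss = 257362.373, 106315.627
n, k = 64, 3          # observations; coefficients including the constant
q = k - 1             # number of restrictions under H0 (both slopes are zero)

f_from_ss = (ess / q) / (rss / (n - k))

r2 = ess / (ess + rss)
f_from_r2 = (r2 / q) / ((1 - r2) / (n - k))

print(round(f_from_ss, 2), round(f_from_r2, 2))
```

Both versions give about 73.83, the value reported by `test flr pgnp` and in the header of the regression output.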
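Part II: Data set DATA6-8
(1) Plot the closing price against time. What pattern does the scatter plot show?
Answer: The following Stata command produces the scatter plot: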
. twoway scatter close time
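The scatter plot shows a positive relationship between the closing price and time.
(2) Build a linear model to predict the closing price of Qualcomm stock.
Answer: The linear model is close = β1 + β2 time + μ. Running the regression in Stata gives: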
. reg close time
Source | SS df MS Number of obs = 260
-------------+------------------------------ F( 1, 258) = 161.30
Model | 493579.523 1 493579.523 Prob > F = 0.0000
Residual | 789466.982 258 3059.94954 R-squared = 0.3847
-------------+------------------------------ Adj R-squared = 0.3823
Total | 1283046.51 259 4953.84751 Root MSE = 55.317
------------------------------------------------------------------------------
close | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time | .5805136 .0457079 12.70 0.000 .4905056 .6705216
_cons | -4.69406 6.881046 -0.68 0.496 -18.24422 8.856105
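Estimated regression: close = -4.69 + 0.58 time.
(3) Build a quadratic model whose explanatory variables are time and time squared. How well does the model fit?
Answer: The quadratic model is close = β1 + β2 time + β3 time^2 + μ. Running the regression in Stata gives: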
. gen time2=time^2
. reg close time time2
Source | SS df MS Number of obs = 260
-------------+------------------------------ F( 2, 257) = 211.27
Model | 797808.219 2 398904.11 Prob > F = 0.0000
Residual | 485238.286 257 1888.08672 R-squared = 0.6218
-------------+------------------------------ Adj R-squared = 0.6189
Total | 1283046.51 259 4953.84751 Root MSE = 43.452
------------------------------------------------------------------------------
close | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time | -1.191469 .1441386 -8.27 0.000 -1.475312 -.9076263
time2 | .0067892 .0005348 12.69 0.000 .005736 .0078424
_cons | 72.68253 8.146947 8.92 0.000 56.63926 88.7258
Estimated regression: close = 72.68 - 1.19 time + 0.0068 time^2. The fit is only moderate: R-squared is 0.6218 and adjusted R-squared is 0.6189.
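The signs of the two slope coefficients imply that the fitted curve first falls and then rises. As a small illustration (not part of the original answer), the sketch below locates the turning point of the fitted parabola at -b/(2c), using the coefficients from the output above; how the time index is coded in the data set is not stated here, so the result should be read in the units of the time variable.

```python
# Coefficients copied from the quadratic regression output above.
b_cons, b_time, b_time2 = 72.68253, -1.191469, 0.0067892

def fitted_close(t):
    """Fitted closing price from the quadratic model at time index t."""
    return b_cons + b_time * t + b_time2 * t ** 2

turning_point = -b_time / (2 * b_time2)   # where the fitted curve stops falling

print(round(turning_point, 1), round(fitted_close(turning_point), 2))
```

The fitted minimum falls at a time index of roughly 88, after which the quadratic term dominates and the fitted price rises.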
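(4) Build a cubic model:
Y = β1 + β2 X + β3 X^2 + β4 X^3 + μ
where Y is the stock price and X is time. Which model fits the data better?
Answer: Entering the Stata commands (time2 was already generated in (3)) gives: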
. gen time3=time^3
. reg close time time2 time3
Source | SS df MS Number of obs = 260
-------------+------------------------------ F( 3, 256) = 375.21
Model | 1045314.84 3 348438.28 Prob > F = 0.0000
Residual | 237731.665 256 928.639316 R-squared = 0.8147
-------------+------------------------------ Adj R-squared = 0.8125
Total | 1283046.51 259 4953.84751 Root MSE = 30.474
------------------------------------------------------------------------------
close | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time | 2.612844 .254008 10.29 0.000 2.112632 3.113055
time2 | -.0295807 .0022591 -13.09 0.000 -.0340296 -.0251319
time3 | .0000929 5.69e-06 16.33 0.000 .0000817 .0001041
_cons | -10.85435 7.669922 -1.42 0.158 -25.95852 4.249829
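Estimated regression: close = -10.85 + 2.61 time - 0.0296 time^2 + 0.000093 time^3. The cubic model fits the data better than the quadratic model: its adjusted R-squared (0.8125) is higher than that of the quadratic model (0.6189), so it has more explanatory power.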
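The comparison can also be made with an incremental F test for the cubic term, computed by hand from the residual sums of squares of the quadratic and cubic regressions. This is a sketch for checking purposes; the small gap between F and the squared t statistic comes from the rounding of the displayed coefficient and standard error.

```python
# Adjusted R-squared of the three models, copied from the Stata output.
adj_r2 = {"linear": 0.3823, "quadratic": 0.6189, "cubic": 0.8125}
best = max(adj_r2, key=adj_r2.get)

# Incremental F test for adding time3 to the quadratic model.
rss_quadratic, rss_cubic = 485238.286, 237731.665
n, k_cubic, q = 260, 4, 1
f_stat = ((rss_quadratic - rss_cubic) / q) / (rss_cubic / (n - k_cubic))

t_time3 = 16.33   # reported t statistic of time3

print(best, round(f_stat, 1), round(t_time3 ** 2, 1))
```

The F statistic is about 266.5, close to the squared t of time3, which confirms that the cubic term adds significant explanatory power.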
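Part III: Data set DATA6-9
(1) Fit an LIV (linear-in-variables) model to the data and interpret the regression coefficients. How well does the model fit? Plot Y against X1 and Y against X2. Do the scatter plots show a linear pattern?
Answer: The model is Y = β1 + β2 X1 + β3 X2 + μ. The coefficients β2 and β3 measure the change in Y associated with a one-unit change in X1 and in X2, respectively, holding the other explanatory variable constant. Running the regression in Stata gives the following (two observations with missing values were dropped):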
. rename lifeexp Y
. rename peopletv X1
. rename peoplephys X2
. reg Y X1 X2
Source | SS df MS Number of obs = 38
-------------+------------------------------ F( 2, 35) = 13.75
Model | 991.123688 2 495.561844 Prob > F = 0.0000
Residual | 1261.24473 35 36.0355638 R-squared = 0.4400
-------------+------------------------------ Adj R-squared = 0.4080
Total | 2252.36842 37 60.8748222 Root MSE = 6.003
------------------------------------------------------------------------------
Y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
X1 | -.0234954 .0096469 -2.44 0.020 -.0430796 -.0039112
X2 | -.000432 .0002023 -2.14 0.040 -.0008427 -.0000214
_cons | 70.25196 1.087705 64.59 0.000 68.0438 72.46012
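Estimated regression: Y = 70.25 - 0.0235 X1 - 0.0004 X2. R-squared is only 0.44, so the model does not fit the data very well.
The scatter plots of Y against X1 and of Y against X2 are produced as follows: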
. twoway scatter Y X1
. twoway scatter Y X2
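[Figures: scatter plot of Y against X1; scatter plot of Y against X2]
As the figures show, neither scatter plot displays a linear pattern.
(2) Plot lnY against lnX1 and lnY against lnX2. Do the scatter plots show a linear pattern?
Answer: Entering the Stata commands gives: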
. gen lnY=log(Y)
. gen lnX1=log(X1)
. gen lnX2=log(X2)
. twoway scatter lnY lnX1
. twoway scatter lnY lnX2
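[Figures: scatter plot of lnY against lnX1; scatter plot of lnY against lnX2]
As the figures show, both scatter plots display a linear pattern, with a negative relationship in each case.
(3) Estimate a double-log model. How well does it fit?
Answer: The double-log model is lnY = β1 + β2 lnX1 + β3 lnX2 + μ. Running the regression in Stata gives: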
. reg lnY lnX1 lnX2
Source | SS df MS Number of obs = 38
-------------+------------------------------ F( 2, 35) = 69.45
Model | .423465472 2 .211732736 Prob > F = 0.0000
Residual | .106700184 35 .003048577 R-squared = 0.7987
-------------+------------------------------ Adj R-squared = 0.7872
Total | .530165656 37 .014328802 Root MSE = .05521
------------------------------------------------------------------------------
lnY | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnX1 | -.0449974 .0088061 -5.11 0.000 -.0628747 -.0271202
lnX2 | -.035013 .0111428 -3.14 0.003 -.057634 -.012392
_cons | 4.563085 .064933 70.27 0.000 4.431264 4.694906
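Estimated regression: lnY = 4.56 - 0.045 lnX1 - 0.035 lnX2. R-squared is 0.7987, higher than the 0.44 of the linear model, so the double-log model appears to fit better. Because the dependent variables differ (Y versus lnY), the two R-squared values are not strictly comparable, so this comparison is only suggestive.
(4) Interpret the regression coefficients of the double-log model. Are these coefficients reasonable?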
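Answer: The coefficient β2 is the elasticity of Y with respect to X1: holding X2 constant, a 1% change in X1 changes Y by β2%, that is, by about -0.045%. Likewise, β3 is the elasticity of Y with respect to X2: holding X1 constant, a 1% change in X2 changes Y by β3%, that is, by about -0.035%. Both signs are reasonable: more people per television set (a proxy for lower income) and more people per physician (less access to medical care) are each associated with lower life expectancy, although the television coefficient is better read as a proxy effect than as a causal one.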
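To make the elasticity reading concrete, the sketch below computes the exact percentage change in Y implied by the double-log model for a given percentage change in one regressor, holding the other fixed. The function name is illustrative; the elasticities are the estimates from the output above. For small changes the exact figure is close to elasticity times the percentage change.

```python
import math

def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in Y implied by a double-log model for a % change in X."""
    log_change = elasticity * math.log(1 + pct_change_in_x / 100)
    return (math.exp(log_change) - 1) * 100

beta_lnx1, beta_lnx2 = -0.0449974, -0.035013   # estimates from the output above

for pct in (1, 10):
    print(pct,
          round(pct_change_in_y(beta_lnx1, pct), 3),
          round(pct_change_in_y(beta_lnx2, pct), 3))
```

A 1% increase in X1 lowers Y by about 0.045%, as stated above, while a 10% increase lowers it by a little under 0.45%, because the approximation becomes less exact for larger changes.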
Part IV: Using the data set wage.dta, compute the correlation matrix of wage, educ, exper, and tenure.
Answer: Entering the Stata command gives:
. correlate wage educ exper tenure
(obs=526)
| wage educ exper tenure
-------------+------------------------------------
wage | 1.0000
educ | 0.4059 1.0000
exper | 0.1129 -0.2995 1.0000
tenure | 0.3469 -0.0562 0.4993 1.0000
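The table above is the correlation matrix of wage, educ, exper, and tenure.
Part V: Using the data set mroz.dta, drop the explanatory variables that are not significant at the 5% level.
Answer: Entering the Stata command gives: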
. sw reg wage inlf hours kidslt6 kidsge6 age educ hushrs husage huseduc huswage faminc motheduc fatheduc exper expersq,pr(.05)
(inlf dropped because constant)
begin with full model
p = 0.9747 >= 0.0500 removing fatheduc
p = 0.9230 >= 0.0500 removing expersq
p = 0.8791 >= 0.0500 removing husage
p = 0.7765 >= 0.0500 removing kidslt6
p = 0.4023 >= 0.0500 removing kidsge6
p = 0.2174 >= 0.0500 removing huseduc
p = 0.2087 >= 0.0500 removing age
p = 0.1749 >= 0.0500 removing motheduc
Source | SS df MS Number of obs = 428
-------------+------------------------------ F( 6, 421) = 16.88
Model | 907.486444 6 151.247741 Prob > F = 0.0000
Residual | 3771.56649 421 8.95859024 R-squared = 0.1939
-------------+------------------------------ Adj R-squared = 0.1825
Total | 4679.05293 427 10.9579694 Root MSE = 2.9931
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hours | -.0007869 .0002103 -3.74 0.000 -.0012004 -.0003735
exper | .0380562 .0190773 1.99 0.047 .0005576 .0755548
faminc | .0001048 .0000207 5.07 0.000 .0000641 .0001454
huswage | -.1515962 .0689664 -2.20 0.028 -.2871576 -.0160348
educ | .3790003 .0690195 5.49 0.000 .2433345 .5146661
hushrs | -.0008611 .0003048 -2.82 0.005 -.0014602 -.0002619
_cons | .3999198 1.149705 0.35 0.728 -1.859958 2.659798
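The output above is the regression obtained after dropping the explanatory variables that are not significant at the 5% level. The six retained variables are hours, exper, faminc, huswage, educ, and hushrs.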
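The removal log above follows a backward-elimination pattern: fit the full model, drop the variable with the largest p-value if it is at or above the threshold, refit, and repeat. The Python sketch below illustrates that general idea only; it is not Stata's implementation. `fit_pvalues` is a hypothetical caller-supplied function, and the toy p-values are made up for the demonstration.

```python
def backward_eliminate(variables, fit_pvalues, threshold=0.05):
    """Drop the least significant variable, refit, and repeat.

    fit_pvalues is a caller-supplied (hypothetical) function that refits the
    regression on the given variables and returns {variable: p-value}.
    """
    remaining = list(variables)
    removed = []
    while remaining:
        pvalues = fit_pvalues(remaining)
        worst = max(remaining, key=lambda v: pvalues[v])
        if pvalues[worst] < threshold:
            break  # every remaining variable is significant
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed


# Toy stand-in for a real regression: fixed, made-up p-values that do not
# change between refits (in a real regression they change after each removal).
toy_pvalues = {"a": 0.001, "b": 0.40, "c": 0.03, "d": 0.92}
kept, dropped = backward_eliminate(
    toy_pvalues, lambda vs: {v: toy_pvalues[v] for v in vs}
)
print(kept, dropped)
```

In the toy run the variables are dropped in order of decreasing p-value, in the same way that the log above removes fatheduc first and motheduc last.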