The Matlab Software Package and Logistic Regression
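In regression analysis the dependent variable $y$ can be of two kinds: (1) $y$ is a quantitative variable, in which case the usual regress function is used to regress $y$ on the explanatory variables; (2) $y$ is a qualitative variable, for example $y = 0$ or $1$, in which case the usual regress function cannot be applied to $y$ directly and so-called Logistic regression is used instead.
The basic idea of Logistic regression is not to regress $y$ directly, but first to define a probability function $\pi$. Let
$$\pi = P(y = 1 \mid X_1, X_2, \ldots, X_n),$$
with the requirement $0 \le \pi \le 1$. If $\pi$ were regressed directly, the fitted regression equation might not satisfy this condition. In practice we generally have $0 < \pi < 1$. Finding an expression for $\pi$ directly is difficult, so one considers instead the ratio
$$k = \frac{1-\pi}{\pi}.$$
In general $0 < k < +\infty$. Experience has shown that setting
$$k = e^{b_0 + b_1 X_1 + \cdots + b_n X_n},$$
that is,
$$\pi = \frac{1}{1 + e^{b_0 + b_1 X_1 + \cdots + b_n X_n}},$$
which is a Logistic-type function, works well. Taking logarithms turns this into
$$\ln\frac{1-\pi}{\pi} = b_0 + b_1 X_1 + \cdots + b_n X_n,$$
and an ordinary linear regression is then carried out on $\ln\frac{1-\pi}{\pi}$. For example, the graph of a Logistic-type probability function can be drawn with:
ezplot('1/(1+300*exp(-2*x))',[0,10])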
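The linearising effect of the transform $\ln\frac{1-\pi}{\pi}$ can be checked numerically. The sketch below is only an illustration: the coefficients b0 = log(300) and b1 = -2 are assumptions chosen so that the probability reproduces the plotted curve 1/(1+300*exp(-2*x)), and all variable names are made up for this example.

```matlab
% Assumed coefficients, chosen so that p equals 1/(1+300*exp(-2*x))
b0 = log(300);
b1 = -2;

x = linspace(0, 10, 101)';            % same range as the ezplot call
p = 1 ./ (1 + exp(b0 + b1*x));        % Logistic-type probability, strictly between 0 and 1
z = log((1 - p) ./ p);                % transformed response ln((1-p)/p)

% After the transform the curve should coincide with the line b0 + b1*x
maxErr = max(abs(z - (b0 + b1*x)));
fprintf('largest deviation from the straight line: %g\n', maxErr);
```

The deviation is at the level of floating-point rounding, which is why a linear regression on the transformed response is a natural next step.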
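Example 1. Companies apply to financial institutions for loans, and the institutions need to evaluate them. Moody's, for instance, is a New York company that specialises in rating the creditworthiness of companies. Let:
$$Y = \begin{cases} 0, & \text{the company is bankrupt after 2 years} \\ 1, & \text{the company is still able to repay after 2 years} \end{cases}$$
The data for 66 US companies are listed below: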
Y X1 X2 X3
0 -62.8 -89.5 1.7
0 3.3 -3.5 1.1
0 -120.8 -103.2 2.5
0 -18.1 -28.8 1.1
0 -3.8 -50.6 0.9
0 -61.2 -56.2 1.7
0 -20.3 -17.4 1.0
0 -194.5 -25.8 0.5
0 20.8 -4.3 1.0
0 -106.1 -22.9 1.5
0 -39.4 -35.7 1.2
0 -164.1 -17.7 1.3
0 -308.9 -65.8 0.8
0 7.2 -22.6 2.0
0 -118.3 -34.2 1.5
0 -185.9 -280.0 6.7
0 -34.6 -19.4 3.4
0 -27.9 6.3 1.3
0 -48.2 6.8 1.6
0 -49.2 -17.2 0.3
0 -19.2 -36.7 0.8
0 -18.1 -6.5 0.9
0 -98.0 -20.8 1.7
0 -129.0 -14.2 1.3
0 -4.0 -15.8 2.1
0 -8.7 -36.3 2.8
0 -59.2 -12.8 2.1
0 -13.1 -17.6 0.9
0 -38.0 1.6 1.2
0 -57.9 0.7 0.8
0 -8.8 -9.1 0.9
0 -64.7 -4.0 0.1
0 -11.4 4.8 0.9
1 43.0 16.4 1.3
1 47.0 16.0 1.9
1 -3.3 4.0 2.7
1 35.0 20.8 1.9
1 46.7 12.6 0.9
1 20.8 12.5 2.4
1 33.0 23.6 1.5
1 26.1 10.4 2.1
1 68.6 13.8 1.6
1 37.3 33.4 3.5
1 59.0 23.1 5.5
1 49.6 23.8 1.9
1 12.5 7.0 1.8
1 37.3 34.1 1.5
1 35.3 4.2 0.9
1 49.5 25.1 2.6
1 18.1 13.5 4.0
1 31.4 15.7 1.9
1 21.5 -14.4 1.0
1 8.5 5.8 1.5
1 40.6 5.8 1.8
1 34.6 26.4 1.8
1 19.9 26.7 2.3
1 17.4 12.6 1.3
1 54.7 14.6 1.7
1 53.5 20.6 1.1
1 35.9 26.4 2.0
1 39.4 30.5 1.9
1 53.1 7.1 1.9
1 39.8 13.8 1.2
1 59.5 7.0 2.0
1 16.3 20.4 1.0
1 21.7 -7.8 1.6
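where $X_1$, $X_2$ and $X_3$ are three financial indicators of the company.
Build a regression equation for the bankruptcy indicator $Y$.
Solution: In this bankruptcy problem $Y$ takes only the values 0 and 1,
so we work with a probability instead. Let $\pi$ = the probability that the company is still able to repay after 2 years, i.e. $\pi$ = the probability that the company does not go bankrupt. Since 33 of the 66 observations are 0 and 33 are 1, we take 0.5 as the cut-off value and set
$$Y = \begin{cases} 0, & 0 \le \pi \le 0.5 \\ 1, & 0.5 < \pi \le 1 \end{cases}$$
We do not know the actual value of $\pi$ for a company before it goes bankrupt, and it cannot be computed from the $Y$ data either. For the purpose of the regression we therefore take the midpoint of each interval: $\pi = 0.25$ when $Y = 0$ and $\pi = 0.75$ when $Y = 1$. The data table becomes: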
π X1 X2 X3
0.25 -62.8 -89.5 1.7
0.25 3.3 -3.5 1.1
0.25 -120.8 -103.2 2.5
0.25 -18.1 -28.8 1.1
0.25 -3.8 -50.6 0.9
0.25 -61.2 -56.2 1.7
0.25 -20.3 -17.4 1.0
0.25 -194.5 -25.8 0.5
0.25 20.8 -4.3 1.0
0.25 -106.1 -22.9 1.5
0.25 -39.4 -35.7 1.2
0.25 -164.1 -17.7 1.3
0.25 -308.9 -65.8 0.8
0.25 7.2 -22.6 2.0
0.25 -118.3 -34.2 1.5
0.25 -185.9 -280.0 6.7
0.25 -34.6 -19.4 3.4
0.25 -27.9 6.3 1.3
0.25 -48.2 6.8 1.6
0.25 -49.2 -17.2 0.3
0.25 -19.2 -36.7 0.8
0.25 -18.1 -6.5 0.9
0.25 -98.0 -20.8 1.7
0.25 -129.0 -14.2 1.3
0.25 -4.0 -15.8 2.1
0.25 -8.7 -36.3 2.8
0.25 -59.2 -12.8 2.1
0.25 -13.1 -17.6 0.9
0.25 -38.0 1.6 1.2
0.25 -57.9 0.7 0.8
0.25 -8.8 -9.1 0.9
0.25 -64.7 -4.0 0.1
0.25 -11.4 4.8 0.9
0.75 43.0 16.4 1.3
0.75 47.0 16.0 1.9
0.75 -3.3 4.0 2.7
0.75 35.0 20.8 1.9
0.75 46.7 12.6 0.9
0.75 20.8 12.5 2.4
0.75 33.0 23.6 1.5
0.75 26.1 10.4 2.1
0.75 68.6 13.8 1.6
0.75 37.3 33.4 3.5
0.75 59.0 23.1 5.5
0.75 49.6 23.8 1.9
0.75 12.5 7.0 1.8
0.75 37.3 34.1 1.5
0.75 35.3 4.2 0.9
0.75 49.5 25.1 2.6
0.75 18.1 13.5 4.0
0.75 31.4 15.7 1.9
0.75 21.5 -14.4 1.0
0.75 8.5 5.8 1.5
0.75 40.6 5.8 1.8
0.75 34.6 26.4 1.8
0.75 19.9 26.7 2.3
0.75 17.4 12.6 1.3
0.75 54.7 14.6 1.7
0.75 53.5 20.6 1.1
0.75 35.9 26.4 2.0
0.75 39.4 30.5 1.9
0.75 53.1 7.1 1.9
0.75 39.8 13.8 1.2
0.75 59.5 7.0 2.0
0.75 16.3 20.4 1.0
0.75 21.7 -7.8 1.6
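Because $\pi$ takes only the two midpoint values, the transformed response takes only two values as well. The short sketch below (variable names are illustrative) evaluates $\ln\frac{1-\pi}{\pi}$ at both midpoints.

```matlab
p = [0.25; 0.75];          % midpoints used for Y = 0 and Y = 1
Y = log((1 - p) ./ p);     % transformed response
disp(Y)                    % +log(3) for p = 0.25, -log(3) for p = 0.75
```

The companies with Y = 0 therefore receive the positive value log(3), so a negative coefficient on an indicator means that a larger indicator value pushes $\pi$ upwards.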
Accordingly, the following Matlab program carries out an ordinary linear regression on $\ln\frac{1-\pi}{\pi}$:
X=[1,-62.8,-89.5,1.7;
1,3.3,-3.5,1.1;
1,-120.8,-103.2,2.5;
1,-18.1,-28.8,1.1;
1,-3.8,-50.6,0.9;
1,-61.2,-56.2,1.7;
1,-20.3,-17.4,1;
1,-194.5,-25.8,0.5;
1,20.8,-4.3,1;
1,-106.1,-22.9,1.5;
1,-39.4,-35.7,1.2;
1,-164.1,-17.7,1.3;
1,-308.9,-65.8,0.8;
1,7.2,-22.6,2.0;
1,-118.3,-34.2,1.5;
1,-185.9,-280,6.7;
1,-34.6,-19.4,3.4;
1,-27.9,6.3,1.3;
1,-48.2,6.8,1.6;
1,-49.2,-17.2,0.3;
1,-19.2,-36.7,0.8;
1,-18.1,-6.5,0.9;
1,-98,-20.8,1.7;
1,-129,-14.2,1.3;
1,-4,-15.8,2.1;
1,-8.7,-36.3,2.8;
1,-59.2,-12.8,2.1;
1,-13.1,-17.6,0.9;
1,-38,1.6,1.2;
1,-57.9,0.7,0.8;
1,-8.8,-9.1,0.9;
1,-64.7,-4,0.1;
1,-11.4,4.8,0.9;
1,43,16.4,1.3;
1,47,16,1.9;
1,-3.3,4,2.7;
1,35,20.8,1.9;
1,46.7,12.6,0.9;
1,20.8,12.5,2.4;
1,33,23.6,1.5;
1,26.1,10.4,2.1;
1,68.6,13.8,1.6;
1,37.3,33.4,3.5;
1,59,23.1,5.5;
1,49.6,23.8,1.9;
1,12.5,7,1.8;
1,37.3,34.1,1.5;
1,35.3,4.2,0.9;
1,49.5,25.1,2.6;
1,18.1,13.5,4;
1,31.4,15.7,1.9;
1,21.5,-14.4,1;
1,8.5,5.8,1.5;
1,40.6,5.8,1.8;
1,34.6,26.4,1.8;
1,19.9,26.7,2.3;
1,17.4,12.6,1.3;
1,54.7,14.6,1.7;
1,53.5,20.6,1.1;
1,35.9,26.4,2;
1,39.4,30.5,1.9;
1,53.1,7.1,1.9;
1,39.8,13.8,1.2;
1,59.5,7,2;
1,16.3,20.4,1;
1,21.7,-7.8,1.6];
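% Midpoint probabilities: 0.25 for the 33 companies with Y=0, 0.75 for the 33 with Y=1
a0=0.25*ones(33,1);a1=0.75*ones(33,1);
y0=[a0;a1];
% Transformed response ln((1-pi)/pi)
Y=log((1-y0)./y0);
% Ordinary least squares (regress belongs to the Statistics Toolbox)
[b,bint,r,rint,stats] =regress(Y,X)
% Plot the residuals together with their confidence intervals
rcoplot(r,rint)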
Running the program gives:
b =
0.3914
-0.0069
-0.0093
-0.3263
bint =
0.0073 0.7755
-0.0105 -0.0032
-0.0156 -0.0030
-0.5253 -0.1273
r =
-0.0037
1.0561
-0.2683
0.6733
0.5028
0.3179
0.7320
-0.7044
1.1361
0.2553
0.4955
-0.1593
-1.7643
1.1984
0.0662
-0.9937
1.3983
0.9988
0.9621
0.3072
0.4942
0.8161
0.3957
0.1141
1.2176
1.2225
0.8670
0.7468
0.8531
0.5777
0.8556
0.2588
0.9675
-0.6179
-0.3984
-0.5943
-0.4360
-0.7585
-0.4476
-0.5541
-0.5288
-0.3687
0.2194
0.9248
-0.3078
-0.7516
-0.4266
-0.9150
-0.0680
0.0653
-0.5082
-1.1506
-0.8882
-0.5701
-0.4191
-0.3540
-0.8289
-0.4239
-0.5720
-0.3449
-0.3153
-0.4396
-0.6967
-0.3640
-0.8616
-0.8919
rint =
-1.4320 1.4245
-0.3990 2.5113
-1.6975 1.1608
-0.7882 2.1349
-0.9222 1.9277
-1.1498 1.7856
-0.7332 2.1971
-2.0696 0.6609
-0.3070 2.5791
-1.2048 1.7154
-0.9730 1.9640
-1.5626 1.2441
-2.9063 -0.6223
-0.2499 2.6466
-1.3925 1.5249
-1.7217 -0.2657
-0.0051 2.8018
-0.4609 2.4585
-0.4909 2.4152
-1.1505 1.7649
-0.9556 1.9439
-0.6477 2.2799
-1.0648 1.8562
-1.3238 1.5521
-0.2340 2.6692
-0.2162 2.6613
-0.5911 2.3250
-0.7136 2.2073
-0.6117 2.3178
-0.8868 2.0421
-0.6044 2.3156
-1.1944 1.7120
-0.4914 2.4264
-2.0862 0.8504
-1.8729 1.0760
-2.0558 0.8671
-1.9108 1.0389
-2.2125 0.6955
-1.9186 1.0234
-2.0271 0.9190
-2.0034 0.9459
-1.8340 1.0967
-1.1951 1.6340
-0.3186 2.1681
-1.7819 1.1662
-2.2238 0.7205
-1.8981 1.0449
-2.3643 0.5342
-1.5319 1.3959
-1.3378 1.4683
-1.9834 0.9669
-2.5850 0.2839
-2.3556 0.5793
-2.0422 0.9020
-1.8929 1.0547
-1.8195 1.1116
-2.2961 0.6383
-1.8955 1.0476
-2.0355 0.8916
-1.8178 1.1280
-1.7876 1.1571
-1.9105 1.0313
-2.1620 0.7686
-1.8335 1.1055
-2.3237 0.6005
-2.3544 0.5707
stats =
0.5699 27.3841 0.0000 0.5526
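That is, $R^2 = 0.5699$ (so the regression equation does not describe the original problem very well), the $F$ statistic is $27.3841$, and the $p$-value associated with it is $0.0000$, well below the usual significance level of 0.05, so there is a significant linear relationship between the transformed response and the explanatory variables. The fourth entry of stats, $0.5526$, is the estimate of the error variance. The regression equation is:
$$\ln\frac{1-\pi}{\pi} = 0.3914 - 0.0069X_1 - 0.0093X_2 - 0.3263X_3$$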
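The entries of stats can be cross-checked against each other with a little arithmetic. The sketch below assumes the documented layout of the stats output of regress ($R^2$, $F$ statistic, $p$-value, error-variance estimate) and uses n = 66 observations and p = 4 fitted coefficients; the variable names are illustrative, and the printed values are copied from the output above.

```matlab
n  = 66;                  % number of companies
p  = 4;                   % fitted coefficients (constant, X1, X2, X3)
R2 = 0.5699;              % stats(1) as printed above

% F statistic implied by R^2
F_check = (R2 / (1 - R2)) * ((n - p) / (p - 1));

% The response is +log(3) for 33 rows and -log(3) for 33 rows, so its mean is 0
SST = n * log(3)^2;       % total sum of squares
SSE = (1 - R2) * SST;     % residual sum of squares
s2_check = SSE / (n - p); % error-variance estimate

fprintf('F from R^2: %.4f, error variance: %.4f\n', F_check, s2_check);
```

Both numbers agree with the printed second and fourth entries of stats up to the rounding of $R^2$, which supports reading 0.0000 as the $p$-value and 0.5526 as the error variance.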
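The call rcoplot(r,rint) also draws the residual plot.
In the residual plot the residuals lie above 0 for a long consecutive run and then below 0 for another consecutive run, which also hints at linear correlation among the variables. The following program computes their correlation coefficients: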
X=[1,-62.8,-89.5,1.7;
1,3.3,-3.5,1.1;
1,-120.8,-103.2,2.5;
1,-18.1,-28.8,1.1;
1,-3.8,-50.6,0.9;
1,-61.2,-56.2,1.7;
1,-20.3,-17.4,1;
1,-194.5,-25.8,0.5;
1,20.8,-4.3,1;
1,-106.1,-22.9,1.5;
1,-39.4,-35.7,1.2;
1,-164.1,-17.7,1.3;
1,-308.9,-65.8,0.8;
1,7.2,-22.6,2.0;
1,-118.3,-34.2,1.5;
1,-185.9,-280,6.7;
1,-34.6,-19.4,3.4;
1,-27.9,6.3,1.3;
1,-48.2,6.8,1.6;
1,-49.2,-17.2,0.3;
1,-19.2,-36.7,0.8;
1,-18.1,-6.5,0.9;
1,-98,-20.8,1.7;
1,-129,-14.2,1.3;
1,-4,-15.8,2.1;
1,-8.7,-36.3,2.8;
1,-59.2,-12.8,2.1;
1,-13.1,-17.6,0.9;
1,-38,1.6,1.2;
1,-57.9,0.7,0.8;
1,-8.8,-9.1,0.9;
1,-64.7,-4,0.1;
1,-11.4,4.8,0.9;
1,43,16.4,1.3;
1,47,16,1.9;
1,-3.3,4,2.7;
1,35,20.8,1.9;
1,46.7,12.6,0.9;
1,20.8,12.5,2.4;
1,33,23.6,1.5;
1,26.1,10.4,2.1;
1,68.6,13.8,1.6;
1,37.3,33.4,3.5;
1,59,23.1,5.5;
1,49.6,23.8,1.9;
1,12.5,7,1.8;
1,37.3,34.1,1.5;
1,35.3,4.2,0.9;
1,49.5,25.1,2.6;
1,18.1,13.5,4;
1,31.4,15.7,1.9;
1,21.5,-14.4,1;
1,8.5,5.8,1.5;
1,40.6,5.8,1.8;
1,34.6,26.4,1.8;
1,19.9,26.7,2.3;
1,17.4,12.6,1.3;
1,54.7,14.6,1.7;
1,53.5,20.6,1.1;
1,35.9,26.4,2;
1,39.4,30.5,1.9;
1,53.1,7.1,1.9;
1,39.8,13.8,1.2;
1,59.5,7,2;
1,16.3,20.4,1;
1,21.7,-7.8,1.6];
X1=X(:,2);X2=X(:,3);X3=X(:,4);
corrcoef(X1,X2)
corrcoef(X1,X3)
corrcoef(X2,X3)
Running the program gives:
ans =
1.0000 0.6409
0.6409 1.0000
ans =
1.0000 0.0467
0.0467 1.0000
ans =
1.0000 -0.3501
-0.3501 1.0000
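The three pairwise calls can also be replaced by a single call on the matrix of explanatory variables, since corrcoef treats each column of a matrix as one variable. The sketch below uses only six rows copied from the data table, so its numbers will differ from the full-sample values above; the name Xs is illustrative.

```matlab
% Six rows (X1, X2, X3) copied from the data table, for illustration only
Xs = [ -62.8  -89.5  1.7;
         3.3   -3.5  1.1;
       -27.9    6.3  1.3;
        43.0   16.4  1.3;
        -3.3    4.0  2.7;
        21.5  -14.4  1.0 ];

R = corrcoef(Xs);   % 3-by-3 matrix; R(i,j) is the correlation of columns i and j
disp(R)
```

With the full 66-row matrix, corrcoef(X(:,2:4)) would return the three coefficients listed above in its off-diagonal entries.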
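Thus corrcoef(X1,X2)=0.64, which means that one of the columns $X_1$, $X_2$ can be dropped from the regression. On economic grounds we drop the $X_1$ column and regress again.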
X=[1,-62.8,-89.5,1.7;
1,3.3,-3.5,1.1;
1,-120.8,-103.2,2.5;
1,-18.1,-28.8,1.1;
1,-3.8,-50.6,0.9;
1,-61.2,-56.2,1.7;
1,-20.3,-17.4,1;
1,-194.5,-25.8,0.5;
1,20.8,-4.3,1;
1,-106.1,-22.9,1.5;
1,-39.4,-35.7,1.2;
1,-164.1,-17.7,1.3;
1,-308.9,-65.8,0.8;
1,7.2,-22.6,2.0;
1,-118.3,-34.2,1.5;
1,-185.9,-280,6.7;
1,-34.6,-19.4,3.4;
1,-27.9,6.3,1.3;
1,-48.2,6.8,1.6;
1,-49.2,-17.2,0.3;
1,-19.2,-36.7,0.8;
1,-18.1,-6.5,0.9;
1,-98,-20.8,1.7;
1,-129,-14.2,1.3;
1,-4,-15.8,2.1;
1,-8.7,-36.3,2.8;
1,-59.2,-12.8,2.1;
1,-13.1,-17.6,0.9;
1,-38,1.6,1.2;
1,-57.9,0.7,0.8;
1,-8.8,-9.1,0.9;
1,-64.7,-4,0.1;
1,-11.4,4.8,0.9;
1,43,16.4,1.3;
1,47,16,1.9;
1,-3.3,4,2.7;
1,35,20.8,1.9;
1,46.7,12.6,0.9;
1,20.8,12.5,2.4;
1,33,23.6,1.5;
1,26.1,10.4,2.1;
1,68.6,13.8,1.6;
1,37.3,33.4,3.5;
1,59,23.1,5.5;
1,49.6,23.8,1.9;
1,12.5,7,1.8;
1,37.3,34.1,1.5;
1,35.3,4.2,0.9;
1,49.5,25.1,2.6;
1,18.1,13.5,4;
1,31.4,15.7,1.9;
1,21.5,-14.4,1;
1,8.5,5.8,1.5;
1,40.6,5.8,1.8;
1,34.6,26.4,1.8;
1,19.9,26.7,2.3;
1,17.4,12.6,1.3;
1,54.7,14.6,1.7;
1,53.5,20.6,1.1;
1,35.9,26.4,2;
1,39.4,30.5,1.9;
1,53.1,7.1,1.9;
1,39.8,13.8,1.2;
1,59.5,7,2;
1,16.3,20.4,1;
1,21.7,-7.8,1.6];
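% Same transformed response as in the first program
a0=0.25*ones(33,1);a1=0.75*ones(33,1);
y0=[a0;a1];
Y=log((1-y0)./y0);
% Drop X1: keep only the constant column, X2 and X3
X1=X(:,2);X2=X(:,3);X3=X(:,4);E=ones(66,1);
B=[E,X2,X3];
[b,bint,r,rint,stats] =regress(Y,B)
rcoplot(r,rint)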
Running the program gives:
b =
0.6594
-0.0177
-0.4676
bint =
0.2672 1.0516
-0.0226 -0.0127
-0.6702 -0.2649
r =
-0.3478
0.8917
-0.2159
0.4445
-0.0343
0.2408
0.5992
0.2170
0.8308
0.7358
0.3693
0.7342
-0.3497
0.9749
0.5361
-1.3769
1.6861
1.1584
1.3075
0.2755
0.1646
0.7451
0.8665
0.7961
1.1419
1.1068
1.1949
0.5489
1.0286
0.8256
0.6992
0.4153
0.9449
-0.8603
-0.5868
-0.4249
-0.5020
-1.1145
-0.4149
-0.6395
-0.5923
-0.7660
0.4688
1.2219
-0.4490
-0.7927
-0.4540
-1.2630
-0.0987
0.3509
-0.5921
-1.5450
-0.9541
-0.8139
-0.4498
-0.2107
-0.9275
-0.7051
-0.8796
-0.3563
-0.3306
-0.7441
-0.9530
-0.6992
-0.9299
-1.1478
rint =
-1.9280 1.2325
-0.7220 2.5054
-1.7877 1.3560
-1.1746 2.0636
-1.6382 1.5696
-1.3743 1.8558
-1.0189 2.2173
-1.3898 1.8237
-0.7833 2.4449
-0.8845 2.3561
-1.2496 1.9882
-0.8853 2.3537
-1.9330 1.2335
-0.6385 2.5883
-1.0852 2.1574
-2.1813 -0.5724
0.1435 3.2286
-0.4463 2.7631
-0.2909 2.9059
-1.3275 1.8785
-1.4460 1.7752
-0.8695 2.3597
-0.7514 2.4843
-0.8222 2.4144
-0.4645 2.7482
-0.4883 2.7020
-0.4091 2.7988
-1.0680 2.1659
-0.5813 2.6384
-0.7851 2.4364
-0.9163 2.3146
-1.1827 2.0132
-0.6638 2.5535
-2.4750 0.7543
-2.2082 1.0345
-2.0392 1.1894
-2.1230 1.1190
-2.7155 0.4865
-2.0332 1.2034
-2.2586 0.9795
-2.2133 1.0287
-2.3850 0.8531
-1.0894 2.0270
-0.1453 2.5892
-2.0695 1.1715
-2.4121 0.8268
-2.0716 1.1637
-2.8575 0.3315
-1.7076 1.5102
-1.1978 1.8995
-2.2135 1.0292
-3.1230 0.0331
-2.5686 0.6603
-2.4329 0.8052
-2.0699 1.1704
-1.8258 1.4044
-2.5407 0.6858
-2.3254 0.9152
-2.4908 0.7316
-1.9755 1.2629
-1.9490 1.2879
-2.3644 0.8761
-2.5643 0.6582
-2.3198 0.9215
-2.5383 0.6785
-2.7554 0.4598
stats =
0.4716 28.1175 0.0000 0.6681
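The call rcoplot(r,rint) again draws the residual plot.
The residual plot still shows correlation among the variables, which indicates that the 3 indicators were not well chosen when the data were first collected. The final result is:
$$\ln\frac{1-\pi}{\pi} = 0.6594 - 0.0177X_2 - 0.4676X_3, \quad\text{i.e.}\quad \pi = \frac{1}{1 + e^{0.6594 - 0.0177X_2 - 0.4676X_3}}$$
Substituting a company's actual data into the expression for $\pi$ and combining the result with
$$Y = \begin{cases} 0, & 0 \le \pi \le 0.5 \\ 1, & 0.5 < \pi \le 1 \end{cases}$$
the financial institution can tell whether it should lend to that company.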
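The decision rule can be sketched in a few lines. This is a minimal illustration, not the original author's program: it uses the rounded coefficients of the two-variable fit printed above, the 0.5 cut-off, and two companies whose X2 and X3 values are copied from the data table (the first Y = 0 row and the first Y = 1 row). The helper name piHat is made up for this example.

```matlab
b = [0.6594; -0.0177; -0.4676];     % rounded coefficients for [1, X2, X3]

% Estimated probability that the company does not go bankrupt
piHat = @(X2, X3) 1 ./ (1 + exp(b(1) + b(2)*X2 + b(3)*X3));

X2 = [-89.5; 16.4];                 % two companies taken from the data table
X3 = [  1.7;  1.3];

p    = piHat(X2, X3);
Yhat = double(p > 0.5);             % 1 = expected to stay solvent, 0 = expected to go bankrupt
disp([p, Yhat])
```

For these two companies the estimated probabilities are roughly 0.19 and 0.56, so the rule classifies the first as Y = 0 and the second as Y = 1, matching their recorded values.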
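Note: an ordinary regress fit can be judged by parameters such as $R^2$, $F$ and $p$, but for Logistic regression there is no comparably simple and satisfactory measure, so in general a regression diagnosis should be carried out.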
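Diagnosis of the Logistic Regression
Regression diagnosis means substituting the original $X_1, X_2, X_3$ data into the fitted regression equation, computing $\pi$ and from it $Y$, and counting how many of the $Y$ values computed from the regression equation differ from the original $Y$ values. This count is used to judge the quality of the regression equation.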
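The counting idea can be tried on a small subset before it is run on all 66 companies. The sketch below is an illustration under stated assumptions, not the original diagnostic program: it uses the rounded coefficients of the three-variable fit, the 0.5 cut-off, and six rows copied from the data table; the names D, Yhat and nWrong are made up for this example.

```matlab
b = [0.3914; -0.0069; -0.0093; -0.3263];   % rounded coefficients for [1, X1, X2, X3]

% Columns: Y, X1, X2, X3 (six rows copied from the data table)
D = [0  -62.8  -89.5  1.7;
     0    3.3   -3.5  1.1;
     0  -27.9    6.3  1.3;
     1   43.0   16.4  1.3;
     1   -3.3    4.0  2.7;
     1   21.5  -14.4  1.0];

Y = D(:,1);
X = [ones(size(D,1),1), D(:,2:4)];   % add the constant column

p      = 1 ./ (1 + exp(X*b));        % estimated probability of not going bankrupt
Yhat   = double(p > 0.5);            % predicted class
nWrong = sum(Yhat ~= Y);             % number of companies where prediction and data disagree

fprintf('%d of %d companies misclassified\n', nWrong, numel(Y));
```

On this subset only the last company is misclassified: its recorded value is Y = 1 but its estimated probability falls just below 0.5.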
(1) Diagnosis using the regression equation.
① In Matlab, the diagnostic program is:
X=[1,-62.8,-89.5,1.7;
1,3.3,-3.5,1.1;
1,-120.8,-103.2,2.5;
1,-18.1,-28.8,1.1;
1,-3.8,-50.6,0.9;
1,-61.2,-56.2,1.7;
1,-20.3,-17.4,1;
1,-194.5,-25.8,0.5;
1,20.8,-4.3,1;
1,-106.1,-22.9,1.5;
1,-39.4,-35.7,1.2;
1,-164.1,-17.7,1.3;
1,-308.9,-65.8,0.8;
1,7.2,-22.6,2.0;
1,-118.3,-34.2,1.5;
1,-185.9,-280,6.7;
1,-34.6,-19.4,3.4;
1,-27.9,6.3,1.3;
1,-48.2,6.8,1.6;
1,-49.2,-17.2,0.3;
1,-19.2,-36.7,0.8;
1,-18.1,-6.5,0.9;
1,-98,-20.8,1.7;
1,-129,-14.2,1.3;
1,-4,-15.8,2.1;
1,-8.7,-36.3,2.8;
1,-59.2,-12.8,2.1;
1,-13.1,-17.6,0.9;
1,-38,1.6,1.2;
1,-57.9,0.7,0.8;
1,-8.8,-9.1,0.9;
1,-64.7,-4,0.1;
1,-11.4,4.8,0.9;
1,43,16.4,1.3;
1,47,16,1.9;
1,-3.3,4,2.7;
1,35,20.8,1.9;
1,46.7,12.6,0.9;
1,20.8,12.5,2.4;
1,33,23.6,1.5;
1,26.1,10.4,2.1;
1,68.6,13.8,1.6;
1,37.3,33.4,3.5;
1,59,23.1,5.5;
1,49.6,23.8,1.9;
1,12.5,7,1.8;
1,37.3,34.1,1.5;
1,35.3,4.2,0.9;
1,49.5,25.1,2.6;
1,18