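Stata Tutorial, Lecture 4: Comparing Two Samples
Ming-chi Chen, Social Statistics

Opening the data

Open 85q1family.dta, the Stata data file for the family questionnaire of the Taiwan Social Change Survey (Phase 3, Wave 2). Because of Chinese-encoding compatibility problems, some text is garbled and hard to read; open 85q1_format.txt to look up variable names and value labels.

Take j2 and j3 as examples:
j2 (item 10.2): "On average, about how many hours per week do you usually spend on housework? ___ hours"
j3 (item 10.3): "On average, about how many hours per week does your spouse usually spend on housework? ___ hours"

Checking variable labels for garbled text

The data set has variable labels, but some are garbled. To check:
Data > Data Editor
Click the variable name j2; the whole column of values is highlighted.
Right-click > Variable > Properties > Label
The Chinese label shows one wrong character (牢 in place of 每 in 通常您平均每週大約花多少時間做家務工作). Correct it, and correct the garbled label of j3 in the same way.

Checking for outliers

Close the Data Editor window and use a box plot to look for extreme values:
Graphics > Easy graphs > Box plot > Main > type j2 in the Variable field

Or type the command directly in the Command window and press Enter:

    . graph box j2

summarize varname, detail

Type summarize j2, detail in the Command window, or use
Statistics > Summaries, tables, & tests > Summary statistics > Summary statistics

    . summarize j2, detail

         j2: hours per week respondent usually spends on housework
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                1924
    25%            2              0       Sum of Wgt.        1924

    50%            7                      Mean           50.32692
                            Largest       Std. Dev.      191.1342
    75%           20            998
    90%           35            998       Variance       36532.28
    95%           70            998       Skewness       4.717707
    99%          996            999       Kurtosis       23.40378

Somebody loves housework a little too much! The largest values are unreasonably high.

Recoding extreme values

85q1_format.txt shows that for j2 and j3 the codes are 996 = don't know, 998 = not applicable, 999 = refused. Values of 995 and above should therefore be set to system missing:

    . recode j2 995/max=.

The period (.) is Stata's system missing value.

    . summarize j2, detail

         j2: hours per week respondent usually spends on housework
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                1849
    25%            2              0       Sum of Wgt.        1849

    50%            7                      Mean           11.96106
                            Largest       Std. Dev.      15.30762
    75%           15            105
    90%           28            112       Variance       234.3232
    95%           36            168       Skewness       3.208555
    99%           70            168       Kurtosis       20.90302

A week has only 168 hours, so a reported 168 should be converted to something plausible: at 16 waking hours per day, a week has 16 x 7 = 112 hours.

Use inspect to see the rough distribution and the number of missing cases (Data > Describe data > Inspect variables):

    . inspect j2

    j2: hours per week respondent usually spends on housework
                       Total    Integers    Nonintegers
      Negative             -           -              -
      Zero               305         305              -
      Positive          1544        1544              -
      Total             1849        1849              -
      Missing             75
                        1924
      Range 0 to 168 (47 unique values)

    . recode j2 168=112

    . inspect j2

    j2: hours per week respondent usually spends on housework
                       Total    Integers    Nonintegers
      Negative             -           -              -
      Zero               305         305              -
      Positive          1544        1544              -
      Total             1849        1849              -
      Missing             75
                        1924
      Range 0 to 112 (46 unique values)

    . sum j2, detail

         j2: hours per week respondent usually spends on housework
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                1849
    25%            2              0       Sum of Wgt.        1849

    50%            7                      Mean           11.90049
                            Largest       Std. Dev.      14.79188
    75%           15            105
    90%           28            112       Variance       218.7996
    95%           36            112       Skewness       2.632377
    99%           70            112       Kurtosis       12.87359

The same checks for j3:

    . inspect j3

    j3: hours per week spouse usually spends on housework
                       Total    Integers    Nonintegers
      Negative             -           -              -
      Zero               263         263              -
      Positive          1661        1661              -
      Total             1924        1924              -
      Missing              -
                        1924
      Range 0 to 999 (54 unique values)

    . summarize j3, detail

         j3: hours per week spouse usually spends on housework
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                1924
    25%            4              0       Sum of Wgt.        1924

    50%           14                      Mean           278.8342
                            Largest       Std. Dev.      436.2336
    75%          996            998
    90%          998            999       Variance       190299.7
    95%          998            999       Skewness        1.03888
    99%          998            999       Kurtosis       2.085666

Missing values and recode

    . recode j3 990/max=.
    . recode j3 168=112
    (j3: 4 changes made)

    . inspect j3

    j3: hours per week spouse usually spends on housework
                       Total    Integers    Nonintegers
      Negative             -           -              -
      Zero               263         263              -
      Positive          1144        1144              -
      Total             1407        1407              -
      Missing            517
                        1924
      Range 0 to 150 (50 unique values)

    . summarize j3, detail

         j3: hours per week spouse usually spends on housework
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                1407
    25%            2              0       Sum of Wgt.        1407

    50%            7                      Mean           14.49893
                            Largest       Std. Dev.       18.2296
    75%           21            112
    90%           35            112       Variance       332.3185
    95%           49            150       Skewness       2.569526
    99%           85            150       Kurtosis       12.65059

Values above 112 (here, 150) are capped at 112 as well:

    . recode j3 112/max=112
    . tabulate j3

    (upper end of the table)
             j3 |   Freq.   Percent     Cum.
    ------------+---------------------------
             70 |      10      0.71    98.29
             80 |       3      0.21    98.51
             84 |       6      0.43    98.93
             85 |       1      0.07    99.00
             90 |       1      0.07    99.08
             98 |       4      0.28    99.36
            100 |       1      0.07    99.43
            105 |       1      0.07    99.50
            112 |       7      0.50   100.00
    ------------+---------------------------
          Total |   1,407    100.00

Differences between men and women

a1 is the respondent's sex: 1 = male, 2 = female.
Data > Data Editor > find the variable a1 > right-click
Variable > Properties > Label: change the label to "sex" (性別)
Value label > Define/Modify > Define > label name: type gender > OK
value: 1, text: Male (男) > OK
value: 2, text: Female (女) > OK > Cancel > Close
Under Value label, choose gender > OK
Close the Data Editor window.

Do men and women differ in how much housework they do?

Statistics > Summaries, tables, & tests > Tables > One/Two-way table of summary statistics
In the dialog, supply the independent variable (a1) and the dependent variable (j2).

Is the difference large?

                |  Summary of j2 (weekly housework hours)
            sex |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
           Male |   6.0485537    10.23684         968
         Female |   18.330306   16.287017         881
    ------------+------------------------------------
          Total |   11.900487   14.791877        1849

Population variances unknown but assumed equal

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison test
In the dialog, supply the dependent variable, the independent (grouping) variable, and the confidence level.

    . ttest j2, by(a1) level(99)

    Two-sample t test with equal variances
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   968   6.048554   .3290245    10.23684    5.199367    6.897741
      Female |   881   18.33031   .5487235    16.28702    16.91382     19.7468
    ---------+----------------------------------------------------------------
    combined |  1849   11.90049   .3439971    14.79188    11.01349    12.78748
    ---------+----------------------------------------------------------------
        diff |        -12.28175   .6268771               -13.89815   -10.66535

    diff = mean(Male) - mean(Female)                 t = -19.5920
    Ho: diff = 0                    degrees of freedom =     1847
    Ha: diff != 0: Pr(|T| > |t|) = 0.0000
    Ha: diff > 0:  Pr(T > t) = 1.0000

Population variances unknown and unequal

The method above assumes that the population variances are unknown but equal. Regardless of sample size, statistical software generally uses the t test. What if the population variances are unknown and unequal?

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison test
Tick the unequal-variances option. With unequal variances the degrees of freedom require a more complicated calculation, proposed by Welch.

Difference in housework hours between men and women when the population variances are unknown and unequal:

    . ttest j2, by(a1) unequal welch level(99)

    Two-sample t test with unequal variances
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   968   6.048554   .3290245    10.23684    5.199367    6.897741
      Female |   881   18.33031   .5487235    16.28702    16.91382     19.7468
    ---------+----------------------------------------------------------------
    combined |  1849   11.90049   .3439971    14.79188    11.01349    12.78748
    ---------+----------------------------------------------------------------
        diff |        -12.28175   .6398083               -13.93195   -10.63155

    diff = mean(Male) - mean(Female)                 t = -19.1960
    Ho: diff = 0            Welch's degrees of freedom =  1456.62
    Ha: diff != 0: Pr(|T| > |t|) = 0.0000
    Ha: diff > 0:  Pr(T > t) = 1.0000

Testing whether the variances are equal

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group variance comparison test
In the dialog, supply the dependent variable and the independent (grouping) variable.

The slides call this step Levene's test, but sdtest reports the variance-ratio F test; Levene's statistic is available from Stata's robvar command.

    . sdtest j2, by(a1) level(99)

    Variance ratio test
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   968   6.048554   .3290245    10.23684    5.199367    6.897741
      Female |   881   18.33031   .5487235    16.28702    16.91382     19.7468
    ---------+----------------------------------------------------------------
    combined |  1849   11.90049   .3439971    14.79188    11.01349    12.78748

    ratio = sd(Male) / sd(Female)                    f =   0.3950
    Ho: ratio = 1                   degrees of freedom = 967, 880
    Ha: ratio > 1: Pr(F > f) = 1.0000

sd(Male)/sd(Female) is not equal to 1. An upper-tail probability of 1.0000 means the lower-tail probability is essentially 0, so the null hypothesis of equal variances can be rejected.

Given this result, the unequal-variance version of the t test is the more appropriate one. Its conclusion is that men spend significantly fewer hours on housework than women.

Comparing the housework of married and unmarried respondents

a5 is the respondent's marital status: 1 = unmarried, 2 = married, 3 = other. Do married people carry a heavier housework load?

Repeating the male/female comparison produces an error:

    . ttest j2, by(a5) level(99)
    more than 2 groups found, only 2 allowed
    r(420);

This is because a5 takes three values: unmarried, married, and other. Use an if condition to restrict the comparison to unmarried and married respondents.

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison test

Equal variances

    . ttest j2 if a5!=3, by(a5) level(99)

    Two-sample t test with equal variances
        Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ----------+----------------------------------------------------------------
    Unmarried |   306   5.598039   .5156249    9.019752    4.261516    6.934562
      Married |  1531   13.12671   .3912873    15.31029    12.11757    14.13586
    ----------+----------------------------------------------------------------
     combined |  1837   11.87262   .3434793     14.7216    10.98695    12.75828
    ----------+----------------------------------------------------------------
         diff |        -7.528675   .9051995               -9.862742   -5.194608

    diff = mean(Unmarried) - mean(Married)           t =  -8.3171
    Ho: diff = 0                    degrees of freedom =     1835
    Ha: diff != 0: Pr(|T| > |t|) = 0.0000
    Ha: diff > 0:  Pr(T > t) = 1.0000

Unequal variances

    . ttest j2 if a5!=3, by(a5) unequal welch level(99)

    Two-sample t test with unequal variances
        Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ----------+----------------------------------------------------------------
    Unmarried |   306   5.598039   .5156249    9.019752    4.261516    6.934562
      Married |  1531   13.12671   .3912873    15.31029    12.11757    14.13586
    ----------+----------------------------------------------------------------
     combined |  1837   11.87262   .3434793     14.7216    10.98695    12.75828
    ----------+----------------------------------------------------------------
         diff |        -7.528675   .6472826                -9.20044    -5.85691

    diff = mean(Unmarried) - mean(Married)           t = -11.6312
    Ho: diff = 0            Welch's degrees of freedom =  712.885
    Ha: diff != 0: Pr(|T| > |t|) = 0.0000
    Ha: diff > 0:  Pr(T > t) = 1.0000

Variance comparison

    . sdtest j2 if a5!=3, by(a5) level(99)

    Variance ratio test
        Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ----------+----------------------------------------------------------------
    Unmarried |   306   5.598039   .5156249    9.019752    4.261516    6.934562
      Married |  1531   13.12671   .3912873    15.31029    12.11757    14.13586
    ----------+----------------------------------------------------------------
     combined |  1837   11.87262   .3434793     14.7216    10.98695    12.75828

    ratio = sd(Unmarried) / sd(Married)              f =   0.3471
    Ho: ratio = 1                   degrees of freedom = 305, 1530
    Ha: ratio > 1: Pr(F > f) = 1.0000

As in the male/female comparison, the ratio is far below 1 and the upper-tail probability of 1.0000 implies a lower-tail probability of essentially 0, so the null hypothesis of equal variances is rejected here too. (The original slide reads this output as failing to reject, which is inconsistent with its reading of the identical pattern above.)

Comparisons across two levels of grouping

Is there a difference between married men and married women, and between unmarried men and unmarried women? Is marriage disadvantageous for women, at least in terms of time spent on housework?

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison test

Equal variances

    . by a5, sort: ttest j2 if a5!=3, by(a1) level(99)

    -> a5 = Unmarried
    Two-sample t test with equal variances
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   177   5.316384   .7992975    10.63396    3.234972    7.397796
      Female |   129   5.984496   .5435252    6.173259    4.563295    7.405698
    ---------+----------------------------------------------------------------
    combined |   306   5.598039   .5156249    9.019752    4.261516    6.934562
    ---------+----------------------------------------------------------------
        diff |        -.6681119    1.04519               -3.377347    2.041123

    diff = mean(Male) - mean(Female)                 t =  -0.6392
    Ho: diff = 0                    degrees of freedom =      304
    Ha: diff != 0: Pr(|T| > |t|) = 0.5232
    Ha: diff > 0:  Pr(T > t) = 0.7384

    -> a5 = Married
    Two-sample t test with equal variances
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   784   6.095663   .3493023    9.780465    5.193722    6.997605
      Female |   747   20.50602   .6054935    16.54893    18.94238    22.06967
    ---------+----------------------------------------------------------------
    combined |  1531   13.12671   .3912873    15.31029    12.11757    14.13586
    ---------+----------------------------------------------------------------
        diff |        -14.41036   .6909184               -16.19227   -12.62845

    diff = mean(Male) - mean(Female)                 t = -20.8568
    Ho: diff = 0                    degrees of freedom =     1529
    Ha: diff != 0: Pr(|T| > |t|) = 0.0000
    Ha: diff > 0:  Pr(T > t) = 1.0000

Unequal variances

    . by a5, sort: ttest j2 if a5!=3, by(a1) unequal welch level(99)

    -> a5 = Unmarried
    Two-sample t test with unequal variances
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   177   5.316384   .7992975    10.63396    3.234972    7.397796
      Female |   129   5.984496   .5435252    6.173259    4.563295    7.405698
    ---------+----------------------------------------------------------------
    combined |   306   5.598039   .5156249    9.019752    4.261516    6.934562
    ---------+----------------------------------------------------------------
        diff |        -.6681119     .96659               -3.174232    1.838008

    diff = mean(Male) - mean(Female)                 t =  -0.6912
    Ho: diff = 0            Welch's degrees of freedom =  292.466
    Ha: diff != 0: Pr(|T| > |t|) = 0.4900
    Ha: diff > 0:  Pr(T > t) = 0.7550

    -> a5 = Married
    Two-sample t test with unequal variances
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   784   6.095663   .3493023    9.780465    5.193722    6.997605
      Female |   747   20.50602   .6054935    16.54893    18.94238    22.06967
    ---------+----------------------------------------------------------------
    combined |  1531   13.12671   .3912873    15.31029    12.11757    14.13586
    ---------+----------------------------------------------------------------
        diff |        -14.41036    .699024                -16.2138   -12.60693

    diff = mean(Male) - mean(Female)                 t = -20.6150
    Ho: diff = 0            Welch's degrees of freedom =  1199.87
    Ha: diff != 0: Pr(|T| > |t|) = 0.0000
    Ha: diff > 0:  Pr(T > t) = 1.0000

Variance comparison within each marital status

    . by a5, sort: sdtest j2 if a5!=3, by(a1) level(99)

    -> a5 = Unmarried
    Variance ratio test
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   177   5.316384   .7992975    10.63396    3.234972    7.397796
      Female |   129   5.984496   .5435252    6.173259    4.563295    7.405698
    ---------+----------------------------------------------------------------
    combined |   306   5.598039   .5156249    9.019752    4.261516    6.934562

    ratio = sd(Male) / sd(Female)                    f =   2.9673
    Ho: ratio = 1                   degrees of freedom = 176, 128
    Ha: ratio != 1: 2*Pr(F > f) = 0.0000
    Ha: ratio > 1:  Pr(F > f) = 0.0000

    -> a5 = Married
    Variance ratio test
       Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
        Male |   784   6.095663   .3493023    9.780465    5.193722    6.997605
      Female |   747   20.50602   .6054935    16.54893    18.94238    22.06967
    ---------+----------------------------------------------------------------
    combined |  1531   13.12671   .3912873    15.31029    12.11757    14.13586

    ratio = sd(Male) / sd(Female)                    f =   0.3493
    Ho: ratio = 1                   degrees of freedom = 783, 746
    Ha: ratio > 1: Pr(F > f) = 1.0000

Box plot comparison

[Figure: box plots comparing the groups]

Do single men and married men differ? Do single women and married women differ?

Paired samples

Is marriage disadvantageous for women? In the analysis above we compared the housework time of married and unmarried respondents and used the difference to infer what might change between before and after marriage. But the unmarried group and the married group are independent samples made up of different respondents. If getting married is related to certain personality traits, we cannot tell whether marriage itself changes behavior or whether people with a certain behavioral tendency are more likely to choose marriage. In other words, the analysis may conceal a self-selection problem.

To show that the effect of marriage on housework time does not come from self-selection, a better sample would be longitudinal data, which follow the same respondent and record how behavior changes from before to after marriage. Collecting such data, however, takes a great deal of time and effort.

Is there a significant difference in housework time between husbands and wives? There are two ways to analyze this. The first treats married men and married women as two independent samples and asks whether the mean for all husbands differs from the mean for all wives.

But the housework time of a husband and that of his wife are not independent: if the husband takes on more, the wife can naturally do less. The second and better approach therefore compares husband and wife within the same household rather than the mean of all husbands with the mean of all wives.

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Mean comparison test, paired data
In the dialog, supply the first and second variables (j2 and j3).

Division of housework between spouses

    Paired t test
    Variable |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
          j2 |  1380   12.80652   .3971524    14.75356    11.78211    13.83094
          j3 |  1380   14.32391   .4762999    17.69376    13.09535    15.55248
    ---------+----------------------------------------------------------------
        diff |  1380  -1.517391   .6578304    24.43732   -3.214199    .1794161

    mean(diff) = mean(j2 - j3)                       t =  -2.3067
    Ho: mean(diff) = 0              degrees of freedom =     1379
    Ha: mean(diff) != 0: Pr(|T| > |t|) = 0.0212
    Ha: mean(diff) > 0:  Pr(T > t) = 0.9894

The test subtracts one spouse from the other, but is that wife minus husband or husband minus wife? Because j2 is the respondent and j3 the spouse, and respondents include both men and women, this output only tells us that respondents report fewer hours than they attribute to their spouses. The difference is significant at the 5% level (p = 0.0212) but not at the 1% level, which is why the 99% confidence interval includes 0.

If we want to compare how much time husbands and wives spend on housework, how should we proceed? Men and women must be handled separately.

Generating a new variable and defining its formula

    . generate h_work = (j3 - j2)
    . replace h_work = (j2 - j3) if a1==2

For male respondents j3 is the wife's time, and for female respondents j2 is, so h_work is always the wife's hours minus the husband's.

One-sample mean comparison test

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > One-sample mean comparison test

    . ttest h_work == 0, level(99)

    One-sample t test
    Variable |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    ---------+----------------------------------------------------------------
      h_work |  1380   15.90435   .5009808    18.61061    14.61212    17.19658

    mean = mean(h_work)                              t =  31.7464
    Ho: mean = 0                    degrees of freedom =     1379
    Ha: mean != 0: Pr(|T| > |t|) = 0.0000
    Ha: mean > 0:  Pr(T > t) = 0.0000

This is the extra burden carried by married women: on average, wives do about 15.9 more hours of housework per week than their husbands.

Comparing qualitative variables (proportions)

k1 asks whether the respondent agrees that "if the mother goes out to work, it is worse for a child who has not yet started school."
1 = strongly agree, 2 = agree, 3 = disagree, 4 = strongly disagree, 5 = no opinion, 6 = don't know, 7 = did not understand the question, 9 = refused, 0 = no answer

    . recode k1 (1 2=1) (3 4=0) (else=.)

This turns the dependent variable into one that takes only the values 1 and 0.

Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group proportion test

    . prtest k1, by(a1) level(99)

    Two-sample test of proportion              Male: Number of obs = 935
                                             Female: Number of obs = 861
    Variable |       Mean   Std. Err.      z    P>|z|   [99% Conf. Interval]
    ---------+--------------------------------------------------------------
        Male |   .7754011   .0136478                    .7402468    .8105554
      Female |   .7584204   .0145876                    .7208453    .7959956
    ---------+--------------------------------------------------------------
        diff |   .0169806   .0199765                   -.0344753    .0684366
             |  under Ho:   .0199596    0.85   0.395

    diff = prop(Male) - prop(Female)                 z =   0.8507
    Ho: diff = 0
    Ha: diff > 0: Pr(Z > z) = 0.1975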
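The slides say only that the unequal-variance test needs Welch's more complicated degrees of freedom. As a worked check, the calculation below plugs the standard errors from the men/women table into the formula that Stata documents for the welch option. The notation (s for a group's standard deviation, n for its size, group 1 = men, group 2 = women) is introduced here for illustration and does not appear in the slides; intermediate values are rounded.

```latex
% s_i^2 / n_i is the squared standard error of each group mean:
% 0.3290245^2 \approx 0.108257 (men), 0.5487235^2 \approx 0.301098 (women)
\begin{align*}
\mathrm{SE}(\bar{x}_1-\bar{x}_2)
  &= \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}
   = \sqrt{0.108257 + 0.301098} \approx 0.639808 \\[4pt]
t &= \frac{6.048554 - 18.33031}{0.639808} \approx -19.196 \\[4pt]
\nu_{\text{Welch}}
  &= -2 + \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^{2}}
               {\dfrac{(s_1^2/n_1)^2}{n_1+1}+\dfrac{(s_2^2/n_2)^2}{n_2+1}}
   = -2 + \frac{(0.409355)^2}{\dfrac{0.108257^2}{969}+\dfrac{0.301098^2}{882}}
   \approx 1456.6
\end{align*}
```

These agree with the reported t = -19.1960 and Welch's degrees of freedom = 1456.62. The same steps applied to the unmarried/married table give roughly 712.9, matching the 712.885 in that output.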
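Readers without access to 85q1family.dta can still re-run the two-sample tests from the summary statistics printed in the output. The sketch below uses Stata's immediate commands (ttesti, sdtesti, prtesti), which take group sizes, means, and standard deviations as arguments instead of a data set. The numbers are copied from the men/women tables; because they are rounded, the results should agree with the tutorial's output only up to rounding. This is an illustrative check, not part of the original lecture.

```stata
* Sketch: reproduce the men/women tests from the reported summary statistics.
* Immediate commands need no data in memory.

* Equal-variance t test: obs, mean, sd for men, then for women
ttesti 968 6.048554 10.23684 881 18.33031 16.28702, level(99)

* Same comparison with unequal variances and Welch's degrees of freedom
ttesti 968 6.048554 10.23684 881 18.33031 16.28702, unequal welch level(99)

* Variance-ratio F test; the means are not needed, so "." is passed instead
sdtesti 968 . 10.23684 881 . 16.28702, level(99)

* Two-sample test of proportions for k1 (share agreeing), men vs. women
prtesti 935 .7754011 861 .7584204, level(99)
```

Swapping in the unmarried/married figures (306, 5.598039, 9.019752 and 1531, 13.12671, 15.31029) repeats the check for the marital-status comparison.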