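Why do statistical analysis?

The goal of statistical analysis is to use the information in sample data to make valid inferences about the study population. This is done by describing the sample data with summary measures. These summary measures retain enough information to estimate the characteristics of the study population.

Clinical research questions about populations

- In developing countries, does formula feeding, compared with breastfeeding, improve the survival of infants born to HIV-positive mothers?
- How can we build a model of survival after coronary artery bypass surgery? Can patient characteristics predict postoperative survival? Compared with medical treatment, does bypass surgery improve survival at 1, 3, and 5 years?
- Can local treatment of small hepatocellular carcinoma replace surgical resection?
- Does high-dose interferon after radical surgery reduce the recurrence rate of liver cancer?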

Today's topics

- Populations, samples, and individuals
- Types of data: continuous vs. categorical
- How to describe data: statistics and graphs
- Measures of central tendency and dispersion
- Standard error and 95% confidence intervals
- Choosing an appropriate statistical method for the data
- Evaluating diagnostic tests

Populations, samples, and individuals

"Aristotle maintained that women have fewer teeth than men; although he was twice married, it never occurred to him to verify this statement by examining his wives' mouths." - Sir Bertrand Russell, The Impact of Science on Society, 1952.

"It is a capital mistake to theorize before you have data." - Sir Arthur Conan Doyle, A Scandal in Bohemia.

And, for another viewpoint:

"If your experiment needs statistics, you ought to have done a better experiment." - Ernest Rutherford.

The bench science perspective: you can control all the variables! Clinicians, however, know better: human variation is large, and often inexplicable. Statistics help us describe it and generalize at least enough to improve our ability to practice medicine.

Aristotle made an inference about the population of women (compared with the population of men). He actually had a sample of two women at hand, and he could have counted the teeth of the two individuals in that sample.

The population is the collection of all people about whom you would like to ask a research question. This might be a fairly clear-cut, easily defined set of people: "What proportion of people 65 or older in the US today have Alzheimer's disease?" Or it might be a more hypothetical group: "How much of a reduction in symptomatic days could a person expect if treated with a new antiviral for flu?"

In practice, we cannot study every member of the population. So we study a sample and generalize from it to the whole population.

- The sample size is the number of individuals in the sample (not the number of measurements taken on each subject!).
- A good study design helps us obtain a representative sample.
- A good statistical analysis helps us answer questions about the population.

Example: HCC metastasis model in nude mice

        Immune reconstitution   Control
CD3     31.5%                   14.2%
CD4     XX                      XX
CD8     XX                      XX

* Two levels: nude mice, cells

Types of data

Quantitative data: "how much?"
- Continuous variables: age, weight, height, blood pressure
- Discrete values (counts): number of children in a family, days in hospital

Categorical data: "what type?"
- Ordinal variables: tumor stage (I, II, III); good, fair, poor
- Nominal variables: male/female; healthy/sick; ABO blood group

Converting between data types

Quantitative data can be converted into categorical data: normal (value) vs. abnormal; "young, middle-aged, old".

Converting a continuous variable into an ordinal variable reduces the information in the data, and therefore lowers the sensitivity, or power, of statistical tests.
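
The loss of information from categorizing a continuous variable can be seen in a small Python sketch. The cutoffs (40 and 60 years), the function name `age_group`, and the list of ages are illustrative assumptions, not values prescribed by the slides.

```python
from collections import Counter

def age_group(age, cutoffs=(40, 60)):
    """Map an age in years to an ordinal category (cutoffs are hypothetical)."""
    young_max, middle_max = cutoffs
    if age < young_max:
        return "young"
    if age < middle_max:
        return "middle-aged"
    return "old"

# Illustrative ages in years.
ages = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]

groups = [age_group(a) for a in ages]
counts = Counter(groups)

print(counts)
# Many distinct ages collapse into only a few categories.
print(len(set(ages)), "distinct ages ->", len(set(groups)), "categories")
```

After grouping, a 42-year-old and a 56-year-old are indistinguishable, which is the sense in which the ordinal version carries less information than the original measurements.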

Describing categorical data: counts and percentages

[Figure: counts by risk factor, N = 74]

Notes:
- The vertical axis can be count or percent.
- In the above example, counts do not add to 74: individuals can have multiple risk factors.
- Tabular presentation may be more parsimonious for such data.

Describing categorical data

- Proportion (constituent ratio)
- Rate
- Percentage vs. rate
- Standardization

Describing quantitative data

Here is a set of age data (11 cases):

21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64

Age is a quantitative variable, so a bar chart is not appropriate. We are more interested in some features of the age distribution:
- Where is the center of the age distribution, for example the mean?
- How much do the ages vary?
- Are some values far from the bulk of the data (outliers)?

Visual tools help us answer these questions.

Graphs
1. Stem and leaf plot
2. Histogram
3. Boxplot

Numbers
1. Location: mean, median, mode
2. Spread: range, variance, standard deviation, percentile
3. Shape: skewness

* Exception: description of survival data

Stem and leaf diagram (for small datasets)

We could group the data and tally the frequencies:

20: X
30: XXX
40: XXXX
50: XX
60: X

But why "hide" the details? Instead, we'll use the 10s place as stems and the units as leaves:

2* | 1
3* | 244
4* | 2468
5* | 26
6* | 4
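
The stem and leaf diagram above can be reproduced with a few lines of Python. This is a minimal sketch for two-digit values only; the function name `stem_and_leaf` is an assumption made for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: leaves}, using the tens digit as stem and the units digit as leaf."""
    plot = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem] += str(leaf)
    return dict(plot)

# The 11 ages from the slide.
ages = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem}* | {leaves}")
```

Note that this sketch only prints stems that contain at least one value; a fuller version would also print empty stems so that gaps in the distribution remain visible.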

Examples: mean, variance, median, percentiles, outliers

Central tendency

- Arithmetic mean
- Geometric mean
- Median

Comparing the mean and the median

- The mean is sensitive to a few very large (or small) values, or "outliers".
- The median is "resistant" to outliers.
- The mean is attractive mathematically.
- 50% of the sample is above the median and 50% of the sample is below the median.

Dispersion

Variation is important!

- Variance
- Standard deviation
- Percentiles: IQR = Q.75 - Q.25
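
The measures of central tendency and dispersion listed above can be computed with Python's standard `statistics` module. This is a sketch using the 11 ages from the earlier slide. The outlier value of 164 is a hypothetical typo-style error introduced only to show how the mean and median react, and the quartiles depend on the interpolation method (here the module's default "exclusive" method).

```python
import statistics

# The 11 ages from the earlier slide.
ages = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]

mean = statistics.mean(ages)
geo_mean = statistics.geometric_mean(ages)
median = statistics.median(ages)
variance = statistics.variance(ages)   # sample variance (n - 1 denominator)
sd = statistics.stdev(ages)            # sample standard deviation

# Quartiles: quantiles(n=4) returns the three cut points Q.25, Q.50, Q.75.
q25, q50, q75 = statistics.quantiles(ages, n=4)
iqr = q75 - q25

print(f"mean={mean} geometric mean={geo_mean:.2f} median={median}")
print(f"variance={variance:.1f} sd={sd:.2f} IQR={iqr}")

# Hypothetical outlier: replace the largest age (64) with 164.
with_outlier = ages[:-1] + [164]
print("mean with outlier:", round(statistics.mean(with_outlier), 2))
print("median with outlier:", statistics.median(with_outlier))
```

With the outlier in place the mean moves by several years while the median does not change, which is what "resistant" means in the comparison above.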
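
Standard error and 95% confidence intervals

- Describing the sample: mean, standard deviation.
- What about the population? To estimate the population mean, we need to calculate the standard error.
- Standard error = standard deviation / sqrt(sample size)
- 95% CI for the population mean: sample mean ± 1.96 × standard error
- Commonly used in papers.

Standard deviation vs. standard error of the mean (when do you use one, but not the other?)

Standard deviation
- Used for description: it quantifies the variation around the sample mean.
- It is an important statistic when deciding whether two samples come from the same population.

Central limit theorem: "sample means drawn from the same population are (approximately) normally distributed."

Standard error of the mean
- Used when estimating the population mean from the sample mean.
- It is an important statistic for calculating how reliable the sample mean is, and it depends on the standard deviation and the sample size.
- As the sample size increases, the standard error shrinks, while the standard deviation tends to settle near the population value rather than shrink systematically.

Normal distribution (basis of statistical inference for many populations)

Mean = median = mode: all are the same value in the distribution.

Remember:
- 68.3% of data is between -1.00 s.d. and +1.00 s.d.
- 95.0% of data is between -1.96 s.d. and +1.96 s.d.
- 95.5% of data is between -2.00 s.d. and +2.00 s.d.
- 99.7% of data is between -3.00 s.d. and +3.00 s.d.

Inferential statistics

- Generalizing conclusions: from sample to population
- Evaluating the strength of evidence
- Comparison
- Prediction

Statistical methods for quantitative data

Paired data (2 groups)
- Normal distribution: paired t test
- Non-normal distribution: sign test, signed-rank test

Two independent groups
- Normal distribution: two-sample t test
- Non-normal distribution: Wilcoxon, Mann & Whitney, median test

Randomized block comparison
- Normal distribution: randomized block analysis of variance
- Non-normal distribution: nonparametric randomized block comparison, M test

Comparison of several groups
- Normal distribution: completely randomized design analysis of variance
- Non-normal distribution: nonparametric comparison of several groups, H test

Contingency table analysis

- Nominal by nominal: general association, Pearson's chi-square
- Nominal by ordinal: row mean scores chi-square (trend analysis)
- Ordinal by nominal: row mean scores chi-square (trend analysis)
- Ordinal by ordinal: correlation, CMH chi-square

* For a fourfold (2 × 2) table, these all agree.

Make predictions: regression analysis

Dependent variable:
- Ordinary quantitative variable: linear regression
- Ordinal or nominal variable: logistic regression
- Time variable: Cox regression

Descriptive epidemiology: pattern of occurrence

Prevalence of HIV+ and community mosquito index

[Figure: scatter plot of HIV+ (vertical axis, 0 to 20) against index of community mosquito infestation (horizontal axis, 0 to 22); annotations: r = .83, r-squared = .92, p .001]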
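
Returning to the slide on standard error and 95% confidence intervals, the two formulas there (standard error = standard deviation / sqrt(sample size), and sample mean ± 1.96 × standard error) can be sketched in Python as follows. The data are the 11 ages from the earlier example, and the variable names are assumptions made for illustration.

```python
import math
import statistics

# The 11 ages from the earlier slide.
ages = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]

n = len(ages)
mean = statistics.mean(ages)
sd = statistics.stdev(ages)      # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)           # standard error of the mean

# 1.96 is the 97.5th percentile of the standard normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)

ci_low = mean - z * se
ci_high = mean + z * se

print(f"n={n} mean={mean:.1f} sd={sd:.2f} se={se:.2f}")
print(f"95% CI for the population mean: {ci_low:.1f} to {ci_high:.1f}")
```

The multiplier 1.96 comes from the normal distribution, as on the slide. For a sample as small as 11, a multiplier from the t distribution (about 2.23 with 10 degrees of freedom) would usually be preferred and gives a somewhat wider interval.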
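
Evaluating diagnostic tests

Study design

Diagnostic test design

                  Gold standard: diseased   Gold standard: not diseased
Test positive     a                         b
Test negative     c                         d

- Sensitivity = a / (a + c)
- Specificity = d / (b + d)
- Positive predictive value = a / (a + b)
- Negative predictive value = d / (c + d)
- Positive likelihood ratio = sensitivity / (1 - specificity)
- Negative likelihood ratio = (1 - sensitivity) / specificity

What do medical papers usually report?

- Most studies report the mean (normal data) or the median (non-normal data).
- Some studies report the standard deviation and/or the standard error. Be careful! Sometimes a figure shows an error bar that could be either.
- If the data are non-normal (skewed, multimodal, very long or very short tails, etc.), papers usually report the median and percentiles rather than the mean and standard deviation.
- When writing a paper, always start from the question the study is meant to answer: Do you want to ask about the average or typical person? Or do you want to figure out how unusual your patient might be?

The general epidemiologic (scientific) approach

1. Identify a problem: clinical suspicion; case series; review of medical literature.
2. Formulate a hypothesis (asking the right question); good hypotheses are Specific, Measurable, and Plausible.
3. Test the hypothesis (assumptions vs. type of data).
4. Verify again: always question the VALIDITY of the result(s): Chance; Bias; and Causality.

Validity of conclusions

- Chance: role of random error in outcome measure(s) (p-value; power of the study and the confidence interval); largely determined by sample size.
- Bias: role of systematic error in outcome measure(s).
  - Selection bias: subjects not representative.
  - Information bias: error(s) in subject data/classification.
- Confounding: a 3rd variable (causal) associated with both X and Y.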
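
As a worked illustration of the 2 × 2 diagnostic test table from the slide on evaluating diagnostic tests, the six measures can be computed as below. This is a sketch only: the function name `diagnostic_metrics` and the cell counts are hypothetical, chosen for illustration and not taken from any study.

```python
def diagnostic_metrics(a, b, c, d):
    """Compute test accuracy measures from a 2 x 2 table.

    a = test positive, diseased       b = test positive, not diseased
    c = test negative, diseased       d = test negative, not diseased
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Hypothetical counts, for illustration only.
m = diagnostic_metrics(a=90, b=20, c=10, d=180)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty cells: a specificity of exactly 1 makes the positive likelihood ratio undefined (division by zero), so real data may need explicit handling of that case.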