




















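Disease, disability, health: three levels

Social, psychological, physiological: three dimensions

Design, implementation, evaluation: three processes

Distribution, determinants, intervention: three aspects

Observation, experiment, theory: three methods


Traditional evaluation indicators

Burden-of-disease indicators

Quality-of-life indicators

Health system performance indicators

Ecological and environmental indicators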












Chaos epidemiology (fuzzy mathematics, grey system theory)

Systems epidemiology (exposomics, genomics, proteomics, metabolomics, etc.)









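Epigenetics: changes in the level of gene expression caused by modifications of the genetic material that do not involve changes to the DNA sequence. It mainly includes: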





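1992: the birth of evidence-based medicine

Evidence-Based Medicine Working Group. Evidence-Based Medicine: A New Approach to Teaching the Practice of Medicine. JAMA 1992; 268: 2420-2425.

"A new paradigm for medical practice is emerging ..."    The most popular question in 21st-century medicine: where is the evidence? Evidence-based medicine


Pose the question > search for research evidence > assess the quality of the evidence > assess the effect (size, credibility) > assess whether the result can be generalized > decide accordingly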



"Introduction to Epidemiology", by Professor Li Liming, School of Public Health, Peking University




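Chapter 1 Statistical description of quantitative data
Section 1 Basic concepts
Section 2 Statistical description of quantitative data
Section 3 Common statistical charts and tables
Section 4 Introduction to the DPS software
Chapter 2 Statistical description of categorical data
Section 1 Frequency distribution of categorical data
Section 2 Common relative numbers for categorical data
Section 3 Common relative numbers in health statistics
Section 4 Standardization of rates
Section 5 Dynamic series and their analytical indicators
Chapter 3 Probability distributions
Section 1 Binomial distribution
Section 2 Poisson distribution
Section 3 Normal and standard normal distributions
Section 4 Applications of the normal distribution
Chapter 4 Parameter estimation
Section 1 Sampling distributions and sampling error
Section 2 The t distribution
Section 3 Estimation of the population mean and population probability
Chapter 5 Hypothesis testing
Section 1 Principles of hypothesis testing
Section 2 Hypothesis tests for one sample of normal data
Section 3 Hypothesis tests for two samples of normal data
Chapter 6 Analysis of variance
Chapter 7 Analysis of contingency tables
Chapter 8 Nonparametric tests
Chapter 9 Correlation and regression
Chapter 10 Logistic regression
Chapter 11 Time series analysis
Chapter 12 Comprehensive evaluation
Chapter 13 Survival analysis
Chapter 14 Survey design
Chapter 15 Experimental design
Appendix 1 Statistical tables
Appendix 2 Exercises
Appendix 3 Mock examination papers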







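Range; quartiles; variance; standard deviation

Coefficient of variation (CV): the ratio of the standard deviation to the mean (S / X-mean). It is a relative measure, which makes comparisons between data sets easier.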



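『Measurement (quantitative) data』 Statistical inference

【Normal distribution

Normal (Gaussian) distribution; standard normal distribution (the u distribution, obtained by moving the peak of the curve to the origin 0 and scaling to unit standard deviation); log-normal distribution (taking logarithms of some skewed data yields a normal distribution that can then be analyzed as usual)

【Estimation of the population mean and hypothesis testing (significance test)

Statistical inference has two parts: parameter estimation and hypothesis testing.


Taking sampling error into account, interval estimation estimates the population mean with a stated probability.

Sampling error: the difference between the sample mean and the population mean;

Standard error: the standard deviation of the sample mean;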




Cases 1) and 2) use a single set of values (or of differences d) to compute the sum of squares and the standard error: paired data, or a sample against a population (the population value may be theoretical, a norm, or taken from a large group and treated as the standard; Ho states that the two means do not differ). In the paired case the two groups always have the same number of observations (n1 = n0).

1) Sample mean vs. population mean:

t test comparing a sample mean with a population mean: is the pulse rate of healthy men in a mountain area higher than in the general population?

t = (X1m - X0m) / (s / n^1/2)

e.g. t = (74.2 - 72) / (6.5 / 25^1/2) = 1.692;  v = n - 1 = 25 - 1 = 24; compare with the t table

2) Means of two paired samples:


t = (dm - 0) / (sd / n^1/2);  d = x1 - x2 for each pair;  dm = sum(d)/n;  sd = {[sum(d^2) - (sum d)^2/n] / (n - 1)}^1/2

This has the same form as t = (X1m - X0m) / (s / n^1/2), e.g. before vs. after treatment

Other cases (or n1 =/= n0):

3) Means of two samples (two groups):


t = (X1m - X0m) / S(x1-x0);  S(x1-x0) = [Sc^2 (1/n1 + 1/n0)]^1/2;  v = n1 + n0 - 2

Sc^2 (pooled variance) = {[sum(x1^2) - (sum x1)^2/n1] + [sum(x0^2) - (sum x0)^2/n0]} / (n1 + n0 - 2)

e.g. healthy vs. sick; each sample mean is taken to represent its group in the population.

4) Geometric means of two samples:

t test comparing two geometric means: antibody titers

t = (lgX1m - lgX0m) / S(lgx1-lgx0), i.e. the two-sample t test applied to the log-transformed values; suited to data that change by multiples or doublings.

5) u test comparing the means of two large samples: when n is large (e.g. >= 50), u = (X1m - X0m) / (s1^2/n1 + s0^2/n0)^1/2

Paired t test (worked example)
No.     A         B         d = A - B   d^2
1       3550      2450      1100        1210000
2       2000      2400      -400        160000
3       3000      1800      1200        1440000
4       3950      3200      750         562500
5       3800      3250      550         302500
6       3750      2700      1050        1102500
7       3450      2500      950         902500
8       3050      1750      1300        1690000
Sum     26550     20050     6500        7370000
Mean    3318.75   2506.25   812.5

(sum d)^2 = 42250000
sd^2 = [7370000 - 42250000/8] / (8 - 1) = 298392.9;  sd = 546.2535
sd / n^1/2 = 546.25 / 8^1/2 = 193.1298
t = (dm - 0) / (sd / n^1/2) = 812.5 / 193.1298 = 4.207016;  v = 8 - 1 = 7
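The paired calculation above can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `paired_t` and the variable names are illustrative and not part of the original notes.

```python
import math

# Paired measurements from the worked table (columns A and B, 8 subjects).
a = [3550, 2000, 3000, 3950, 3800, 3750, 3450, 3050]
b = [2450, 2400, 1800, 3200, 3250, 2700, 2500, 1750]

def paired_t(x, y):
    """Return (t, degrees of freedom) for a paired t test of mean(x - y) = 0."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    # Sample variance of the differences: [sum(d^2) - (sum d)^2 / n] / (n - 1)
    var_d = (sum(di * di for di in d) - sum(d) ** 2 / n) / (n - 1)
    std_err = math.sqrt(var_d / n)
    return mean_d / std_err, n - 1

t_paired, df_paired = paired_t(a, b)
print(f"t = {t_paired:.4f}, v = {df_paired}")
```

The resulting t is then compared with the tabulated critical value for v = 7, exactly as in the hand calculation.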



1) When comparing two small samples, first run an F test for equal variances: F = (larger s^2) / (smaller s^2). If Ho (equal variances) is rejected, use the t' test.

F test for two small-sample means (two groups of cases) to judge whether the variances are equal; if the variances are unequal, follow with the t' test.

t' = (x1m - x0m) / (s1^2/n1 + s0^2/n0)^1/2

e.g. t' = (6.21 - 4.34) / [(1.79)^2/10 + (0.56)^2/50]^1/2 = 3.272  (s1^2/n1 = 0.3204; s0^2/n0 = 0.006272)

ta' = [Sx1^2 t(a,v1) + Sx0^2 t(a,v0)] / (Sx1^2 + Sx0^2), where Sx^2 = s^2/n; e.g. v1 = 10 - 1 = 9; v0 = 50 - 1 = 49, giving t0.05(9) = 2.262 and t0.05(49) = 2.009 (two-sided); t'0.05 = [(0.3204)(2.262) + (0.006272)(2.009)] / (0.3204 + 0.006272) = 2.257

t' vs. ta':  t' = 3.272 > t'0.05 = 2.257; p < 0.05, so at a = 0.05 reject Ho and accept H1
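The t' statistic and its weighted critical value can be reproduced as follows. This is a sketch under the assumption that the two tabulated critical values are looked up by hand and passed in; the function name `t_prime_test` is hypothetical.

```python
import math

def t_prime_test(m1, s1, n1, t1_crit, m0, s0, n0, t0_crit):
    """Return (t', critical t') for two means with unequal variances.

    t1_crit and t0_crit are table values t(a, n1 - 1) and t(a, n0 - 1),
    supplied by the caller.
    """
    se1_sq = s1 ** 2 / n1  # squared standard error of mean 1
    se0_sq = s0 ** 2 / n0  # squared standard error of mean 0
    t_prime = (m1 - m0) / math.sqrt(se1_sq + se0_sq)
    # Critical value: the two table values weighted by the squared standard errors.
    t_crit = (se1_sq * t1_crit + se0_sq * t0_crit) / (se1_sq + se0_sq)
    return t_prime, t_crit

t_prime, t_crit = t_prime_test(6.21, 1.79, 10, 2.262, 4.34, 0.56, 50, 2.009)
print(f"t' = {t_prime:.3f}, critical t' = {t_crit:.3f}")
```

Ho is rejected when t' exceeds the weighted critical value.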


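Hypothesis testing: the hypothesis under test (null hypothesis)

Ho (e.g. the two samples come from populations with equal means) and the alternative hypothesis H1 (e.g. the two means differ, and the difference is not caused by sampling error); a is the test level


P <= a: at the chosen test level, reject Ho and accept H1; this means that, if Ho were true, a difference at least this large would be a small-probability event.




The t test or u test for comparing two sample means assumes homoscedasticity (the two population variances are equal).



Type I error: rejecting an Ho that is actually true; Type II error: failing to reject an Ho that is actually false.


【Analysis of variance (ANOVA)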




One-way ANOVA worksheet (k = 3 groups, n = 6 per group, N = 18)
No.         A       A^2      B       B^2      C       C^2      Row sum   (Row sum)^2
1           3.3     10.89    4.4     19.36    3.6     12.96    11.3      127.69
2           3.6     12.96    4.4     19.36    4.4     19.36    12.4      153.76
3           4.3     18.49    3.4     11.56    5.1     26.01    12.8      163.84
4           4.1     16.81    4.2     17.64    5       25       13.3      176.89
5           4.2     17.64    4.7     22.09    5.5     30.25    14.4      207.36
6           3.3     10.89    4.2     17.64    4.7     22.09    12.2      148.84
Sum         22.8    87.68    25.3    107.65   28.3    135.67   76.4      (sum of all X^2 = 331)
(Sum)^2     519.84           640.09           800.89           5836.96
(Sum)^2/n   86.64            106.6817         133.4817         324.2756 (= C); sum over groups = 326.8033
Mean        3.8              4.216667         4.716667         4.244444

C = (sum X)^2 / N = 5836.96 / 18 = 324.2756
SS-total = 331 - 324.2756 = 6.724444;  v-total = 18 - 1 = 17
SS-between = 326.8033 - 324.2756 = 2.527778;  v-between = k - 1 = 3 - 1 = 2
SS-within = SS-total - SS-between = 4.196667;  v-within = N - k = 18 - 3 = 15
MS-between = SS-between / v-between = 2.527778 / 2 = 1.263889
MS-within = SS-within / v-within = 4.196667 / 15 = 0.279778
F = MS-between / MS-within = 4.517474
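The worksheet can be reproduced in a few lines. This is a minimal standard-library sketch; `one_way_anova` and the dictionary keys are illustrative names, not from the notes.

```python
# Three treatment groups of six observations each, from the worksheet.
groups = [
    [3.3, 3.6, 4.3, 4.1, 4.2, 3.3],
    [4.4, 4.4, 3.4, 4.2, 4.7, 4.2],
    [3.6, 4.4, 5.1, 5.0, 5.5, 4.7],
]

def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a one-way layout."""
    all_x = [x for g in groups for x in g]
    big_n, k = len(all_x), len(groups)
    c = sum(all_x) ** 2 / big_n                      # correction term C
    ss_total = sum(x * x for x in all_x) - c
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - c
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (big_n - k)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (k - 1, big_n - k),
        "f": ms_between / ms_within,
    }

anova = one_way_anova(groups)
print(anova)
```

The F value is then looked up with (2, 15) degrees of freedom.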


1) Comparing the means of several samples in a completely randomized design: typically comparing the effect of several treatments on a disease, or the effects of different factors or substances;

SS-within = SS-total - SS-between;  each sum of squares has the form sum(X^2) - (sum X)^2/n, taken over all observations or over the groups as appropriate;

C = (sum of all X)^2 / N  (correction term)

SS-total = sum(X^2) over all observations - C

SS-between = sum over groups [(group sum)^2 / ni] - C  (e.g. with 3 groups of 6 individuals each, divide each squared group sum by 6, not by 3)

MS-between = SS-between / v-between;  MS-within = SS-within / v-within;  v-between = k - 1 (= 2 when there are 3 groups);  v-within = N - k

F = MS-between / MS-within;  (look up F with v-between and v-within)  (Note: in this example the squared group sums are divided by 6, but SS-between is divided by 2 to get MS-between, since there are only 3 groups and one degree of freedom is lost)




For matched (randomized block) data: SS-total = SS-treat + SS-block + SS-error; there is one more component than in 1). Every term except SS-error is obtained by subtracting C; SS-error is found by difference.

First compute the F for blocks; if the blocks show no difference, pool SS-block into the error term to get MS-error, and finally F = MS-treat / MS-error;


Randomized block worksheet (b = 6 matched blocks, k = 5 treatments, N = 30)
Block       Control  x^2     A       x^2      B       x^2     C       x^2     D       x^2     Sum     (Sum)^2
1           1.4      1.96    4.1     16.81    1.9     3.61    1.8     3.24    2       4       11.2    125.44
2           1.5      2.25    3.6     12.96    1.9     3.61    2.3     5.29    2.3     5.29    11.6    134.56
3           1.5      2.25    4.3     18.49    2.1     4.41    2.3     5.29    2.4     5.76    12.6    158.76
4           1.8      3.24    3.3     10.89    2.4     5.76    2.5     6.25    2.6     6.76    12.6    158.76
5           1.5      2.25    4.2     17.64    1.8     3.24    1.8     3.24    2.6     6.76    11.9    141.61
6           1.5      2.25    3.3     10.89    1.7     2.89    2.4     5.76    2.1     4.41    11      121
Sum         9.2      14.2    22.8    87.68    11.8    23.52   13.1    29.07   14      32.98   70.9    840.13
(Sum)^2     84.64            519.84           139.24          171.61          196             5026.81
(Sum)^2/6   14.10667         86.64            23.20667        28.60167        32.66667        185.2217 (sum over treatments)
n           6                6                6               6               6               30
Mean        1.533333         3.8              1.966667        2.183333        2.333333        2.363333

C = 5026.81 / 30 = 167.5603;  sum of all x^2 = 187.45
SS-total = 187.45 - 167.5603 = 19.88967;  v-total = N - 1 = 29
SS-treat = 185.2217 - 167.5603 = 17.66133;  v-treat = k - 1 = 5 - 1 = 4
SS-block = 840.13/5 - 167.5603 = 168.026 - 167.5603 = 0.465667;  v-block = b - 1 = 6 - 1 = 5  (each block sum covers 5 treatments, so divide by 5)
SS-error = SS-total - SS-treat - SS-block = 1.762667;  v-error = 29 - 4 - 5 = 20
MS-treat = SS-treat / v-treat = 4.415333
MS-block = SS-block / v-block = 0.093133
MS-error = SS-error / v-error = 0.088133
F-block = MS-block / MS-error = 1.056732  (v1 = 5, v2 = 20; P > 0.05)
Ho is not rejected for blocks, so SS-block is pooled into the error term:
SS-error = 0.466 + 1.762 = 2.228;  v-error = 5 + 20 = 25;  MS-error = 2.228/25 = 0.089133
F-treat = MS-treat / MS-error = 4.415333 / 0.089133 = 49.54  (v1 = 4, v2 = 25): p < 0.01, so at a = 0.05 reject Ho and accept H1
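The block-design decomposition can be verified with a short script. This is a sketch only: the function name `block_anova` is hypothetical, and the decision to pool the block term is taken from the notes rather than computed from an F table.

```python
# Rows are matched blocks; columns are Control, A, B, C, D.
data = [
    [1.4, 4.1, 1.9, 1.8, 2.0],
    [1.5, 3.6, 1.9, 2.3, 2.3],
    [1.5, 4.3, 2.1, 2.3, 2.4],
    [1.8, 3.3, 2.4, 2.5, 2.6],
    [1.5, 4.2, 1.8, 1.8, 2.6],
    [1.5, 3.3, 1.7, 2.4, 2.1],
]

def block_anova(rows):
    """Split the total sum of squares into treatment, block and error parts."""
    b, k = len(rows), len(rows[0])
    c = sum(sum(r) for r in rows) ** 2 / (b * k)      # correction term C
    ss_total = sum(x * x for r in rows for x in r) - c
    ss_treat = sum(sum(r[j] for r in rows) ** 2 for j in range(k)) / b - c
    ss_block = sum(sum(r) ** 2 for r in rows) / k - c
    ss_error = ss_total - ss_treat - ss_block
    return ss_total, ss_treat, ss_block, ss_error, b, k

ss_total, ss_treat, ss_block, ss_error, b, k = block_anova(data)
df_treat, df_block = k - 1, b - 1
df_error = df_treat * df_block
ms_treat = ss_treat / df_treat
f_block = (ss_block / df_block) / (ss_error / df_error)
# Blocks are not significant in the notes, so they are pooled into the error term.
ms_pooled = (ss_error + ss_block) / (df_error + df_block)
f_treat_pooled = ms_treat / ms_pooled
print(round(f_block, 4), round(f_treat_pooled, 2))
```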

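Pairwise comparisons among several sample means (wet lung weight of rats after five kinds of dust exposure): the q (Newman-Keuls) test

q = |Xa-mean - Xb-mean| / S(xa-xb), with S(xa-xb) = (MS-error / n)^1/2 for equal group sizes
Rank        #1      #2         #3         #4         #5
Mean        3.8     2.333333   2.183333   1.966667   1.533333
#1 vs. #5:  3.8 - 1.533 = 2.266667  (A)
MS-error / n = 0.088133 / 6 = 0.014689;  square root = 0.121198  (B)
q = (A)/(B) = 18.70222

For comparison, a two-sample t test on the same two groups (#5 and #1):
sum(x^2): 14.2 and 87.68;  (sum x)^2/n: 14.10667 and 86.64
Sc^2 = [(14.2 - 14.10667) + (87.68 - 86.64)] / (6 + 6 - 2) = 0.113333
Mean difference = 3.8 - 1.533333 = 2.266667
t = 2.266667 / [0.113333 (1/6 + 1/6)]^1/2 = 11.66;  v = 10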



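3) Test for homogeneity of several variances (Bartlett): ANOVA assumes that the population variances are equal, so a homogeneity test of the variances is often needed before ANOVA. It uses the samples (in theory all drawn from normal populations) to infer whether the population variances are equal. This deserves particular attention when the sample variances differ greatly. The F test, the t' test, and the Bartlett method are available.

X^2 = 2.3026 {[sum(ni-1)] lg Sc^2 - sum[(ni-1) lg si^2]} / {1 + [1/(3(k-1))] [sum(1/(ni-1)) - 1/sum(ni-1)]},  v = k - 1;  Sc^2 = sum(SSi) / sum(ni-1) is the pooled variance;  si^2 is the variance of group i;  SSi is the sum of squared deviations of group i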


Bartlett method, example with equal group sizes (n = 10 per group; v = 3 - 1 = 2; 2.3026 converts lg to ln)
No.         A       A^2      B       B^2      C       C^2
1           3.8     14.44    5.6     31.36    1.5     2.25
2           9       81       4       16       3.8     14.44
3           2.5     6.25     3       9        5.5     30.25
4           8.2     67.24    8       64       2       4
5           7.1     50.41    3.8     14.44    3       9
6           11      121      4       16       5.1     26.01
7           11.5    132.25   6.4     40.96    3.3     10.89
8           9       81       4.2     17.64    4       16
9           11      121      4       16       2.1     4.41
10          7.9     62.41    7       49       2.7     7.29
Sum         81      737      50      274.4    33      124.54
(Sum)^2     6561             2500             1089
(Sum)^2/n   656.1            250              108.9
si^2        8.988889         2.711111         1.737778          Sum: 13.43778
lg si^2     0.953706         0.433147         0.239994          Sum: 1.626848

(Values should be carried to 4 decimal places; fewer are shown here.)




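First, variable transformation: logarithmic transformation; square-root transformation (for Poisson-distributed data, e.g. radioactivity counts); arcsine square-root transformation of percentages (when the population percentage is small, < 30%, or large, > 70%, the departure from normality is more marked; taking the arcsine of the square root of p brings the data closer to a normal distribution and meets the requirement of equal variances)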


Bartlett method, example with unequal group sizes
No.             A       A^2       B       B^2       C       C^2
1               0.61    0.3721    0.79    0.6241    0.62    0.3844
2               0.72    0.5184    1.01    1.0201    1.01    1.0201
3               0.72    0.5184    1.24    1.5376    1.18    1.3924
4               1.02    1.0404    1.41    1.9881    1.47    2.1609
5               0.88    0.7744    1.54    2.3716    1.35    1.8225
6               1.36    1.8496    1.75    3.0625    1.52    2.3104
7               1.44    2.0736    1.94    3.7636    1.81    3.2761
8               1.9     3.61      2.54    6.4516    1.86    3.4596
9               1.82    3.3124    3.05    9.3025    2.37    5.6169
10              2.16    4.6656                      2.76    7.6176
Sum             12.63   18.7349   15.27   30.1217   15.95   29.0609
(Sum)^2         159.5169          233.1729          254.4025
(Sum)^2/ni      15.95169          25.9081           25.44025
ni              10                9                 10                sum(ni-1) = 26
SSi             2.78321           4.2136            3.62065           Sum: 10.61746
si^2            0.309246          0.5267            0.402294          Sum: 1.23824
(ni-1) lg si^2  -4.58727          -2.22749          -3.5591           Sum: -10.3739
1/(ni-1)        0.111111          0.125             0.111111          Sum: 0.347222

X^2 = 2.3026 [26 lg(10.6064/26) - (-10.3835)] / {1 + [1/(3(3-1))] [0.3472 - 1/26]} = 0.567;  v = 3 - 1 = 2; compare with the X^2 table




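【Test of normality


Method of moments (coefficient of skewness; coefficient of kurtosis): uses the third and fourth moments about the mean to describe skewness and peakedness [mesokurtosis: normal peak; leptokurtosis: sharp peak; platykurtosis: flat peak].

D test (normality test based on the order statistic D, using the table of critical D values): D = sum{[(n+1)/2 - i][X(n+1-i) - X(i)]} / {n^3 [sum(X^2) - (sum X)^2/n]}^1/2, summed over i = 1 ... n/2  (i is the rank of X from smallest to largest)

(Note: these methods only test the shape of the distribution)

【Relative numbers


Commonly used: rate; constituent ratio (proportion); relative ratio

Dynamic series: e.g. rate of development, rate of increase; fixed base (one specified period is chosen as the base); chain base (the preceding period is the base)


Standardized rate (adjusted rate): direct method (computed from the composition of a standard population); indirect method (computed from standard rates, e.g. standard age-specific death rates)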




Binomial distribution: probability function; distribution function; shape of the distribution; mean and standard deviation.


Steps: compute the probabilities; treat the probabilities as frequencies f; obtain sum(f), sum(fX), and sum(fX^2)

Independent, unrelated events; e.g. with a death rate p = 0.8 the survival rate is 0.2 (1 - 0.8). If 3 subjects A, B, C are observed, the probabilities of 0, 1, 2, 3 deaths are the terms of [(1-p)+p]^n = (0.2+0.8)^3 = 0.2^3 + 3(0.2)^2(0.8) + 3(0.2)(0.8)^2 + 0.8^3 = 0.008 + 0.096 + 0.384 + 0.512 = 1.

If n is large, the rate (frequency) can be treated as approximately normal (u distribution) for statistical analysis.

P(X) = [n!/(X!(n-X)!)] p^X (1-p)^(n-X),  X = 0, 1, ..., n;  n = number of observations; X = number of events;


With 10 subjects and a rate of 20%, the probability of exactly 8 events:  P(8) = [10!/(8!(10-8)!)] (0.2)^8 (0.8)^(10-8)

X         f = P(X)       fX         fX^2
0         0.008          0          0
1         0.096          0.096      0.096
2         0.384          0.768      1.536
3         0.512          1.536      4.608
sum       1.000          2.400      6.240

Mean = sum(fX) / sum(f) = 2.400/1 = 2.4;  S = {[sum(fX^2) - (sum fX)^2/sum f] / sum f}^1/2 = [(6.24 - (2.4)^2/1) / 1]^1/2 = 0.69;  for a rate, Sp = [p(1-p)/n]^1/2
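The binomial table and the P(8) example can be generated directly. This is a minimal sketch; `binomial_pmf` is an illustrative name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Probabilities P(X = 0..n) of X events in n independent trials."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Three subjects, death rate 0.8: the four terms of (0.2 + 0.8)^3.
pmf = binomial_pmf(3, 0.8)
mean = sum(x * f for x, f in enumerate(pmf))
sd = sqrt(sum(x * x * f for x, f in enumerate(pmf)) - mean ** 2)

# Ten subjects, rate 20%: probability of exactly 8 events.
p8 = binomial_pmf(10, 0.2)[8]

print([round(f, 3) for f in pmf], round(mean, 2), round(sd, 2), p8)
```

The mean and standard deviation agree with the closed forms np and sqrt(np(1-p)) for counts.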


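1) Interval estimation of a population rate: read the limits from the table, or use (p - Ua Sp,  p + Ua Sp)


n = 10; X = 8; since X > n/2, look up x = 10 - 8 = 2 in the table: 3-56 (%); then 100 - 3 = 97 and 100 - 56 = 44, so the 95% confidence interval is 44-97%

2) Comparing a sample rate with a population rate; (1) direct calculation: if p is far from 0.5 and the positive rate is quite small (i.e. an uncommon disease), the probability can be calculated directly from the formula.


p = 0.01; 1 - p = 0.99; n = 400; question: is the rate below 1%? Ho: p = 0.01; H1: p < 0.01

P = sum[P(X)] = P(0) + P(1) = (0.99)^400 + [400!/(1!(400-1)!)] (0.99)^(400-1) (0.01) = 0.0905; at a = 0.05, Ho is not rejected

(2) Normal approximation (n > 50): the bleeding rate in gastric ulcer is 20%; in one hospital, of 304 patients aged 65 and over, 31.6% had bleeding gastric ulcers. Is this higher? u = (0.316 - 0.20) / [0.2(1-0.2)/304]^1/2 = 5.06; one-sided: higher

3) u test comparing two sample rates;  u = (p1-p2) / [Pc(1-Pc)(1/n1+1/n2)]^1/2,  Pc = (X1+X2)/(n1+n2); look up the u table

80 boys, 23 infected, infection rate 28.75%; 85 girls, 13 infected, infection rate 15.29%; do the infection rates differ?

Pc = (23+13)/(80+85) = 0.2182;  u = (0.2875-0.1529) / [0.2182(1-0.2182)(1/80+1/85)]^1/2 = 2.0921;  0.05 > p > 0.02; at a = 0.05 reject Ho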
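The two-rate comparison can be scripted as below. This is a sketch assuming the pooled-rate form of the u test given above; the function name `two_rate_u` is hypothetical.

```python
import math

def two_rate_u(x1, n1, x2, n2):
    """u statistic for comparing two sample rates using the pooled rate Pc."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err

# Boys: 23 infected of 80; girls: 13 infected of 85.
u_rates = two_rate_u(23, 80, 13, 85)
print(round(u_rates, 3))
```

The statistic is compared with the u table (1.96 for a two-sided test at a = 0.05).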



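Poisson distribution: a special case of the binomial distribution in which the probability is small (< 0.05) and the number of units n observed each time is very large; a discrete distribution (e.g. the distribution of bacteria), of the molecular-diffusion type

Probability function: P(X) = e^-r [r^X / X!];  recursive form (easier to compute): P(0) = e^-r (since 0! = 1);  P(X+1) = P(X) [r/(X+1)], X = 0, 1, 2, ...;  r = population mean.

Shape of the distribution: as r gets larger, e.g. r = 20, it is close to the normal distribution.

Properties and conditions of use: a discrete distribution, suitable for count data: events observed per unit time, unit area, or unit volume. If the events influence one another, e.g. infectious diseases or unevenly mixed bacterial cultures, the distribution does NOT apply. It is similar to the binomial distribution; when p is small and n is large the two are very close, and the Poisson form is easier to calculate.


1) Estimation of the population mean;  X +- Ua (X)^1/2;  X = sample count


X = 8 (sample); from the table, the 95% limits of r are 3.4-15.8;  95% confidence interval: 3.4-15.8 per 100 cmm

2) Comparing a sample mean with a population mean;  by formula:  P(0) = e^-r;  P(X+1) = P(X) [r/(X+1)], X = 0, 1, 2, ...;  r = population mean.

(1) e.g. the rate of severe vaccine reactions is 1/1000; among 150 people, 2 had severe reactions. Is this higher than usual?


Ho: r = r0 = 150 x 0.001 = 0.15 (the same as, or no higher than, the general population);  P(0) = e^-0.15 (0.15^0/0!) = 0.8607;  P(1) = e^-0.15 (0.15^1/1!) = 0.1291;  P(X>=2) = 1 - [P(0)+P(1)] = 1 - (0.8607+0.1291) = 0.0102;  P = 0.0102 < a = 0.05, reject Ho.

(2) Normal approximation: u = (X - r0) / (r0)^1/2; acceptable when X >= 20.

The annual death rate of a disease is 7.58 per 100,000; a 3-year retrospective survey found 29 deaths, in a population whose age structure does not differ from the general population. Does the death rate differ? Ho: r = r0;  u = (29 - 7.58*3) / (7.58*3)^1/2 = 1.3127;  0.20 > P > 0.10; at a = 0.05 do not reject Ho

3) u test comparing two sample counts.  Acceptable when X >= 20.

Two samples, A and B: 1 mL from each is cultured; A yields 890 colonies and B yields 785 colonies. Do the colony counts differ? u = (X1-X2) / (X1+X2)^1/2 = (890-785) / (890+785)^1/2 = 2.566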
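The vaccine-reaction example (upper-tail Poisson probability) can be checked with a few lines. This is a sketch; `poisson_pmf` and `poisson_upper_tail` are illustrative names rather than functions from any library.

```python
from math import exp, factorial

def poisson_pmf(x, r):
    """P(X = x) for a Poisson distribution with mean r."""
    return exp(-r) * r ** x / factorial(x)

def poisson_upper_tail(x_obs, r):
    """P(X >= x_obs), computed as 1 - [P(0) + ... + P(x_obs - 1)]."""
    return 1 - sum(poisson_pmf(x, r) for x in range(x_obs))

# Expected count r0 = 150 people x 1/1000; 2 severe reactions observed.
r0 = 150 * 0.001
p_tail = poisson_upper_tail(2, r0)
print(round(poisson_pmf(0, r0), 4), round(poisson_pmf(1, r0), 4), round(p_tail, 4))
```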


X^2 test (chi-square test)


1) Fourfold (2x2) table; R x C table; paired data; goodness of fit of a frequency distribution; exact probability method for the fourfold table.


X^2 = SUM[(A-T)^2/T];  T = theoretical frequency;  A = actual frequency;

Chi-square test (worked example; theoretical frequencies T in parentheses)

Treat         (+)                  (-)                  Sum     (+) rate
T-1           52 (57.18) (a)       19 (13.82) (b)       71      0.732394     (T: 57.18 = 71*91/113;  13.82 = 71*22/113)
T-0           39 (33.82) (c)       3 (8.18) (d)         42      0.928571     (T: 33.82 = 42*91/113;   8.18 = 42*22/113)
Sum           91                   22                   113
Overall rate  0.805310             0.194690

X^2 = [(52-57.18)^2/57.18 + (19-13.82)^2/13.82 + (39-33.82)^2/33.82 + (3-8.18)^2/8.18] = 6.48;  v = 1;  0.025 > p > 0.01; at a = 0.05 (p < a) reject Ho, accept H1

or  X^2 = [(ad-bc)^2 n] / [(a+b)(c+d)(a+c)(b+d)]

Correction for continuity: (1) when 1 < T < 5 and n > 40, use the corrected formulas X^2 = SUM[(|A-T|-0.5)^2/T], or X^2 = [(|ad-bc| - n/2)^2 n] / [(a+b)(c+d)(a+c)(b+d)];  (2) when T < 1 or n < 40, use the exact probability method instead.

In essence, the test assumes that there is no difference. When the theoretical frequencies equal the actual frequencies, Ho should be accepted. Because the true frequencies under Ho are not known, the theoretical frequencies are derived from the pooled totals: the overall rate is applied to each row total, and the actual counts are compared with these expected counts. Summing the scaled squared differences over all cells gives the test statistic.
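The fourfold-table shortcut formula, with and without the continuity correction, can be sketched as follows. The function name `chi2_fourfold` and its `correct` flag are illustrative assumptions, not part of the notes.

```python
def chi2_fourfold(a, b, c, d, correct=False):
    """Chi-square for a 2x2 table [[a, b], [c, d]] by the shortcut formula.

    With correct=True the continuity correction |ad - bc| - n/2 is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if correct:
        diff = max(diff - n / 2, 0)
    return diff ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# Worked example: T-1 row 52 (+), 19 (-); T-0 row 39 (+), 3 (-).
chi2_plain = chi2_fourfold(52, 19, 39, 3)
chi2_corrected = chi2_fourfold(52, 19, 39, 3, correct=True)
print(round(chi2_plain, 3), round(chi2_corrected, 3))
```

The uncorrected value matches the cell-by-cell sum of (A-T)^2/T up to rounding of the theoretical frequencies.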

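2) Larger than 2x2: do two or more sample rates (or constituent ratios) differ?


Group     (-)          (+)        Sum     (+) rate = (+)/Sum
a         6            23         29      0.793103
b         30           14         44      0.318182
c         8            3          11      0.272727
Sum       44           40         84
Rate      0.523810     0.476190           v = (3-1)(2-1) = 2

X^2 = n [sum(A^2/(nR nC)) - 1]


Notes: theoretical frequencies should not be too small; a significant test across several sample rates (or constituent ratios) shows only that the populations differ somewhere, not that every pair differs; for graded responses (+, ++, +++, etc.) a rank sum test is more appropriate.


3) Exact probabilities in a 2x2 table (a fourfold-table method that does not use X^2):


Suitable when T < 1 or n < 40. It dates from the period when computers were not yet widely available.




4) X^2 goodness-of-fit test: judging whether a frequency distribution fits a theoretical distribution such as the binomial or Poisson distribution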




Goodness of fit to a Poisson distribution (n = 300; mean r = 747/300 = 2.49)
X       A (observed f)   X*A    r/X        P(X)       Cumulative P(X)   T = n*P(X)   (A-T)^2/T
0       26               0      0          0.08291    0.08291           24.873       0.051065
1       51               51     2.49       0.206446   0.289356          61.9338      1.930254
2       84               168    1.245      0.257025   0.546381          77.1075      0.616108
3       70               210    0.83       0.213331   0.759712          63.9993      0.562637
4       42               168    0.6225     0.132798   0.89251           39.8394      0.117175
5       15               75     0.498      0.066134   0.958644          19.8402      1.180811
6       9                54     0.415      0.027445   0.986089          8.2335       0.071358
7       3                21     0.355714   0.013911   1                 4.1733       0.329867
Sum     300              747               1                            300          4.859275

(The last class takes the remaining probability, 1 - 0.986089.)  X^2 = 4.859275
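The goodness-of-fit table can be regenerated from the observed counts alone. This is a minimal sketch using the recursive form P(X+1) = P(X) r/(X+1); the variable names are illustrative.

```python
from math import exp

# Observed frequencies for X = 0..7 (the last class absorbs the upper tail).
observed = [26, 51, 84, 70, 42, 15, 9, 3]
n = sum(observed)
r = sum(x * a for x, a in enumerate(observed)) / n   # sample mean

# Poisson probabilities by recursion, starting from P(0) = e^-r.
probs = []
p = exp(-r)
for x in range(len(observed) - 1):
    probs.append(p)
    p *= r / (x + 1)
probs.append(1 - sum(probs))                          # remaining probability

expected = [n * q for q in probs]
chi2_fit = sum((a - t) ** 2 / t for a, t in zip(observed, expected))
print(round(r, 2), [round(t, 2) for t in expected], round(chi2_fit, 3))
```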


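【Rank sum test

The t test and the F test both assume normal distributions and infer whether two or more population means (parameters of normal populations) are equal; they are called parametric statistics.

Rank sum tests are nonparametric statistics: they do not depend on the specific form of the population distribution and are widely applicable.


1) Signed rank test for paired comparisons (T table)

Steps: state Ho, H1, a => compute the difference for each pair => sort the differences by absolute value from small to large and rank them; tied values with the same sign may be ranked consecutively (e.g. -6 and -6 as No. 5 and No. 6), while tied absolute values with opposite signs (e.g. +6 and -6) each take the average of the ranks they occupy; differences of 0 are not ranked => sum the ranks of the negative differences and of the positive differences, and take the smaller sum as T


(1) if n <= 50, compare T with the T table to find p, checking at a = 0.05 or a = 0.01

(2) if n > 50, u = (|T - n(n+1)/4| - 0.5) / [n(n+1)(2n+1)/24]^1/2, which approximately follows the u distribution.

If many differences share the same absolute value (ties), apply a correction: u = (|T - n(n+1)/4| - 0.5) / [n(n+1)(2n+1)/24 - sum(tj^3-tj)/48]^1/2, where tj is the number of values in the j-th tie;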

2) Comparison of several samples in a randomized block design (another rank sum method; M table);


Date              A            B            C
Feb               11.4 (3)     5.8 (2)      3.5 (1)       (ranks within each block in parentheses)
Apr               6.4 (1)      8.6 (3)      7.5 (2)
Jun               ...          ...          ...
Aug               ...          ...          ...
Oct               ...          ...          ...
Dec               ...          ...          ...
Ri                14           12           10            (rank totals)
R-mean            12           12           12            (14+12+10)/3
(Ri - R-mean)^2   2^2          0            (-2)^2        sum = 8 (M value)

(1) Compare M with the M table

(2) X^2 method: Xr^2 = 12M / [bk(k+1)]; look up the X^2 table with v = k - 1 = 3 - 1 = 2; b = 6 (Feb-Dec). If there are many tied values and n is large, a correction is needed: Xr^2 (corrected) = Xr^2 / {1 - sum(tj^3-tj) / [bk(k^2-1)]}


3) Comparison of two samples;


              Sample A                  Sample B
              a         rank            b         rank
              5         (1)             17        (9)
              5         (2)             18        (10.5)
              ...       ...             ...       ...
              ...       ...             ...       ...
Sum           ni = 10   Ti = 59.5       nj = 7    Tj = 93.5

Steps: Ho, H1, a => rank as in the table above; T is the rank sum of the sample with the smaller n

(1) if ni <= 20 and ni - nj <= 10: compare T with the T table

(2) otherwise, u = (|T - n1(N+1)/2| - 0.5) / [n1 n2 (N+1)/12]^1/2;  N = n1 + n2; look up the u table. n1(N+1)/2 is the expected (mean) rank sum and T is the sample rank sum.

If there are many tied ranks, a correction is needed:  u = (|T - n1(N+1)/2| - 0.5) / {[n1 n2 / (12N(N-1))] [N^3 - N - sum(tj^3-tj)]}^1/2




e.g. comparing serum bilirubin in two groups of hepatitis patients (grouped data): list the class intervals of the original data (one column, a); enter the counts of the two groups by class interval (two columns, b and c) and add them within each interval (one column, d); derive the range of ranks for each interval (one column, e) and from its lower and upper values the mid-rank of the interval (one column, f); then the rank sums (one column per group); at the bottom mark ni, nj, Ti, Tj. Use the formula to get the u value.



H test: H = [12/(N(N+1))] sum(Ri^2/ni) - 3(N+1). If n is large and there are many tied ranks, apply a correction: Hc = H / [1 - sum(tj^3-tj)/(N^3-N)]; look up the X^2 table.





After the H test above, pairwise t tests between samples:

t = (Ra-mean - Rb-mean) / {[N(N+1)(N-1-H) / (12(N-k))] (1/na + 1/nb)}^1/2;  Ra-mean = Ra/na;  Rb-mean = Rb/nb (mean ranks);  v = N - k; look up the t table



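『Linear regression: the relationship between two variables』

I Regression

1) To obtain: Y = a + bX

e.g. age or IgG concentration (independent variable, on the X axis) against blood pressure or the diameter of the precipitation ring produced by the IgG (dependent variable, on the Y axis)

To distinguish it from an ordinary function it is called the linear regression equation: Y = a + bX; a is the intercept; b is the regression coefficient, i.e. the slope.

b = sum[(X-Xm)(Y-Ym)] / sum[(X-Xm)^2] = Lxy/Lxx;  a = Ym - b Xm;  Xm, Ym = means;  Lxx = sum of squared deviations of X = sum(X^2) - (sum X)^2/n;  Lxy (sum of products of deviations of X and Y) = sum[(X-Xm)(Y-Ym)] = sum(XY) - (sum X)(sum Y)/n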

Linear regression worksheet (columns a and c are the original data)
        (a)        (b)     (c)                 (d)       (e)      (f)                   (g)          (g^2)
        IgG (X)    X^2     Ring diameter (Y)   Y^2       X*Y      Y-hat (from line)     Y - Y-hat    (Y - Y-hat)^2
        1          1       4                   16        4        4.14                  -0.14        0.0196
        2          4       5.5                 30.25     11       5.26                  0.24         0.0576
        3          9       6.2                 38.44     18.6     6.38                  -0.18        0.0324
        4          16      7.7                 59.29     30.8     7.5                   0.2          0.04
        5          25      8.5                 72.25     42.5     8.62                  -0.12        0.0144
Sum     15         55      31.9                216.23    106.9    31.9                  0            0.164
(Sum)^2      225           1017.61
(Sum)^2/n    45            203.522             (n = 5)
Mean         3             6.38

Lxx = sum(X-Xm)^2 = sum(X^2) - (sum X)^2/n = 55 - (15)^2/5 = 55 - 45 = 10
Lyy = sum(Y-Ym)^2 = sum(Y^2) - (sum Y)^2/n = 216.23 - 203.522 = 12.708
Lxy = 106.9 - (15)(31.9)/5 = 11.20
b = Lxy/Lxx = 11.2/10 = 1.12;  a = 6.38 - 1.12(3) = 3.02;  Y-hat = 3.02 + 1.12X
SS-total = sum[(Y-Ym)^2] = 12.708
SS-regression = (11.2)^2/10 = 12.544
SS-residual = SS-total - SS-regression = 12.708 - 12.544 = 0.164


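[A plot of the data suggests a linear relationship. Compute sum(X), sum(Y), sum(X^2), and sum(Y^2), and then sum(XY).]


2) Hypothesis test of the regression coefficient: ANOVA or t test


Dependence between two variables (e.g. IgG vs. ring size): first find a and b and write the equation; substitute the original X values to obtain the fitted values Y-hat; take the differences between the observed Y and Y-hat and compute their sum of squares; then apply the F or t test.

Forecasting (an important use of the regression equation; e.g. the incidence of Japanese encephalitis (predicted quantity Y) against hours of sunshine: apply the arcsine square-root transformation to the incidence and, for a known amount of sunshine (predictor X), predict the incidence with a confidence interval);

Statistical control (e.g. controlling traffic volume under a given relationship; an inverse calculation): take the maximum permitted Y as Y-hat (from the regression equation at X) +- the critical value at the test level x the standard error, then solve backwards for the X at which Y reaches that limit at a = 0.05.


(1) Take two X values (a small one and a large one, e.g. x1 = 1, y1 = 4.14; x2 = 5, y2 = 8.62) to draw the regression line.

Check: the line must pass through the point (Xm, Ym), and extended to X = 0 it meets the Y axis at a.

(2) To test:

SS-total (Lyy, the total sum of squared deviations of Y): sum[(Y-Ym)^2]

= SS-regression (the part of the total sum of squares explained by X): sum[(Y-hat - Ym)^2]

+ SS-residual: sum[(Y - Y-hat)^2];

v-total = v-regression + v-residual;   v-total = n-1, v-regression = 1, v-residual = n-2



SS-residual = SS-total - SS-regression = 12.708 - 12.544 = 0.164

(1) F = (SS-regression/v-regression) / (SS-residual/v-residual);  t = (b-0)/Sb = b / (Syx/Lxx^1/2);  Syx = [sum(Y - Y-hat)^2 / (n-2)]^1/2 = [SS-residual/(n-2)]^1/2

n = 5; v-total = 5-1 = 4; v-regression = 1, v-residual = 3;  F = (12.544/1) / (0.164/3) = 229.46; p < 0.01 at a = 0.05

(2) t = (b-0)/Sb = b / (Syx/Lxx^1/2) = 1.12 / [0.2338/(10^1/2)] = 15.149;  Syx = [0.164/(5-2)]^1/2 = 0.2338;  v = 3; p < 0.001 at a = 0.05

t = F^1/2 = (229.46)^1/2 = 15.148: the same result either way (F, p < 0.01; t, p < 0.001)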
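The regression worksheet and both tests can be reproduced with a short script. This is a minimal standard-library sketch; `linear_regression` and the returned keys are illustrative names.

```python
import math

# IgG concentration (X) and precipitation ring diameter (Y) from the worksheet.
x = [1, 2, 3, 4, 5]
y = [4.0, 5.5, 6.2, 7.7, 8.5]

def linear_regression(x, y):
    """Least-squares line Y = a + bX with its F and t statistics."""
    n = len(x)
    lxx = sum(v * v for v in x) - sum(x) ** 2 / n
    lyy = sum(v * v for v in y) - sum(y) ** 2 / n
    lxy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    b = lxy / lxx
    a = sum(y) / n - b * sum(x) / n
    ss_reg = lxy ** 2 / lxx
    ss_res = lyy - ss_reg
    s_yx = math.sqrt(ss_res / (n - 2))        # residual standard deviation
    return {
        "a": a, "b": b,
        "ss_reg": ss_reg, "ss_res": ss_res,
        "f": ss_reg / (ss_res / (n - 2)),
        "t": b / (s_yx / math.sqrt(lxx)),
    }

fit = linear_regression(x, y)
print({k: round(v, 4) for k, v in fit.items()})
```

For a single predictor, t squared equals F, which is the equivalence noted above.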


『Linear correlation (simple correlation): whether two variables following a bivariate normal distribution are linearly related』

II Correlation

Positive correlation; negative correlation; degree of relationship. The hypothesis test of a linear correlation coefficient usually uses the t test.

1) Positive correlation or negative correlation;


2) Linear correlation coefficient

r = sum[(X-Xm)(Y-Ym)] / [sum(X-Xm)^2 sum(Y-Ym)^2]^1/2 = Lxy / (Lxx Lyy)^1/2;  r > 0, positive correlation; r < 0, negative correlation; |r| = 1, perfect correlation;


3) r gives the same test result as b but is easier to calculate.  SS-regression = r^2 SS-total; r^2 is the coefficient of determination. e.g. r = 0.2 gives r^2 = 0.04: SS-regression is only 4% of SS-total, so the regression has no practical significance.

3) Rank correlation

Spearman method:  rs = 1 - 6 sum(d^2) / [n(n^2-1)];  d is the difference between the ranks of X and Y in each pair; n is the number of pairs

Rank correlation (worked example)
No.     X       Rank of X   Y       Rank of Y   d = Xr - Yr   d^2
1       0.7     1           21.5    3           -2            4
2       1       2           18.9    2           0             0
3       1.7     3           14.4    1           2             4
4       3.7     4           46.5    7           -3            9
5       4       5           27.3    4           1             1
6       5.1     6           64.6    9           -3            9
7       5.5     7           46.3    6           1             1
8       5.7     8           34.2    5           3             9
9       5.9     9           77.6    10          -1            1
10      10      10          55.1    8           2             4
                                                sum(d^2) =    42
rs = 1 - 6(42) / [10(10^2-1)] = 0.745; n = 10, rs table: P = 0.02; at a = 0.05 reject Ho and accept H1 (positive correlation)
If there are many tied ranks a correction is needed: rs' = [(n^3-n)/6 - (Tx+Ty) - sum(d^2)] / {[(n^3-n)/6 - 2Tx]^1/2 [(n^3-n)/6 - 2Ty]^1/2}
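The Spearman calculation can be sketched as follows. This version assumes there are no tied values (as in the worked example) and does not implement the tie correction; `ranks` and `spearman` are illustrative names.

```python
x = [0.7, 1.0, 1.7, 3.7, 4.0, 5.1, 5.5, 5.7, 5.9, 10.0]
y = [21.5, 18.9, 14.4, 46.5, 27.3, 64.6, 46.3, 34.2, 77.6, 55.1]

def ranks(values):
    """Ranks 1..n for values without ties (ties would need average ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Return (rs, sum of squared rank differences)."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1)), d2

rs, d2 = spearman(x, y)
print(d2, round(rs, 3))
```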

4) In use: there must be a real, meaningful connection between X and Y; plot the data before the analysis; a correlation does not by itself show that one variable is the cause of the other

5) Linearizing a curve


Tip: variable transformation -> rectification -> curve fitting;

Steps: choose a type of curve, e.g. an exponential curve => transform Y = A B^X into lgY = a + bX => fit the line by least squares => convert the linear equation back to the curve form and plot the curve. (Key quantities: Xm, Ym, sum(X), sum(X^2), sum(Y), sum(Y^2), where Y stands for lgY)


e.g. density (Y) against distance (X):

Curve fitting worksheet
X (distance)   X^2       Y (density)   lgY        X*lgY      A        0.9906^X   Y-hat = A*0.9906^X
50             2500      0.687         -0.16304   -8.15216   0.9361   0.623614   0.583766
100            10000     0.398         -0.40012   -40.0117   0.9361   0.388895   0.364045
150            22500     0.2           -0.69897   -104.846   0.9361   0.242521   0.227023
200            40000     0.121         -0.91721   -183.443   0.9361   0.151239   0.141575
250            62500     0.09          -1.04576   -261.439   0.9361   0.094315   0.088288
300            90000     0.05          -1.30103   -390.309   0.9361   0.058816   0.055058
400            160000    0.02          -1.69897   -679.588   0.9361   0.022873   0.021412
500            250000    0.01          -2         -1000      0.9361   0.008895   0.008327
Sum   1950     637500    1.576         -8.2251    -2667.79
Mean  243.75             0.197         -1.02814   (n = 8; (sum X)^2 = 3802500)

Xm = sum(X)/n = 1950/8 = 243.75
(lgY)m = sum(lgY)/n = -8.2251/8 = -1.0281
Lxx = sum(X^2) - (sum X)^2/n = 637500 - (1950)^2/8 = 162187.5
Lxy = sum(X lgY) - (sum X)(sum lgY)/n = -2667.7887 - (1950)(-8.2251)/8 = -662.92  (lgY is used in place of the original density Y)
b = Lxy/Lxx = -662.92/162187.5 = -0.0041;  a = -1.0281 - (-0.0041)(243.75) = -0.0287
lgY = -(0.0287 + 0.0041X), or Y = 10^-(0.0287+0.0041X) = 0.9361 (0.9906)^X    (spreadsheet: =POWER(0.9906,X))
Use X and Y-hat from the regression to draw the curve.




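【Multiple linear regression and correlation


Applications: the quantitative relationship between several factors and a phenomenon (temperature, humidity, and incidence); factor analysis (which causes are relatively more important); forecasting and statistical control

Y = b0 + b1X1 + b2X2 + ... + bnXn;  b1, b2, ...: partial regression coefficients; e.g. the relationship of vital capacity to height and weight; (the equation is obtained by the matrix method)


(1) Determine the coefficients and set up the equation.

(2) Test the regression as a whole. H0: all population partial regression coefficients are 0. Hypothesis test:  F = [SS-regression/m] / [SS-residual/(n-m-1)]

(3) If the F test rejects H0, test further whether Y has a linear regression relationship with every independent variable or only with some of them. Hypothesis test of a partial regression coefficient: ti = (bi-Bi) / {[SS-residual/(n-m-1)]^1/2 (Cii)^1/2};  Cii is the element in row i, column i on the main diagonal of the matrix C.

(4) If the test shows no relationship with some independent variable, drop it, recompute the coefficients to set up a new regression equation, and apply the t test to the new regression coefficient.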

Multiple linear regression (worked example: vital capacity Y (L) on height X1 (cm) and weight X2 (kg), n = 29; the worksheet is given here in calculation order)

Basic sums: n = 29; sum(Y) = 64; sum(Y^2) = 146.875; sum(X1) = 4424.7; sum(X1^2) = 677060.4; sum(X2) = 1076.7; sum(X2^2) = 40832.39; sum(X1 X2) = 165239.8; sum(X1 Y) = 9826.65; sum(X2 Y) = 2427.325
(Sum)^2/n: 141.2414 (Y); 675102.4169 (X1); 39975.27 (X2)
Means: Ym = 2.206897; X1m = 152.5758621; X2m = 37.12759
Sums of squared deviations: Lyy = 5.633621; L11 = 1957.953103; L22 = 857.1179

Matrix method, b = A^-1 B:
A = X'X =
        29         4424.7      1076.7
        4424.7     677060.4    165239.8
        1076.7     165239.8    40832.39
B = X'Y = (64, 9826.65, 2427.325)'
det A = a1b2c3 + a2b3c1 + a3b1c2 - a3b2c1 - a2b1c3 - a1b3c2 = 21865007.1
C = A^-1 = (X'X)^-1 =
        15.63236    -0.12611    0.098131
        -0.12611    0.001137    -0.00128
        0.098131    -0.00128    0.002597
b = (b0, b1, b2)' = (-0.56566, 0.005017, 0.054061)'

SS-total = SS-regression + SS-residual
SS-total = sum(Y-Ym)^2 = sum(Y^2) - [sum(Y)]^2/n = Lyy = 146.875 - 141.2414 = 5.633621
SS-regression = sum(Y-hat - Ym)^2 = b'B - [sum(Y)]^2/n = (-0.565664, 0.005017, 0.054061)(64, 9826.65, 2427.325)' - (64)^2/29 = 3.075734;  v-regression = 2 (X1 and X2)
SS-residual = 5.633621 - 3.075734 = 2.557887;  v-residual = 29 - 2 - 1 = 26

Test of the whole regression (Ho: B1 = 0, B2 = 0, i.e. all regression coefficients are assumed to be zero, meaning no regression relationship):
F = (3.0757/2) / (2.5579/26) = 15.63; p < 0.01 at a = 0.05; Ho is rejected and H1 accepted: the multiple linear regression is significant.

When H1 is accepted, the relationship of Y to X1 and to X2 must be tested separately. t test of each coefficient; Ho: B1 = 0, B2 = 0; H1: B1 =/= 0, B2 =/= 0; a = 0.05
t = (bi - Bi) / {[SS-residual/(n-m-1)]^1/2 (Cii)^1/2} = bi / [Sy.12...m (Cii)^1/2];  Sy.12...m = [SS-residual/(n-m-1)]^1/2 = (2.557887/26)^1/2 = 0.3137
t1 = 0.0050 / [0.3137 (0.001137)^1/2] = 0.47;  t2 = 0.0541 / [0.3137 (0.002597)^1/2] = 3.38
t1: P > 0.50; t2: 0.005 > P > 0.002; at a = 0.05, Ho is not rejected for B1 and is rejected for B2: X2 (kg) has a regression relationship with Y, X1 (cm) does not. X1 is therefore excluded and a new regression equation is fitted.

New regression after excluding X1: n = 29; sum(X2) = 1076.7; sum(X2^2) = 40832.39; sum(X2 Y) = 2427.325; sum(Y^2) = 146.875
Lxx = L22 = 40832.39 - (1076.7)^2/29 = 857.1179;  Lxy = L2y = 2427.325 - (1076.7)(64)/29 = 51.1595;  X2m = 37.12759;  Ym = 2.206897
b = 51.1595/857.1179 = 0.059688;  a = 2.2069 - (0.0597)(37.1276) = -0.00917;  Y-hat = -0.0092 + 0.0597 X2
Test of the new regression: Ho: B = 0; H1: B =/= 0; a = 0.05; SS-total = 5.633621; SS-regression = (Lxy)^2/Lxx = (51.1595)^2/857.1179 = 3.0536; SS-residual = 2.580024; v-residual = n - 2 = 29 - 2 = 27
Syx = (2.5800/27)^1/2 = 0.309122;  t = b / [Syx/(Lxx)^1/2] = 0.0597 / [0.3091/(857.1179)^1/2] = 5.65; t table: P < 0.001; at a = 0.05 reject Ho, accept H1: Y has a linear relationship with X2

(If X1 alone is tested instead: Lxx = L11 = 677060.4 - (4424.7)^2/29 = 1957.953; Lxy = L1y = 9826.65 - (4424.7)(64)/29 = 61.79483; X1m = 152.5759; b = 61.79/1957.95 = 0.031561; a = -2.60854; Y-hat = -2.60854 + 0.031561 X1; SS-regression = 1.950302; SS-residual = 3.683318; Syx = 0.36935; (Lxx)^1/2 = 44.24876; Syx/(Lxx)^1/2 = 0.008347; t = 3.781)
Data (the columns headed X1-X1m and X2-X2m in the worksheet repeat X1 and X2)
No. Y(L) Y^2 Y-Ym X1(cm) X1^2 X1 X2(kg) X2^2 X2 X1*X2 X1*Y X2*Y
Sum (n = 29) 64 146.875 8.88E-16 4424.7 677060.4 4424.7 1076.7 40832.39 1076.7 165239.8 9826.65 2427.325
1 1.75 3.0625 -0.4569 135.1 18252.01 135.1 32 1024 32 4323.2 236.425 56
2 2 4 -0.2069 139.9 19572.01 139.9 30.4 924.16 30.4 4252.96 279.8 60.8
3 2.75 7.5625 0.543103 163.6 26764.96 163.6 46.2 2134.44 46.2 7558.32 449.9 127.05
4 2.5 6.25 0.293103 146.5 21462.25 146.5 33.5 1122.25 33.5 4907.75 366.25 83.75
5 2.75 7.5625 0.543103 156.2 24398.44 156.2 37.1 1376.41 37.1 5795.02 429.55 102.025
6 2 4 -0.2069 156.4 24460.96 156.4 35.5 1260.25 35.5 5552.2 312.8 71
7 2.75 7.5625 0.543103 167.8 28156.84 167.8 41.5 1722.25 41.5 6963.7 461.45 114.125
8 1.5 2.25 -0.7069 149.7 22410.09 149.7 31 961 31 4640.7 224.55 46.5
9 2.5 6.25 0.293103 145 21025 145 33 1089 33 4785 362.5 82.5
10 2.25 5.0625 0.043103 148.5 22052.25 148.5 37.2 1383.84 37.2 5524.2 334.125 83.7
11 3 9 0.793103 165.5 27390.25 165.5 49.5 2450.25 49.5 8192.25 496.5 148.5
12 1.25 1.5625 -0.9569 135 18225 135 27.6 761.76 27.6 3726 168.75 34.5
13 2.75 7.5625 0.543103 153.3 23500.89 153.3 41 1681 41 6285.3 421.575 112.75
14 1.75 3.0625 -0.4569 152 23104 152 32 1024 32 4864 266 56
15 2.25 5.0625 0.043103 160.5 25760.25 160.5 47.2 2227.84 47.2 7575.6 361.125 106.2
16 1.75 3.0625 -0.4569 153 23409 153 32 1024 32 4896 267.75 56
17 2 4 -0.2069 147.6 21785.76 147.6 40.5 1640.25 40.5 5977.8 295.2 81
18 2.25 5.0625 0.043103 157.5 24806.25 157.5 43.3 1874.89 43.3 6819.75 354.375 97.425
19 2.75 7.5625 0.543103 155.1 24056.01 155.1 44.7 1998.09 44.7 6932.97 426.525 122.925
20 2 4 -0.2069 160.5 25760.25 160.5 37.5 1406.25 37.5 6018.75 321 75
21 1.75 3.0625 -0.4569 143 20449 143 31.5 992.25 31.5 4504.5 250.25 55.125
22 2.25 5.0625 0.043103 149.4 22320.36 149.4 33.9 1149.21 33.9 5064.66 336.15 76.275
23 2.75 7.5625 0.543103 160.8 25856.64 160.8 40.4 1632.16 40.4 6496.32 442.2 111.1
24 2.5 6.25 0.293103 159 25281 159 38.5 1482.25 38.5 6121.5 397.5 96.25
25 2 4 -0.2069 158.2 25027.24 158.2 37.5 1406.25 37.5 5932.5 316.4 75
26 1.75 3.0625 -0.4569 150 22500 150 36 1296 36 5400 262.5 63
27 2.25 5.0625 0.043103 144.5 20880.25 144.5 34.7 1204.09 34.7 5014.15 325.125 78.075
28 2.5 6.25 0.293103 154.6 23901.16 154.6 39.5 1560.25 39.5 6106.7 386.5 98.75
29 1.75 3.0625 -0.4569 156.5 24492.25 156.5 32 1024 32 5008 273.875 56
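The matrix step b = A^-1 B can be checked from the column sums alone. This is a sketch under the assumption that the rounded sums printed in the worksheet are accurate enough, so the coefficients may differ from the worksheet in the last digits; `solve` is a small hand-written Gauss-Jordan helper, not a library function.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Normal equations built from the worksheet sums (X'X and X'Y).
n_obs = 29
xtx = [
    [29.0, 4424.7, 1076.7],
    [4424.7, 677060.4, 165239.8],
    [1076.7, 165239.8, 40832.39],
]
xty = [64.0, 9826.65, 2427.325]
sum_y2 = 146.875

b0, b1, b2 = solve(xtx, xty)
ss_total = sum_y2 - xty[0] ** 2 / n_obs
ss_reg = b0 * xty[0] + b1 * xty[1] + b2 * xty[2] - xty[0] ** 2 / n_obs
ss_res = ss_total - ss_reg
f_stat = (ss_reg / 2) / (ss_res / (n_obs - 2 - 1))
print(round(b0, 4), round(b1, 6), round(b2, 6), round(f_stat, 2))
```

Solving the system directly avoids forming the inverse by hand; the inverse is still needed for the Cii terms in the t tests.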



多元线性回归与相关 multistage linear regression

复相关系数(多元相关系数,全相关系数coefficient of multiply correlation)


R^2 measured or indicated how correlated between the X1, X2, ...and Y.

R^2=SS-regression / SS-total =1-SS-residual/SS-total

R = (SS-regression / SS-total)^1/2 =(1-SS-residual/SS-total)^1/2

R between 0--1; 1-indicated most correlation.

For example, R = (3.08/5.63)^(1/2) = 0.74




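Significance test for R: F = [R^2 / (1 - R^2)] * [(n - m - 1) / m], where n is the sample size, m is the number of independent variables X, and n - m - 1 is the residual degrees of freedom.


For example, H0: the population multiple correlation coefficient = 0 (no regression); H1: the population multiple correlation coefficient =/= 0 (regression exists); alpha = 0.05.

F = [0.74^2 / (1 - 0.74^2)] * [(29 - 2 - 1) / 2], which is approximately 15.7; from the F table (v1 = 2, v2 = 26), P < 0.01, so reject H0 and accept H1.


Meaning of F: a multiple linear regression relationship exists between Y and X1, X2.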
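
The R and F calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's own procedure; the function names are made up for the example, and the inputs (SS-regression = 3.08, SS-total = 5.63, n = 29, m = 2) are the ones quoted in the text.

```python
import math

def multiple_r_squared(ss_regression, ss_total):
    """R^2 = SS-regression / SS-total."""
    return ss_regression / ss_total

def f_statistic(r_squared, n, m):
    """F = [R^2 / (1 - R^2)] * [(n - m - 1) / m], with df (m, n - m - 1)."""
    return (r_squared / (1.0 - r_squared)) * ((n - m - 1) / m)

# Sums of squares and sample size taken from the worked example in the text.
r2 = multiple_r_squared(3.08, 5.63)
r = math.sqrt(r2)
f = f_statistic(r2, n=29, m=2)
print(f"R^2 = {r2:.4f}, R = {r:.2f}, F = {f:.2f}, df = (2, 26)")
```

With unrounded R^2 the sketch gives R of about 0.74 and F of about 15.7, in line with the hand calculation.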


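Partial correlation coefficient (coefficient of partial correlation)

Problem: a simple correlation often fails to reflect the true relationship between one independent variable and the dependent variable. Only when the other variables are held fixed, that is, when their influence is removed, does the correlation between two variables reflect their true relationship. For example, height (X1, cm) and body weight (X2, kg) are both correlated with vital capacity (Y, L). In the multiple regression analysis (test of the partial correlation coefficients), however, once body weight (X2) is held fixed, height (X1) is no longer correlated with vital capacity (Y); body weight is the variable doing the main work. The simple correlation between height and vital capacity arises because height is itself correlated with weight, so weight contributes to it. Controlling for one variable gives a first-order partial correlation coefficient, as in this example.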




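(1) Compute the sums of squared deviations from the mean, Lii and Lyy, for each Xi and for Y, and the sums of cross-products of deviations, Lij and Liy.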




r12.3 = (r12 - r13*r23) / [(1 - r13^2)(1 - r23^2)]^(1/2), where r12.3 is the correlation between variables 1 and 2 with variable 3 held fixed.
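
The first-order partial correlation formula can be checked numerically with a short Python sketch. The function name and the three input correlations below are hypothetical, chosen only to illustrate the arithmetic; they are not values from the height, weight, and vital capacity example.

```python
import math

def partial_corr(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2, holding 3 fixed."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))

# Hypothetical simple correlations, used only to illustrate the formula.
example = partial_corr(r12=0.6, r13=0.5, r23=0.4)
print(round(example, 4))

# If variable 3 is uncorrelated with both others, nothing is removed.
unchanged = partial_corr(r12=0.3, r13=0.0, r23=0.0)
```

When variable 3 is uncorrelated with both other variables, the partial correlation equals the simple correlation, which is a quick sanity check on any implementation.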




















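t = [r / (1 - r^2)^(1/2)] * (n - m - 1)^(1/2), v = n - m - 1

ry1.2 = 0.0927 vs ry2.1 = 0.5527; H0: the population partial correlation coefficients are zero (ρy1.2 = 0, ρy2.1 = 0); H1: they are not zero.

t1: do not reject H0; with X2 (kg) held fixed, X1 (cm) shows no regression relationship with Y.

t2: reject H0; with X1 (cm) held fixed, X2 (kg) shows a regression relationship with Y.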
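
The two t tests can be reproduced with the sketch below. It assumes n = 29 and m = 2 as in the F-test example, and it takes the two-sided 0.05 critical value for v = 26 (2.056) from a standard t table; the function name is illustrative only.

```python
import math

def partial_corr_t(r, n, m):
    """t = r / sqrt(1 - r^2) * sqrt(n - m - 1), with v = n - m - 1."""
    v = n - m - 1
    t = r / math.sqrt(1.0 - r ** 2) * math.sqrt(v)
    return t, v

T_CRIT_26 = 2.056  # two-sided alpha = 0.05, v = 26, from a standard t table

# Partial correlations quoted in the text; n and m assumed from the F example.
t1, v = partial_corr_t(0.0927, n=29, m=2)   # height, weight held fixed
t2, _ = partial_corr_t(0.5527, n=29, m=2)   # weight, height held fixed

for name, t in (("X1 (cm)", t1), ("X2 (kg)", t2)):
    decision = "reject H0" if abs(t) > T_CRIT_26 else "do not reject H0"
    print(f"{name}: t = {t:.3f}, v = {v}, {decision}")
```

Under these assumptions t1 falls well below the critical value and t2 well above it, matching the conclusions stated for t1 and t2.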


Summary: correlation of each X with Y

           Simple (linear)    Partial (multiple)
X1 (cm)    0.5884             0.0927
X2 (kg)    0.7362             0.5527




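【Statistical tables and statistical graphs


Statistical graphs: bar graph (compares the magnitude of an indicator); percent bar graph (shows component proportions); circle (pie) graph (shows each component's share of the whole); line graph (shows change over time); semilogarithmic line graph (shows rate of change, i.e. relative ratios); histogram (frequency distribution of a continuous variable); scatter diagram (closeness and trend of a relationship); statistical map (geographic distribution of an indicator).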



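Medical research is usually based on sampling: survey, experiment, field survey.


Objectives and indicators (e.g. sensitivity, specificity): keep them focused rather than trying to cover everything.


Survey methods: case-control study and cohort study; census (complete survey); sampling survey.

Data collection: direct observation and interviews (visits, group meetings, postal inquiry). Survey items must be chosen carefully: analysis items (from which indicators can be derived) and reference items (which keep the analysis items complete and easy to verify). Use a questionnaire in list and/or card form, and code the questionnaire (code sheet) for computer processing, e.g. male = 1, female = 2. Plan for sorting and analysis: design the grouping (classification) by qualitative or quantitative criteria; sorting tables; method of assigning records to groups.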


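Simple random sampling (number every unit, then draw at random);

Stratified sampling (divide into strata, then sample at random within each stratum);

Systematic sampling (interval or mechanical sampling);

Cluster sampling (in practice often area sampling).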
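
The four sampling schemes above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the population of 100 numbered units, the sex strata, the ten areas, and all function names are hypothetical and exist only for the example.

```python
import random

def simple_random_sample(units, n, rng):
    """Number every unit, then draw n of them at random without replacement."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Pick a random start, then take every k-th unit (k = sampling interval)."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Split units into strata, then draw a simple random sample in each."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Draw whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: 100 numbered people, two sexes, ten areas of ten.
rng = random.Random(2024)
population = list(range(1, 101))
by_sex = [(i, "M" if i % 2 else "F") for i in population]
areas = {f"area{a}": list(range(a * 10 + 1, a * 10 + 11)) for a in range(10)}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(by_sex, lambda unit: unit[1], 5, rng)
clustered = cluster_sample(areas, 2, rng)
```

Passing an explicit `random.Random` instance keeps each draw reproducible, which helps when a sampling plan has to be documented or audited.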





【Experimental design



Double-blind method










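Crossover (self-control) design (a self-controlled design in which treatments a and b are first given to groups A and B and then exchanged, so that the groups become Ba and Ab);

Completely random design (subjects or treatments are allocated entirely at random; widely used);

Paired design (subjects are matched on set criteria such as sex and age, which improves efficiency);

Randomized block design (the paired design extended to matched groups);

Latin square design (uses a Latin square layout; goes a step beyond the randomized block design);

Orthogonal design (uses orthogonal tables; efficient for multiple factors);

Blinded design (usually the double-blind method: neither the patient nor the physician knows who is in which group or who receives the placebo).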
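
Three of the designs listed above come down to simple allocation rules, sketched here in Python. The subject labels, treatment labels, and function names are hypothetical. The cyclic Latin square is only a starting layout, since in practice its rows, columns, and treatment labels would also be randomized.

```python
import random

def completely_random(subjects, treatments, rng):
    """Shuffle subjects, then deal them out to the treatments in turn."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Within each block of matched subjects, assign treatments in random order.
    A paired design is the special case with blocks of size two."""
    allocation = {}
    for block in blocks:
        order = treatments[:]
        rng.shuffle(order)
        for subject, treatment in zip(block, order):
            allocation[subject] = treatment
    return allocation

def latin_square(treatments):
    """Cyclic Latin square: each treatment once per row and once per column."""
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)] for row in range(k)]

# Hypothetical subjects and treatments, for illustration only.
rng = random.Random(7)
subjects = [f"S{i}" for i in range(1, 13)]
random_alloc = completely_random(subjects, ["A", "B", "C"], rng)
blocks = [subjects[i:i + 3] for i in range(0, 12, 3)]  # four matched blocks
block_alloc = randomized_block(blocks, ["A", "B", "C"], rng)
square = latin_square(["A", "B", "C", "D"])
```

The randomized block function guarantees that every block receives each treatment exactly once, which is what distinguishes it from the completely random allocation.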














WHO Epidemiology 





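Traditional "black box" epidemiology focuses on identifying single risk factors, struggles to reveal the complete causal web, and is severely limited when studying complex diseases[20]. Over the past 10 years, with continuing advances in high-throughput omics technologies and medical big data, systems epidemiology has emerged. Systems epidemiology is an emerging branch of, and an important complement to, modern epidemiology. It builds statistical models of disease risk across multiple levels and multiple omics layers, from molecules, cells, and tissues to population social behavior and the ecological environment, and uses computational simulation to forecast and give early warning of future risk[21]. Its development will directly advance the idea of "precision prevention"[22]. Systems epidemiology can drive progress and breakthroughs in basic medical research and also help guide practical disease prevention and control; it is the inevitable direction of epidemiology's future development.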














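2. Current state of health and medical big data: Worldwide, omics research, represented by genome-wide association studies, methylation studies, and metabolomics studies, is flourishing and provides rich information for identifying disease causes and potential intervention targets. In developed European countries such as the United Kingdom, Denmark, Finland, and Sweden, electronic health records routinely kept by hospitals and clinics are not only a powerful tool for advancing clinical epidemiological research and improving patient care[31] but are also used to build disease risk prediction models[32-33]. Large dynamic cohorts built by linking electronic health records with other routine national data (such as birth, death, disease, and vaccination registries and environmental noise monitoring) have also become a research focus[34-37]. In China, promoting the application and development of health and medical big data has become a national development strategy (State Council General Office document Guo Ban Fa [2016] No. 47). National data such as death registration, discharge summaries, hospital quality monitoring, and residents' medical insurance have been used for disease burden estimation, disease trend analysis, and etiological research[38-40]. China has established several big data platforms based on disease registry data, such as the China Kidney Disease Network and the National Central Cancer Registry. The establishment of the China Cohort Consortium (CCC) will make it easier to share research big data within China in the future.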



Mehand MS, Al-Shorbaji F, Millett P, et al. The WHO R&D blueprint:2018 review of emerging infectious diseases requiring urgent research and development efforts[J]. Antiviral Res, 2018, 159: 63-67. DOI:10.1016/j.antiviral.2018.09.009
Carroll D, Daszak P, Wolfe ND, et al. The global virome project[J]. Science, 2018, 359(6378): 872-874. DOI:10.1126/science.aap7463
Wellington EM, Boxall AB, Cross P, et al. The role of the natural environment in the emergence of antibiotic resistance in gram-negative bacteria[J]. Lancet Infect Dis, 2013, 13(2): 155-165. DOI:10.1016/S1473-3099(12)70317-1
Zhao Y, Cocerva T, Cox S, et al. Evidence for co-selection of antibiotic resistance genes and mobile genetic elements in metal polluted urban soils[J]. Sci Total Environ, 2019, 656: 512-520. DOI:10.1016/j.scitotenv.2018.11.372
World Health Organization. Multimorbidity:technical series on safer primary care[M]. Geneva: World Health Organization, 2016.
Salive ME. Multimorbidity in older adults[J]. Epidemiol Rev, 2013, 35(1): 75-83. DOI:10.1093/epirev/mxs009
de Groot V, Beckerman H, Lankhorst GJ, et al. How to measure comorbidity:a critical review of available methods[J]. J Clin Epidemiol, 2003, 56(3): 221-229. DOI:10.1016/s0895-4356(02)00585-1
Garin N, Koyanagi A, Chatterji S, et al. Global multimorbidity patterns:a cross-sectional, population-based, multi-country study[J]. J Gerontol A Biol Sci Med Sci, 2016, 71(2): 205-214. DOI:10.1093/gerona/glv128
Kernick D, Chew-Graham CA, O'Flynn N. Clinical assessment and management of multimorbidity:NICE guideline[J]. Br J Gen Pract, 2017, 67(658): 235-236. DOI:10.3399/bjgp17X690857
Kwong JC, Schwartz KL, Campitelli MA, et al. Acute myocardial infarction after laboratory-confirmed influenza infection[J]. N Engl J Med, 2018, 378(4): 345-353. DOI:10.1056/NEJMoa1702090
Cowan LT, Alonso A, Pankow JS, et al. Hospitalized infection as a trigger for acute ischemic stroke:the atherosclerosis risk in communities study[J]. Stroke, 2016, 47(6): 1612-1617. DOI:10.1161/strokeaha.116.012890
Si J, Yu C, Guo Y, et al. Chronic hepatitis B virus infection and risk of chronic kidney disease:a population-based prospective cohort study of 0.5 million Chinese adults[J]. BMC Med, 2018, 16(1): 1-8. DOI:10.1186/s12916-018-1084-9
Tseng CH, Muo CH, Hsu CY, et al. Increased risk of intracerebral hemorrhage among patients with hepatitis C virus infection[J]. Medicine (Baltimore), 2015, 94(46): e2132. DOI:10.1097/md.0000000000002132
Man WH, de Steenhuijsen Piters WA, Bogaert D. The microbiota of the respiratory tract:gatekeeper to respiratory health[J]. Nat Rev Microbiol, 2017, 15(5): 259-270. DOI:10.1038/nrmicro.2017.14
Sampaio-Maia B, Caldas IM, Pereira ML, et al. The oral microbiome in health and its implication in oral and systemic diseases[J]. Adv Appl Microbiol, 2016, 97: 171-210. DOI:10.1016/bs.aambs.2016.08.002
Lynch SV, Ng SC, Shanahan F, et al. Translating the gut microbiome:ready for the clinic?[J]. Nat Rev Gastroenterol Hepatol, 2019, 16(11): 656-661. DOI:10.1038/s41575-019-0204-0
Tigchelaar EF, Zhernakova A, Dekens JA, et al. Cohort profile:Life Lines DEEP, a prospective, general population cohort study in the northern Netherlands:study design and baseline characteristics[J]. BMJ Open, 2015, 5(8): e006772. DOI:10.1136/bmjopen-2014-006772
Vogtmann E, Chen J, Amir A, et al. Comparison of collection methods for fecal samples in microbiome studies[J]. Am J Epidemiol, 2017, 185(2): 115-123. DOI:10.1093/aje/kww177
Mehta RS, Abu-Ali GS, Drew DA, et al. Stability of the human faecal microbiome in a cohort of adult men[J]. Nat Microbiol, 2018, 3(3): 347-355. DOI:10.1038/s41564-017-0096-0
Hafeman DM, Schwartz S. Opening the black box:a motivation for the assessment of mediation[J]. Int J Epidemiol, 2009, 38(3): 838-845. DOI:10.1093/ije/dyn372
Huang T, Li LM. Systems epidemiology[J]. Chin J Epidemiol, 2018, 39(5): 694-699. DOI:10.3760/cma.j.issn.0254-6450.2018.05.031 (in Chinese)
Biro K, Dombradi V, Jani A, et al. Creating a common language:defining individualized, personalized and precision prevention in public health[J]. J Public Health (Oxf), 2018, 40(4): e552-559. DOI:10.1093/pubmed/fdy066
Comas I, Gagneux S. The past and future of tuberculosis research[J]. PLoS Pathog, 2009, 5(10): e1000600. DOI:10.1371/journal.ppat.1000600
Krauth SJ, Balen J, Gobert GN, et al. A call for systems epidemiology to tackle the complexity of schistosomiasis, its control, and its elimination[J]. Trop Med Infect Dis, 2019, 4(1): 21. DOI:10.3390/tropicalmed4010021
Rasmussen AL, Katze MG. Genomic Signatures of emerging viruses:a new era of systems epidemiology[J]. Cell Host Microbe, 2016, 19(5): 611-618. DOI:10.1016/j.chom.2016.04.016
Peters DH, Adam T, Alonge O, et al. Implementation research:what it is and how to do it[J]. BMJ, 2013, 347: f6753. DOI:10.1136/bmj.f6753
Theobald S, Brandes N, Gyapong M, et al. Implementation research:new imperatives and opportunities in global health[J]. Lancet, 2018, 392(10160): 2214-2228. DOI:10.1016/S0140-6736(18)32205-0
Peters DH, Tran NT, Adam T. Implementation research in health: a practical guide[M]. Alliance for Health Policy and Systems Research, World Health Organization, 2013.
Pinnock H, Barwick M, Carpenter CR, et al. Standards for reporting implementation studies (StaRI) statement[J]. BMJ, 2017, 356: i6795. DOI:10.1136/bmj.i6795
Zhang L, Wang H, Li Q, et al. Big data and medical research in China[J]. BMJ, 2018, 360: j5910. DOI:10.1136/bmj.j5910
Ehrenstein V, Nielsen H, Pedersen AB, et al. Clinical epidemiology in the era of big data:new opportunities, familiar challenges[J]. Clin Epidemiol, 2017, 9: 245-250. DOI:10.2147/CLEP.S129779
Hippisley-Cox J, Coupland C, Brindle P. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease:prospective cohort study[J]. BMJ, 2017, 357: j2099. DOI:10.1136/bmj.j2099
Sultan AA, West J, Grainge MJ, et al. Development and validation of risk prediction model for venous thromboembolism in postpartum women:multinational cohort study[J]. BMJ, 2016, 355: i6253. DOI:10.1136/bmj.i6253
Skufca J, Ollgren J, Artama M, et al. The association of adverse events with bivalent human papilloma virus vaccination:A nationwide register-based cohort study in Finland[J]. Vaccine, 2018, 36(39): 5926-5933. DOI:10.1016/j.vaccine.2018.06.074
Pasternak B, Inghammar M, Svanstrom H. Fluoroquinolone use and risk of aortic aneurysm and dissection:nationwide cohort study[J]. BMJ, 2018, 360: k678. DOI:10.1136/bmj.k678
Heritier H, Vienneau D, Foraster M, et al. Transportation noise exposure and cardiovascular mortality:a nationwide cohort study from Switzerland[J]. Eur J Epidemiol, 2017, 32(4): 307-315. DOI:10.1007/s10654-017-0234-2
Adelborg K, Horvath-Puho E, Ording A, et al. Heart failure and risk of dementia:a Danish nationwide population-based cohort study[J]. Eur J Heart Fail, 2017, 19(2): 253-260. DOI:10.1002/ejhf.631
Zhou M, Wang H, Zeng X, et al. Mortality, morbidity, and risk factors in China and its provinces, 1990-2017:a systematic analysis for the Global Burden of Disease Study 2017[J]. Lancet, 2019, 394(10204): 1145-1158. DOI:10.1016/S0140-6736(19)30427-1
Zhang L, Long J, Jiang W, et al. Trends in chronic kidney disease in China[J]. N Engl J Med, 2016, 375(9): 905-906. DOI:10.1056/NEJMc1602469
Li LM, Lv J, Guo Y, et al. The China Kadoorie Biobank:related methodology and baseline characteristics of the participants[J]. Chin J Epidemiol, 2012, 33(3): 249-255. DOI:10.3760/cma.j.issn.0254-6450.2012.03.001 (in Chinese)
Mooney SJ, Pejaver V. Big data in public health:terminology, machine learning, and privacy[J]. Annu Rev Public Health, 2018, 39: 95-112. DOI:10.1146/annurev-publhealth-040617-014208
Miotto R, Li L, Kidd BA, et al. Deep patient:an unsupervised representation to predict the future of patients from the electronic health records[J]. Sci Rep, 2016, 6: 26094. DOI:10.1038/srep26094
Tibble H, Tsanas A, Horne E, et al. Predicting asthma attacks in primary care:protocol for developing a machine learning-based prediction model[J]. BMJ Open, 2019, 9(7): e028375. DOI:10.1136/bmjopen-2018-028375


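The Lancet is a general clinical medical journal founded by the surgeon Thomas Wakley, and it is the only independent journal among the current four major medical journals. Each year it receives about 10,000 unsolicited submissions, of which more than 1,000 come from China. The Lancet generally does not accept basic research articles.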



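• Whether it can change clinical practice and health systems

• Whether it has a broad audience (manuscripts confined to a narrow subfield are easily rejected)

• First or last (the study's impact and value in its field: either pioneering or definitive)

• Whether it meets ethical standards

• Whether the trial methods are strong

• Complete or early reporting of results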



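Second, the Abstract and the Cover Letter are complementary; never repeat the content of the Abstract in the cover letter. The Cover Letter should tell the editor why your paper deserves consideration. The title should be descriptive and include the study type (cross-sectional survey, randomized controlled trial, etc.); do not assert that A is better than B. The Background should introduce the study topic, why the study was done (in one or two sentences), and its aim (state the specific objective or hypothesis). The Methods should make clear the study design, participants, interventions, analysis, and so on.

When describing and interpreting results, give the number of participants allocated and analysed in each group, and describe the results, data, and statistical tests (where needed). For a randomized controlled trial, for example, give the actual numbers and percentages for the main outcome, with the estimated effect size (e.g. odds ratio) and its precision (e.g. 95% CI). Report the SD with a mean and the IQR with a median, and give exact P values (unless P < 0.0001). Report any important adverse events or side effects. Give a general interpretation of the findings and their importance, and outline the study's limitations and strengths.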
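
As an illustration of reporting an effect size together with its precision, the sketch below computes an odds ratio and a 95% CI from a 2x2 table. It uses the Woolf (log-based) interval, which is one common choice and not necessarily the method any particular journal expects; the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a, b: events and non-events in group 1
    c, d: events and non-events in group 2
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical trial: 20/100 events in group 1, 10/100 events in group 2.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
print(f"OR {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

In this made-up example the interval just includes 1, which is the kind of detail a bare P value would hide and the reason the effect size and interval are both requested.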


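• Introduction: the current state of the field and the background to your study; do not state conclusions

• Methods: what was pre-specified; PICO

• Results: use absolute numbers and intervals; do not duplicate the figures and tables

• Discussion: how do other similar studies compare with yours? Research in context; do not repeat the results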


• Avoid frequent abbreviations

• Simple language

• Sentence length

• Use of tenses




• Diagnostic studies-STARD

• Observational studies-STROBE

• Genetic association studies-STREGA

• Systematic reviews and meta-analyses


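First, The Lancet and its family of journals all publish clinical and public health research, so purely basic research is very unlikely to be considered.

Second, as a general clinical and public health research journal, The Lancet tends to publish studies on common and frequently occurring diseases, especially those with major implications across disciplines; being too specialized is a likely reason for rejection.

Third, The Lancet stresses four requirements for original articles:

1. Novelty: is the study the first of its kind in its field? For example, we published a first-in-human study, "engineered autologous cartilage tissue for nasal reconstruction after tumour resection".

2. Clinical research is held to a high standard, so The Lancet often states in its calls for papers that randomized controlled clinical trials will be considered first, because they are the main source of evidence for evidence-based medicine; an example is our recently published large multinational, multicentre, randomized, double-blind controlled trial on breast cancer prevention in high-risk postmenopausal women.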







On writing up medical statistical analyses, he also has another book, 《How to Report Statistics in



(1) Data collection: Study data were collected on standard forms, checked for completeness, and double keyed into an …… database.

(2) Statistical software: All statistical analyses were performed using SAS version 9.2 (SAS Institute Inc, Cary, North Carolina).

(3) Descriptive statistics: …… were described using mean, median, standard deviation, and 25th and 75th percentiles for continuous variables; frequencies and proportions were used for categorical variables.

(4) Univariable analysis: A two-sample independent t test/ one-way analysis of variance (ANOVA)/ nonparametric tests (Kruskal-Wallis test)/ Pearson's χ2 tests or Fisher exact tests were used to compare the differences between …….

(5) Multivariable analysis: Multivariable linear regression/ multivariable binary logistic regression/ Cox proportional hazards models were used to estimate …….

(6) Significance level: A P value of less than 0.05 (2-sided significance testing) was considered statistically significant in all analyses.
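
The descriptive summary named in item (3) can be sketched with Python's standard library. The data and function names are hypothetical. Note that percentile definitions differ between packages, so the quartiles from `statistics.quantiles` (default "exclusive" method) may not match those from SAS or other software exactly.

```python
import statistics
from collections import Counter

def describe_continuous(values):
    """Mean, median, sample SD, and 25th/75th percentiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "p25": q1,
        "p75": q3,
    }

def describe_categorical(labels):
    """Frequency and proportion of each category."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (count, count / total) for label, count in counts.items()}

# Hypothetical data, for illustration only.
continuous = describe_continuous([1, 2, 3, 4, 5, 6, 7, 8])
categorical = describe_categorical(["male", "female", "female", "male", "female"])
```

Reporting the median with the 25th and 75th percentiles alongside the mean and SD lets a reader judge skewness, which is why the template asks for both.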


Statistics book




      How and why ...





  Exactly what you need to know about 

  statistical ideas and techniques

  Fundamental formulas and calculations

  Core topics in scope of applications


                                      Charles Cheng Xia, MD. MPH
















 e-df.com    enetfile.com    add321.com   e-huaxia.com    Copyrights ©  by EDF Solutions Inc. All rights reserved.