Nonparametric Statistics - Nonparametric Multiple Comparisons

Previous post: https://blog.naver.com/jjy0501/222616527304

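The Wilcoxon test is essentially the nonparametric counterpart of the t test, so it also comes with a multiple comparisons facility analogous to the pairwise t test. Base R provides pairwise.wilcox.test for this. Running many tests, of course, raises the risk of a type I error (rejecting a null hypothesis that is actually true). Even when there is no real association at all, 20 tests at the 0.05 level will on average produce about one P<0.05 result purely by chance. The p-values therefore have to be adjusted for multiple comparisons, and pairwise.wilcox.test supports this adjustment as well.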
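The size of this risk is easy to work out under the simplifying assumption that the tests are independent (pairwise comparisons that share groups are not strictly independent, so treat the numbers as a rough guide). The variable names below are just for illustration.

```r
# Family-wise error rate for k independent tests at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^k
alpha <- 0.05
k <- c(1, choose(4, 2), 20)   # 1 test, all 6 pairs among 4 groups, 20 tests
fwer <- 1 - (1 - alpha)^k
expected_false_positives <- alpha * k

data.frame(tests = k, fwer = round(fwer, 3), expected_false_positives)
```

With 20 tests the expected number of false positives is 1, and the chance of at least one is roughly 64%.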

pairwise t test : https://blog.naver.com/jjy0501/221132684701

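A quick aside, since several similar names come up and are easy to confuse: the Wilcoxon signed-rank test is mainly used for two paired samples, or for a single sample measured twice, for example before and after treatment. Multiple comparisons, by contrast, mean that there are several separate samples (in practice three or more groups, compared pair by pair).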

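So pairwise.wilcox.test performs multiple comparisons with the Wilcoxon rank-sum test. Because that name is so easily confused with the Wilcoxon signed-rank test, the rank-sum test is often called the Mann-Whitney U test instead, which if anything makes things even more confusing.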
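In R both tests live in the same function, wilcox.test, and the paired argument decides which one runs. The sketch below uses made-up numbers (not data from this post) to show the two variants, and checks that the W statistic reported for the rank-sum test is the Mann-Whitney U statistic.

```r
# Two independent samples (made-up numbers, no ties)
x <- c(1.1, 2.3, 3.8, 5.0)
y <- c(0.9, 2.0, 2.9, 4.1, 6.2)

rank_sum <- wilcox.test(x, y)          # paired = FALSE is the default
rank_sum$method                        # method name mentions "rank sum"

# The reported W equals the Mann-Whitney U statistic:
# (sum of the ranks of x in the pooled sample) - n_x * (n_x + 1) / 2
r <- rank(c(x, y))
u <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2
c(W = unname(rank_sum$statistic), U = u)

# Paired measurements (hypothetical before/after values on the same subjects)
before <- c(7.1, 6.8, 7.4, 7.9, 6.5)
after  <- c(6.6, 6.9, 6.8, 7.0, 6.1)
signed_rank <- wilcox.test(before, after, paired = TRUE)
signed_rank$method                     # method name mentions "signed rank"
```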

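Anyway, here we take pH measurements from four ponds and test the null hypothesis that the values do not differ between ponds (in other words, the data are not paired). Each pond was measured 8 times, so every group has fewer than 10 observations, and a nonparametric approach is a reasonable choice.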

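# pH measured 8 times in each of four ponds
ponds <- data.frame(pond=as.factor(rep(1:4,each=8)),
                    pH=c(7.68,7.69,7.70,7.70,7.72,7.73,7.73,7.76,   # pond 1
                         7.71,7.73,7.74,7.74,7.78,7.78,7.80,7.81,   # pond 2
                         7.74,7.75,7.77,7.78,7.80,7.81,7.84,7.86,   # pond 3
                         7.71,7.71,7.74,7.79,7.81,7.85,7.87,7.91))  # pond 4
ponds

# Pairwise Wilcoxon rank-sum tests; exact=FALSE uses the normal approximation (the data contain ties)
pairwise.wilcox.test(ponds$pH, ponds$pond, exact= FALSE)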

Running this gives:

> ponds <- data.frame(pond=as.factor(rep(1:4,each=8)),
+ pH=c(7.68,7.69,7.70,7.70,7.72,7.73,7.73,7.76,
+ 7.71,7.73,7.74,7.74,7.78,7.78,7.80,7.81,
+ 7.74,7.75,7.77,7.78,7.80,7.81,7.84,7.86,
+ 7.71,7.71,7.74,7.79,7.81,7.85,7.87,7.91))
> ponds
   pond   pH
1     1 7.68
2     1 7.69
3     1 7.70
4     1 7.70
5     1 7.72
6     1 7.73
7     1 7.73
8     1 7.76
9     2 7.71
10    2 7.73
11    2 7.74
12    2 7.74
13    2 7.78
14    2 7.78
15    2 7.80
16    2 7.81
17    3 7.74
18    3 7.75
19    3 7.77
20    3 7.78
21    3 7.80
22    3 7.81
23    3 7.84
24    3 7.86
25    4 7.71
26    4 7.71
27    4 7.74
28    4 7.79
29    4 7.81
30    4 7.85
31    4 7.87
32    4 7.91

> pairwise.wilcox.test(ponds$pH, ponds$pond, exact= FALSE)

Pairwise comparisons using Wilcoxon rank sum test with continuity correction

data: ponds$pH and ponds$pond

  1     2     3
2 0.066 -     -
3 0.012 0.460 -
4 0.071 0.682 0.958

P value adjustment method: holm

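Interpreting the result: ponds 1 and 3 differ significantly in pH at P<0.05, while none of the other pairs do. Interestingly, the output also shows that the default adjustment method is Holm. Bonferroni is the commonly used correction, but it applies a very strict criterion and so is more likely to lead to the conclusion that a difference is not statistically significant, especially when the sample size is small.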
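The same reading can be pulled out of the result object instead of from the printed table. This is a small sketch that assumes the pond data above; pmat and hits are names made up for illustration, while p.value and p.adjust.method are components of the object returned by pairwise.wilcox.test.

```r
ponds <- data.frame(
  pond = as.factor(rep(1:4, each = 8)),
  pH = c(7.68, 7.69, 7.70, 7.70, 7.72, 7.73, 7.73, 7.76,
         7.71, 7.73, 7.74, 7.74, 7.78, 7.78, 7.80, 7.81,
         7.74, 7.75, 7.77, 7.78, 7.80, 7.81, 7.84, 7.86,
         7.71, 7.71, 7.74, 7.79, 7.81, 7.85, 7.87, 7.91)
)

res <- pairwise.wilcox.test(ponds$pH, ponds$pond, exact = FALSE)
pmat <- res$p.value                        # rows: ponds 2-4, columns: ponds 1-3

# Positions of adjusted p-values below 0.05 (NA cells are skipped by which())
hits <- which(pmat < 0.05, arr.ind = TRUE)
data.frame(
  pond_a = colnames(pmat)[hits[, "col"]],
  pond_b = rownames(pmat)[hits[, "row"]],
  p_adj  = round(pmat[hits], 3)
)

res$p.adjust.method                        # the adjustment that was applied
```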

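That is why R uses Holm, a less conservative correction, by default. The adjustment methods available in R are described in the p.adjust documentation as follows.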

The adjustment methods include the Bonferroni correction ("bonferroni") in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also included by Holm (1979) ("holm"), Hochberg (1988) ("hochberg"), Hommel (1988) ("hommel"), Benjamini & Hochberg (1995) ("BH" or its alias "fdr"), and Benjamini & Yekutieli (2001) ("BY"), respectively. A pass-through option ("none") is also included. The set of methods are contained in the p.adjust.methods vector for the benefit of methods that need to have the method as an option and pass it on to p.adjust.

The first four methods are designed to give strong control of the family-wise error rate. There seems no reason to use the unmodified Bonferroni correction because it is dominated by Holm's method, which is also valid under arbitrary assumptions.

Hochberg's and Hommel's methods are valid when the hypothesis tests are independent or when they are non-negatively associated (Sarkar, 1998; Sarkar and Chang, 1997). Hommel's method is more powerful than Hochberg's, but the difference is usually small and the Hochberg p-values are faster to compute.

The "BH" (aka "fdr") and "BY" methods of Benjamini, Hochberg, and Yekutieli control the false discovery rate, the expected proportion of false discoveries amongst the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others.

Note that you can set n larger than length(p) which means the unobserved p-values are assumed to be greater than all the observed p for "bonferroni" and "holm" methods and equal to 1 for the other methods.

p.adjust.methods = c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none")
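To make the description above concrete, the following sketch recomputes the Bonferroni and Holm adjustments by hand and compares them with p.adjust. The raw p-values are made up for illustration, and bonf_manual and holm_manual are hypothetical helper variables, not part of R.

```r
p <- c(0.002, 0.013, 0.018, 0.077, 0.171, 0.479)  # made-up raw p-values
m <- length(p)

# Bonferroni: multiply every p-value by the number of comparisons, cap at 1
bonf_manual <- pmin(1, p * m)

# Holm: multiply the i-th smallest p-value by (m - i + 1),
# then enforce monotonicity with a running maximum and cap at 1
o <- order(p)
holm_sorted <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))
holm_manual <- holm_sorted[order(o)]      # back to the original order

cbind(raw = p,
      bonferroni = p.adjust(p, "bonferroni"), bonf_manual,
      holm = p.adjust(p, "holm"), holm_manual,
      BH = p.adjust(p, "BH"))
```

The Holm column is never larger than the Bonferroni column, which is what the documentation means when it says Bonferroni is dominated by Holm's method.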

This was news to me too: R does not provide the Holm-Šídák correction out of the box. The authors of the paper below likewise used not R but GraphPad Prism, a separate graphing and statistics package.

Bates TA, McBride SK, Winders B, et al. Antibody Response and Variant Cross-Neutralization After SARS-CoV-2 Breakthrough Infection. JAMA. Published online December 16, 2021. doi:10.1001/jama.2021.22898

https://www.graphpad.com/guides/prism/latest/statistics/stat_holms_multiple_comparison_test.htm

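GraphPad Prism's documentation describes the Holm-Šídák correction as more powerful than the Holm correction, increasingly so as the number of comparisons grows. I spent a lot of time looking for a way to do the same in R, but in the end I did not succeed. There is probably a way, but since I never actually use nonparametric statistics in practice, there are clearly parts I do not know well.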
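For reference, one possible route is to take the unadjusted p-values from pairwise.wilcox.test and apply the step-down Šídák formula by hand. The sketch below assumes the usual definition (the i-th smallest of m p-values becomes 1 - (1 - p)^(m - i + 1), followed by a running maximum); holm_sidak is a hypothetical helper written for this post, not a base R function, and the result has not been checked against GraphPad Prism.

```r
# Step-down Sidak ("Holm-Sidak") adjustment, written by hand.
# holm_sidak() is a hypothetical helper, not part of base R.
holm_sidak <- function(p) {
  m <- length(p)
  o <- order(p)
  # i-th smallest p-value: 1 - (1 - p)^(m - i + 1)
  adj <- 1 - (1 - p[o])^(m - seq_len(m) + 1)
  # running maximum keeps the adjusted values monotone; cap at 1
  pmin(1, cummax(adj))[order(o)]
}

ponds <- data.frame(
  pond = as.factor(rep(1:4, each = 8)),
  pH = c(7.68, 7.69, 7.70, 7.70, 7.72, 7.73, 7.73, 7.76,
         7.71, 7.73, 7.74, 7.74, 7.78, 7.78, 7.80, 7.81,
         7.74, 7.75, 7.77, 7.78, 7.80, 7.81, 7.84, 7.86,
         7.71, 7.71, 7.74, 7.79, 7.81, 7.85, 7.87, 7.91)
)

# Unadjusted pairwise p-values, then the six entries of the lower triangle
raw <- pairwise.wilcox.test(ponds$pH, ponds$pond, exact = FALSE,
                            p.adjust.method = "none")$p.value
p <- raw[lower.tri(raw, diag = TRUE)]

round(cbind(raw = p,
            holm = p.adjust(p, "holm"),
            holm_sidak = holm_sidak(p)), 3)
```

Since 1 - (1 - p)^k is never larger than k * p, the Holm-Šídák values computed this way are never larger than the Holm values, which fits the claim that it is the slightly more powerful of the two.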

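In any case, to use Bonferroni or another correction instead of Holm, specify it with the p.adjust.method argument (the accepted names are the ones listed in p.adjust.methods, and an abbreviation such as "bonf" works through partial matching).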

pairwise.wilcox.test(ponds$pH, ponds$pond, exact= FALSE, p.adjust.method = "bonf")

> pairwise.wilcox.test(ponds$pH, ponds$pond, exact= FALSE, p.adjust.method = "bonf")

Pairwise comparisons using Wilcoxon rank sum test with continuity correction

data: ponds$pH and ponds$pond

  1     2     3
2 0.079 -     -
3 0.012 0.919 -
4 0.107 1.000 1.000

P value adjustment method: bonferroni

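With the Bonferroni method the conclusions are much the same, but the P values are larger for every pair except ponds 1 and 3, which shows that it applies a stricter standard.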
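The two sets of adjusted p-values are easier to compare side by side. The sketch below rebuilds the pond data and lines up the Holm and Bonferroni results for each pair; holm, bonf, idx and comparison are names chosen here for illustration.

```r
ponds <- data.frame(
  pond = as.factor(rep(1:4, each = 8)),
  pH = c(7.68, 7.69, 7.70, 7.70, 7.72, 7.73, 7.73, 7.76,
         7.71, 7.73, 7.74, 7.74, 7.78, 7.78, 7.80, 7.81,
         7.74, 7.75, 7.77, 7.78, 7.80, 7.81, 7.84, 7.86,
         7.71, 7.71, 7.74, 7.79, 7.81, 7.85, 7.87, 7.91)
)

holm <- pairwise.wilcox.test(ponds$pH, ponds$pond, exact = FALSE)$p.value
bonf <- pairwise.wilcox.test(ponds$pH, ponds$pond, exact = FALSE,
                             p.adjust.method = "bonferroni")$p.value

# Row/column positions of the six filled cells of the p-value matrix
idx <- which(!is.na(holm), arr.ind = TRUE)
comparison <- data.frame(
  pair = paste(colnames(holm)[idx[, "col"]], "vs", rownames(holm)[idx[, "row"]]),
  holm = round(holm[idx], 3),
  bonferroni = round(bonf[idx], 3)
)
comparison
```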
