h1. Multiple comparisons of survival curves

p(meta). Ames, Iowa, USA

When you compare three or more survival curves at once, the test of equality over strata reports a single p-value testing the null hypothesis that all the samples come from populations with identical survival, and that all observed differences are due to chance. You may also want to drill down and compare curves two at a time. If we set α = 0.05 for the first test, that means:

0.05 = α = P[reject H0 for test one | H0 is true for test one]

If all the null hypotheses are true, you would still expect about 5% of the comparisons to be rejected by chance (i.e., to have uncorrected p-values < 0.05). In other words, making many comparisons increases the risk of a type I error.

If you only make a few planned comparisons, corrections for multiple comparisons may not be needed. But if the number of comparisons is large and you don't adjust for it, it is easy to fool yourself. You must be honest about the number of comparisons you are making.

Say there are 7 treatment groups (including control). You then go back and compare the group with the longest survival to the group with the shortest survival. It is not fair to say that you are only making one comparison, since you could not have decided which comparison to make without looking at all the data. With 7 groups, there are 21 possible pairwise comparisons; you have implicitly made all of them, so you should take the number of comparisons K as 21. If you were only interested in comparing each of the 6 treatments to the control, and were not interested in comparing the treatments with each other, then you would be making six comparisons, so you should set K equal to 6.

Now, if 21 comparisons are being made, you should expect roughly one of them to exceed the 95% confidence bound by chance alone, since a 5% error rate means about one false rejection per 20 tests. You can no longer conclude that the difference is real: the 95% confidence level applies to a single test, and when there are multiple comparisons that confidence no longer holds.
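The comparison counts above follow directly from combinatorics. A quick sketch in Python (illustrative only, not part of the SAS analysis):

```python
from math import comb

# All possible pairwise comparisons among 7 groups: C(7, 2)
k_all_pairs = comb(7, 2)

# Comparing each of the 6 treatments only to the control
k_vs_control = 7 - 1

print(k_all_pairs)    # 21
print(k_vs_control)   # 6
```

This is why peeking at all the curves before picking "the biggest difference" implicitly commits you to K = 21, not K = 1.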

Several methods account for multiple comparisons, such as the Bonferroni, Tukey, and Sidak corrections, or the approach of controlling the False Discovery Rate. The following is sample output from SAS, including an overall test and multiple comparisons for the log-rank test with the Sidak correction.

```
                Test of Equality over Strata

                                           Pr >
        Test      Chi-Square    DF    Chi-Square
        Log-Rank      4.6015     2        0.1002

----------------------------------------------------------------

   Adjustment for Multiple Comparisons for the Logrank Test

      Strata Comparison                      p-Values
    V1             V1            Chi-Square     Raw      Sidak
    Depop/Repop    Herd closure      0.6356   0.4253    0.8102
    Depop/Repop    Startup           2.1446   0.1431    0.3707
    Herd closure   Startup           4.3056   0.0380    0.1097
```
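The Sidak column can be reproduced by hand: with k comparisons, the adjusted p-value is 1 − (1 − p_raw)^k. A quick check in Python, using the raw p-values from the output above:

```python
def sidak_adjust(p_raw, k):
    """Sidak-adjusted p-value for one of k comparisons."""
    return 1 - (1 - p_raw) ** k

raw = [0.4253, 0.1431, 0.0380]  # raw log-rank p-values from the SAS output
k = len(raw)                    # 3 pairwise comparisons

for p in raw:
    print(round(sidak_adjust(p, k), 4))
```

The results match the Sidak column up to rounding (SAS adjusts the unrounded raw p-values).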

Here is the SAS code.

```
/* Run a stratified log-rank test with Sidak adjustment
   for each of the variables V1 through V&number. */
%macro create(number);
  %do i=1 %to &number;
    ods graphics on;
    proc lifetest data=data_select plots=survival;
      time time*cen(1);                        /* cen=1 marks censored observations */
      strata V&i / test=logrank adjust=sidak;  /* overall test plus Sidak-adjusted pairwise comparisons */
    run;
    ods graphics off;
  %end;
%mend create;

%create(96)
```

α_t = P[reject H0 for one of the k tests | H0 is true for all tests]

We call this quantity the family-wise (or experiment-wise) error rate. The α for each individual test is called the comparison-wise error rate. The family (or experiment), in this case, is made up of the k individual comparisons. Using the rules of probability, and the fact that we assumed the tests are independent for this example, we can calculate what α_t would be if we used α = 0.05 for the comparison-wise rate.

α_t = 1 − (1 − α)(1 − α)…(1 − α) = 1 − (1 − α)^k

If we choose α = 0.05 and k = 20, we can calculate α_t = 0.64 (i.e., the chance of making at least one error is 64%). Sidak's formula inverts this: given a desired α_t, use α = 1 − (1 − α_t)^(1/k) for each comparison. In the situation above, if we wanted α_t = 0.05, this formula shows that we need to use α ≈ 0.00256.
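Both directions of the calculation can be sketched in a few lines of Python:

```python
def familywise_alpha(alpha, k):
    """Family-wise error rate for k independent tests at comparison-wise alpha."""
    return 1 - (1 - alpha) ** k

def sidak_alpha(alpha_t, k):
    """Sidak per-comparison alpha achieving a desired family-wise rate alpha_t."""
    return 1 - (1 - alpha_t) ** (1 / k)

print(round(familywise_alpha(0.05, 20), 2))  # 0.64
print(round(sidak_alpha(0.05, 20), 5))       # 0.00256
```

Note how quickly the family-wise rate grows: at α = 0.05, twenty independent tests already push it to 64%.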

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.