Guillaume Rochefort-Maranda
The severity score is particularly high for hypotheses that differ substantially from the null hypothesis when a significant result is obtained with an underpowered test. According to that measure, such hypotheses are very well supported by the evidence. However, it is now well documented that significant results from tests with low power display inflated effect sizes: they systematically show departures from the null hypothesis H0 that are much greater than the true departures. This is problematic in research contexts where the difference between H0 and H1 is particularly small and where the sample size is also small. In this paper I argue that the severity score is an inadequate measure of evidence and that it should be rejected. The reason is that it is sensitive to the inflated effect sizes provided by underpowered significant tests: inflated effect sizes also inflate severity scores.
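The mechanism can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's own analysis: it assumes a one-sided z-test of H0: mu = 0 with known standard deviation, and it assumes the usual post-rejection severity formula for the claim "mu > mu1", namely the probability of observing a sample mean no larger than the one obtained if mu were equal to mu1. The function name `simulate` and all parameter values (true effect 0.1, n = 20, claimed effect 0.2) are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean


def simulate(true_mu=0.1, sigma=1.0, n=20, alpha=0.05,
             claim_mu=0.2, trials=20000, seed=0):
    """Simulate an underpowered one-sided z-test of H0: mu = 0.

    All parameter values are illustrative assumptions. `claim_mu` is a
    hypothesized effect twice as large as the true one, so the claim
    "mu > claim_mu" is false by construction.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / n ** 0.5                           # standard error of the mean
    cutoff = std_normal.inv_cdf(1 - alpha) * se     # smallest significant mean
    power = 1 - std_normal.cdf((cutoff - true_mu) / se)

    # Draw sample means directly from their sampling distribution and keep
    # only those that reach statistical significance.
    means = (rng.gauss(true_mu, se) for _ in range(trials))
    significant = [m for m in means if m > cutoff]

    # Assumed severity formula for the claim "mu > claim_mu" after a rejection:
    # P(sample mean <= observed mean; mu = claim_mu).
    severities = [std_normal.cdf((m - claim_mu) / se) for m in significant]

    return {
        "power": power,
        "true_effect": true_mu,
        "mean_significant_effect": mean(significant),
        "min_severity_for_claim": min(severities),
        "mean_severity_for_claim": mean(severities),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the test has low power, every significant sample mean necessarily exceeds the significance cutoff, which is itself several times the true effect, and the false claim that the effect exceeds twice its true value nonetheless receives a high severity score in every significant replication.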

