Non-parametric tests for multiple groups: one vs. all

Ranking:

  • Null hypothesis (H0): The means of the results of two or more algorithms are the same.
  • Use Friedman Aligned Ranks when the number of groups is low (fewer than 4).
  • Use Quade to account for how difficult each sample (dataset) is.
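
As an illustration of the ranking step, here is a minimal sketch of the plain Friedman test in pure Python. It assumes a results table with one row per dataset and one column per algorithm, assumes higher scores are better, and uses the basic statistic without a tie correction. The function names (rank_row, average_ranks, chi2_sf, friedman_test) and the sample scores are made up for this example and do not come from any particular library.

```python
import math

def rank_row(row, higher_is_better=True):
    """Rank the results on one dataset; rank 1 is best, ties share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=higher_is_better)
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def average_ranks(results, higher_is_better=True):
    """Average rank of each algorithm (column) over all datasets (rows)."""
    n, k = len(results), len(results[0])
    rows = [rank_row(r, higher_is_better) for r in results]
    return [sum(r[j] for r in rows) / n for j in range(k)]

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(
        h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )
    return tail

def friedman_test(results, higher_is_better=True):
    """Return (average ranks, Friedman chi-square statistic, p-value with k-1 df)."""
    n, k = len(results), len(results[0])
    avg = average_ranks(results, higher_is_better)
    stat = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0)
    return avg, stat, chi2_sf(stat, k - 1)

# Hypothetical accuracy scores: 3 datasets (rows) x 3 algorithms (columns).
scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.75, 0.60],
    [0.95, 0.90, 0.50],
]
avg, stat, p = friedman_test(scores)
print(avg, stat, p)
```

The average ranks returned here are the input that the post-hoc procedures work from. Aligned Ranks and Quade use different ranking schemes and are not covered by this sketch.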

Post-hoc with control method:

  • Null hypothesis (H0): The mean of the results of the control method is equal to that of each other group (compared in pairs).
  • Bonferroni-Dunn is the least powerful but the most interpretable, followed by Li, which has more power.
  • Holm and Hochberg are similar in power and are the most widely used.
  • Finner has the best power but is less interpretable.
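
The comparison against a control can be sketched as follows, assuming the average ranks come from a Friedman ranking over n datasets. Each method is compared with the control through a normal approximation on the rank difference, and the resulting p-values are then adjusted. The function names (control_p_values, adjust) and the example ranks are hypothetical. The adjustment formulas are the commonly used adjusted-p-value forms for Bonferroni-Dunn, Holm, Hochberg and Finner; Li is left out of this sketch.

```python
import math

def control_p_values(avg_ranks, n_datasets, control=0):
    """Two-sided p-values for each method vs. the control, from Friedman average ranks."""
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6.0 * n_datasets))  # standard error of a rank difference
    p = {}
    for j, r in enumerate(avg_ranks):
        if j == control:
            continue
        z = (avg_ranks[control] - r) / se
        p[j] = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return p

def adjust(p_values, method="holm"):
    """Adjusted p-values for m comparisons against a control.

    p_values maps a method label to its unadjusted p-value.
    """
    items = sorted(p_values.items(), key=lambda kv: kv[1])  # ascending p
    m = len(items)
    raw = []
    for i, (_, p) in enumerate(items, start=1):
        if method == "bonferroni":
            raw.append(m * p)
        elif method in ("holm", "hochberg"):
            raw.append((m - i + 1) * p)
        elif method == "finner":
            raw.append(1.0 - (1.0 - p) ** (m / i))
        else:
            raise ValueError("unknown method: " + method)
    adj = raw[:]
    if method == "hochberg":
        # Step-up: enforce monotonicity starting from the largest p-value.
        for i in range(m - 2, -1, -1):
            adj[i] = min(adj[i], adj[i + 1])
    else:
        # Step-down: enforce monotonicity starting from the smallest p-value
        # (has no effect for Bonferroni, whose values are already ordered).
        for i in range(1, m):
            adj[i] = max(adj[i], adj[i - 1])
    return {label: min(1.0, a) for (label, _), a in zip(items, adj)}

# Hypothetical average ranks for 4 methods over 10 datasets; method 0 is the control.
p_raw = control_p_values([1.4, 2.3, 2.9, 3.4], n_datasets=10, control=0)
for name in ("bonferroni", "holm", "hochberg", "finner"):
    print(name, adjust(p_raw, name))
```

A hypothesis is rejected when its adjusted p-value falls below the chosen significance level, so for the same input the procedures with smaller adjusted values reject more hypotheses.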
References
  • Ranking:
    • Friedman: M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Journal of the American Statistical Association 32 (1937) 674–701.
    • Friedman Aligned Ranks: J.L. Hodges, E.L. Lehmann, Rank methods for combination of independent experiments in analysis of variance, Annals of Mathematical Statistics 33 (1962) 482–497.
    • Quade: D. Quade, Using weighted rankings in the analysis of complete blocks with additive block effects, Journal of the American Statistical Association 74 (1979) 680–683.
  • Post-hoc:
    • Bonferroni-Dunn: O.J. Dunn, Multiple comparisons among means, Journal of the American Statistical Association 56 (1961) 52–64.
    • Holm: S. Holm, A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics 6 (1979) 65–70.
    • Hochberg: Y. Hochberg, A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75 (1988) 800–803.
    • Finner: H. Finner, On a monotonicity problem in step-down multiple test procedures, Journal of the American Statistical Association 88 (1993) 920–923.
    • Li: J. Li, A two-step rejection procedure for testing multiple hypotheses, Journal of Statistical Planning and Inference 138 (2008) 1521–1527.