Ranking:

**Null hypothesis (H<sub>0</sub>):** The means of the results of two or more algorithms are the same.

- Use Aligned Ranks when the number of groups is low (fewer than 4).
- Use Quade to take into account the difficulty of each sample (dataset).
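
As a rough illustration of what a ranking test computes, the sketch below implements the plain Friedman ranking in pure Python: each dataset's results are ranked across algorithms, the ranks are averaged per algorithm, and the chi-squared statistic is derived from those averages. The function names and the score table are hypothetical, higher scores are assumed to be better, and no tie correction is applied to the statistic.

```python
def average_ranks(row):
    """Rank one dataset's scores (1 = best, i.e. highest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman(scores):
    """scores[i][j] = result of algorithm j on dataset i (assumed: higher is better)."""
    n, k = len(scores), len(scores[0])
    ranked = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in ranked) / n for j in range(k)]
    # Friedman statistic, without tie correction.
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Hypothetical accuracies: 4 datasets (rows) x 3 algorithms (columns).
scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.79],
    [0.75, 0.78, 0.70],
    [0.92, 0.89, 0.85],
]
mean_ranks, chi2 = friedman(scores)
print(mean_ranks, chi2)
```

Under the null hypothesis the statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom, where k is the number of algorithms, so a large value suggests that at least one algorithm behaves differently.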

Post-hoc with control method:

**Null hypothesis (H<sub>0</sub>):** The mean of the results of the control method is equal to that of each other group (compared in pairs).

- Bonferroni-Dunn is the least powerful but the most interpretable, followed by Li, which has more power.
- Holm and Hochberg are similar in power and the most widely used.
- Finner has the best power but is less interpretable.
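
To make the difference between these procedures concrete, the following sketch computes adjusted p-values for Bonferroni-Dunn, Holm, Hochberg, and Finner from a list of unadjusted p-values (one per comparison against the control). It is an illustrative pure-Python version under the usual textbook definitions, not this project's implementation; the function name `adjust` and the input p-values are made up for the example, and Li is left out.

```python
def adjust(p_values, method):
    """Return adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    sorted_p = [p_values[i] for i in order]

    if method == "bonferroni":
        raw = [min(1.0, m * p) for p in sorted_p]
    elif method in ("holm", "hochberg"):
        # The j-th smallest p-value (0-based) is multiplied by m - j.
        raw = [min(1.0, (m - j) * p) for j, p in enumerate(sorted_p)]
    elif method == "finner":
        raw = [1 - (1 - p) ** (m / (j + 1)) for j, p in enumerate(sorted_p)]
    else:
        raise ValueError(f"unknown method: {method}")

    if method == "hochberg":
        # Step-up: running minimum, starting from the largest p-value.
        for j in range(m - 2, -1, -1):
            raw[j] = min(raw[j], raw[j + 1])
    else:
        # Step-down: running maximum, starting from the smallest p-value
        # (this leaves the Bonferroni values unchanged).
        for j in range(1, m):
            raw[j] = max(raw[j], raw[j - 1])

    adjusted = [0.0] * m
    for pos, i in enumerate(order):
        adjusted[i] = raw[pos]
    return adjusted


# Hypothetical unadjusted p-values for three comparisons against the control.
p = [0.01, 0.04, 0.03]
results = {name: adjust(p, name)
           for name in ("bonferroni", "holm", "hochberg", "finner")}
for name, values in results.items():
    print(name, [round(v, 4) for v in values])
```

A hypothesis is rejected when its adjusted p-value falls below the chosen significance level. For any input, the Bonferroni-Dunn values are never smaller than the Holm values, which in turn are never smaller than the Hochberg values, which matches the power ordering described above.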

References

- Ranking:
  - **Friedman:** M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Journal of the American Statistical Association 32 (1937) 674–701.
  - **Friedman Aligned Ranks:** J.L. Hodges, E.L. Lehmann, Rank methods for combination of independent experiments in analysis of variance, Annals of Mathematical Statistics 33 (1962) 482–497.
  - **Quade:** D. Quade, Using weighted rankings in the analysis of complete blocks with additive block effects, Journal of the American Statistical Association 74 (1979) 680–683.
- Post-hoc:
  - **Bonferroni-Dunn:** O.J. Dunn, Multiple comparisons among means, Journal of the American Statistical Association 56 (1961) 52–64.
  - **Holm:** S. Holm, A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics 6 (1979) 65–70.
  - **Hochberg:** Y. Hochberg, A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75 (1988) 800–803.
  - **Finner:** H. Finner, On a monotonicity problem in step-down multiple test procedures, Journal of the American Statistical Association 88 (1993) 920–923.
  - **Li:** J. Li, A two-step rejection procedure for testing multiple hypotheses, Journal of Statistical Planning and Inference 138 (2008) 1521–1527.