Ranking tests:
- Null hypothesis (H0): The results of all compared algorithms (two or more) have the same mean.
- Use Friedman Aligned Ranks when the number of groups (algorithms) is small (fewer than 4).
- Use Quade to take into account the difficulty of each sample (dataset): datasets whose results span a wider range receive more weight.
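A minimal sketch of the two statistics, using only the standard library. It assumes the results are given as a list of rows, with one row per dataset and one column per algorithm. The function names `average_ranks`, `friedman_aligned_ranks` and `quade` are illustrative and do not come from any library. The p-value lookup is left out. The aligned-ranks statistic is compared against a chi-square distribution with k - 1 degrees of freedom. The Quade statistic is compared against an F distribution with k - 1 and (k - 1)(n - 1) degrees of freedom.

```python
from typing import List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the first one in it.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_aligned_ranks(data: List[List[float]]) -> Tuple[float, int]:
    """Friedman Aligned Ranks statistic; data[i][j] = result of algorithm j on dataset i."""
    n, k = len(data), len(data[0])
    # Align each observation by subtracting its dataset's mean result.
    aligned = [x - sum(row) / k for row in data for x in row]
    # Rank all k*n aligned observations together, then reshape into n rows of k.
    flat = average_ranks(aligned)
    ranks = [flat[i * k:(i + 1) * k] for i in range(n)]
    col_totals = [sum(ranks[i][j] for i in range(n)) for j in range(k)]
    row_totals = [sum(row) for row in ranks]
    kn = k * n
    numerator = (k - 1) * (sum(t * t for t in col_totals) - (k * n * n / 4) * (kn + 1) ** 2)
    denominator = kn * (kn + 1) * (2 * kn + 1) / 6 - sum(t * t for t in row_totals) / k
    # Compared against a chi-square distribution with k - 1 degrees of freedom.
    return numerator / denominator, k - 1


def quade(data: List[List[float]]) -> Tuple[float, int, int]:
    """Quade statistic; datasets with a wider range of results get larger weights."""
    n, k = len(data), len(data[0])
    # Weight of each dataset: rank of its range (max - min) among all datasets.
    weights = average_ranks([max(row) - min(row) for row in data])
    # Within-dataset ranks, centred on (k + 1) / 2 and multiplied by the dataset weight.
    s = [[weights[i] * (r - (k + 1) / 2) for r in average_ranks(data[i])] for i in range(n)]
    a = sum(v * v for row in s for v in row)
    col_sums = [sum(s[i][j] for i in range(n)) for j in range(k)]
    b = sum(c * c for c in col_sums) / n
    # Compared against an F distribution with k - 1 and (k - 1)(n - 1) degrees of freedom.
    return (n - 1) * b / (a - b), k - 1, (k - 1) * (n - 1)


if __name__ == "__main__":
    # Three datasets (rows) x three algorithms (columns).
    results = [[1, 2, 3], [1, 3, 5], [1, 4, 7]]
    print(friedman_aligned_ranks(results))
    print(quade(results))
```

If SciPy is installed, `scipy.stats.chi2.sf(stat, df)` gives the p-value for the aligned-ranks statistic. `scipy.stats.f.sf(stat, df1, df2)` gives the p-value for the Quade statistic.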
Post-hoc multiple comparisons:
- Null hypothesis (H0): For each pair of groups, the means of their results are equal.
- Bonferroni-Dunn is the least powerful but the most interpretable.
- Holm and Hochberg are similar in power and are the most widely used.
- Shaffer is the most powerful, followed by Finner.
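These procedures differ in how they convert the unadjusted p-values of the m comparisons into adjusted p-values. A hypothesis is rejected when its adjusted p-value is below alpha. For comparisons against a control, m = k - 1. For all pairwise comparisons, m = k(k - 1)/2. The sketch below uses a hypothetical `adjust_pvalues` helper in plain Python. It covers the Bonferroni-Dunn, Holm (step-down), Hochberg (step-up) and Finner adjustments. Shaffer is left out because it also needs the logical relations among the pairwise hypotheses.

```python
from typing import List


def adjust_pvalues(pvalues: List[float], method: str) -> List[float]:
    """Adjusted p-values for m simultaneous comparisons, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    p = [pvalues[i] for i in order]  # ascending: p_1 <= ... <= p_m in the usual notation
    if method == "bonferroni-dunn":
        # Every p-value is multiplied by the number of comparisons.
        adj = [min(m * x, 1.0) for x in p]
    elif method == "holm":
        # Step-down: (m - j + 1) * p_j, made non-decreasing with a running max.
        adj, running = [], 0.0
        for j, x in enumerate(p, start=1):
            running = max(running, (m - j + 1) * x)
            adj.append(min(running, 1.0))
    elif method == "hochberg":
        # Step-up: same factors as Holm, but a running min from the largest p-value down.
        adj, running = [0.0] * m, 1.0
        for j in range(m, 0, -1):
            running = min(running, (m - j + 1) * p[j - 1])
            adj[j - 1] = running
    elif method == "finner":
        # Step-down: 1 - (1 - p_j) ** (m / j), made non-decreasing with a running max.
        adj, running = [], 0.0
        for j, x in enumerate(p, start=1):
            running = max(running, 1 - (1 - x) ** (m / j))
            adj.append(min(running, 1.0))
    else:
        raise ValueError(f"unknown method: {method}")
    # Scatter the adjusted values back to the caller's original order.
    result = [0.0] * m
    for pos, idx in enumerate(order):
        result[idx] = adj[pos]
    return result


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]
    for name in ("bonferroni-dunn", "holm", "hochberg", "finner"):
        print(name, adjust_pvalues(raw, name))
```

Take the raw p-values `[0.01, 0.04, 0.03]` as an example. Holm gives approximately `[0.03, 0.06, 0.06]`. Hochberg gives approximately `[0.03, 0.04, 0.04]`, so it never rejects fewer hypotheses than Holm.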