This document provides all details needed to reproduce the experiments reported in the paper D. Losada, J. Parapar, A. Barreiro. “Feeling Lucky? Multi-armed Bandits for Ordering Judgements in Pooling-based Evaluation”. ACM Symposium on Applied Computing, 2016.

Any scientific publication derived from the use of this software should explicitly refer to this ACM SAC paper.

Next, we explain the data used for experimentation and provide our R code, which implements all pooling strategies.


We used four TREC collections: TREC5, TREC6, TREC7 and TREC8.

NIST kindly provided the runs that contributed to the pools of the adhoc tasks of TREC5, TREC6, TREC7 and TREC8 (

The pooled runs are archived by NIST within a password-protected area. To reproduce our experiments, you need to request access to the protected area (follow the instructions given at


This section provides the R code needed for experimentation.

All pooling strategies are implemented in pooling_strategies.R. We also provide another script, process_multiple_queries.R, which shows an example of how to process multiple queries. Instructions for processing multiple queries are provided below.

Besides some auxiliary functions, pooling_strategies.R contains the following R functions:

Multiple queries

Instructions for experimenting with multiple queries (the example below is included in the file process_multiple_queries.R).


  1. Store all pooled runs in a local folder (pool_folder). Only files whose names contain "input" are read.
  2. Store the official qrel file in another folder (qrels_path is the path to this file).
  3. Call process_multiple_queries(pool_folder, qrels_path).
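
To make the expected inputs concrete, the following is a minimal sketch that builds a toy pool folder and a toy qrel file in a temporary directory and reads them back in the same way as the example script. It assumes the usual TREC layouts (runs: query, "Q0", document ID, rank, score, run tag; qrels: query, iteration, document ID, relevance). All file names, query IDs, document IDs and scores are made up for illustration.

```r
# Toy inputs in a temporary directory (all names and values are hypothetical).
pool_folder <- file.path(tempdir(), "toy_pool")
dir.create(pool_folder, showWarnings = FALSE)
qrels_path <- file.path(tempdir(), "toy_qrels.txt")

# Two tiny runs, assuming the usual TREC run layout:
# query, "Q0", document ID, rank, score, run tag
run_a <- data.frame(q = 301, q0 = "Q0", doc = c("D3", "D1"), rank = 1:2,
                    score = c(2.5, 1.0), tag = "runA")
run_b <- data.frame(q = 301, q0 = "Q0", doc = c("D1", "D2"), rank = 1:2,
                    score = c(0.9, 0.4), tag = "runB")
write.table(run_a, file.path(pool_folder, "input.runA"),
            quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(run_b, file.path(pool_folder, "input.runB"),
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# A tiny qrel file, assuming the usual TREC qrels layout:
# query, iteration, document ID, relevance
qrels <- data.frame(q = 301, iter = 0, doc = c("D1", "D2", "D3"),
                    rel = c(1, 0, 1))
write.table(qrels, qrels_path,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read everything back: one data frame per "input*" file, plus the qrels
files <- list.files(path = pool_folder, pattern = "input")
run_rankings <- lapply(file.path(pool_folder, files), read.table, header = FALSE)
qrels_df <- read.table(qrels_path, header = FALSE)
print(length(run_rankings))
print(nrow(qrels_df))
```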

The function process_multiple_queries processes all queries, aggregates the average number of relevant documents found at increasing numbers of judgments, and plots the result. The example invokes pooling_DOCID, but you can change this line to call any other pooling strategy from pooling_strategies.R.
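
As an illustration of what a replaceable strategy looks like, here is a minimal sketch of a function with the same argument order as the call pooling_DOCID(q, pool_depth, run_rankings). It is not the implementation from pooling_strategies.R: the function name toy_pooling_by_docid, the column names QUERY, DOC_ID and RANK, and the toy runs are all assumptions made for this example. It pools the top pool_depth documents of each run for a query and returns them ordered by document ID.

```r
# Hypothetical strategy, NOT the code from pooling_strategies.R.
# Assumes each run is a data frame with columns QUERY, DOC_ID and RANK.
toy_pooling_by_docid <- function(q, pool_depth, run_rankings) {
  pooled <- character(0)
  for (run in run_rankings) {
    # top pool_depth documents retrieved by this run for query q
    top <- run[run$QUERY == q & run$RANK <= pool_depth, ]
    pooled <- union(pooled, as.character(top$DOC_ID))
  }
  # judgment sequence: pooled documents sorted by document ID
  sort(pooled)
}

# Two toy runs for a single made-up query
run_a <- data.frame(QUERY = 301, DOC_ID = c("D3", "D1", "D9"), RANK = 1:3)
run_b <- data.frame(QUERY = 301, DOC_ID = c("D1", "D2", "D8"), RANK = 1:3)

judgments <- toy_pooling_by_docid(301, 2, list(run_a, run_b))
print(judgments)
```

Any function that takes the same three arguments and returns a vector of document IDs in judgment order could be substituted in the same place.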

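# pooling_DOCID and the other strategies come from pooling_strategies.R
# (assumed here to be in the working directory)
source("pooling_strategies.R")

process_multiple_queries <- function(pool_folder, qrels_path)
{
  # reads the qrel file into an R dataframe with appropriate column names
  # (standard TREC qrels layout; the column names themselves are an assumption)
  qrels_df = read.table(qrels_path,header=FALSE)
  names(qrels_df) = c("QUERY","DUMMY","DOC_ID","REL")
  print(paste("Qrel file...",qrels_path,"...",nrow(qrels_df)," judgments."))
  # reads "input*" files from pool_folder and stores them into a list of data frames (run_rankings)
  files <- list.files(path=pool_folder, pattern = "input")
  print(paste("Processing...",pool_folder,"...",length(files)," run files",sep=""))

  run_rankings = list()
  for (f in files){
    filepath = file.path(pool_folder,f)
    df = read.table(filepath,header=FALSE)
    # standard TREC run layout; these column names are an assumption and must
    # match the names expected by the functions in pooling_strategies.R
    names(df) = c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
    run_rankings[[length(run_rankings)+1]] = df
  } # files
  print(paste(length(run_rankings),"runs in the pool"))
  # assumed settings (not given in this listing): depth of the pool and
  # number of judgments per chunk (one plotted point per chunk)
  pool_depth = 100
  chunksize = 100
  # now, we proceed query by query, and aggregate the statistics of relevant docs found at different number of judgments
  queries = unique(qrels_df$QUERY)
  # for computing averages across queries: per-chunk sum of relevant docs and
  # number of queries that reach each chunk. A pool holds at most
  # pool_depth * (number of runs) documents, which bounds the number of chunks
  maxchunks = ceiling(pool_depth*length(run_rankings)/chunksize)
  accavg = rep(0,maxchunks)
  nqueries = rep(0,maxchunks)
  for (q in queries)
  {
    # this example produces the plot of pooling by DOCID.
    # just change this line to compute any other judgment sequence
    # (by invoking any other pooling strategy from pooling_strategies.R)
    judgments = pooling_DOCID(q, pool_depth, run_rankings)
    # data frame with the ranking of judgments and a chunk ID for each document
    chunks = ceiling(seq_along(judgments)/chunksize)
    current_ranking = data.frame(DOCID=judgments, CHUNK=chunks, REL=rep(NA,length(judgments)))
    # get the relevance assessments for the current query
    current_qrels = subset(qrels_df, QUERY==q)
    # assign the relevance column for each document in the sequence
    # (1 = judged relevant, 0 = judged non-relevant or not in the qrels)
    for (i in seq_along(judgments))
    {
      rel = current_qrels$REL[as.character(current_qrels$DOC_ID)==as.character(judgments[i])]
      current_ranking$REL[i] = if (length(rel)>0 && rel[1]>0) 1 else 0
    }
    print(paste("Query...",q,", pool size:", length(judgments), ". ", sum(current_ranking$REL)," docs are relevant.",sep="" ))
    rel_per_chunk = aggregate(REL~CHUNK, current_ranking, sum)
    # accumulate statistics
    accavg[rel_per_chunk$CHUNK] = accavg[rel_per_chunk$CHUNK] + rel_per_chunk$REL
    nqueries[rel_per_chunk$CHUNK] = nqueries[rel_per_chunk$CHUNK] + 1
  } # for q in queries
  # chunks with no queries are removed
  relcounts_perchunk = data.frame(AVG=accavg, NQ=nqueries)
  relcounts_perchunk = subset(relcounts_perchunk, NQ>0)
  avgrel_perchunk = relcounts_perchunk$AVG / relcounts_perchunk$NQ
  # accumulate the avg rels found. needed to build an accumulative plot
  avgrel_perchunk_accumulated = cumsum(avgrel_perchunk)
  # plots the accumulated statistics
  xaxis = seq(1,length(avgrel_perchunk))
  plot(xaxis, avgrel_perchunk_accumulated, col="blue", type="b", ylab="avg rel found", xlab="# judgments", xaxt='n')
  xlabels = xaxis*chunksize
  axis(1,at=xaxis ,labels=xlabels,cex.axis=.75)
}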
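
The chunk-based aggregation can be checked on toy data without any run files. The sketch below assumes that each chunk groups a fixed number of consecutive judgments, that per-chunk counts are averaged over the queries that reach the chunk, and that the plotted curve is the running sum of those averages. The helper per_chunk, the relevance vectors and the chunk size are made up for illustration.

```r
chunksize <- 2  # judgments per chunk (toy value)

# Relevance (1/0) of the judged documents, in judgment order, for two toy queries
rel_q1 <- c(1, 0, 1, 1, 0)
rel_q2 <- c(0, 1, 0)

# Hypothetical helper: number of relevant documents found in each chunk
per_chunk <- function(rel, chunksize) {
  chunks <- ceiling(seq_along(rel) / chunksize)
  aggregate(REL ~ CHUNK, data.frame(REL = rel, CHUNK = chunks), sum)
}

maxchunks <- 3
accavg <- rep(0, maxchunks)    # per-chunk sum of relevant docs across queries
nqueries <- rep(0, maxchunks)  # number of queries that reach each chunk
for (rel in list(rel_q1, rel_q2)) {
  pc <- per_chunk(rel, chunksize)
  accavg[pc$CHUNK] <- accavg[pc$CHUNK] + pc$REL
  nqueries[pc$CHUNK] <- nqueries[pc$CHUNK] + 1
}

avgrel_perchunk <- accavg / nqueries           # average relevant docs per chunk
avgrel_accumulated <- cumsum(avgrel_perchunk)  # values for the accumulated plot
print(avgrel_accumulated)
```

Here the first query spans three chunks and the second only two, so the last chunk is averaged over a single query.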

© David E. Losada, 2015