This document provides all details needed to reproduce the experiments reported in the paper D. Losada, J. Parapar, A. Barreiro. “Multi-Armed Bandits for Adjudicating Documents in Pooling-Based Evaluation of Information Retrieval Systems”. (Information Processing and Management, 53(3), 1005-1025, 2017).

Any scientific publication derived from the use of this software should explicitly refer to this publication.

Next, we describe the data used for experimentation and provide our R code, which implements all pooling strategies.


We used four TREC collections: TREC5, TREC6, TREC7 and TREC8.

NIST kindly provided the runs that contributed to the pools of the adhoc tasks of TREC5, TREC6, TREC7 and TREC8.

The pooled runs are archived by NIST within a password-protected area. If you want to reproduce our experiments, you need to request access to the protected area by following the instructions provided by NIST.


This section provides the R code needed for experimentation.

All pooling strategies are implemented in pooling_strategies_ms.R. Furthermore, we provide another script, process_multiple_queries_ms.R, which gives an example of how to process multiple queries. Instructions for processing multiple queries are provided below.

Besides some auxiliary functions, pooling_strategies_ms.R contains one R function per pooling strategy (e.g. pooling_DOCID, which is used in the example below).

Multiple queries

Instructions for experimenting with multiple queries (the example below is included in the file process_multiple_queries_ms.R):


  1. Store all pooled runs into a local folder (pool_folder).
  2. Store the official qrel file into another folder (qrels_path).
  3. Call process_multiple_queries(pool_folder,qrels_path)
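The qrels file is expected to be in the standard TREC qrels format, one judgment per line: topic, iteration, document ID, relevance. The sketch below shows how such a file is read into a data frame; the example lines and the column names (QUERY, ITER, DOCID, REL) are our assumptions about the format, not an excerpt from the official qrels.

```r
# Sketch: read a TREC-style qrels file (hypothetical example lines).
qrels_file = tempfile()
writeLines(c("301 0 FBIS3-1 1",
             "301 0 FBIS3-2 0",
             "302 0 FT911-3 1"), qrels_file)
# assumed column layout: topic, iteration, document ID, relevance
qrels_df = read.table(qrels_file, header=FALSE,
                      col.names=c("QUERY","ITER","DOCID","REL"))
nrow(qrels_df)          # 3 judgments
unique(qrels_df$QUERY)  # topics 301 and 302
```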

The function process_multiple_queries processes all queries, aggregates the statistics of the average number of relevant documents found, and produces a plot. The example given invokes pooling_DOCID, but you can simply change this line to call any other pooling strategy from pooling_strategies_ms.R.
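A pooling strategy following this interface takes a query, a pool depth, and the list of run data frames, and returns the sequence of documents to judge. The following is a minimal, hypothetical sketch of DOCID-ordered pooling (the union of the top pool_depth documents of each run, judged in document-ID order); the actual pooling_DOCID in pooling_strategies_ms.R is the authoritative implementation and may differ in details.

```r
# Hypothetical sketch of DOCID-ordered pooling.
# Assumes each run data frame has columns QUERY, DOCID and RANK.
pooling_DOCID_sketch <- function(q, pool_depth, run_rankings){
  pooled = c()
  for (run in run_rankings){
    # top pool_depth documents of this run for query q
    top = subset(run, QUERY == q & RANK <= pool_depth)
    pooled = c(pooled, as.character(top$DOCID))
  }
  # judge the pooled set in document-ID order
  sort(unique(pooled))
}

# usage with two toy runs:
run1 = data.frame(QUERY=301, DOCID=c("d3","d1","d9"), RANK=1:3)
run2 = data.frame(QUERY=301, DOCID=c("d2","d1"), RANK=1:2)
pooling_DOCID_sketch(301, 2, list(run1, run2))  # "d1" "d2" "d3"
```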

process_multiple_queries <- function(pool_folder, qrels_path, pool_depth=100, chunksize=100){
  # reads the qrel file into an R dataframe with appropriate column names
  qrels_df = read.table(qrels_path, header=FALSE, col.names=c("QUERY","ITER","DOCID","REL"))
  print(paste("Qrel file...",qrels_path,"...",nrow(qrels_df)," judgments."))
  # reads "input*" files from pool_folder and stores them into a list of data frames (run_rankings)
  files <- list.files(path=pool_folder, pattern="input")
  print(paste("Processing...",pool_folder,"...",length(files)," run files",sep=""))
  run_rankings = list()
  for (f in files){
    filepath = file.path(pool_folder, f)
    # standard TREC run format: topic, iteration, docid, rank, score, run tag
    df = read.table(filepath, header=FALSE, col.names=c("QUERY","ITER","DOCID","RANK","SCORE","RUN"))
    run_rankings[[length(run_rankings)+1]] = df
  } # files
  print(paste(length(run_rankings),"runs in the pool"))
  # now, we proceed query by query, and aggregate the statistics of relevant docs found at different numbers of judgments
  queries = unique(qrels_df$QUERY)
  # accumulators for computing averages across queries
  accavg = c()
  nqueries = c()
  for (q in queries){
    # this example produces the plot of pooling by DOCID.
    # just change this line to compute any other judgment sequence
    # (by invoking any other pooling strategy from pooling_strategies_ms.R)
    judgments = pooling_DOCID(q, pool_depth, run_rankings)
    # consecutive chunk ID (chunks of size chunksize) for each document in the sequence
    chunks = ceiling(seq_along(judgments)/chunksize)
    # data frame with the ranking of judgments and a chunk ID for each document
    current_ranking = data.frame(DOCID=judgments, CHUNK=chunks, REL=rep(NA,length(judgments)))
    # get the relevance assessments for the current query
    current_qrels = subset(qrels_df, QUERY==q)
    # assign the relevance column for each document in the sequence
    for (i in 1:length(judgments)){
      rel = current_qrels$REL[current_qrels$DOCID==judgments[i]]
      current_ranking$REL[i] = (length(rel)>0 && rel[1]>0)
    }
    print(paste("Query...",q,", pool size:", length(judgments), ". ", sum(current_ranking$REL)," docs are relevant.",sep="" ))
    rel_per_chunk = aggregate(REL~CHUNK, current_ranking, sum)
    # accumulate statistics
    for (c in rel_per_chunk$CHUNK){
      if (length(accavg) < c){ accavg[c]=0; nqueries[c]=0 }
      accavg[c] = accavg[c] + rel_per_chunk$REL[rel_per_chunk$CHUNK==c]
      nqueries[c] = nqueries[c] + 1
    }
  } # for q in queries
  # chunks with no queries are removed
  relcounts_perchunk = subset(data.frame(AVG=accavg, NQ=nqueries), NQ>0)
  avgrel_perchunk = relcounts_perchunk$AVG / relcounts_perchunk$NQ
  # accumulate the avg rels found. needed to build an accumulative plot
  avgrel_perchunk_accumulated = c()
  for (l in 1:length(avgrel_perchunk)) avgrel_perchunk_accumulated[l]=sum(avgrel_perchunk[1:l])
  # plots the accumulated statistics
  xaxis = seq(1,length(avgrel_perchunk))
  plot(xaxis, avgrel_perchunk_accumulated, col="blue", type="b", ylab="avg rel found", xlab="# judgments", xaxt='n')
  xlabels = xaxis*chunksize
  axis(1, at=xaxis, labels=xlabels, cex.axis=.75)
}
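The plot aggregates judgments into consecutive chunks: the x-axis labels are chunk boundaries (multiples of chunksize) and the y-axis is the average number of relevant documents found up to that point. A small self-contained sketch of the chunk assignment used above (chunksize and the judgment sequence are illustrative):

```r
# Sketch: documents in the judgment sequence are grouped into
# consecutive chunks of size chunksize (last chunk may be smaller).
chunksize = 100
judgments = paste0("doc", 1:250)  # hypothetical sequence of 250 judgments
chunks = ceiling(seq_along(judgments)/chunksize)
table(chunks)  # chunk 1: 100 docs, chunk 2: 100 docs, chunk 3: 50 docs
```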

© David E. Losada, 2016