This document provides all the details needed to reproduce the experiments reported in the paper: D. Losada, J. Parapar, A. Barreiro, “Multi-Armed Bandits for Adjudicating Documents in Pooling-Based Evaluation of Information Retrieval Systems”, Information Processing and Management, 53(3), 1005-1025, 2017.

Any scientific publication derived from the use of this software should explicitly refer to this publication.

Next, we explain the data used for experimentation and provide our R code, which implements all pooling strategies.

Data

We used four TREC collections (http://trec.nist.gov): TREC5, TREC6, TREC7 and TREC8.

NIST kindly provided the runs that contributed to the pools of the adhoc tasks of TREC5, TREC6, TREC7 and TREC8 (http://trec.nist.gov/data/intro_eng.html).

The pooled runs are archived by NIST in a password-protected area. To reproduce our experiments, you need to request access to this area (follow the instructions given at http://trec.nist.gov/results.html).

R CODE

This section provides the R code needed for experimentation.

All pooling strategies are implemented in pooling_strategies_ms.R. In addition, we provide another script, process_multiple_queries_ms.R, with an example of how to process multiple queries. Instructions on processing multiple queries are given below.

Besides some auxiliary functions, pooling_strategies_ms.R contains one R function per pooling strategy (for example, pooling_DOCID, which is used in the example below).

Multiple queries

Instructions for experimenting with multiple queries (the example below is included in the file process_multiple_queries_ms.R).

Steps:

  1. Store all pooled runs in a local folder (pool_folder).
  2. Store the official qrel file locally; qrels_path is the full path to this file (not to its folder), since it is passed directly to read.table.
  3. Call process_multiple_queries(pool_folder, qrels_path).
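Putting the steps above together, a session might look as follows (the folder and file names are placeholders; substitute your own local paths):

```r
source("pooling_strategies_ms.R")
source("process_multiple_queries_ms.R")

# hypothetical local paths; adjust them to your own setup
pool_folder <- "trec8_pooled_runs"   # folder containing the input* run files
qrels_path  <- "qrels.trec8.adhoc"   # full path to the official qrel file

process_multiple_queries(pool_folder, qrels_path)
```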

The function process_multiple_queries processes all queries, aggregates the statistics of the average number of relevant documents found, and produces a plot. The example invokes pooling_DOCID, but you can change that line to call any other pooling strategy from pooling_strategies_ms.R.
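For orientation, pooling by DOCID judges the pooled documents in lexicographic order of their document identifiers. The sketch below is a hypothetical simplification (pooling_DOCID_sketch is our own illustrative name; the actual implementation is the pooling_DOCID function in pooling_strategies_ms.R):

```r
# Hypothetical sketch of DOCID-ordered pooling: gather the top pool_depth
# documents of each run for query q, then judge the pooled set in
# alphabetical order of DOC_ID.
pooling_DOCID_sketch <- function(q, pool_depth, run_rankings) {
  pooled <- c()
  for (df in run_rankings) {
    top <- subset(df, QUERY == q)
    top <- top[order(top$RANK), ]   # rank order within the run
    pooled <- c(pooled, as.character(head(top$DOC_ID, pool_depth)))
  }
  sort(unique(pooled))              # the judgment sequence
}
```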

process_multiple_queries <- function(pool_folder, qrels_path)
{
  # reads the qrel file into an R dataframe with appropriate column names
  qrels_df= read.table(qrels_path,header=FALSE)
  names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
  
  print(paste("Qrel file...",qrels_path,"...",nrow(qrels_df)," judgments."))
    
  # reads "input*" files from pool_folder and stores them into a list of data frames (run_rankings)
  files <- list.files(path=pool_folder, pattern = "input")
  print(paste("Processing...",pool_folder,"...",length(files)," run files",sep=""))
  
  run_rankings=list()

  for (f in files){
    filepath=paste(pool_folder,f,sep="/")  
    df = read.table(filepath,header=FALSE)
    names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
    run_rankings[[length(run_rankings)+1]]=df
  } # files
  
  print(paste(length(run_rankings),"runs in the pool"))
  
  # now, we proceed query by query, and aggregate the statistics of relevant docs found at different number of judgments
  chunksize=100
  pool_depth=100
  
  queries= unique(qrels_df$QUERY)
  
  # for computing averages across queries
  maxchunks=ceiling(pool_depth*length(run_rankings)/chunksize)
  
  accavg=rep(0,maxchunks)
  nqueries=rep(0,maxchunks)
  
  for (q in queries)
  {
  # this example produces the plot of pooling by DOCID.
  # just change this line to compute any other judgment sequence 
  # (by invoking any other pooling strategy from pooling_strategies_ms.R) 
  judgments = pooling_DOCID(q, pool_depth, run_rankings)
  
  # data frame with the ranking of judgments and a chunk ID for each document
  chunks=ceiling((1:length(judgments))/chunksize)
  current_ranking=data.frame(DOCID=judgments, CHUNK=chunks, REL=rep(NA,length(judgments)))
  
  # get the relevance assessments for the current query
  current_qrels = subset(qrels_df, QUERY==q)
  
  # assign the relevance column for each document in the sequence 
  for (i in 1:length(judgments)) 
  {
    current_ranking[i,"REL"]=is_relevant(current_qrels,current_ranking[i,"DOCID"])
  }
  
  print(paste("Query...",q,", pool size:", length(judgments), ". ", sum(current_ranking$REL)," docs are relevant.",sep="" ))
    
  rel_per_chunk = aggregate(REL~CHUNK, current_ranking, sum)
  
  # accumulate statistics 
  idx = min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)
  nqueries[idx] = nqueries[idx] + 1
  accavg[idx] = accavg[idx] + rel_per_chunk$REL
   
  } # for q in queries
  
  # chunks with no queries are removed
  accavg=accavg[nqueries!=0]
  nqueries=nqueries[nqueries!=0]
  
  relcounts_perchunk=data.frame(AVG=accavg, NQ=nqueries)
  
  avgrel_perchunk =  relcounts_perchunk$AVG / relcounts_perchunk$NQ
  
  # accumulate the avg rels found. needed to build a cumulative plot  
  avgrel_perchunk_accumulated = cumsum(relcounts_perchunk$AVG)
  
  avgrel_perchunk_accumulated=avgrel_perchunk_accumulated/length(unique(qrels_df$QUERY))
  
  # plots the accumulated statistics
  
  xaxis = seq(1,length(avgrel_perchunk))
  plot(xaxis, avgrel_perchunk_accumulated, col="blue", type="b", ylab="avg rel found", xlab="# judgments", xaxt='n')
  xlabels = xaxis*chunksize
  axis(1, at=xaxis, labels=xlabels, cex.axis=0.75)
}
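As a small worked example of the chunking logic above: with chunksize = 100, the judgment sequence is split into consecutive blocks of 100 documents, and the plot reports the average number of relevant documents accumulated after each block.

```r
# chunk IDs for a judgment sequence of 250 documents with chunksize 100
chunksize <- 100
chunks <- ceiling((1:250) / chunksize)
table(chunks)  # chunks 1 and 2 contain 100 judgments each; chunk 3 the last 50
```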

© David E. Losada, 2016