This document provides all the details needed to reproduce the experiments reported in: D. Losada, J. Parapar, A. Barreiro. “A Rank Fusion Approach based on Score Distributions for Prioritizing Relevance Assessments in Information Retrieval Evaluation”. Information Fusion, 39, 56-71, 2018.

Any scientific publication derived from the use of this software should explicitly refer to this publication.

Next, we describe the data used for experimentation and provide our R code, which implements all the pooling strategies.

Data

We used four TREC collections (http://trec.nist.gov): TREC5, TREC6, TREC7 and TREC8.

NIST kindly provided the runs that contributed to the pools of the adhoc tasks of TREC5, TREC6, TREC7 and TREC8 (http://trec.nist.gov/data/intro_eng.html).

The pooled runs are archived by NIST within a password-protected area. If you want to reproduce our experiments, you need to request access to that area (follow the instructions given at http://trec.nist.gov/results.html).

We used only those runs that assign a retrieval score to each retrieved document.

R CODE

This section provides the R code needed for experimentation.

All pooling strategies are implemented in pooling_strategies_if.R. Furthermore, we provide another script, process_multiple_queries_if.R, which implements an example of how to process multiple queries. Instructions for processing multiple queries are provided below.

Besides some auxiliary functions, pooling_strategies_if.R contains one R function per pooling strategy (e.g., pooling_DOCID, which is invoked in the example below).
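To illustrate the interface that every pooling strategy shares (a query, a pool depth and the list of run rankings), here is a minimal sketch of a DOCID-ordered strategy. It assumes the standard definition of DOCID pooling (judging the pooled documents in lexicographic order of their identifiers); the pooling_DOCID shipped in pooling_strategies_if.R is the authoritative implementation and may differ in detail.

pooling_DOCID_sketch <- function(q, pool_depth, run_rankings)
{
  # hypothetical sketch: collect the top pool_depth documents of each run for query q
  pooled_docs = c()
  for (df in run_rankings) {
    run_query = subset(df, QUERY==q)
    run_query = run_query[order(run_query$RANK),]
    pooled_docs = c(pooled_docs, as.character(head(run_query$DOC_ID, pool_depth)))
  }
  # the pool is the set union; sorting by DOCID fixes the judgment sequence
  sort(unique(pooled_docs))
}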

Multiple queries

Instructions for experimenting with multiple queries (the example below is included in the file process_multiple_queries_if.R).

Steps:

  1. Store all pooled runs into a local folder (pool_folder).
  2. Store the official qrel file locally and take note of its path (qrels_path).
  3. Call process_multiple_queries(pool_folder, qrels_path).

The function process_multiple_queries processes all queries, aggregates the statistics of the average number of relevant documents found, and produces a plot. The example given invokes pooling_DOCID, but you can simply change that line and call any other pooling strategy from pooling_strategies_if.R (see the usage example below).
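For instance, assuming the pooled runs of TREC8 were unpacked into ./trec8_pool and the official qrel file was saved as ./qrels.trec8.adhoc (both paths are hypothetical placeholders):

source("pooling_strategies_if.R")        # pooling strategies and auxiliary functions
source("process_multiple_queries_if.R")  # defines process_multiple_queries
process_multiple_queries("./trec8_pool", "./qrels.trec8.adhoc")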

process_multiple_queries <- function(pool_folder, qrels_path)
{
  # reads the qrel file into an R dataframe with appropriate column names
  qrels_df= read.table(qrels_path,header=FALSE)
  names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
  
  print(paste("Qrel file...",qrels_path,"...",nrow(qrels_df)," judgments."))
    
  # reads "input*" files from pool_folder and stores them into a list of data frames (run_rankings)
  files <- list.files(path=pool_folder, pattern = "input")
  print(paste("Processing...",pool_folder,"...",length(files)," run files",sep=""))
  
  run_rankings=list()

  for (f in files){
    filepath=paste(pool_folder,f,sep="/")  
    df = read.table(filepath,header=FALSE)
    names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
    # score transformation: transform() is presumably an auxiliary helper from
    # pooling_strategies_if.R (note that it masks base R's transform())
    df$SCORE = transform(df$SCORE)
    run_rankings[[length(run_rankings)+1]]=df
  } # files
  
  print(paste(length(run_rankings),"runs in the pool"))
  
  # now, we proceed query by query, and aggregate the statistics of relevant docs found at different number of judgments
  chunksize=100
  pool_depth=100
  
  queries= unique(qrels_df$QUERY)
  
  # for computing averages across queries
  maxchunks=ceiling(pool_depth*length(run_rankings)/chunksize)
  
  accavg=rep(0,maxchunks)
  nqueries=rep(0,maxchunks)
  
  for (q in queries)
  {
    # this example produces the judgment sequence of pooling by DOCID.
    # just change this line to compute any other judgment sequence
    # (by invoking any other pooling strategy from pooling_strategies_if.R)
    judgments = pooling_DOCID(q, pool_depth, run_rankings)
    
    # data frame with the ranking of judgments and a chunk ID for each document
    chunks = ceiling((1:length(judgments))/chunksize)
    current_ranking = data.frame(DOCID=judgments, CHUNK=chunks, REL=rep(NA,length(judgments)))
    
    # get the relevance assessments for the current query
    current_qrels = subset(qrels_df, QUERY==q)
    
    # assign the relevance column for each document in the sequence;
    # is_relevant is an auxiliary function from pooling_strategies_if.R
    # (presumably returning 1/0 depending on whether the document is
    # judged relevant in the qrels)
    for (i in 1:length(judgments))
    {
      current_ranking[i,"REL"] = is_relevant(current_qrels, current_ranking[i,"DOCID"])
    }
    
    print(paste("Query...",q,", pool size:", length(judgments), ". ", sum(current_ranking$REL)," docs are relevant.",sep=""))
    
    rel_per_chunk = aggregate(REL~CHUNK, current_ranking, sum)
    
    # accumulate statistics (chunk IDs are contiguous, starting at 1)
    idx = min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)
    nqueries[idx] = nqueries[idx] + 1
    accavg[idx] = accavg[idx] + rel_per_chunk$REL
    
  } # for q in queries
  
  # chunks with no queries are removed
  accavg=accavg[nqueries!=0]
  nqueries=nqueries[nqueries!=0]
  
  relcounts_perchunk=data.frame(AVG=accavg, NQ=nqueries)
  
  avgrel_perchunk =  relcounts_perchunk$AVG / relcounts_perchunk$NQ
  
  # accumulate the avg rels found; needed to build the cumulative plot
  avgrel_perchunk_accumulated = cumsum(relcounts_perchunk$AVG)
  
  avgrel_perchunk_accumulated=avgrel_perchunk_accumulated/length(unique(qrels_df$QUERY))
  
  # plots the accumulated statistics
  
  xaxis = seq(1,length(avgrel_perchunk))
  plot(xaxis, avgrel_perchunk_accumulated, col="blue", type="b", ylab="avg rel found", xlab="# judgments", xaxt='n')
  xlabels = xaxis*chunksize
  axis(1, at=xaxis, labels=xlabels, cex.axis=.75)
}
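
The helper is_relevant used above is one of the auxiliary functions of pooling_strategies_if.R. For reference, here is a minimal sketch of how such a helper could look, assuming standard TREC qrels where REL>0 marks relevant documents (this is an illustration, not the shipped code):

is_relevant_sketch <- function(current_qrels, docid)
{
  # look up the document in the current query's qrels
  match = subset(current_qrels, DOC_ID==docid)
  # unjudged documents count as non-relevant
  if (nrow(match)==0) return(0)
  as.numeric(match$REL[1] > 0)
}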

© David E. Losada, 2017