This document provides all details needed to reproduce the experiments reported in the paper: D. Losada, J. Parapar, A. Barreiro, “A Rank Fusion Approach based on Score Distributions for Prioritizing Relevance Assessments in Information Retrieval Evaluation”, Information Fusion, 39, 56-71, 2018.
Any scientific publication derived from the use of this software should explicitly refer to this publication.
Next, we explain the data used for experimentation and provide our R code, which implements all pooling strategies.
We used four TREC collections (http://trec.nist.gov): TREC5, TREC6, TREC7 and TREC8.
NIST kindly provided the runs that contributed to the pools of the adhoc tasks of TREC5, TREC6, TREC7 and TREC8 (http://trec.nist.gov/data/intro_eng.html).
The pooled runs are archived by NIST within a password protected area. If you want to reproduce our experiments you need to request access to the protected area (follow the instructions given at http://trec.nist.gov/results.html).
We only used those runs that provide a retrieval score assigned to each retrieved document:
TREC5
input.anu5man4 input.brkly15 input.brkly16 input.city96a1 input.city96a2 input.Cor5A1se input.Cor5A2cr input.Cor5M1le input.Cor5M2rf input.DCU961 input.DCU962 input.DCU963 input.DCU964 input.ETHal1 input.ETHas1 input.ETHme1 input.genrl1 input.genrl2 input.genrl3 input.genrl4 input.gmu96au1 input.gmu96au2 input.ibmgd1 input.ibmge1 input.ibmge2 input.INQ301 input.LNaDesc1 input.LNaDesc2 input.LNmFull2 input.mds001 input.mds002 input.mds003 input.pircsAAL input.pircsAAS input.pircsAM1 input.pircsAM2 input.vtwnA1 input.vtwnB1
TREC6
input.aiatA1 input.aiatB1 input.anu6alo1 input.anu6ash1 input.anu6min1 input.att97ac input.att97ae input.att97as input.Brkly21 input.Brkly22 input.Brkly23 input.city6ad input.city6al input.city6at input.Cor6A1cls input.Cor6A2qtcs input.Cor6A3cll input.DCU97lnt input.DCU97lt input.DCU97snt input.DCU97vs input.gerua1 input.gerua3 input.gmu97au1 input.gmu97au2 input.gmu97ma1 input.gmu97ma2 input.ibmg97a input.ibmg97b input.ibms97a input.iss97man input.iss97s input.iss97vs input.LNaVryShort input.LNmShort input.mds601 input.mds602 input.mds603 input.Mercure1 input.Mercure2 input.Mercure3 input.pirc7Aa input.pirc7Ad input.pirc7At input.umcpa197 input.uwmt6a2 input.VrtyAH6a input.VrtyAH6b
TREC7
input.acsys7al input.att98atc input.att98atdc input.att98atde input.bbn1 input.Brkly24 input.Brkly25 input.Cor7A1clt input.Cor7A2rrd input.Cor7A3rrf input.FLab7ad input.FLab7at input.FLab7atE input.fub98a input.fub98b input.gersh1 input.gersh2 input.gersh3 input.ibmg98b input.ibms98a input.ibms98b input.ibms98c input.iit98au1 input.INQ501 input.INQ502 input.INQ503 input.iowacuhk1 input.iowacuhk2 input.jalbse011 input.jalbse012 input.jalbse013 input.kslsV1 input.LIArel2 input.LNaTitDesc7 input.LNmanual7 input.mds98t input.mds98t2 input.mds98td input.MerAdRbtd input.MerAdRbtnd input.MerTetAdtnd input.nectitech input.nthu1 input.nthu2 input.nttdata7Al0 input.nttdata7Al2 input.nttdata7At1 input.ok7am input.ok7as input.ok7ax input.pirc8Aa2 input.pirc8Ad input.pirc8At input.ScaiTrec7 input.tno7cbm25 input.tno7exp1 input.tno7tw4 input.umd98a1 input.umd98a2 input.unc7aal1 input.unc7aal2 input.uoftimgr
TREC8
input.8manexT3D1N0 input.acsys8alo input.acsys8alo2 input.apl8c221 input.apl8c621 input.apl8ctd input.apl8p input.att99atc input.att99atdc input.att99atde input.att99ate input.Dm8Nbn input.Dm8NbnR input.Dm8TFbn input.Dm8TFidf input.Flab8as input.Flab8at input.Flab8atd2 input.Flab8atdn input.Flab8ax input.fub99a input.fub99td input.fub99tf input.fub99tt input.GE8ATD3 input.GE8MTD2 input.ibmg99b input.ibms99b input.ibms99c input.ic99dafb input.iit99ma1 input.INQ601 input.INQ602 input.INQ603 input.INQ604 input.kdd8ps16 input.kdd8qe01 input.kdd8sh16 input.kuadhoc input.mds08a1 input.mds08a2 input.mds08a3 input.mds08a4 input.mds08a5 input.Mer8Adtd1 input.Mer8Adtd2 input.Mer8Adtd4 input.Mer8Adtnd3 input.nttd8al input.nttd8ale input.nttd8alx input.nttd8am input.nttd8ame input.ok8alx input.ok8amxc input.ok8asxc input.pir9Aa1 input.pir9Aatd input.pir9At0 input.pir9Atd0 input.pir9Attd input.plt8ah1 input.plt8ah2 input.plt8ah3 input.plt8ah4 input.plt8ah5 input.ric8dnx input.ric8dpn input.ric8dpx input.Sab8A1 input.Sab8A2 input.Sab8A3 input.Sab8A4 input.tno8d4 input.tno8t2 input.umd99a1 input.unc8al32 input.unc8al42 input.unc8al52 input.UniNET8Lg input.UniNET8St input.UT800 input.UT803b input.UT810 input.uwmt8a1 input.uwmt8a2 input.weaver1 input.weaver2
This section provides the R code needed for experimentation.
All pooling strategies are implemented in pooling_strategies_if.R. Furthermore, we provide another script, process_multiple_queries_if.R, which implements an example of how to process multiple queries. Instructions for processing multiple queries are provided below.
Besides some auxiliary functions, pooling_strategies_if.R contains the following R functions:
pooling_DOCID. Implements judgment ordering by DOCID
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_DOCID(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
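The DOCID ordering itself is simple: build the depth-k pool (the union of the top pool_depth documents of every run) and judge its documents in lexicographic order of their identifiers. As an illustration of the idea only (not the paper's R code), a minimal Python sketch, assuming each ranking is just a list of document IDs:

```python
def pool_by_docid(run_rankings, pool_depth):
    """run_rankings: one list of doc IDs per run, ordered by rank."""
    pooled = set()
    for ranking in run_rankings:
        pooled.update(ranking[:pool_depth])  # depth-k pool: top docs of each run
    return sorted(pooled)                    # judge in lexicographic DOCID order

run_a = ["FT911-3", "FT911-1", "LA0101-7"]
run_b = ["LA0101-7", "FT911-2"]
print(pool_by_docid([run_a, run_b], 2))
# → ['FT911-1', 'FT911-2', 'FT911-3', 'LA0101-7']
```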
pooling_Rank. Implements judgment ordering by Rank
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_Rank(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
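Ordering by rank amounts to judging pooled documents by the best rank they reach in any run. A hypothetical Python sketch of this idea (the tie-breaking by DOC_ID is our assumption, not necessarily the script's):

```python
def pool_by_rank(run_rankings, pool_depth):
    best = {}  # best (lowest) rank reached by each pooled document
    for ranking in run_rankings:
        for rank, doc in enumerate(ranking[:pool_depth], start=1):
            if doc not in best or rank < best[doc]:
                best[doc] = rank
    # judge documents by best rank; ties broken by DOC_ID for determinism
    return sorted(best, key=lambda d: (best[d], d))

print(pool_by_rank([["d3", "d1"], ["d1", "d2"]], 2))
# → ['d1', 'd3', 'd2']
```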
pooling_combsum. Implements judgment ordering by CombSUM
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_combsum(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
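CombSUM fuses runs by summing each document's normalized retrieval scores across all runs that retrieved it. A minimal sketch, assuming per-run min-max normalization (the exact normalization used in our experiments is in the R code):

```python
def combsum(run_scores):
    """run_scores: one {doc_id: score} dict per pooled run."""
    fused = {}
    for scores in run_scores:
        lo, hi = min(scores.values()), max(scores.values())
        for doc, s in scores.items():
            norm = (s - lo) / (hi - lo) if hi > lo else 0.0  # min-max per run
            fused[doc] = fused.get(doc, 0.0) + norm
    return sorted(fused, key=lambda d: (-fused[d], d))

print(combsum([{"d1": 10.0, "d2": 5.0}, {"d2": 1.0, "d3": 0.5, "d1": 0.0}]))
# → ['d1', 'd2', 'd3']
```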
pooling_combmnz. Implements judgment ordering by CombMNZ
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_combmnz(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
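CombMNZ multiplies the CombSUM score by the number of runs that retrieved the document, rewarding documents found by many systems. A minimal sketch, again assuming per-run min-max normalization:

```python
def combmnz(run_scores):
    """run_scores: one {doc_id: score} dict per pooled run."""
    fused, hits = {}, {}
    for scores in run_scores:
        lo, hi = min(scores.values()), max(scores.values())
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + ((s - lo) / (hi - lo) if hi > lo else 0.0)
            hits[doc] = hits.get(doc, 0) + 1   # number of runs retrieving doc
    mnz = {d: fused[d] * hits[d] for d in fused}
    return sorted(mnz, key=lambda d: (-mnz[d], d))

print(combmnz([{"d1": 2.0, "d2": 1.0}, {"d2": 3.0, "d3": 1.0}]))
# → ['d2', 'd1', 'd3']
```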
pooling_moffat_A. Implements judgment ordering by Moffat’s method A (“summing contributions”).
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_moffat_A(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
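In Moffat et al.'s method A, a document's judgment priority is the sum, over all runs, of the weight of the rank at which the run retrieved it. With rank-biased precision weights (1-p)·p^(rank-1), the idea can be sketched as follows (the persistence value p=0.8 is an illustrative choice, not necessarily the script's default):

```python
def moffat_a(run_rankings, pool_depth, p=0.8):
    """Priority = sum over runs of the RBP weight (1-p)*p**(rank-1)."""
    contrib = {}
    for ranking in run_rankings:
        for rank, doc in enumerate(ranking[:pool_depth], start=1):
            contrib[doc] = contrib.get(doc, 0.0) + (1 - p) * p ** (rank - 1)
    return sorted(contrib, key=lambda d: (-contrib[d], d))

print(moffat_a([["d1", "d2"], ["d2", "d3"]], 2, p=0.5))
# → ['d2', 'd1', 'd3']
```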
pooling_MTF. Implements judgment ordering by MoveToFront
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_MTF(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
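MoveToFront adapts to the judgments as they are made: it keeps drawing documents from the current run while they turn out relevant, and demotes the run after a non-relevant judgment. The simplified round-robin sketch below conveys the loop; the actual priority scheme of Cormack et al. is more elaborate:

```python
from collections import deque

def move_to_front(run_rankings, pool_depth, qrels):
    """qrels: dict mapping doc_id -> 0/1 relevance."""
    runs = deque(list(r[:pool_depth]) for r in run_rankings)
    judged, order = set(), []
    while runs:
        run = runs.popleft()
        demoted = False
        while run and not demoted:
            doc = run.pop(0)
            if doc in judged:
                continue                     # already judged via another run
            judged.add(doc)
            order.append(doc)
            demoted = not qrels.get(doc, 0)  # non-relevant: demote this run
        if run:
            runs.append(run)                 # demoted run is retried later
    return order

print(move_to_front([["d1", "d2", "d3"], ["d2", "d4"]], 3, {"d1": 1, "d2": 0, "d4": 1}))
# → ['d1', 'd2', 'd4', 'd3']
```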
pooling_bb. Implements judgment ordering by Bayesian Bandits (BB)
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_bb(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
pooling_borda_fuse. Implements judgment ordering by the Borda Fuse method
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_borda_fuse(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
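Borda fuse treats each run as a voter: the document at rank r in a depth-k ranking receives k − r + 1 points, and documents are judged by total points across runs. A minimal sketch (classic Borda also spreads residual points over unretrieved candidates, which this sketch omits):

```python
def borda_fuse(run_rankings, pool_depth):
    points = {}
    for ranking in run_rankings:
        for rank, doc in enumerate(ranking[:pool_depth], start=1):
            points[doc] = points.get(doc, 0) + (pool_depth - rank + 1)
    return sorted(points, key=lambda d: (-points[d], d))

print(borda_fuse([["d1", "d2", "d3"], ["d2", "d3", "d1"]], 3))
# → ['d2', 'd1', 'd3']
```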
pooling_hedge. Implements judgment ordering by Hedge
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_hedge(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
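Hedge (Aslam et al.) treats the runs as experts in an online learning game: each run keeps a weight, the unjudged document with the highest weighted vote is judged next, and runs that "mispredict" the judgment are multiplicatively down-weighted. The toy sketch below captures only this loop; the vote and loss definitions are our simplifications, not the exact formulation used in the R code (beta is the down-weighting parameter):

```python
def hedge_order(run_rankings, pool_depth, qrels, beta=0.5):
    runs = [r[:pool_depth] for r in run_rankings]
    weights = [1.0] * len(runs)
    pooled = {d for r in runs for d in r}
    order = []
    while pooled:
        # weighted vote: a run contributes more for documents it ranks higher
        vote = lambda doc: sum(w / (r.index(doc) + 1)
                               for w, r in zip(weights, runs) if doc in r)
        doc = max(sorted(pooled), key=vote)  # sorted() makes ties deterministic
        pooled.discard(doc)
        order.append(doc)
        rel = qrels.get(doc, 0)
        for i, r in enumerate(runs):
            # a run loses when it retrieved a non-relevant doc or missed a relevant one
            if (doc in r) != bool(rel):
                weights[i] *= beta
    return order

print(hedge_order([["d1", "d2"], ["d2", "d3"]], 2, {"d2": 1}))
# → ['d2', 'd1', 'd3']
```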
pooling_sd_static_no_pseudoqrels. Implements judgment ordering by the Score Distribution method (static) with no pseudoqrels
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_sd_static_no_pseudoqrels(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
pooling_sd_static_pseudoqrels. Implements judgment ordering by the Score Distribution method (static) with pseudoqrels computed by Soboroff’s method
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_sd_static_pseudoqrels(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
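Soboroff's pseudoqrels replace human judgments by pretending that a random sample of the pooled documents is relevant. An illustrative sketch of the idea (the sampling fraction here is a hypothetical parameter; the setting actually used is in the R code):

```python
import random

def soboroff_pseudoqrels(run_rankings, pool_depth, fraction=0.1, seed=0):
    """Mark a random fraction of the pooled documents as pseudo-relevant."""
    rng = random.Random(seed)              # seeded for reproducibility
    pooled = sorted({d for r in run_rankings for d in r[:pool_depth]})
    k = max(1, round(fraction * len(pooled)))
    return set(rng.sample(pooled, k))

pq = soboroff_pseudoqrels([["d1", "d2", "d3", "d4"]], 4, fraction=0.5)
print(len(pq), pq <= {"d1", "d2", "d3", "d4"})
# → 2 True
```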
pooling_sd_dynamic_pseudoqrels. Implements judgment ordering by the Score Distribution method (dynamic) with pseudoqrels computed by Soboroff’s method
inputs:
query: query whose pool is to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_sd_dynamic_pseudoqrels(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
Instructions for experimenting with multiple queries (the example below is included in the file process_multiple_queries_if.R).
The function process_multiple_queries processes all queries, aggregates the statistics of the average number of relevant documents found, and produces a plot. The example invokes pooling_DOCID, but you can change that line to call any other pooling strategy from pooling_strategies_if.R.
process_multiple_queries <- function(pool_folder, qrels_path)
{
  # reads the qrel file into an R dataframe with appropriate column names
  qrels_df = read.table(qrels_path, header=FALSE)
  names(qrels_df) = c("QUERY","DUMMY","DOC_ID","REL")
  print(paste("Qrel file...",qrels_path,"...",nrow(qrels_df)," judgments."))
  # reads "input*" files from pool_folder and stores them into a list of data frames (run_rankings)
  files <- list.files(path=pool_folder, pattern = "input")
  print(paste("Processing...",pool_folder,"...",length(files)," run files",sep=""))
  run_rankings = list()
  for (f in files){
    filepath = paste(pool_folder, f, sep="/")
    df = read.table(filepath, header=FALSE)
    names(df) = c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
    # transform() is an auxiliary score-transformation function defined in pooling_strategies_if.R
    df$SCORE = transform(df$SCORE)
    run_rankings[[length(run_rankings)+1]] = df
  } # files
  print(paste(length(run_rankings),"runs in the pool"))
  # now, we proceed query by query, and aggregate the statistics of relevant docs found at different numbers of judgments
  chunksize = 100
  pool_depth = 100
  queries = unique(qrels_df$QUERY)
  # for computing averages across queries
  maxchunks = ceiling(pool_depth*length(run_rankings)/chunksize)
  accavg = rep(0, maxchunks)
  nqueries = rep(0, maxchunks)
  for (q in queries)
  {
    # this example produces the plot of pooling by DOCID.
    # just change this line to compute any other judgment sequence
    # (by invoking any other pooling strategy from pooling_strategies_if.R)
    judgments = pooling_DOCID(q, pool_depth, run_rankings)
    # data frame with the ranking of judgments and a chunk ID for each document
    chunks = ceiling((1:length(judgments))/chunksize)
    current_ranking = data.frame(DOCID=judgments, CHUNK=chunks, REL=rep(NA,length(judgments)))
    # get the relevance assessments for the current query
    current_qrels = subset(qrels_df, QUERY==q)
    # assign the relevance column for each document in the sequence
    for (i in 1:length(judgments))
    {
      current_ranking[i,"REL"] = is_relevant(current_qrels, current_ranking[i,"DOCID"])
    }
    print(paste("Query...",q,", pool size:", length(judgments), ". ", sum(current_ranking$REL)," docs are relevant.",sep=""))
    rel_per_chunk = aggregate(REL~CHUNK, current_ranking, sum)
    # accumulate statistics
    nqueries[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)] = nqueries[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)]+1
    accavg[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)] = accavg[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)]+rel_per_chunk$REL
  } # for q in queries
  # chunks with no queries are removed
  accavg = accavg[nqueries!=0]
  nqueries = nqueries[nqueries!=0]
  relcounts_perchunk = data.frame(AVG=accavg, NQ=nqueries)
  avgrel_perchunk = relcounts_perchunk$AVG / relcounts_perchunk$NQ
  # accumulate the avg rels found. needed to build a cumulative plot
  avgrel_perchunk_accumulated = c()
  for (l in 1:length(avgrel_perchunk)) avgrel_perchunk_accumulated[l] = sum(relcounts_perchunk$AVG[1:l])
  avgrel_perchunk_accumulated = avgrel_perchunk_accumulated/length(unique(qrels_df$QUERY))
  # plots the accumulated statistics
  xaxis = seq(1, length(avgrel_perchunk))
  plot(xaxis, avgrel_perchunk_accumulated, col="blue", type="b", ylab="avg rel found", xlab="# judgments", xaxt='n')
  xlabels = xaxis*chunksize
  axis(1, at=xaxis, labels=xlabels, cex=.75)
}
© David E. Losada, 2017