This document provides all details needed to reproduce the experiments reported in the paper D. Losada, J. Parapar, A. Barreiro. “Multi-Armed Bandits for Adjudicating Documents in Pooling-Based Evaluation of Information Retrieval Systems”. (Information Processing and Management, 53(3), 1005-1025, 2017).
Any scientific publication derived from the use of this software should explicitly refer to this publication.
Next, we explain the data used for experimentation and provide our R code, which implements all pooling strategies.
We used four TREC collections (http://trec.nist.gov): TREC5, TREC6, TREC7 and TREC8.
NIST kindly provided the runs that contributed to the pools of the adhoc tasks of TREC5, TREC6, TREC7 and TREC8 (http://trec.nist.gov/data/intro_eng.html).
The pooled runs are archived by NIST in a password-protected area. If you want to reproduce our experiments, you need to request access to this protected area (follow the instructions given at http://trec.nist.gov/results.html).
TREC5
101 runs in the pool (77 adhoc + 24 other).
The 77 adhoc runs are: input.anu5aut1 input.anu5aut2 input.anu5man4 input.anu5man6 input.brkly15 input.brkly16 input.brkly17 input.brkly18 input.city96a1 input.city96a2 input.CLCLUS input.CLTHES input.colm1 input.colm4 input.Cor5A1se input.Cor5A2cr input.Cor5M1le input.Cor5M2rf input.Ctifr1 input.Ctifr2 input.DCU961 input.DCU962 input.DCU963 input.DCU964 input.DCU969 input.DCU96C input.DCU96D input.erliA1 input.ETHal1 input.ETHas1 input.ETHme1 input.fsclt3 input.fsclt4 input.genrl1 input.genrl2 input.genrl3 input.genrl4 input.glair4 input.gmu96au1 input.gmu96au2 input.gmu96ma1 input.gmu96ma2 input.ibmgd1 input.ibmgd2 input.ibmge1 input.ibmge2 input.ibms96a input.ibms96b input.INQ301 input.INQ302 input.KUSG2 input.KUSG3 input.LNaDesc1 input.LNaDesc2 input.LNmFull1 input.LNmFull2 input.mds001 input.mds002 input.mds003 input.Mercure-al input.Mercure-as input.MONASH input.pircsAAL input.pircsAAS input.pircsAM1 input.pircsAM2 input.sdmix1 input.sdmix2 input.umcpa1 input.uncis1 input.uncis2 input.UniNE7 input.UniNE8 input.uwgcx0 input.uwgcx1 input.vtwnA1 input.vtwnB1
The other 24 runs are: input.anu5mrg0 input.anu5mrg1 input.anu5mrg7 input.CLATMC input.CLATMN input.CLPHR0 input.CLPHR1 input.CLPHR2 input.fsclt3m input.genlp1 input.genlp2 input.genlp3 input.genlp4 input.MTRa961 input.sbase1 input.sbase2 input.UniNE0 input.UniNE9 input.xerox_nlp1 input.xerox_nlp2 input.xerox_nlp3 input.xerox_nlp4 input.xerox_nlp5 input.xerox_nlp6
50 queries: TREC topics #251-#300 (available at http://trec.nist.gov/data/topics_eng/topics.251-300.gz). However, the text of the queries is not needed for the pooling experiments; you only need the relevance judgments (qrels), which are available at http://trec.nist.gov/data/qrels_eng/qrels.251-300.parts1-5.tar.gz.
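Note that the qrels archive is split into several part files. The following is a minimal R sketch for loading the parts and concatenating them into a single data frame; the file-name pattern "qrels" is an assumption about what the archive extracts to, so adjust it to the actual names:
qrels_parts = list.files(pattern="qrels")  # assumption: matches the extracted qrel part files
qrels_df = do.call(rbind, lapply(qrels_parts, read.table, header=FALSE))
names(qrels_df) = c("QUERY","DUMMY","DOC_ID","REL")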
TREC6
46 runs in the pool (31 adhoc + 15 other).
The 31 adhoc runs are: input.aiatB1 input.anu6min1 input.att97ac input.Brkly23 input.city6al input.CLREL input.Cor6A3cll input.csiro97a1 input.DCU97lnt input.fsclt6 input.gerua1 input.glair64 input.gmu97au1 input.harris1 input.ibmg97b input.ibms97a input.INQ401 input.ispa1 input.iss97man input.jalbse0 input.jhuapln input.LNaShort input.mds601 input.Mercure2 input.nmsu2 input.nsasg1 input.pirc7Aa input.umcpa197 input.unc6ma input.uwmt6a0 input.VrtyAH6a
The other 15 runs are: input.Cor6HP1 input.Cor6HP2 input.Cor6HP3 input.DCU97HP input.genlp1 input.Gla6DS1 input.otc1 input.otc2 input.otc3 input.pirc7Ha input.pirc7Hd input.pirc7Ht input.uwmt6h0 input.uwmt6h1 input.uwmt6h2
50 queries: TREC topics #301-#350 (available at http://trec.nist.gov/data/topics_eng/topics.301-350.gz). However, the text of the queries is not needed for the pooling experiments; you only need the relevance judgments (qrels), which are available at http://trec.nist.gov/data/qrels_eng/qrels.trec6.adhoc.parts1-5.tar.gz.
TREC7
84 runs in the pool (77 adhoc + 7 other).
The 77 adhoc runs are: input.acsys7al input.acsys7mi input.AntHoc01 input.APL985LC input.APL985SC input.att98atdc input.att98atde input.bbn1 input.Brkly25 input.Brkly26 input.CLARIT98CLUS input.CLARIT98COMB input.Cor7A1clt input.Cor7A3rrf input.dsir07a01 input.dsir07a02 input.ETHAC0 input.ETHAR0 input.FLab7ad input.FLab7at input.fsclt7a input.fsclt7m input.fub98a input.fub98b input.gersh1 input.gersh2 input.harris1 input.ibmg98a input.ibmg98b input.ibms98a input.ibms98b input.ic98san3 input.ic98san4 input.iit98au1 input.iit98ma1 input.INQ501 input.INQ502 input.iowacuhk1 input.iowacuhk2 input.jalbse011 input.jalbse012 input.KD70000 input.KD71010s input.kslsV1 input.lanl981 input.LIArel2 input.LIAshort2 input.LNaTitDesc7 input.LNmanual7 input.mds98t input.mds98td input.MerAdRbtnd input.MerTetAdtnd input.nectitech input.nectitechdes input.nsasgrp3 input.nsasgrp4 input.nthu1 input.nthu2 input.nttdata7Al0 input.nttdata7Al2 input.ok7am input.ok7ax input.pirc8Aa2 input.pirc8Ad input.ScaiTrec7 input.t7miti1 input.tno7exp1 input.tno7tw4 input.umd98a1 input.umd98a2 input.unc7aal1 input.unc7aal2 input.uoftimgr input.uoftimgu input.uwmt7a1 input.uwmt7a2
The other 7 runs are: input.acsys7hp input.Cor7HP1 input.Cor7HP2 input.Cor7HP3 input.pirc8Ha input.uwmt7h1 input.uwmt7h2
50 queries: TREC topics #351-#400 (available at http://trec.nist.gov/data/topics_eng/topics.351-400.gz). However, the text of the queries is not needed for the pooling experiments; you only need the relevance judgments (qrels), which are available at http://trec.nist.gov/data/qrels_eng/qrels.trec7.adhoc.parts1-5.tar.gz.
TREC8
71 runs in the pool (all adhoc).
The runs are: input.1 input.8manexT3D1N0 input.acsys8alo input.acsys8amn input.AntHoc1 input.apl8c221 input.apl8n input.att99atdc input.att99atde input.cirtrc82 input.CL99SD input.CL99XT input.disco1 input.Dm8Nbn input.Dm8TFbn input.Flab8as input.Flab8atdn input.fub99a input.fub99tf input.GE8ATDN1 input.ibmg99a input.ibmg99b input.ibms99a input.ibms99b input.ic99dafb input.iit99au1 input.iit99ma1 input.INQ603 input.INQ604 input.isa25 input.isa50 input.kdd8ps16 input.kdd8qe01 input.kuadhoc input.mds08a3 input.mds08a4 input.Mer8Adtd1 input.Mer8Adtd2 input.MITSLStd input.MITSLStdn input.nttd8ale input.nttd8alx input.ok8alx input.ok8amxc input.orcl99man input.pir9Aatd input.pir9Attd input.plt8ah1 input.plt8ah2 input.READWARE input.READWARE2 input.ric8dpx input.ric8tpx input.Sab8A1 input.Sab8A2 input.Scai8Adhoc input.surfahi1 input.surfahi2 input.tno8d3 input.tno8d4 input.umd99a1 input.unc8al32 input.unc8al42 input.UniNET8Lg input.UniNET8St input.UT810 input.UT813 input.uwmt8a1 input.uwmt8a2 input.weaver1 input.weaver2
50 queries: TREC topics #401-#450 (available at http://trec.nist.gov/data/topics_eng/topics.401-450.gz). However, the text of the queries is not needed for the pooling experiments; you only need the relevance judgments (qrels), which are available at http://trec.nist.gov/data/qrels_eng/qrels.trec8.adhoc.parts1-5.tar.gz.
This section provides the R code needed for experimentation.
All pooling strategies are implemented in pooling_strategies_ms.R. Furthermore, we provide another script, process_multiple_queries_ms.R, which gives an example of how to process multiple queries. Instructions for processing multiple queries are provided below.
Besides some auxiliary functions, pooling_strategies_ms.R contains the following R functions:
pooling_random. Implements judgment ordering by random selection of runs
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_random(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
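As an illustration of how this output can be used, the sketch below counts how many of the first k documents in the judgment sequence are relevant. It assumes a qrels data frame, qrels_df, loaded as in the pooling_MTF example further below; the value k=50 is arbitrary:
k = 50
first_k = listIDs[1:min(k,length(listIDs))]
rel_ids = subset(qrels_df, QUERY==251 & REL>0)$DOC_ID  # relevant documents for query 251
print(paste(sum(first_k %in% rel_ids),"of the first",k,"judged documents are relevant"))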
pooling_DOCID. Implements judgment ordering by DOCID
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_DOCID(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
pooling_Rank. Implements judgment ordering by Rank
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_Rank(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
pooling_MTF. Implements judgment ordering by MoveToFront
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_MTF(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
pooling_moffat_A. Implements judgment ordering by Moffat’s method A (“summing contributions”).
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_moffat_A(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
pooling_moffat_B. Implements judgment ordering by Moffat’s method B (“weighting by residual”).
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_moffat_B(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
pooling_moffat_C. Implements judgment ordering by Moffat’s method C (“weighting by predicted score”).
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_moffat_C(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
pooling_epsilon_greedy. Implements judgment ordering by epsilon-greedy
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_epsilon_greedy(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
pooling_ucb. Implements judgment ordering by UCB1-Tuned
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_ucb(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
pooling_bla. Implements judgment ordering by Bayesian Learning Automaton
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
rate: rate parameter for BLA. If set to 1, this is the standard stationary approach
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_bla(251,100,run_rankings,qrels_df,1)
The code above orders the pool for query 251 with pool depth 100 for the stationary approach
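A rate other than 1 yields the non-stationary variant of BLA; for example (the value 0.9 below is purely illustrative, not a recommended setting):
listIDs = pooling_bla(251,100,run_rankings,qrels_df,0.9)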
pooling_mm. Implements judgment ordering by MaxMean (MM)
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
rate: rate parameter for MM. If set to 1, this is the standard stationary approach
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_mm(251,100,run_rankings,qrels_df,1)
The code above orders the pool for query 251 with pool depth 100 for the stationary approach
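As with BLA, a rate other than 1 yields the non-stationary variant of MM; for example (0.9 is again purely illustrative):
listIDs = pooling_mm(251,100,run_rankings,qrels_df,0.9)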
pooling_borda_fuse. Implements judgment ordering by the Borda Fuse method
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder)
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
listIDs = pooling_borda_fuse(251,100,run_rankings)
The code above orders the pool for query 251 with pool depth 100
pooling_hedge. Implements judgment ordering by Hedge
inputs:
query: the query whose pool is going to be ordered
pool_depth: maximum number of docs from each ranking that will be pooled
run_rankings: a list containing the rankings of all pooled runs.
It is a standard R list with as many entries as runs in the pool.
Each entry in the list contains the ranking of a run, which is stored as a
dataframe with the following column names: "QUERY","LABEL","DOC_ID","RANK","SCORE","RUN".
qrels: a data frame containing the qrels. The data frame has the following column names: "QUERY","DUMMY","DOC_ID","REL".
output:
a vector of DOCIDs, which is the sequence of documents as they must be judged
Example of usage: (given two runs, "runA" and "runB", stored in your working folder, and a qrel file, "qrels")
run_rankings=list()
df = read.table("runA",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[1]]=df
df = read.table("runB",header=FALSE)
names(df)=c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
run_rankings[[2]]=df
qrels_df= read.table("qrels",header=FALSE)
names(qrels_df)=c("QUERY","DUMMY","DOC_ID","REL")
listIDs = pooling_hedge(251,100,run_rankings,qrels_df)
The code above orders the pool for query 251 with pool depth 100
Instructions for experimenting with multiple queries (the example below is included in the file process_multiple_queries_ms.R).
The function process_multiple_queries processes all queries, aggregates the statistics of the average number of relevant documents found, and produces a plot. The example given invokes pooling_DOCID, but you can simply change that line to call any other pooling strategy from pooling_strategies_ms.R.
process_multiple_queries <- function(pool_folder, qrels_path)
{
  # reads the qrel file into an R dataframe with appropriate column names
  qrels_df = read.table(qrels_path, header=FALSE)
  names(qrels_df) = c("QUERY","DUMMY","DOC_ID","REL")
  print(paste("Qrel file...", qrels_path, "...", nrow(qrels_df), "judgments."))
  # reads "input*" files from pool_folder and stores them into a list of data frames (run_rankings)
  files <- list.files(path=pool_folder, pattern="input")
  print(paste("Processing...", pool_folder, "...", length(files), " run files", sep=""))
  run_rankings = list()
  for (f in files) {
    filepath = paste(pool_folder, f, sep="/")
    df = read.table(filepath, header=FALSE)
    names(df) = c("QUERY","LABEL","DOC_ID","RANK","SCORE","RUN")
    run_rankings[[length(run_rankings)+1]] = df
  } # files
  print(paste(length(run_rankings), "runs in the pool"))
  # now, we proceed query by query, and aggregate the statistics of relevant docs found at different numbers of judgments
  chunksize = 100
  pool_depth = 100
  queries = unique(qrels_df$QUERY)
  # for computing averages across queries
  maxchunks = ceiling(pool_depth*length(run_rankings)/chunksize)
  accavg = rep(0, maxchunks)
  nqueries = rep(0, maxchunks)
  for (q in queries)
  {
    # this example produces the plot of pooling by DOCID.
    # just change this line to compute any other judgment sequence
    # (by invoking any other pooling strategy from pooling_strategies_ms.R)
    judgments = pooling_DOCID(q, pool_depth, run_rankings)
    # data frame with the ranking of judgments and a chunk ID for each document
    chunks = ceiling((1:length(judgments))/chunksize)
    current_ranking = data.frame(DOCID=judgments, CHUNK=chunks, REL=rep(NA, length(judgments)))
    # get the relevance assessments for the current query
    current_qrels = subset(qrels_df, QUERY==q)
    # assign the relevance column for each document in the sequence
    for (i in 1:length(judgments))
    {
      current_ranking[i,"REL"] = is_relevant(current_qrels, current_ranking[i,"DOCID"])
    }
    print(paste("Query...", q, ", pool size: ", length(judgments), ". ", sum(current_ranking$REL), " docs are relevant.", sep=""))
    rel_per_chunk = aggregate(REL~CHUNK, current_ranking, sum)
    # accumulate statistics
    nqueries[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)] = nqueries[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)] + 1
    accavg[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)] = accavg[min(rel_per_chunk$CHUNK):max(rel_per_chunk$CHUNK)] + rel_per_chunk$REL
  } # for q in queries
  # chunks with no queries are removed
  accavg = accavg[nqueries!=0]
  nqueries = nqueries[nqueries!=0]
  relcounts_perchunk = data.frame(AVG=accavg, NQ=nqueries)
  avgrel_perchunk = relcounts_perchunk$AVG / relcounts_perchunk$NQ
  # accumulate the avg rels found; needed to build a cumulative plot
  avgrel_perchunk_accumulated = c()
  for (l in 1:length(avgrel_perchunk)) avgrel_perchunk_accumulated[l] = sum(relcounts_perchunk$AVG[1:l])
  avgrel_perchunk_accumulated = avgrel_perchunk_accumulated / length(unique(qrels_df$QUERY))
  # plots the accumulated statistics
  xaxis = seq(1, length(avgrel_perchunk))
  plot(xaxis, avgrel_perchunk_accumulated, col="blue", type="b", ylab="avg rel found", xlab="# judgments", xaxt='n')
  xlabels = xaxis*chunksize
  axis(1, at=xaxis, labels=xlabels, cex=.75)
}
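For instance, the function can be invoked as follows (the folder and file names are placeholders for your local copies of the pooled runs and the merged qrel file):
source("pooling_strategies_ms.R")
source("process_multiple_queries_ms.R")
process_multiple_queries("trec5_runs","qrels.251-300")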
© David E. Losada, 2016