# Introduction

This notebook contains the code developed for the experimental section of our paper [1].

The remainder of this notebook is organized as follows:
1. download dataset;
2. parse dataset;
3. observed user behaviour (Table 1);
4. sDCG and sRBP discount functions;
5. fit discount functions on the observed user behaviour (Figure 3, Tables 2, 3, 4 and 5);
6. correlation analysis (Figures 4 and 5, Tables 6 and 7).

In [1]:
import sys, os, math, re, gzip, xml.etree.ElementTree, urllib.request, plotly
import pandas as pd
import numpy as np
import scipy.stats as stats
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from tqdm import tqdm_notebook
from scipy import signal
from bs4 import BeautifulSoup as bs

plotly.offline.init_notebook_mode(connected=True)

# Download Dataset 

We download the dataset from TREC ([trec.nist.gov](http://trec.nist.gov)). It consists of:
1. a qrels file;
2. a session-topic mapping file;
3. a sessions file; and
4. search results.

To download the search results, however, you first need to request a username and password by following the instructions on [this page](https://trec.nist.gov/results.html).

## QRels

In [2]:
if not os.path.isfile('qrels.txt'):
    url_qrels = "https://trec.nist.gov/data/session/2014/judgments.txt"
    urllib.request.urlretrieve(url_qrels, "qrels.txt")
    print(url_qrels)
else:
    print('qrels.txt file exists')

https://trec.nist.gov/data/session/2014/judgments.txt


## Session-topic Mapping

In [3]:
if not os.path.isfile('session_to_topic.txt'):
    url_session_to_topic = "https://trec.nist.gov/data/session/2014/session-topic-mapping.txt"
    !rm -f session_to_topic.txt
    urllib.request.urlretrieve(url_session_to_topic, "session_to_topic.txt")
    print(url_session_to_topic)
else:
    print('session_to_topic.txt file exists')

https://trec.nist.gov/data/session/2014/session-topic-mapping.txt


## Sessions

In [4]:
if not os.path.isfile('sessions.xml'):
    url_sessions = "https://trec.nist.gov/data/session/2014/sessiontrack2014.xml.gz"
    urllib.request.urlretrieve(url_sessions, "sessions.xml.gz")
    !gunzip sessions.xml.gz
    print(url_sessions)
else:
    print('sessions.xml file exists')

https://trec.nist.gov/data/session/2014/sessiontrack2014.xml.gz


## Search Results

To execute the following code, you first need to obtain a username and password from [this page](https://trec.nist.gov/results.html). Then replace the strings "username" and "password" with your actual credentials.
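
One way to avoid hard-coding credentials in the notebook is to read them from environment variables. The sketch below assumes two hypothetical variable names, `TREC_USERNAME` and `TREC_PASSWORD`, which are not defined by TREC or by this notebook; it falls back to the placeholder strings when they are unset.

```python
import os

def read_trec_credentials(env=os.environ):
    """Return (username, password), falling back to the placeholder strings.

    TREC_USERNAME and TREC_PASSWORD are hypothetical variable names chosen
    for this sketch; they are not defined by TREC or by this notebook.
    """
    username = env.get("TREC_USERNAME", "username")
    password = env.get("TREC_PASSWORD", "password")
    return username, password

username, password = read_trec_credentials()
```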

In [5]:
username = "username"
password = "password"

if not os.path.isdir('session') or len(os.listdir('session')) == 0:
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    top_level_url = "https://trec.nist.gov/results"
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(handler)
    urllib.request.install_opener(opener)

    !rm -f -r session
    !mkdir session

    runs_page = opener.open("https://trec.nist.gov/results/trec23/session.input.html")
    soup = bs(runs_page, "html.parser")
    for link in soup.find_all('a', attrs={'href': re.compile(r".*\.gz")}):
        run_path = link.get("href")
        url_run = "https://trec.nist.gov/results/trec23/"+run_path
        print(url_run)
        urllib.request.urlretrieve(url_run, run_path)
else:
    print('a session folder exists and is not empty')

https://trec.nist.gov/results/trec23/./session/input-ECxCGxPRF.RL1.gz
https://trec.nist.gov/results/trec23/./session/input-ECxCGxPRF.RL2.gz
https://trec.nist.gov/results/trec23/./session/input-ECxCGxPRF.RL3.gz
https://trec.nist.gov/results/trec23/./session/input-ECxSRMxOS.RL1.gz
https://trec.nist.gov/results/trec23/./session/input-ECxSRMxOS.RL2.gz
https://trec.nist.gov/results/trec23/./session/input-ECxSRMxOS.RL3.gz
https://trec.nist.gov/results/trec23/./session/input-ECxSRMxPRF.RL1.gz
https://trec.nist.gov/results/trec23/./session/input-ECxSRMxPRF.RL2.gz
https://trec.nist.gov/results/trec23/./session/input-ECxSRMxPRF.RL3.gz
https://trec.nist.gov/results/trec23/./session/input-GUS14Run1.RL1.gz
https://trec.nist.gov/results/trec23/./session/input-GUS14Run1.RL2.gz
https://trec.nist.gov/results/trec23/./session/input-GUS14Run1.RL3.gz
https://trec.nist.gov/results/trec23/./session/input-GUS14Run2.RL1.gz
https://trec.nist.gov/results/trec23/./session/input-GUS14Run2.RL2.gz
https://trec.nist

# Parse Dataset

The following code parses the downloaded files into the data structures needed for the experiments below.

## QRels

In [6]:
qrels = {}
with open("qrels.txt") as file:
    for line in tqdm_notebook(file.readlines()):
        elem = line.split(" ")
        topic = int(elem[0])
        document = elem[2]
        rel = int(elem[3])
        if topic not in qrels:
            qrels[topic] = {}
        if document not in qrels[topic]:
            qrels[topic][document] = {}
        qrels[topic][document][0] = rel


## Session-topic Mapping

In [7]:
session_qrels_remap = {}
with open("session_to_topic.txt") as file:
    for line in tqdm_notebook(file.readlines()):
        elem = re.split(r"\s+", line.strip())
        if not elem[0].startswith('#') and len(elem) >= 2:
            session = int(elem[0])
            topic = int(elem[1])
            session_qrels_remap[session] = topic


## Sessions

Each session is represented as a list of strings with the following meanings:

* **q-{n}:** The session has started. n is the number of judged relevant documents for the topic associated with this session;

* **r:** The user has reformulated the query;

* **e-{r}:** The user has examined the snippet or clicked on the document link. r is 1 when the document is relevant to the topic and 0 otherwise;

* **n-{r}:** The user has neither examined the snippet nor clicked on the document link. r is 1 when the document is relevant to the topic and 0 otherwise;

* **f:** The end of the session.
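
To make the encoding concrete, here is a small hand-written session (not taken from the dataset) and a hypothetical helper, `decode_session`, that turns the event strings back into `(reformulation, rank, examined, relevant)` tuples. It follows the list above and assumes a judged session, where every result event carries a relevance suffix.

```python
def decode_session(events):
    """Decode an event list into (m, n, examined, relevant) tuples.

    m is the 0-based reformulation index and n the 0-based rank.
    Hypothetical helper for illustration; not part of the notebook.
    """
    decoded = []
    m = n = 0
    for event in events:
        if event.startswith("q"):      # session start, e.g. "q-3"
            m, n = 0, 0
        elif event == "r":             # query reformulation
            m, n = m + 1, 0
        elif event == "f":             # end of session
            break
        else:                          # "e-{r}" or "n-{r}"
            kind, rel = event.split("-")
            decoded.append((m, n, kind == "e", rel == "1"))
            n += 1
    return decoded

# Toy session: three relevant documents exist for the topic; the user examines
# two results, reformulates, and examines one more.
toy = ["q-3", "e-0", "e-1", "n-0", "r", "e-1", "n-1", "f"]
for row in decode_session(toy):
    print(row)
```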

In [8]:
xml_sessions = xml.etree.ElementTree.parse("sessions.xml").getroot()

sessions = {}
for xml_session in tqdm_notebook(xml_sessions.findall('session')):
    session = int(xml_session.get('num'))
    
    topic = 0
    if session in session_qrels_remap:
        topic = session_qrels_remap[session]
    
    max_reformulate = 0
    rels_found = set()
    sessions[session] = []
    for xml_interaction in xml_session.findall('interaction'):
        interaction = int(xml_interaction.get("num"))
        interaction_type = xml_interaction.get("type")
        
        if interaction == 1:
            if topic > 0:
                n_rel = 0
                for doc in qrels[topic]:
                    if qrels[topic][doc][0] > 0:
                        n_rel += 1
                sessions[session].append("q-"+str(n_rel))
            else:
                sessions[session].append("q")
            max_reformulate = 0
        elif interaction_type == "reformulate":
            sessions[session].append("r")
            max_reformulate +=1
        
        rels = {}
        if topic > 0:
            for xml_result in xml_interaction.find("results").findall("result"):
                rank = int(xml_result.get("rank"))
                document = xml_result.find("clueweb12id").text
                
                rels[rank] = 0
                if document in qrels[topic] and document not in rels_found and qrels[topic][document][0] > 0:
                    #rels_found.add(document)
                    rels[rank] = 1
        
        max_rank = -1
        for xml_result in xml_interaction.find("results").findall("result"):
            max_rank = int(xml_result.get("rank"))
            break
        
        if xml_interaction.find("clicked") is not None:
            for xml_click in xml_interaction.find("clicked").findall("click"):
                max_rank = int(xml_click.find("rank").text)
        
        for xml_result in xml_interaction.find("results").findall("result"):
            rank = int(xml_result.get("rank"))
            if topic > 0:
                if rank <= max_rank:
                    if rels[rank] > 0:
                        sessions[session].append("e-1")
                    else:
                        sessions[session].append("e-0")
                else:
                    if rels[rank] > 0:
                        sessions[session].append("n-1")
                    else:
                        sessions[session].append("n-0")
            else:
                if rank <= max_rank:
                    sessions[session].append("e")
                else:
                    sessions[session].append("n")
        
        is_e = False
        i = len(sessions[session]) - 1
        while i >= 0:
            if sessions[session][i] == "q" or sessions[session][i] == "r":
                break
            elif sessions[session][i].startswith("e"):
                is_e = True
            elif is_e:
                sessions[session][i] = sessions[session][i].replace("n", "e")
            i-=1
        
    sessions[session].append("f")


## Search Results

In [9]:
runs = {}

for path, dirs, files in os.walk('./session'):
    for run_name in tqdm_notebook(files):
        run_path = "session/"+run_name
        with gzip.open(run_path, "rb") as file:
            runs[run_name] = {}
            for line in file:
                elem = line.decode().split(" ")[0:4]
                session = int(elem[0])
                topic = 0
                if session in session_qrels_remap:
                    topic = session_qrels_remap[session]
                if topic > 0 and topic in qrels:
                    if session not in runs[run_name]:
                        runs[run_name][session] = []

                    document = elem[2]
                    rank = elem[3]
                    rel = 0
                    if document in qrels[topic] and qrels[topic][document][0] > 0:
                        rel = 1
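                    # NOTE: `rank` is a string here, so this comparison with the
                    # integer 1 is never true and "r" is never appended.
                    if rank == 1:
                        runs[run_name][session].append("r")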
                    runs[run_name][session].append("n-" + str(rel))


# Observed User Behaviours (Table 1)

The following code computes the observed user behaviours.

## Over Sessions

This generates a dataframe representing the observed examination probability distribution over queries, reformulations and ranks.
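
As a sanity check on what the table represents, the same counting can be sketched on hand-written sessions. The toy data and the helper name `examination_distribution` are assumptions for illustration only; the actual computation is the cell that follows.

```python
import numpy as np

def examination_distribution(sessions):
    """Joint distribution of examined results over (reformulation m, rank n)."""
    counts = {}
    for events in sessions:
        m = n = 0
        for event in events:
            if event.startswith("q"):
                m, n = 0, 0
            elif event == "r":
                m, n = m + 1, 0
            elif event.startswith("e"):
                counts[(m, n)] = counts.get((m, n), 0) + 1
                n += 1
    n_reformulations = max(m for m, _ in counts) + 1
    n_ranks = max(n for _, n in counts) + 1
    table = np.zeros((n_reformulations, n_ranks))
    for (m, n), c in counts.items():
        table[m, n] = c
    return table / table.sum()

# Hand-written toy sessions, not taken from the dataset.
toy_sessions = [
    ["q-2", "e-1", "e-0", "n-0", "r", "e-1", "f"],
    ["q-1", "e-0", "n-1", "f"],
]
dist = examination_distribution(toy_sessions)
print(dist)  # rows: reformulation m, columns: rank n
```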

In [10]:
max_reformulation = 0
max_rank = 0
for session in sessions:
    m = 0
    n = 0
    for event in range(len(sessions[session])):
        if sessions[session][event].startswith("q"):
            m = 0
            n = 0
        if sessions[session][event].startswith("e"):
            n += 1
        if sessions[session][event] == "r":
            m += 1
            n = 0
    if m > max_reformulation:
        max_reformulation = m
    if n > max_rank:
        max_rank = n
        
count = [[0] * max_rank for i in range(max_reformulation+1)]
for session in sessions:
    m = 0
    n = 0
    for event in range(len(sessions[session])):
        if sessions[session][event].startswith("q"):
            m = 0
            n = 0
        if sessions[session][event].startswith("e"):
            count[m][n] += 1
            n += 1
        if sessions[session][event] == "r":
            m += 1
            n = 0
        
df = pd.DataFrame(count)
df = df/df.values.sum()
gt = df.values
df.transpose()

rank n \ reformulation m,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,0.159816,0.096758,0.069824,0.046209,0.028593,0.014935,0.008552,0.004468,0.002042,0.001532,0.000766,0.000638,0.000511,0.000255,0.000128
1,0.042890,0.028976,0.015445,0.009574,0.005489,0.002553,0.001149,0.000638,0.000255,0.000128,0.000000,0.000128,0.000128,0.000128,0.000128
2,0.032550,0.021828,0.012254,0.008042,0.004468,0.001915,0.000894,0.000638,0.000255,0.000128,0.000000,0.000128,0.000128,0.000128,0.000128
3,0.024636,0.017871,0.009701,0.006255,0.003829,0.001532,0.000766,0.000511,0.000255,0.000128,0.000000,0.000000,0.000128,0.000000,0.000000
4,0.019403,0.014680,0.008552,0.004978,0.003319,0.001404,0.000638,0.000383,0.000255,0.000128,0.000000,0.000000,0.000128,0.000000,0.000000
5,0.016211,0.011999,0.007021,0.004212,0.002681,0.001149,0.000638,0.000255,0.000255,0.000128,0.000000,0.000000,0.000128,0.000000,0.000000
6,0.013531,0.010212,0.006127,0.003702,0.002681,0.001149,0.000638,0.000255,0.000255,0.000000,0.000000,0.000000,0.000128,0.000000,0.000000
7,0.011105,0.008552,0.005234,0.003319,0.002681,0.001021,0.000638,0.000255,0.000255,0.000000,0.000000,0.000000,0.000128,0.000000,0.000000
8,0.008808,0.007276,0.004595,0.003319,0.002553,0.000894,0.000638,0.000128,0.000255,0.000000,0.000000,0.000000,0.000128,0.000000,0.000000
9,0.006255,0.005744,0.004212,0.003064,0.002553,0.000894,0.000511,0.000128,0.000255,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


## Over Independent Queries

We repeat the computation, this time treating every query as its own session, i.e. queries and reformulations are treated independently.
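
Treating every query independently amounts to dropping the reformulation index and counting examinations per rank only. A minimal sketch on hand-written sessions follows; the toy data and the helper name `rank_distribution` are assumptions.

```python
from collections import Counter

def rank_distribution(sessions):
    """Distribution of examined results over rank n, ignoring reformulations."""
    counts = Counter()
    for events in sessions:
        n = 0
        for event in events:
            if event.startswith("q") or event == "r":
                n = 0                  # every query restarts at the top rank
            elif event.startswith("e"):
                counts[n] += 1
                n += 1
    total = sum(counts.values())
    return [counts[n] / total for n in range(max(counts) + 1)]

# Hand-written toy sessions, not taken from the dataset.
toy_sessions = [
    ["q-2", "e-1", "e-0", "n-0", "r", "e-1", "f"],
    ["q-1", "e-0", "n-1", "f"],
]
print(rank_distribution(toy_sessions))
```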

In [11]:
count = [0] * max_rank
for session in sessions:
    m = 0
    n = 0
    for event in range(len(sessions[session])):
        if sessions[session][event].startswith("q"):
            m = 0
            n = 0
        if sessions[session][event].startswith("e"):
            count[n] += 1
            n += 1
        if sessions[session][event] == "r":
            m += 1
            n = 0
  
df = pd.DataFrame(count)
df = df/df.values.sum()
nt = df.values
df

rank n,probability
0,0.435027
1,0.107608
2,0.083482
3,0.065611
4,0.053868
5,0.044677
6,0.038678
7,0.033189
8,0.028593
9,0.023615


# sDCG and sRBP Discount Functions

Here we define the discount functions for sDCG and sRBP.
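
Written out, the two functions in the following cell compute the discounts below for the 0-based reformulation index $m$ and the 0-based rank $n$. This is transcribed from the code rather than from the paper, so the notation may differ from [1].

```latex
\begin{align}
d_{\mathrm{sDCG}}(m, n) &= \frac{1}{\bigl(1 + \log_{b_q}(m + 1)\bigr)\,\log_{b}(n + 2)} \\
d_{\mathrm{sRBP}}(m, n) &= \left(\frac{p - bp}{1 - bp}\right)^{m} (bp)^{n}
\end{align}
```

The code additionally uses the convention $0^0 = 1$ and, for $b = p = 1$ (where the fraction is undefined), returns 1 when $m = 0$ and 0 otherwise.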

In [12]:
def d_sdcg(bq, b, m, n):
    return 1.0/((1.0 + math.log(m+1, bq))*math.log(n+2, b))

def d_srbp(b, p, m, n):
    def spow(base, exp):
        if base == 0 and exp == 0:
            return 1.0
        else:
            return base**exp
        
    if b == 1.0 and p == 1.0:
        if m == 0:
            return spow(b*p, n)
        else:
            return 0.0
    else:
        return spow((p - b*p)/(1.0 - b*p), m)*spow(b*p, n)

# Fitting on the Observed User Behaviours (Figure 3, Table 2, 3, 4, and 5)

We now fit the discount functions to the observed user behaviours, using the total squared error (TSE) as the loss function.
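
In the fitting cells, each discount function is first normalised to sum to one over the same grid as the observed distribution, and the TSE is the sum of squared differences between the two. A vectorised sketch with NumPy follows; the observed distribution is made up, and the plain `p**n` discount and the helper name `total_squared_error` are assumptions chosen for brevity.

```python
import numpy as np

def total_squared_error(observed, discount):
    """TSE between an observed distribution and a normalised discount grid."""
    observed = np.asarray(observed, dtype=float)
    discount = np.asarray(discount, dtype=float)
    predicted = discount / discount.sum()   # normalise to a distribution
    return float(((observed - predicted) ** 2).sum())

# Made-up observed examination probabilities over four ranks.
observed = np.array([0.5, 0.25, 0.15, 0.10])

# Grid search over p for the discount p**n, mirroring the fitting loops.
ranks = np.arange(len(observed))
candidates = np.linspace(0.01, 0.99, 99)
errors = [total_squared_error(observed, p ** ranks) for p in candidates]
best_p = candidates[int(np.argmin(errors))]
print("best p = {:.2f}, TSE = {:.4f}".format(best_p, min(errors)))
```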

In [13]:
M = gt.shape[0]
N = gt.shape[1]

## On Sessions

### sRBP

In [14]:
min_err = sys.float_info.max
res = 100
best_p = -1
best_b = -1
grid = {}

def tse_srbp(b, p):
    norm = 0.0
    for i in range(M*N):
        norm += d_srbp(b, p, i//N, i%N)
    err = 0.0
    for i in range(M*N):
        err += (gt[i//N, i%N]-d_srbp(b, p, i//N, i%N)/norm)**2
    return err

t = tqdm_notebook(np.linspace(0, 1, res+1), desc = "TSE: {:.4f}".format(min_err))
for p_i, p in enumerate(t):

    grid[p_i] = []
    for b in np.linspace(0, 1, res+1):
        
        err = tse_srbp(b, p)
        
        grid[p_i].append(err)

        if err < min_err:
            min_err = err
            best_p = p
            best_b = b
            t.set_description("TSE: {:.4f}".format(min_err))

grid = pd.DataFrame.from_dict(grid)

srbp_best_b = best_b
srbp_best_p = best_p

print("b =", best_b, ", p =", best_p, ", TSE = {:.4f}".format(min_err))


b = 0.64 , p = 0.86 , TSE = 0.0046


#### Sensitivity Plots (Figure 3)

In [15]:
x_min = []
y_min = []
z_min = []
g_min = sys.float_info.max

for i in range(grid.shape[0]):
    _z_min = sys.float_info.max
    for j in range(grid.shape[1]):
        if grid[i][j] < _z_min:
            _z_min = grid[i][j]
            _x_min = i
            _y_min = j
            
    count_min = 0
    for j in range(grid.shape[1]):
        if grid[i][j] == _z_min:
            count_min += 1
    
    if count_min == 1:
        x_min.append(_x_min/res)
        y_min.append(_y_min/res)
        z_min.append(_z_min)
        g_min = _z_min
        
x_min_p = x_min
y_min_p = y_min
z_min_p = z_min
g_min_p = g_min

x_min = []
y_min = []
z_min = []
g_min = sys.float_info.max

for j in range(grid.shape[0]):
    _z_min = sys.float_info.max
    for i in range(grid.shape[1]):
        if grid[i][j] < _z_min:
            _z_min = grid[i][j]
            _x_min = i
            _y_min = j
            
    count_min = 0
    for i in range(grid.shape[1]):
        if grid[i][j] == _z_min:
            count_min += 1
    
    if count_min == 1:# and _z_min < g_min:
        x_min.append(_x_min/res)
        y_min.append(_y_min/res)
        z_min.append(_z_min)
        g_min = _z_min

x_min_b = x_min
y_min_b = y_min
z_min_b = z_min
g_min_b = g_min

data = [
    go.Surface(x=np.linspace(0, 1, res+1), y=np.linspace(0, 1, res+1), z=grid.values, 
               colorscale='Viridis', reversescale=True),
    go.Scatter3d(x=signal.savgol_filter(x_min_p, 7, 1), y=signal.savgol_filter(y_min_p, 7, 1), z=z_min_p, 
                     mode='lines',
                     line=dict(color='black', width=5),
                     showlegend=False),
    go.Scatter3d(x=signal.savgol_filter(x_min_b, 7, 1), y=signal.savgol_filter(y_min_b, 7, 1), z=z_min_b, 
                     mode='lines',
                     line=dict(color='black', width=5),
                     showlegend=False)]

layout = go.Layout(
    scene=dict(
        xaxis=dict(title='p', showbackground=True, backgroundcolor='rgb(230, 230,230)', gridcolor='rgb(255, 255, 255)', zerolinecolor='rgb(255, 255, 255)'),
        yaxis=dict(title='b', showbackground=True, backgroundcolor='rgb(230, 230,230)', gridcolor='rgb(255, 255, 255)', zerolinecolor='rgb(255, 255, 255)'),
        zaxis=dict(title='TSE', showbackground=True, backgroundcolor='rgb(230, 230,230)', gridcolor='rgb(255, 255, 255)', zerolinecolor='rgb(255, 255, 255)')))

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [16]:
best_p_i = -1
best_b_i = -1
for p_i, p in enumerate(np.linspace(0, 1, res+1)):
    for b_i, b in enumerate(np.linspace(0, 1, res+1)):
        if min_err == grid.iat[p_i, b_i]:
            best_p_i = p_i
            best_b_i = b_i

In [17]:
def plot_line(name_x, x, name_y, y):
    trace = go.Scatter(
        x = x,
        y = y)

    layout = go.Layout(
        xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(text=name_x)),
        yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(text=name_y)))
    
    data = [trace]
    
    fig = go.Figure(data=data, layout=layout)
    iplot(fig)

In [18]:
y = grid.values[best_p_i]
x = np.linspace(0, 1, res+1)


plot_line("b", x, "TSE", y)

In [19]:
y = grid.values[:,best_b_i]
x = np.linspace(0, 1, res+1)

plot_line("p", x, "TSE", y)

#### Table 2

In [20]:
dist_srbp = [[0] * N for i in range(M)]
norm = 0.0      
for m in range(M):
    for n in range(N):
        dist_srbp[m][n] = d_srbp(srbp_best_b, srbp_best_p, m, n)
        norm += dist_srbp[m][n]
        
dist_srbp = pd.DataFrame(dist_srbp)
dist_srbp = dist_srbp/norm
dist_srbp.transpose()

rank n \ reformulation m,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,1.405216e-01,9.676489e-02,6.663348e-02,4.588462e-02,3.159670e-02,2.175787e-02,1.498273e-02,1.031729e-02,7.104612e-03,4.892322e-03,3.368912e-03,2.319874e-03,1.597493e-03,1.100053e-03,7.575098e-04
1,7.734311e-02,5.325940e-02,3.667507e-02,2.525489e-02,1.739083e-02,1.197553e-02,8.246497e-03,5.678638e-03,3.910379e-03,2.692734e-03,1.854249e-03,1.276858e-03,8.792602e-04,6.054692e-04,4.169334e-04
2,4.256965e-02,2.931397e-02,2.018596e-02,1.390029e-02,9.571910e-03,6.591333e-03,4.538872e-03,3.125522e-03,2.152272e-03,1.482081e-03,1.020579e-03,7.027829e-04,4.839448e-04,3.332502e-04,2.294801e-04
3,2.343033e-02,1.613441e-02,1.111035e-02,7.650722e-03,5.268380e-03,3.627870e-03,2.498195e-03,1.720287e-03,1.184611e-03,8.157373e-04,5.617265e-04,3.868117e-04,2.663632e-04,1.834209e-04,1.263059e-04
4,1.289606e-02,8.880380e-03,6.115137e-03,4.210957e-03,2.899716e-03,1.996780e-03,1.375007e-03,9.468462e-04,6.520097e-04,4.489818e-04,3.091743e-04,2.129012e-04,1.466063e-04,1.009549e-04,6.951875e-05
5,7.097989e-03,4.887761e-03,3.365771e-03,2.317711e-03,1.596004e-03,1.099027e-03,7.568036e-04,5.211441e-04,3.588662e-04,2.471196e-04,1.701695e-04,1.171808e-04,8.069212e-05,5.556557e-05,3.826312e-05
6,3.906733e-03,2.690224e-03,1.852521e-03,1.275668e-03,8.784405e-04,6.049047e-04,4.165447e-04,2.868377e-04,1.975199e-04,1.360146e-04,9.366131e-05,6.449631e-05,4.441294e-05,3.058329e-05,2.106002e-05
7,2.150266e-03,1.480699e-03,1.019627e-03,7.021277e-04,4.834936e-04,3.329396e-04,2.292662e-04,1.578755e-04,1.087150e-04,7.486244e-05,5.155118e-05,3.549877e-05,2.444488e-05,1.683304e-05,1.159144e-05
8,1.183506e-03,8.149768e-04,5.612029e-04,3.864511e-04,2.661149e-04,1.832499e-04,1.261881e-04,8.689467e-05,5.983672e-05,4.120429e-05,2.837377e-05,1.953852e-05,1.345446e-05,9.264906e-06,6.379926e-06
9,6.514019e-04,4.485632e-04,3.088861e-04,2.127027e-04,1.464696e-04,1.008608e-04,6.945394e-05,4.782683e-05,3.293413e-05,2.267884e-05,1.561692e-05,1.075400e-05,7.405337e-06,5.099404e-06,3.511512e-06


### sDCG

In [21]:
res = 0.01
min_err = sys.float_info.max
best_bq = -1
best_b = -1
grid = {}

def tse_sdcg(bq, b):
    err = 0.0
    norm = 0.0
    for i in range(M*N):
        norm += d_sdcg(bq, b, i//N, i%N)
    for i in range(M*N):
        err += (gt[i//N, i%N]-d_sdcg(bq, b, i//N, i%N)/norm)**2
    return err

t = tqdm_notebook(np.arange(1.01, 5+res, res), desc = "TSE: {:.4f}".format(min_err))
for bq_i, bq in enumerate(t):

    grid[bq_i] = []
    for b in np.arange(1.01, 20+res, res):
        
        err = tse_sdcg(bq, b)
        
        grid[bq_i].append(err)

        if err < min_err:
            min_err = err
            best_bq = bq
            best_b = b
            t.set_description("TSE: {:.4f}".format(min_err))

grid = pd.DataFrame.from_dict(grid)

sdcg_best_bq = best_bq
sdcg_best_b = best_b

print("bq =", best_bq, ", b =", best_b, ", TSE = {:.4f}".format(min_err))


bq = 1.07 , b = 4.540000000000003 , TSE = 0.0362


#### Table 3

In [22]:
dist_sdcg = [[0] * N for i in range(M)]
norm = 0.0      
for m in range(M):
    for n in range(N):
        dist_sdcg[m][n] = d_sdcg(sdcg_best_bq, sdcg_best_b, m, n)
        norm += dist_sdcg[m][n]
        
dist_sdcg = pd.DataFrame(dist_sdcg)
dist_sdcg = dist_sdcg/norm
dist_sdcg.transpose()

rank n \ reformulation m,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,0.044418,0.003950,0.002577,0.002067,0.001792,0.001616,0.001493,0.001400,0.001327,0.001268,0.001219,0.001177,0.001142,0.001110,0.001083
1,0.028025,0.002492,0.001626,0.001304,0.001131,0.001020,0.000942,0.000883,0.000837,0.000800,0.000769,0.000743,0.000720,0.000701,0.000683
2,0.022209,0.001975,0.001288,0.001033,0.000896,0.000808,0.000746,0.000700,0.000663,0.000634,0.000609,0.000589,0.000571,0.000555,0.000541
3,0.019130,0.001701,0.001110,0.000890,0.000772,0.000696,0.000643,0.000603,0.000571,0.000546,0.000525,0.000507,0.000492,0.000478,0.000466
4,0.017183,0.001528,0.000997,0.000800,0.000693,0.000625,0.000577,0.000541,0.000513,0.000490,0.000472,0.000455,0.000442,0.000430,0.000419
5,0.015822,0.001407,0.000918,0.000736,0.000638,0.000576,0.000532,0.000499,0.000473,0.000452,0.000434,0.000419,0.000407,0.000395,0.000386
6,0.014806,0.001317,0.000859,0.000689,0.000597,0.000539,0.000498,0.000467,0.000442,0.000423,0.000406,0.000392,0.000381,0.000370,0.000361
7,0.014012,0.001246,0.000813,0.000652,0.000565,0.000510,0.000471,0.000442,0.000419,0.000400,0.000385,0.000371,0.000360,0.000350,0.000342
8,0.013371,0.001189,0.000776,0.000622,0.000539,0.000487,0.000449,0.000421,0.000399,0.000382,0.000367,0.000354,0.000344,0.000334,0.000326
9,0.012840,0.001142,0.000745,0.000597,0.000518,0.000467,0.000431,0.000405,0.000384,0.000367,0.000352,0.000340,0.000330,0.000321,0.000313


## On Independent Queries

### RBP

In [23]:
min_err = sys.float_info.max
best_p = -1
res = 0.01

def tse_rbp(p):
    norm = 0.0
    for i in range(N):
        norm += p**i
    err = 0.0
    for i in range(N):
        err += (nt[i][0]-(p**i)/norm)**2
    return err

t = tqdm_notebook(np.arange(0, 1 + res, res), desc = "TSE: {:.4f}".format(min_err))
for p_i, p in enumerate(t):

    err = tse_rbp(p)

    if err < min_err:
        min_err = err
        best_p = p
        t.set_description("TSE: {:.4f}".format(min_err))

rbp_best_p = best_p

print("p =", best_p, ", TSE = {:.4f}".format(min_err))


p = 0.59 , TSE = 0.0252


In [24]:
dist_rbp = [0] * N
norm = 0.0      
for n in range(N):
    dist_rbp[n] = rbp_best_p**n
    norm += dist_rbp[n]
        
dist_rbp = pd.DataFrame(dist_rbp)
dist_rbp = dist_rbp/norm
dist_rbp

rank n,probability
0,4.100000e-01
1,2.419000e-01
2,1.427210e-01
3,8.420539e-02
4,4.968118e-02
5,2.931190e-02
6,1.729402e-02
7,1.020347e-02
8,6.020048e-03
9,3.551828e-03


### sRBP

In [25]:
min_err = sys.float_info.max
res = 0.01
best_p = -1
best_b = -1
grid = {}

def tse_srbp(b, p):
    norm = 0.0
    for i in range(N):
        norm += d_srbp(b, p, 0, i)
    err = 0.0
    for i in range(N):
        err += (nt[i][0]-d_srbp(b, p, 0, i)/norm)**2
    return err

t = tqdm_notebook(np.arange(0, 1+res, res), desc = "TSE: {:.4f}".format(min_err))
for p_i, p in enumerate(t):

    grid[p_i] = []
    for b in np.arange(0, 1+res, res):
        
        err = tse_srbp(b, p)
        
        grid[p_i].append(err)

        if err < min_err:
            min_err = err
            best_p = p
            best_b = b
            t.set_description("TSE: {:.4f}".format(min_err))

grid = pd.DataFrame.from_dict(grid)

srbp_best_p_2 = best_p
srbp_best_b_2 = best_b

print("b =", best_b, ", p =", best_p, ", TSE = {:.4f}".format(min_err))


b = 0.92 , p = 0.64 , TSE = 0.0252


In [26]:
dist_srbp_2 = [0] * N
norm = 0.0      
for n in range(N):
    dist_srbp_2[n] = d_srbp(srbp_best_b_2, srbp_best_p_2, 0, n)
    norm += dist_srbp_2[n]

dist_srbp_2 = pd.DataFrame(dist_srbp_2)
dist_srbp_2 = dist_srbp_2/norm
dist_srbp_2

rank n,probability
0,4.112000e-01
1,2.421146e-01
2,1.425571e-01
3,8.393759e-02
4,4.942245e-02
5,2.909994e-02
6,1.713405e-02
7,1.008853e-02
8,5.940124e-03
9,3.497545e-03


### DCG

In [27]:
min_err = sys.float_info.max
best_b = -1

def tse_dcg(b):
    norm = 0.0
    for i in range(N):
        norm += 1.0/math.log(i+2, b)
    err = 0.0
    for i in range(N):
        err += (nt[i][0]-1.0/(math.log(i+2, b)*norm))**2
    return err

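# NOTE: res is still 0.01 here (set in the cells above), so the step 1.0/res is
# 100 and only b = 1.01 is evaluated. After normalisation the DCG discount does
# not depend on b, so the resulting TSE is unaffected.
t = tqdm_notebook(np.arange(1.01, 20.01, 1.0/res), desc = "TSE: {:.4f}".format(min_err))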
for b_i, b in enumerate(t):

    err = tse_dcg(b)

    if err < min_err:
        min_err = err
        best_b = b
        t.set_description("TSE: {:.4f}".format(min_err))

dcg_best_b = best_b

print("b =", best_b, ", TSE = {:.4f}".format(min_err))


b = 1.01 , TSE = 0.1521


In [28]:
dist_dcg = [0] * N
norm = 0.0      
for n in range(N):
    dist_dcg[n] = 1.0/math.log(n+2, dcg_best_b)
    norm += dist_dcg[n]
        
dist_dcg = pd.DataFrame(dist_dcg)
dist_dcg = dist_dcg/norm
dist_dcg

rank n,probability
0,0.067638
1,0.042675
2,0.033819
3,0.029130
4,0.026166
5,0.024093
6,0.022546
7,0.021337
8,0.020361
9,0.019552


### sDCG

In [29]:
res = 100
min_err = sys.float_info.max
best_bq = -1
best_b = -1
grid = {}

def tse_sdcg(bq, b):
    norm = 0.0
    for i in range(N):
        norm += d_sdcg(bq, b, 0, i%N)
    err = 0.0
    for i in range(N):
        err += (nt[i][0]-d_sdcg(bq, b, 0, i%N)/norm)**2
    return err

t = tqdm_notebook(np.arange(1.01, 5.01, 1.0/res), desc = "TSE: {:.4f}".format(min_err)) # 5.01
for bq_i, bq in enumerate(t):

    grid[bq_i] = []
    for b in np.arange(1.01, 20.01, 1.0/res): # 20.01
        
        err = tse_sdcg(bq, b)
        
        grid[bq_i].append(err)

        if err < min_err:
            min_err = err
            best_bq = bq
            best_b = b
            t.set_description("TSE: {:.4f}".format(min_err))

grid = pd.DataFrame.from_dict(grid)

sdcg_best_bq_2 = best_bq
sdcg_best_b_2 = best_b

print("bq =", best_bq, ", b = ", best_b, ", TSE = {:.4f}".format(min_err))


bq = 1.01 , b =  1.2600000000000002 , TSE = 0.1521


In [30]:
dist_sdcg_2 = [0] * N
norm = 0.0      
for n in range(N):
    dist_sdcg_2[n] = d_sdcg(sdcg_best_bq_2, sdcg_best_b_2, 0, n)
    norm += dist_sdcg_2[n]

dist_sdcg_2 = pd.DataFrame(dist_sdcg_2)
dist_sdcg_2 = dist_sdcg_2/norm
dist_sdcg_2

rank n,probability
0,0.067638
1,0.042675
2,0.033819
3,0.029130
4,0.026166
5,0.024093
6,0.022546
7,0.021337
8,0.020361
9,0.019552


## Compute Total Absolute Error (TAE)
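
The total absolute error is the sum of absolute differences between the observed and the fitted distribution. With NumPy arrays this reduces to a single expression; the sketch below uses made-up numbers, and `total_absolute_error` is a hypothetical helper name.

```python
import numpy as np

def total_absolute_error(observed, fitted):
    """Sum of absolute differences between two distributions of equal shape."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return float(np.abs(observed - fitted).sum())

observed = [[0.4, 0.2], [0.3, 0.1]]   # made-up values
fitted = [[0.5, 0.2], [0.2, 0.1]]     # made-up values
print("{:.4f}".format(total_absolute_error(observed, fitted)))
```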

In [31]:
print("Sessions")
e = 0.0
for m in range(M):
    for n in range(N):
        e += abs(gt[m, n] - dist_srbp.iat[m, n])
        
print("\tsRBP \t{:.4f}".format(e))

e = 0.0
for m in range(M):
    for n in range(N):
        e += abs(gt[m, n] - dist_sdcg.iat[m, n])
        
print("\tsDCG \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    e += abs(nt[n][0] - dist_rbp.values[n][0])
        
print("Independent Queries")
print("\tRBP \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    e += abs(nt[n][0] - dist_srbp_2.values[n][0])
        
print("\tsRBP \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    e += abs(nt[n][0] - dist_dcg.values[n][0])
        
print("\tDCG \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    e += abs(nt[n][0] - dist_sdcg_2.values[n][0])
        
print("\tsDCG \t{:.4f}".format(e))

Sessions
	sRBP 	0.4950
	sDCG 	1.3357
Independent Queries
	RBP 	0.4242
	sRBP 	0.4238
	DCG 	1.2162
	sDCG 	1.2162


## Compute Kullback–Leibler Divergence (KLD)
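
The divergence is computed in bits (base-2 logarithm) from the observed distribution to the fitted one, and cells whose observed probability is zero are skipped. A compact NumPy sketch of the same computation follows, with made-up numbers; `kl_divergence_bits` is a hypothetical helper name.

```python
import numpy as np

def kl_divergence_bits(observed, fitted):
    """KL divergence D(observed || fitted) in bits.

    Cells with zero observed probability contribute nothing to the sum.
    """
    observed = np.asarray(observed, dtype=float).ravel()
    fitted = np.asarray(fitted, dtype=float).ravel()
    mask = observed > 0
    return float(np.sum(observed[mask] * np.log2(observed[mask] / fitted[mask])))

observed = [0.5, 0.5, 0.0]     # made-up values
fitted = [0.25, 0.5, 0.25]     # made-up values
print(kl_divergence_bits(observed, fitted))
```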

In [32]:
print("Sessions")
e = 0.0
for m in range(M):
    for n in range(N):
        if gt[m, n] > 0:
            e += gt[m, n]*math.log(gt[m, n]/dist_srbp.iat[m, n], 2)
        
print("\tsRBP \t{:.4f}".format(e))

e = 0.0
for m in range(M):
    for n in range(N):
        if gt[m, n] > 0:
            e += gt[m, n]*math.log(gt[m, n]/dist_sdcg.iat[m, n], 2)
        
print("\tsDCG \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    if nt[n][0] > 0:
        e += nt[n][0]*math.log(nt[n][0]/dist_rbp.values[n][0], 2)

print("Independent Queries")
print("\tRBP \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    if nt[n, 0] > 0:
        e += nt[n, 0]*math.log(nt[n, 0]/dist_srbp_2.values[n][0], 2)
        
print("\tsRBP \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    if nt[n, 0] > 0:
        e += nt[n, 0]*math.log(nt[n, 0]/dist_dcg.values[n][0], 2)
        
print("\tDCG \t{:.4f}".format(e))

e = 0.0
for n in range(N):
    if nt[n, 0] > 0:
        e += nt[n, 0]*math.log(nt[n, 0]/dist_sdcg_2.values[n][0], 2)
        
print("\tsDCG \t{:.4f}".format(e))

Sessions
	sRBP 	0.9475
	sDCG 	2.2710
Independent Queries
	RBP 	0.6624
	sRBP 	0.6679
	DCG 	1.5035
	sDCG 	1.5035


# Correlation Analysis

We now compare how the evaluation measures behave on actual sessions and search results. Before doing so, we select only the sessions that have been judged, i.e. those whose topics appear in the qrels.

In [33]:
judged_sessions = []
for session in range(1, 101):
    judged_sessions.append(sessions[session])

To use standard evaluation measures on sessions we can either (i) evaluate only the last reformulation, or (ii) aggregate the evaluations of the query and all its reformulations. Session-based evaluation measures do not face this choice.
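
The difference between the two options can be seen on a single hand-written session. The sketch below scores each query of the session with an unnormalised RBP-style discount `p**n`, then takes either the last score (i) or the mean over all queries (ii). The toy data and the helper name `per_query_scores` are assumptions for illustration.

```python
def per_query_scores(events, discount):
    """Score every query of one session separately with a rank discount."""
    scores = []
    n = 0
    for event in events:
        if event.startswith("q") or event == "r":
            scores.append(0.0)         # a new ranking starts here
            n = 0
        elif event != "f":
            if event.endswith("-1"):   # relevant result at rank n
                scores[-1] += discount(n)
            n += 1
    return scores

p = 0.5
toy = ["q-2", "n-0", "n-1", "r", "n-1", "n-1", "f"]
scores = per_query_scores(toy, lambda n: p ** n)

option_i = scores[-1]                    # (i) last reformulation only
option_ii = sum(scores) / len(scores)    # (ii) mean over all queries
print(scores, option_i, option_ii)
```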

In [34]:
def standard_i_measure(d, sessions):
    res=0.0
    for session in sessions:
        r=0.0
        n=0
        for event in session:
            if event.startswith('q') or event == 'r':
                n=0
                r=0.0
            elif event.endswith('-1'):
                r+=d(n)
                n+=1
            elif event != 'f':
                n+=1
        res+=r
            
    res/=len(sessions)
    return res

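def standard_ii_measure(d, sessions):
    # Approach (ii): sum the discounted gains of every query in every
    # session, then divide by the total number of queries (m+1).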
    res=0.0
    m=-1 # reformulation
    for session in sessions:
        n=0 # rank
        for event in session:
            if event.startswith('q') or event == 'r':
                m+=1
                n=0
            elif event.endswith('-1'):
                res+=d(n)
                n+=1
            elif event != 'f':
                n+=1
            
    res/=(m+1)
    return res

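def session_based_measure(d, sessions):
    # Session-based: the discount d(m, n) depends on the reformulation
    # index m and the rank n; scores are averaged over sessions.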
    res=0.0
    for session in sessions:
        n=0 # rank
        m=0 # reformulation
        for event in session:
            if event.startswith('q'):
                m=0
                n=0
            elif event == 'r':
                n=0
                m+=1
            elif event.endswith('-1'):
                res += d(m, n)
                n+=1
            elif event != 'f':
                n+=1                
                            
    res/=len(sessions)
    
    return res
    
def rbp_i(p, sessions):
    return (1-p)*standard_i_measure(lambda n : p**n, sessions)

def dcg_i(b, sessions):
    return standard_i_measure(lambda n : 1.0/math.log(n+2, b), sessions)

def rbp_ii(p, sessions):
    return (1-p)*standard_ii_measure(lambda n : p**n, sessions)

def dcg_ii(b, sessions):
    return standard_ii_measure(lambda n : 1.0/math.log(n+2, b), sessions)

def srbp(b, p, sessions):
    return (1-p)*session_based_measure(lambda m, n : d_srbp(b, p, m, n), sessions)

def sdcg(bq, b, sessions):
    return session_based_measure(lambda m, n : d_sdcg(bq, b, m, n), sessions)
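
To make the difference between approaches (i) and (ii) concrete, here is a minimal sketch on a single toy session. It assumes the event encoding the functions above rely on: a token starting with `q` opens the session, `r` marks a reformulation, `f` ends the session, and tokens ending in `-1` are the ones credited with gain. The helpers `split_queries` and `rbp`, and the toy session itself, are hypothetical.

```python
def split_queries(session):
    """Return one list of 0/1 gains per query, in rank order."""
    queries = []
    for event in session:
        if event.startswith('q') or event == 'r':
            queries.append([])            # a new ranked list starts here
        elif event != 'f':
            queries[-1].append(1 if event.endswith('-1') else 0)
    return queries

def rbp(gains, p):
    """RBP of one ranked list, with ranks counted from 0."""
    return (1 - p) * sum(g * p ** n for n, g in enumerate(gains))

# Toy session: one query followed by one reformulation.
toy_session = ['q', 'd-1', 'd-0', 'r', 'd-0', 'd-1', 'f']
per_query = [rbp(gains, 0.5) for gains in split_queries(toy_session)]

score_i = per_query[-1]                     # (i) last reformulation only
score_ii = sum(per_query) / len(per_query)  # (ii) average over all queries
print(score_i, score_ii)
```

With `p = 0.5` the last reformulation alone scores 0.25, while averaging over both queries gives 0.375, because the first query placed its gain at the top rank.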

## Kendall's Tau

### On Sessions (Figure 4 and Table 6)

In [35]:
srbp_vs   = []
rbp_i_vs  = []
rbp_ii_vs = []
sdcg_vs   = []
dcg_i_vs  = []
dcg_ii_vs = []

for i in range(len(judged_sessions)):
    srbp_vs.append(
        srbp(srbp_best_b, srbp_best_p, [judged_sessions[i]]))
    rbp_i_vs.append(
        rbp_i(rbp_best_p, [judged_sessions[i]])) 
    rbp_ii_vs.append(
        rbp_ii(rbp_best_p, [judged_sessions[i]]))
    sdcg_vs.append(
        sdcg(sdcg_best_bq, sdcg_best_b, [judged_sessions[i]]))
    dcg_i_vs.append(
        dcg_i(dcg_best_b, [judged_sessions[i]])) 
    dcg_ii_vs.append(
        dcg_ii(dcg_best_b, [judged_sessions[i]])) 
    
ls = [("sRBP (b {:.2f}, p {:.2f})".format(srbp_best_b, srbp_best_p), srbp_vs),
      ("RBP i (p {:.2f})".format(rbp_best_p), rbp_i_vs),
      ("RBP ii (p {:.2f})".format(rbp_best_p), rbp_ii_vs),
      ("sDCG (bq {:.2f}, b {:.2f})".format(sdcg_best_bq, sdcg_best_b), sdcg_vs),
      ("DCG i (b {:.2f})".format(dcg_best_b), dcg_i_vs),
      ("DCG ii (b {:.2f})".format(dcg_best_b), dcg_ii_vs)]

for name1, vs1 in ls:
    for name2, vs2 in ls:
        if name1 > name2:
            print("{:<25} - {:<25} = {:.3f}".format(name1, name2, stats.kendalltau(vs1, vs2).correlation))

sRBP (b 0.64, p 0.86)     - RBP i (p 0.59)            = 0.555
sRBP (b 0.64, p 0.86)     - RBP ii (p 0.59)           = 0.801
sRBP (b 0.64, p 0.86)     - sDCG (bq 1.07, b 4.54)    = 0.772
sRBP (b 0.64, p 0.86)     - DCG i (b 1.01)            = 0.532
sRBP (b 0.64, p 0.86)     - DCG ii (b 1.01)           = 0.745
RBP i (p 0.59)            - DCG i (b 1.01)            = 0.906
RBP i (p 0.59)            - DCG ii (b 1.01)           = 0.666
RBP ii (p 0.59)           - RBP i (p 0.59)            = 0.675
RBP ii (p 0.59)           - DCG i (b 1.01)            = 0.659
RBP ii (p 0.59)           - DCG ii (b 1.01)           = 0.869
sDCG (bq 1.07, b 4.54)    - RBP i (p 0.59)            = 0.532
sDCG (bq 1.07, b 4.54)    - RBP ii (p 0.59)           = 0.747
sDCG (bq 1.07, b 4.54)    - DCG i (b 1.01)            = 0.560
sDCG (bq 1.07, b 4.54)    - DCG ii (b 1.01)           = 0.779
DCG ii (b 1.01)           - DCG i (b 1.01)            = 0.702
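
By default `stats.kendalltau` computes the tau-b variant, which corrects for ties. As a reading aid for the values above, the following sketch computes tau-b by direct pair counting. The function name `kendall_tau_b` and the toy scores are hypothetical, and the quadratic loop is meant for illustration only, not as a replacement for the SciPy call.

```python
import math

def kendall_tau_b(xs, ys):
    """Kendall's tau-b by direct pair counting, O(n^2).

    Undefined (division by zero) when all values of xs or of ys are tied.
    """
    concordant = discordant = 0
    untied_x = untied_y = 0      # pairs not tied in xs / not tied in ys
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx != 0:
                untied_x += 1
            if dy != 0:
                untied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)

# Toy scores of four sessions under two measures (not from the dataset).
scores_a = [0.1, 0.4, 0.3, 0.8]
scores_b = [0.2, 0.3, 0.5, 0.9]
tau = kendall_tau_b(scores_a, scores_b)
print(round(tau, 3))
```

Here five of the six pairs are ordered the same way by both measures and one is swapped, so tau is (5 - 1) / 6.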


In [36]:
def plot_scatter(name_x, x, name_y, y):
    trace = go.Scatter(
        x = x,
        y = y,
        mode = 'markers')

    layout = go.Layout(
        xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(text=name_x)),
        yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(text=name_y)))
    
    data = [trace]
    
    fig = go.Figure(data=data, layout=layout)
    iplot(fig)

In [37]:
for name1, vs1 in ls:
    for name2, vs2 in ls:
        if name1 > name2:
            plot_scatter(name1, vs1, name2, vs2)

### On Search Results (Figure 5 and Table 7)

The search results provide only a ranked list of documents for the last reformulation in each session. To create sessions out of these runs, we prepend to each run its session.

In [38]:
def join_sessions_to_runs(sessions, runs):
    res = {}
    for run in runs:
        res[run] = {}
        for session in sessions:
            if session in runs[run]:
                session_list = sessions[session].copy()
                session_list[-1] = 'r'
                res[run][session] = session_list + runs[run][session]
    return res

complete_runs = join_sessions_to_runs(sessions, runs)
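
The effect of the join can be seen on a toy example. The sketch below repeats the function so that it runs on its own; the session and run contents are hypothetical, and it assumes each session list ends with a terminal marker, which the join overwrites with `r`.

```python
def join_sessions_to_runs(sessions, runs):
    res = {}
    for run in runs:
        res[run] = {}
        for session in sessions:
            if session in runs[run]:
                session_list = sessions[session].copy()
                session_list[-1] = 'r'   # last event becomes a reformulation
                res[run][session] = session_list + runs[run][session]
    return res

# Hypothetical data: two sessions, one run that covers only session 1.
toy_sessions = {1: ['q', 'a-1', 'b-0', 'f'],
                2: ['q', 'c-0', 'f']}
toy_runs = {'runA': {1: ['x-0', 'y-1', 'f']}}

joined = join_sessions_to_runs(toy_sessions, toy_runs)
print(joined)
```

Session 2 is dropped because `runA` returns nothing for it, and the original session lists are left unchanged thanks to the `copy()`.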

In [39]:
srbp_vs   = []
rbp_i_vs  = []
rbp_ii_vs = []
sdcg_vs   = []
dcg_i_vs  = []
dcg_ii_vs = []

# Use distinct names so the global `runs` dictionary is not overwritten.
for run_name in complete_runs:
    run_sessions = list(complete_runs[run_name].values())
    srbp_vs.append(
        srbp(srbp_best_b, srbp_best_p, run_sessions))
    rbp_i_vs.append(
        rbp_i(rbp_best_p, run_sessions))
    rbp_ii_vs.append(
        rbp_ii(rbp_best_p, run_sessions))
    sdcg_vs.append(
        sdcg(sdcg_best_bq, sdcg_best_b, run_sessions))
    dcg_i_vs.append(
        dcg_i(dcg_best_b, run_sessions))
    dcg_ii_vs.append(
        dcg_ii(dcg_best_b, run_sessions))
    
ls = [("sRBP (b {:.2f}, p {:.2f})".format(srbp_best_b, srbp_best_p), srbp_vs),
      ("RBP i (p {:.2f})".format(rbp_best_p), rbp_i_vs),
      ("RBP ii (p {:.2f})".format(rbp_best_p), rbp_ii_vs),
      ("sDCG (bq {:.2f}, b {:.2f})".format(sdcg_best_bq, sdcg_best_b), sdcg_vs),
      ("DCG i (b {:.2f})".format(dcg_best_b), dcg_i_vs),
      ("DCG ii (b {:.2f})".format(dcg_best_b), dcg_ii_vs)]

for name1, vs1 in ls:
    for name2, vs2 in ls:
        if name1 > name2:
            print("{:<25} - {:<25} = {:.3f}".format(name1, name2, stats.kendalltau(vs1, vs2).correlation))

sRBP (b 0.64, p 0.86)     - RBP i (p 0.59)            = 0.843
sRBP (b 0.64, p 0.86)     - RBP ii (p 0.59)           = 0.843
sRBP (b 0.64, p 0.86)     - sDCG (bq 1.07, b 4.54)    = 0.290
sRBP (b 0.64, p 0.86)     - DCG i (b 1.01)            = 0.315
sRBP (b 0.64, p 0.86)     - DCG ii (b 1.01)           = 0.315
RBP i (p 0.59)            - DCG i (b 1.01)            = 0.293
RBP i (p 0.59)            - DCG ii (b 1.01)           = 0.293
RBP ii (p 0.59)           - RBP i (p 0.59)            = 1.000
RBP ii (p 0.59)           - DCG i (b 1.01)            = 0.293
RBP ii (p 0.59)           - DCG ii (b 1.01)           = 0.293
sDCG (bq 1.07, b 4.54)    - RBP i (p 0.59)            = 0.270
sDCG (bq 1.07, b 4.54)    - RBP ii (p 0.59)           = 0.270
sDCG (bq 1.07, b 4.54)    - DCG i (b 1.01)            = 0.950
sDCG (bq 1.07, b 4.54)    - DCG ii (b 1.01)           = 0.950
DCG ii (b 1.01)           - DCG i (b 1.01)            = 1.000


In [40]:
for name1, vs1 in ls:
    for name2, vs2 in ls:
        if name1 > name2:
            plot_scatter(name1, vs1, name2, vs2)

# References

[1] Aldo Lipani, Ben Carterette, Emine Yilmaz. From a User Model for Query Sessions to Session Rank Biased Precision (sRBP). In Proc. of ICTIR '19.