CIKM Cup 2016 Track 2: Personalized E-Commerce Search Challenge Forum

> Users in the testset

Hi,

I would just like to confirm that all userIds in the test set will be known, as seems to be the case based on what is said on the data description page:

"For testing, we take the last session for each user, find the first query in this session, and hide all actions after the query action (when a SERP with the results is presented). The goal is to re-rank the products on that SERP. "

Thanks.

Posted by: joaopalotti @ Aug. 18, 2016, 12:58 p.m.

Just to contextualize my question, I was expecting that the 'is.test' in the training set (train-queries.csv) would be true only for users that have a known id.
Nevertheless, many anonymous users are present in the test part of the training set. I probably misunderstood some part of the data division.
Could you please let me know what the criteria were for setting the 'is.test' flag to True? Was it random assignment?

Thanks in advance.
Best,
joao

Posted by: joaopalotti @ Aug. 18, 2016, 1:05 p.m.

Hi Joao,

If a user is anonymous, then their last session is also their first session.
Your goal is to predict what they will view and purchase after their first query.
This is an important part of the task: we often face a 'cold start' problem with anonymous users.

The test criterion is a certain timestamp: users whose activity started after this timestamp become test users.
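
For illustration only, the rule described above can be sketched as follows. This is a sketch under stated assumptions, not the organizers' actual code: the function name, the `(user_key, timestamp)` event layout, the sample events, and the cutoff date are all hypothetical, and whether "after" is strict is an assumption too.

```python
from datetime import date


def split_by_first_activity(events, cutoff):
    """Split user keys into (train, test) sets by their first activity.

    events: iterable of (user_key, timestamp) pairs (hypothetical layout).
    cutoff: users first seen strictly after this timestamp become test
            (assumption: the comparison is strict).
    """
    first_seen = {}
    for user_key, ts in events:
        # Keep the earliest timestamp observed for each user key.
        if user_key not in first_seen or ts < first_seen[user_key]:
            first_seen[user_key] = ts
    train = {u for u, ts in first_seen.items() if ts <= cutoff}
    test = {u for u, ts in first_seen.items() if ts > cutoff}
    return train, test


# Made-up events, purely for illustration.
events = [
    ("u1", date(2016, 3, 1)),
    ("u1", date(2016, 3, 20)),  # later activity does not change the split
    ("u2", date(2016, 3, 18)),
    ("u3", date(2016, 3, 10)),
]
train_users, test_users = split_by_first_activity(events, date(2016, 3, 15))
```

Here only "u2" starts after the cutoff, so it is the only test user; "u1" stays in train even though it is also active later.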

Best regards,
Alex.

Posted by: alex.laktionov @ Aug. 18, 2016, 8:41 p.m.

Hi again,

Thanks for your answer Alex.

I have another question regarding the users. I am trying to understand the relationship between the sessionIds and the userIds. More specifically, I expected that a given sessionId would map to only one userId, but this does not always seem to be the case.
For example, for session 1752 we have two different userIds, 17320 and 829, as one can see by running the following bash commands:

>$ cat train-item-views.csv | grep "^1752;"
1752;17320;36675;66175;2016-03-15
1752;17320;770;105084;2016-03-15
1752;17320;47683;1195490;2016-03-15
>$ cat train-purchases.csv | grep "^1752;"
1752;829;4422994;2016-03-15;5093;124769
1752;829;4422994;2016-03-15;5093;770

Is this expected?
Thanks a lot,
Joao

Posted by: joaopalotti @ Aug. 24, 2016, 1:02 p.m.

If a user logs out of one account and logs into another within one session, this situation can happen.

Also, this is real data and a real challenge, so some inconsistencies, quirks and outliers are inherent and inevitable ;)
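
To see how common this is, one could list every session that appears with more than one userId. The following is a minimal sketch, assuming only what the grep output earlier in the thread shows: the files are semicolon-separated and the first two fields are sessionId and userId. The function names are hypothetical, and the sample rows are the ones quoted earlier in this thread.

```python
from collections import defaultdict


def users_per_session(*sources):
    """Map each session ID to the set of user IDs seen with it.

    Each source is an iterable of semicolon-separated lines whose first
    two fields are assumed to be sessionId and userId.
    """
    seen = defaultdict(set)
    for lines in sources:
        for line in lines:
            fields = line.strip().split(";")
            # Skip blank lines, short lines, and a possible header row.
            if len(fields) < 2 or not fields[0].isdigit():
                continue
            seen[fields[0]].add(fields[1])
    return seen


def mixed_sessions(seen):
    """Return only the sessions associated with more than one user ID."""
    return {sid: users for sid, users in seen.items() if len(users) > 1}


# Rows copied from the grep output quoted earlier in this thread.
views = [
    "1752;17320;36675;66175;2016-03-15",
    "1752;17320;770;105084;2016-03-15",
    "1752;17320;47683;1195490;2016-03-15",
]
purchases = [
    "1752;829;4422994;2016-03-15;5093;124769",
    "1752;829;4422994;2016-03-15;5093;770",
]

result = mixed_sessions(users_per_session(views, purchases))
print(result)
```

On the real data, open file objects can be passed instead of the lists, e.g. users_per_session(open("train-item-views.csv"), open("train-purchases.csv")). How anonymous users are encoded is not stated in this thread, so any placeholder value in the userId field would count as a distinct user here.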

Posted by: rampeer @ Aug. 24, 2016, 1:25 p.m.

Hi, I just wanted to be sure that I was not missing anything. :)
Thanks a lot for your quick answer.

Posted by: joaopalotti @ Aug. 24, 2016, 1:28 p.m.