CIKM Cup 2016 Track 2: Personalized E-Commerce Search Challenge

Organized by spirinus

Current phase: Phase 2: Test Leaderboard (started Oct. 2, 2016, midnight UTC)

Competition ends: Oct. 5, 2020, midnight UTC

The Personalized E-commerce Search Challenge provides a unique opportunity for academic and industry researchers to test new ideas for personalized e-commerce search and to consolidate the approaches already published in existing work. Successful participation in the challenge requires solid knowledge of learning to rank, log mining, and search personalization algorithms, to name just a few.
 
For the model development, we release a new dataset provided by DIGINETICA and its partners containing anonymized search and browsing logs, product data, anonymized transactions, and a large data set of product images. The participants have to predict search relevance of products according to the personal shopping, search, and browsing preferences of the users. Both "query-less" and "query-full" sessions are possible. The evaluation is based on click and transaction data.
 
The Challenge is part of CIKM 2016 and continues the CIKM Cup series, co-arranged as part of the ACM CIKM conference. The reports of the winning teams will be publicly released online. We also invite all participants to present their approaches at the CIKM Cup Workshop on October 28th in Indianapolis, USA.
 
The Personalized E-commerce Search Challenge also continues the series of search challenges organized by major search industry leaders such as Yandex, Yahoo, and Baidu. In the past, participants worked on learning to rank documents, predicting document relevance from search logs, detecting search engine switching in search sessions, personalizing the web search user experience, and classifying queries.
 
The unique features of this challenge are that we:
  1. Release both search and browsing logs while in the past only search logs were provided.
  2. Focus on e-commerce search and hence have transaction data and unique (exploratory) search behavior patterns.
  3. Provide product images enabling experimentation with the visual features for search ranking.
This challenge might be especially interesting for:
  • researchers from academia, who want to test new research ideas and algorithms on large-scale datasets generated by real users but don't have access to such datasets;
  • industry researchers from companies working on e-commerce search, including major e-commerce search vendors such as Endeca, BloomReach, SLI Systems and e-commerce stores and marketplaces such as Etsy, eBay, Amazon, etc.;
  • industry researchers and engineers, who have accumulated a lot of expertise relevant to this problem. We encourage teams from top research labs such as Microsoft Research, Google Research, Yahoo Labs, Yandex, and Baidu Labs to join in;
  • early-career data scientists and professors teaching information retrieval and machine learning, who could leverage the challenge to teach and learn by doing, with unique access to a large-scale real-world dataset.
We hope that you will enjoy participating in the Personalized E-commerce Search Challenge and push your creativity and data mining talent to the limit. Good luck!

Metric

The goal of this competition is to predict relevance labels and re-rank products returned by an e-commerce search engine on the search engine result page (SERP) using (1) search, browsing, and transaction histories for all users and specifically the user interacting with the search engine in the current session; (2) product meta-data; (3) product images.
 
We consider both "query-full" (SERPs returned in response to a query) and "query-less" (SERPs returned in response to a user click on a product category --- in this case, only products from that category are returned) sessions. In both cases, we refer to the user action leading to the SERP as a query. The only difference is that in "query-less" sessions the query string is empty and only the product category is provided.
 
Submissions will be evaluated using the NDCG (Normalized Discounted Cumulative Gain) measure. We first calculate NDCG for each test query from the product ranking submitted by the participant, and then average the NDCG scores across all test queries. The weight for the "query-less" case is 0.8 and the weight for the "query-full" case is 0.2 (we use these weights because, according to DIGINETICA's prior data analysis, category-based query-less search is more important). We use the following variant of the DCG formula:
\mathrm{DCG}_{p} = \sum_{i=1}^{p} \frac{2^{rel_{i}} - 1}{\log_{2}(i+1)}
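A minimal Python sketch of the per-query metric and the weighted score follows. It assumes that NDCG_p is DCG_p divided by the DCG_p of the ideal ordering (the same products sorted by grade), and it assumes that the 0.8/0.2 weights combine the mean NDCG of the query-less subset with the mean NDCG of the query-full subset; the text above does not spell out either detail. The function names are hypothetical, and this is not the official evaluation script.

```python
import math
from typing import Optional, Sequence, Tuple


def dcg(grades: Sequence[int], p: Optional[int] = None) -> float:
    """DCG_p = sum over ranks i = 1..p of (2^rel_i - 1) / log2(i + 1)."""
    if p is not None:
        grades = grades[:p]
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(grades, start=1))


def ndcg(grades: Sequence[int], p: Optional[int] = None) -> float:
    """Normalize by the DCG of the ideal ordering (grades sorted descending)."""
    ideal = dcg(sorted(grades, reverse=True), p)
    # Assumption: a SERP with no relevant products scores 0.
    return dcg(grades, p) / ideal if ideal > 0 else 0.0


def challenge_score(per_query: Sequence[Tuple[bool, float]]) -> float:
    """per_query holds (is_query_less, ndcg) pairs.

    Assumption: final score = 0.8 * mean NDCG over query-less queries
                            + 0.2 * mean NDCG over query-full queries.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    query_less = [s for is_ql, s in per_query if is_ql]
    query_full = [s for is_ql, s in per_query if not is_ql]
    return 0.8 * mean(query_less) + 0.2 * mean(query_full)
```

For a SERP whose submitted order has grades [0, 1, 2], `ndcg([0, 1, 2])` is about 0.587: the submitted order has DCG 1/log2(3) + 3/2, while the ideal order [2, 1, 0] has DCG 3 + 1/log2(3).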
 
The products are labeled with 3 relevance grades: 0 (irrelevant), 1 (somewhat relevant), and 2 (relevant). The labeling is done automatically, based on each user's actions with the products on the SERP and on detailed product pages (a click sometime during the session, a click on the SERP, a purchase):
  • 0 (irrelevant): products with no SERP clicks.
  • 1 (somewhat relevant): products that were shown on the SERP and clicked by the user.
  • 2 (relevant): products that were shown on the SERP, clicked by the user, and purchased. If a product was purchased several times (e.g., three items of the same kind), the grade is still 2.
We don't use dwell time as a proxy for relevance (as was done in the Personalized Web Search Challenge and in state-of-the-art research on web search personalization) because, unlike web search, where tasks are mostly navigational or informational, e-commerce search tasks are transactional. However, we still model dwell time indirectly by incrementing the relevance by one for products that were viewed by the user.
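The three grading rules can be sketched as a small Python function. This is a hypothetical helper covering only the three bullet rules; the view-based increment mentioned in the preceding paragraph is not modeled, because the text does not say how it combines with the 0-2 scale. It also assumes the product was shown on the SERP, since only SERP products are graded.

```python
def relevance_grade(clicked_on_serp: bool, purchase_count: int) -> int:
    """Map a SERP product's click and purchase history to the 0/1/2 scale.

    Sketch of the three bullet rules only (view-based increment not modeled).
    """
    if not clicked_on_serp:
        return 0  # no SERP click: irrelevant
    if purchase_count > 0:
        return 2  # clicked and purchased, however many items were bought
    return 1  # clicked but not purchased
```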

Submission Format

Since only the ranking of products for a particular query matters for calculating NDCG, we ask you to submit a list of queryIDs along with the re-ranked productIDs for each query (i.e., the leftmost productID on each line corresponds to the most relevant product for that query). Each submission should be a text file in which a queryID and its list of productIDs are separated by " " (a space), and the productIDs are separated by "," (a comma) and ordered by relevance from left to right, as follows:

query_id1 product_id11,product_id12,product_id13,product_id14
query_id2 product_id21,product_id22,product_id23

In the above example, product_id11 is the most relevant productID for the test query query_id1. Please also check the format of the provided baseline submissions in case of difficulties. This list must include all queryIDs of queries from the test period, and only those.
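A minimal Python sketch of writing this format, including a check that the file contains exactly the test queryIDs. The function names and the dictionary-based input are hypothetical choices for illustration.

```python
from typing import Dict, Iterable, List


def format_line(query_id: int, ranked_product_ids: Iterable[int]) -> str:
    """queryID, a space, then productIDs (most relevant first) joined by commas."""
    return f"{query_id} " + ",".join(str(p) for p in ranked_product_ids)


def write_submission(path: str,
                     rankings: Dict[int, List[int]],
                     test_query_ids: Iterable[int]) -> None:
    """Write one line per test query; refuse if the queryIDs differ from the test set."""
    expected = set(test_query_ids)
    if set(rankings) != expected:
        missing = sorted(expected - set(rankings))[:5]
        extra = sorted(set(rankings) - expected)[:5]
        raise ValueError(f"queryID mismatch: missing={missing} extra={extra}")
    with open(path, "w") as f:
        for query_id, products in rankings.items():
            f.write(format_line(query_id, products) + "\n")
```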

Baseline

We provide three simple baselines:

  • Random ranking --- takes all productIDs returned on the SERP and re-orders them randomly;
  • Non-personalized ranking --- takes all productIDs returned on the SERP and returns them in the same order as presented, without taking any user information into account;
  • Simple personalized re-ranking --- takes all productIDs returned on the SERP and re-ranks them by user-specific popularity, i.e., if a user visited one product 100 times and another product 50 times, the first product is ranked higher than the second. Ties are broken by the original SERP order: the product returned higher on the original SERP comes first.
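The three baselines can be sketched in Python as follows. This is not the provided baseline code; the function names are hypothetical, and the personalized variant assumes that "visits" are counted from (userId, itemId) view records such as those in train-item-views.csv. Because Python's `sorted` is stable, sorting by descending visit count keeps the original SERP order for ties, which also covers users without any history.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def random_ranking(serp: List[int], rng: Optional[random.Random] = None) -> List[int]:
    """Baseline 1: shuffle the SERP products."""
    if rng is None:
        rng = random.Random()
    shuffled = list(serp)
    rng.shuffle(shuffled)
    return shuffled


def non_personalized_ranking(serp: List[int]) -> List[int]:
    """Baseline 2: keep the original SERP order."""
    return list(serp)


def personalized_ranking(serp: List[int], user_id: str,
                         views: Iterable[Tuple[str, int]]) -> List[int]:
    """Baseline 3: sort by how often this user viewed each product.

    views holds (userId, itemId) pairs (assumed source: train-item-views.csv).
    sorted() is stable, so ties keep the original SERP order.
    """
    counts = Counter(item for uid, item in views if uid == user_id)
    return sorted(serp, key=lambda item: -counts[item])
```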

The code can be found here. We provide queryIDs for test queries, and participants have to extract the corresponding SERPs and productIDs to re-rank from the search log file themselves. For test queryIDs, all events after the query are hidden and only the SERP is available in the log.

IMPORTANT: The submission file must be named submission.txt and must be zip-archived before submitting, so the final file is submission.txt.zip. The compressed file will be ~108MB, so uploading it might take time. Please be patient :)
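Zipping can be done with Python's standard `zipfile` module, as in the sketch below. The helper name is hypothetical, and it assumes the grader expects submission.txt at the root of the archive.

```python
import zipfile


def zip_submission(txt_path: str = "submission.txt",
                   zip_path: str = "submission.txt.zip") -> str:
    """Compress the submission; the archive entry is named submission.txt."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(txt_path, arcname="submission.txt")
    return zip_path
```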

Train / Test Public / Test Private Splits

We partitioned the data set into three parts:
  • The first and the largest part is used for the model development. You can use this part to train and evaluate your model offline on your own machine.
  • The second part is used for validation (phase 1), which runs from Aug 5th to Oct 2nd. Until Oct 2nd, participants can submit their solutions subject to the daily submission limit (15 submissions per day, to keep the load on the server manageable). The ranking is continuously updated on the public leaderboard.
  • The third part is used for the final evaluation from Oct 2nd to Oct 5th. Participants are allowed to submit their final predictions only 3 times; after that, the system will not accept files.
We use the three-stage process to avoid possible "leaderboard boosting", where the rankings/scores from the validation stage could be used to overfit the model to the test set. Having a third hold-out set minimizes this possibility and ensures a fair evaluation.

Prizes

Meeting with a Distinguished Researcher

The winners of the competition will be matched with a Distinguished Researcher with deep expertise in the topic for a one-hour meeting. During the meeting, the winners will have a chance to discuss new research ideas and receive high-quality feedback, get guidance on future career development, or simply discuss a relevant research/technology project and get expert advice. The meeting will take place in person during the CIKM conference in Indianapolis (October 24-28), or via Skype if a winner cannot attend the conference.
 
This is an experimental merit-based prize administered as part of the CIKM Cup, with the goal of facilitating closer and more fruitful interaction and collaboration among people associated with the CIKM conference and interested in database systems, information retrieval, and knowledge management. We believe that such personal communication is crucial for effective research and will help increase social capital in the CIKM community.
 
Currently, we are looking for the most relevant distinguished researcher. The information will be updated as soon as we have the official confirmation.  

Collaboration Opportunity

The top-3 participants from academia, based on the private final leaderboard ranking, will be offered an opportunity to collaborate with the data provider (DIGINETICA) after the competition is over. Participants from industry are not eligible for this prize. For example, for the leaderboard (1 industry, 2 academia, 3 academia, 4 industry, 5 academia), the participants ranked 2, 3, and 5 will be offered continued collaboration.
 
We administer this prize because we understand the challenges academia faces without access to real-world datasets. At the same time, because of the sensitive nature of the data, the organizer cannot release the dataset to everyone indefinitely. We hope that this merit-based data sharing mechanism will both enable high-quality research with publicly accessible results and protect the privacy of online users.

Competition Rules

One account per participant

You cannot sign up to CodaLab with multiple accounts, and therefore you cannot submit from multiple accounts.

No private sharing outside teams

Privately sharing code or data outside of teams is not permitted. It's okay to share code if it is made available to all participants on the forums or as a public GitHub repo.

Team Mergers

Team mergers are allowed and can be performed by the team leader. To merge, the combined team must have a total submission count less than or equal to the maximum allowed as of the merge date, which is the number of submissions per day multiplied by the number of days the competition has been running. The organizers don't provide any assistance with team mergers.
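The merge rule is a single comparison; the sketch below uses a hypothetical helper name and defaults to the 15-per-day validation limit stated in the rules.

```python
from typing import Iterable


def merge_allowed(submission_counts: Iterable[int], days_running: int,
                  per_day_limit: int = 15) -> bool:
    """True if the combined submission count is at most per_day_limit * days_running."""
    return sum(submission_counts) <= per_day_limit * days_running
```

For example, two teams with 140 and 170 submissions could merge on day 21 (310 ≤ 315) but not on day 20 (310 > 300).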

Team Limits

The maximum size of a team is three participants.

Submission Limits

You may submit a maximum of 15 entries per day during the first stage (validation). For the second stage (test), you can only submit three times.

Terms and Conditions

  • You agree to the Challenge Rules you are reading now.
  • The organizers, employees of DIGINETICA, and all people, who had access to the ground-truth data, aren't eligible for the Prize.
  • Team mergers are allowed until the second stage of the competition starts (Oct 2nd, 2016).
  • The winners will be offered an opportunity to collaborate with the data provider after the competition is over.
  • The winners are required to share a public report/paper (4-8 pages, ACM double-column format) to be eligible for the CIKM Cup Award Certificate and the meeting with a Distinguished Researcher. All participants are highly encouraged, but not required, to submit papers documenting their approaches and to present them at the CIKM Cup Workshop on October 28th in Indianapolis, USA. The reports/papers will be shared publicly on the official CIKM Cup 2016 website, as was done for WSDM Cup 2016. Unlike previous CIKM conferences, this year's workshop proceedings will NOT be included in the ACM Digital Library. This eliminates any concern about self-plagiarism if authors resubmit their workshop papers to a formal publication venue.
  • The participants do not have to share or open source their source code. This is a common convention in the research community allowing researchers from industrial labs to participate in the challenge.
  • You agree that all submissions you make during this competition may be used by the Organizer to build an aggregated ensemble cross-device matching model, with the results of this experiment released in a publicly accessible research report.

Participants can post questions to the CIKM Cup 2016 CodaLab forum or email the organizers at cikmcup [ symbol ] gmail [ symbol ] com with the subject CIKM Cup 2016: Track 2 (DIGINETICA). We will do our best to share relevant updates with all participants, but we encourage people to use the forum.

Data Files

The dataset includes user sessions extracted from e-commerce search engine logs, with anonymized user ids, hashed queries, hashed query terms, hashed product descriptions and meta-data, log-scaled prices, clicks, and purchases. There are eight different files, described below. The files can be downloaded here.

train-queries.csv and test-queries.csv (~869.5MB)

  • queryId (serial)
  • sessionId (serial)
  • userId (serial)
  • timeframe (time since the first query in a session, in milliseconds)
  • duration (page dwell time, in milliseconds)
  • eventdate (calendar date)
  • searchstring.tokens (comma separated hashed query tokens; empty if it is a query-less case)
  • categoryId (product category ID; empty if it is a query-full session)
  • items (productIDs returned by the default ranking algorithm on the SERP; these IDs must be re-ranked)
  • is.test (TRUE/FALSE; TRUE if it is a test query)
  • regionId (geographical region of a query; serial).

An example Query object looks as follows: 1;;16327074;311;2016-05-09;16655,244087,51531,529597,58153;0;62220,33969,30311,32902,8252,13682,9338;FALSE
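Extracting the test SERPs from the query file can be sketched with Python's `csv` module. The sketch assumes a semicolon-separated file (as in the example row) with a header row that uses the field names listed above; the function name and the returned structure are hypothetical. It also flags the query-less case, which is identified by an empty searchstring.tokens field.

```python
import csv
from typing import Dict, List, Tuple


def load_test_serps(path: str) -> Dict[int, Tuple[List[int], bool]]:
    """Map each test queryId to (SERP productIDs in default order, is_query_less).

    Assumes a semicolon-separated file with a header row naming the columns.
    """
    serps: Dict[int, Tuple[List[int], bool]] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            if row["is.test"].strip().upper() != "TRUE":
                continue
            items = [int(x) for x in row["items"].split(",") if x]
            # Query-less case: empty searchstring.tokens, only a categoryId.
            is_query_less = not row["searchstring.tokens"].strip()
            serps[int(row["queryId"])] = (items, is_query_less)
    return serps
```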

products.csv (~7.3MB)

  • productID (serial)
  • priceLog2 (log-transformed product price)
  • product.name.tokens (comma separated hashed product name tokens)
  • imageName (name of the corresponding product image)

An example Product object looks as follows: 1;1;10;4875,776,56689,18212,18212,4896

product_images.zip (might be released after 50% of the competition time, currently under consideration, ~14GB)

To find the image for a product, use the imageName attribute from products.csv.

product-categories.csv (~2MB)

  • productCategoryID (serial)
  • productID
  • categoryID

An example ProductCategory object looks as follows: 1;139578;1096
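Because query-less SERPs contain only products from the clicked category, a category-to-products index may be useful, for instance to compute category-level features. The sketch below is a hypothetical loader: it parses rows positionally in the order productCategoryID;productID;categoryID, as in the example row, and treats a non-numeric first field as a header line to skip.

```python
import csv
from collections import defaultdict
from typing import Dict, Set


def load_category_index(path: str) -> Dict[int, Set[int]]:
    """Map categoryID -> set of productIDs from product-categories.csv.

    Assumes semicolon-separated rows: productCategoryID;productID;categoryID.
    """
    index: Dict[int, Set[int]] = defaultdict(set)
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter=";"):
            if not row or not row[0].strip().isdigit():
                continue  # skip a header row or blank line
            product_id, category_id = int(row[1]), int(row[2])
            index[category_id].add(product_id)
    return dict(index)
```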

train-purchases.csv (749KB)

  • sessionId (serial)
  • timeframe (time since the first query in a session, in milliseconds)
  • eventdate (calendar date)
  • ordernumber (serial product orderID; groups all products purchased together ~ shopping cart; if a user bought several products, there are several records sharing the same ordernumber)
  • itemId (purchased product)

An example Purchase object looks as follows: 100030;1861906;2016-04-20;2963942;377191

train-item-views.csv (42.7MB)

  • sessionId
  • userId
  • itemId
  • timeframe (time since the first query in a session, in milliseconds)
  • eventdate (calendar date)

An example ItemView object looks as follows: 1;;81766;526309;2016-05-09

Data Pre-processing

To allay privacy concerns, the user data is fully anonymized. Only meaningless numeric IDs of users, queries, query terms, sessions, URLs, and their domains are released. The queries are grouped by sessions. Specifically, we applied the following pre-processing before releasing the dataset for the Personalized E-commerce Search Challenge:

  1. Take the most recent six months of logs of an e-commerce search engine.

  2. Remove queries without clicks.

  3. Detect sessions using a 1-hour inactivity heuristic (for web search, the session segmentation heuristic is ~20 min).

  4. Find the first query in each session and make the timestamps of all events in the session relative to that query, i.e., the timestamp of the first query in the session is 0 and all other events have non-negative delta-timestamps.

  5. To hash textual data, we: (1) build the vocabulary from all available textual data, such as queries, product titles, and product descriptions; (2) assign each unique word a hash-code using an MD5-based hash function; (3) replace each word with its hash-code.

  6. Using the same transformation as in step 5, we hash the names for product images.

  7. Prices are log-transformed (base 2) and then rounded down to the nearest integer, i.e., if a product costs 3.89, the obfuscated price is 1; if a product costs 4.89, the obfuscated price is 2.

  8. For training, we take all sessions before a certain timestamp.

  9. For testing, we take the last query for some sessions, and hide all actions after the query action (when a SERP with the results is presented). The goal is to re-rank the products on that SERP. 
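Steps 3, 4, and 7 can be sketched in Python as follows. The strict "longer than one hour" gap comparison, the use of each session's first event as its first query, and the function names are assumptions. The MD5 hashing of steps 5-6 is left out because the mapping from digests to the released integer codes is not described.

```python
import math
from typing import List

SESSION_GAP_MS = 60 * 60 * 1000  # 1 hour of inactivity, in milliseconds


def split_sessions(timestamps_ms: List[int]) -> List[List[int]]:
    """Steps 3-4: cut one user's events at gaps longer than 1 hour,
    then rebase each session so its first event has timeframe 0."""
    sessions: List[List[int]] = []
    current: List[int] = []
    for t in sorted(timestamps_ms):
        if current and t - current[-1] > SESSION_GAP_MS:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    # Assumption: the first event of each session is its first query.
    return [[t - s[0] for t in s] for s in sessions]


def obfuscate_price(price: float) -> int:
    """Step 7: floor(log2(price)), e.g. 3.89 -> 1 and 4.89 -> 2."""
    return math.floor(math.log2(price))
```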

Dataset Statistics

  • The number of sessions: 573,935
  • The number of products: 134,319,529
  • The number of products viewed from search (including browsing after SERP): 2,451,565
  • The average number of products viewed per search session (including browsing after SERP): 4.271 
  • The number of SERP clicks on products: 1,877,542
  • The average number of SERP clicks per search session: 3.271
  • The number of products purchased from search: 68,818
  • The average number of products purchased per search session: 0.119

We use 50% of test queries for the Validation Leaderboard (phase 1) and 50% for the Test Leaderboard (phase 2). We don't disclose which test queries are used for the public leaderboard and which test queries are used for the private leaderboard. Every submission must contain re-ranked productIDs for all test queries/SERPs.

Notes:

  • A single user might have multiple sessions, and one session can have multiple users (if a user logs into another account).
  • There are test users without history. Yes, participants have to handle users who visit the site for the first time.
  • Click data contains information about recommendation/search clicks. No external clicks (on ad banners, external search engine, bookmarks) are registered.
  • View data is sampled. Thus, there are clicks without corresponding views.

Phase 1: Validation Leaderboard

Start: Aug. 5, 2016, midnight

Description: Ongoing model development and evaluation with the results on the public leaderboard.

Phase 2: Test Leaderboard

Start: Oct. 2, 2016, midnight

Description: Final submission.

Competition Ends

Oct. 5, 2020, midnight


Top Three

Rank Username Score
1 minerva 0.4262
2 tianmecai 0.2901
3 Dmitrii_Nikitko 0.4149