MIND News Recommendation Competition

Organized by v-jinyi

Previous phase: Development (start: July 20, 2020, 11:59 p.m. UTC)
Current phase: Official Test (start: Aug. 21, 2020, 11:59 p.m. UTC)
End: Competition Ends (never)

Overview

Introduction

Online news services such as Microsoft News have gained huge popularity for online news reading. However, since a massive number of news articles are published every day, users of online news services face heavy information overload. Therefore, news recommendation is an important technique for personalized news services to improve the reading experience of users and alleviate information overload.

However, news recommendation is a challenging task. First, news articles on news websites emerge and update very quickly. Many new articles are posted continuously, and existing news articles disappear after a short period of time. Thus, there is a severe cold-start problem in news recommendation. Second, news articles usually contain rich textual information such as title and body. It is very important to understand news content from these texts using NLP techniques. Third, users do not give explicit ratings to news articles on news platforms. Thus, in news recommendation we need to model users' interests from their browsing and click behaviors. However, user interests are usually diverse and dynamic, which poses significant challenges to user modeling algorithms. Further research is therefore needed to tackle the various challenges in news recommendation.

To promote the research and practice on news recommendation, we release the MIND dataset, which is a large-scale English dataset for news recommendation. This dataset can serve as a good testbed for researchers to develop better news recommender systems to improve the future reading experience of millions of users.

How to start?

  • Read the details on this website
  • Join the competition on CodaLab
  • Train and evaluate your news recommendation models on the MIND dataset
  • Submit your predicted results on the test set to CodaLab to obtain the official score

Any questions and suggestions on the submission process can be sent to mind[at]microsoft.com

Task

The task in this competition is described as follows. Given the news browsing history [n1, n2,..., nP] of a user u and a set of candidate news [c1,c2,...,cM] in an impression log, the goal is to rank these candidate news articles according to the personal interest of this user. In this process, news articles can be modeled by their content, and users' interests can be modeled by their news browsing history. Then, the model predicts the click scores of candidate news based on the relevance between candidate news and user interests. Finally, the candidate news articles in each impression are ranked by their click scores. The ranking results will be compared with the real user click labels to measure the ranking quality via several metrics including AUC, MRR and nDCG@K (see Evaluation tab).
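The final ranking step can be sketched as follows. This is a minimal illustration only: the click scores are made-up numbers standing in for the output of a trained model, and the function name rank_candidates is hypothetical, not part of any official toolkit.

```python
from typing import Dict, List


def rank_candidates(candidates: List[str], scores: Dict[str, float]) -> List[str]:
    """Return the candidate news IDs sorted by descending click score."""
    return sorted(candidates, key=lambda news_id: scores[news_id], reverse=True)


# Candidate news of one impression, in the order they appear in the log.
candidates = ["N125045", "N87192", "N73556", "N20417"]

# Made-up click scores; a real system would compute these from the
# candidate news content and the user's browsing history.
scores = {"N125045": 0.05, "N87192": 0.91, "N73556": 0.20, "N20417": 0.64}

ranked = rank_candidates(candidates, scores)
```

With these scores, ranked lists N87192 first and N125045 last. How the scores are produced is the actual modeling problem and is left open here.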

Dataset

The MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. You can visit the official MIND website, https://msnews.github.io, to download the training, validation and test sets. Detailed information about the dataset can also be found on that website.

 

 

Evaluation Metrics

Systems are evaluated using several standard evaluation metrics in the recommendation field, including: area under the ROC curve (AUC), mean reciprocal rank (MRR), and normalized discounted cumulative gain for K shown recommendations (nDCG@K). The final result is the average of these metrics on all impression logs. The primary metric for submission ranking is AUC.
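For local sanity checks, the three metrics can be sketched for a single impression as below. This is not the official evaluation.py, which remains the reference. It assumes binary click labels, counts score ties as half a win in AUC, and averages the reciprocal ranks of all clicked items for MRR (one common variant). All function names are illustrative.

```python
import math
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (clicked, non-clicked) pairs ordered correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def labels_by_rank(labels: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Reorder the labels so that the highest-scored candidate comes first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of 1/rank over the clicked items (assumes at least one click)."""
    ranked = labels_by_rank(labels, scores)
    return sum(y / (i + 1) for i, y in enumerate(ranked)) / sum(ranked)


def dcg_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k positions."""
    return sum((2 ** y - 1) / math.log2(i + 2) for i, y in enumerate(ranked_labels[:k]))


def ndcg_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels_by_rank(labels, scores), k) / ideal


def mean_over_impressions(values: Sequence[float]) -> float:
    """Average a per-impression metric over all impressions."""
    return sum(values) / len(values)
```

For example, with labels [0, 1, 0, 1] and scores [0.05, 0.91, 0.20, 0.64], both clicked items outrank both non-clicked ones, so AUC is 1.0 and MRR is (1 + 1/2) / 2 = 0.75. The value of K is a parameter here; check the official script for the cut-offs it reports.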

Scoring script

You can download the official evaluation script here: evaluation.py

 

 

Submission Guidelines

Submission Formats

Developers need to submit the ranking results of news in each impression generated by a recommender system. Prediction results submitted to CodaLab should be zip-compressed, containing a file named prediction.txt. In this file, each line contains an impression ID and a rank list of candidate news. The format of each line is:

ImpressionID [Rank-of-News1,Rank-of-News2,...,Rank-of-NewsN]

For example, given the impression as follows:

ImpressionID Candidate News
24481 N125045 N87192 N73556 N20417

The prediction results of this impression can be:

24481 [4,1,3,2]

which means that the candidate news articles in this impression are ranked in the order N87192, N20417, N73556 and N125045. The evaluation script will evaluate your ranking results against the gold labels. The script, as well as a sample file containing 10 lines of predictions (which cannot be submitted directly to the CodaLab system), can be found on GitHub. Several additional points follow:

  • A valid zip submission should contain nothing but a single file named prediction.txt. Mac users should make sure that the archive contains no __MACOSX folder.
  • Do not place the submission file inside a folder before compressing it.
  • The row order of the results should be consistent with that of the original files.
  • The ranking results are integers starting from 1.
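The format and packaging rules above can be sketched as follows. This is a minimal illustration, not an official tool: the function names are hypothetical and the scores are made up. Writing the archive with Python's zipfile module puts prediction.txt at the root of the zip and adds no extra entries.

```python
import io
import zipfile
from typing import Iterable, List, Sequence


def scores_to_ranks(scores: Sequence[float]) -> List[int]:
    """Convert click scores to ranks (1 = highest score), kept in candidate order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def format_line(impression_id: str, scores: Sequence[float]) -> str:
    """Build one prediction line: 'ImpressionID [rank1,rank2,...]'."""
    ranks = scores_to_ranks(scores)
    return f"{impression_id} [{','.join(str(r) for r in ranks)}]"


def write_submission_zip(target, lines: Iterable[str]) -> None:
    """Write a zip whose only entry is prediction.txt at the archive root.

    `target` may be a file path or a writable binary file object.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("prediction.txt", "\n".join(lines) + "\n")


# Made-up scores for candidates N125045, N87192, N73556, N20417.
example_line = format_line("24481", [0.05, 0.91, 0.20, 0.64])
```

Here example_line reproduces the sample line 24481 [4,1,3,2]. Calling write_submission_zip("prediction.zip", lines) with the lines in the original impression order would produce the archive to upload; the file name prediction.zip is an arbitrary choice.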

 

Submission Process

You need several steps to make a submission:

  • Navigate to 'Participate'
  • Write a brief description of your model (optional)
  • Click the button 'Submit / View Results'
  • Upload your zipped submission
  • Wait until the evaluation status turns to 'Finished' or 'Failed'

If the submission status is 'Failed'(*), you can click 'View scoring output log' and 'View scoring error log' to see the debug logs. When the evaluation is finished, you can decide whether to show your scores on the leaderboard. During the development phase, participants can upload their predictions on the validation set and tune their models according to the results. Although this submission is not obligatory, we strongly encourage it: it can reveal problems in obtaining normal evaluation results early, and it is useful practice for participants new to CodaLab.

Important: Each user can upload at most one submission per day, so as not to overwhelm the system.

(*) If the error log contains an exception that mentions "File "/worker/worker.py", line 330, in run", the CodaLab runners may be busy.

Terms and Conditions

The MIND dataset is free to download for research purposes under the Microsoft Research License Terms. Please read these terms and confirm that you agree to them before you download the dataset. Feel free to contact us if you have any questions or need clarification regarding the licensing of the data.

 

Organizers

This competition is organized jointly by the Microsoft News and Microsoft Research Asia teams:

  • Ying Qiao, Jiun-Hung Chen, Winnie Wu (Microsoft News Team)
  • Fangzhao Wu, Chuhan Wu, Tao Qi, Jingwei Yi, Ling Luo, Xing Xie (Microsoft Research Asia)

Contact: mind[at]microsoft.com
