FIRE 2020 - Authorship Identification of SOurce COde (AI-SOCO)

Organized by aliosm

Welcome to Authorship Identification of SOurce COde (AI-SOCO) task at FIRE 2020!

Authorship identification in general is essential for detecting the deceptive misuse of others' content and for exposing the owners of anonymous harmful content. It does so by revealing the author of that content. Authorship Identification of SOurce COde (AI-SOCO) focuses on uncovering the author of a given piece of code. This helps address cheating in academic, work and open source environments. It can also help identify the authors of malware.

Detecting cheating in academic communities is important for properly attributing the contribution of each researcher. In work environments, credit sometimes goes to people who do not deserve it. Similar plagiarism issues can arise in open source projects hosted on public platforms. The same approach could be used in public or private online coding contests, whether in coding interviews or in official training contests, to detect cheating by applicants or contestants. Such a system could also play a big role in tracing the source of anonymous malicious software.

The dataset is composed of source codes collected from open submissions to the Codeforces online judge. Codeforces hosts competitive programming contests, each consisting of multiple problems to be solved by the participants. A participant solves a problem by writing a solution in any of the programming languages available on the website and submitting it through the website. The verdict for a solution is either correct (accepted) or incorrect (wrong answer, time limit exceeded, etc.).

In our dataset, we selected 1,000 users and collected 100 source codes from each one, for a total of 100,000 source codes. All collected source codes are correct, bug-free, compile-ready and written in C++, in different versions of the language. For each user, all collected source codes are solutions to distinct problems.
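The composition described above can be checked mechanically. The following is a minimal sketch, assuming each sample is available as a (user ID, problem ID) pair; the function name `check_dataset` and this record layout are hypothetical and are not the released dataset format.

```python
from collections import defaultdict


def check_dataset(records, n_users=1000, per_user=100):
    """Return True if `records` has the composition described in the text.

    `records` is an iterable of (user_id, problem_id) pairs. This layout is
    an assumption made for illustration only.
    """
    problems_by_user = defaultdict(list)
    for user_id, problem_id in records:
        problems_by_user[user_id].append(problem_id)

    # The dataset must contain exactly `n_users` distinct users.
    if len(problems_by_user) != n_users:
        return False
    for problems in problems_by_user.values():
        # Each user contributes exactly `per_user` source codes ...
        if len(problems) != per_user:
            return False
        # ... and all of them are solutions to distinct problems.
        if len(set(problems)) != per_user:
            return False
    return True


# Toy example: 2 users with 3 source codes each instead of 1,000 x 100.
toy = [(user, problem) for user in ("a", "b") for problem in (1, 2, 3)]
print(check_dataset(toy, n_users=2, per_user=3))
```

With the default arguments the check corresponds to 1,000 users times 100 source codes, that is, 100,000 records.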

Given a predefined set of source codes and their writers, the task is to build a system that identifies the writer of any new, previously unseen source code, where the writer is one of those in the predefined list.
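To make the task setup concrete, here is a minimal sketch of one simple approach: build a character n-gram profile per writer and assign a new source code to the writer with the most similar profile. This is an illustration only, not the organizers' baseline or any participant's system; the function names, the choice of character trigrams and cosine similarity, and the toy snippets are all assumptions.

```python
import math
from collections import Counter


def char_ngrams(code, n=3):
    """Count overlapping character n-grams in a source string."""
    return Counter(code[i:i + n] for i in range(len(code) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Aggregate one n-gram profile per writer.

    `samples` is an iterable of (source_code, writer) pairs.
    """
    profiles = {}
    for code, writer in samples:
        profiles.setdefault(writer, Counter()).update(char_ngrams(code))
    return profiles


def predict(profiles, code):
    """Return the known writer whose profile is most similar to `code`."""
    query = char_ngrams(code)
    return max(profiles, key=lambda writer: cosine(query, profiles[writer]))


# Hypothetical toy data: two writers with visibly different formatting habits.
train_set = [
    ('int main(){int n;cin>>n;cout<<n;}', "w1"),
    ('int main(){int a,b;cin>>a>>b;cout<<a+b;}', "w1"),
    ('int main ( ) {\n  long long x ;\n  scanf ( "%lld" , &x ) ;\n}', "w2"),
    ('int main ( ) {\n  long long a , b ;\n  scanf ( "%lld %lld" , &a , &b ) ;\n}', "w2"),
]
profiles = train(train_set)
print(predict(profiles, 'int main(){int k;cin>>k;cout<<k;}'))
```

Character n-grams pick up layout and naming habits such as spacing and preferred I/O calls, which is why they are a common starting point for authorship attribution. A competitive system would likely need richer features or a learned model.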

This is an example of a Codeforces problem and some of its solutions:

[Example images not reproduced. Labels: justHusam, Sabo, Kainz]

Evaluation Criteria

Systems will be evaluated and ranked based on the accuracy metric. An evaluation script is available in the GitHub repository.
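For reference, accuracy is the fraction of source codes whose predicted writer matches the true writer. The snippet below is a minimal sketch of that computation and is not the official evaluation script; the function name and the toy labels are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true writer labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 3 of the 4 predictions are correct.
true_writers = ["u1", "u2", "u3", "u1"]
predicted_writers = ["u1", "u2", "u1", "u1"]
print(accuracy(true_writers, predicted_writers))  # prints 0.75
```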

Terms and Conditions

Submitted systems

  • Participants are NOT allowed to train their systems on the development set or on any external dataset, whether labeled or unlabeled.
  • Participants can use additional resources such as pre-trained language models, knowledge bases, etc.
  • In the testing phase, participants can make up to three submissions; we will take each participant's best submission and rank the participants based on it.
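The best-of-three ranking rule can be sketched as follows. This is an illustration under stated assumptions, not the organizers' ranking code: the function name, the participant names and the scores are hypothetical, and both the handling of more than three submissions (rejected here) and tie-breaking (input order here) are not specified by the rules.

```python
def rank_participants(submissions, max_submissions=3):
    """Rank participants by their best accuracy, highest first.

    `submissions` maps a participant name to a list of accuracy scores,
    one per submission. Participants with no submission are left out.
    """
    best = {}
    for name, scores in submissions.items():
        if len(scores) > max_submissions:
            # Assumption: exceeding the limit is treated as an error.
            raise ValueError(f"{name} exceeded {max_submissions} submissions")
        if scores:
            best[name] = max(scores)
    # sorted() is stable, so tied participants keep their input order.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical participants and scores.
ranking = rank_participants({
    "team_a": [0.81, 0.85],
    "team_b": [0.90],
    "team_c": [0.70, 0.88, 0.86],
})
print(ranking)
```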

Important Dates

  • 8th June – Open track website
  • 8th June – Training and development data release
  • 31st July – Test data release
  • 7th September – Run submission deadline
  • 15th September – Results declared
  • 5th October – Working notes papers due
  • 5th November – Final version of working notes papers due
  • 16th-20th December – FIRE 2020 (Online Event)

Competitions should comply with any general rules of FIRE.

The organizers are free to penalize or disqualify participants for any violation of the above rules, or for misuse, unethical behaviour or other behaviour they agree is not acceptable in a scientific competition in general and in this one in particular.

Please contact the organizers through the following channels:

Development

Start: June 8, 2020, midnight

Evaluation

Start: July 31, 2020, midnight

Post Evaluation

Start: Sept. 8, 2020, midnight

Competition Ends

Never

#  Username    Score
1  AlexCrosby  0.9511
2  benf        0.9440
3  bharathib   0.8573