Author profiling (AP) is the problem of learning to predict demographic information about the author of a given document. This AP task can be defined as a classification problem where the training data has been labeled with the demographic variables of interest and a machine learning algorithm can be trained to predict these variables with the help of a good feature engineering process.
This competition evaluates your predictions on the test data and ranks them by overall accuracy. There are two data sets for testing your system: one for Blog testing and one for Twitter testing. See the Evaluation tab for more information about submissions.
You can post/answer questions in the forums tab but be mindful not to provide solutions to the shared task. You can also email the administrator or instructors for more one-on-one help.
To upload a submission you will need to create an account with CodaLab. You'll then be able to register under the Participate tab after accepting the terms and conditions.
Once you've created an account and registered, you can begin submitting your output for evaluation. You can also run the evaluation locally using the script provided in the Dropbox folder. You will need the sklearn library and Python 2.7 or later in your environment. More details are found within the script. You can download the script from here <https://drive.google.com/open?id=0B3BpRYlCFEqmQUtxb0h2OEMtQjA>.
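To make the ranking metric concrete, here is a minimal sketch of how accuracy could be computed from a truth file and a prediction file. It is not the provided evaluation script: the function names `read_predictions` and `accuracy`, the sample file names, and the assumption that the truth file uses the same `filename,label` layout as the submission are all illustrative.

```python
import csv


def read_predictions(lines):
    """Parse 'filename,label' rows into a {filename: label} dict."""
    labels = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        labels[row[0].strip()] = row[1].strip().lower()
    return labels


def accuracy(truth, predicted):
    """Fraction of truth entries whose predicted label matches.

    Files missing from the predictions count as errors (an assumption;
    the official script may treat them differently).
    """
    if not truth:
        raise ValueError("truth is empty")
    correct = sum(1 for name, label in truth.items()
                  if predicted.get(name) == label)
    return float(correct) / len(truth)


# Hypothetical sample data, for illustration only.
truth_rows = ["a.txt,male", "b.txt,female", "c.txt,female", "d.txt,male"]
pred_rows = ["a.txt,male", "b.txt,male", "c.txt,female", "d.txt,male"]

score = accuracy(read_predictions(truth_rows), read_predictions(pred_rows))
print(score)  # 3 of 4 correct -> 0.75
```

Checking your own predictions this way against a held-out portion of the training data gives a rough estimate before you spend a submission.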
To complete your system submission, submit a single zip file containing a text file (submission.csv) with your system’s predictions and your source code. The submission.csv file should contain one instance prediction per line. Each line should contain the name of the file and the prediction for it in the following format:
[filename.extension],[male|female]
The zip file must contain only the two files mentioned above at its top level. Do not zip a folder that contains them; this will cause the evaluation script to fail. Please name your zip file `cosc7336-assign1-student-name.zip`.
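The packaging rules above can be sketched as follows. This is only an illustration under stated assumptions: the helper `build_submission`, the source file name `model.py`, and the sample file names are hypothetical, and the source code is assumed to be a single file. The key point is the `arcname` argument, which stores each file at the top level of the archive rather than inside a folder.

```python
import csv
import io
import os
import tempfile
import zipfile


def build_submission(predictions, source_path, zip_path):
    """Write submission.csv and zip it with the source file, both at the archive root."""
    workdir = os.path.dirname(os.path.abspath(zip_path))
    csv_path = os.path.join(workdir, "submission.csv")
    # One "filename,label" row per line, terminated with "\n".
    with io.open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for filename, label in predictions:
            if label not in ("male", "female"):
                raise ValueError("unexpected label: %r" % (label,))
            writer.writerow([filename, label])
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the directory part so no folder ends up in the zip.
        archive.write(csv_path, arcname="submission.csv")
        archive.write(source_path, arcname=os.path.basename(source_path))
    return zip_path


# Hypothetical demo in a temporary directory.
demo_dir = tempfile.mkdtemp()
source_file = os.path.join(demo_dir, "model.py")
with open(source_file, "w") as handle:
    handle.write("# placeholder for your system's source code\n")

zip_file = build_submission(
    [("blog_001.xml", "male"), ("blog_002.xml", "female")],
    source_file,
    os.path.join(demo_dir, "cosc7336-assign1-student-name.zip"),
)

with zipfile.ZipFile(zip_file) as archive:
    members = archive.namelist()
    csv_text = archive.read("submission.csv").decode("utf-8")
print(members)
```

Listing the archive members before uploading is a quick way to confirm that no entry contains a folder path.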
NOTE: CodaLab is an open source framework for running competitions. System submissions are ranked by accuracy and the ranking is public, so it’s important that the username you choose does not disclose your identity. So that we can identify which student gets credit for which submission, please state in your report and in your source code the CodaLab username you used for your submission.
To test against the blog test data set, submit your predictions in the Blog Testing tab. To test against the Twitter test data set, submit your predictions in the Twitter Testing tab. To see your results on the leaderboard, be sure to click the 'Submit to Leaderboard' button after successfully uploading your submission.
Cheating, plagiarism, and other forms of academic dishonesty will not be tolerated. This includes copying code from the internet without the written consent of the author. See the University's Academic Honesty Policy for more information. By accepting these terms and conditions, you agree to the consequences if found guilty of any transgressions. Feel free to contact us if you have any doubts.
Start: Sept. 19, 2017, midnight
Description: Upload your predictions to test against the truth file for the blogs data set.
Start: Sept. 19, 2017, midnight
Description: Upload your predictions to test against the truth file for the twitter data set.
Oct. 15, 2017, 6 a.m.
# | Username | Score |
---|---|---|
1 | Thanos | 0.7747 |
2 | cryptexcode | 0.7733 |
3 | JohnYossarrian | 0.7689 |