The aim of this challenge is to approach the image-text matching problem as one of binary classification. Participants are provided with a classification data set in which the feature space of each instance encodes an (image, keyword) pair. The class of an instance is 1 when the keyword is relevant for describing the image and 0 when the keyword is irrelevant. Relevance of a keyword is determined with an undisclosed methodology that may not be apparent to participants (for example, a keyword may be relevant even if it is not an object visually observable in the image). Images are represented by CNN-based features, whereas keywords are encoded with their word2vec representation. Additionally, the raw images and words will be made publicly available, so that participants can take advantage of that information. Classification performance will be used to determine the winners of the challenge.
Overview of the approached task.
Participation of members of the Red Temática CONACyT en Inteligencia Computacional Aplicada is encouraged, although this is a challenge open to anyone (see the terms & conditions section).
Organizing team: Luis Pellegrin, Hugo Jair Escalante, Alicia Morales, Eduardo Morales, Carlos A. Reyes-García
Organizers are grateful to CodaLab (running on MS Azure) and ChaLearn.
Sponsors: Red temática en Inteligencia Computacional Aplicada (RedICA), CONACyT, INAOE
The approached problem is a binary classification task. Each sample is characterized by a feature vector encoding an image-text pair, where images are encoded by a CNN-based representation (4096 features) and keywords are encoded with their 200-dimensional word2vec representation. Participants must predict the relevance of the matchings: a matching is said to be relevant (class 1) if the keyword is relevant to the corresponding image, and non-relevant (class 0) otherwise. Relevance of a keyword is determined with an undisclosed methodology that may not be apparent to participants (for example, a keyword may be relevant even if it is not an object visually observable in the image).
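As a rough illustration of the representation described above, the following sketch splits one 4296-dimensional sample into its image and keyword parts. The dimensions come from the description; the column order (image features first, then word2vec features) is an assumption that the description does not confirm, and `split_features` is a hypothetical helper name.

```python
IMAGE_DIM = 4096  # CNN-based image features (from the task description)
WORD_DIM = 200    # word2vec keyword features (from the task description)


def split_features(sample, image_first=True):
    """Split one sample into (image_part, word_part).

    ASSUMPTION: image_first=True reflects a guessed column order; check
    the released data before relying on it.
    """
    if len(sample) != IMAGE_DIM + WORD_DIM:
        raise ValueError(
            "expected %d features, got %d" % (IMAGE_DIM + WORD_DIM, len(sample))
        )
    if image_first:
        return sample[:IMAGE_DIM], sample[IMAGE_DIM:]
    # Alternative layout: word2vec features first, image features last.
    return sample[WORD_DIM:], sample[:WORD_DIM]
```

If the actual layout turns out to be reversed, passing `image_first=False` gives the other split without changing the rest of a pipeline.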
Overview of the data generation process.
Participants are given a training data set (ricatim_train) with 20,000 samples, each 4296-dimensional. Training samples are labeled (ricatim_train_labels). Participants must use the training set to build their models and send predictions for the validation data during the challenge (a sample submission file is provided with the validation data set). For the final phase, labels for the validation data set will be released and participants will have to submit predictions for the test data set. Predictions should be submitted in a text file with the prediction (0 or 1) for each instance, in the same order as the instances appear in the data matrix. The validation and test data sets each contain 5,000 instances (i.e., your prediction file should have 5000 lines).
Accuracy will be used as evaluation measure.
There are 2 phases associated with the RICATIM challenge:
Important: Submission files should be named "answer.txt" and should contain only a vector of predictions (one line per instance, as in the sample submission file). The file should be compressed in zip format.
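The submission format described above can be sketched as follows. This is a minimal example, not the organizers' tooling: `write_submission` and the archive name `submission.zip` are hypothetical, and ending the file with a trailing newline is an assumption. Only the inner file name "answer.txt", the 0/1 values, and the 5,000-line count come from the description.

```python
import zipfile

EXPECTED_LINES = 5000  # validation and test sets each have 5,000 instances


def write_submission(predictions, zip_path="submission.zip",
                     expected=EXPECTED_LINES):
    """Write 0/1 predictions, one per line, to answer.txt inside a zip file."""
    if len(predictions) != expected:
        raise ValueError(
            "expected %d predictions, got %d" % (expected, len(predictions))
        )
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("predictions must be 0 or 1")
    # int() normalizes values such as True/False to plain "1"/"0" text.
    body = "\n".join(str(int(p)) for p in predictions) + "\n"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The archive holds a single member named exactly "answer.txt".
        zf.writestr("answer.txt", body)
    return zip_path
```

Predictions must be passed in the same order as the rows of the data matrix, since the file carries no instance identifiers.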
This competition allows you to submit only prediction results, not code. Please note, however, that code verification will be performed to determine the winners (see the rules section).
Submissions will be evaluated using classic metrics such as accuracy, F1, precision, and recall (accuracy will be used to rank the winners).
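For checking predictions locally against the released labels, the four metrics named above can be computed as in the sketch below. It assumes class 1 is the positive class and returns 0.0 when a denominator is zero; the official scoring program may handle these details differently, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for 0/1 labels.

    ASSUMPTIONS: class 1 is the positive class, and undefined ratios
    (zero denominators) are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one instance is required")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Since accuracy alone ranks the winners, the other three values are mainly useful for spotting a model that predicts one class almost exclusively.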
This challenge is governed by the general ChaLearn contest rules.
With the following amendments:
Start: July 3, 2017, midnight
Description: Development phase: tune your models and submit prediction results on validation set.
Start: Aug. 14, 2017, midnight
Description: Final phase (submit prediction results on test set).
Aug. 17, 2017, 5 a.m.