SemEval-2022 Task 09: R2VQ - Competence-based Multimodal Question Answering Forum


> Question Categories

In the "Designing Multimodal Datasets for NLP Challenges" paper referred to in this competition, six question categories are explicitly listed ('Cardinality', 'Ellipsis', etc.).

In our observation, the question ID seems to reveal the question's category. For example, questions with IDs
# question 0-*
start with "How many" and seem to match the 'Cardinality' category, while all questions with IDs
# question 18-*
(there are 2,793 of them) are answered "N/A".
This numbering is retained in the test set as well.
- Do these numbers map to the question categories? If yes, how exactly?
- Is this intended? If not, do we correctly assume a leak here?
- Can we rely on it being a part of the input data during the final evaluation?

By the way, does the second number in the question, like the value of 2 in "# question 0-2 = How many actions does it take to process the meat?", have an interpretable meaning?

Also, the paper states "The annotation methodology is described in the Appendix". Are we correct in assuming that this refers to chapters 7-11, after "6 Conclusion and Future Work"? Or is it a separate document that can be found elsewhere?

Posted by: t.dryjanski @ Dec. 16, 2021, 8:35 p.m.

Dear Organizers,
What is the answer to this important question by t.dryjanski?
1. Can we use the # question X-Y categories? Surprisingly, they appear in all the data: both the training and test sets.

2. I would like to ask one more question: can we use the semantic role labeling data from the test set, e.g. data like the following?
1 2 2 NUM _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2 fresh fresh ADJ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
3 red red ADJ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
4 chile chile NOUN _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
5 , , PUNCT _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
6 finely finely ADV _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
7 chopped chop VERB _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Data like the above are not part of the actual recipe text. Can these data from the test set also be used freely to prepare a solution?

Best regards!
Pawel

Posted by: PawelBujnowski @ Dec. 22, 2021, 11:54 a.m.

Hi all,
Thanks for your post.
- Participants are not allowed in any way to exploit the question ID (# question X-Y categories) information during or after training in order to improve evaluation results. (This has been updated on the CodaLab page.)
- The answer for "# question 0-2 = How many actions does it take to process the meat?" would be the number of cooking events where the ingredient "meat" is a participant.
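Following the description above, a "How many actions..." answer amounts to counting the cooking events in which a given ingredient participates. A minimal sketch, assuming a hypothetical in-memory representation where each event carries a predicate and a set of participant names (the real R2VQ files encode this through SRL/CRL annotation columns, which are not reproduced here, and real recipes may refer to an ingredient through other mentions that this sketch ignores):

```python
from dataclasses import dataclass, field


@dataclass
class CookingEvent:
    """Hypothetical event record: a predicate lemma plus participating entities."""
    predicate: str
    participants: set[str] = field(default_factory=set)


def count_actions(events: list[CookingEvent], ingredient: str) -> int:
    """Count the cooking events in which `ingredient` is a participant."""
    return sum(1 for event in events if ingredient in event.participants)


# Toy data, invented for illustration only.
events = [
    CookingEvent("trim", {"meat"}),
    CookingEvent("season", {"meat", "salt"}),
    CookingEvent("chop", {"onion"}),
    CookingEvent("sear", {"meat", "pan"}),
]
print(count_actions(events, "meat"))  # 3
```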

Hi Pawel,
Regarding your second question, the data you included in the post (see below) is from the ingredient list of a recipe, and it only has lemma and POS.
In general, you are free to use the SRL annotation from the recipes in the test set. Although we didn't provide SRL/CRL annotation for the ingredient list, you are free to use that part of the data as well.
1 2 2 NUM _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2 fresh fresh ADJ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
3 red red ADJ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
4 chile chile NOUN _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
5 , , PUNCT _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
6 finely finely ADV _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
7 chopped chop VERB _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
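The rows above follow a CoNLL-style layout: an index, the token form, its lemma and a POS tag, followed by "_" placeholders for the empty annotation columns. A minimal sketch for reading such rows, assuming columns are separated by whitespace (splitting on any whitespace covers both spaces and tabs, as long as tokens themselves contain none) and treating "_" as an empty value; the `Token` and `parse_row` names are hypothetical:

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    index: int
    form: str
    lemma: str
    pos: str
    extra: list[Optional[str]]  # remaining annotation columns, None where "_"


def parse_row(line: str) -> Token:
    """Split one row into its first four columns plus the remaining annotations."""
    idx, form, lemma, pos, *rest = line.split()
    extra = [None if col == "_" else col for col in rest]
    return Token(int(idx), form, lemma, pos, extra)


sample = """1 2 2 NUM _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2 fresh fresh ADJ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
3 red red ADJ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
4 chile chile NOUN _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
5 , , PUNCT _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
6 finely finely ADV _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
7 chopped chop VERB _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _"""

tokens = [parse_row(row) for row in sample.splitlines() if row.strip()]
print([(t.form, t.lemma) for t in tokens if t.pos == "VERB"])  # [('chopped', 'chop')]
```

Since only the lemma and POS columns are filled for ingredient-list rows, every entry in `extra` comes back as `None` here.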

- Jingxuan

Posted by: r2vq @ Dec. 22, 2021, 3 p.m.