Multimodal translation shared task

Multimodal MT task 2018

Please check the official shared task website for more details.

This shared task is aimed at generating image descriptions in a target language. It can be addressed as a translation task, which takes a source-language description and translates it into the target language, with the process optionally supported by information from the image (multimodal translation), or as a multisource multimodal translation task, which takes source-language descriptions in multiple languages and translates them into the target language, using the visual information as additional context.

This shared task has the following main goals:

  • To push existing work on the integration of computer vision and language processing.
  • To push existing work on multimodal language processing towards multilingual multimodal language processing.
  • To investigate the effectiveness of information from images in machine translation.
  • To investigate the effectiveness of multiple source language sentences and visual information in machine translation.

We invite participation in two tasks:

Task 1: Multimodal Machine Translation Task

This task consists of translating English sentences that describe an image into German, French, or Czech, given the English sentence itself and the image that it describes (or features extracted from this image, if participants choose to use them). See Specia et al. (2016) and Elliott et al. (2017) for descriptions of the previous editions of this task at WMT16 and WMT17.

The original data for this task was created by extending the Flickr30K Entities dataset as follows: for each image, one of the English descriptions was selected and manually translated into German, French, and Czech by human translators.

  • English-German: translations were produced by professional translators, who were given the source segment only (training set) or the source segment and the image (validation and test sets).
  • English-French: translations were produced via crowd-sourcing; translators had access to the source segment, the image, and an automatic translation created with a standard phrase-based system (a Moses baseline built using the WMT'15 constrained translation task data) as a suggestion to make translation easier. Note that this was not a post-editing task: although translators could copy and paste the suggested translation to edit it, we found that they did not do so in the vast majority of cases.
  • English-Czech: translations were produced via crowd-sourcing; translators had access to the source segment and the image.

Summary of the datasets:

            Training   Validation   Test 2016   Test 2017   Ambiguous COCO   Test 2018
Images        29,000        1,014       1,000       1,000              461       1,071
Sentences     29,000        1,014       1,000       1,000              461       1,071

As training and development data, we provide 29,000 and 1,014 tuples respectively, each containing an English source sentence, its German, French, and Czech human translations, and the corresponding image. We also provide the 2016 and 2017 test sets, which participants can use for validation and internal evaluation. The English-German datasets are the same as those used in 2016, but note that the human translations in the 2016 validation and test sets have been post-edited (by humans) using the images, to make sure the target descriptions are faithful to these images: there were cases in the 2016 data where the source text was ambiguous and the image was used to resolve the ambiguity. The French translations were added in 2017 and the Czech translations in 2018.

As test data, we provide a new test set of 1,071 tuples, each containing an English description and its corresponding image. Gold labels will be the translations into German, French, or Czech.
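
For concreteness, a minimal sketch of how one might represent these tuples in code is given below; the class and field names, and the optional pre-extracted image features, are hypothetical and not part of the official data release.

    # Hypothetical container for one Task 1 example (all names are illustrative).
    from dataclasses import dataclass
    from typing import Optional, Sequence

    @dataclass
    class CaptionTranslationExample:
        image_id: str                   # identifier of the Flickr30K image
        english: str                    # English source description
        german: Optional[str] = None    # human translations; absent from the
        french: Optional[str] = None    # blind Test 2018 set until gold labels
        czech: Optional[str] = None     # are released
        image_features: Optional[Sequence[float]] = None  # optional pre-extracted visual features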

Task 1b: Multisource Multimodal Machine Translation Task

This new task consists of translating English sentences that describe an image into Czech, given the English sentence itself, the image that it describes (or features extracted from this image, if participants choose to use them), and parallel sentences in French and German. Participants are free to use any subset of the additional source-language data in their submissions.

Summary of the datasets:

            Training   Validation   Test 2016   Test 2017
Images        29,000        1,014       1,000       1,000
Sentences     29,000        1,014       1,000       1,000


As training and development data, we provide 29,000 and 1,014 tuples respectively, each containing English, French, and German source sentences, their Czech human translation, and the corresponding image. We also provide the 2016 validation and test sets, which participants can use for validation and internal evaluation. The English-German datasets are the same as those used in 2016, but note that the human translations in the 2016 validation and test sets have been post-edited (by humans) using the images, to make sure the target descriptions are faithful to these images: there were cases in the 2016 data where the source text was ambiguous and the image was used to resolve the ambiguity. The French translations were added in 2017 and the Czech translations in 2018.

As test data, we provide a test set of 1,000 tuples, each containing English, French, and German descriptions and the corresponding image. Gold labels will be translations into Czech. This test set corresponds to the unseen portion of the Czech Test 2017 data.

Organisers

Lucia Specia (University of Sheffield)
Stella Frank (University of Amsterdam)
Loïc Barrault (University of Le Mans)
Fethi Bougares (University of Le Mans)
Desmond Elliott (University of Edinburgh)

Contact

For questions or comments, please use the wmt-tasks mailing list.

Submissions should be pre-processed by lowercasing, normalising punctuation, and tokenising the sentences. Meteor 1.5 will be used as the primary automatic evaluation metric, with BLEU as a secondary metric. Each task and language will be evaluated independently, as a different "phase".
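
As an illustration, the following minimal Python sketch applies this pre-processing using the sacremoses package; the choice of sacremoses (rather than the original Moses scripts) and the language code are assumptions, not the organisers' official pipeline.

    # Pre-processing sketch: lowercase, normalise punctuation, tokenise.
    # Assumption: sacremoses stands in for the Moses scripts typically used.
    from sacremoses import MosesPunctNormalizer, MosesTokenizer

    def preprocess(lines, lang="de"):
        """Return lowercased, punctuation-normalised, tokenised sentences."""
        normalizer = MosesPunctNormalizer(lang=lang)
        tokenizer = MosesTokenizer(lang=lang)
        processed = []
        for line in lines:
            line = normalizer.normalize(line.strip())
            line = tokenizer.tokenize(line, return_str=True)
            processed.append(line.lower())
        return processed

    if __name__ == "__main__":
        import sys
        for sentence in preprocess(sys.stdin.readlines(), lang="de"):
            print(sentence)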

Please submit your system translations as a text-only file, with no team or system identifier (note that this is different from the official shared task format). The team and system name (as well as a system description) should be entered as metadata for each submission.

Please submit your translations in a file named translations.txt, then generate a zip archive of this file: translations.txt.zip. No matter how many submissions you make, the files should all be named the same; submissions will be distinguished by the metadata you enter.
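
For example, a submission could be packaged with Python's standard library as sketched below; only the two file names are taken from the instructions above, everything else (function name, output directory, example sentences) is illustrative.

    # Packaging sketch: write translations.txt and wrap it in translations.txt.zip.
    import zipfile

    def package_submission(translations, out_dir="."):
        """Write one translation per line, then zip the file as required."""
        txt_path = f"{out_dir}/translations.txt"
        with open(txt_path, "w", encoding="utf-8") as f:
            f.write("\n".join(translations) + "\n")
        with zipfile.ZipFile(f"{txt_path}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
            # Store the file under its bare name inside the archive.
            zf.write(txt_path, arcname="translations.txt")

    if __name__ == "__main__":
        package_submission(["ein mann steht auf einer leiter .", "zwei hunde rennen im park ."])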

Evaluation of participants in the official task (deadline passed) will be performed using human Direct Assessment as the main metric.

The data is licensed under Creative Commons: Attribution-NonCommercial-ShareAlike 4.0 International.

Supported by the following European Commission projects: MultiMT and M2CR.


Phases (each task and language is evaluated as a separate phase):

  • German: starts June 15, 2018, midnight UTC
  • French: starts June 15, 2018, midnight UTC
  • Czech (Task 1): starts June 15, 2018, midnight UTC
  • Czech (Task 1b): starts June 15, 2018, midnight UTC

Competition ends: never.

Leaderboard:

  # Username   Score
  1 renj       27.99
  2 chiraag    27.59
  3 fblain     26.83