SemEval-2019 Task 12 - Toponym Resolution in Scientific Papers Forum

> Disambiguation issues

I have run into some issues with subtask 2 - Disambiguation.

Some details

1. In file PMC2625346.ann
The GeonameId of Öland is 2687202, but I can't find any record with this ID.

2. In file PMC4479511.ann
The GeonameId of Schledehausen is 2838819, but when I query GeoNames I get both 2838818 and 2838819, and I see no difference between them. Why is the answer 2838819?

Basic questions

1. Some geonameids in the gold standard annotation files can't be found in the GeoNames database. Can you update them (at least in the final test set)?

2. Many mistakes come from the place class (a region versus a specific city), even though the predicted geonameid and the gold standard geonameid refer to almost the same place. Could you modify the evaluation method to consider the difference in longitude and latitude rather than only the geonameid?

Posted by: HAONANL5 @ Jan. 9, 2019, 12:10 p.m.

Dear HAONANL5,

Please, see inline for my answers.

1. In file PMC2625346.ann
The GeonameId of Öland is 2687202, but I can't find any record with this ID.
-> This entry exists in Geonames: http://www.geonames.org/2687202/oeland.html. Where did you query it, in a local version of Geonames or using the web interface?
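
For anyone checking IDs against a local copy, here is a minimal sketch of a lookup by geonameId. It assumes the tab-separated layout of the GeoNames dump files (geonameid in the first column, then name, asciiname, alternatenames, latitude, longitude). The function name `find_by_id` and the sample rows are made up for illustration, and the coordinates in them are placeholders, not real GeoNames values.

```python
import io


def find_by_id(lines, geoname_id):
    """Return (name, latitude, longitude) for geoname_id, or None if absent.

    `lines` is any iterable of tab-separated rows, e.g. an open dump file.
    """
    wanted = str(geoname_id)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Column 0 is the id; columns 4 and 5 are latitude and longitude.
        if len(fields) >= 6 and fields[0] == wanted:
            return fields[1], float(fields[4]), float(fields[5])
    return None


# Placeholder rows (NOT real GeoNames records), in the assumed column order.
SAMPLE = (
    "111\tExample Island\tExample Island\t\t10.0\t20.0\n"
    "222\tExample Town\tExample Town\t\t30.5\t40.5\n"
)

result = find_by_id(io.StringIO(SAMPLE), 222)
missing = find_by_id(io.StringIO(SAMPLE), 999)
```

With a real dump, the same function could be called on a file opened with `open(path, encoding="utf-8")`. A `None` result would only show that the ID is absent from that particular copy.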

2. In file PMC4479511.ann
The GeonameId of Schledehausen is 2838819, but when I query GeoNames I get both 2838818 and 2838819, and I see no difference between them. Why is the answer 2838819?
-> For your specific example, there is a clear answer in the article. It states that the collection sites were in the Osnabruck district, so the Schledehausen (geoID: 2838819) within the boundary of Osnabruck (GeoNames Hierarchy: Germany>>Lower Saxony>>Landkreis Osnabruck>>Bissendorf>>Schledehausen) was selected over the other Schledehausen (geoID: 2838818), which is listed in the district of Vechta (GeoNames Hierarchy: Germany>>Lower Saxony>>Landkreis Vechta>>Bakum>>Schledehausen).
More generally, here is the text from the guidelines that instructs our annotators on what level of specificity to use when selecting GeoNames IDs/coordinates for disambiguation:
If a toponym mention can refer to a capital or a semi-independent political entity, such as the case with Hong Kong, the coordinates of the less specific location should be chosen (semi-independent political entity). This rule applies to other instances in which the location mention can be referring to more than one entity, with each entity having different specificities.
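
The Schledehausen choice above can be sketched in code. This is only an illustration of the reasoning, not the annotators' procedure (the annotation was done by hand): it keeps the candidates whose GeoNames hierarchy mentions a place named in the article. The function name `pick_by_context` and the idea of passing context terms as a list are assumptions; the two hierarchies are the ones quoted in this thread.

```python
def pick_by_context(candidates, context_terms):
    """Return the ids of candidates whose hierarchy mentions a context term.

    candidates: dict mapping geoname id -> list of hierarchy levels.
    context_terms: place names found in the article (hypothetical input).
    """
    wanted = [term.casefold() for term in context_terms]
    matches = []
    for geo_id, hierarchy in candidates.items():
        levels = " ".join(hierarchy).casefold()
        if any(term in levels for term in wanted):
            matches.append(geo_id)
    return matches


# Hierarchies as quoted in the reply above.
CANDIDATES = {
    2838819: ["Germany", "Lower Saxony", "Landkreis Osnabruck",
              "Bissendorf", "Schledehausen"],
    2838818: ["Germany", "Lower Saxony", "Landkreis Vechta",
              "Bakum", "Schledehausen"],
}

# The article places the collection sites in the Osnabruck district.
chosen = pick_by_context(CANDIDATES, ["Osnabruck"])
```

Here `chosen` contains only 2838819. If no context term matches, or several candidates match, the list is empty or has more than one entry, and some other rule is still needed to decide.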

Basic questions

1. Some geonameids in the gold standard annotation files can't be found in the GeoNames database. Can you update them (at least in the final test set)?
-> Two annotators independently annotated the test corpus, and we ran automatic sanity checks on it. This should improve its quality compared to the training data, but we can't be sure that it is free of errors. On the other hand, any errors will affect all participating systems in the same way.

2. Many mistakes come from the place class (a region versus a specific city), even though the predicted geonameid and the gold standard geonameid refer to almost the same place. Could you modify the evaluation method to consider the difference in longitude and latitude rather than only the geonameid?
-> I can't add any new metrics at this stage of the competition (Codalab does not support changing them during the competition without recreating the competition). We may consider adding this metric when presenting the results of the competition, but it also introduces its own problems. For example, matching on IDs rather than on coordinates avoids the case where a resolver and the gold standard denote two different toponyms that share the same coordinates: a city and its state may have the same geo-coordinates in GeoNames, but they are different locations and hence have different place IDs.
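
To make the trade-off above concrete, here is a minimal sketch that compares strict ID matching with a distance-based match. It is not the task's official scorer. The haversine formula and the mean Earth radius are standard; the function names, the 10 km tolerance, and the two sample entries (with placeholder IDs and coordinates) are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_match(pred, gold, tolerance_km=None):
    """pred and gold are (geoname_id, lat, lon) tuples.

    With tolerance_km=None the ids must be equal (strict matching);
    otherwise the two points must lie within tolerance_km of each other.
    """
    if tolerance_km is None:
        return pred[0] == gold[0]
    return haversine_km(pred[1], pred[2], gold[1], gold[2]) <= tolerance_km


# Hypothetical entries: a city and its state sharing the same coordinates.
city = (1001, 50.0, 8.0)
state = (1002, 50.0, 8.0)

strict = is_match(city, state)                      # ids differ
relaxed = is_match(city, state, tolerance_km=10.0)  # distance is zero
```

In this sketch `strict` is `False` and `relaxed` is `True`: the distance-based match accepts the prediction even though the city and the state are different toponyms, which is the problem described above.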

Hope this helps.

Best regards,
Davy

Posted by: dweissen @ Jan. 10, 2019, 5:19 p.m.