SemEval-2019 Task 12 - Toponym Resolution in Scientific Papers

Organized by dweissen



Task Definition

Toponym resolution, also known as geoparsing, geo-grounding or place name resolution, aims to assign geographic coordinates to all location names mentioned in documents. Toponym resolution is usually performed in two independent steps. First, toponym detection or geotagging, where the spans of place names mentioned in a document are identified. Second, toponym disambiguation or geocoding, where each name found is mapped to the latitude and longitude coordinates of the centroid of its physical location. Toponym detection has been extensively studied in named entity recognition; location names were one of the first classes of named entities to be detected in text [Piskorski & Yangarber, 2013]. Disambiguation of toponyms is a more recent task [Leidner, 2007].

[Figure: toponym resolution]

With the growth of written documents on the internet, the public adoption of smartphones equipped with Geographic Information Systems, and the collaborative development of comprehensive maps and geographical databases, toponym resolution has seen an important gain of interest in the last two decades. Not only academic but also commercial and open-source toponym resolvers are now available. However, their performance varies greatly when applied to corpora of different genres and domains [Gritta et al., 2017]. Toponym disambiguation tackles ambiguities between different toponyms, like Manchester, NH, USA vs. Manchester, UK (Geo-Geo ambiguities), and between toponyms and other entities, such as names of people or everyday objects (Geo-NonGeo ambiguities). Additional linguistic challenges during the resolution step include metonymic usage of toponyms, "91% of the US didn't vote for either Hillary or Trump" (a country does not vote, so the toponym refers to the people living in the country), elliptical constructions, "Lakeview and Harrison streets" (the phrase refers to two street names, Lakeview Street and Harrison Street), and cases where the context simply does not provide enough evidence for the resolution.

Please join the dedicated Google group for inquiries about this shared task.

To contact the task organizers directly, please send an email to

Timeline (Tentative)

  • Aug 20, 2018: Trial Data Release; Practice Phase starts
  • Sep 17, 2018: Training Data Release
  • Jan 10, 2019: Test Data Release for sub-tasks 1 and 3; Evaluation Phase starts for sub-tasks 1 and 3
  • Jan 18, 2019: Evaluation Phase ends for sub-tasks 1 and 3; Test Data Release for sub-task 2; Evaluation Phase starts for sub-task 2
  • Jan 25, 2019: Evaluation Phase ends for sub-task 2


  • Graciela Gonzalez-Hernandez, Ph.D., The Perelman School of Medicine, University of Pennsylvania [web]
  • Matthew Scotch, Ph.D., Department of Biomedical Informatics, Arizona State University [web]
  • Davy Weissenbacher, Ph.D., The Perelman School of Medicine, University of Pennsylvania [web|mail:]
  • Karen O'Connor, MS, The Perelman School of Medicine, University of Pennsylvania
  • Arjun Magge, MS, Department of Biomedical Informatics, Arizona State University


[Piskorski & Yangarber, 2013] J. Piskorski and R. Yangarber. Information extraction: Past, present and future. In T. Poibeau, H. Saggion, J. Piskorski, and R. Yangarber, editors, Multi-source, Multilingual Information Extraction and Summarization, Theory and Applications of Natural Language Processing, pages 23–49. Springer Berlin Heidelberg. 2013.

[Leidner, 2007] J. L. Leidner. Toponym Resolution in Text: Annotation, Evaluation and Applications of Spatial Grounding of Place Names. PhD thesis, Institute for Communicating and Collaborative Systems School of Informatics, University of Edinburgh. 2007.

[Gritta et al., 2017] M. Gritta, M. T. Pilehvar, N. Limsopatham and N. Collier. What's missing in geographical parsing? Language Resources and Evaluation. 2017.

Task Details

The definition of toponym is still debated among researchers. In its simplest definition, a toponym is a proper name of an existing populated place on Earth. This definition can be extended to include any named place or geographical entity that can be designated by a geographical coordinate. This encompasses cities, countries, lakes and monuments. In this challenge we consider the extended definition of toponyms and exclude all indirect mentions of places such as "30 km north from Boston", as well as metonymic usage and elliptical constructions of toponyms.

Subtask 1: Toponym detection

The first subtask consists of detecting the text boundaries of all toponym mentions in the articles. Despite major progress, this subtask is still an open problem and will be studied on its own since it determines the performance of the overall resolution: toponym mentions missed during the detection can't be disambiguated (False Negatives, FN) and, inversely, phrases wrongly detected as toponyms will receive geocoordinates during the disambiguation (False Positives, FP). Both FNs and FPs degrade the quality of the overall resolution.

Subtask 2: Toponym disambiguation

The second subtask focuses on the disambiguation of the toponyms only. In this subtask, all names of locations are known by the resolver but not their precise coordinates. The resolver has to select the GeoNames ID corresponding to the expected place among all possible candidates. GeoNames is a freely available database of geospatial locations. For example, in the sentence "These sequences were compared with representative H9N2 viruses and some duck viruses isolated in Shantou, Guangdong Province in mainland China.", among the 75 populated places named Shantou in GeoNames, the expected place is entry 1795932 since it is the least specific matching location (entry 1795932 is the second-order administrative division in GeoNames and covers a larger area than entry 1795940, which is the seat of that second-order administrative division).

Subtask 3: End-to-end toponym resolution

The last subtask evaluates the toponym resolver in real conditions. Only the full PubMed articles are given to the resolver and all toponyms detected and disambiguated by the resolver are evaluated.


Evaluation Criteria


When a gold standard corpus and a toponym resolver are aligned on the same geographical database, the standard metrics of precision, recall and F-measure can be used to measure the performance of the resolver. For this challenge, we will report all results using two common variations of these metrics: strict and overlapping measures. In the strict measure, resolver annotations are considered to match the gold standard annotations if they cover exactly the same spans of text, whereas in the overlapping measure, two annotations match when they share a common span of text. The Python script used for the evaluation can be downloaded from Bitbucket.

Toponym Detection.

We will compute the precision and recall for toponym detection with the standard equations: Precision = TP / (TP + FP) and Recall = TP / (TP + FN), where TP (True Positive) is the number of toponyms correctly identified by a toponym detector in the corpus, FP (False Positive) the number of phrases incorrectly identified as toponyms by the detector, and FN (False Negative) the number of toponyms not identified by the detector.
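These detection metrics, under both the strict and the overlapping matching regimes, can be sketched in a few lines of Python. This is an illustrative sketch, not the official evaluation script (available on Bitbucket); spans are assumed to be (start, end) character offsets, and all function names here are hypothetical.

```python
# Strict vs. overlapping span matching for toponym detection.
# Spans are (start, end) character offsets, end-exclusive.

def strict_match(pred, gold):
    # Strict measure: the two annotations hit exactly the same span.
    return pred == gold

def overlap_match(pred, gold):
    # Overlapping measure: the two annotations share at least one character.
    return pred[0] < gold[1] and gold[0] < pred[1]

def detection_prf(pred_spans, gold_spans, match=strict_match):
    # TP: predicted spans that match some gold span; FP: the rest.
    tp = sum(1 for p in pred_spans if any(match(p, g) for g in gold_spans))
    fp = len(pred_spans) - tp
    # FN: gold spans no prediction matched.
    matched_gold = sum(1 for g in gold_spans if any(match(p, g) for p in pred_spans))
    fn = len(gold_spans) - matched_gold
    precision = tp / (tp + fp) if pred_spans else 0.0
    recall = matched_gold / (matched_gold + fn) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, a prediction that overlaps a gold span without hitting its exact boundaries counts as a TP under `overlap_match` but as both an FP and an FN under `strict_match`.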

Toponym Disambiguation.

The precision and recall for toponym disambiguation will be computed by slightly modified equations. The precision Pds of the toponym disambiguation is given by Equation 1, where TCD is the number of toponyms correctly identified and disambiguated by the toponym disambiguator in the corpus and TID is the number of toponyms incorrectly identified or disambiguated in the corpus.

1. Pds = TCD / (TCD + TID)

The recall Rds of the toponym disambiguation is computed by Equation 2, where TN is the total number of toponyms in the corpus.

2. Rds = TCD / TN

Since the competing resolvers and the gold corpus annotations will all be aligned on GeoNames, correctly identified toponyms can be determined by a simple match between the place IDs retrieved by the resolvers and those chosen by the annotators. Not matching on coordinates avoids the problem of a resolver and the gold standard denoting two different toponyms that refer to the same coordinates; for instance, a city and its state may have the same geo-coordinates in GeoNames, but they refer to different locations and hence have different place IDs.
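Equations 1 and 2, scored by GeoNames ID match as described above, can be sketched as follows. The data layout is an assumption for illustration: each annotation is a pair of a (start, end) span and a GeoNames ID, and the function name is hypothetical.

```python
# Pds and Rds for toponym disambiguation, matching on GeoNames place IDs.
# pred and gold are lists of ((start, end), geonames_id) pairs.

def disambiguation_pr(pred, gold):
    gold_ids = dict(gold)  # span -> annotated GeoNames ID
    # TCD: toponyms whose span AND GeoNames ID both match the gold standard.
    tcd = sum(1 for span, gid in pred if gold_ids.get(span) == gid)
    # TID: toponyms incorrectly identified or incorrectly disambiguated.
    tid = len(pred) - tcd
    # TN: total number of toponyms in the corpus.
    tn = len(gold)
    p_ds = tcd / (tcd + tid) if pred else 0.0
    r_ds = tcd / tn if gold else 0.0
    return p_ds, r_ds
```

Note that a correct span with a wrong GeoNames ID counts against both Pds and Rds, which is what distinguishes these metrics from the detection ones.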

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2019 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers. You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science. You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers. You agree not to redistribute the training and test data except in the manner prescribed by its licence.

Data and Resources

A case study: epidemiology of viruses

The automatic resolution of the names of places mentioned in textual documents has multiple applications and, therefore, has been the focus of research for both industrial and academic organizations. For this challenge, we chose a scientific domain where the resolution of the names of places is a crucial knowledge: epidemiology.

One aim in epidemiology is to create maps of the locations of viruses and their migration paths, a tool used to monitor and intervene during disease epidemics. To create such maps, researchers often use the geospatial metadata of individual sequence records in public databases such as NIH's GenBank [Benson et al., 2011]. The metadata represents the location of the infected host. With more than 1.9 million virus sequences (as of March 2015), GenBank provides abundant information on viruses. However, previous work has suggested that geospatial metadata, when it is not simply missing, can be too imprecise for local-scale epidemiology. Scotch et al. [2011] estimate that only 20% of GenBank records of zoonotic viruses contain detailed geospatial metadata such as a county or town name (zoonotic viruses are viruses that naturally infect hosts of different species, like rabies). Most GenBank records provide generic information, such as China or USA, without mentioning the specific places within these countries. However, more specific information about the locations of the viruses may be present in the articles which describe the research work. To create a complete map, researchers are then forced to read these articles to locate these additional pieces of geospatial metadata in the text for a set of viruses of interest. This manual process can be highly time-consuming and labor-intensive.

This challenge will be an opportunity to assess the development and evaluation of automated approaches to retrieve geospatial metadata at a finer level of granularity from full-text journal articles, approaches that can be further transferred or adapted to resolve names of places in other scientific domains.


Our corpus is composed of 120 full-text, open-access journal articles downloaded from PubMed Central (PMC). All articles in the PMC open access set are covered by a Creative Commons license and are free to access.

Corpus            Links
Trial Corpus      Converted texts
Training Corpus   To be released around Sep 17, 2018
Test Corpus       To be released around Jan 10, 2019

To perform the annotation, we manually downloaded the PDF versions of the PMC articles and converted them to text files using the freely available tool pdf-to-text. We manually corrected and disambiguated the toponyms using GeoNames. We annotated toponyms in the title and the body of each document using the Brat annotator [Stenetorp et al., 2012]. We manually removed content unlikely to contain phylogeography-related toponyms, such as the tables and their captions, the names of the authors, the acknowledgments and the references. In cases where a toponym couldn't be found in GeoNames, we searched Google Maps and Wikipedia. If the toponym was still not found, we set its coordinates to a special value N/A. Prior to beginning annotation, we developed a set of annotation guidelines after discussion among four annotators familiar with the biological domain. The resulting guidelines are available for download.
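Brat stores annotations in standoff files alongside the text. As a rough guide to consuming the released annotations, the sketch below parses the text-bound lines of a Brat .ann file, assuming the common form "T1<TAB>Label start end<TAB>surface text"; the function name is hypothetical, and less common Brat constructs (discontinuous spans, notes, relations) are not handled.

```python
# Minimal reader for Brat standoff text-bound annotations.
# Each text-bound line looks like: "T1\tLocation 10 17\tShantou".

def read_brat_spans(ann_text):
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations (R), events (E), notes (#), etc.
        _, type_and_offsets, surface = line.split("\t")
        label, start, end = type_and_offsets.split()
        spans.append((label, int(start), int(end), surface))
    return spans
```

The (start, end) offsets index into the plain-text version of the article, which is why the corpus is distributed as converted texts rather than PDFs.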

GeoNames is a collaborative database of geospatial locations, available free of charge under a Creative Commons Attribution license. GeoNames is certainly the most commonly used database for toponym resolution [Gritta et al., 2017] and was chosen for this reason. At the time of writing, May 2018, GeoNames contains more than 10 million entries with more than 5.5 million alternative names. The database can be downloaded or accessed via its API.


We released an end-to-end system to be used as a strong baseline or as a starting point. This system sequentially performs the detection and the disambiguation of the toponyms in raw texts. To detect the toponyms, the system uses a feedforward neural network described in [Magge et al., 2018]. The disambiguation of all detected toponyms is then performed using a common heuristic, the population heuristic. Using this heuristic, the system always disambiguates a toponym by choosing the place with the highest population in GeoNames. This heuristic, as well as other alternatives, is described in [Weissenbacher et al., 2015]. The baseline system can be downloaded from GitHub. We also made available a REST service to search a recent copy of GeoNames; the documentation and the code to deploy the service locally can be found on GitHub.
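The population heuristic used by the baseline can be sketched in a few lines. This is a simplified illustration of the idea, not the baseline's actual code: candidate records are represented here as hypothetical dicts, whereas the real system queries GeoNames for the candidates sharing a detected name.

```python
# Population heuristic: among all GeoNames candidates for a detected
# toponym, pick the entry with the largest population.

def disambiguate_by_population(candidates):
    """candidates: list of dicts like {"geonameid": ..., "population": ...}."""
    if not candidates:
        return None  # no GeoNames entry matched the detected name
    return max(candidates, key=lambda c: c.get("population", 0))
```

For the Shantou example above, this heuristic would pick whichever of the 75 candidates has the largest population, which may or may not be the least specific administrative division the gold standard expects; this is precisely the kind of error the alternatives in [Weissenbacher et al., 2015] try to reduce.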


[Benson et al., 2011] D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and E. W. Sayers. Genbank. Nucleic Acids Res, 39(D):32–37. 2011.

[Scotch et al., 2011] M. Scotch, I. N. Sarkar, C. Mei, R. Leaman, K-H. Cheung, P. Ortiz, A. Singraur, and G. Gonzalez. Enhancing phylogeography by improving geographical information from GenBank. Journal of Biomedical Informatics, 44(44-47). 2011.

[Stenetorp et al., 2012] P. Stenetorp, S. Pyysalo, G. Topić, T. Ohta, S. Ananiadou and J. Tsujii. BRAT: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations Session at EACL. 2012.

[Gritta et al., 2017] M. Gritta, M. T. Pilehvar, N. Limsopatham and N. Collier. What's missing in geographical parsing? Language Resources and Evaluation. 2017.

[Magge et al., 2018] A. Magge, D. Weissenbacher, A. Sarker, M. Scotch and G. Gonzalez. Deep neural networks and distant supervision for geographic location mention extraction. Bioinformatics. 2018. To appear.

[Weissenbacher et al., 2015] D. Weissenbacher, T. Tahsin, R. Beard, M. Figaro, R. Rivera, M. Scotch, G. Gonzalez. Knowledge-driven geospatial location resolution for phylogeographic models of virus migration. Bioinformatics 31 (12): i348-i356. 2015.

Sub-Task 1: Practice Toponym Detection

Start: Jan. 1, 2018, midnight

Sub-Task 1: Evaluation Toponym Detection

Start: Jan. 10, 2019, midnight

Sub-Task 1: Post-Evaluation Toponym Detection

Start: Jan. 18, 2019, midnight

Sub-Task 2: Practice Toponym Disambiguation

Start: Jan. 1, 2018, midnight

Sub-Task 2: Evaluation Toponym Disambiguation

Start: Jan. 18, 2019, midnight

Sub-Task 2: Post-Evaluation Toponym Disambiguation

Start: Jan. 25, 2019, midnight

Sub-Task 3: Practice Toponym Resolution

Start: Jan. 1, 2018, midnight

Sub-Task 3: Evaluation Resolution

Start: Jan. 10, 2019, midnight

Sub-Task 3: Post-Evaluation Resolution

Start: Jan. 18, 2019, midnight

Competition Ends

