Hi Davy, I have a question about abbreviation annotation. The corpus seems to be inconsistent in how abbreviations are annotated: sometimes they are annotated as LOC, sometimes not.
In file: PUB20975994.txt.
Why are 'Th', 'VN', 'HK', 'In', 'Gd', 'YN', 'Sh', 'Sd' and 'ST' not annotated? Some of these abbreviations can be found with the GeoNames search engine (e.g., Th, VN). According to your annotation guideline (https://drive.google.com/file/d/1NCtHmesaXwaPHNHhDQWY1ZUjYk93HX4J/view), they should be annotated as LOC.
In file: PMC3773574
Context: DK, duck; GS, goose; SbD, spot-billed duck; MD, mallard duck; BbM, black-billed
magpie; CHU, chukkar; GF, guinea fowl; PG, pigeon; PH, pheasant; QA, quail; FR, ferret;
PT, partridge; SW, swine; WD, wild duck; HK, Hong Kong.
Why is 'HK' annotated as LOC here?
Posted by: chengchen.xpj @ Jan. 10, 2019, 2:22 p.m.
Regarding the abbreviations in PUB20975994.txt, these would be false negatives (FNs), i.e. missed annotations. As stated in previous answers, abbreviations were annotated inconsistently in a subset of the training corpus, which was released in 2015 together with a paper. In PMC3773574, this text comes from a figure description (the figure itself is not shown in the text file), so I assume the annotator took it as a reference to a location and annotated both 'HK' and 'Hong Kong' as LOC.