Finnish NER
We have trained an NER system based on FinBERT and a new NER annotation layer of the UD_Finnish-TDT treebank. In comparative evaluations, the system surpassed the previous state of the art.
Links and resources:
Note: the latest model available for download from http://dl.turkunlp.org/turku-ner-models/combined-ext-model-130220.tar.gz uses OntoNotes NE types:
Type | Description |
---|---|
PERSON | People, including fictional |
NORP | Nationalities or religious or political groups |
FAC | Buildings, airports, highways, bridges, etc. |
ORG | Companies, agencies, institutions, etc. |
GPE | Countries, cities, states |
LOC | Non-GPE locations, mountain ranges, bodies of water |
PRODUCT | Objects, vehicles, foods, etc. (Not services.) |
EVENT | Named hurricanes, battles, wars, sports events, etc. |
WORK_OF_ART | Titles of books, songs, etc. |
LAW | Named documents made into laws |
LANGUAGE | Any named language |
DATE | Absolute or relative dates or periods |
TIME | Times smaller than a day |
PERCENT | Percentage, including “%” |
MONEY | Monetary values, including unit |
QUANTITY | Measurements, as of weight or distance |
ORDINAL | “first”, “second” |
CARDINAL | Numerals that do not fall under another type |
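When post-processing tagger output, it can help to have the type inventory above available in code. The following is a minimal sketch, not part of the released system. The split into name-like and value-like types follows the usual OntoNotes convention rather than anything stated on this page. The helper `entity_type` is a hypothetical name, and it assumes labels may carry a BIO-style prefix such as `B-` or `I-`, which should be checked against the actual tagger output.

```python
# OntoNotes NE types from the table above, grouped by the usual OntoNotes
# convention (names vs. numeric/temporal values). The grouping is an
# assumption for illustration; the tagger itself does not expose it.
NAME_TYPES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC",
    "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE",
}
VALUE_TYPES = {
    "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
}
ALL_TYPES = NAME_TYPES | VALUE_TYPES


def entity_type(label):
    """Return the NE type for a label such as 'B-ORG' or 'ORG'.

    Returns None for the outside label 'O' and for any unknown type.
    The optional 'B-'/'I-' prefix is an assumption about the label format.
    """
    # rpartition keeps everything after the last hyphen; the type names
    # themselves use underscores, so they are never split.
    _, _, name = label.rpartition("-")
    return name if name in ALL_TYPES else None
```

For example, `entity_type("I-WORK_OF_ART")` yields `"WORK_OF_ART"`, while `entity_type("O")` yields `None`.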
To run a server version of the tagger locally (on a UNIX-like system):
git clone https://github.com/spyysalo/keras-bert-ner.git
cd keras-bert-ner/
git submodule init
git submodule update
wget http://dl.turkunlp.org/turku-ner-models/combined-ext-model-130220.tar.gz
tar xvzf combined-ext-model-130220.tar.gz
python3 serve.py --ner_model_dir combined-ext-model
and then try
curl "http://127.0.0.1:8080?text=Turun+yliopisto"
or
curl -G --data-urlencode "text=Turun yliopisto" http://127.0.0.1:8080
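The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server is running at the address used in the curl examples and accepts the `text` query parameter as shown there. The function names `build_query_url` and `tag_text` are hypothetical, and the response is returned as raw UTF-8 text because its exact format is not described here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address taken from the curl examples; adjust if the server runs elsewhere.
SERVER_URL = "http://127.0.0.1:8080"


def build_query_url(text, server_url=SERVER_URL):
    """Return the GET URL for tagging `text`.

    urlencode percent-encodes the text (spaces become '+'), which mirrors
    what `curl -G --data-urlencode` does on the command line.
    """
    return server_url + "/?" + urlencode({"text": text})


def tag_text(text, server_url=SERVER_URL, timeout=30):
    """Send `text` to the running tagger server and return the response body.

    The body is decoded as UTF-8 and returned unparsed; the response format
    is an assumption to verify against the actual server output.
    """
    with urlopen(build_query_url(text, server_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, with the server from the steps above already running:
#     print(tag_text("Turun yliopisto"))
```

Building the URL in a separate function keeps the encoding step easy to check without a running server.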
If you use this system or data, please cite:
Jouni Luoma, Miika Oinonen, Maria Pyykönen, Veronika Laippala, Sampo Pyysalo. 2020. A Broad-coverage Corpus for Finnish Named Entity Recognition. In Proceedings of The 12th Language Resources and Evaluation Conference (LREC’2020).