
This page still pertains to UD version 1.

Tokenization

The low-level tokenization of the Belarusian UD treebank generally follows the RNC (Russian National Corpus) standard.

Some special cases worth mentioning:

The Belarusian UD treebank does not contain multiword tokens.

Indefinite pronouns and adverbs

Verb forms, analytical grammatical forms, negation
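
The statement that the treebank contains no multiword tokens can be checked mechanically. The following is a minimal sketch, assuming the data is in CoNLL-U format, where a multiword token line carries a range ID such as `1-2` in its first column. The function name `multiword_token_ids` and the sample sentence are hypothetical and not part of the treebank or its tooling.

```python
import re

# A multiword token line in CoNLL-U has a range ID such as "1-2" in column 1.
RANGE_ID = re.compile(r"\d+-\d+")


def multiword_token_ids(conllu_text: str) -> list[str]:
    """Return the range IDs of all multiword token lines in a CoNLL-U string."""
    found = []
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if RANGE_ID.fullmatch(token_id):
            found.append(token_id)
    return found


# Hypothetical one-token sentence, made up for illustration only.
sample = "# text = Мова\n1\tМова\tмова\tNOUN\t_\t_\t0\troot\t_\t_\n"
print(multiword_token_ids(sample))
```

For a file that matches the description above, the returned list should be empty.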

Character set

-,;:!?.’’”“”()/&#%°+0123456789aAábdDeěfFghHiIjkKlLmn№oOpPrRsStTuvVwWXyаАбБвВгГдДеЕёЁжЖзЗіІйкКлЛмМнНоОпПрРсСтТуУўфФхХцЦчЧшШыьэЭюЮяЯ
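
The inventory above lends itself to a quick sanity check on new text. This is a minimal sketch, assuming the list is the complete set of characters that may occur inside tokens and that whitespace only separates tokens. The names `ALLOWED` and `unexpected_chars` are hypothetical.

```python
# Character inventory copied from the list above.
ALLOWED = set(
    "-,;:!?.’’”“”()/&#%°+0123456789"
    "aAábdDeěfFghHiIjkKlLmn№oOpPrRsStTuvVwWXy"
    "аАбБвВгГдДеЕёЁжЖзЗіІйкКлЛмМнНоОпПрРсСтТуУўфФхХцЦчЧшШыьэЭюЮяЯ"
)


def unexpected_chars(text: str) -> list[str]:
    """Return the sorted non-whitespace characters of text that are not in ALLOWED."""
    # Whitespace is skipped because it is assumed to separate tokens only.
    return sorted({ch for ch in text if not ch.isspace() and ch not in ALLOWED})


print(unexpected_chars("ўЯ 5"))  # all three characters are in the inventory
print(unexpected_chars("q ў"))   # Latin "q" is not in the inventory
```

A non-empty result points either to a character that the tokenizer has not seen before or to an encoding problem, such as a Latin letter typed in place of its Cyrillic lookalike.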