NumType
: numeral type
In the Slovenian UD Treebank, NumType is a lexical feature of numerals and some adjectives that denote counting by numbers.
Card
: cardinal number
Examples
- en, dva, tri “one, two, three”
- 1, 2, 3
- I, II, III
Ord
: ordinal number
Examples
- prvi, drugi, tretji “first, second, third”
- 1., 2., 3.
- I., II., III.
Sets
: number of sets of things
Numerals used to count sets of things or nouns that are pluralia tantum.
Examples
- enoj, dvoj, troj “one-fold, two-fold, three-fold”
Gen
: generic numeral, i.e. a numeral that is none of the above
Examples
- enojen, dvojen, trojen “single, double, triple”
Conversion from JOS
All numerals with Type=cardinal are converted to NumType=Card and all numerals with Type=ordinal are converted to NumType=Ord. Numerals with Type=pronominal are converted either to NumType=Card (lemmas en and eden) or to NumType=Ord (lemma drug). Numerals with Type=special are converted either to NumType=Sets (lemmas not ending in -en) or to NumType=Gen (lemmas ending in -en).
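The conversion rules above can be sketched as a small lookup function. This is a minimal illustration, not the actual conversion script: the function name `jos_to_numtype` and the decision to raise an error for combinations the rules do not mention are assumptions made here.

```python
def jos_to_numtype(jos_type, lemma):
    """Map a JOS numeral Type value (plus lemma) to a UD NumType value.

    Hypothetical helper: it only encodes the rules stated in this section.
    """
    if jos_type == "cardinal":
        return "Card"
    if jos_type == "ordinal":
        return "Ord"
    if jos_type == "pronominal":
        if lemma in {"en", "eden"}:
            return "Card"
        if lemma == "drug":
            return "Ord"
        # Assumption: other pronominal lemmas are not covered by the rules.
        raise ValueError(f"no rule for pronominal lemma {lemma!r}")
    if jos_type == "special":
        # Lemmas ending in -en are generic numerals, all others count sets.
        return "Gen" if lemma.endswith("en") else "Sets"
    raise ValueError(f"unknown JOS numeral type {jos_type!r}")


for lemma in ("dvoj", "dvojen"):
    print(lemma, jos_to_numtype("special", lemma))
# prints "dvoj Sets", then "dvojen Gen"
```

The lemma is only consulted for the pronominal and special types, which mirrors the way the rules are worded.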
Note that other types of quantifying words have not been explicitly marked in JOS, so assigning these and other NumType values to other words or part-of-speech categories, such as adjectives (enkraten, dvakraten, trikraten), adverbs (enkrat, dvakrat, trikrat; prvič, drugič, tretjič), determiners (veliko, malo, nekaj, koliko) and nouns (tretjina, polovica, četrtina), remains for future work.
Treebank Statistics (UD_Slovenian)
This feature is universal.
It occurs with 4 different values: Card, Mult, Ord, Sets.
2074 tokens (2%) have a non-empty value of NumType.
606 types (2%) occur at least once with a non-empty value of NumType.
493 lemmas (3%) occur at least once with a non-empty value of NumType.
The feature is used with 2 part-of-speech tags: sl-pos/NUM (1779; 1% instances), sl-pos/ADJ (295; 0% instances).
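Per-tag counts like the ones above could in principle be recomputed from a treebank file. The following is a minimal sketch that relies only on the standard 10-column, tab-separated CoNLL-U layout (UPOS in column 4, FEATS in column 6). The function name `numtype_counts` and the three-token sample are made up for illustration, and this is not the script that generated the statistics on this page.

```python
from collections import Counter


def numtype_counts(conllu_text):
    """Count (UPOS, NumType value) pairs over the word lines of a CoNLL-U string."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        for feat in feats.split("|"):
            name, _, value = feat.partition("=")
            if name == "NumType":
                counts[(upos, value)] += 1
    return counts


# Hypothetical sample, not taken from the treebank; unused columns are "_".
SAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ["1", "dva", "dva", "NUM", "_", "NumForm=Word|NumType=Card", "_", "_", "_", "_"],
        ["2", "prvi", "prvi", "ADJ", "_", "NumType=Ord", "_", "_", "_", "_"],
        ["3", "dvoje", "dvoj", "NUM", "_", "NumType=Sets", "_", "_", "_", "_"],
    ]
)

for (upos, value), n in sorted(numtype_counts(SAMPLE).items()):
    print(upos, value, n)
```

Summing the counts per UPOS tag gives the kind of per-tag totals reported above.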
NUM
1779 sl-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: Gender=EMPTY (1335; 75%), Case=EMPTY (1098; 62%), Number=EMPTY (1098; 62%), NumForm=Digit (1079; 61%).
NUM tokens may have the following values of NumType:
- Card (1533; 86% of non-empty NumType): eno, tri, dveh, dva, ena, tisoč, eden, štiri, dve, štirih
- Ord (242; 14% of non-empty NumType): 1., 18., 20., 9., 14., 17., 19., 6., 3., 15.
- Sets (4; 0% of non-empty NumType): dvoje, troje
NumType seems to be a lexical feature of NUM. 100% of lemmas (469) occur with only one value of NumType.
ADJ
295 sl-pos/ADJ tokens (2% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: VerbForm=EMPTY (295; 100%), Degree=EMPTY (295; 100%), Definite=EMPTY (295; 100%), Number=Sing (235; 80%).
ADJ tokens may have the following values of NumType:
- Mult (4; 1% of non-empty NumType): dvojnega, dvojnim, dvojno, trojnim
- Ord (291; 99% of non-empty NumType): prvi, prvo, prva, prve, prvem, prvih, prvega, tretji, tretje, prvim
- EMPTY (13234): drugi, druge, mogoče, sam, nove, novo, drugih, različnih, slovenski, veliko
NumType seems to be a lexical feature of ADJ. 100% of lemmas (24) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[conj]–> NUM (88; 100%),
NUM –[flat]–> NUM (26; 90%),
NUM –[nmod]–> NUM (2; 100%).
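Agreement figures of this kind could be approximated with a sketch like the one below. It assumes that a pair is counted only when both the parent and the child carry NumType, and that the percentage is the share of such pairs whose values match; the page does not state how its numbers were computed, so this is an assumption. The helper names and the sample rows are hypothetical.

```python
from collections import Counter


def feature_value(feats, name):
    """Return the value of `name` in a FEATS string such as 'A=x|B=y', or None."""
    for feat in feats.split("|"):
        key, _, value = feat.partition("=")
        if key == name:
            return value
    return None


def agreement_by_relation(rows, feature="NumType"):
    """Map (parent UPOS, deprel, child UPOS) to (agreeing pairs, all pairs).

    `rows` holds the 10-column CoNLL-U word lines of one sentence, already split.
    Only pairs where both nodes carry `feature` are counted (an assumption).
    """
    nodes = {row[0]: row for row in rows}
    agree, total = Counter(), Counter()
    for child in rows:
        parent = nodes.get(child[6])
        if parent is None:
            continue  # HEAD is 0 (root) or points outside the sentence
        child_value = feature_value(child[5], feature)
        parent_value = feature_value(parent[5], feature)
        if child_value is None or parent_value is None:
            continue
        key = (parent[3], child[7], child[3])
        total[key] += 1
        if child_value == parent_value:
            agree[key] += 1
    return {key: (agree[key], total[key]) for key in total}


# Hypothetical rows for illustration only, not a real treebank sentence.
ROWS = [
    ["1", "dva", "dva", "NUM", "_", "NumType=Card", "0", "root", "_", "_"],
    ["2", "tri", "tri", "NUM", "_", "NumType=Card", "1", "conj", "_", "_"],
    ["3", "1.", "1.", "NUM", "_", "NumType=Ord", "1", "conj", "_", "_"],
]
print(agreement_by_relation(ROWS))
```

In the sample, one of the two conj pairs agrees, so the function reports (1, 2) for the NUM, conj, NUM key.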
Treebank Statistics (UD_Slovenian-SST)
This feature is universal.
It occurs with 4 different values: Card, Mult, Ord, Sets.
412 tokens (2%) have a non-empty value of NumType.
104 types (2%) occur at least once with a non-empty value of NumType.
68 lemmas (2%) occur at least once with a non-empty value of NumType.
The feature is used with 2 part-of-speech tags: sl-pos/NUM (350; 2% instances), sl-pos/ADJ (62; 0% instances).
NUM
350 sl-pos/NUM tokens (100% of all NUM tokens) have a non-empty value of NumType.
The most frequent other feature values with which NUM and NumType co-occurred: NumForm=Word (350; 100%), Number=Plur (201; 57%), Case=Acc (177; 51%).
NUM tokens may have the following values of NumType:
- Card (349; 100% of non-empty NumType): eno, dva, en, ena, tri, dvajset, pet, tisoč, dve, enega
- Sets (1; 0% of non-empty NumType): dvoje
NumType seems to be a lexical feature of NUM. 100% of lemmas (47) occur with only one value of NumType.
ADJ
62 sl-pos/ADJ tokens (6% of all ADJ tokens) have a non-empty value of NumType.
The most frequent other feature values with which ADJ and NumType co-occurred: Degree=EMPTY (62; 100%), VerbForm=EMPTY (62; 100%), Definite=EMPTY (60; 97%), Number=Sing (59; 95%).
ADJ tokens may have the following values of NumType:
- Mult (2; 3% of non-empty NumType): dvojni, trojni
- Ord (60; 97% of non-empty NumType): prvi, prva, devetindvajseti, peta, prvega, prvo, tretjo, trideseti, pete, prvem
- EMPTY (1031): dobro, drugi, dober, drugo, drugega, glavnem, rdeča, lep, sami, stari
NumType seems to be a lexical feature of ADJ. 100% of lemmas (21) occur with only one value of NumType.
Relations with Agreement in NumType
The 10 most frequent relations where parent and child node agree in NumType:
NUM –[flat]–> NUM (33; 100%),
NUM –[conj]–> NUM (23; 100%),
NUM –[fixed]–> NUM (4; 100%),
ADJ –[conj]–> ADJ (4; 67%),
ADJ –[reparandum]–> ADJ (2; 100%),
NUM –[nummod]–> NUM (1; 100%),
NUM –[reparandum]–> NUM (1; 100%).