
This page still pertains to UD version 1.

NUM: numeral

Definition

A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.

Note that cardinal numerals are covered by NUM whether they are used as determiners or not (as in Windows Seven) and whether they are expressed as words (four), digits (4) or Roman numerals (IV). Other words functioning as determiners (including quantifiers such as many and few) are tagged DET.

In the BulTreeBank tagset, the tag that maps to NUM is Mc#.

Examples

Note that the symbol “#”, used in the Universal POS section, is a placeholder for an arbitrary number of features. These features are suppressed in the respective tag because they are irrelevant when the BulTreeBank tagset is mapped to the Universal one.
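The placeholder convention can be illustrated with a minimal Python sketch. It assumes that the placeholder symbol stands for zero or more trailing feature characters, so the pattern for NUM reduces to a prefix match on "Mc". The function name and the sample tag strings are hypothetical and serve only as illustration; they are not taken from the treebank.

```python
def maps_to_num(btb_tag: str) -> bool:
    """Return True if a BulTreeBank-style tag matches the pattern Mc#.

    Assumption: the placeholder covers any (possibly empty) run of
    trailing feature characters, so a prefix test on "Mc" is enough.
    """
    return btb_tag.startswith("Mc")


# Illustrative tag strings only; not verified against the tagset.
for tag in ["Mcpi", "Mc", "Ncmsi"]:
    print(tag, "->", "NUM" if maps_to_num(tag) else "other")
```

A prefix test is the simplest reading of the placeholder; a stricter mapping would validate the trailing feature characters as well.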


Treebank Statistics (UD_Bulgarian)

There are 385 NUM lemmas (3%), 420 NUM types (2%) and 1883 NUM tokens (1%). Out of 16 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.
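Counts of this kind could be reproduced from a CoNLL-U file roughly as follows. This is a sketch, not the script that generated the page: the function name `upos_counts` and the toy sentence are hypothetical, and it assumes that types are distinct word forms compared case-sensitively.

```python
def upos_counts(conllu_text: str, upos: str = "NUM"):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if cols[3] == upos:  # column 4 of CoNLL-U is UPOS
            tokens += 1
            types.add(cols[1])   # FORM
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens


# Toy sentence for illustration only; not taken from the treebank.
sample = "\n".join([
    "# text = две книги и два молива",
    "1\tдве\tдва\tNUM\t_\t_\t2\tnummod\t_\t_",
    "2\tкниги\tкнига\tNOUN\t_\t_\t0\troot\t_\t_",
    "3\tи\tи\tCCONJ\t_\t_\t5\tcc\t_\t_",
    "4\tдва\tдва\tNUM\t_\t_\t5\tnummod\t_\t_",
    "5\tмолива\tмолив\tNOUN\t_\t_\t2\tconj\t_\t_",
])
print(upos_counts(sample))  # (1, 2, 2)
```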

The 10 most frequent NUM lemmas: два, един, три, двама, четири, 10, шест, двадесет, тридесет, 000

The 10 most frequent NUM types: две, един, 2, два, една, 1, 3, три, едно, 10

The 10 most frequent ambiguous lemmas: един (DET 220, NUM 204, PRON 6), 10 (NUM 39, ADJ 9), 15 (NUM 27, ADJ 11), 5 (NUM 23, ADJ 6), 12 (NUM 15, ADJ 4), 21 (NUM 15, ADJ 1), 11 (NUM 12, ADJ 1), 25 (NUM 12, ADJ 2), 16 (NUM 11, ADJ 3), 18 (NUM 11, ADJ 1)

The 10 most frequent ambiguous types: един (DET 77, NUM 55, PRON 1), 2 (NUM 57, ADJ 3, PROPN 1), една (DET 70, NUM 45), 1 (NUM 46, ADJ 27, PROPN 1), 3 (NUM 45, ADJ 5), едно (NUM 39, DET 38, PRON 2), 10 (NUM 39, ADJ 9), 20 (NUM 34, ADJ 4), 4 (NUM 28, ADJ 3), 15 (NUM 27, ADJ 11)

Morphology

The form / lemma ratio of NUM is 1.090909 (the average of all parts of speech is 1.709615).
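The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given in the treebank statistics. A short check, assuming that this is how the ratio is defined:

```python
num_types = 420   # NUM types reported in the treebank statistics
num_lemmas = 385  # NUM lemmas reported in the treebank statistics

ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # 1.090909
```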

The 1st highest number of forms (7) was observed with the lemma “един”: 1, Единият, един, една, едната, едно, едното.

The 2nd highest number of forms (5) was observed with the lemma “два”: 2, два, двата, две, двете.

The 3rd highest number of forms (4) was observed with the lemma “четирима”: 4-има, 4-ма, четирима, четиримата.

NUM occurs with 5 features: bg-feat/NumType (1883; 100% instances), bg-feat/Definite (1880; 100% instances), bg-feat/Number (1880; 100% instances), bg-feat/Gender (464; 25% instances), bg-feat/Animacy (8; 0% instances)

NUM occurs with 10 feature-value pairs: Animacy=Anim, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, Gender=Neut, NumType=Card, NumType=Ord, Number=Plur, Number=Sing

NUM occurs with 17 feature combinations. The most frequent feature combination is Definite=Ind|Number=Plur|NumType=Card (1331 tokens). Examples: 3, три, 10, 20, 000, 4, 15, 30, 6, двама

Relations

NUM nodes are attached to their parents using 18 different relations: bg-dep/nummod (1559; 83% instances), bg-dep/nmod (93; 5% instances), bg-dep/nsubj (55; 3% instances), bg-dep/flat (45; 2% instances), bg-dep/conj (30; 2% instances), bg-dep/obl (23; 1% instances), bg-dep/obj (20; 1% instances), bg-dep/iobj (18; 1% instances), bg-dep/nsubj:pass (13; 1% instances), bg-dep/root (13; 1% instances), bg-dep/appos (3; 0% instances), bg-dep/ccomp (3; 0% instances), bg-dep/csubj (2; 0% instances), bg-dep/parataxis (2; 0% instances), bg-dep/acl (1; 0% instances), bg-dep/discourse (1; 0% instances), bg-dep/fixed (1; 0% instances), bg-dep/xcomp (1; 0% instances)

Parents of NUM nodes belong to 8 different parts of speech: NOUN (1595; 85% instances), VERB (128; 7% instances), NUM (99; 5% instances), PROPN (39; 2% instances), ROOT (13; 1% instances), ADJ (3; 0% instances), ADV (3; 0% instances), PRON (3; 0% instances)

1518 (81%) NUM nodes are leaves.

259 (14%) NUM nodes have one child.

62 (3%) NUM nodes have two children.

44 (2%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
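A child-degree distribution like the one above can be derived from the head indices of a dependency tree. The following is a minimal sketch under stated assumptions: tokens are given as (id, head, upos) triples for a single sentence, and the function name and the toy tree are hypothetical.

```python
from collections import Counter


def child_degree_histogram(tokens, upos="NUM"):
    """Map child count -> number of nodes with the given UPOS tag.

    tokens: list of (id, head, upos) triples for one sentence.
    """
    # Each occurrence of an id in the head column is one child of that node.
    children = Counter(head for _, head, _ in tokens)
    # A Counter yields 0 for ids that never occur as a head (leaves).
    return Counter(children[tok_id] for tok_id, _, tag in tokens if tag == upos)


# Toy tree for illustration only: node 2 (NUM) has one child, node 4 (NUM) is a leaf.
tokens = [(1, 2, "ADV"), (2, 3, "NUM"), (3, 0, "NOUN"), (4, 3, "NUM")]
hist = child_degree_histogram(tokens)
print(sorted(hist.items()))  # [(0, 1), (1, 1)]
```

Summing such histograms over all sentences would give the number of leaves (degree 0), nodes with one child, and so on, and the largest key would be the highest child degree.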

Children of NUM nodes are attached using 17 different relations: bg-dep/case (132; 24% instances), bg-dep/nmod (110; 20% instances), bg-dep/advmod (90; 16% instances), bg-dep/punct (51; 9% instances), bg-dep/flat (46; 8% instances), bg-dep/conj (33; 6% instances), bg-dep/cc (24; 4% instances), bg-dep/cop (22; 4% instances), bg-dep/nsubj (21; 4% instances), bg-dep/det (7; 1% instances), bg-dep/amod (6; 1% instances), bg-dep/acl (5; 1% instances), bg-dep/mark (5; 1% instances), bg-dep/compound (2; 0% instances), bg-dep/obl (2; 0% instances), bg-dep/advcl (1; 0% instances), bg-dep/aux (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: ADP (167; 30% instances), NUM (99; 18% instances), NOUN (81; 15% instances), PUNCT (51; 9% instances), ADV (42; 8% instances), CCONJ (24; 4% instances), AUX (23; 4% instances), ADJ (21; 4% instances), PRON (17; 3% instances), PART (11; 2% instances), VERB (9; 2% instances), SCONJ (6; 1% instances), PROPN (4; 1% instances), INTJ (2; 0% instances), DET (1; 0% instances)


NUM in other languages: [am] [ar] [bg] [bxr] [ca] [ckb] [cop] [cs] [cu] [da] [de] [el] [en] [es] [et] [eu] [fa] [fi] [fo] [fr] [ga] [gl] [got] [grc] [he] [hi] [hr] [hu] [id] [it] [ja] [kk] [kmr] [ko] [la] [lv] [mr] [nl] [no] [pl] [pt] [ro] [ru] [sa] [sk] [sla] [sl] [so] [sr] [sv] [swl] [ta] [tr] [ug] [uk] [u] [urj] [ur] [vi] [yue] [zh]