NUM: numeral
Definition
A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
Note that cardinal numerals are covered by NUM whether they are used as determiners or not (as in Windows Seven) and whether they are expressed as words (four), digits (4) or Roman numerals (IV). Other words functioning as determiners (including quantifiers such as many and few) are tagged DET.
In the BulTreeBank tagset, the tag that maps to NUM is Mc#.
Examples
- 0, 1, 2, 3, 4, 5, 2014, 1000000, 3.14159265359
- едно, две, три, седемдесет и седем / edno, dve, tri, sedemdeset i sedem “one, two, three, seventy-seven”
- I, II, III, IV, V, MMXIV
Note that the symbol “#”, used in the Universal POS section, is a placeholder for an arbitrary number of features. These features are suppressed in the respective tag because they are irrelevant when the BulTreeBank tagset is mapped to the Universal one.
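The placeholder convention can be read as a prefix match: any BulTreeBank tag that starts with Mc maps to NUM, whatever feature characters follow. The sketch below illustrates that reading only. The function name `map_to_upos`, the one-entry table and the sample tag suffixes are hypothetical; the only mapping taken from this page is Mc# to NUM.

```python
from typing import Optional

# Prefix-to-UPOS table. Only the Mc# -> NUM entry comes from this page;
# a real mapping table would contain many more entries.
PREFIX_TO_UPOS = {"Mc": "NUM"}


def map_to_upos(btb_tag: str) -> Optional[str]:
    """Return the universal tag for a BulTreeBank tag, or None if unmapped.

    The '#' placeholder is modelled as "any trailing characters", so only
    the prefix of the tag is inspected.
    """
    for prefix, upos in PREFIX_TO_UPOS.items():
        if btb_tag.startswith(prefix):
            return upos
    return None


# Made-up feature suffixes, used purely to exercise the prefix match.
print(map_to_upos("Mcabc"))
print(map_to_upos("Xabc"))
```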
Treebank Statistics (UD_Bulgarian)
There are 385 NUM lemmas (3%), 420 NUM types (2%) and 1883 NUM tokens (1%).
Out of 16 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.
The 10 most frequent NUM lemmas: два, един, три, двама, четири, 10, шест, двадесет, тридесет, 000
The 10 most frequent NUM types: две, един, 2, два, една, 1, 3, три, едно, 10
The 10 most frequent ambiguous lemmas: един (DET 220, NUM 204, PRON 6), 10 (NUM 39, ADJ 9), 15 (NUM 27, ADJ 11), 5 (NUM 23, ADJ 6), 12 (NUM 15, ADJ 4), 21 (NUM 15, ADJ 1), 11 (NUM 12, ADJ 1), 25 (NUM 12, ADJ 2), 16 (NUM 11, ADJ 3), 18 (NUM 11, ADJ 1)
The 10 most frequent ambiguous types: един (DET 77, NUM 55, PRON 1), 2 (NUM 57, ADJ 3, PROPN 1), една (DET 70, NUM 45), 1 (NUM 46, ADJ 27, PROPN 1), 3 (NUM 45, ADJ 5), едно (NUM 39, DET 38, PRON 2), 10 (NUM 39, ADJ 9), 20 (NUM 34, ADJ 4), 4 (NUM 28, ADJ 3), 15 (NUM 27, ADJ 11)
Morphology
The form / lemma ratio of NUM is 1.090909 (the average of all parts of speech is 1.709615).
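The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given in the treebank statistics. That this is how the figure was produced is an assumption; the short check below only shows that the numbers agree.

```python
# Counts taken from the Treebank Statistics section of this page.
num_types = 420   # distinct NUM word forms
num_lemmas = 385  # distinct NUM lemmas

# Assumed definition: forms per lemma = types / lemmas.
ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # prints 1.090909
```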
The highest number of forms (7) was observed with the lemma “един”: 1, Единият, един, една, едната, едно, едното.
The second-highest number of forms (5) was observed with the lemma “два”: 2, два, двата, две, двете.
The third-highest number of forms (4) was observed with the lemma “четирима”: 4-има, 4-ма, четирима, четиримата.
NUM occurs with 5 features: bg-feat/NumType (1883; 100% instances), bg-feat/Definite (1880; 100% instances), bg-feat/Number (1880; 100% instances), bg-feat/Gender (464; 25% instances), bg-feat/Animacy (8; 0% instances)
NUM occurs with 10 feature-value pairs: Animacy=Anim, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, Gender=Neut, NumType=Card, NumType=Ord, Number=Plur, Number=Sing
NUM occurs with 17 feature combinations.
The most frequent feature combination is Definite=Ind|Number=Plur|NumType=Card (1331 tokens).
Examples: 3, три, 10, 20, 000, 4, 15, 30, 6, двама
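A feature combination such as the one above is a list of Feature=Value pairs joined by “|”. As a minimal sketch, the hypothetical helper `parse_feats` below splits such a string into a dictionary; it also treats “_” as "no features", which is the CoNLL-U convention for an empty field.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a 'Feature=Value|Feature=Value' string into a dictionary."""
    if feats in ("", "_"):
        # An empty field or the CoNLL-U underscore means no features.
        return {}
    # Split on '|' first, then on the first '=' inside each pair.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


feats = parse_feats("Definite=Ind|Number=Plur|NumType=Card")
print(feats["NumType"])  # prints Card
```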
Relations
NUM nodes are attached to their parents using 18 different relations: bg-dep/nummod (1559; 83% instances), bg-dep/nmod (93; 5% instances), bg-dep/nsubj (55; 3% instances), bg-dep/flat (45; 2% instances), bg-dep/conj (30; 2% instances), bg-dep/obl (23; 1% instances), bg-dep/obj (20; 1% instances), bg-dep/iobj (18; 1% instances), bg-dep/nsubj:pass (13; 1% instances), bg-dep/root (13; 1% instances), bg-dep/appos (3; 0% instances), bg-dep/ccomp (3; 0% instances), bg-dep/csubj (2; 0% instances), bg-dep/parataxis (2; 0% instances), bg-dep/acl (1; 0% instances), bg-dep/discourse (1; 0% instances), bg-dep/fixed (1; 0% instances), bg-dep/xcomp (1; 0% instances)
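The percentages in this list appear to be each relation's count divided by the 1883 NUM tokens, rounded to a whole number. The sketch below checks that reading against the counts listed above; the rounding rule is an assumption, and the variable names are illustrative.

```python
# Relation counts for NUM nodes, copied from the list above.
relation_counts = {
    "nummod": 1559, "nmod": 93, "nsubj": 55, "flat": 45, "conj": 30,
    "obl": 23, "obj": 20, "iobj": 18, "nsubj:pass": 13, "root": 13,
    "appos": 3, "ccomp": 3, "csubj": 2, "parataxis": 2,
    "acl": 1, "discourse": 1, "fixed": 1, "xcomp": 1,
}

# Every NUM token has exactly one incoming relation, so the counts
# should add up to the total number of NUM tokens.
total = sum(relation_counts.values())
print(total)  # prints 1883

# Share of each relation as a rounded whole-number percentage.
shares = {rel: round(100 * n / total) for rel, n in relation_counts.items()}
print(shares["nummod"], shares["nmod"], shares["nsubj"])  # prints 83 5 3
```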
Parents of NUM nodes belong to 8 different parts of speech: NOUN (1595; 85% instances), VERB (128; 7% instances), NUM (99; 5% instances), PROPN (39; 2% instances), ROOT (13; 1% instances), ADJ (3; 0% instances), ADV (3; 0% instances), PRON (3; 0% instances)
1518 (81%) NUM nodes are leaves.
259 (14%) NUM nodes have one child.
62 (3%) NUM nodes have two children.
44 (2%) NUM nodes have three or more children.
The highest child degree of a NUM node is 7.
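The child degree of a node is the number of tokens whose head is that node; a leaf has degree 0. The sketch below shows one way to count this from head indices. The four-token tree is a hypothetical toy example, not data from the treebank.

```python
from collections import Counter

# Hypothetical toy tree: (token id, head id, UPOS). Head 0 is the artificial root.
tokens = [
    (1, 2, "NUM"),   # numeral attached to token 2, no children of its own
    (2, 0, "NOUN"),  # root of the sentence
    (3, 4, "ADP"),   # preposition attached to the numeral at token 4
    (4, 2, "NUM"),   # numeral with one child (token 3)
]

# Count how many tokens point to each head id.
child_counts = Counter(head for _, head, _ in tokens)

# Child degree of every NUM node; Counter returns 0 for ids with no children.
num_degrees = {tid: child_counts[tid] for tid, _, upos in tokens if upos == "NUM"}
leaves = [tid for tid, degree in num_degrees.items() if degree == 0]

print(num_degrees)  # prints {1: 0, 4: 1}
print(leaves)       # prints [1]
```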
Children of NUM nodes are attached using 17 different relations: bg-dep/case (132; 24% instances), bg-dep/nmod (110; 20% instances), bg-dep/advmod (90; 16% instances), bg-dep/punct (51; 9% instances), bg-dep/flat (46; 8% instances), bg-dep/conj (33; 6% instances), bg-dep/cc (24; 4% instances), bg-dep/cop (22; 4% instances), bg-dep/nsubj (21; 4% instances), bg-dep/det (7; 1% instances), bg-dep/amod (6; 1% instances), bg-dep/acl (5; 1% instances), bg-dep/mark (5; 1% instances), bg-dep/compound (2; 0% instances), bg-dep/obl (2; 0% instances), bg-dep/advcl (1; 0% instances), bg-dep/aux (1; 0% instances)
Children of NUM nodes belong to 15 different parts of speech: ADP (167; 30% instances), NUM (99; 18% instances), NOUN (81; 15% instances), PUNCT (51; 9% instances), ADV (42; 8% instances), CCONJ (24; 4% instances), AUX (23; 4% instances), ADJ (21; 4% instances), PRON (17; 3% instances), PART (11; 2% instances), VERB (9; 2% instances), SCONJ (6; 1% instances), PROPN (4; 1% instances), INTJ (2; 0% instances), DET (1; 0% instances)