DET: determiner
Definition
Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.
In Bulgarian, the definite article is part of the word, so it is not considered a determiner.
However, the following pronouns are mapped to determiners:
- demonstratives: Pda#, Pde#
- relatives: Pra#, Pre#, Prp#
- collectives: Pca#, Pce#
- interrogatives: Pia#, Pie#, Piy#, Pip#
- indefinites: Pfa#, Pfe#, Pfp#
- negatives: Pna#, Pne#, Pnp#
- possessives: Ps@l
Note that the attributive usages (#a#) and possessive attributive usages (#p#) go directly into the DET category, while entities (#e#) can be either determiners or pronouns. Possessive pronouns (Ps#) are mapped only in their long forms (#l#). The short forms are clitics and are treated differently.
Examples
- possessive determiners: мой / moy “my”, твой / tvoy “your”
- demonstrative determiners: тази / tazi “this”, as in Вчера видях тази кола / Vchera vidyah tazi kola “I saw this car yesterday.”
- interrogative determiners: какъв / kakav “which.MASC.SG”
- relative determiners: какъвто / kakavto “which.MASC.SG”
- indefinite determiners: някакъв / nyakakav “some.MASC.SG”
- collective determiners: всякакъв / vsyakakav “any.MASC.SG”
- negative determiners: никакъв / nikakav “no.MASC.SG”
Note that the symbol ‘#’, used in the Universal POS section, is a placeholder for an arbitrary number of features that are suppressed in the respective tag because they are irrelevant when the BulTreeBank tagset is mapped to the Universal one.
The symbol ‘@’ marks the suppression of exactly one feature in the tag.
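The placeholder convention can be sketched in code. The following is a minimal illustration, not the actual conversion script: it assumes that each feature occupies one character of a BulTreeBank tag, that ‘#’ can therefore be read as "any number of characters" and ‘@’ as "exactly one character", and that a pattern only needs to match the beginning of a tag. The function names and the sample tags in the usage note are hypothetical.

```python
import re

# Tag patterns copied from the list above; '#' and '@' are the placeholders.
DET_PATTERNS = {
    "demonstrative": ["Pda#", "Pde#"],
    "relative": ["Pra#", "Pre#", "Prp#"],
    "collective": ["Pca#", "Pce#"],
    "interrogative": ["Pia#", "Pie#", "Piy#", "Pip#"],
    "indefinite": ["Pfa#", "Pfe#", "Pfp#"],
    "negative": ["Pna#", "Pne#", "Pnp#"],
    "possessive": ["Ps@l"],
}


def pattern_to_regex(pattern):
    """Translate a tag pattern into a compiled regular expression.

    Assumption: one character per feature, so '#' becomes '.*'
    (any number of features) and '@' becomes '.' (exactly one feature).
    """
    parts = []
    for ch in pattern:
        if ch == "#":
            parts.append(".*")
        elif ch == "@":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


COMPILED = {
    kind: [pattern_to_regex(p) for p in patterns]
    for kind, patterns in DET_PATTERNS.items()
}


def det_candidate_kind(tag):
    """Return the pronoun class if the tag matches a DET pattern, else None.

    re.match anchors at the start of the tag only (prefix match, an assumption).
    Entity tags (#e#) are only candidates: they may still end up as PRON.
    """
    for kind, regexes in COMPILED.items():
        if any(regex.match(tag) for regex in regexes):
            return kind
    return None
```

With illustrative tag strings (shaped like BulTreeBank tags, but not checked against the tagset), `det_candidate_kind("Pde-os-m")` returns `"demonstrative"`, `det_candidate_kind("Pszl-s1m")` returns `"possessive"`, and `det_candidate_kind("Pszt--1")` returns `None` because the fourth character is not `l`.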
Treebank Statistics (UD_Bulgarian)
There are 23 DET lemmas (0%), 138 DET types (1%) and 2160 DET tokens (2%).
Out of 16 observed tags, the rank of DET is: 11 in number of lemmas, 7 in number of types and 11 in number of tokens.
The 10 most frequent DET lemmas: този, всеки, един, какъв, наш, мой, свой, такъв, някой, някакъв
The 10 most frequent DET types: тази, този, тези, това, всички, един, какво, една, всеки, едно
The 10 most frequent ambiguous lemmas: този (DET 708, PRON 479), всеки (DET 250, PRON 102), един (DET 220, NUM 204, PRON 6), наш (DET 168, PRON 152), мой (PRON 334, DET 147), свой (PRON 556, DET 113), някой (PRON 87, DET 84), ваш (DET 30, PRON 29), никой (PRON 88, DET 20), какъвто (DET 18, PRON 10)
The 10 most frequent ambiguous types: това (PRON 256, DET 120), всички (DET 109, PRON 26), един (DET 77, NUM 55, PRON 1), една (DET 70, NUM 45), всеки (DET 41, PRON 6), едно (NUM 39, DET 38, PRON 2), някои (DET 35, PRON 6), някой (PRON 16, DET 10), нищо (PRON 36, DET 8), каквото (PRON 9, DET 6)
Morphology
The form / lemma ratio of DET is 6.000000 (the average of all parts of speech is 1.709615).
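The reported ratio is consistent with dividing the number of DET types by the number of DET lemmas given in the statistics above. A short check, assuming that "forms" here means the same thing as "types" (the variable names are illustrative):

```python
# Counts taken from the Treebank Statistics section.
det_lemmas = 23
det_types = 138

# Assumption: form / lemma ratio = number of types / number of lemmas.
form_lemma_ratio = det_types / det_lemmas

print(f"{form_lemma_ratio:.6f}")  # 6.000000
```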
The 1st highest number of forms (27) was observed with the lemma “мой”: Моят, мое, моето, мои, моите, мой, моя, моята, негов, негова, неговата, негови, неговите, неговия, неговият, негово, неговото, неин, нейна, нейната, нейни, нейните, нейния, нейният, нейното, твое, твоите.
The 2nd highest number of forms (18) was observed with the lemma “наш”: наш, наша, нашата, наше, нашето, наши, нашите, нашия, нашият, техен, техни, техните, техния, техният, тяхна, тяхната, тяхно, тяхното.
The 3rd highest number of forms (15) was observed with the lemma “този”: онази, онези, онзи, ония, онова, оня, тeзи, тази, тая, тези, тия, това, този, тоя, туй.
DET occurs with 9 features: bg-feat/Number (2160; 100% instances), bg-feat/PronType (2160; 100% instances), bg-feat/Gender (1525; 71% instances), bg-feat/Case (797; 37% instances), bg-feat/Definite (714; 33% instances), bg-feat/Poss (465; 22% instances), bg-feat/Person (352; 16% instances), bg-feat/Reflex (113; 5% instances), bg-feat/Animacy (1; 0% instances)
DET occurs with 22 feature-value pairs: Animacy=Anim, Case=Acc, Case=Nom, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing, Person=1, Person=2, Person=3, Poss=Yes, PronType=Dem, PronType=Ind, PronType=Int, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes
DET occurs with 80 feature combinations.
The most frequent feature combination is Gender=Masc|Number=Sing|PronType=Dem (210 tokens).
Examples: този, такъв, тоя, оня, онзи
Relations
DET nodes are attached to their parents using 16 different relations: bg-dep/det (1791; 83% instances), bg-dep/obj (100; 5% instances), bg-dep/nsubj (89; 4% instances), bg-dep/root (61; 3% instances), bg-dep/iobj (36; 2% instances), bg-dep/nmod (24; 1% instances), bg-dep/obl (18; 1% instances), bg-dep/conj (13; 1% instances), bg-dep/nsubj:pass (12; 1% instances), bg-dep/ccomp (9; 0% instances), bg-dep/advcl (2; 0% instances), bg-dep/acl (1; 0% instances), bg-dep/csubj (1; 0% instances), bg-dep/discourse (1; 0% instances), bg-dep/vocative (1; 0% instances), bg-dep/xcomp (1; 0% instances)
Parents of DET nodes belong to 9 different parts of speech: NOUN (1815; 84% instances), VERB (253; 12% instances), ROOT (61; 3% instances), PROPN (14; 1% instances), ADJ (5; 0% instances), PRON (5; 0% instances), ADV (4; 0% instances), DET (2; 0% instances), NUM (1; 0% instances)
1839 (85%) DET nodes are leaves.
157 (7%) DET nodes have one child.
79 (4%) DET nodes have two children.
85 (4%) DET nodes have three or more children.
The highest child degree of a DET node is 7.
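As a sanity check, the four child-degree counts above should add up to the 2160 DET tokens, and the percentages should follow from rounding each share. A small sketch (the dictionary keys are illustrative labels, not treebank terminology):

```python
# Counts of DET nodes by number of children, taken from the lines above.
child_degree_counts = {
    "leaf": 1839,
    "one child": 157,
    "two children": 79,
    "three or more children": 85,
}

total_det_nodes = sum(child_degree_counts.values())

# Share of each group, rounded to whole percentages.
shares = {
    label: round(100 * count / total_det_nodes)
    for label, count in child_degree_counts.items()
}

print(total_det_nodes)  # 2160
print(shares)
```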
Children of DET nodes are attached using 21 different relations: bg-dep/acl (103; 16% instances), bg-dep/case (88; 14% instances), bg-dep/punct (86; 14% instances), bg-dep/nmod (78; 12% instances), bg-dep/cop (73; 12% instances), bg-dep/nsubj (65; 10% instances), bg-dep/advmod (55; 9% instances), bg-dep/fixed (17; 3% instances), bg-dep/discourse (13; 2% instances), bg-dep/obl (11; 2% instances), bg-dep/cc (10; 2% instances), bg-dep/conj (8; 1% instances), bg-dep/det (6; 1% instances), bg-dep/aux (5; 1% instances), bg-dep/appos (2; 0% instances), bg-dep/expl (2; 0% instances), bg-dep/iobj (2; 0% instances), bg-dep/mark (2; 0% instances), bg-dep/advcl (1; 0% instances), bg-dep/amod (1; 0% instances), bg-dep/csubj (1; 0% instances)
Children of DET nodes belong to 14 different parts of speech: NOUN (131; 21% instances), VERB (100; 16% instances), ADP (89; 14% instances), PUNCT (86; 14% instances), AUX (84; 13% instances), ADV (45; 7% instances), PRON (27; 4% instances), PART (23; 4% instances), ADJ (16; 3% instances), CCONJ (16; 3% instances), PROPN (7; 1% instances), DET (2; 0% instances), SCONJ (2; 0% instances), INTJ (1; 0% instances)