
This page pertains to UD version 2.

DET: determiner

Definition

Determiners (or pro-adjectives) are words that modify nouns or noun phrases and express the reference of the noun phrase in context. That is, a determiner may indicate whether the noun is referring to a definite or indefinite element of a class, to a closer or more distant element, to an element belonging to a specified person or thing, to a particular number or quantity, etc.

Note that the traditional grammar of Czech does not define determiners as a separate word class, and Czech does not have articles. Most determiners are traditionally called pronouns; a UD-conformant annotation of Czech must therefore distinguish between substantive pronouns (UD tag PRON) and attributive pronouns (UD tag DET).

Also note that the DET tag includes (pronominal) quantifiers (words like mnoho, málo “many, few”), which the traditional grammar classifies as a special subclass of numerals. However, cardinal numerals in the narrow sense (jeden, pět, sto) are not tagged DET, even though some authors would count them among quantifiers. Cardinal numerals have their own tag, NUM.

Conversion from the Prague Dependency Treebank

Since the PDT tagset (like all other Czech tagsets) does not distinguish substantive and attributive pronouns, morphological tags alone are not enough to find the correct universal POS tag. Morphological rules could help, as the inflection patterns of some pronouns resemble adjectival inflection (especially the ability to inflect for gender). Unlike in UD v1, we no longer use the dependency tree to distinguish between determiners and pronouns. Instead, we use a pre-defined list of lemmas that are tagged DET if their PDT tag indicates a pronoun. There is also a Slavic-wide discussion of the distinction between determiners and pronouns.
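
The lemma-list rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `upos_for_pronoun` and the contents of `DET_LEMMAS` are hypothetical (the lemmas are borrowed from the frequency lists on this page, not from the actual conversion list), and the only property of PDT positional tags relied on is that the first character encodes the part of speech, with `P` marking pronouns.

```python
from typing import Optional

# Hypothetical, incomplete lemma list for illustration only;
# the real conversion list is not reproduced on this page.
DET_LEMMAS = {"ten", "který", "tento", "jeho", "svůj", "můj",
              "všechen", "některý", "takový"}


def upos_for_pronoun(lemma: str, pdt_tag: str) -> Optional[str]:
    """Return "DET" or "PRON" for a PDT pronoun, None for other word classes."""
    # The first position of a PDT positional tag is the part of speech;
    # "P" marks pronouns. Other word classes are not handled in this sketch.
    if not pdt_tag.startswith("P"):
        return None
    # Pronouns whose lemma is on the list become determiners,
    # all remaining pronouns stay PRON.
    return "DET" if lemma in DET_LEMMAS else "PRON"


# The tag strings below are illustrative; only their first character matters here.
print(upos_for_pronoun("ten", "PDYS1----------"))
print(upos_for_pronoun("já", "PP-S1--1-------"))
```

Note that the decision depends only on the lemma and the tag, not on the dependency tree, which matches the change from UD v1 described above.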


Treebank Statistics (UD_Czech)

There are 68 DET lemmas (0%), 370 DET types (0%) and 48770 DET tokens (4%). Out of 17 observed tags, the rank of DET is: 9 in number of lemmas, 7 in number of types and 9 in number of tokens.

The 10 most frequent DET lemmas: ten, který, tento, jeho, svůj, můj, všechen, některý, takový, několik

The 10 most frequent DET types: to, které, který, jeho, která, jejich, své, tím, kteří, tom

The 10 most frequent ambiguous lemmas: svůj (DET 4312, ADJ 4), jenž (PRON 1935, DET 581), mnoho (DET 422, ADV 1), tolik (DET 104, ADV 52), málo (ADV 623, DET 99, NOUN 37), moc (NOUN 208, ADV 89, DET 31), Notre (ADJ 3, DET 1), c (NOUN 54, DET 1), ce (PROPN 5, DET 1), hodně (ADV 1654, DET 1)

The 10 most frequent ambiguous types: to (DET 5335, PART 23, ADP 3), jeho (DET 2182, PRON 16), své (DET 1235, ADJ 1), tom (DET 946, PROPN 5), všech (DET 541, PRON 1), svůj (DET 410, ADJ 1), ty (DET 234, PRON 11), toto (DET 241, NOUN 2), ta (DET 132, INTJ 2, NOUN 1), mnoho (DET 213, ADV 1)

Morphology

The form / lemma ratio of DET is 5.441176 (the average of all parts of speech is 2.162583).
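
The reported ratio is consistent with dividing the number of DET types (370) by the number of DET lemmas (68) given at the top of this section. That reading is an inference from the numbers, not a documented formula, and the helper name `form_lemma_ratio` is made up for this sketch.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Average number of distinct word forms (types) per lemma."""
    return n_types / n_lemmas


# Counts quoted above for UD_Czech: 370 DET types, 68 DET lemmas.
print(f"{form_lemma_ratio(370, 68):.6f}")  # prints 5.441176
```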

The 1st highest number of forms (27) was observed with the lemma “můj”: Mí, moje, moji, mojí, mou, má, mé, mého, mém, mému, mých, mýho, mým, mými, můj, n, naše, našeho, našem, našemu, naši, našich, našim, našimi, naší, naším, náš.

The 2nd highest number of forms (19) was observed with the lemma “jakýkoliv”: jakoukoli, jakoukoliv, jakákoli, jakákoliv, jakéhokoli, jakéhokoliv, jakékoli, jakékoliv, jakémkoli, jakémkoliv, jakémukoli, jakémukoliv, jakýchkoli, jakýchkoliv, jakýkoli, jakýkoliv, jakýmikoliv, jakýmkoli, jakýmkoliv.

The 3rd highest number of forms (17) was observed with the lemma “ten”: t, ta, ten, ti, to, toho, tom, tomu, tou, tu, ty, té, tím, těch, těm, těma, těmi.

DET occurs with 16 features: cs-feat/PronType (48770; 100% instances), cs-feat/Case (43864; 90% instances), cs-feat/Number (42651; 87% instances), cs-feat/Gender (38424; 79% instances), cs-feat/Poss (12656; 26% instances), cs-feat/Person (8325; 17% instances), cs-feat/Number[psor] (8323; 17% instances), cs-feat/Animacy (5524; 11% instances), cs-feat/Reflex (4314; 9% instances), cs-feat/Gender[psor] (3852; 8% instances), cs-feat/NumType (1626; 3% instances), cs-feat/Style (23; 0% instances), cs-feat/Abbr (19; 0% instances), cs-feat/Foreign (17; 0% instances), cs-feat/NameType (4; 0% instances), cs-feat/PrepCase (1; 0% instances)

DET occurs with 43 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, NameType=Oth, NameType=Pro, NumType=Card, NumType=Ord, Number=Dual, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PrepCase=Npr, PronType=Dem, PronType=Emp, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Coll, Style=Rare

DET occurs with 353 feature combinations. The most frequent feature combination is Case=Nom|Gender=Neut|Number=Sing|PronType=Dem (4033 tokens). Examples: to, toto, takové, totéž, tohle, ono, takovéto, toť
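
A feature combination such as the one above is a list of `Feature=Value` pairs joined by `|`, as in the FEATS column of a CoNLL-U file. A minimal sketch of splitting it into a dictionary follows; the function name `parse_feats` is hypothetical, and multi-valued features such as `PronType=Int,Rel` are left as a comma-joined string.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a FEATS string into a feature -> value mapping."""
    # CoNLL-U uses a single underscore for an empty feature set.
    if feats == "_":
        return {}
    # Pairs are separated by "|", feature and value by the first "=".
    return dict(pair.split("=", 1) for pair in feats.split("|"))


combo = parse_feats("Case=Nom|Gender=Neut|Number=Sing|PronType=Dem")
print(combo["PronType"])  # prints Dem
```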

Relations

DET nodes are attached to their parents using 25 different relations: cs-dep/det (24920; 51% instances), cs-dep/nsubj (10791; 22% instances), cs-dep/obj (5280; 11% instances), cs-dep/obl (2532; 5% instances), cs-dep/det:numgov (862; 2% instances), cs-dep/nsubj:pass (754; 2% instances), cs-dep/xcomp (706; 1% instances), cs-dep/det:nummod (500; 1% instances), cs-dep/conj (397; 1% instances), cs-dep/cc (365; 1% instances), cs-dep/iobj (324; 1% instances), cs-dep/root (314; 1% instances), cs-dep/amod (307; 1% instances), cs-dep/dep (203; 0% instances), cs-dep/discourse (141; 0% instances), cs-dep/appos (99; 0% instances), cs-dep/acl (75; 0% instances), cs-dep/advcl (57; 0% instances), cs-dep/orphan (52; 0% instances), cs-dep/ccomp (47; 0% instances), cs-dep/flat:foreign (13; 0% instances), cs-dep/parataxis (11; 0% instances), cs-dep/mark (8; 0% instances), cs-dep/csubj (7; 0% instances), cs-dep/advmod (5; 0% instances)

Parents of DET nodes belong to 15 different parts of speech: NOUN (28187; 58% instances), VERB (16782; 34% instances), ADJ (2136; 4% instances), DET (361; 1% instances), ROOT (314; 1% instances), ADV (292; 1% instances), PRON (246; 1% instances), PROPN (219; 0% instances), NUM (188; 0% instances), CCONJ (20; 0% instances), PART (18; 0% instances), SYM (3; 0% instances), INTJ (2; 0% instances), ADP (1; 0% instances), SCONJ (1; 0% instances)

41127 (84%) DET nodes are leaves.

4613 (9%) DET nodes have one child.

2034 (4%) DET nodes have two children.

996 (2%) DET nodes have three or more children.

The highest child degree of a DET node is 11.
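
Child-degree counts like those above could be recomputed from a treebank in CoNLL-U format along the following lines. This is a sketch rather than the script that generated this page: the function name `det_child_degrees` is hypothetical, and it assumes the standard ten tab-separated CoNLL-U columns (ID in column 1, UPOS in column 4, HEAD in column 7) with sentences separated by blank lines.

```python
from collections import Counter


def det_child_degrees(conllu_text: str) -> Counter:
    """Map number of children -> number of DET nodes with that many children."""
    degrees = Counter()
    for sentence in conllu_text.strip().split("\n\n"):
        upos = {}             # token ID -> UPOS tag
        children = Counter()  # head ID -> number of dependents
        for line in sentence.splitlines():
            if not line or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.split("\t")
            # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
            if not cols[0].isdigit():
                continue
            upos[cols[0]] = cols[3]
            children[cols[6]] += 1
        for token_id, tag in upos.items():
            if tag == "DET":
                degrees[children[token_id]] += 1
    return degrees


# Tiny hand-made example: "Ten pes spí ." with "Ten" as a leaf determiner.
rows = [
    ["1", "Ten", "ten", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "pes", "pes", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "spí", "spát", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
sample = "\n".join("\t".join(row) for row in rows)
print(det_child_degrees(sample)[0])  # prints 1 (one DET leaf)
```

A DET node with degree 0 is a leaf in the sense used above; the maximum key of the returned counter corresponds to the highest child degree.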

Children of DET nodes are attached using 31 different relations: cs-dep/case (3950; 32% instances), cs-dep/acl (2706; 22% instances), cs-dep/punct (1086; 9% instances), cs-dep/advmod:emph (705; 6% instances), cs-dep/cc (652; 5% instances), cs-dep/cop (466; 4% instances), cs-dep/amod (442; 4% instances), cs-dep/nsubj (400; 3% instances), cs-dep/nmod (308; 2% instances), cs-dep/conj (295; 2% instances), cs-dep/xcomp (271; 2% instances), cs-dep/advmod (221; 2% instances), cs-dep/dep (186; 1% instances), cs-dep/mark (141; 1% instances), cs-dep/orphan (133; 1% instances), cs-dep/appos (117; 1% instances), cs-dep/det (112; 1% instances), cs-dep/obl (89; 1% instances), cs-dep/advcl (74; 1% instances), cs-dep/ccomp (21; 0% instances), cs-dep/nummod (15; 0% instances), cs-dep/aux (10; 0% instances), cs-dep/csubj (8; 0% instances), cs-dep/fixed (8; 0% instances), cs-dep/parataxis (6; 0% instances), cs-dep/obj (5; 0% instances), cs-dep/discourse (4; 0% instances), cs-dep/flat:foreign (4; 0% instances), cs-dep/det:nummod (3; 0% instances), cs-dep/expl:pass (1; 0% instances), cs-dep/vocative (1; 0% instances)

Children of DET nodes belong to 16 different parts of speech: ADP (3929; 32% instances), VERB (2537; 20% instances), PUNCT (1086; 9% instances), NOUN (1042; 8% instances), ADJ (870; 7% instances), CCONJ (834; 7% instances), ADV (655; 5% instances), AUX (476; 4% instances), DET (361; 3% instances), PART (199; 2% instances), PRON (158; 1% instances), SCONJ (141; 1% instances), PROPN (94; 1% instances), NUM (56; 0% instances), INTJ (1; 0% instances), SYM (1; 0% instances)


Treebank Statistics (UD_Czech-CAC)

There are 48 DET lemmas (0%), 302 DET types (0%) and 19041 DET tokens (4%). Out of 16 observed tags, the rank of DET is: 8 in number of lemmas, 6 in number of types and 8 in number of tokens.

The 10 most frequent DET lemmas: ten, který, tento, jeho, svůj, můj, některý, takový, jenž, sám

The 10 most frequent DET types: to, které, jejich, jeho, který, která, tím, této, své, těchto

The 10 most frequent ambiguous lemmas: jenž (PRON 819, DET 342), mnoho (DET 239, ADV 6), tolik (DET 30, ADV 29), málo (ADV 272, DET 24, NOUN 1), sui (DET 3, NOUN 1, ADP 1)

The 10 most frequent ambiguous types: to (DET 1862, PART 11), jeho (DET 793, PRON 26), mnoho (DET 132, ADV 6), ty (DET 111, PRON 1), jehož (DET 120, PRON 18), všechny (PRON 241, DET 69), ti (DET 43, PRON 8), tu (ADV 180, DET 42), tolik (DET 28, ADV 25), málo (ADV 65, DET 21, NOUN 1)

Morphology

The form / lemma ratio of DET is 6.291667 (the average of all parts of speech is 2.180683).

The 1st highest number of forms (25) was observed with the lemma “můj”: moje, moji, mojí, mou, má, mé, mého, mém, mému, mých, mým, mýma, můj, naše, našeho, našem, našemu, naši, našich, našim, našima, našimi, naší, naším, náš.

The 2nd highest number of forms (16) was observed with the lemma “ten”: ta, ten, ti, to, toho, tom, tomu, tou, tu, ty, té, tím, těch, těm, těma, těmi.

The 3rd highest number of forms (15) was observed with the lemma “tento”: tato, tento, tito, tohoto, tomto, tomuto, toto, touto, tuto, tyto, této, tímto, těchto, těmito, těmto.

DET occurs with 14 features: cs-feat/PronType (19041; 100% instances), cs-feat/Case (16827; 88% instances), cs-feat/Number (16374; 86% instances), cs-feat/Gender (14680; 77% instances), cs-feat/Poss (5273; 28% instances), cs-feat/Number[psor] (3865; 20% instances), cs-feat/Person (3865; 20% instances), cs-feat/Animacy (2015; 11% instances), cs-feat/Gender[psor] (1551; 8% instances), cs-feat/Reflex (1389; 7% instances), cs-feat/NumType (642; 3% instances), cs-feat/Foreign (5; 0% instances), cs-feat/PrepCase (1; 0% instances), cs-feat/Style (1; 0% instances)

DET occurs with 38 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Foreign=Yes, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, NumType=Card, NumType=Ord, Number=Dual, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=1, Person=2, Person=3, Poss=Yes, PrepCase=Npr, PronType=Dem, PronType=Emp, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Coll

DET occurs with 272 feature combinations. The most frequent feature combination is Case=Nom|Gender=Neut|Number=Sing|PronType=Dem (1575 tokens). Examples: to, toto, takové, totéž, ono, tohle, takovéto, tohleto

Relations

DET nodes are attached to their parents using 28 different relations: cs-dep/det (10528; 55% instances), cs-dep/nsubj (3585; 19% instances), cs-dep/obj (1498; 8% instances), cs-dep/obl (1105; 6% instances), cs-dep/det:numgov (360; 2% instances), cs-dep/cc (289; 2% instances), cs-dep/xcomp (282; 1% instances), cs-dep/root (257; 1% instances), cs-dep/nsubj:pass (248; 1% instances), cs-dep/conj (195; 1% instances), cs-dep/det:nummod (188; 1% instances), cs-dep/amod (97; 1% instances), cs-dep/dep (92; 0% instances), cs-dep/iobj (92; 0% instances), cs-dep/discourse (54; 0% instances), cs-dep/appos (53; 0% instances), cs-dep/acl (24; 0% instances), cs-dep/advcl (23; 0% instances), cs-dep/orphan (19; 0% instances), cs-dep/parataxis (14; 0% instances), cs-dep/advmod (12; 0% instances), cs-dep/ccomp (11; 0% instances), cs-dep/mark (5; 0% instances), cs-dep/csubj (3; 0% instances), cs-dep/flat:foreign (3; 0% instances), cs-dep/csubj:pass (2; 0% instances), cs-dep/advmod:emph (1; 0% instances), cs-dep/aux (1; 0% instances)

Parents of DET nodes belong to 14 different parts of speech: NOUN (11890; 62% instances), VERB (5494; 29% instances), ADJ (906; 5% instances), ROOT (257; 1% instances), ADV (138; 1% instances), DET (103; 1% instances), PRON (79; 0% instances), NUM (59; 0% instances), PROPN (50; 0% instances), SYM (49; 0% instances), CCONJ (7; 0% instances), PART (5; 0% instances), SCONJ (3; 0% instances), ADP (1; 0% instances)

15927 (84%) DET nodes are leaves.

1900 (10%) DET nodes have one child.

681 (4%) DET nodes have two children.

533 (3%) DET nodes have three or more children.

The highest child degree of a DET node is 8.

Children of DET nodes are attached using 24 different relations: cs-dep/case (1428; 27% instances), cs-dep/acl (863; 16% instances), cs-dep/punct (519; 10% instances), cs-dep/cc (421; 8% instances), cs-dep/cop (354; 7% instances), cs-dep/advmod:emph (322; 6% instances), cs-dep/nsubj (315; 6% instances), cs-dep/conj (173; 3% instances), cs-dep/amod (149; 3% instances), cs-dep/nmod (145; 3% instances), cs-dep/advmod (132; 2% instances), cs-dep/xcomp (114; 2% instances), cs-dep/obl (74; 1% instances), cs-dep/orphan (63; 1% instances), cs-dep/dep (61; 1% instances), cs-dep/mark (52; 1% instances), cs-dep/advcl (50; 1% instances), cs-dep/appos (26; 0% instances), cs-dep/aux (7; 0% instances), cs-dep/parataxis (7; 0% instances), cs-dep/det (6; 0% instances), cs-dep/nummod (6; 0% instances), cs-dep/csubj (3; 0% instances), cs-dep/obj (3; 0% instances)

Children of DET nodes belong to 15 different parts of speech: ADP (1418; 27% instances), VERB (827; 16% instances), NOUN (606; 11% instances), PUNCT (519; 10% instances), CCONJ (481; 9% instances), AUX (361; 7% instances), ADV (331; 6% instances), ADJ (308; 6% instances), PRON (125; 2% instances), DET (103; 2% instances), PART (100; 2% instances), SCONJ (60; 1% instances), NUM (19; 0% instances), PROPN (19; 0% instances), SYM (16; 0% instances)


Treebank Statistics (UD_Czech-CLTT)

There are 12 DET lemmas (1%), 48 DET types (1%) and 405 DET tokens (2%). Out of 15 observed tags, the rank of DET is: 11 in number of lemmas, 8 in number of types and 11 in number of tokens.

The 10 most frequent DET lemmas: tento, jeho, svůj, jenž, takový, některý, takovýto, jakýkoliv, žádný, jaký

The 10 most frequent DET types: jejich, této, jeho, těchto, tyto, tohoto, tento, tato, její, tomto

The 10 most frequent ambiguous lemmas: tento (DET 213, PRON 2), jenž (PRON 54, DET 15), některý (DET 4, PRON 1), žádný (PRON 2, DET 2), který (PRON 321, DET 1)

The 10 most frequent ambiguous types: jejich (DET 67, X 6), této (DET 55, X 1), tuto (DET 7, ADV 4), jehož (DET 4, PRON 1), tímto (DET 3, PRON 1), kterým (PRON 18, DET 1)

Morphology

The form / lemma ratio of DET is 4.000000 (the average of all parts of speech is 1.685169).

The 1st highest number of forms (13) was observed with the lemma “tento”: tato, tento, tohoto, tomto, tomuto, toto, touto, tuto, tyto, této, tímto, těchto, těmito.

The 2nd highest number of forms (7) was observed with the lemma “jeho”: jeho, jejich, její, jejích, jejího, jejím, jejími.

The 3rd highest number of forms (7) was observed with the lemma “takový”: taková, takové, takovému, takový, takových, takovým, takovými.

DET occurs with 10 features: cs-feat/PronType (405; 100% instances), cs-feat/Number (283; 70% instances), cs-feat/Case (275; 68% instances), cs-feat/Gender (233; 58% instances), cs-feat/Poss (162; 40% instances), cs-feat/Number[psor] (144; 36% instances), cs-feat/Person (144; 36% instances), cs-feat/Gender[psor] (68; 17% instances), cs-feat/Reflex (18; 4% instances), cs-feat/Animacy (16; 4% instances)

DET occurs with 26 feature-value pairs: Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Masc, Gender=Masc,Neut, Gender=Neut, Gender[psor]=Fem, Gender[psor]=Masc,Neut, Number=Plur, Number=Sing, Number[psor]=Plur, Number[psor]=Sing, Person=3, Poss=Yes, PronType=Dem, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, Reflex=Yes

DET occurs with 50 feature combinations. The most frequent feature combination is Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs (67 tokens). Examples: jejich

Relations

DET nodes are attached to their parents using 2 different relations: cs-dep/det (404; 100% instances), cs-dep/acl (1; 0% instances)

Parents of DET nodes belong to 2 different parts of speech: NOUN (404; 100% instances), ADJ (1; 0% instances)

404 (100%) DET nodes are leaves.

0 (0%) DET nodes have one child.

0 (0%) DET nodes have two children.

1 (0%) DET node has three or more children.

The highest child degree of a DET node is 3.

Children of DET nodes are attached using 3 different relations: cs-dep/cop (1; 33% instances), cs-dep/nsubj (1; 33% instances), cs-dep/punct (1; 33% instances)

Children of DET nodes belong to 3 different parts of speech: AUX (1; 33% instances), NOUN (1; 33% instances), PUNCT (1; 33% instances)

