This page pertains to UD version 2.

PRON: pronoun

Definition

Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.

Pronouns under this definition function like nouns. Note that Czech grammar traditionally extends the term pronoun to words that substitute for adjectives. Such words are not tagged PRON under our universal scheme. They are tagged as determiners so that the same phenomenon is annotated the same way across languages.

For instance, tohle “this” is traditionally called a pronoun in Czech grammar, regardless of context (traditional Czech grammar has no notion of determiners). In UD v2, tohle is tagged DET.

Unlike in UD v1, we no longer use the dependency tree to distinguish between determiners and pronouns. Instead, we use a pre-defined list of lemmas that are tagged DET whenever their PDT tag indicates a pronoun. Words with a PDT pronoun tag whose lemma is not on the list are tagged PRON. See also here for a Slavic-wide discussion of the distinction between determiners and pronouns.
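The lemma-list rule can be sketched as a small lookup. This is a minimal illustration, not the actual conversion code: the function name `universal_pos` and the set `DET_LEMMAS` are hypothetical, and the set contains only tohle, the one lemma this page confirms as DET. The sketch assumes that a PDT positional tag marks pronouns with `P` in its first position; the example tags in the usage lines are illustrative.

```python
from typing import Optional

# Hypothetical stand-in for the pre-defined lemma list.
# Only "tohle" is confirmed as DET by the text of this page.
DET_LEMMAS = {"tohle"}


def universal_pos(lemma: str, pdt_tag: str) -> Optional[str]:
    """Return 'DET' or 'PRON' for words tagged as pronouns in PDT.

    Returns None for any other PDT part of speech, which this
    sketch does not handle.
    """
    # Assumption: position 1 of the PDT tag is 'P' for pronouns.
    if not pdt_tag.startswith("P"):
        return None
    return "DET" if lemma in DET_LEMMAS else "PRON"


# Illustrative tags; only the first character matters here.
print(universal_pos("tohle", "PDNS1----------"))  # DET
print(universal_pos("kdo", "PQ--1----------"))    # PRON
```

The decision depends only on the lemma and the original tag, so the same word gets the same universal tag regardless of its position in the dependency tree.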

Examples


Treebank Statistics (UD_Czech)

There are 54 PRON lemmas (0%), 197 PRON types (0%) and 39664 PRON tokens (3%). Out of 17 observed tags, the rank of PRON is: 10 in number of lemmas, 8 in number of types and 10 in number of tokens.

The 10 most frequent PRON lemmas: se, on, já, jenž, co, kdo, což, nic, něco, ty

The 10 most frequent PRON types: se, si, co, nás, je, nám, nich, kdo, což, mu

The 10 most frequent ambiguous lemmas: on (PRON 6433, ADP 9, PART 1), (PRON 2933, NOUN 1), jenž (PRON 1935, DET 581), co (PRON 1652, ADV 201, SCONJ 184, PART 17), což (PRON 661, INTJ 3, PART 1), I (NUM 90, PROPN 59, ADJ 16, PRON 14), all (ADV 1, PRON 1), sa (PRON 4, PROPN 2), ja (PRON 3, PART 1), von (ADP 21, PRON 3)

The 10 most frequent ambiguous types: se (PRON 18891, ADP 1691), si (PRON 3296, AUX 1), co (PRON 1058, ADV 197, SCONJ 181, PART 7), je (AUX 8849, VERB 1773, PRON 793), nám (PRON 635, NOUN 5), což (PRON 560, INTJ 2), jež (PRON 293, PROPN 1), níž (PRON 261, ADV 2), (PRON 157, NOUN 1), (PRON 231, VERB 5)

Morphology

The form / lemma ratio of PRON is 3.648148 (the average of all parts of speech is 2.162583).
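The reported ratio is consistent with dividing the number of PRON types by the number of PRON lemmas given above (197 / 54 ≈ 3.648148). A minimal sketch of that computation follows, assuming the ratio is simply distinct forms over distinct lemmas. The function name `form_lemma_ratio` is hypothetical, and the toy pairs are illustrative rather than real treebank counts.

```python
from typing import Iterable, Tuple


def form_lemma_ratio(pairs: Iterable[Tuple[str, str]]) -> float:
    """Number of distinct word forms divided by number of distinct lemmas."""
    forms, lemmas = set(), set()
    for form, lemma in pairs:
        forms.add(form)
        lemmas.add(lemma)
    return len(forms) / len(lemmas)


# Toy (form, lemma) pairs; the repeated ("ho", "on") is counted once.
toy = [("se", "se"), ("si", "se"), ("ho", "on"),
       ("mu", "on"), ("on", "on"), ("ho", "on")]
toy_ratio = form_lemma_ratio(toy)  # 5 distinct forms / 2 distinct lemmas

# Consistency check against the published UD_Czech counts.
published_ratio = 197 / 54
print(f"{published_ratio:.6f}")  # 3.648148
```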

The 1st highest number of forms (28) was observed with the lemma “on”: ho, je, jeho, jej, jemu, ji, jich, jim, jimi, jí, jím, mu, ni, nich, nim, nimi, ní, ním, ně, něho, něj, něm, němu, on, ona, oni, ono, ony.

The 2nd highest number of forms (22) was observed with the lemma “jenž”: jehož, jejž, jemuž, jenž, jež, jichž, jimiž, jimž, již, jímž, jíž, nichž, nimiž, nimž, niž, nímž, níž, něhož, nějž, němuž, němž, něž.

The 3rd highest number of forms (11) was observed with the lemma “samý”: samou, samá, samé, samého, samém, samému, samí, samý, samých, samým, samými.

PRON occurs with 12 features: cs-feat/PronType (39664; 100% instances), cs-feat/Case (39609; 100% instances), cs-feat/Variant (24036; 61% instances), cs-feat/Reflex (22793; 57% instances), cs-feat/Number (11968; 30% instances), cs-feat/Person (9786; 25% instances), cs-feat/Gender (7004; 18% instances), cs-feat/PrepCase (4366; 11% instances), cs-feat/Animacy (3239; 8% instances), cs-feat/Style (272; 1% instances), cs-feat/Foreign (56; 0% instances), cs-feat/NameType (12; 0% instances)

PRON occurs with 35 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NameType=Com, NameType=Oth, NameType=Pro, Number=Plur, Number=Sing, Person=1, Person=2, Person=3, PrepCase=Npr, PrepCase=Pre, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Arch, Style=Coll, Style=Vrnc, Variant=Short

PRON occurs with 212 feature combinations. The most frequent feature combination is Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short (18931 tokens). Examples: se
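A feature combination such as the one above is a `|`-separated list of `Feature=Value` pairs, as stored in the FEATS column of CoNLL-U files. The following sketch shows one way to split such a string into a dictionary. The helper name `parse_feats` is hypothetical.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a CoNLL-U FEATS string into a feature -> value mapping."""
    # In CoNLL-U, "_" marks an empty feature set.
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


combo = parse_feats("Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short")
print(combo["PronType"])  # Prs
print(combo["Reflex"])    # Yes
```

Multi-valued features such as Gender=Masc,Neut stay as a single string value in this sketch. Splitting them on the comma would be a separate step.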

Relations

PRON nodes are attached to their parents using 24 different relations: cs-dep/expl:pv (15234; 38% instances), cs-dep/obj (7377; 19% instances), cs-dep/obl (4900; 12% instances), cs-dep/expl:pass (4347; 11% instances), cs-dep/nsubj (3269; 8% instances), cs-dep/iobj (1994; 5% instances), cs-dep/nmod (1414; 4% instances), cs-dep/conj (245; 1% instances), cs-dep/root (178; 0% instances), cs-dep/nsubj:pass (167; 0% instances), cs-dep/discourse (141; 0% instances), cs-dep/dep (129; 0% instances), cs-dep/advcl (47; 0% instances), cs-dep/orphan (45; 0% instances), cs-dep/acl (39; 0% instances), cs-dep/xcomp (36; 0% instances), cs-dep/flat:foreign (34; 0% instances), cs-dep/appos (28; 0% instances), cs-dep/ccomp (28; 0% instances), cs-dep/parataxis (7; 0% instances), cs-dep/cc (2; 0% instances), cs-dep/csubj (1; 0% instances), cs-dep/csubj:pass (1; 0% instances), cs-dep/det (1; 0% instances)

Parents of PRON nodes belong to 14 different parts of speech: VERB (35203; 89% instances), ADJ (1902; 5% instances), NOUN (1594; 4% instances), ADV (265; 1% instances), ROOT (178; 0% instances), NUM (166; 0% instances), DET (158; 0% instances), PRON (131; 0% instances), PROPN (43; 0% instances), SYM (11; 0% instances), PART (7; 0% instances), CCONJ (3; 0% instances), INTJ (2; 0% instances), AUX (1; 0% instances)

32522 (82%) PRON nodes are leaves.

6185 (16%) PRON nodes have one child.

527 (1%) PRON nodes have two children.

430 (1%) PRON nodes have three or more children.

The highest child degree of a PRON node is 10.
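Leaf and child-degree counts of this kind can be derived from the HEAD column of a CoNLL-U file. The sketch below is one possible way to do it, not the script that produced these statistics. The names `pron_child_degrees` and `row` are hypothetical, and the two sample sentences are made up for illustration.

```python
from collections import Counter


def pron_child_degrees(conllu: str) -> Counter:
    """Map child degree -> number of PRON nodes with that many children."""
    degrees = Counter()
    for block in conllu.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep basic tokens only: skip multiword ranges (1-2) and empty nodes (1.1).
        rows = [r for r in rows if r[0].isdigit()]
        # Column 7 (index 6) is HEAD: count how often each ID is a head.
        n_children = Counter(r[6] for r in rows)
        for r in rows:
            if r[3] == "PRON":  # column 4 (index 3) is UPOS
                degrees[n_children[r[0]]] += 1
    return degrees


def row(i, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line with unused columns set to '_'."""
    return "\t".join([str(i), form, lemma, upos, "_", "_",
                      str(head), deprel, "_", "_"])


# Illustrative sample: "se" is a leaf, "ním" has one child (the preposition).
sample = "\n".join([
    row(1, "Bojí", "bát", "VERB", 0, "root"),
    row(2, "se", "se", "PRON", 1, "expl:pv"),
    row(3, ".", ".", "PUNCT", 1, "punct"),
    "",
    row(1, "Mluví", "mluvit", "VERB", 0, "root"),
    row(2, "s", "s", "ADP", 3, "case"),
    row(3, "ním", "on", "PRON", 1, "obl"),
    row(4, ".", ".", "PUNCT", 1, "punct"),
])

degrees = pron_child_degrees(sample)
print(degrees[0], degrees[1])  # 1 1
```

In this sketch `degrees[0]` is the number of leaves, and `max(degrees)` gives the highest child degree.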

Children of PRON nodes are attached using 29 different relations: cs-dep/case (5479; 61% instances), cs-dep/punct (535; 6% instances), cs-dep/amod (447; 5% instances), cs-dep/advmod:emph (340; 4% instances), cs-dep/conj (294; 3% instances), cs-dep/cc (238; 3% instances), cs-dep/xcomp (236; 3% instances), cs-dep/cop (183; 2% instances), cs-dep/nmod (171; 2% instances), cs-dep/nsubj (146; 2% instances), cs-dep/acl (120; 1% instances), cs-dep/orphan (112; 1% instances), cs-dep/appos (105; 1% instances), cs-dep/nummod:gov (90; 1% instances), cs-dep/mark (86; 1% instances), cs-dep/advmod (72; 1% instances), cs-dep/dep (66; 1% instances), cs-dep/det (55; 1% instances), cs-dep/flat:foreign (35; 0% instances), cs-dep/obl (32; 0% instances), cs-dep/advcl (26; 0% instances), cs-dep/det:numgov (17; 0% instances), cs-dep/discourse (14; 0% instances), cs-dep/csubj (12; 0% instances), cs-dep/nummod (11; 0% instances), cs-dep/aux (8; 0% instances), cs-dep/obj (2; 0% instances), cs-dep/parataxis (2; 0% instances), cs-dep/ccomp (1; 0% instances)

Children of PRON nodes belong to 15 different parts of speech: ADP (5466; 61% instances), NOUN (615; 7% instances), PUNCT (535; 6% instances), ADJ (500; 6% instances), CCONJ (370; 4% instances), ADV (309; 3% instances), DET (246; 3% instances), VERB (211; 2% instances), AUX (191; 2% instances), PRON (131; 1% instances), NUM (108; 1% instances), PROPN (87; 1% instances), SCONJ (85; 1% instances), PART (80; 1% instances), SYM (1; 0% instances)


Treebank Statistics (UD_Czech-CAC)

There are 35 PRON lemmas (0%), 165 PRON types (0%) and 15680 PRON tokens (3%). Out of 16 observed tags, the rank of PRON is: 11 in number of lemmas, 7 in number of types and 9 in number of tokens.

The 10 most frequent PRON lemmas: se, on, všechno, já, jenž, co, což, ty, nízko, něco

The 10 most frequent PRON types: se, si, co, všech, je, nás, všechny, nám, jež, nich

The 10 most frequent ambiguous lemmas: se (PRON 8851, ADP 1), (PRON 974, NOUN 1), jenž (PRON 819, DET 342), co (PRON 511, ADV 164, SCONJ 16, PART 2, ADJ 1), nízko (PRON 135, ADV 8), copak (PRON 7, PART 1)

The 10 most frequent ambiguous types: se (PRON 7555, ADP 589), si (PRON 975, AUX 1), co (PRON 372, ADV 158, SCONJ 15, PART 1, ADJ 1), všech (PRON 379, DET 1), je (AUX 4329, VERB 716, PRON 334), všechny (PRON 241, DET 69), mu (PRON 160, NOUN 1), níž (PRON 137, ADV 1), všechno (PRON 109, DET 5), všichni (PRON 68, DET 3)

Morphology

The form / lemma ratio of PRON is 4.714286 (the average of all parts of speech is 2.180683).

The 1st highest number of forms (28) was observed with the lemma “on”: ho, je, jeho, jej, jemu, ji, jich, jim, jimi, jí, jím, mu, ni, nich, nim, nimi, ní, ním, ně, něho, něj, něm, němu, on, ona, oni, ono, ony.

The 2nd highest number of forms (22) was observed with the lemma “jenž”: jehož, jejž, jemuž, jenž, jež, jichž, jimiž, jimž, již, jímž, jíž, nichž, nimiž, nimž, niž, nímž, níž, něhož, nějž, němuž, němž, něž.

The 3rd highest number of forms (14) was observed with the lemma “všechno”: vše, všech, všechen, všechna, všechno, všechnu, všechny, všeho, všem, všemi, všemu, všichni, vší, vším.

PRON occurs with 12 features: cs-feat/PronType (15680; 100% instances), cs-feat/Case (15557; 99% instances), cs-feat/Variant (9000; 57% instances), cs-feat/Reflex (8851; 56% instances), cs-feat/Number (5616; 36% instances), cs-feat/Person (3366; 21% instances), cs-feat/Gender (2793; 18% instances), cs-feat/PrepCase (1778; 11% instances), cs-feat/Animacy (841; 5% instances), cs-feat/Style (109; 1% instances), cs-feat/Foreign (7; 0% instances), cs-feat/NameType (1; 0% instances)

PRON occurs with 31 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NameType=Com, Number=Plur, Number=Sing, Person=1, Person=2, Person=3, PrepCase=Npr, PrepCase=Pre, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Arch, Variant=Short

PRON occurs with 180 feature combinations. The most frequent feature combination is Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short (7549 tokens). Examples: se

Relations

PRON nodes are attached to their parents using 24 different relations: cs-dep/expl:pv (5941; 38% instances), cs-dep/obj (2318; 15% instances), cs-dep/obl (2048; 13% instances), cs-dep/expl:pass (2042; 13% instances), cs-dep/nmod (1403; 9% instances), cs-dep/nsubj (1035; 7% instances), cs-dep/iobj (494; 3% instances), cs-dep/conj (85; 1% instances), cs-dep/nsubj:pass (71; 0% instances), cs-dep/xcomp (57; 0% instances), cs-dep/root (42; 0% instances), cs-dep/dep (37; 0% instances), cs-dep/discourse (28; 0% instances), cs-dep/advcl (17; 0% instances), cs-dep/acl (15; 0% instances), cs-dep/orphan (14; 0% instances), cs-dep/appos (10; 0% instances), cs-dep/advmod (7; 0% instances), cs-dep/ccomp (5; 0% instances), cs-dep/cc (4; 0% instances), cs-dep/flat:foreign (4; 0% instances), cs-dep/csubj (1; 0% instances), cs-dep/csubj:pass (1; 0% instances), cs-dep/vocative (1; 0% instances)

Parents of PRON nodes belong to 14 different parts of speech: VERB (12909; 82% instances), NOUN (1402; 9% instances), ADJ (977; 6% instances), DET (125; 1% instances), ADV (86; 1% instances), SYM (46; 0% instances), PRON (43; 0% instances), ROOT (42; 0% instances), NUM (38; 0% instances), PROPN (5; 0% instances), AUX (2; 0% instances), CCONJ (2; 0% instances), PART (2; 0% instances), ADP (1; 0% instances)

12974 (83%) PRON nodes are leaves.

2376 (15%) PRON nodes have one child.

193 (1%) PRON nodes have two children.

137 (1%) PRON nodes have three or more children.

The highest child degree of a PRON node is 7.

Children of PRON nodes are attached using 28 different relations: cs-dep/case (2176; 66% instances), cs-dep/punct (132; 4% instances), cs-dep/advmod:emph (119; 4% instances), cs-dep/amod (113; 3% instances), cs-dep/conj (103; 3% instances), cs-dep/cc (101; 3% instances), cs-dep/xcomp (91; 3% instances), cs-dep/acl (74; 2% instances), cs-dep/nmod (53; 2% instances), cs-dep/orphan (50; 2% instances), cs-dep/cop (43; 1% instances), cs-dep/nsubj (36; 1% instances), cs-dep/appos (33; 1% instances), cs-dep/mark (30; 1% instances), cs-dep/nummod:gov (30; 1% instances), cs-dep/advmod (26; 1% instances), cs-dep/det (18; 1% instances), cs-dep/obl (17; 1% instances), cs-dep/dep (12; 0% instances), cs-dep/det:numgov (9; 0% instances), cs-dep/advcl (6; 0% instances), cs-dep/discourse (5; 0% instances), cs-dep/csubj (2; 0% instances), cs-dep/flat:foreign (2; 0% instances), cs-dep/nummod (2; 0% instances), cs-dep/parataxis (2; 0% instances), cs-dep/obj (1; 0% instances), cs-dep/vocative (1; 0% instances)

Children of PRON nodes belong to 15 different parts of speech: ADP (2161; 66% instances), NOUN (226; 7% instances), ADJ (138; 4% instances), PUNCT (132; 4% instances), CCONJ (128; 4% instances), ADV (126; 4% instances), VERB (92; 3% instances), DET (79; 2% instances), AUX (43; 1% instances), PRON (43; 1% instances), NUM (33; 1% instances), PART (32; 1% instances), SCONJ (30; 1% instances), PROPN (19; 1% instances), SYM (5; 0% instances)


Treebank Statistics (UD_Czech-CLTT)

There are 12 PRON lemmas (1%), 57 PRON types (1%) and 812 PRON tokens (3%). Out of 15 observed tags, the rank of PRON is: 12 in number of lemmas, 7 in number of types and 8 in number of tokens.

The 10 most frequent PRON lemmas: který, se, jenž, ten, on, všechen, veškerý, sám, tento, žádný

The 10 most frequent PRON types: se, které, která, který, to, kterých, kterým, kterém, nichž, všech

The 10 most frequent ambiguous lemmas: který (PRON 321, DET 1), jenž (PRON 54, DET 15), tento (DET 213, PRON 2), žádný (PRON 2, DET 2), některý (DET 4, PRON 1)

The 10 most frequent ambiguous types: se (PRON 309, ADP 28), kterým (PRON 18, DET 1), je (AUX 133, PRON 8, VERB 7), jehož (DET 4, PRON 1), t (PRON 1, NOUN 1), tímto (DET 3, PRON 1)

Morphology

The form / lemma ratio of PRON is 4.750000 (the average of all parts of speech is 1.685169).

The 1st highest number of forms (13) was observed with the lemma “jenž”: jehož, jenž, jež, jimiž, jímž, nichž, nimž, niž, nímž, níž, nějž, němuž, němž.

The 2nd highest number of forms (11) was observed with the lemma “on”: je, jej, jemu, ji, jim, jimi, jí, nich, nimi, ní, ně.

The 3rd highest number of forms (10) was observed with the lemma “který”: kterou, která, které, kterého, kterém, kterému, který, kterých, kterým, kterými.

PRON occurs with 12 features: cs-feat/PronType (812; 100% instances), cs-feat/Case (811; 100% instances), cs-feat/Number (496; 61% instances), cs-feat/Gender (401; 49% instances), cs-feat/Reflex (314; 39% instances), cs-feat/Variant (309; 38% instances), cs-feat/PrepCase (67; 8% instances), cs-feat/Animacy (43; 5% instances), cs-feat/Person (39; 5% instances), cs-feat/Style (2; 0% instances), cs-feat/Abbr (1; 0% instances), cs-feat/NumType (1; 0% instances)

PRON occurs with 30 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NumType=Card, Number=Plur, Number=Sing, Person=3, PrepCase=Npr, PrepCase=Pre, PronType=Dem, PronType=Dem,Ind, PronType=Ind, PronType=Int,Rel, PronType=Neg, PronType=Prs, PronType=Rel, PronType=Tot, Reflex=Yes, Style=Arch, Variant=Short

PRON occurs with 76 feature combinations. The most frequent feature combination is Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short (309 tokens). Examples: se

Relations

PRON nodes are attached to their parents using 15 different relations: cs-dep/expl:pass (241; 30% instances), cs-dep/nsubj (195; 24% instances), cs-dep/obj (80; 10% instances), cs-dep/advmod (75; 9% instances), cs-dep/expl:pv (68; 8% instances), cs-dep/obl (56; 7% instances), cs-dep/nmod (34; 4% instances), cs-dep/nsubj:pass (25; 3% instances), cs-dep/cc (23; 3% instances), cs-dep/acl (4; 0% instances), cs-dep/conj (4; 0% instances), cs-dep/iobj (3; 0% instances), cs-dep/xcomp (2; 0% instances), cs-dep/amod (1; 0% instances), cs-dep/dep (1; 0% instances)

Parents of PRON nodes belong to 7 different parts of speech: VERB (648; 80% instances), NOUN (115; 14% instances), ADJ (35; 4% instances), X (7; 1% instances), ADV (3; 0% instances), PRON (3; 0% instances), NUM (1; 0% instances)

639 (79%) PRON nodes are leaves.

156 (19%) PRON nodes have one child.

10 (1%) PRON nodes have two children.

7 (1%) PRON nodes have three or more children.

The highest child degree of a PRON node is 4.

Children of PRON nodes are attached using 11 different relations: cs-dep/case (132; 66% instances), cs-dep/cc (27; 14% instances), cs-dep/xcomp (8; 4% instances), cs-dep/acl (7; 4% instances), cs-dep/punct (6; 3% instances), cs-dep/nmod (5; 3% instances), cs-dep/cop (4; 2% instances), cs-dep/nsubj (4; 2% instances), cs-dep/conj (3; 2% instances), cs-dep/orphan (2; 1% instances), cs-dep/advmod:emph (1; 1% instances)

Children of PRON nodes belong to 9 different parts of speech: ADP (132; 66% instances), CCONJ (26; 13% instances), NOUN (19; 10% instances), VERB (7; 4% instances), PUNCT (6; 3% instances), AUX (4; 2% instances), PRON (3; 2% instances), ADJ (1; 1% instances), ADV (1; 1% instances)


PRON in other languages: [am] [ar] [bg] [bxr] [ca] [ckb] [cop] [cs] [cu] [da] [de] [el] [en] [es] [et] [eu] [fa] [fi] [fo] [fr] [ga] [gl] [got] [grc] [he] [hi] [hr] [hu] [id] [it] [ja] [kk] [kmr] [ko] [la] [lv] [mr] [nl] [no] [pl] [pt] [ro] [ru] [sa] [sk] [sla] [sl] [so] [sr] [sv] [swl] [ta] [tr] [ug] [uk] [u] [urj] [ur] [vi] [yue] [zh]