
This page still pertains to UD version 1.

PROPN: proper noun

Definition

A proper noun is a noun that is the name of a specific individual, place, or object. Russian proper nouns are always written starting with an uppercase letter.

Single-word named entities should be tagged PROPN even if they originate from a common noun (Грязь, a village; literally “mud”) or an adjective (Белая, a river). Even if they were originally adjectives and inflect according to adjectival paradigms, they behave syntactically as nouns. For instance, Белая (a river in Bashkortostan) is originally the feminine form of the adjective белый “white”, but as a geographical name it is a noun: it denotes a concrete location (rather than a property of somebody or something), and its feminine gender is fixed (whereas adjectives have forms in all three genders).

Note that adjectives derived from geographical names (русский, английский “Russian, English”) are written in lowercase and are not tagged PROPN.

Personal names are typically treated as a sequence of proper nouns (one or more given names and one or more surnames). If the name contains prepositions, conjunctions or articles (as some foreign names do), these are tagged ADP, CCONJ and DET, respectively.

Russian (and other Slavic) multi-word named entities have internal syntactic structure, which is preserved in the annotation. The head word is always a noun, and other nouns may be involved as well. These nouns are tagged either PROPN or NOUN, and possible ambiguities must be resolved individually. Modifying adjectives are never tagged PROPN: even if an adjective is the first word of a multi-word name, and thus starts with an uppercase letter, it is still tagged ADJ. Similarly, function words in named entities retain their normal tags. These rules are less strict for foreign named entities, whose original part of speech is not transparent to a Russian speaker.
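The capitalization rule and the ADJ rule above can be turned into a rough consistency check. The sketch below is a hypothetical helper: the name `check_propn` and its messages are made up for illustration and are not part of any UD validator. It assumes plain CoNLL-U input and flags PROPN tokens that lack an initial capital or are attached as `amod`. Because foreign names are handled less strictly, a flag only marks a token for manual review.

```python
def check_propn(conllu: str) -> list[str]:
    """Hypothetical check: flag PROPN tokens that break the rules above.

    Assumes CoNLL-U input: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD,
    DEPREL, DEPS, MISC separated by tabs.
    """
    warnings = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # sentence breaks and comment lines
        cols = line.split("\t")
        tok_id, form, upos, deprel = cols[0], cols[1], cols[3], cols[7]
        if not tok_id.isdigit() or upos != "PROPN":
            continue  # skip ranges like 1-2, empty nodes like 1.1, other tags
        if not form[:1].isupper():
            warnings.append(f"{tok_id} {form}: PROPN without initial capital")
        if deprel == "amod":
            warnings.append(f"{tok_id} {form}: PROPN attached as amod")
    return warnings


# Invented, deliberately mis-tagged input: "Белое" should be ADJ,
# and "москва" is missing its capital letter.
sample = "\n".join([
    "1\tБелое\tбелый\tPROPN\t_\t_\t2\tamod\t_\t_",
    "2\tморе\tморе\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "1\tмосква\tмосква\tPROPN\t_\t_\t0\troot\t_\t_",
])
for warning in check_propn(sample):
    print(warning)
# 1 Белое: PROPN attached as amod
# 1 москва: PROPN without initial capital
```

The lowercase particle де, which UD_Russian often tags PROPN (see the statistics below), would also be flagged by the capitalization test. This is in line with the remark that foreign names follow the rules less strictly.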



Treebank Statistics (UD_Russian)

There are 4402 PROPN lemmas (25%), 4881 PROPN types (17%) and 6294 PROPN tokens (7%). Out of 16 observed tags, the rank of PROPN is: 2 in number of lemmas, 3 in number of types and 6 in number of tokens.
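Figures of this kind can be recomputed from the treebank files. The sketch below assumes plain CoNLL-U input and one plausible reading of the percentages: each tag's count of distinct lemmas, distinct forms (types) or tokens, divided by the sum of that count over all tags. The generator of this page may define the denominators differently, and the function names are illustrative.

```python
from collections import defaultdict


def tag_stats(conllu: str) -> dict:
    """Per UPOS tag: number of distinct lemmas, distinct forms and tokens."""
    lemmas, forms = defaultdict(set), defaultdict(set)
    tokens = defaultdict(int)
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        form, lemma, upos = cols[1], cols[2], cols[3]
        lemmas[upos].add(lemma)
        forms[upos].add(form)
        tokens[upos] += 1
    return {u: {"lemmas": len(lemmas[u]), "types": len(forms[u]), "tokens": tokens[u]}
            for u in tokens}


def share_and_rank(stats: dict, tag: str, measure: str) -> tuple[int, int]:
    """Percentage share of `tag` for one measure, and its rank among all tags."""
    values = {u: s[measure] for u, s in stats.items()}
    ranked = sorted(values, key=values.get, reverse=True)
    share = round(100 * values[tag] / sum(values.values()))
    return share, ranked.index(tag) + 1


sample = "\n".join([
    "1\tМосква\tмосква\tPROPN\t_\t_\t0\troot\t_\t_",
    "2\tи\tи\tCCONJ\t_\t_\t3\tcc\t_\t_",
    "3\tРоссия\tроссия\tPROPN\t_\t_\t1\tconj\t_\t_",
])
stats = tag_stats(sample)
print(stats["PROPN"])                            # {'lemmas': 2, 'types': 2, 'tokens': 2}
print(share_and_rank(stats, "PROPN", "tokens"))  # (67, 1)
```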

The 10 most frequent PROPN lemmas: РОССИЯ, США, СССР, ФРАНЦИЯ, УКРАИНА, МОСКВА, ГЕРМАНИЯ, АЛЕКСАНДР, ВЛАДИМИР, ИСПАНИЯ

The 10 most frequent PROPN types: России, США, СССР, Украины, Франции, Германии, де, Европы, Испании, РФ

The 10 most frequent ambiguous lemmas: ДЕ (PROPN 19, ADV 9, PART 2), ПЕТЕРБУРГ (PROPN 11, ADV 1), ТОМ (PROPN 8, NOUN 7), ДОН (PROPN 7, NOUN 2), АВГУСТ (NOUN 37, PROPN 6), МАРИЯ (PROPN 6, ADV 1), СУ (PROPN 5, ADV 1), ЛИ (PART 6, PROPN 4), ЦК (NOUN 4, PROPN 4), XX (ADJ 8, PROPN 3)
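In both treebanks, the lists of ambiguous lemmas and types are ordered by how often the item is tagged PROPN. For example, АВГУСТ, with 37 NOUN but only 6 PROPN occurrences, ranks fifth. The tag counts inside each pair of parentheses run from most to least frequent. The sketch below reproduces that format from CoNLL-U input under these assumptions. The function name and the tiny invented sample are illustrative only.

```python
from collections import Counter, defaultdict


def ambiguous_lemmas(conllu: str, tag: str = "PROPN", top: int = 10) -> list[str]:
    """Lemmas seen with `tag` and at least one other UPOS, formatted as above."""
    tags_by_lemma = defaultdict(Counter)  # lemma -> Counter of UPOS tags
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():
            tags_by_lemma[cols[2]][cols[3]] += 1
    ambiguous = {lemma: c for lemma, c in tags_by_lemma.items() if tag in c and len(c) > 1}
    # Rank by how often the lemma carries `tag`, the order seen in the lists above.
    ranked = sorted(ambiguous, key=lambda lemma: ambiguous[lemma][tag], reverse=True)
    lines = []
    for lemma in ranked[:top]:
        counts = ", ".join(f"{upos} {n}" for upos, n in ambiguous[lemma].most_common())
        lines.append(f"{lemma} ({counts})")
    return lines


sample = "\n".join([
    "1\tТом\tтом\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tчитает\tчитать\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tтом\tтом\tNOUN\t_\t_\t2\tobj\t_\t_",
    "",
    "1\tТом\tтом\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tпишет\tписать\tVERB\t_\t_\t0\troot\t_\t_",
])
print(ambiguous_lemmas(sample))  # ['том (PROPN 2, NOUN 1)']
```

The same function applied to the FORM column instead of the LEMMA column (index 1 instead of 2) would yield the list of ambiguous types.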

The 10 most frequent ambiguous types: де (PROPN 17, ADV 8, PART 2), Су (PROPN 5, ADV 1), ЦК (PROPN 4, NOUN 4), АН (PROPN 3, NOUN 1), Али (PROPN 3, ADV 1), ВС (PROPN 3, NOUN 1), Жан (PROPN 3, ADV 2), Мария (PROPN 3, ADV 1), Сити (PROPN 3, ADV 3, NOUN 1), ди (PROPN 2, ADP 1)

Morphology

The form / lemma ratio of PROPN is 1.108814 (the average of all parts of speech is 1.576680).
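The form / lemma ratio reported here coincides with the number of PROPN types divided by the number of PROPN lemmas from the counts above. The same relation holds for the UD_Russian-SynTagRus figures given later on this page:

```latex
\begin{aligned}
\text{UD\_Russian:} &\quad \frac{\text{PROPN types}}{\text{PROPN lemmas}} = \frac{4881}{4402} \approx 1.108814 \\
\text{UD\_Russian-SynTagRus:} &\quad \frac{\text{PROPN types}}{\text{PROPN lemmas}} = \frac{11321}{8178} \approx 1.384324
\end{aligned}
```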

The highest number of forms (5) was observed with the lemma “ВЛАДИМИР”: Влади́мир, Владимир, Владимира, Владимиром, Владимиру.

The same number of forms (5) was observed with the lemma “КОНСТАНТИН”: Константин, Константина, Константине, Константином, Константину.

The same number of forms (5) was also observed with the lemma “МОСКВА”: Москва, Москве, Москвой, Москву, Москвы.
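The forms in these lists are compared as exact strings, so a stress-marked variant such as Влади́мир counts separately from Владимир. Sorting the forms by Unicode code point reproduces the order in which they are listed. The sketch below ranks lemmas under those observations, assuming CoNLL-U input. Breaking ties alphabetically by lemma is an assumption, and the function name is illustrative.

```python
from collections import defaultdict


def lemmas_with_most_forms(conllu: str, tag: str = "PROPN", top: int = 3) -> list[tuple]:
    """(lemma, number of distinct forms, sorted forms) for the richest lemmas."""
    forms = defaultdict(set)  # lemma -> distinct word forms tagged `tag`
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit() and cols[3] == tag:
            forms[cols[2]].add(cols[1])  # exact string: case and accents matter
    # Most forms first; ties broken alphabetically by lemma (an assumption).
    ranked = sorted(forms.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(lemma, len(fs), sorted(fs)) for lemma, fs in ranked[:top]]


rows = [("Москва", "москва"), ("Москвы", "москва"), ("Москве", "москва"),
        ("Владимир", "владимир"), ("Владимира", "владимир")]
sample = "\n\n".join(f"1\t{form}\t{lemma}\tPROPN\t_\t_\t0\troot\t_\t_" for form, lemma in rows)
for lemma, n, fs in lemmas_with_most_forms(sample):
    print(lemma, n, ", ".join(fs))
# москва 3 Москва, Москве, Москвы
# владимир 2 Владимир, Владимира
```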

PROPN occurs with 4 features: ru-feat/Animacy (6294; 100% instances), ru-feat/Case (6294; 100% instances), ru-feat/Gender (6294; 100% instances), ru-feat/Number (6294; 100% instances)

PROPN occurs with 13 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing

PROPN occurs with 50 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing (1446 tokens). Examples: Владимир, де, Александр, Карл, Джон, Михаил, Иван, Сергей, Алексей, Андрей
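Features, feature-value pairs and combinations can all be read off the FEATS column (the sixth CoNLL-U column). The sketch below collects them for PROPN tokens. Ordering the example forms by frequency is an assumption about how the example lists were chosen, and the sample data is invented.

```python
from collections import Counter, defaultdict


def feature_profile(conllu: str, tag: str = "PROPN"):
    """Feature-value pairs, FEATS combinations and example forms for `tag`."""
    pairs, combos = set(), Counter()
    examples = defaultdict(Counter)  # FEATS string -> Counter of forms
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit() or cols[3] != tag:
            continue
        form, feats = cols[1], cols[5]
        combos[feats] += 1
        examples[feats][form] += 1
        if feats != "_":  # "_" means no features
            pairs.update(feats.split("|"))
    return sorted(pairs), combos, examples


masc = "Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing"
fem = "Animacy=Inan|Case=Nom|Gender=Fem|Number=Sing"
rows = [("Владимир", "владимир", masc), ("Москва", "москва", fem),
        ("Александр", "александр", masc), ("Владимир", "владимир", masc)]
sample = "\n\n".join(f"1\t{form}\t{lemma}\tPROPN\t_\t{feats}\t0\troot\t_\t_"
                     for form, lemma, feats in rows)

pairs, combos, examples = feature_profile(sample)
top, count = combos.most_common(1)[0]
print(len(pairs), "feature-value pairs")                  # 6 feature-value pairs
print(top, count)                                         # Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing 3
print([form for form, _ in examples[top].most_common()])  # ['Владимир', 'Александр']
```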

Relations

PROPN nodes are attached to their parents using 26 different relations: ru-dep/nmod (1672; 27% instances), ru-dep/appos (1214; 19% instances), ru-dep/flat (869; 14% instances), ru-dep/nsubj (828; 13% instances), ru-dep/conj (582; 9% instances), ru-dep/obl (481; 8% instances), ru-dep/goeswith (145; 2% instances), ru-dep/list (121; 2% instances), ru-dep/obj (105; 2% instances), ru-dep/iobj (98; 2% instances), ru-dep/nsubj:pass (58; 1% instances), ru-dep/root (47; 1% instances), ru-dep/parataxis (17; 0% instances), ru-dep/advmod (14; 0% instances), ru-dep/amod (13; 0% instances), ru-dep/fixed (8; 0% instances), ru-dep/orphan (5; 0% instances), ru-dep/case (3; 0% instances), ru-dep/dep (3; 0% instances), ru-dep/acl:relcl (2; 0% instances), ru-dep/ccomp (2; 0% instances), ru-dep/obl:agent (2; 0% instances), ru-dep/vocative (2; 0% instances), ru-dep/advcl (1; 0% instances), ru-dep/nummod (1; 0% instances), ru-dep/punct (1; 0% instances)

Parents of PROPN nodes belong to 14 different parts of speech: NOUN (2850; 45% instances), PROPN (1713; 27% instances), VERB (1344; 21% instances), ADV (207; 3% instances), ADJ (56; 1% instances), ROOT (47; 1% instances), ADP (28; 0% instances), DET (27; 0% instances), NUM (12; 0% instances), PUNCT (4; 0% instances), PRON (2; 0% instances), SYM (2; 0% instances), CCONJ (1; 0% instances), PART (1; 0% instances)
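Relation and parent counts come from the DEPREL and HEAD columns. In the sketch below, a HEAD value of 0 is reported as ROOT. This matches the figures above, where 47 PROPN nodes are attached with root and 47 PROPN nodes have ROOT as parent. Rounding percentages to the nearest integer is an assumption that agrees with entries like nmod (1672 of 6294 ≈ 26.6%, shown as 27%). Function names and the sample sentence are illustrative.

```python
from collections import Counter


def sentences(conllu: str) -> list[list[list[str]]]:
    """Split CoNLL-U text into sentences of 10-column token rows."""
    result, current = [], []
    for line in conllu.splitlines():
        if not line.strip():
            if current:
                result.append(current)
                current = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword ranges and empty nodes
                current.append(cols)
    if current:
        result.append(current)
    return result


def attachment_profile(conllu: str, tag: str = "PROPN"):
    """Count DEPREL values of `tag` nodes and the UPOS of their parents."""
    relations, parents = Counter(), Counter()
    for sent in sentences(conllu):
        upos_by_id = {"0": "ROOT"}  # HEAD 0 is the artificial root
        upos_by_id.update({cols[0]: cols[3] for cols in sent})
        for cols in sent:
            if cols[3] == tag:
                relations[cols[7]] += 1
                parents[upos_by_id[cols[6]]] += 1
    return relations, parents


def as_percentages(counts: Counter) -> str:
    """Format counts like the lists above: LABEL (n; p% instances)."""
    total = sum(counts.values())
    return ", ".join(f"{k} ({n}; {round(100 * n / total)}% instances)"
                     for k, n in counts.most_common())


sample = "\n".join([
    "1\tПутин\tпутин\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tприехал\tприехать\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tв\tв\tADP\t_\t_\t4\tcase\t_\t_",
    "4\tМоскву\tмосква\tPROPN\t_\t_\t2\tobl\t_\t_",
])
relations, parents = attachment_profile(sample)
print(as_percentages(relations))  # nsubj (1; 50% instances), obl (1; 50% instances)
print(as_percentages(parents))    # VERB (2; 100% instances)
```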

2954 (47%) PROPN nodes are leaves.

1940 (31%) PROPN nodes have one child.

804 (13%) PROPN nodes have two children.

596 (9%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 19.
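The four degree classes partition the PROPN tokens: 2954 + 1940 + 804 + 596 = 6294, the token count given at the top of this section. The sketch below shows the bucketing, assuming CoNLL-U input. It counts as children all tokens whose HEAD points at the node, punctuation included, since punct appears in the child-relation list below. The hand-made sample uses flat:name for the surname purely for illustration.

```python
from collections import Counter


def sentences(conllu: str) -> list[list[list[str]]]:
    """Split CoNLL-U text into sentences of 10-column token rows."""
    result, current = [], []
    for line in conllu.splitlines():
        if not line.strip():
            if current:
                result.append(current)
                current = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword ranges and empty nodes
                current.append(cols)
    if current:
        result.append(current)
    return result


def degree_profile(conllu: str, tag: str = "PROPN"):
    """Bucket `tag` nodes by number of children (0, 1, 2, 3+); also max degree."""
    buckets, max_degree = Counter(), 0
    for sent in sentences(conllu):
        n_children = Counter(cols[6] for cols in sent)  # HEAD id -> child count
        for cols in sent:
            if cols[3] == tag:
                degree = n_children[cols[0]]
                buckets[min(degree, 3)] += 1  # 3 stands for "three or more"
                max_degree = max(max_degree, degree)
    return buckets, max_degree


sample = "\n".join([
    "1\tВладимир\tвладимир\tPROPN\t_\t_\t3\tnsubj\t_\t_",
    "2\tПутин\tпутин\tPROPN\t_\t_\t1\tflat:name\t_\t_",
    "3\tприехал\tприехать\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tв\tв\tADP\t_\t_\t5\tcase\t_\t_",
    "5\tМоскву\tмосква\tPROPN\t_\t_\t3\tobl\t_\t_",
])
buckets, max_degree = degree_profile(sample)
print(dict(sorted(buckets.items())), max_degree)  # {0: 1, 1: 2} 1
```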

Children of PROPN nodes are attached using 27 different relations: ru-dep/punct (1711; 29% instances), ru-dep/case (891; 15% instances), ru-dep/flat (864; 15% instances), ru-dep/conj (626; 11% instances), ru-dep/amod (322; 5% instances), ru-dep/goeswith (318; 5% instances), ru-dep/nmod (289; 5% instances), ru-dep/cc (247; 4% instances), ru-dep/appos (236; 4% instances), ru-dep/list (162; 3% instances), ru-dep/acl:relcl (56; 1% instances), ru-dep/acl (45; 1% instances), ru-dep/nsubj (38; 1% instances), ru-dep/nummod (33; 1% instances), ru-dep/det (30; 1% instances), ru-dep/parataxis (27; 0% instances), ru-dep/advmod (17; 0% instances), ru-dep/discourse (13; 0% instances), ru-dep/orphan (8; 0% instances), ru-dep/advcl (4; 0% instances), ru-dep/cc:preconj (4; 0% instances), ru-dep/cop (4; 0% instances), ru-dep/iobj (2; 0% instances), ru-dep/nummod:gov (2; 0% instances), ru-dep/ccomp (1; 0% instances), ru-dep/fixed (1; 0% instances), ru-dep/obl (1; 0% instances)

Children of PROPN nodes belong to 15 different parts of speech: PUNCT (1860; 31% instances), PROPN (1713; 29% instances), ADP (915; 15% instances), NOUN (400; 7% instances), ADJ (377; 6% instances), CCONJ (252; 4% instances), ADV (187; 3% instances), VERB (118; 2% instances), NUM (57; 1% instances), DET (36; 1% instances), PART (13; 0% instances), SYM (13; 0% instances), PRON (6; 0% instances), AUX (4; 0% instances), X (1; 0% instances)


Treebank Statistics (UD_Russian-SynTagRus)

There are 8178 PROPN lemmas (19%), 11321 PROPN types (10%) and 36250 PROPN tokens (4%). Out of 18 observed tags, the rank of PROPN is: 3 in number of lemmas, 4 in number of types and 8 in number of tokens.

The 10 most frequent PROPN lemmas: россия, москва, сша, путин, ссср, европа, владимир, земля, сергей, медведев

The 10 most frequent PROPN types: России, США, СССР, Россия, Путин, Москве, В., Владимир, РФ, Путина

The 10 most frequent ambiguous lemmas: россия (PROPN 1630, NOUN 52), москва (PROPN 477, NOUN 15), сша (PROPN 440, NOUN 6), путин (PROPN 425, NOUN 26), ссср (PROPN 272, NOUN 3), европа (PROPN 253, NOUN 6), владимир (PROPN 245, NOUN 34), земля (NOUN 294, PROPN 219), сергей (PROPN 211, NOUN 37), медведев (PROPN 204, NOUN 8)

The 10 most frequent ambiguous types: России (PROPN 1232, NOUN 4), США (PROPN 440, NOUN 6), СССР (PROPN 272, NOUN 3), Россия (PROPN 212, NOUN 44), Путин (PROPN 197, NOUN 22), Москве (PROPN 172, NOUN 1), В. (PROPN 165, NOUN 13), Владимир (PROPN 154, NOUN 32), Путина (PROPN 153, NOUN 1), Сергей (PROPN 141, NOUN 36)

Morphology

The form / lemma ratio of PROPN is 1.384324 (the average of all parts of speech is 2.644632).

The highest number of forms (10) was observed with the lemma “совет”: СОВЕТ, СОВЕТА, Совет, Совета, Советам, Совете, Советов, Советом, Совету, Советы.

The second-highest number of forms (9) was observed with the lemma “путин”: ПУТИН, ПУТИНА, ПУТИНЫМ, Путин, Путина, Путине, Путину, Путины, Путиным.

The third-highest number of forms (8) was observed with the lemma “аль-каида”: Аль-Каеда, Аль-Каеды, Аль-Каида, Аль-Каиды, Аль-Кайеда, Аль-Кайеде, Аль-Кайеду, Аль-Кайеды.

PROPN occurs with 5 features: ru-feat/Animacy (34148; 94% instances), ru-feat/Number (34144; 94% instances), ru-feat/Case (34136; 94% instances), ru-feat/Gender (33696; 93% instances), ru-feat/Foreign (2024; 6% instances)
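Unlike UD_Russian, where every PROPN token carries all four features, SynTagRus shows partial coverage (Animacy on 34148 of 36250 PROPN tokens, about 94%) and an additional Foreign feature. Coverage figures of this kind can be computed by counting, for each feature name, how many PROPN tokens carry it. The sketch below assumes CoNLL-U input. The sample annotations are invented, including the token that carries only Foreign=Yes.

```python
from collections import Counter


def feature_coverage(conllu: str, tag: str = "PROPN") -> dict:
    """For each feature name: (tokens of `tag` carrying it, rounded percentage)."""
    total, with_feature = 0, Counter()
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit() or cols[3] != tag:
            continue
        total += 1
        if cols[5] != "_":  # "_" means no features at all
            for pair in cols[5].split("|"):
                with_feature[pair.split("=", 1)[0]] += 1
    return {name: (n, round(100 * n / total)) for name, n in with_feature.most_common()}


sample = "\n".join([
    "1\tПутин\tпутин\tPROPN\t_\tAnimacy=Anim|Case=Nom|Gender=Masc|Number=Sing\t0\troot\t_\t_",
    "",
    "1\tМосквы\tмосква\tPROPN\t_\tAnimacy=Inan|Case=Gen|Gender=Fem|Number=Sing\t0\troot\t_\t_",
    "",
    "1\tTimes\tTimes\tPROPN\t_\tForeign=Yes\t0\troot\t_\t_",
])
print(feature_coverage(sample))
# {'Animacy': (2, 67), 'Case': (2, 67), 'Gender': (2, 67), 'Number': (2, 67), 'Foreign': (1, 33)}
```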

PROPN occurs with 16 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Par, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing

PROPN occurs with 82 feature combinations. The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing (7811 tokens). Examples: Путин, Владимир, Сергей, Александр, Медведев, Галилей, Алексей, В., Дмитрий, Монахов

Relations

PROPN nodes are attached to their parents using 27 different relations: ru-dep/nmod (11103; 31% instances), ru-dep/appos (5117; 14% instances), ru-dep/flat:name (4750; 13% instances), ru-dep/nsubj (4404; 12% instances), ru-dep/obl (3841; 11% instances), ru-dep/conj (2481; 7% instances), ru-dep/flat:foreign (1636; 5% instances), ru-dep/obj (836; 2% instances), ru-dep/parataxis (713; 2% instances), ru-dep/root (369; 1% instances), ru-dep/obl:agent (243; 1% instances), ru-dep/orphan (163; 0% instances), ru-dep/advmod (158; 0% instances), ru-dep/nsubj:pass (93; 0% instances), ru-dep/flat (83; 0% instances), ru-dep/compound (71; 0% instances), ru-dep/iobj (68; 0% instances), ru-dep/advcl (61; 0% instances), ru-dep/dep (24; 0% instances), ru-dep/acl:relcl (10; 0% instances), ru-dep/fixed (7; 0% instances), ru-dep/acl (6; 0% instances), ru-dep/nummod:entity (4; 0% instances), ru-dep/vocative (3; 0% instances), ru-dep/amod (2; 0% instances), ru-dep/ccomp (2; 0% instances), ru-dep/xcomp (2; 0% instances)

Parents of PROPN nodes belong to 18 different parts of speech: NOUN (16140; 45% instances), VERB (9244; 26% instances), PROPN (9008; 25% instances), ADJ (627; 2% instances), ROOT (369; 1% instances), ADV (261; 1% instances), PRON (137; 0% instances), PART (93; 0% instances), X (87; 0% instances), NUM (82; 0% instances), DET (70; 0% instances), ADP (42; 0% instances), PUNCT (39; 0% instances), _ (31; 0% instances), CCONJ (12; 0% instances), SCONJ (5; 0% instances), AUX (2; 0% instances), INTJ (1; 0% instances)

8854 (24%) PROPN nodes are leaves.

16032 (44%) PROPN nodes have one child.

7477 (21%) PROPN nodes have two children.

3887 (11%) PROPN nodes have three or more children.

The highest child degree of a PROPN node is 19.

Children of PROPN nodes are attached using 34 different relations: ru-dep/punct (19115; 43% instances), ru-dep/case (6903; 15% instances), ru-dep/flat:name (4589; 10% instances), ru-dep/conj (2885; 6% instances), ru-dep/amod (2471; 6% instances), ru-dep/nmod (2447; 5% instances), ru-dep/cc (1669; 4% instances), ru-dep/appos (851; 2% instances), ru-dep/flat:foreign (822; 2% instances), ru-dep/advmod (787; 2% instances), ru-dep/parataxis (559; 1% instances), ru-dep/acl:relcl (374; 1% instances), ru-dep/nsubj (364; 1% instances), ru-dep/nummod (359; 1% instances), ru-dep/cop (194; 0% instances), ru-dep/flat (82; 0% instances), ru-dep/acl (28; 0% instances), ru-dep/advcl (28; 0% instances), ru-dep/mark (26; 0% instances), ru-dep/discourse (21; 0% instances), ru-dep/nummod:gov (21; 0% instances), ru-dep/obl (21; 0% instances), ru-dep/_ (16; 0% instances), ru-dep/fixed (14; 0% instances), ru-dep/orphan (13; 0% instances), ru-dep/nummod:entity (11; 0% instances), ru-dep/compound (10; 0% instances), ru-dep/iobj (5; 0% instances), ru-dep/obl:agent (3; 0% instances), ru-dep/aux (2; 0% instances), ru-dep/dep (1; 0% instances), ru-dep/obj (1; 0% instances), ru-dep/root (1; 0% instances), ru-dep/xcomp (1; 0% instances)

Children of PROPN nodes belong to 18 different parts of speech: PUNCT (19115; 43% instances), PROPN (9021; 20% instances), ADP (6937; 16% instances), NOUN (2935; 7% instances), ADJ (2027; 5% instances), CCONJ (1467; 3% instances), VERB (1002; 2% instances), PART (543; 1% instances), NUM (411; 1% instances), ADV (344; 1% instances), DET (261; 1% instances), SCONJ (233; 1% instances), PRON (168; 0% instances), X (133; 0% instances), AUX (73; 0% instances), _ (16; 0% instances), SYM (6; 0% instances), INTJ (2; 0% instances)


PROPN in other languages: [am] [ar] [bg] [bxr] [ca] [ckb] [cop] [cs] [cu] [da] [de] [el] [en] [es] [et] [eu] [fa] [fi] [fo] [fr] [ga] [gl] [got] [grc] [he] [hi] [hr] [hu] [id] [it] [ja] [kk] [kmr] [ko] [la] [lv] [mr] [nl] [no] [pl] [pt] [ro] [ru] [sa] [sk] [sla] [sl] [so] [sr] [sv] [swl] [ta] [tr] [ug] [uk] [u] [urj] [ur] [vi] [yue] [zh]