PUNCT: punctuation
Definition
Punctuation marks are non-alphabetical characters and character groups used to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM.
Examples
- Period: .
- Comma: ,
- Parentheses: ()
Treebank Statistics (UD_Russian)
There are 16 PUNCT lemmas (0%), 15 PUNCT types (0%) and 16634 PUNCT tokens (19%).
Out of 16 observed tags, the rank of PUNCT is: 15 in number of lemmas, 15 in number of types and 2 in number of tokens.
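Counts of this kind can be reproduced from a treebank in CoNLL-U format. The following is a minimal sketch, not the script that generated this page: the `SAMPLE` fragment and the `tag_counts` helper are hypothetical, and it assumes that types are compared as raw word forms (FORM column) without case folding.

```python
# Hypothetical CoNLL-U fragment for illustration (not taken from UD_Russian).
SAMPLE = """\
1\tДа\tда\tPART\t_\t_\t0\troot\t_\t_
2\t,\t,\tPUNCT\t_\t_\t1\tpunct\t_\t_
3\tконечно\tконечно\tADV\t_\t_\t1\tparataxis\t_\t_
4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_

1\tНет\tнет\tPART\t_\t_\t0\troot\t_\t_
2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""


def tag_counts(conllu_text, upos):
    """Return (distinct lemmas, distinct types, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment line
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        if cols[3] == upos:          # column 4: UPOS
            lemmas.add(cols[2])      # column 3: LEMMA
            types.add(cols[1])       # column 2: FORM (assumed to define a "type")
            tokens += 1
    return len(lemmas), len(types), tokens


print(tag_counts(SAMPLE, "PUNCT"))
```

Running the same function over every tag and sorting the results would give the per-tag ranks.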
The 10 most frequent PUNCT lemmas: ,, ., –, ), (, ``, '', -, :, ;
The 10 most frequent PUNCT types: ,, ., –, ), (, ``, '', -, :, ;
The 10 most frequent ambiguous lemmas: – (PUNCT 1057, AUX 1), − (PUNCT 4, SYM 2)
The 10 most frequent ambiguous types: – (PUNCT 1057, AUX 1), − (PUNCT 4, SYM 2)
- –
- −
Morphology
The form / lemma ratio of PUNCT is 0.937500 (the average of all parts of speech is 1.576680).
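The reported ratio is consistent with dividing the number of PUNCT types by the number of PUNCT lemmas given in the statistics for this treebank (15 types, 16 lemmas). The sketch below assumes that definition; the helper name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(n_types, n_lemmas):
    """Ratio of distinct word forms to distinct lemmas (assumed definition)."""
    return n_types / n_lemmas


# UD_Russian figures from this page: 15 PUNCT types, 16 PUNCT lemmas.
print(f"{form_lemma_ratio(15, 16):.6f}")  # 0.937500
```

A ratio below 1 means there are fewer distinct forms than distinct lemmas, which can happen when one form is assigned more than one lemma.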
The 1st highest number of forms (1) was observed with the lemma “!”: !.
The 2nd highest number of forms (1) was observed with the lemma “''”: ''.
The 3rd highest number of forms (1) was observed with the lemma “’”: APOSTROPHE.
PUNCT does not occur with any features.
Relations
PUNCT nodes are attached to their parents using 10 different relations: ru-dep/punct (15644; 94% instances), ru-dep/goeswith (943; 6% instances), ru-dep/obl (26; 0% instances), ru-dep/conj (7; 0% instances), ru-dep/nmod (5; 0% instances), ru-dep/parataxis (3; 0% instances), ru-dep/cc (2; 0% instances), ru-dep/root (2; 0% instances), ru-dep/advmod (1; 0% instances), ru-dep/nummod (1; 0% instances)
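A distribution like the one above can be tallied from the DEPREL column of a CoNLL-U file. This is a sketch under assumptions: the `SAMPLE` fragment and the `parent_relations` helper are hypothetical, and the `ru-dep/` prefix shown on this page is treated as a documentation link prefix rather than part of the relation label.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment for illustration (not taken from UD_Russian).
SAMPLE = """\
1\tДа\tда\tPART\t_\t_\t0\troot\t_\t_
2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_
3\tконечно\tконечно\tADV\t_\t_\t1\tparataxis\t_\t_
4\t-\t-\tPUNCT\t_\t_\t3\tgoeswith\t_\t_
5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
6\t»\t»\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""


def parent_relations(conllu_text, upos):
    """Count the relations (DEPREL) by which nodes tagged `upos` attach to their parents."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only regular token lines: 10 columns and a plain integer ID.
        if len(cols) == 10 and cols[0].isdigit() and cols[3] == upos:
            counts[cols[7]] += 1     # column 8: DEPREL
    return counts


rels = parent_relations(SAMPLE, "PUNCT")
total = sum(rels.values())
for rel, n in rels.most_common():
    print(f"{rel} ({n}; {round(100 * n / total)}% instances)")
```

Because percentages are rounded to whole numbers, rare relations show up as 0% even though their counts are nonzero.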
Parents of PUNCT nodes belong to 17 different parts of speech: VERB (6568; 39% instances), NOUN (5308; 32% instances), PROPN (1860; 11% instances), ADV (1078; 6% instances), ADJ (1031; 6% instances), ADP (321; 2% instances), NUM (295; 2% instances), PRON (51; 0% instances), SYM (40; 0% instances), PUNCT (25; 0% instances), DET (23; 0% instances), CCONJ (19; 0% instances), PART (7; 0% instances), AUX (3; 0% instances), ROOT (2; 0% instances), X (2; 0% instances), SCONJ (1; 0% instances)
16577 (100%) PUNCT nodes are leaves.
10 (0%) PUNCT nodes have one child.
9 (0%) PUNCT nodes have two children.
38 (0%) PUNCT nodes have three or more children.
The highest child degree of a PUNCT node is 8.
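The child degree of a node is the number of tokens in the same sentence whose HEAD column points to it. The sketch below shows one way to compute it per sentence; the `SAMPLE` fragment and the `child_degrees` helper are hypothetical, and sentences are assumed to be separated by blank lines as in standard CoNLL-U.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment for illustration (not taken from UD_Russian).
SAMPLE = """\
1\t(\t(\tPUNCT\t_\t_\t2\tpunct\t_\t_
2\tсм\tсм\tVERB\t_\t_\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
4\t)\t)\tPUNCT\t_\t_\t3\tpunct\t_\t_

1\tНет\tнет\tPART\t_\t_\t0\troot\t_\t_
2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_
"""


def child_degrees(conllu_text, upos):
    """Return the number of children of every node tagged `upos`, in corpus order."""
    degrees = []
    for block in conllu_text.strip().split("\n\n"):   # one block per sentence
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        rows = [r for r in rows if r[0].isdigit()]     # regular tokens only
        children = Counter(r[6] for r in rows)         # HEAD id -> number of children
        degrees += [children[r[0]] for r in rows if r[3] == upos]
    return degrees


degrees = child_degrees(SAMPLE, "PUNCT")
print("leaves:", degrees.count(0))
print("highest child degree:", max(degrees))
```

In this sample the second PUNCT token of the first sentence has one child (the closing parenthesis is attached to it), so it is not a leaf.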
Children of PUNCT nodes are attached using 17 different relations: ru-dep/goeswith (67; 36% instances), ru-dep/case (42; 23% instances), ru-dep/nmod (25; 13% instances), ru-dep/punct (21; 11% instances), ru-dep/amod (8; 4% instances), ru-dep/conj (5; 3% instances), ru-dep/acl (2; 1% instances), ru-dep/acl:relcl (2; 1% instances), ru-dep/advmod (2; 1% instances), ru-dep/ccomp (2; 1% instances), ru-dep/det (2; 1% instances), ru-dep/list (2; 1% instances), ru-dep/nsubj (2; 1% instances), ru-dep/appos (1; 1% instances), ru-dep/discourse (1; 1% instances), ru-dep/obj (1; 1% instances), ru-dep/obl (1; 1% instances)
Children of PUNCT nodes belong to 11 different parts of speech: ADP (75; 40% instances), NOUN (48; 26% instances), PUNCT (25; 13% instances), ADJ (10; 5% instances), VERB (7; 4% instances), PRON (5; 3% instances), ADV (4; 2% instances), DET (4; 2% instances), PROPN (4; 2% instances), NUM (3; 2% instances), PART (1; 1% instances)
Treebank Statistics (UD_Russian-SynTagRus)
There are 19 PUNCT lemmas (0%), 19 PUNCT types (0%) and 180702 PUNCT tokens (18%).
Out of 18 observed tags, the rank of PUNCT is: 15 in number of lemmas, 16 in number of types and 2 in number of tokens.
The 10 most frequent PUNCT lemmas: ,, ., “, -, :, ), (, ?, !, …
The 10 most frequent PUNCT types: ,, ., “, -, :, ), (, ?, !, …
The 10 most frequent ambiguous lemmas: x (NUM 6, PUNCT 2, PROPN 1), * (PUNCT 1, X 1), + (PUNCT 1, SYM 1)
The 10 most frequent ambiguous types: ? (PUNCT 2723, NOUN 1), * (PUNCT 1, X 1), + (SYM 6, PUNCT 1)
- ?
- *
- PUNCT 1: Ее разрешение ( 2000 * 2000 точек ) полностью “ покрывает “ любые современные мониторы и проекторы .
- X 1: Эти правила можно задавать как для всех поисковых роботов скопом ( User - agent : * ) , так_и выбирая конкретных пауков ( Yandex для “ Яндекса “ , Googlebot для Google , bingbot для Bing и так далее ) .
- +
Morphology
The form / lemma ratio of PUNCT is 1.000000 (the average of all parts of speech is 2.644632).
The 1st highest number of forms (1) was observed with the lemma “!”: !.
The 2nd highest number of forms (1) was observed with the lemma “””: ”.
The 3rd highest number of forms (1) was observed with the lemma “’”: ’.
PUNCT does not occur with any features.
Relations
PUNCT nodes are attached to their parents using a single relation: ru-dep/punct (180702; 100% instances)
Parents of PUNCT nodes belong to 18 different parts of speech: NOUN (93733; 52% instances), VERB (23335; 13% instances), PROPN (19145; 11% instances), ADJ (15426; 9% instances), ADV (11021; 6% instances), PRON (7269; 4% instances), NUM (2565; 1% instances), PART (1948; 1% instances), ADP (1570; 1% instances), CCONJ (1390; 1% instances), DET (1216; 1% instances), SCONJ (601; 0% instances), PUNCT (407; 0% instances), X (395; 0% instances), _ (306; 0% instances), AUX (266; 0% instances), INTJ (107; 0% instances), SYM (2; 0% instances)
180169 (100%) PUNCT nodes are leaves.
301 (0%) PUNCT nodes have one child.
134 (0%) PUNCT nodes have two children.
98 (0%) PUNCT nodes have three or more children.
The highest child degree of a PUNCT node is 13.
Children of PUNCT nodes are attached using 30 different relations: ru-dep/punct (156; 17% instances), ru-dep/orphan (112; 12% instances), ru-dep/conj (103; 11% instances), ru-dep/_ (79; 8% instances), ru-dep/amod (77; 8% instances), ru-dep/parataxis (55; 6% instances), ru-dep/case (49; 5% instances), ru-dep/nmod (44; 5% instances), ru-dep/nsubj (42; 5% instances), ru-dep/cc (35; 4% instances), ru-dep/advmod (33; 4% instances), ru-dep/obl (19; 2% instances), ru-dep/appos (18; 2% instances), ru-dep/nummod (18; 2% instances), ru-dep/mark (17; 2% instances), ru-dep/root (15; 2% instances), ru-dep/fixed (11; 1% instances), ru-dep/obj (9; 1% instances), ru-dep/advcl (7; 1% instances), ru-dep/flat:foreign (7; 1% instances), ru-dep/nsubj:pass (6; 1% instances), ru-dep/acl:relcl (3; 0% instances), ru-dep/cop (3; 0% instances), ru-dep/flat:name (3; 0% instances), ru-dep/acl (2; 0% instances), ru-dep/aux (2; 0% instances), ru-dep/nummod:gov (2; 0% instances), ru-dep/xcomp (2; 0% instances), ru-dep/nummod:entity (1; 0% instances), ru-dep/obl:agent (1; 0% instances)
Children of PUNCT nodes belong to 17 different parts of speech: NOUN (198; 21% instances), PUNCT (156; 17% instances), ADJ (98; 11% instances), VERB (86; 9% instances), _ (79; 8% instances), PROPN (54; 6% instances), ADV (53; 6% instances), ADP (52; 6% instances), CCONJ (37; 4% instances), PRON (33; 4% instances), NUM (25; 3% instances), PART (23; 2% instances), SCONJ (18; 2% instances), DET (12; 1% instances), X (4; 0% instances), AUX (2; 0% instances), INTJ (1; 0% instances)
PUNCT in other languages: [am] [ar] [bg] [bxr] [ca] [ckb] [cop] [cs] [cu] [da] [de] [el] [en] [es] [et] [eu] [fa] [fi] [fo] [fr] [ga] [gl] [got] [grc] [he] [hi] [hr] [hu] [id] [it] [ja] [kk] [kmr] [ko] [la] [lv] [mr] [nl] [no] [pl] [pt] [ro] [ru] [sa] [sk] [sla] [sl] [so] [sr] [sv] [swl] [ta] [tr] [ug] [uk] [u] [urj] [ur] [vi] [yue] [zh]