PART: particle
Definition
Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech (e.g. adpositions, coordinating conjunctions, subordinating conjunctions or auxiliary verbs). Czech particles are not inflected.
Note that response words such as ano, jo “yes”, ne “no”, etc. are considered particles in the PDT tagset, but under the UD standard they should be retagged as interjections. Also note that ne can be used in two ways: one is translated as English “no” and the other as “not”. Only the former should become an interjection; the latter stays a particle.
Examples
- Sentence modality: ať, kéž, nechť (“Let’s do it!” “If only I could do it over.” “May you have an enjoyable stay!”)
- jen “just, only”
- až “only, as late as, even, up to”. Use case: až po stovky tisíc let “up to hundreds of thousands of years”
- asi “about, roughly, maybe”
Diffs
Prague Dependency Treebank
- li “if”: This is an encliticized morpheme that functions as a subordinating conjunction, but it always immediately follows the predicate of the subordinate clause. For example: Nebude-li pršet, nezmoknem. lit. Will-not-if rain, we-will-not-get-wet. “We will not get wet if it does not rain.” PDT tags the li morpheme as a particle and the UD conversion currently keeps it so, but it might be changed to SCONJ in future releases.
- At present the UD conversion of PDT keeps the PDT convention of tagging the response words (“yes”, “no”) as particles. Automatic conversion would not be straightforward because the negative particle ne is sometimes used as a response particle/interjection (English “no”) and sometimes as a free negative morpheme (English “not”). These two usages would have to be distinguished and only the first one converted to an interjection.
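The distinction described above could be approximated with a simple rule. The following Python sketch is a hypothetical heuristic, not the actual PDT-to-UD conversion: it assumes that a PART-tagged ne is the response word “no” when it ends the sentence or is followed by punctuation, and the negation “not” otherwise. It also applies the possible future retagging of li as SCONJ. The names RESPONSE_WORDS, is_response_ne and proposed_upos are illustrative only.

```python
# Hypothetical sketch, not the official PDT-to-UD conversion.
# Input: one sentence as a list of (form, upos) pairs.

# Response words meaning "yes"; the text's "etc." means this set is incomplete.
RESPONSE_WORDS = {"ano", "jo"}
# Assumption: a response "ne" is typically followed by punctuation ("Ne, ...", "Ne!").
RESPONSE_FOLLOWERS = {",", ".", "!", "?", ";", ":"}


def is_response_ne(forms, i):
    """Assumed heuristic: 'ne' means "no" if it ends the sentence or precedes punctuation."""
    if i + 1 == len(forms):
        return True
    return forms[i + 1] in RESPONSE_FOLLOWERS


def proposed_upos(tokens):
    """Return UPOS tags with response words retagged as INTJ and 'li' as SCONJ."""
    forms = [form for form, _ in tokens]
    result = []
    for i, (form, upos) in enumerate(tokens):
        lower = form.lower()
        if upos != "PART":
            result.append(upos)
        elif lower == "li":
            # Possible future change mentioned in the text: li -> SCONJ.
            result.append("SCONJ")
        elif lower in RESPONSE_WORDS or (lower == "ne" and is_response_ne(forms, i)):
            result.append("INTJ")
        else:
            # Negation "ne" ("not") and all other particles stay PART.
            result.append(upos)
    return result


# "Ne, nepůjdu." ("No, I won't go.") vs. "Ne všichni přišli." ("Not everyone came.")
print(proposed_upos([("Ne", "PART"), (",", "PUNCT"), ("nepůjdu", "VERB"), (".", "PUNCT")]))
# → ['INTJ', 'PUNCT', 'VERB', 'PUNCT']
print(proposed_upos([("Ne", "PART"), ("všichni", "PRON"), ("přišli", "VERB"), (".", "PUNCT")]))
# → ['PART', 'PRON', 'VERB', 'PUNCT']
```

A surface rule like this could misfire on elliptical negations such as To ne. (“Not that.”), where a sentence-final ne would be retagged INTJ; a real conversion would probably need to consult the dependency tree as well.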
Treebank Statistics (UD_Czech)
There are 74 PART lemmas (0%), 75 PART types (0%) and 7177 PART tokens (1%).
Out of 17 observed tags, the rank of PART is: 8 in number of lemmas, 10 in number of types and 14 in number of tokens.
The 10 most frequent PART lemmas: jen, až, asi, li, ne, nejen, prý, jenom, ano, bohužel
The 10 most frequent PART types: jen, až, asi, li, ne, nejen, prý, jenom, ano, bohužel
The 10 most frequent ambiguous lemmas: jen (PART 2125, NOUN 22), až (PART 1232, CCONJ 570, SCONJ 116), li (PART 648, PROPN 5), nejen (PART 440, ADV 1), jenom (PART 190, ADV 1), ať (SCONJ 104, PART 64), pozor (PART 43, NOUN 23), ovšem (CCONJ 543, PART 36), to (PART 25, ADP 8), co (PRON 1652, ADV 201, SCONJ 184, PART 17)
The 10 most frequent ambiguous types: jen (PART 1998, NOUN 1), až (PART 1156, CCONJ 570, SCONJ 96), li (PART 648, PROPN 5), nejen (PART 417, ADV 1), jenom (PART 178, ADV 1), ať (SCONJ 87, PART 53), pozor (NOUN 16, PART 2), ovšem (CCONJ 483, PART 36), to (DET 5335, PART 23, ADP 3), co (PRON 1058, ADV 197, SCONJ 181, PART 7)
- jen
- až
- li
- nejen
- jenom
- ať
- pozor
- ovšem
- to
- co
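Counts like those above can be recomputed from a CoNLL-U file. The Python sketch below assumes that a PART lemma is any lemma with at least one PART token, that types are case-sensitive word forms (the lists on this page include capitalized types such as Copak and La), and that ambiguous lemmas are ranked by the frequency of their PART reading, which matches the order of the ambiguous-lemma lists on this page. The three toy sentences are hand-tagged illustrations rather than treebank data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hand-tagged toy sentences (not treebank data), written with spaces for
# readability and converted to tab-separated CoNLL-U below.
TOY = """
# text = Přišel jen Petr.
1 Přišel přijít VERB _ _ 0 root _ _
2 jen jen PART _ _ 3 advmod:emph _ _
3 Petr Petr PROPN _ _ 1 nsubj _ _
4 . . PUNCT _ _ 1 punct _ _

# text = Jen čekal až do večera.
1 Jen jen PART _ _ 2 advmod:emph _ _
2 čekal čekat VERB _ _ 0 root _ _
3 až až PART _ _ 5 advmod:emph _ _
4 do do ADP _ _ 5 case _ _
5 večera večer NOUN _ _ 2 obl _ _
6 . . PUNCT _ _ 2 punct _ _

# text = Zavolám, až přijde.
1 Zavolám zavolat VERB _ _ 0 root _ _
2 , , PUNCT _ _ 4 punct _ _
3 až až SCONJ _ _ 4 mark _ _
4 přijde přijít VERB _ _ 1 advcl _ _
5 . . PUNCT _ _ 1 punct _ _
"""
CONLLU = "\n".join(
    line if line.startswith("#") else "\t".join(line.split())
    for line in TOY.splitlines()
)


def read_words(text):
    """Yield (form, lemma, upos) for each syntactic word in CoNLL-U text."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        yield cols[1], cols[2], cols[3]


def part_statistics(text, top=10):
    words = list(read_words(text))
    part = [(form, lemma) for form, lemma, upos in words if upos == "PART"]
    lemmas = {lemma for _, lemma in part}
    types = {form for form, _ in part}
    # UPOS distribution of every lemma over the whole corpus.
    upos_by_lemma = defaultdict(Counter)
    for _, lemma, upos in words:
        upos_by_lemma[lemma][upos] += 1
    ambiguous = [lemma for lemma in lemmas if len(upos_by_lemma[lemma]) > 1]
    # Rank by frequency of the PART reading; ties broken alphabetically.
    ambiguous.sort(key=lambda lemma: (-upos_by_lemma[lemma]["PART"], lemma))
    report = {lemma: upos_by_lemma[lemma].most_common() for lemma in ambiguous[:top]}
    return len(lemmas), len(types), len(part), report


n_lemmas, n_types, n_tokens, ambiguous = part_statistics(CONLLU)
print(f"There are {n_lemmas} PART lemmas, {n_types} PART types and {n_tokens} PART tokens.")
# → There are 2 PART lemmas, 3 PART types and 3 PART tokens.
print(ambiguous)
```

In the toy data, jen and Jen are two types of one lemma, and až counts as ambiguous because it also occurs as SCONJ.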
Morphology
The form / lemma ratio of PART is 1.013514 (the average of all parts of speech is 2.162583).
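The reported ratio equals the number of PART types divided by the number of PART lemmas: 75 / 74 ≈ 1.013514 for UD_Czech, and the same division reproduces the ratios reported for UD_Czech-CAC (41 / 40) and UD_Czech-CLTT (3 / 3). A small Python check using the counts stated on this page (the helper name is hypothetical):

```python
def form_lemma_ratio(n_types, n_lemmas):
    """Number of distinct PART forms (types) per PART lemma."""
    return n_types / n_lemmas


# (types, lemmas) as reported in the Treebank Statistics lines of this page.
COUNTS = {"UD_Czech": (75, 74), "UD_Czech-CAC": (41, 40), "UD_Czech-CLTT": (3, 3)}
for treebank, (n_types, n_lemmas) in COUNTS.items():
    print(f"{treebank}: {form_lemma_ratio(n_types, n_lemmas):.6f}")
# → UD_Czech: 1.013514
# → UD_Czech-CAC: 1.025000
# → UD_Czech-CLTT: 1.000000
```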
The 1st highest number of forms (2) was observed with the lemma “not”: not, t.
The 2nd highest number of forms (1) was observed with the lemma “Achtung”: Achtung.
The 3rd highest number of forms (1) was observed with the lemma “L”: L.
PART occurs with 3 features: Foreign (100; 1% instances), Style (11; 0% instances), NameType (7; 0% instances)
PART occurs with 5 feature-value pairs: Foreign=Yes, NameType=Com, NameType=Oth, NameType=Sur, Style=Coll
PART occurs with 7 feature combinations.
The most frequent feature combination is _ (7064 tokens).
Examples: jen, až, asi, li, ne, nejen, prý, jenom, ano, bohužel
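The feature lines above count three different things: individual features (Foreign), feature-value pairs (Foreign=Yes), and whole FEATS strings, where _ means that a token has no features. The Python sketch below shows one way to tally them from the FEATS values of PART tokens; the input list is a hypothetical toy sample, and counting each feature once per token is an assumption about how the statistics were produced.

```python
from collections import Counter


def feature_statistics(feats_values):
    """Tally features, feature-value pairs and FEATS combinations.

    feats_values holds the FEATS column of each PART token, e.g. "_" or
    "Foreign=Yes" (CoNLL-U separates multiple pairs with "|").
    """
    combinations = Counter(feats_values)
    features = Counter()
    pairs = set()
    for feats, count in combinations.items():
        if feats == "_":
            continue  # token without any features
        for pair in feats.split("|"):
            pairs.add(pair)
            features[pair.split("=", 1)[0]] += count
    return features, sorted(pairs), combinations


# Hypothetical FEATS values of seven PART tokens.
toy = ["_", "_", "_", "Foreign=Yes", "Foreign=Yes", "NameType=Sur", "Style=Coll"]
features, pairs, combinations = feature_statistics(toy)
print("PART occurs with %d features: %s" % (
    len(features), ", ".join(f"{name} ({n})" for name, n in features.most_common())))
# → PART occurs with 3 features: Foreign (2), NameType (1), Style (1)
print(f"PART occurs with {len(pairs)} feature-value pairs: {', '.join(pairs)}")
# → PART occurs with 3 feature-value pairs: Foreign=Yes, NameType=Sur, Style=Coll
print(f"PART occurs with {len(combinations)} feature combinations.")
# → PART occurs with 4 feature combinations.
top, n = combinations.most_common(1)[0]
print(f"The most frequent feature combination is {top} ({n} tokens).")
# → The most frequent feature combination is _ (3 tokens).
```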
Relations
PART nodes are attached to their parents using 22 different relations: advmod:emph (4753; 66% instances), advmod (825; 11% instances), mark (700; 10% instances), cc (276; 4% instances), dep (159; 2% instances), root (117; 2% instances), conj (100; 1% instances), nmod (83; 1% instances), flat:foreign (66; 1% instances), orphan (39; 1% instances), obj (23; 0% instances), nsubj (8; 0% instances), appos (7; 0% instances), case (5; 0% instances), acl (4; 0% instances), discourse (3; 0% instances), fixed (3; 0% instances), ccomp (2; 0% instances), advcl (1; 0% instances), flat (1; 0% instances), iobj (1; 0% instances), xcomp (1; 0% instances)
Parents of PART nodes belong to 14 different parts of speech: NOUN (2739; 38% instances), VERB (1746; 24% instances), NUM (828; 12% instances), ADV (710; 10% instances), ADJ (514; 7% instances), DET (199; 3% instances), PROPN (160; 2% instances), ROOT (117; 2% instances), PRON (80; 1% instances), PART (50; 1% instances), CCONJ (19; 0% instances), SYM (10; 0% instances), INTJ (3; 0% instances), SCONJ (2; 0% instances)
6717 (94%) PART nodes are leaves.
163 (2%) PART nodes have one child.
190 (3%) PART nodes have two children.
107 (1%) PART nodes have three or more children.
The highest child degree of a PART node is 8.
Children of PART nodes are attached using 25 different relations: punct (391; 42% instances), conj (135; 14% instances), cc (92; 10% instances), orphan (55; 6% instances), mark (39; 4% instances), dep (38; 4% instances), advmod:emph (30; 3% instances), fixed (25; 3% instances), xcomp (19; 2% instances), amod (18; 2% instances), flat:foreign (18; 2% instances), case (16; 2% instances), cop (10; 1% instances), nmod (10; 1% instances), advmod (9; 1% instances), det (9; 1% instances), nsubj (8; 1% instances), aux (6; 1% instances), advcl (4; 0% instances), obl (3; 0% instances), appos (2; 0% instances), ccomp (2; 0% instances), acl (1; 0% instances), obj (1; 0% instances), vocative (1; 0% instances)
Children of PART nodes belong to 14 different parts of speech: PUNCT (391; 42% instances), NOUN (114; 12% instances), CCONJ (80; 8% instances), ADV (78; 8% instances), VERB (68; 7% instances), PART (50; 5% instances), SCONJ (44; 5% instances), ADJ (39; 4% instances), ADP (19; 2% instances), DET (18; 2% instances), PROPN (17; 2% instances), AUX (16; 2% instances), PRON (7; 1% instances), NUM (1; 0% instances)
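The relation and degree figures above are per-node tallies over the dependency trees. The Python sketch below shows one way to compute them from CoNLL-U: the relation of each PART node, the UPOS of its parent (ROOT when HEAD is 0, as in the parent list above), how many children it has, and the relations of those children. The toy sentences are hand-tagged illustrations; Ano is tagged PART only to mirror the current PDT convention for response words, and all helper names are hypothetical.

```python
from collections import Counter

# Hand-tagged toy sentences (not treebank data), written with spaces for
# readability and converted to tab-separated CoNLL-U below.
TOY = """
1 Přišel přijít VERB _ _ 0 root _ _
2 jen jen PART _ _ 3 advmod:emph _ _
3 Petr Petr PROPN _ _ 1 nsubj _ _
4 . . PUNCT _ _ 1 punct _ _

1 Ano ano PART _ _ 0 root _ _
2 . . PUNCT _ _ 1 punct _ _
"""
CONLLU = "\n".join("\t".join(line.split()) for line in TOY.splitlines())


def read_sentences(text):
    """Yield each sentence as a list of (id, upos, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last sentence
        if not line.strip():
            if sentence:
                yield sentence
            sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue  # skip multiword-token ranges and empty nodes
            sentence.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))


def degree_bucket(k):
    """Group child counts into leaves, one child, two children, three or more."""
    return ["leaf", "one child", "two children"][k] if k < 3 else "three or more"


def part_relations(text):
    deprels, parents, degrees, child_deprels = Counter(), Counter(), Counter(), Counter()
    for sentence in read_sentences(text):
        upos_of = {tok_id: upos for tok_id, upos, _, _ in sentence}
        n_children = Counter(head for _, _, head, _ in sentence)
        for tok_id, upos, head, deprel in sentence:
            if head != 0 and upos_of[head] == "PART":
                child_deprels[deprel] += 1  # this token is a child of a PART node
            if upos != "PART":
                continue
            deprels[deprel] += 1
            parents["ROOT" if head == 0 else upos_of[head]] += 1
            degrees[degree_bucket(n_children[tok_id])] += 1
    return deprels, parents, degrees, child_deprels


deprels, parents, degrees, child_deprels = part_relations(CONLLU)
total = sum(deprels.values())
print(", ".join(f"{rel} ({n}; {100 * n / total:.0f}% instances)" for rel, n in deprels.most_common()))
# → advmod:emph (1; 50% instances), root (1; 50% instances)
print(dict(parents), dict(degrees), dict(child_deprels))
```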
Treebank Statistics (UD_Czech-CAC)
There are 40 PART lemmas (0%), 41 PART types (0%) and 3074 PART tokens (1%).
Out of 16 observed tags, the rank of PART is: 10 in number of lemmas, 12 in number of types and 15 in number of tokens.
The 10 most frequent PART lemmas: jen, li, až, nejen, asi, ovšem, ne, jenom, ať, prý
The 10 most frequent PART types: jen, li, až, nejen, asi, ovšem, ne, jenom, ať, prý
The 10 most frequent ambiguous lemmas: jen (PART 902, NOUN 1), li (PART 554, ADJ 1), až (PART 511, SCONJ 33, CCONJ 6), ovšem (PART 210, ADV 14, CCONJ 5), ať (SCONJ 43, PART 32), s (ADP 3748, PART 13), la (PART 2, ADJ 1), co (PRON 511, ADV 164, SCONJ 16, PART 2, ADJ 1), Le (ADJ 1, PART 1), copak (PRON 7, PART 1)
The 10 most frequent ambiguous types: jen (PART 848, NOUN 1), až (PART 496, SCONJ 29, CCONJ 6), ovšem (PART 189, ADV 12, CCONJ 5), ať (SCONJ 42, PART 27), s (ADP 3046, PART 13), to (DET 1862, PART 11), La (PART 3, ADJ 1), co (PRON 372, ADV 158, SCONJ 15, PART 1, ADJ 1), Copak (PRON 5, PART 1), fakt (NOUN 18, PART 1)
- jen
- až
- ovšem
- ať
- s
- to
- La
  - PART 3: Proto se čtveřice Vláďa , Jiří , Věra a Dana mohla vydat přes kanál La * .
  - ADJ 1: President Československé socialistické republiky propůjčil mistru sportu , nadpraporčíku Františku Venclovskému vyznamenání Za statečnost , za prokázanou osobní odvahu a příkladnou bojovnost při plavbě kanálem La Manche .
- co
  - PRON 372: A že co by na to řekly , když by šli společně darovat krev .
  - ADV 158: V naší době se musíme udržet co nejdéle mladé .
  - SCONJ 15: Bude tomu měsíc , co narukovali .
  - PART 1: Ale co , zatím jsem fejeton ještě stačil díky psacímu stroji napsat .
  - ADJ 1: Usilují přitom o vypracování jakési ontologie společenskosti , která současně z druhé strany má reflektovat a zpřítomňovat společenský charakter ontologie tkvící již v samotné podstatě člověka definovaného jakožto zoon politikon , jemuž odpovídá , že lidská existence je současně a vždy také co - existence .
- Copak
- fakt
Morphology
The form / lemma ratio of PART is 1.025000 (the average of all parts of speech is 2.180683).
The 1st highest number of forms (2) was observed with the lemma “das”: das, des.
The 2nd highest number of forms (1) was observed with the lemma “Al”: Al.
The 3rd highest number of forms (1) was observed with the lemma “La”: La.
PART occurs with 3 features: Foreign (13; 0% instances), NameType (4; 0% instances), Style (3; 0% instances)
PART occurs with 4 feature-value pairs: Foreign=Yes, NameType=Geo, NameType=Oth, Style=Coll
PART occurs with 5 feature combinations.
The most frequent feature combination is _ (3058 tokens).
Examples: jen, li, až, nejen, asi, ovšem, ne, jenom, ať, prý
Relations
PART nodes are attached to their parents using 16 different relations: advmod:emph (1516; 49% instances), mark (582; 19% instances), cc (448; 15% instances), advmod (293; 10% instances), case (129; 4% instances), root (24; 1% instances), conj (23; 1% instances), dep (23; 1% instances), flat:foreign (7; 0% instances), orphan (7; 0% instances), discourse (6; 0% instances), nmod (6; 0% instances), acl (5; 0% instances), fixed (3; 0% instances), advcl (1; 0% instances), obj (1; 0% instances)
Parents of PART nodes belong to 13 different parts of speech: NOUN (1086; 35% instances), VERB (820; 27% instances), NUM (346; 11% instances), ADJ (305; 10% instances), ADV (237; 8% instances), DET (100; 3% instances), PROPN (40; 1% instances), SYM (39; 1% instances), PRON (32; 1% instances), ROOT (24; 1% instances), SCONJ (21; 1% instances), PART (18; 1% instances), CCONJ (6; 0% instances)
2856 (93%) PART nodes are leaves.
167 (5%) PART nodes have one child.
26 (1%) PART nodes have two children.
25 (1%) PART nodes have three or more children.
The highest child degree of a PART node is 11.
Children of PART nodes are attached using 21 different relations: fixed (136; 40% instances), punct (62; 18% instances), cc (31; 9% instances), conj (17; 5% instances), advmod:emph (16; 5% instances), dep (12; 4% instances), cop (11; 3% instances), xcomp (11; 3% instances), nsubj (10; 3% instances), orphan (10; 3% instances), mark (5; 1% instances), advcl (2; 1% instances), advmod (2; 1% instances), aux (2; 1% instances), nummod (2; 1% instances), obj (2; 1% instances), obl (2; 1% instances), amod (1; 0% instances), case (1; 0% instances), nmod (1; 0% instances), parataxis (1; 0% instances)
Children of PART nodes belong to 13 different parts of speech: ADP (129; 38% instances), PUNCT (62; 18% instances), NOUN (27; 8% instances), ADV (25; 7% instances), VERB (21; 6% instances), PART (18; 5% instances), CCONJ (17; 5% instances), AUX (13; 4% instances), SCONJ (8; 2% instances), ADJ (6; 2% instances), DET (5; 1% instances), NUM (4; 1% instances), PRON (2; 1% instances)
Treebank Statistics (UD_Czech-CLTT)
There are 3 PART lemmas (0%), 3 PART types (0%) and 49 PART tokens (0%).
Out of 15 observed tags, the rank of PART is: 14 in number of lemmas, 14 in number of types and 14 in number of tokens.
The 10 most frequent PART lemmas: až, jen, nikoli
The 10 most frequent PART types: až, jen, nikoliv
The 10 most frequent ambiguous lemmas: až (PART 24, X 23, SCONJ 6, CCONJ 1)
The 10 most frequent ambiguous types: až (PART 24, X 23, SCONJ 6, CCONJ 1)
- až
  - PART 24: (5) Ustanovení § 52 a 53 se použijí až v účetním období začínajícím 1 . ledna 2004 a později .
  - X 23: Ustanovení písmen d) až h) se použijí i pro zahraniční fyzické osoby .
  - SCONJ 6: (6) Ustanovení odstavců 1 až 5 se nepoužijí při změně právní formy a přeshraničním přemístění sídla .
  - CCONJ 1: Účetní jednotka , která sestavuje výkaz zisku a ztráty v účelovém členění , není povinna dodržet členění v účtových skupinách 50 až 55 a 60 až 64 ; členění přizpůsobí výkazu s přihlédnutím k povinnosti uvedené v § 39 odst. 8 .
Morphology
The form / lemma ratio of PART is 1.000000 (the average of all parts of speech is 1.685169).
The 1st highest number of forms (1) was observed with the lemma “až”: až.
The 2nd highest number of forms (1) was observed with the lemma “jen”: jen.
The 3rd highest number of forms (1) was observed with the lemma “nikoli”: nikoliv.
PART does not occur with any features.
Relations
PART nodes are attached to their parents using 2 different relations: advmod:emph (38; 78% instances), cc (11; 22% instances)
Parents of PART nodes belong to 4 different parts of speech: X (24; 49% instances), NOUN (21; 43% instances), NUM (3; 6% instances), ADV (1; 2% instances)
49 (100%) PART nodes are leaves.
The highest child degree of a PART node is 0.
PART in other languages: [am] [ar] [bg] [bxr] [ca] [ckb] [cop] [cs] [cu] [da] [de] [el] [en] [es] [et] [eu] [fa] [fi] [fo] [fr] [ga] [gl] [got] [grc] [he] [hi] [hr] [hu] [id] [it] [ja] [kk] [kmr] [ko] [la] [lv] [mr] [nl] [no] [pl] [pt] [ro] [ru] [sa] [sk] [sla] [sl] [so] [sr] [sv] [swl] [ta] [tr] [ug] [uk] [u] [urj] [ur] [vi] [yue] [zh]