PART: particle
In Portuguese, PART is used to tag prefixes that form complex words, but not compounds. In ex-presidente, anti-capitalista, vice-diretor, pós-graduação, the morphemes ex-, anti-, vice-, pós- should be tagged as PART. Note that when one of those prefixes is used alone, as in Minha pós não acaba nunca. (My post-grad never ends.), “pós” still stands for “pós-graduação”. This is different from compound words, such as norte-americano, meio-campo, porta-voz, in which there is no particle and the first element alone cannot recall the entire sense of the compound. Weekday names, such as segunda-feira, are analysed as compound words, even though the first part can be used for the whole, e.g. Essa quarta, sem falta. (This Wednesday, without fail.) Words and expressions such as fim-de-semana, a partir de, de novo are MWEs, and their elements should not be tagged as PART.
This means that prefixed words should be split in the tokenization step. Note that hyphenation is still a major issue here, since many of the complex words formed with particles are not necessarily written with a hyphen. Hyphenation is discussed in the new Regulation of Portuguese Orthography (2009), and some specific cases are explicitly ruled: vice- and ex- always take a hyphen. But not all cases are specified, and many dictionaries (and older corpora) carry both forms, anti-capitalista and anticapitalista.
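The splitting step described above can be sketched in Python. This is only an illustration under stated assumptions: the `PREFIXES` set is a hypothetical subset drawn from the examples on this page, the function name `split_prefixed` is invented here, and keeping the hyphen as its own token is an assumption rather than a documented treebank convention.

```python
# Hypothetical prefix inventory, taken from the examples in this section.
PREFIXES = {"ex", "anti", "vice", "pós", "pré", "pró", "infra"}


def split_prefixed(word):
    """Split a hyphenated prefixed word and tag the prefix as PART.

    Returns a list of (form, upos) pairs; upos is None where this sketch
    makes no tagging decision.
    """
    head, sep, rest = word.partition("-")
    if sep and rest and head.lower() in PREFIXES:
        # Assumption: the hyphen becomes a separate token.
        return [(head, "PART"), ("-", None), (rest, None)]
    # Compounds (porta-voz) and unhyphenated forms are left untouched.
    return [(word, None)]


print(split_prefixed("ex-presidente"))
print(split_prefixed("porta-voz"))
print(split_prefixed("anticapitalista"))
```

The last call shows the hyphenation problem: anticapitalista is not split, because a hyphen-based rule cannot see the prefix in the unhyphenated spelling.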
PART is also used for negative particles, such as não, nem, in predicative contexts. Note that negative adverbs, such as nunca, jamais, are still tagged as ADV.
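The contrast between negative particles and negative adverbs can be written as a small lookup. This is a rough sketch: the word lists contain only the four examples given above, the function name `tag_negation` is hypothetical, and the check for a predicative context is deliberately left out.

```python
# Only the examples named in this section; real inventories may be larger.
NEGATIVE_PARTICLES = {"não", "nem"}
NEGATIVE_ADVERBS = {"nunca", "jamais"}


def tag_negation(form):
    """Return 'PART', 'ADV', or None for a single word form.

    Assumption: context is ignored, although the guideline restricts
    PART to predicative contexts.
    """
    word = form.lower()
    if word in NEGATIVE_PARTICLES:
        return "PART"
    if word in NEGATIVE_ADVERBS:
        return "ADV"
    return None


for form in ["Não", "nem", "nunca", "jamais", "sempre"]:
    print(form, tag_negation(form))
```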
Examples:
Negative particles: não, nem
Prefixes: anti-, ex-, pós-, vice-, primeiro-, pró-, infra-
Treebank Statistics (UD_Portuguese)
There are 4 PART lemmas (0%), 4 PART types (0%) and 5 PART tokens (0%).
Out of 17 observed tags, the rank of PART is: 17 in number of lemmas, 17 in number of types and 17 in number of tokens.
The 10 most frequent PART lemmas: ex, anti, pré, pós
The 10 most frequent PART types: ex, anti, pré-, pós
The 10 most frequent ambiguous lemmas:
The 10 most frequent ambiguous types:
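Lemma, type and token counts of this kind can be derived from a CoNLL-U file by counting distinct LEMMA values, distinct FORM values, and all rows for a given UPOS tag. The following is a minimal sketch, not the script that produced the figures above; the helper `row`, the function `tag_counts` and the three-sentence `SAMPLE` are invented for illustration.

```python
from collections import Counter


def row(idx, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U token line (unused columns set to '_')."""
    return "\t".join([str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"])


# Invented sample, not taken from either treebank.
SAMPLE = "\n".join([
    "# text = ex-presidente",
    row(1, "ex", "ex", "PART", 3, "dep"),
    row(2, "-", "-", "PUNCT", 1, "punct"),
    row(3, "presidente", "presidente", "NOUN", 0, "root"),
    "",
    "# text = anti-capitalista",
    row(1, "anti", "anti", "PART", 3, "dep"),
    row(2, "-", "-", "PUNCT", 1, "punct"),
    row(3, "capitalista", "capitalista", "NOUN", 0, "root"),
    "",
    "# text = Ex-diretor",
    row(1, "Ex", "ex", "PART", 3, "dep"),
    row(2, "-", "-", "PUNCT", 1, "punct"),
    row(3, "diretor", "diretor", "NOUN", 0, "root"),
])


def tag_counts(conllu_text, upos):
    """Count distinct lemmas, distinct forms (types) and tokens for one tag."""
    lemmas, types = Counter(), Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        if cols[3] == upos:
            types[cols[1]] += 1
            lemmas[cols[2]] += 1
    return {"lemmas": len(lemmas), "types": len(types), "tokens": sum(types.values())}


print(tag_counts(SAMPLE, "PART"))  # {'lemmas': 2, 'types': 3, 'tokens': 3}
```

In the sample, "ex" and "Ex" count as two types but share one lemma, which is why the type count can exceed the lemma count.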
Morphology
The form / lemma ratio of PART is 1.000000 (the average of all parts of speech is 1.425915).
The 1st highest number of forms (1) was observed with the lemma “anti”: anti.
The 2nd highest number of forms (1) was observed with the lemma “ex”: ex.
The 3rd highest number of forms (1) was observed with the lemma “pré”: pré-.
PART occurs with 2 features: pt-feat/Gender (1; 20% instances), pt-feat/Number (1; 20% instances)
PART occurs with 2 feature-value pairs: Gender=Masc, Number=Sing
PART occurs with 2 feature combinations. The most frequent feature combination is _ (4 tokens).
Examples: ex, anti, pré-
Relations
PART nodes are attached to their parents using 1 different relations: pt-dep/dep (5; 100% instances)
Parents of PART nodes belong to 3 different parts of speech: NOUN (3; 60% instances), NUM (1; 20% instances), PROPN (1; 20% instances)
3 (60%) PART nodes are leaves.
0 (0%) PART nodes have one child.
0 (0%) PART nodes have two children.
2 (40%) PART nodes have three or more children.
The highest child degree of a PART node is 3.
Children of PART nodes are attached using 4 different relations: pt-dep/punct (3; 50% instances), pt-dep/case (1; 17% instances), pt-dep/conj (1; 17% instances), pt-dep/det (1; 17% instances)
Children of PART nodes belong to 4 different parts of speech: PUNCT (3; 50% instances), ADP (1; 17% instances), DET (1; 17% instances), NOUN (1; 17% instances)
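The leaf and child-degree figures above can be reproduced by counting, for every node with a given tag, how many other nodes name it as their head. The sketch below assumes a simplified sentence representation, a list of `(id, upos, head)` tuples; the function name `child_degree_histogram` and the example sentence are invented for illustration.

```python
from collections import Counter


def child_degree_histogram(sentence, upos):
    """Histogram of child counts for nodes tagged `upos`.

    `sentence` is a list of (id, upos, head) tuples; head 0 is the root.
    Degrees of three or more are pooled under the key 3.
    """
    n_children = Counter(head for _, _, head in sentence)
    hist = Counter()
    for node_id, tag, _ in sentence:
        if tag == upos:
            hist[min(n_children[node_id], 3)] += 1
    return hist


# Invented sentence: node 1 (PART) has three children, node 5 (PART) is a leaf.
sentence = [
    (1, "PART", 0),
    (2, "PUNCT", 1),
    (3, "NOUN", 1),
    (4, "DET", 1),
    (5, "PART", 3),
]
hist = child_degree_histogram(sentence, "PART")
print("leaves:", hist[0], "three or more:", hist[3])  # leaves: 1 three or more: 1
```

Applied to the figures above, two PART nodes with a highest degree of 3 account for exactly the six child attachments listed (3 punct, 1 case, 1 conj, 1 det).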
Treebank Statistics (UD_Portuguese-BR)
There are 1 PART lemmas (5%), 70 PART types (0%) and 687 PART tokens (0%).
Out of 14 observed tags, the rank of PART is: 9 in number of lemmas, 12 in number of types and 13 in number of tokens.
The 10 most frequent PART lemmas: _
The 10 most frequent PART types: se, ex, vice, pré, auto, latino, claro, recém, ai, aí
The 10 most frequent ambiguous lemmas: _ (NOUN 51670, PUNCT 37916, PROPN 29660, ADP 27823, VERB 26752, DET 23518, ADJ 13618, CCONJ 9896, ADV 8825, NUM 7639, PRON 6681, AUX 4729, PART 687, X 472)
The 10 most frequent ambiguous types: se (PRON 687, PART 362, CCONJ 173, ADP 3, PROPN 1), ex (PART 134, X 1, NOUN 1), vice (PART 44, NOUN 9, ADJ 3), pré (PART 29, ADJ 1), latino (PART 7, ADJ 3), claro (ADJ 26, PART 5, NOUN 2), recém (PART 5, ADV 1), ai (PART 3, ADV 2), aí (ADV 12, PART 1), bem (ADV 123, NOUN 5, PART 2)
- se
  - PRON 687: Muitos clientes se anteciparam e garantiram as reservas .
  - PART 362: Especula - se sobre a possibilidade de estar extinta .
  - CCONJ 173: ” Mas se precisasse , usaria sim “ , diisse .
  - ADP 3: Se tiver , vamos atender , se não , vamos usar outros .
  - PROPN 1: A experiência adquirida a o longo de 20 anos acaba de virar o livro “ Vá se drenar !
- ex
  - PART 134: En o Twitter , a ex - BBB voltou a comentar .
  - X 1: Os ministros Paulo Bernardo ( Comunicações ) e Gleisi Hoffmann ( Casa Civil ) discutirão nesta terça - feira ( 9 ) estratégia para tentar convencer o ex - presidente Lula a subir en o palanque de Gustavo Fruet ( PDT ) en o segundo turno de a disputa por a Prefeitura de Curitiba .
  - NOUN 1: A entojada se aproxima de Conrado bem en a hora que ele está admirando a ex .
- vice
  - PART 44: A série de participações receberá os dez candidatos a vice - prefeito .
  - NOUN 9: Agra seria o vice que Cássio tanto quis e nunca teve .
  - ADJ 3: Entre os nomes cotados para receber o apoio de o prefeito está o candidato a vice en a chapa de Magalhães , Orly Gomes ( DEM ) , que é o mais forte deles .
- pré
- latino
- claro
- recém
- ai
- aí
- bem
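An "ambiguous type" in the lists above is a word form that occurs with more than one tag. A minimal sketch of how such a list could be computed from `(form, upos)` pairs follows; the function name `ambiguous_types` and the toy input are assumptions made for illustration, not the treebank data.

```python
from collections import Counter, defaultdict


def ambiguous_types(tagged_tokens):
    """Map each form seen with more than one UPOS tag to its tag counts.

    Tag counts are listed from most to least frequent.
    """
    by_form = defaultdict(Counter)
    for form, upos in tagged_tokens:
        by_form[form][upos] += 1
    return {
        form: tags.most_common()
        for form, tags in by_form.items()
        if len(tags) > 1
    }


# Toy input: "se" is ambiguous, "ex" is not.
tokens = [("se", "PRON"), ("se", "PRON"), ("se", "PART"), ("ex", "PART")]
print(ambiguous_types(tokens))  # {'se': [('PRON', 2), ('PART', 1)]}
```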
Morphology
The form / lemma ratio of PART is 70.000000 (the average of all parts of speech is 1740.105263).
The 1st highest number of forms (70) was observed with the lemma “_”: ’s, Agora, Avante, Cara, Desculpe, Intra, Nè, Ok, Olá, Oxalá, Sucesso, afro, ai, alvi, ante, anti, ar, atenção, auto, aí, bem, claro, co, contra, cyber, eba, então, ex, extra, foi, franco, germano, grão, hein, hélio, in, infanto, infra, inter, ir, latino, lá, mamilo, micro, on, pan, para, pois, prático, pré, pró, pós, pô, público, recém, rs, s, se, su, sub, supra, tele, to, tá, ultra, utz, vice, viu, ão, é.
PART does not occur with any features.
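The ratio of 70 follows from the counts reported for this treebank: all 70 PART types share the single placeholder lemma "_", so 70 / 1 = 70. The sketch below assumes the ratio is simply distinct forms divided by distinct lemmas, which matches the figures on this page (4 / 4 = 1 for UD_Portuguese) but is not confirmed as the exact definition used; the function name `form_lemma_ratio` is hypothetical.

```python
def form_lemma_ratio(pairs):
    """Distinct forms divided by distinct lemmas for (form, lemma) pairs.

    Assumption: forms are compared case-sensitively.
    """
    forms = {form for form, _ in pairs}
    lemmas = {lemma for _, lemma in pairs}
    return len(forms) / len(lemmas)


# Every form carries the placeholder lemma "_", as in UD_Portuguese-BR.
unlemmatized = [("se", "_"), ("ex", "_"), ("vice", "_"), ("se", "_")]
print(form_lemma_ratio(unlemmatized))  # 3.0

# One form per lemma, as in UD_Portuguese.
lemmatized = [("ex", "ex"), ("anti", "anti"), ("pré-", "pré"), ("pós", "pós")]
print(form_lemma_ratio(lemmatized))  # 1.0
```

A missing lemma layer therefore inflates the ratio, which makes it uninformative about the morphology of PART in this treebank.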
Relations
PART nodes are attached to their parents using 17 different relations: pt-dep/expl:pv (367; 53% instances), pt-dep/nmod (75; 11% instances), pt-dep/nsubj (44; 6% instances), pt-dep/conj (42; 6% instances), pt-dep/dep (36; 5% instances), pt-dep/amod (33; 5% instances), pt-dep/appos (33; 5% instances), pt-dep/obj (25; 4% instances), pt-dep/root (11; 2% instances), pt-dep/advmod (7; 1% instances), pt-dep/nsubj:pass (6; 1% instances), pt-dep/mark (2; 0% instances), pt-dep/parataxis (2; 0% instances), pt-dep/acl:relcl (1; 0% instances), pt-dep/advcl (1; 0% instances), pt-dep/cop (1; 0% instances), pt-dep/flat (1; 0% instances)
Parents of PART nodes belong to 10 different parts of speech: VERB (448; 65% instances), NOUN (128; 19% instances), PROPN (37; 5% instances), ADJ (35; 5% instances), PART (18; 3% instances), ROOT (11; 2% instances), PRON (5; 1% instances), ADV (3; 0% instances), AUX (1; 0% instances), NUM (1; 0% instances)
423 (62%) PART nodes are leaves.
18 (3%) PART nodes have one child.
19 (3%) PART nodes have two children.
227 (33%) PART nodes have three or more children.
The highest child degree of a PART node is 9.
Children of PART nodes are attached using 21 different relations: pt-dep/punct (341; 30% instances), pt-dep/flat (231; 20% instances), pt-dep/det (151; 13% instances), pt-dep/nmod (99; 9% instances), pt-dep/appos (90; 8% instances), pt-dep/case (81; 7% instances), pt-dep/conj (35; 3% instances), pt-dep/amod (26; 2% instances), pt-dep/cc (26; 2% instances), pt-dep/cop (11; 1% instances), pt-dep/acl:part (10; 1% instances), pt-dep/acl:relcl (9; 1% instances), pt-dep/nsubj (8; 1% instances), pt-dep/det:poss (7; 1% instances), pt-dep/advmod (5; 0% instances), pt-dep/nummod (3; 0% instances), pt-dep/advcl (1; 0% instances), pt-dep/expl:pv (1; 0% instances), pt-dep/fixed (1; 0% instances), pt-dep/mark (1; 0% instances), pt-dep/xcomp (1; 0% instances)
Children of PART nodes belong to 12 different parts of speech: PUNCT (341; 30% instances), NOUN (269; 24% instances), PROPN (168; 15% instances), DET (158; 14% instances), ADP (81; 7% instances), VERB (31; 3% instances), ADJ (30; 3% instances), CCONJ (27; 2% instances), PART (18; 2% instances), ADV (5; 0% instances), NUM (5; 0% instances), PRON (5; 0% instances)