Gender: gender
Gender is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.
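In CoNLL-U files, Gender is stored in the FEATS column (the sixth field) as an attribute-value pair such as Gender=Fem. Features are separated by | and an underscore marks a token without features. A minimal Python sketch of reading the value follows. It assumes well-formed FEATS strings with one value per feature, and the helper names parse_feats and gender_of are hypothetical.

```python
from __future__ import annotations


def parse_feats(feats: str) -> dict[str, str]:
    """Split a CoNLL-U FEATS value like 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":  # an underscore means the token has no features
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def gender_of(feats: str) -> str | None:
    """Return the Gender value ('Masc', 'Fem', 'Unsp', ...) or None if absent."""
    return parse_feats(feats).get("Gender")


print(gender_of("Gender=Fem|Number=Sing"))
print(gender_of("Number=Sing"))
print(gender_of("_"))
```

A missing Gender and an empty FEATS column both come back as None, which corresponds to the EMPTY rows in the statistics below.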
Masc: masculine gender
Nouns denoting male persons are masculine. Other nouns may also be grammatically masculine, without any relation to sex.
Examples
- castelo “castle”
Fem: feminine gender
Nouns denoting female persons are feminine. Other nouns may also be grammatically feminine, without any relation to sex.
Examples
- casa “house”
Unsp: unspecified
Unsp is used to tag words that can be masculine or feminine when the context does not make their gender clear.
Examples
- você “you”
Treebank Statistics (UD_Portuguese)
This feature is universal, but the value Unsp is language-specific.
It occurs with 3 different values: Fem, Masc, Unsp.
103964 tokens (48%) have a non-empty value of Gender.
18368 types (73%) occur at least once with a non-empty value of Gender.
14111 lemmas (80%) occur at least once with a non-empty value of Gender.
The feature is used with 14 part-of-speech tags (token count; share of all tokens in the treebank): pt-pos/NOUN (38465; 18%), pt-pos/DET (32137; 15%), pt-pos/PROPN (11173; 5%), pt-pos/ADJ (11139; 5%), pt-pos/PRON (6900; 3%), pt-pos/VERB (3435; 2%), pt-pos/SYM (384; 0%), pt-pos/NUM (137; 0%), pt-pos/ADP (131; 0%), pt-pos/ADV (25; 0%), pt-pos/AUX (22; 0%), pt-pos/X (12; 0%), pt-pos/INTJ (3; 0%), pt-pos/PART (1; 0%).
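Counts like the ones above can be approximated by scanning the CoNLL-U files. The following sketch is not the script that generated these statistics. It assumes that a type is a distinct FORM string, skips multiword-token ranges (IDs such as 1-2) and empty nodes (IDs such as 1.1), and runs on a small invented two-sentence sample. The function names read_words, gender_summary and upos_counts are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Invented sample in CoNLL-U format (tab-separated; FEATS is the sixth column).
SAMPLE = (
    "# text = a casa é nova\n"
    "1\ta\to\tDET\t_\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_\n"
    "2\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_\n"
    "3\té\tser\tAUX\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\tcop\t_\t_\n"
    "4\tnova\tnovo\tADJ\t_\tGender=Fem|Number=Sing\t0\troot\t_\t_\n"
    "\n"
    "# text = o castelo velho\n"
    "1\to\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_\n"
    "2\tcastelo\tcastelo\tNOUN\t_\tGender=Masc|Number=Sing\t0\troot\t_\t_\n"
    "3\tvelho\tvelho\tADJ\t_\tGender=Masc|Number=Sing\t2\tamod\t_\t_\n"
)


def read_words(text: str):
    """Yield (form, lemma, upos, gender) for every syntactic word in CoNLL-U text."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # sentence boundary or comment line
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        yield cols[1], cols[2], cols[3], feats.get("Gender")


def gender_summary(text: str) -> dict[str, tuple[int, int]]:
    """Return (with Gender, total) pairs for tokens, types (forms) and lemmas."""
    words = list(read_words(text))
    tagged = [w for w in words if w[3] is not None]
    return {
        "tokens": (len(tagged), len(words)),
        "types": (len({w[0] for w in tagged}), len({w[0] for w in words})),
        "lemmas": (len({w[1] for w in tagged}), len({w[1] for w in words})),
    }


def upos_counts(text: str) -> Counter:
    """Count tokens with a non-empty Gender per UPOS tag."""
    return Counter(upos for _, _, upos, gender in read_words(text) if gender is not None)


for name, (hit, total) in gender_summary(SAMPLE).items():
    print(f"{hit} {name} ({round(100 * hit / total)}%) have a non-empty value of Gender")
print(upos_counts(SAMPLE))
```

In the sample, six of the seven words carry Gender; only the copula é (lemma ser) does not.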
NOUN
38465 pt-pos/NOUN tokens (98% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (27296; 71%).
NOUN tokens may have the following values of Gender:
- Fem (17376; 45% of non-empty Gender): pessoas, parte, semana, empresa, cidade, forma, empresas, casa, vida, vez
- Masc (21013; 55% of non-empty Gender): anos, ano, presidente, milhões, dia, país, contos, tempo, grupo, dias
- Unsp (76; 0% of non-empty Gender): especialistas, representantes, jornalistas, habitantes, visitantes, Presidente, artistas, clientes, estudantes, projectistas
- EMPTY (938): partir, relação, causa, Estado, longo, parte, termos, vez, favor, contrário
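The per-value breakdowns in each part-of-speech section follow one pattern: tokens of a single UPOS are grouped by their Gender value, tokens without Gender are reported separately as EMPTY, and percentages are taken over the non-empty values only. Co-occurring feature values appear to be counted over the tokens that do carry Gender (for NOUN, 27296 of 38465 is about 71%). Below is a sketch of both computations over invented, already-parsed tokens; the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Invented (UPOS, parsed FEATS) pairs; not taken from the treebank.
TOKENS = [
    ("NOUN", {"Gender": "Fem", "Number": "Sing"}),
    ("NOUN", {"Gender": "Fem", "Number": "Plur"}),
    ("NOUN", {"Gender": "Masc", "Number": "Plur"}),
    ("NOUN", {"Gender": "Unsp", "Number": "Plur"}),
    ("NOUN", {}),  # no Gender: reported under EMPTY
    ("DET", {"Gender": "Masc", "Number": "Sing"}),
]


def gender_values(tokens, upos: str) -> Counter:
    """Count Gender values for one UPOS, using 'EMPTY' for tokens without Gender."""
    return Counter(feats.get("Gender", "EMPTY") for tag, feats in tokens if tag == upos)


def cooccurring(tokens, upos: str) -> Counter:
    """Count other Feature=Value pairs on tokens of this UPOS that carry Gender."""
    counts: Counter = Counter()
    for tag, feats in tokens:
        if tag == upos and "Gender" in feats:
            counts.update(f"{k}={v}" for k, v in feats.items() if k != "Gender")
    return counts


values = gender_values(TOKENS, "NOUN")
non_empty = sum(n for value, n in values.items() if value != "EMPTY")
for value, n in values.most_common():
    if value == "EMPTY":
        print(f"EMPTY ({n})")
    else:
        print(f"{value} ({n}; {round(100 * n / non_empty)}% of non-empty Gender)")
print(cooccurring(TOKENS, "NOUN").most_common(1))
```

Keeping EMPTY out of the denominator is why the Fem, Masc and Unsp shares in each section add up to roughly 100% even when many tokens lack Gender.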
Paradigm presidente | Masc | Fem | Unsp |
---|---|---|---|
Number=Sing | presidente | presidente | Presidente |
Number=Plur | presidentes |
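Each paradigm table groups the attested forms of one lemma by Gender (columns) and by the remaining features (rows, with _ when there are none). The sketch below builds such a table in the same pipe layout from invented tokens for the lemma presidente. This is an assumption about how the tables are assembled, not the generator's code, and the helper names paradigm and to_table are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

GENDERS = ["Masc", "Fem", "Unsp"]

# Invented (FORM, LEMMA, parsed FEATS) triples; not taken from the treebank.
TOKENS = [
    ("presidente", "presidente", {"Gender": "Masc", "Number": "Sing"}),
    ("presidente", "presidente", {"Gender": "Fem", "Number": "Sing"}),
    ("Presidente", "presidente", {"Gender": "Unsp", "Number": "Sing"}),
    ("presidentes", "presidente", {"Gender": "Masc", "Number": "Plur"}),
]


def paradigm(tokens, lemma: str) -> dict[tuple[str, str], set[str]]:
    """Map (other features, Gender) to the set of forms attested for one lemma."""
    cells: dict[tuple[str, str], set[str]] = defaultdict(set)
    for form, lem, feats in tokens:
        if lem != lemma or "Gender" not in feats:
            continue
        others = "|".join(f"{k}={v}" for k, v in sorted(feats.items()) if k != "Gender")
        cells[(others or "_", feats["Gender"])].add(form)
    return cells


def to_table(cells, lemma: str) -> str:
    """Render the cells as a pipe table with one column per Gender value."""
    lines = [f"Paradigm {lemma} | " + " | ".join(GENDERS) + " |", "---|" * (len(GENDERS) + 1)]
    for row in sorted({row for row, _ in cells}):
        forms = [", ".join(sorted(cells.get((row, g), set()))) for g in GENDERS]
        lines.append(f"{row} | " + " | ".join(forms) + " |")
    return "\n".join(lines)


print(to_table(paradigm(TOKENS, "presidente"), "presidente"))
```

Empty cells are written out explicitly here, so every row keeps one column per Gender value.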
Gender seems to be a lexical feature of NOUN. 97% of lemmas (6234) occur with only one value of Gender.
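The lexical-feature heuristic can be read as follows: among the lemmas of a given part of speech that carry Gender, what share is attested with only one Gender value. Below is a sketch of that check on invented data. It assumes the percentage is taken over lemmas that have Gender at least once; the statistics state neither the denominator nor a cut-off, so none is hard-coded. The function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Invented (LEMMA, UPOS, Gender) triples for tokens that carry Gender.
TOKENS = [
    ("casa", "NOUN", "Fem"),
    ("casa", "NOUN", "Fem"),
    ("castelo", "NOUN", "Masc"),
    ("presidente", "NOUN", "Masc"),
    ("presidente", "NOUN", "Fem"),
    ("o", "DET", "Masc"),
    ("o", "DET", "Fem"),
]


def single_gender_lemmas(tokens, upos: str) -> tuple[int, float]:
    """Return how many lemmas of this UPOS occur with exactly one Gender value, and their share."""
    genders_by_lemma: dict[str, set[str]] = defaultdict(set)
    for lemma, tag, gender in tokens:
        if tag == upos:
            genders_by_lemma[lemma].add(gender)
    if not genders_by_lemma:
        return 0, 0.0
    single = sum(1 for genders in genders_by_lemma.values() if len(genders) == 1)
    return single, single / len(genders_by_lemma)


print(single_gender_lemmas(TOKENS, "NOUN"))  # casa and castelo have one value, presidente has two
print(single_gender_lemmas(TOKENS, "DET"))   # the article o inflects for Gender
```

Most noun lemmas keep a single Gender, which makes the feature look lexical; the article o occurs as both Masc and Fem, as expected of an inflectional feature.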
DET
32137 pt-pos/DET tokens (96% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (28441; 88%), Number=Sing (25307; 79%), Definite=Def (25256; 79%).
DET tokens may have the following values of Gender:
- Fem (14092; 44% of non-empty Gender): a, as, uma, sua, esta, suas, essa, toda, outras, algumas
- Masc (18031; 56% of non-empty Gender): o, os, um, seu, este, seus, esse, todos, outros, outro
- Unsp (14; 0% of non-empty Gender): mais, cada, qual, qualquer, Que, tal
- EMPTY (1237): a, as, o, estas, pouco, uma
Paradigm muito | Masc | Fem | Unsp |
---|---|---|---|
Number=Sing | muito, mais | mais, muita | |
Number=Plur | muitos, mais | muitas, mais | mais |
Number=Unsp | mais |
PROPN
11173 pt-pos/PROPN tokens (62% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (10778; 96%).
PROPN tokens may have the following values of Gender:
- Fem (3533; 32% of non-empty Gender): Lisboa, Folha, Comissão, França, Alemanha, Espanha, Europa, Câmara, Rússia, Associação
- Masc (7357; 66% of non-empty Gender): São, Portugal, Brasil, José, Governo, EUA, Rio, Estados, João, PÚBLICO
- Unsp (283; 3% of non-empty Gender): Coimbra, Alvalade, Maastricht, Barcelos, Braga, Ermesinde, Aveiro, Drosnin, Frankfurt, Jacarta
- EMPTY (6954): Paulo, Nacional, Unidos, Silva, e, Porto, Henrique, Costa, Lisboa, República
Paradigm São | Masc | Fem | Unsp |
---|---|---|---|
_ | SÃO | ||
Number=Sing | São, SÃO | São | São |
Gender seems to be a lexical feature of PROPN. 94% of lemmas (4288) occur with only one value of Gender.
ADJ
11139 pt-pos/ADJ tokens (99% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (7947; 71%).
ADJ tokens may have the following values of Gender:
- Fem (4887; 44% of non-empty Gender): primeira, nova, maior, grande, última, mesma, boa, segunda, política, novas
- Masc (6193; 56% of non-empty Gender): primeiro, novo, mesmo, último, passado, segundo, últimos, bom, maior, grande
- Unsp (59; 1% of non-empty Gender): jovens, especial, melhor, capaz, grandes, inconvenientes, mole, 2., I, II
- EMPTY (146): municipal, Estrangeiros, eleitoral, pública, Nacional, civil, fria, regional, verde, geral
Paradigm grande | Masc | Fem | Unsp |
---|---|---|---|
Number=Sing | maior, grande, máximo | maior, grande, máxima | |
Number=Plur | grandes, maiores, máximos | grandes, maiores | grandes |
PRON
6900 pt-pos/PRON tokens (100% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (4848; 70%), Person=EMPTY (4610; 67%), Case=EMPTY (4483; 65%).
PRON tokens may have the following values of Gender:
- Fem (1774; 26% of non-empty Gender): que, se, ela, a, as, elas, lhe, esta, la, qual
- Masc (4456; 65% of non-empty Gender): que, se, o, ele, isso, tudo, eles, lhe, os, isto
- Unsp (670; 10% of non-empty Gender): se, quem, me, nos, eu, você, nós, que, lhe, mim
- EMPTY (25): si, nada, que, se
Paradigm que | Masc | Fem | Unsp |
---|---|---|---|
Definite=Def|Number=Sing|PronType=Art | que | ||
Number=Sing|PronType=Dem | que | ||
Number=Sing|PronType=Ind | que | que | |
Number=Sing|PronType=Int | que | que | que |
Number=Sing|PronType=Rel | que | que, qu | que |
Number=Plur|PronType=Ind | que | ||
Number=Plur|PronType=Int | que | que | |
Number=Plur|PronType=Rel | que | que | que |
Number=Unsp|PronType=Ind | que | ||
Number=Unsp|PronType=Rel | que | ||
PronType=Rel | que |
VERB
3435 pt-pos/VERB tokens (18% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Tense=EMPTY (3435; 100%), Person=EMPTY (3434; 100%), Mood=EMPTY (3434; 100%), VerbForm=Part (3432; 100%), Number=Sing (2254; 66%).
VERB tokens may have the following values of Gender:
- Fem (1373; 40% of non-empty Gender): feita, feitas, considerada, realizada, apresentada, criada, dada, passada, utilizada, marcada
- Masc (2062; 60% of non-empty Gender): feito, eleito, considerado, aberto, ligados, realizado, acusado, entregue, lançado, assinado
- EMPTY (15157): há, disse, tem, fazer, diz, ter, é, ver, fez, foi
Paradigm ter | Masc | Fem |
---|---|---|
Number=Sing | tido | |
Number=Sing|Voice=Pass | tido | tida |
Number=Plur | tidas |
SYM
384 pt-pos/SYM tokens (99% of all SYM tokens) have a non-empty value of Gender.
The most frequent other feature values with which SYM and Gender co-occurred: Number=Plur (376; 98%).
SYM tokens may have the following values of Gender:
- Masc (384; 100% of non-empty Gender): %, US$, R$, CR$
- EMPTY (3): -, %
NUM
137 pt-pos/NUM tokens (3% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=Mult (128; 93%).
NUM tokens may have the following values of Gender:
- Fem (1; 1% of non-empty Gender): meia
- Masc (136; 99% of non-empty Gender): cento, meia, dúzia
- EMPTY (3960): um, dois, três, mil, uma, duas, quatro, cinco, 15, 30
ADP
131 pt-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of Gender.
ADP tokens may have the following values of Gender:
- Fem (1; 1% of non-empty Gender): da
- Masc (130; 99% of non-empty Gender): por, como
- EMPTY (34841): de, em, a, por, para, com, como, entre, sobre, sem
ADV
25 pt-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADV and Gender co-occurred: Polarity=Neg (22; 88%).
ADV tokens may have the following values of Gender:
- Masc (25; 100% of non-empty Gender): não, bom, mais, mal
- EMPTY (8358): não, mais, já, ainda, também, ontem, só, como, quando, depois
AUX
22 pt-pos/AUX tokens (0% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: VerbForm=Part (22; 100%), Tense=EMPTY (22; 100%), Person=EMPTY (22; 100%), Mood=EMPTY (22; 100%), Number=Sing (16; 73%).
AUX tokens may have the following values of Gender:
- Fem (5; 23% of non-empty Gender): convertidas, discutidas, feridas, rejeitada, volta
- Masc (17; 77% of non-empty Gender): sido, Acabadinho, acabados, aceite, atualizados, deslocado, interpelado, perdoados, proibido
- EMPTY (6022): é, foi, ser, está, são, foram, vai, pode, era, ter
Gender seems to be a lexical feature of AUX. 100% of lemmas (13) occur with only one value of Gender.
X
12 pt-pos/X tokens (9% of all X tokens) have a non-empty value of Gender.
The most frequent other feature values with which X and Gender co-occurred: Number=Sing (11; 92%).
X tokens may have the following values of Gender:
- Fem (3; 25% of non-empty Gender): made, natura
- Masc (9; 75% of non-empty Gender): Dream, dolce, godfather, killer, line, primitive, prélude, search, serial
- EMPTY (119): in, pole, position, jet, shopping, body, center, centers, computing, drag
Gender seems to be a lexical feature of X. 100% of lemmas (11) occur with only one value of Gender.
INTJ
3 pt-pos/INTJ tokens (7% of all INTJ tokens) have a non-empty value of Gender.
INTJ tokens may have the following values of Gender:
- Fem (2; 67% of non-empty Gender): Obrigada, rua
- Masc (1; 33% of non-empty Gender): Adeus
- EMPTY (40): não, Rarará, é, Ah, Ai, Alô, BINGO, Deus, Droga, Hein
PART
1 pt-pos/PART token (20% of all PART tokens) has a non-empty value of Gender.
The most frequent other feature value with which PART and Gender co-occurred: Number=Sing (1; 100%).
PART tokens may have the following values of Gender:
- Masc (1; 100% of non-empty Gender): pós
- EMPTY (4): ex, anti, pré-
Relations with Agreement in Gender
The 10 most frequent relations where parent and child nodes agree in Gender:
- NOUN –[det]–> DET (25282; 95%)
- NOUN –[amod]–> ADJ (7987; 99%)
- PROPN –[det]–> DET (4201; 80%)
- NOUN –[acl]–> VERB (1497; 66%)
- NOUN –[conj]–> NOUN (1250; 61%)
- NOUN –[appos]–> PROPN (1129; 88%)
- PROPN –[conj]–> PROPN (753; 76%)
- VERB –[nsubj:pass]–> NOUN (618; 95%)
- ADJ –[det]–> DET (503; 92%)
- ADJ –[nsubj]–> NOUN (440; 96%)
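Each entry above counts dependency edges, written as parent –[relation]–> child, whose two nodes carry the same Gender value. The sketch below counts agreeing edges per (parent UPOS, relation, child UPOS) on two invented sentences. It assumes the percentage is taken over edges of that type where both nodes have Gender, which the statistics do not state explicitly; the data layout and the function name agreement are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Invented sentences; each word is (ID, UPOS, Gender or None, HEAD, DEPREL).
SENTENCES = [
    [  # a casa é nova
        (1, "DET", "Fem", 2, "det"),
        (2, "NOUN", "Fem", 4, "nsubj"),
        (3, "AUX", None, 4, "cop"),
        (4, "ADJ", "Fem", 0, "root"),
    ],
    [  # casas e castelos
        (1, "NOUN", "Fem", 0, "root"),
        (2, "CCONJ", None, 3, "cc"),
        (3, "NOUN", "Masc", 1, "conj"),
    ],
]


def agreement(sentences) -> tuple[Counter, Counter]:
    """Return (agreeing, comparable) edge counts keyed by (parent UPOS, relation, child UPOS)."""
    agree: Counter = Counter()
    total: Counter = Counter()
    for sentence in sentences:
        by_id = {word[0]: word for word in sentence}
        for _, upos, gender, head, deprel in sentence:
            parent = by_id.get(head)  # None for the root, whose HEAD is 0
            if parent is None or gender is None or parent[2] is None:
                continue  # compare only edges where both nodes carry Gender
            key = (parent[1], deprel, upos)
            total[key] += 1
            if parent[2] == gender:
                agree[key] += 1
    return agree, total


agree, total = agreement(SENTENCES)
for (parent, rel, child), n in agree.most_common():
    print(f"{parent} –[{rel}]–> {child} ({n}; {round(100 * n / total[parent, rel, child])}%)")
```

In the sample, the coordination casas e castelos is comparable but does not agree, which illustrates one reason why conj relations agree less often than det or amod.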
Treebank Statistics (UD_Portuguese-BR)
This feature is universal.
It occurs with 2 different values: Fem, Masc.
18999 tokens (7%) have a non-empty value of Gender.
4 types (0%) occur at least once with a non-empty value of Gender.
1 lemma (20%) occurs at least once with a non-empty value of Gender.
The feature is used with 1 part-of-speech tag (token count; share of all tokens in the treebank): pt-pos/DET (18999; 7%).
DET
18999 pt-pos/DET tokens (45% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (18999; 100%), Definite=Def (18999; 100%), Number=Sing (15867; 84%).
DET tokens may have the following values of Gender:
- Fem (8243; 43% of non-empty Gender): a, as
- Masc (10756; 57% of non-empty Gender): o, os
- EMPTY (23518): o, a, os, um, uma, as, sua, seu, seus, cada
Paradigm o | Masc | Fem |
---|---|---|
Number=Sing | o | a |
Number=Plur | os | as |