Gender: gender
Gender is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.
Masc: masculine gender
Nouns denoting male persons are masculine. Other nouns may also be grammatically masculine, without any relation to sex.
Examples
- castelo “castle”
Fem: feminine gender
Nouns denoting female persons are feminine. Other nouns may also be grammatically feminine, without any relation to sex.
Examples
- casa “house”
Unsp: unspecified
Unsp is used to tag words that can be either masculine or feminine when the context is not sufficient to determine their gender.
Examples
- você “you”
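In CoNLL-U files, features such as Gender are stored in the FEATS column as `Name=Value` pairs separated by `|`, with `_` marking an empty column. The following minimal sketch shows how the Gender value could be read from such a string. The helper names (`parse_feats`, `gender_of`) and the feature strings attached to the example words are illustrative assumptions, not taken from the treebank.

```python
from typing import Dict, Optional


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a CoNLL-U FEATS string into a dict; '_' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def gender_of(feats: str) -> Optional[str]:
    """Return the Gender value (Masc, Fem, Unsp) or None if it is empty."""
    return parse_feats(feats).get("Gender")


# Hypothetical FEATS strings for the example words above.
examples = {
    "castelo": "Gender=Masc|Number=Sing",
    "casa": "Gender=Fem|Number=Sing",
    "você": "Gender=Unsp|Number=Sing|PronType=Prs",
    "de": "_",
}
genders = {form: gender_of(feats) for form, feats in examples.items()}
```

A token whose FEATS column lacks Gender yields `None` here, which corresponds to the EMPTY rows in the statistics below.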
Treebank Statistics (UD_Portuguese)
This feature is universal, but the value Unsp is language-specific.
It occurs with 3 different values: Fem, Masc, Unsp.
103964 tokens (48%) have a non-empty value of Gender.
18368 types (73%) occur at least once with a non-empty value of Gender.
14111 lemmas (80%) occur at least once with a non-empty value of Gender.
The feature is used with 14 part-of-speech tags: pt-pos/NOUN (38465; 18% instances), pt-pos/DET (32137; 15% instances), pt-pos/PROPN (11173; 5% instances), pt-pos/ADJ (11139; 5% instances), pt-pos/PRON (6900; 3% instances), pt-pos/VERB (3435; 2% instances), pt-pos/SYM (384; 0% instances), pt-pos/NUM (137; 0% instances), pt-pos/ADP (131; 0% instances), pt-pos/ADV (25; 0% instances), pt-pos/AUX (22; 0% instances), pt-pos/X (12; 0% instances), pt-pos/INTJ (3; 0% instances), pt-pos/PART (1; 0% instances).
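Counts of this kind can be reproduced with a short script. The sketch below assumes that the first percentage is the share of all treebank tokens and that the per-tag sections report the share within each tag; the function name `gender_coverage` and the four sample tokens are hypothetical.

```python
from collections import Counter


def gender_coverage(tokens):
    """tokens: iterable of (upos, feats) pairs, feats being a FEATS string.

    Returns {upos: (count, share_of_all_tokens, share_of_that_upos)} for
    every tag that has at least one token with a non-empty Gender.
    """
    total = 0
    per_pos_total = Counter()
    per_pos_gender = Counter()
    for upos, feats in tokens:
        total += 1
        per_pos_total[upos] += 1
        if any(f.startswith("Gender=") for f in feats.split("|")):
            per_pos_gender[upos] += 1
    return {
        upos: (n, n / total, n / per_pos_total[upos])
        for upos, n in per_pos_gender.items()
    }


# Hypothetical four-token sample, not real treebank data.
sample = [
    ("DET", "Definite=Def|Gender=Fem|Number=Sing|PronType=Art"),
    ("NOUN", "Gender=Fem|Number=Sing"),
    ("VERB", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"),
    ("ADJ", "Gender=Fem|Number=Sing"),
]
coverage = gender_coverage(sample)
```

In this sample each of DET, NOUN and ADJ accounts for one of four tokens, and VERB does not appear in the result because its only token has no Gender.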
NOUN
38465 pt-pos/NOUN tokens (98% of all NOUN tokens) have a non-empty value of Gender.
The most frequent other feature values with which NOUN and Gender co-occurred: Number=Sing (27296; 71%).
NOUN tokens may have the following values of Gender:
- Fem (17376; 45% of non-empty Gender): pessoas, parte, semana, empresa, cidade, forma, empresas, casa, vida, vez
- Masc (21013; 55% of non-empty Gender): anos, ano, presidente, milhões, dia, país, contos, tempo, grupo, dias
- Unsp (76; 0% of non-empty Gender): especialistas, representantes, jornalistas, habitantes, visitantes, Presidente, artistas, clientes, estudantes, projectistas
- EMPTY (938): partir, relação, causa, Estado, longo, parte, termos, vez, favor, contrário
| Paradigm presidente | Masc | Fem | Unsp |
|---|---|---|---|
| Number=Sing | presidente | presidente | Presidente |
| Number=Plur | presidentes | | |
Gender seems to be a lexical feature of NOUN. 97% of lemmas (6234) occur with only one value of Gender.
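One way to check whether Gender behaves lexically is to collect the set of Gender values seen for each lemma and count the lemmas with exactly one value. The sketch below assumes that this is how the percentage is derived. The function name `single_gender_share` and the sample pairs are hypothetical.

```python
from collections import defaultdict


def single_gender_share(pairs):
    """pairs: iterable of (lemma, gender), gender being None when empty.

    Returns (lemmas_with_one_value, lemmas_with_any_value, share).
    """
    values = defaultdict(set)
    for lemma, gender in pairs:
        if gender is not None:
            values[lemma].add(gender)
    single = sum(1 for seen in values.values() if len(seen) == 1)
    share = single / len(values) if values else 0.0
    return single, len(values), share


# Hypothetical sample: "presidente" occurs with three values, the rest with one.
sample_pairs = [
    ("casa", "Fem"),
    ("casa", "Fem"),
    ("castelo", "Masc"),
    ("presidente", "Masc"),
    ("presidente", "Fem"),
    ("presidente", "Unsp"),
    ("partir", None),
]
single, total, share = single_gender_share(sample_pairs)
```

Here two of the three lemmas that carry Gender have a single value. Lemmas such as presidente, which take more than one value, are the ones that lower the share.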
DET
32137 pt-pos/DET tokens (96% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (28441; 88%), Number=Sing (25307; 79%), Definite=Def (25256; 79%).
DET tokens may have the following values of Gender:
- Fem (14092; 44% of non-empty Gender): a, as, uma, sua, esta, suas, essa, toda, outras, algumas
- Masc (18031; 56% of non-empty Gender): o, os, um, seu, este, seus, esse, todos, outros, outro
- Unsp (14; 0% of non-empty Gender): mais, cada, qual, qualquer, Que, tal
- EMPTY (1237): a, as, o, estas, pouco, uma
| Paradigm muito | Masc | Fem | Unsp |
|---|---|---|---|
| Number=Sing | muito, mais | mais, muita | |
| Number=Plur | muitos, mais | muitas, mais | mais |
| Number=Unsp | mais | | |
PROPN
11173 pt-pos/PROPN tokens (62% of all PROPN tokens) have a non-empty value of Gender.
The most frequent other feature values with which PROPN and Gender co-occurred: Number=Sing (10778; 96%).
PROPN tokens may have the following values of Gender:
- Fem (3533; 32% of non-empty Gender): Lisboa, Folha, Comissão, França, Alemanha, Espanha, Europa, Câmara, Rússia, Associação
- Masc (7357; 66% of non-empty Gender): São, Portugal, Brasil, José, Governo, EUA, Rio, Estados, João, PÚBLICO
- Unsp (283; 3% of non-empty Gender): Coimbra, Alvalade, Maastricht, Barcelos, Braga, Ermesinde, Aveiro, Drosnin, Frankfurt, Jacarta
- EMPTY (6954): Paulo, Nacional, Unidos, Silva, e, Porto, Henrique, Costa, Lisboa, República
| Paradigm São | Masc | Fem | Unsp |
|---|---|---|---|
| _ | SÃO | | |
| Number=Sing | São, SÃO | São | São |
Gender seems to be a lexical feature of PROPN. 94% of lemmas (4288) occur with only one value of Gender.
ADJ
11139 pt-pos/ADJ tokens (99% of all ADJ tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADJ and Gender co-occurred: Number=Sing (7947; 71%).
ADJ tokens may have the following values of Gender:
- Fem (4887; 44% of non-empty Gender): primeira, nova, maior, grande, última, mesma, boa, segunda, política, novas
- Masc (6193; 56% of non-empty Gender): primeiro, novo, mesmo, último, passado, segundo, últimos, bom, maior, grande
- Unsp (59; 1% of non-empty Gender): jovens, especial, melhor, capaz, grandes, inconvenientes, mole, 2., I, II
- EMPTY (146): municipal, Estrangeiros, eleitoral, pública, Nacional, civil, fria, regional, verde, geral
| Paradigm grande | Masc | Fem | Unsp |
|---|---|---|---|
| Number=Sing | maior, grande, máximo | maior, grande, máxima | |
| Number=Plur | grandes, maiores, máximos | grandes, maiores | grandes |
PRON
6900 pt-pos/PRON tokens (100% of all PRON tokens) have a non-empty value of Gender.
The most frequent other feature values with which PRON and Gender co-occurred: Number=Sing (4848; 70%), Person=EMPTY (4610; 67%), Case=EMPTY (4483; 65%).
PRON tokens may have the following values of Gender:
- Fem (1774; 26% of non-empty Gender): que, se, ela, a, as, elas, lhe, esta, la, qual
- Masc (4456; 65% of non-empty Gender): que, se, o, ele, isso, tudo, eles, lhe, os, isto
- Unsp (670; 10% of non-empty Gender): se, quem, me, nos, eu, você, nós, que, lhe, mim
- EMPTY (25): si, nada, que, se
| Paradigm que | Masc | Fem | Unsp |
|---|---|---|---|
| Definite=Def\|Number=Sing\|PronType=Art | que | | |
| Number=Sing\|PronType=Dem | que | | |
| Number=Sing\|PronType=Ind | que | que | |
| Number=Sing\|PronType=Int | que | que | que |
| Number=Sing\|PronType=Rel | que | que, qu | que |
| Number=Plur\|PronType=Ind | que | | |
| Number=Plur\|PronType=Int | que | que | |
| Number=Plur\|PronType=Rel | que | que | que |
| Number=Unsp\|PronType=Ind | que | | |
| Number=Unsp\|PronType=Rel | que | | |
| PronType=Rel | que | | |
VERB
3435 pt-pos/VERB tokens (18% of all VERB tokens) have a non-empty value of Gender.
The most frequent other feature values with which VERB and Gender co-occurred: Tense=EMPTY (3435; 100%), Person=EMPTY (3434; 100%), Mood=EMPTY (3434; 100%), VerbForm=Part (3432; 100%), Number=Sing (2254; 66%).
VERB tokens may have the following values of Gender:
- Fem (1373; 40% of non-empty Gender): feita, feitas, considerada, realizada, apresentada, criada, dada, passada, utilizada, marcada
- Masc (2062; 60% of non-empty Gender): feito, eleito, considerado, aberto, ligados, realizado, acusado, entregue, lançado, assinado
- EMPTY (15157): há, disse, tem, fazer, diz, ter, é, ver, fez, foi
| Paradigm ter | Masc | Fem |
|---|---|---|
| Number=Sing | tido | |
| Number=Sing\|Voice=Pass | tido | tida |
| Number=Plur | tidas |
SYM
384 pt-pos/SYM tokens (99% of all SYM tokens) have a non-empty value of Gender.
The most frequent other feature values with which SYM and Gender co-occurred: Number=Plur (376; 98%).
SYM tokens may have the following values of Gender:
- Masc (384; 100% of non-empty Gender): %, US$, R$, CR$
- EMPTY (3): -, %
NUM
137 pt-pos/NUM tokens (3% of all NUM tokens) have a non-empty value of Gender.
The most frequent other feature values with which NUM and Gender co-occurred: NumType=Mult (128; 93%).
NUM tokens may have the following values of Gender:
- Fem (1; 1% of non-empty Gender): meia
- Masc (136; 99% of non-empty Gender): cento, meia, dúzia
- EMPTY (3960): um, dois, três, mil, uma, duas, quatro, cinco, 15, 30
ADP
131 pt-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of Gender.
ADP tokens may have the following values of Gender:
- Fem (1; 1% of non-empty Gender): da
- Masc (130; 99% of non-empty Gender): por, como
- EMPTY (34841): de, em, a, por, para, com, como, entre, sobre, sem
ADV
25 pt-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of Gender.
The most frequent other feature values with which ADV and Gender co-occurred: Polarity=Neg (22; 88%).
ADV tokens may have the following values of Gender:
- Masc (25; 100% of non-empty Gender): não, bom, mais, mal
- EMPTY (8358): não, mais, já, ainda, também, ontem, só, como, quando, depois
AUX
22 pt-pos/AUX tokens (0% of all AUX tokens) have a non-empty value of Gender.
The most frequent other feature values with which AUX and Gender co-occurred: VerbForm=Part (22; 100%), Tense=EMPTY (22; 100%), Person=EMPTY (22; 100%), Mood=EMPTY (22; 100%), Number=Sing (16; 73%).
AUX tokens may have the following values of Gender:
- Fem (5; 23% of non-empty Gender): convertidas, discutidas, feridas, rejeitada, volta
- Masc (17; 77% of non-empty Gender): sido, Acabadinho, acabados, aceite, atualizados, deslocado, interpelado, perdoados, proibido
- EMPTY (6022): é, foi, ser, está, são, foram, vai, pode, era, ter

Gender seems to be a lexical feature of AUX. 100% of lemmas (13) occur with only one value of Gender.
X
12 pt-pos/X tokens (9% of all X tokens) have a non-empty value of Gender.
The most frequent other feature values with which X and Gender co-occurred: Number=Sing (11; 92%).
X tokens may have the following values of Gender:
- Fem (3; 25% of non-empty Gender): made, natura
- Masc (9; 75% of non-empty Gender): Dream, dolce, godfather, killer, line, primitive, prélude, search, serial
- EMPTY (119): in, pole, position, jet, shopping, body, center, centers, computing, drag

Gender seems to be a lexical feature of X. 100% of lemmas (11) occur with only one value of Gender.
INTJ
3 pt-pos/INTJ tokens (7% of all INTJ tokens) have a non-empty value of Gender.
INTJ tokens may have the following values of Gender:
- Fem (2; 67% of non-empty Gender): Obrigada, rua
- Masc (1; 33% of non-empty Gender): Adeus
- EMPTY (40): não, Rarará, é, Ah, Ai, Alô, BINGO, Deus, Droga, Hein
PART
1 pt-pos/PART token (20% of all PART tokens) has a non-empty value of Gender.
The most frequent other feature values with which PART and Gender co-occurred: Number=Sing (1; 100%).
PART tokens may have the following values of Gender:
- Masc (1; 100% of non-empty Gender): pós
- EMPTY (4): ex, anti, pré-
Relations with Agreement in Gender
The 10 most frequent relations where parent and child node agree in Gender:
NOUN –[det]–> DET (25282; 95%),
NOUN –[amod]–> ADJ (7987; 99%),
PROPN –[det]–> DET (4201; 80%),
NOUN –[acl]–> VERB (1497; 66%),
NOUN –[conj]–> NOUN (1250; 61%),
NOUN –[appos]–> PROPN (1129; 88%),
PROPN –[conj]–> PROPN (753; 76%),
VERB –[nsubj:pass]–> NOUN (618; 95%),
ADJ –[det]–> DET (503; 92%),
ADJ –[nsubj]–> NOUN (440; 96%).
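Agreement counts like these can be gathered by walking each dependency edge and comparing the Gender of parent and child. The sketch below assumes that the percentage is taken over edges where both nodes have a non-empty Gender; the page does not state the exact denominator. The token dictionaries, the `gender_agreement` function and the sample phrase are hypothetical.

```python
from collections import Counter


def gender_agreement(sentence):
    """sentence: list of dicts with keys id, head, upos, deprel, gender.

    Returns {(parent_upos, deprel, child_upos): (agreeing_edges, share)},
    counting only edges where both nodes have a non-empty Gender.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    agree, both = Counter(), Counter()
    for child in sentence:
        parent = by_id.get(child["head"])
        if parent is None:  # the root has head 0 and no parent token
            continue
        if parent["gender"] is None or child["gender"] is None:
            continue
        key = (parent["upos"], child["deprel"], child["upos"])
        both[key] += 1
        if parent["gender"] == child["gender"]:
            agree[key] += 1
    return {key: (agree[key], agree[key] / both[key]) for key in both}


# Hypothetical analysis of the phrase "a casa nova" ("the new house").
phrase = [
    {"id": 1, "head": 2, "upos": "DET", "deprel": "det", "gender": "Fem"},
    {"id": 2, "head": 0, "upos": "NOUN", "deprel": "root", "gender": "Fem"},
    {"id": 3, "head": 2, "upos": "ADJ", "deprel": "amod", "gender": "Fem"},
]
agreement = gender_agreement(phrase)
```

Both edges in this phrase agree, so NOUN –[det]–> DET and NOUN –[amod]–> ADJ each get one agreeing edge with a share of 1.0.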
Treebank Statistics (UD_Portuguese-BR)
This feature is universal.
It occurs with 2 different values: Fem, Masc.
18999 tokens (7%) have a non-empty value of Gender.
4 types (0%) occur at least once with a non-empty value of Gender.
1 lemma (20%) occurs at least once with a non-empty value of Gender.
The feature is used with 1 part-of-speech tag: pt-pos/DET (18999; 7% instances).
DET
18999 pt-pos/DET tokens (45% of all DET tokens) have a non-empty value of Gender.
The most frequent other feature values with which DET and Gender co-occurred: PronType=Art (18999; 100%), Definite=Def (18999; 100%), Number=Sing (15867; 84%).
DET tokens may have the following values of Gender:
- Fem (8243; 43% of non-empty Gender): a, as
- Masc (10756; 57% of non-empty Gender): o, os
- EMPTY (23518): o, a, os, um, uma, as, sua, seu, seus, cada
| Paradigm o | Masc | Fem |
|---|---|---|
| Number=Sing | o | a |
| Number=Plur | os | as |