Variant: alternative form of word
Values: Short
Sometimes there are multiple word forms for the same lemma and set of features. The Variant feature helps distinguish alternate forms.
In Czech there are two groups of words where double forms are regular and worth capturing: short forms of adjectives and short (clitic) forms of personal pronouns.
This feature only marks the non-standard short forms, hence there is only one value, Short. For the long standard forms the Variant feature remains unspecified.
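As a rough illustration of how this looks in annotated data, the sketch below reads a CoNLL-U FEATS string and checks whether it carries Variant=Short. The helper names `parse_feats` and `is_short_variant` are hypothetical, and the sample feature strings are illustrative rather than copied from a treebank.

```python
def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Dat|Variant=Short' into a dict.

    An underscore (or an empty string) means that no features are specified.
    """
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_short_variant(feats: str) -> bool:
    """True only when Variant=Short is present; long forms leave Variant unset."""
    return parse_feats(feats).get("Variant") == "Short"


# Illustrative feature strings (assumed, simplified):
short_pronoun = "Case=Dat|PronType=Prs|Variant=Short"   # e.g. a clitic such as "mu"
long_pronoun = "Case=Dat|PronType=Prs"                  # e.g. a long form such as "jemu"

print(is_short_variant(short_pronoun))
print(is_short_variant(long_pronoun))
```

Because the long forms simply omit the feature, a lookup with `.get("Variant")` returning `None` is the expected outcome for them, not an error.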
Short: short form of adjectives
The short form is called the nominal form of the adjective (jmenný tvar přídavného jména), as opposed to the long form, which is called pronominal because it originated as a combination of a nominal form and a personal pronoun. That, however, is ancient history of the language. In modern Czech, only a subset of the nominal forms survives, and using them sometimes sounds slightly archaic. They are used as nominal predicates with a copula, but they do not appear as premodifiers of nouns. The pronominal forms are considered standard, except for two frequent adjectives that lack them: třeba and rád.
Examples
- možno “possible”, schopen “able”, nutno “necessary”, znám “known”, spokojen “satisfied”, povinen “supposed to”, ochoten “willing”, jist “sure”, vědom “knowing”, přítomen “present”, roven “equal”, patrno “apparent”, hotov “finished”, spjat “connected”, vinen “guilty”
- Long equivalents: možné, schopný, nutné, známý, spokojený, povinný, ochotný, jistý, vědomý, přítomný, rovný, patrné, hotový, spjatý, vinný
Short: short (clitic) form of personal pronouns
Some personal pronouns in the dative and accusative Case have double forms. The normal (long) form is more independent in terms of the positions it can take in word order. The short forms are clitics (http://cs.wikipedia.org/wiki/P%C5%99%C3%ADklonka). They are separate words (unlike in some other languages), but in word order they usually stick to the second position.
- Short forms: mi, mě, ti, tě, mu, ho, si, se
- Long equivalents: mně, mne, tobě, tebe, jemu, jeho, sobě, sebe
- Glosses: “me, me, you, you, him, him, oneself, oneself”
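The pairing of short and long pronoun forms can be written down as a small lookup table. This is only a sketch: it assumes that the two lists above align position by position (mi with mně, mě with mne, and so on), and the names `SHORT_TO_LONG` and `long_equivalent` are hypothetical.

```python
# Forms taken from the lists above, assumed to correspond one to one by position.
SHORT_FORMS = ["mi", "mě", "ti", "tě", "mu", "ho", "si", "se"]
LONG_FORMS = ["mně", "mne", "tobě", "tebe", "jemu", "jeho", "sobě", "sebe"]

SHORT_TO_LONG = dict(zip(SHORT_FORMS, LONG_FORMS))


def long_equivalent(form: str):
    """Return the long counterpart of a short (clitic) form, or None otherwise."""
    return SHORT_TO_LONG.get(form.lower())


print(long_equivalent("mu"))    # jemu
print(long_equivalent("jemu"))  # None: already a long form
```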
Treebank Statistics (UD_Czech)
This feature is language-specific.
It occurs with 1 value: Short.
34163 tokens (3%) have a non-empty value of Variant.
3402 types (3%) occur at least once with a non-empty value of Variant.
1690 lemmas (3%) occur at least once with a non-empty value of Variant.
The feature is used with 2 part-of-speech tags: cs-pos/PRON (24036; 2% instances), cs-pos/ADJ (10127; 1% instances).
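Counts of this kind can be approximated by scanning the FEATS column of a CoNLL-U file and tallying tokens per part-of-speech tag. The following is a minimal sketch, not the script that produced the figures here: the function name `variant_counts_by_upos` is hypothetical, and the sample sentence is hand-made with simplified annotation.

```python
from collections import Counter


def variant_counts_by_upos(conllu_text: str) -> Counter:
    """Count tokens carrying Variant=Short, grouped by the UPOS column."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, upos, feats = cols[0], cols[3], cols[5]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword token ranges and empty nodes
        if "Variant=Short" in feats.split("|"):
            counts[upos] += 1
    return counts


# Hand-made sample (assumed, simplified): "Je schopen mu pomoci."
SAMPLE = "\n".join([
    "# text = Je schopen mu pomoci.",
    "1\tJe\tbýt\tAUX\t_\t_\t2\tcop\t_\t_",
    "2\tschopen\tschopný\tADJ\t_\tVariant=Short\t0\troot\t_\t_",
    "3\tmu\ton\tPRON\t_\tCase=Dat|Variant=Short\t4\tobj\t_\t_",
    "4\tpomoci\tpomoci\tVERB\t_\t_\t2\txcomp\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

print(variant_counts_by_upos(SAMPLE))
```

Dividing each count by the total number of tokens, or by the number of tokens with the same tag, would give percentages comparable to those reported in these sections.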
PRON
24036 cs-pos/PRON tokens (61% of all PRON tokens) have a non-empty value of Variant.
The most frequent other feature values with which PRON and Variant co-occurred: PronType=Prs (24036; 100%), PrepCase=EMPTY (24036; 100%), Gender=EMPTY (22928; 95%), Person=EMPTY (22237; 93%), Reflex=Yes (22237; 93%), Number=EMPTY (22237; 93%), Case=Acc (19681; 82%).
PRON tokens may have the following values of Variant:
Short (24036; 100% of non-empty Variant): se, si, mu, ho, mi, mě, tě, ti, sa
ADJ
10127 cs-pos/ADJ tokens (6% of all ADJ tokens) have a non-empty value of Variant.
The most frequent other feature values with which ADJ and Variant co-occurred: Degree=EMPTY (10127; 100%), Case=EMPTY (10109; 100%), Polarity=Pos (10010; 99%), Animacy=EMPTY (7735; 76%), Number=Sing (5493; 54%).
ADJ tokens may have the following values of Variant:
Short (10127; 100% of non-empty Variant): třeba, možno, rád, řečeno, schopen, nutno, schopni, známo, připraven, přesvědčen
Variant seems to be a lexical feature of ADJ. 100% of lemmas (1685) occur with only one value of Variant.
Relations with Agreement in Variant
The 10 most frequent relations where parent and child node agree in Variant:
ADJ –[conj]–> ADJ (387; 73%),
ADJ –[parataxis]–> ADJ (9; 64%),
ADJ –[appos]–> ADJ (9; 56%),
ADJ –[orphan]–> ADJ (5; 83%).
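A count of agreeing relations could be sketched as follows. This is an assumption-laden illustration: it takes "agree" to mean that parent and child both carry the same non-empty Variant value, it does not attempt to reproduce the percentages, and the names `variant_of` and `agreeing_relations` as well as the sample sentence are hypothetical.

```python
from collections import Counter


def variant_of(feats: str):
    """Return the value of Variant in a FEATS string, or None if it is absent."""
    for pair in feats.split("|"):
        if pair.startswith("Variant="):
            return pair.split("=", 1)[1]
    return None


def agreeing_relations(sentence_lines) -> Counter:
    """Count (parent UPOS, deprel, child UPOS) where both nodes share a Variant value."""
    rows = {}
    for line in sentence_lines:
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, multiword ranges and empty nodes
        rows[cols[0]] = (cols[3], variant_of(cols[5]), cols[6], cols[7])

    counts = Counter()
    for upos, variant, head, deprel in rows.values():
        parent = rows.get(head)  # None for the root, whose head is 0
        if parent is None or variant is None:
            continue
        if parent[1] == variant:
            counts[(parent[0], deprel, upos)] += 1
    return counts


# Hand-made sample (assumed, simplified): "Je ochoten a schopen."
SENTENCE = [
    "# text = Je ochoten a schopen.",
    "1\tJe\tbýt\tAUX\t_\t_\t2\tcop\t_\t_",
    "2\tochoten\tochotný\tADJ\t_\tVariant=Short\t0\troot\t_\t_",
    "3\ta\ta\tCCONJ\t_\t_\t4\tcc\t_\t_",
    "4\tschopen\tschopný\tADJ\t_\tVariant=Short\t2\tconj\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
]

print(agreeing_relations(SENTENCE))
```

In the sample, the only agreeing pair is the coordination of two short adjectives, which corresponds to the ADJ conj ADJ pattern that dominates the list above.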
Treebank Statistics (UD_Czech-CAC)
This feature is language-specific.
It occurs with 1 value: Short.
13953 tokens (3%) have a non-empty value of Variant.
1990 types (3%) occur at least once with a non-empty value of Variant.
1084 lemmas (4%) occur at least once with a non-empty value of Variant.
The feature is used with 2 part-of-speech tags: cs-pos/PRON (9000; 2% instances), cs-pos/ADJ (4953; 1% instances).
PRON
9000 cs-pos/PRON tokens (57% of all PRON tokens) have a non-empty value of Variant.
The most frequent other feature values with which PRON and Variant co-occurred: PrepCase=EMPTY (9000; 100%), PronType=Prs (9000; 100%), Gender=EMPTY (8699; 97%), Reflex=Yes (8525; 95%), Number=EMPTY (8524; 95%), Person=EMPTY (8524; 95%), Case=Acc (7760; 86%).
PRON tokens may have the following values of Variant:
Short (9000; 100% of non-empty Variant): se, si, mu, ho, mi, mě, ti, tě, mně, sis
ADJ
4953 cs-pos/ADJ tokens (7% of all ADJ tokens) have a non-empty value of Variant.
The most frequent other feature values with which ADJ and Variant co-occurred: Degree=EMPTY (4953; 100%), Case=EMPTY (4945; 100%), Polarity=Pos (4923; 99%), Animacy=EMPTY (3576; 72%).
ADJ tokens may have the following values of Variant:
Short (4953; 100% of non-empty Variant): možno, nutno, povinen, řečeno, dosaženo, známo, rád, rádi, věnována, dána
Variant seems to be a lexical feature of ADJ. 100% of lemmas (1080) occur with only one value of Variant.
Relations with Agreement in Variant
The 10 most frequent relations where parent and child node agree in Variant:
ADJ –[conj]–> ADJ (323; 80%),
ADJ –[orphan]–> ADJ (5; 71%),
ADJ –[appos]–> ADJ (4; 80%),
ADJ –[advmod]–> ADJ (1; 100%),
ADJ –[ccomp]–> ADJ (1; 100%).
Treebank Statistics (UD_Czech-CLTT)
This feature is language-specific.
It occurs with 1 value: Short.
373 tokens (1%) have a non-empty value of Variant.
12 types (0%) occur at least once with a non-empty value of Variant.
7 lemmas (0%) occur at least once with a non-empty value of Variant.
The feature is used with 2 part-of-speech tags: cs-pos/PRON (309; 1% instances), cs-pos/ADJ (64; 0% instances).
PRON
309 cs-pos/PRON tokens (38% of all PRON tokens) have a non-empty value of Variant.
The most frequent other feature values with which PRON and Variant co-occurred: Reflex=Yes (309; 100%), PronType=Prs (309; 100%), Gender=EMPTY (309; 100%), Case=Acc (309; 100%), Number=EMPTY (309; 100%).
PRON tokens may have the following values of Variant:
Short (309; 100% of non-empty Variant): se
ADJ
64 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of Variant.
The most frequent other feature values with which ADJ and Variant co-occurred: Case=EMPTY (64; 100%), Degree=EMPTY (64; 100%), Polarity=Pos (64; 100%), Gender=Fem,Masc (41; 64%), Animacy=Inan (41; 64%), Number=Plur (41; 64%).
ADJ tokens may have the following values of Variant:
Short (64; 100% of non-empty Variant): povinny, povinna, možno, znám, známa, známy, nutno, schopna, rovny, schopny
Variant in other languages: [cs] [da] [nl] [pl] [ro] [ru] [sl]