
This page pertains to UD version 2.

Variant: alternative form of word

Values: Short

Sometimes there are multiple word forms for the same lemma and set of features. The Variant feature helps distinguish alternate forms.

In Czech there are two groups of words where double forms are regular and worth capturing: short forms of adjectives and short (clitic) forms of personal pronouns. This feature only marks the non-standard short forms, hence there is only one value, Short. For the long standard forms the Variant feature remains unspecified.
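In annotated data, the marking described above would surface as a `Variant=Short` pair in a token's feature string. The following is a minimal sketch, assuming the standard CoNLL-U FEATS syntax (`Key=Value` pairs joined by `|`, with `_` for no features). The helper names `parse_feats` and `is_short` and the two feature strings are illustrative assumptions, not taken from a treebank.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Case=Acc|Variant=Short' into a dict."""
    if feats == "_":  # '_' means that no features are specified
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_short(feats: str) -> bool:
    """Return True if the token is marked as a short variant."""
    return parse_feats(feats).get("Variant") == "Short"


# Illustrative FEATS strings (assumed for this sketch).
short_clitic = "Case=Acc|PronType=Prs|Reflex=Yes|Variant=Short"
long_form = "Case=Acc|PronType=Prs|Reflex=Yes"

print(is_short(short_clitic), is_short(long_form))
```

Because the long standard forms leave Variant unspecified, the check is for the presence of `Variant=Short` rather than for a contrasting value.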

Short: short form of adjectives

The short form is called the nominal form of the adjective (jmenný tvar přídavného jména), as opposed to the long form, which is called pronominal because it originated as a combination of a nominal form and a personal pronoun. That origin is purely historical, though. In modern Czech, only a subset of the nominal forms survives, and using them sometimes sounds slightly archaic. They are used as nominal predicates with a copula, but they do not appear as premodifiers of nouns. The pronominal forms are considered standard, except for two frequent adjectives that have no pronominal forms: třeba, rád.

Examples

Short: short (clitic) form of personal pronouns

Some personal pronouns have double forms in the dative and accusative Case. The normal (long) form is freer in the positions it can take in the word order. The short forms are clitics (http://cs.wikipedia.org/wiki/P%C5%99%C3%ADklonka). They are separate words (unlike in some other languages), but in the word order they usually stick to the second position.


Treebank Statistics (UD_Czech)

This feature is language-specific. It occurs with one value: Short.

34163 tokens (3%) have a non-empty value of Variant. 3402 types (3%) occur at least once with a non-empty value of Variant. 1690 lemmas (3%) occur at least once with a non-empty value of Variant. The feature is used with 2 part-of-speech tags: cs-pos/PRON (24036; 2% of instances), cs-pos/ADJ (10127; 1% of instances).
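Token counts of this kind could be recomputed from a CoNLL-U file along the following lines. This is a sketch under stated assumptions: the function name `variant_counts` is hypothetical, and `SAMPLE` holds a few hand-made token rows (HEAD and DEPREL left as `_`), not actual treebank data.

```python
from collections import Counter

# Hand-made token rows in the 10-column CoNLL-U layout (assumed sample).
SAMPLE = """\
# text = hand-made sample
1\tse\tse\tPRON\t_\tCase=Acc|PronType=Prs|Reflex=Yes|Variant=Short\t_\t_\t_\t_
2\tsebe\tse\tPRON\t_\tCase=Acc|PronType=Prs|Reflex=Yes\t_\t_\t_\t_
3\trád\trád\tADJ\t_\tGender=Masc|Number=Sing|Polarity=Pos|Variant=Short\t_\t_\t_\t_
4\tmladý\tmladý\tADJ\t_\tCase=Nom|Degree=Pos|Gender=Masc|Number=Sing|Polarity=Pos\t_\t_\t_\t_
"""


def variant_counts(conllu: str):
    """Count, per UPOS tag, tokens with a non-empty Variant and all tokens."""
    with_variant, total = Counter(), Counter()
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        total[upos] += 1
        if any(f.startswith("Variant=") for f in feats.split("|")):
            with_variant[upos] += 1
    return with_variant, total


with_variant, total = variant_counts(SAMPLE)
for upos in sorted(total):
    share = 100 * with_variant[upos] / total[upos]
    print(f"{upos}: {with_variant[upos]} of {total[upos]} tokens ({share:.0f}%)")
```

The per-tag share printed here corresponds to figures such as "61% of all PRON tokens"; type and lemma counts would need the FORM and LEMMA columns as well.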

PRON

24036 cs-pos/PRON tokens (61% of all PRON tokens) have a non-empty value of Variant.

The most frequent other feature values with which PRON and Variant co-occurred: PronType=Prs (24036; 100%), PrepCase=EMPTY (24036; 100%), Gender=EMPTY (22928; 95%), Person=EMPTY (22237; 93%), Reflex=Yes (22237; 93%), Number=EMPTY (22237; 93%), Case=Acc (19681; 82%).

PRON tokens may have the following values of Variant: Short (24036; 100% of non-empty Variant).

ADJ

10127 cs-pos/ADJ tokens (6% of all ADJ tokens) have a non-empty value of Variant.

The most frequent other feature values with which ADJ and Variant co-occurred: Degree=EMPTY (10127; 100%), Case=EMPTY (10109; 100%), Polarity=Pos (10010; 99%), Animacy=EMPTY (7735; 76%), Number=Sing (5493; 54%).

ADJ tokens may have the following values of Variant: Short (10127; 100% of non-empty Variant).

Variant seems to be a lexical feature of ADJ. 100% of lemmas (1685) occur with only one value of Variant.

Relations with Agreement in Variant

The most frequent relations where parent and child node agree in Variant: ADJ –[conj]–> ADJ (387; 73%), ADJ –[parataxis]–> ADJ (9; 64%), ADJ –[appos]–> ADJ (9; 56%), ADJ –[orphan]–> ADJ (5; 83%).
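Agreement counts like these can be approximated by walking the dependency edges of each sentence and comparing the Variant value of child and parent. The sketch below is an assumption-laden illustration: `variant_of` and `agreeing_edges` are hypothetical helpers, the sentence is hand-made, and the denominator behind the published percentages is not specified here, so only the raw counts of agreeing edges are computed.

```python
from collections import Counter

# Hand-made sample sentence ("Byl zdráv a šťasten."), rows in CoNLL-U column order:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SHORT_ADJ = "Gender=Masc|Number=Sing|Polarity=Pos|Variant=Short"
ROWS = [
    ["1", "Byl", "být", "AUX", "_", "_", "2", "cop", "_", "_"],
    ["2", "zdráv", "zdravý", "ADJ", "_", SHORT_ADJ, "0", "root", "_", "_"],
    ["3", "a", "a", "CCONJ", "_", "_", "4", "cc", "_", "_"],
    ["4", "šťasten", "šťastný", "ADJ", "_", SHORT_ADJ, "2", "conj", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]


def variant_of(feats: str):
    """Return the value of Variant in a FEATS string, or None if it is unspecified."""
    for pair in feats.split("|"):
        key, _, value = pair.partition("=")
        if key == "Variant":
            return value
    return None


def agreeing_edges(rows):
    """Count (parent UPOS, relation, child UPOS) edges whose ends share a Variant value."""
    by_id = {row[0]: row for row in rows}
    agree = Counter()
    for child in rows:
        parent = by_id.get(child[6])
        if parent is None:
            continue  # the root has HEAD=0, which matches no token
        child_variant = variant_of(child[5])
        if child_variant is not None and child_variant == variant_of(parent[5]):
            agree[(parent[3], child[7], child[3])] += 1
    return agree


for (parent_pos, relation, child_pos), n in agreeing_edges(ROWS).items():
    print(f"{parent_pos} -[{relation}]-> {child_pos}: {n}")
```

Edges where either end leaves Variant unspecified are ignored, which is one plausible reading of "agree in Variant"; a different reading would change the counts.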


Treebank Statistics (UD_Czech-CAC)

This feature is language-specific. It occurs with one value: Short.

13953 tokens (3%) have a non-empty value of Variant. 1990 types (3%) occur at least once with a non-empty value of Variant. 1084 lemmas (4%) occur at least once with a non-empty value of Variant. The feature is used with 2 part-of-speech tags: cs-pos/PRON (9000; 2% of instances), cs-pos/ADJ (4953; 1% of instances).

PRON

9000 cs-pos/PRON tokens (57% of all PRON tokens) have a non-empty value of Variant.

The most frequent other feature values with which PRON and Variant co-occurred: PrepCase=EMPTY (9000; 100%), PronType=Prs (9000; 100%), Gender=EMPTY (8699; 97%), Reflex=Yes (8525; 95%), Number=EMPTY (8524; 95%), Person=EMPTY (8524; 95%), Case=Acc (7760; 86%).

PRON tokens may have the following values of Variant: Short (9000; 100% of non-empty Variant).

ADJ

4953 cs-pos/ADJ tokens (7% of all ADJ tokens) have a non-empty value of Variant.

The most frequent other feature values with which ADJ and Variant co-occurred: Degree=EMPTY (4953; 100%), Case=EMPTY (4945; 100%), Polarity=Pos (4923; 99%), Animacy=EMPTY (3576; 72%).

ADJ tokens may have the following values of Variant: Short (4953; 100% of non-empty Variant).

Variant seems to be a lexical feature of ADJ. 100% of lemmas (1080) occur with only one value of Variant.

Relations with Agreement in Variant

The most frequent relations where parent and child node agree in Variant: ADJ –[conj]–> ADJ (323; 80%), ADJ –[orphan]–> ADJ (5; 71%), ADJ –[appos]–> ADJ (4; 80%), ADJ –[advmod]–> ADJ (1; 100%), ADJ –[ccomp]–> ADJ (1; 100%).


Treebank Statistics (UD_Czech-CLTT)

This feature is language-specific. It occurs with one value: Short.

373 tokens (1%) have a non-empty value of Variant. 12 types (0%) occur at least once with a non-empty value of Variant. 7 lemmas (0%) occur at least once with a non-empty value of Variant. The feature is used with 2 part-of-speech tags: cs-pos/PRON (309; 1% of instances), cs-pos/ADJ (64; 0% of instances).

PRON

309 cs-pos/PRON tokens (38% of all PRON tokens) have a non-empty value of Variant.

The most frequent other feature values with which PRON and Variant co-occurred: Reflex=Yes (309; 100%), PronType=Prs (309; 100%), Gender=EMPTY (309; 100%), Case=Acc (309; 100%), Number=EMPTY (309; 100%).

PRON tokens may have the following values of Variant: Short (309; 100% of non-empty Variant).

ADJ

64 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of Variant.

The most frequent other feature values with which ADJ and Variant co-occurred: Case=EMPTY (64; 100%), Degree=EMPTY (64; 100%), Polarity=Pos (64; 100%), Gender=Fem,Masc (41; 64%), Animacy=Inan (41; 64%), Number=Plur (41; 64%).

ADJ tokens may have the following values of Variant: Short (64; 100% of non-empty Variant).


Variant in other languages: [cs] [da] [nl] [pl] [ro] [ru] [sl]