
This page pertains to UD version 2.

Foreign: is this a foreign word?

Values: Yes

Boolean feature. It marks a genuinely foreign word (not a loan word) that appears inside native text, e.g. in direct speech or in titles of books.

Note that Czech data (especially the data from the PDT) often indicate the original part of speech of foreign words, so this feature may occur with any POS tag. If the original part of speech is not known, the feature accompanies the cs-pos/X tag.
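As an illustration of how the feature appears in annotated data, the following minimal sketch reads the FEATS column of a CoNLL-U token line and tests for Foreign=Yes. The helper names `parse_feats` and `is_foreign` and the three token lines are hypothetical and made up for this example; they are not taken from the treebank.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Foreign=Yes|Gender=Masc' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_foreign(token_line: str) -> bool:
    """Return True if a 10-column CoNLL-U token line carries Foreign=Yes."""
    columns = token_line.rstrip("\n").split("\t")
    # Column 6 (index 5) of a CoNLL-U token line holds the morphological features.
    return parse_feats(columns[5]).get("Foreign") == "Yes"


# Hypothetical token lines, invented for illustration only.
known_pos = "1\tGeorge\tGeorge\tPROPN\t_\tForeign=Yes\t0\troot\t_\t_"
unknown_pos = "2\txyz\txyz\tX\t_\tForeign=Yes\t1\tflat:foreign\t_\t_"
native = "3\tkniha\tkniha\tNOUN\t_\tCase=Nom|Gender=Fem|Number=Sing\t1\tdep\t_\t_"

for line in (known_pos, unknown_pos, native):
    form, upos = line.split("\t")[1], line.split("\t")[3]
    print(form, upos, is_foreign(line))
```

The check is independent of the UPOS column, which mirrors the point above that the feature can combine with any part-of-speech tag.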

Yes: it is foreign

Examples

Diffs

Prague Dependency Treebank

For proper nouns the borderline between foreign words and loan words is somewhat fuzzy. For example, the English personal name George is marked as foreign even though it would not normally be translated (except for names of rulers and saints, which would become Jiří).

Articles in foreign names (the, die, le) are tagged cs-pos/ADJ, not cs-pos/DET.


Treebank Statistics (UD_Czech)

This feature is language-specific. It occurs with 1 value: Yes.

8256 tokens (1%) have a non-empty value of Foreign. 3352 types (3%) occur at least once with a non-empty value of Foreign. 3184 lemmas (6%) occur at least once with a non-empty value of Foreign. The feature is used with 13 part-of-speech tags: cs-pos/PROPN (3218; 0% instances), cs-pos/ADJ (2432; 0% instances), cs-pos/NOUN (1596; 0% instances), cs-pos/ADP (524; 0% instances), cs-pos/ADV (102; 0% instances), cs-pos/VERB (102; 0% instances), cs-pos/PART (100; 0% instances), cs-pos/CCONJ (72; 0% instances), cs-pos/PRON (56; 0% instances), cs-pos/NUM (26; 0% instances), cs-pos/DET (17; 0% instances), cs-pos/INTJ (6; 0% instances), cs-pos/SCONJ (5; 0% instances).
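Per-tag counts like the ones above could be recomputed along the following lines. This is only a sketch under stated assumptions: the function name `foreign_counts_by_upos` and the sample sentence are hypothetical, and the page does not document the tool that actually produced the statistics.

```python
from collections import Counter


def foreign_counts_by_upos(conllu_text: str) -> Counter:
    """Count tokens with Foreign=Yes per UPOS tag in CoNLL-U formatted text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        feats = cols[5]
        if feats != "_" and "Foreign=Yes" in feats.split("|"):
            counts[cols[3]] += 1
    return counts


# Hypothetical sample, invented for illustration only.
SAMPLE = "\n".join([
    "# text = hypothetical sample",
    "1\tThe\tthe\tADJ\t_\tForeign=Yes\t4\tnsubj\t_\t_",
    "2\tNew\tNew\tPROPN\t_\tForeign=Yes\t1\tflat:foreign\t_\t_",
    "3\tTimes\tTimes\tPROPN\t_\tForeign=Yes\t1\tflat:foreign\t_\t_",
    "4\tvyšly\tvyjít\tVERB\t_\tAspect=Perf\t0\troot\t_\t_",
])

counts = foreign_counts_by_upos(SAMPLE)
for upos, n in counts.most_common():
    print(upos, n)
```

In the sample the article is tagged ADJ rather than DET, following the convention for articles in foreign names described earlier on this page.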

PROPN

3218 cs-pos/PROPN tokens (4% of all PROPN tokens) have a non-empty value of Foreign.

The most frequent other feature values with which PROPN and Foreign co-occurred: Polarity=Pos (3218; 100%), Case=EMPTY (2514; 78%), Abbr=EMPTY (2374; 74%), NameType=Com (2178; 68%), Animacy=EMPTY (1933; 60%), Number=EMPTY (1875; 58%).

PROPN tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of PROPN. 100% of lemmas (1297) occur with only one value of Foreign.
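The page does not say how the "lexical feature" figure is computed. One possible reading, sketched below, is that a lemma counts as consistent when all of its tokens with the given tag carry the same value of Foreign, with a missing value treated as the separate value EMPTY (an assumption borrowed from the EMPTY notation used in these statistics). The function name `lemmas_with_single_value` and the sample triples are hypothetical.

```python
from collections import defaultdict


def lemmas_with_single_value(tokens, upos):
    """Return (consistent, marked) lemma counts for one UPOS tag.

    tokens: iterable of (lemma, upos, feats) triples, feats in CoNLL-U notation.
    marked: lemmas seen at least once with Foreign=Yes under the given tag.
    consistent: marked lemmas that never occur with any other value.
    Assumption: a token without the feature contributes the value "EMPTY".
    """
    values = defaultdict(set)
    for lemma, tag, feats in tokens:
        if tag != upos:
            continue
        feat_map = {} if feats == "_" else dict(p.split("=", 1) for p in feats.split("|"))
        values[lemma].add(feat_map.get("Foreign", "EMPTY"))
    marked = {lemma: vals for lemma, vals in values.items() if "Yes" in vals}
    consistent = [lemma for lemma, vals in marked.items() if len(vals) == 1]
    return len(consistent), len(marked)


# Hypothetical data, invented for illustration only.
sample_tokens = [
    ("George", "PROPN", "Foreign=Yes"),
    ("George", "PROPN", "Foreign=Yes|NameType=Giv"),
    ("Times", "PROPN", "Foreign=Yes"),
    ("Times", "PROPN", "_"),
    ("Praha", "PROPN", "Case=Nom"),
]

consistent, marked = lemmas_with_single_value(sample_tokens, "PROPN")
print(f"{consistent} of {marked} marked lemmas occur with only one value")
```

If only non-empty values were counted instead, a feature with the single value Yes would reach 100% trivially, so the choice of reading matters when interpreting the figure.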

ADJ

2432 cs-pos/ADJ tokens (1% of all ADJ tokens) have a non-empty value of Foreign.

The most frequent other feature values with which ADJ and Foreign co-occurred: Polarity=Pos (2428; 100%), Degree=Pos (2419; 99%), Animacy=EMPTY (2338; 96%), Case=EMPTY (2311; 95%), Number=EMPTY (2223; 91%), Gender=EMPTY (2215; 91%).

ADJ tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of ADJ. 100% of lemmas (927) occur with only one value of Foreign.

NOUN

1596 cs-pos/NOUN tokens (0% of all NOUN tokens) have a non-empty value of Foreign.

The most frequent other feature values with which NOUN and Foreign co-occurred: Polarity=Pos (1595; 100%), Case=EMPTY (1071; 67%), Animacy=EMPTY (886; 56%), Number=EMPTY (835; 52%).

NOUN tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of NOUN. 100% of lemmas (849) occur with only one value of Foreign.

ADP

524 cs-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of Foreign.

The most frequent other feature values with which ADP and Foreign co-occurred: AdpType=Prep (524; 100%), Case=EMPTY (303; 58%).

ADP tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of ADP. 100% of lemmas (53) occur with only one value of Foreign.

VERB

102 cs-pos/VERB tokens (0% of all VERB tokens) have a non-empty value of Foreign.

The most frequent other feature values with which VERB and Foreign co-occurred: Aspect=EMPTY (102; 100%), Polarity=Pos (99; 97%), Gender=EMPTY (95; 93%), Person=EMPTY (56; 55%), Mood=EMPTY (54; 53%), Tense=EMPTY (52; 51%), Voice=EMPTY (52; 51%).

VERB tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of VERB. 100% of lemmas (76) occur with only one value of Foreign.

ADV

102 cs-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of Foreign.

The most frequent other feature values with which ADV and Foreign co-occurred: PronType=EMPTY (100; 98%), Polarity=EMPTY (96; 94%), Degree=EMPTY (96; 94%).

ADV tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of ADV. 100% of lemmas (65) occur with only one value of Foreign.

PART

100 cs-pos/PART tokens (1% of all PART tokens) have a non-empty value of Foreign.

PART tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of PART. 100% of lemmas (23) occur with only one value of Foreign.

CCONJ

72 cs-pos/CCONJ tokens (0% of all CCONJ tokens) have a non-empty value of Foreign.

CCONJ tokens may have the following values of Foreign:

PRON

56 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of Foreign.

The most frequent other feature values with which PRON and Foreign co-occurred: PrepCase=EMPTY (56; 100%), Reflex=EMPTY (55; 98%), Variant=EMPTY (55; 98%), Gender=EMPTY (40; 71%), PronType=Prs (37; 66%), Number=Sing (30; 54%).

PRON tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of PRON. 100% of lemmas (23) occur with only one value of Foreign.

NUM

26 cs-pos/NUM tokens (0% of all NUM tokens) have a non-empty value of Foreign.

The most frequent other feature values with which NUM and Foreign co-occurred: NumType=Card (26; 100%), Gender=EMPTY (26; 100%), NumForm=Word (26; 100%), Case=EMPTY (23; 88%), NumValue=1,2,3 (22; 85%), Number=Plur (20; 77%).

NUM tokens may have the following values of Foreign:

DET

17 cs-pos/DET tokens (0% of all DET tokens) have a non-empty value of Foreign.

The most frequent other feature values with which DET and Foreign co-occurred: Animacy=EMPTY (16; 94%), Number[psor]=EMPTY (13; 76%), Case=EMPTY (12; 71%), Gender=EMPTY (12; 71%), Person=EMPTY (11; 65%), Poss=EMPTY (9; 53%).

DET tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of DET. 100% of lemmas (12) occur with only one value of Foreign.

INTJ

6 cs-pos/INTJ tokens (7% of all INTJ tokens) have a non-empty value of Foreign.

INTJ tokens may have the following values of Foreign:

SCONJ

5 cs-pos/SCONJ tokens (0% of all SCONJ tokens) have a non-empty value of Foreign.

SCONJ tokens may have the following values of Foreign:

Relations with Agreement in Foreign

The 10 most frequent relations where parent and child node agree in Foreign: PROPN –[flat:foreign]–> ADJ (846; 100%), NOUN –[flat:foreign]–> ADJ (525; 100%), PROPN –[flat:foreign]–> PROPN (240; 100%), NOUN –[flat:foreign]–> NOUN (143; 99%), ADJ –[flat:foreign]–> ADJ (126; 100%), NOUN –[flat:foreign]–> ADP (114; 100%), ADJ –[flat:foreign]–> PROPN (88; 100%), NOUN –[flat:foreign]–> PART (50; 100%), ADJ –[flat:foreign]–> NOUN (39; 100%), NOUN –[flat:foreign]–> PROPN (26; 87%).
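Agreement counts of this kind could be gathered as in the following sketch, which treats a relation as agreeing when parent and child both carry the same non-empty value of Foreign. That definition is an assumption, as are the function name `foreign_agreement` and the sample sentence; the sketch counts agreeing relations only and does not compute the percentages shown above.

```python
from collections import Counter


def foreign_agreement(sentence_lines):
    """Count (parent UPOS, deprel, child UPOS) triples that agree in Foreign.

    sentence_lines: the CoNLL-U token lines of a single sentence.
    Assumption: agreement means both nodes have the same non-empty value.
    """
    tokens = {}
    for line in sentence_lines:
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, multiword ranges, and empty nodes
        feats = {} if cols[5] == "_" else dict(p.split("=", 1) for p in cols[5].split("|"))
        # Store UPOS, Foreign value (or None), HEAD id, and DEPREL per token id.
        tokens[cols[0]] = (cols[3], feats.get("Foreign"), cols[6], cols[7])

    agree = Counter()
    for upos, foreign, head, deprel in tokens.values():
        parent = tokens.get(head)  # None for the root, whose HEAD is 0
        if parent is None or foreign is None:
            continue
        if parent[1] == foreign:
            agree[(parent[0], deprel, upos)] += 1
    return agree


# Hypothetical sentence, invented for illustration only.
sentence = [
    "1\tTimes\tTimes\tPROPN\t_\tForeign=Yes\t4\tnsubj\t_\t_",
    "2\tof\tof\tADP\t_\tForeign=Yes\t1\tflat:foreign\t_\t_",
    "3\tLondon\tLondon\tPROPN\t_\tForeign=Yes\t1\tflat:foreign\t_\t_",
    "4\tvyšly\tvyjít\tVERB\t_\tAspect=Perf\t0\troot\t_\t_",
]

agreement = foreign_agreement(sentence)
for (parent_upos, deprel, child_upos), n in agreement.items():
    print(f"{parent_upos} -[{deprel}]-> {child_upos}: {n}")
```

The nsubj relation in the sample is not counted because the parent verb has no value of Foreign, which is why flat:foreign dominates lists like the one above.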


Treebank Statistics (UD_Czech-CAC)

This feature is language-specific. It occurs with 1 value: Yes.

519 tokens (0%) have a non-empty value of Foreign. 383 types (1%) occur at least once with a non-empty value of Foreign. 372 lemmas (1%) occur at least once with a non-empty value of Foreign. The feature is used with 10 part-of-speech tags: cs-pos/NOUN (255; 0% instances), cs-pos/ADJ (117; 0% instances), cs-pos/ADP (63; 0% instances), cs-pos/PROPN (37; 0% instances), cs-pos/PART (13; 0% instances), cs-pos/ADV (12; 0% instances), cs-pos/PRON (7; 0% instances), cs-pos/VERB (7; 0% instances), cs-pos/DET (5; 0% instances), cs-pos/CCONJ (3; 0% instances).

NOUN

255 cs-pos/NOUN tokens (0% of all NOUN tokens) have a non-empty value of Foreign.

The most frequent other feature values with which NOUN and Foreign co-occurred: Polarity=Pos (255; 100%), Animacy=EMPTY (176; 69%).

NOUN tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of NOUN. 100% of lemmas (202) occur with only one value of Foreign.

ADJ

117 cs-pos/ADJ tokens (0% of all ADJ tokens) have a non-empty value of Foreign.

The most frequent other feature values with which ADJ and Foreign co-occurred: Polarity=Pos (117; 100%), Degree=Pos (113; 97%), Animacy=EMPTY (104; 89%), Case=EMPTY (83; 71%), Number=EMPTY (80; 68%), Gender=EMPTY (77; 66%).

ADJ tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of ADJ. 100% of lemmas (96) occur with only one value of Foreign.

ADP

63 cs-pos/ADP tokens (0% of all ADP tokens) have a non-empty value of Foreign.

The most frequent other feature values with which ADP and Foreign co-occurred: AdpType=Prep (63; 100%).

ADP tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of ADP. 100% of lemmas (20) occur with only one value of Foreign.

PROPN

37 cs-pos/PROPN tokens (0% of all PROPN tokens) have a non-empty value of Foreign.

The most frequent other feature values with which PROPN and Foreign co-occurred: Polarity=Pos (37; 100%), Abbr=EMPTY (36; 97%), Case=EMPTY (30; 81%), Number=EMPTY (24; 65%), Animacy=EMPTY (24; 65%).

PROPN tokens may have the following values of Foreign:

Foreign seems to be a lexical feature of PROPN. 100% of lemmas (32) occur with only one value of Foreign.

PART

13 cs-pos/PART tokens (0% of all PART tokens) have a non-empty value of Foreign.

PART tokens may have the following values of Foreign:

ADV

12 cs-pos/ADV tokens (0% of all ADV tokens) have a non-empty value of Foreign.

The most frequent other feature values with which ADV and Foreign co-occurred: Polarity=EMPTY (12; 100%), Degree=EMPTY (12; 100%), PronType=EMPTY (12; 100%).

ADV tokens may have the following values of Foreign:

PRON

7 cs-pos/PRON tokens (0% of all PRON tokens) have a non-empty value of Foreign.

The most frequent other feature values with which PRON and Foreign co-occurred: PrepCase=EMPTY (7; 100%), Variant=EMPTY (7; 100%), Reflex=EMPTY (7; 100%), Number=Sing (5; 71%), Person=3 (4; 57%), Case=Loc (4; 57%), PronType=Prs (4; 57%).

PRON tokens may have the following values of Foreign:

VERB

7 cs-pos/VERB tokens (0% of all VERB tokens) have a non-empty value of Foreign.

The most frequent other feature values with which VERB and Foreign co-occurred: Gender=EMPTY (7; 100%), Polarity=Pos (7; 100%), Aspect=EMPTY (7; 100%), Voice=EMPTY (4; 57%), VerbForm=Inf (4; 57%), Number=EMPTY (4; 57%), Person=EMPTY (4; 57%), Mood=EMPTY (4; 57%), Tense=EMPTY (4; 57%).

VERB tokens may have the following values of Foreign:

DET

5 cs-pos/DET tokens (0% of all DET tokens) have a non-empty value of Foreign.

The most frequent other feature values with which DET and Foreign co-occurred: Animacy=EMPTY (5; 100%), Number[psor]=EMPTY (5; 100%), Person=EMPTY (5; 100%), Number=Sing (4; 80%), Poss=Yes (3; 60%), Case=Gen (3; 60%), Gender=Masc,Neut (3; 60%), PronType=Prs (3; 60%).

DET tokens may have the following values of Foreign:

CCONJ

3 cs-pos/CCONJ tokens (0% of all CCONJ tokens) have a non-empty value of Foreign.

CCONJ tokens may have the following values of Foreign:

Relations with Agreement in Foreign

The 10 most frequent relations where parent and child node agree in Foreign: NOUN –[flat:foreign]–> ADJ (35; 100%), NOUN –[conj]–> NOUN (23; 59%), NOUN –[flat:foreign]–> ADP (17; 100%), ADJ –[flat:foreign]–> NOUN (14; 100%), NOUN –[flat:foreign]–> NOUN (14; 82%), NOUN –[case]–> ADP (11; 52%), PROPN –[flat:foreign]–> ADJ (7; 100%), ADJ –[conj]–> ADJ (7; 88%), NOUN –[flat:foreign]–> PART (5; 100%), PROPN –[flat:foreign]–> PROPN (5; 100%).


Foreign in other languages: [ar] [cs] [da] [de] [es] [et] [fi] [fo] [hi] [nl] [sl] [u]