The Corpus Query Language (CQL) is a code used to set criteria for complex searches that cannot be carried out using the standard user interface controls. The criteria may include not only words or lemmas but also tags, text types and other attributes. Logical operators (AND/OR/NOT) can be used.
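As a rough illustration of combining criteria on several attributes with logical operators, here is a minimal Python sketch. It is not CQL and not part of any corpus tool: the token list, the tags and the `matches` helper are all hypothetical. In CQL itself, a comparable condition would look roughly like `[lemma="go" & tag="V.*" & word!="going"]`, with the exact tag values depending on the tagset of the corpus.

```python
import re

# Hypothetical hand-annotated tokens: (word form, lemma, POS tag).
TOKENS = [
    ("goes", "go", "VBZ"),
    ("went", "go", "VBD"),
    ("going", "go", "VBG"),
    ("go", "go", "NN"),
    ("walks", "walk", "VBZ"),
]

def matches(token):
    """True if lemma is "go" AND the tag starts with V AND NOT word is "going"."""
    word, lemma, tag = token
    return (
        lemma == "go"
        and re.fullmatch(r"V.*", tag) is not None
        and not word == "going"
    )

# Keep the word forms of all tokens that satisfy the combined criteria.
hits = [token[0] for token in TOKENS if matches(token)]
print(hits)
```

The noun reading of go is excluded by the tag condition, and going is excluded by the NOT condition, so only two of the verb forms remain.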
(collocation) the central word in a collocation, e.g. strong wind consists of the collocate strong and the node wind
(concordance) the search word or phrase, sometimes called a query, which appears in the centre of a KWIC concordance or is highlighted in other types of concordances
a large collection of texts used for studying language. A corpus is usually annotated (= words are labelled with information about the part of speech and grammatical category). The terms corpus, text corpus and language corpus are interchangeable. Using a corpus for any type of linguistic or language-oriented work ensures the outcomes reflect the real use of the language.
(also called morphological tag or POS tag) a label assigned to each token in an annotated corpus to indicate the part of speech and grammatical category. The tool used to annotate a corpus is called a tagger. A collection of tags used in a corpus is called a tagset.
a text type is a term used when talking about text corpora which refers to values assigned to structures (e.g. documents, paragraphs, sentences and others) inside a corpus. Text types are sometimes called metadata or headers. Text types can refer to the source (newspaper, book etc.), medium (spoken, written), time (year, century) or any other type of information about the text. Not all corpora have documents annotated for text types. Corpora can be divided into subcorpora based on text types, and searches and other analyses can then be restricted to texts belonging to the selected text type.
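The idea of a subcorpus selected by text type can be sketched in a few lines of Python. This is only an illustration under assumed data: the documents, the attribute names (`source`, `medium`, `year`) and the `subcorpus` helper are hypothetical and do not reflect how any particular corpus tool stores its metadata.

```python
# Hypothetical documents with text-type values attached as metadata.
DOCUMENTS = [
    {"source": "newspaper", "medium": "written", "year": 2015, "text": "Prices rose again."},
    {"source": "book", "medium": "written", "year": 1999, "text": "It was a dark night."},
    {"source": "interview", "medium": "spoken", "year": 2015, "text": "Well, you know."},
]

def subcorpus(documents, **text_types):
    """Return the documents whose metadata matches every given text type."""
    return [
        doc for doc in documents
        if all(doc.get(name) == value for name, value in text_types.items())
    ]

written = subcorpus(DOCUMENTS, medium="written")
written_2015 = subcorpus(DOCUMENTS, medium="written", year=2015)
print(len(written), [doc["source"] for doc in written_2015])
```

Any search or count run on `written` instead of `DOCUMENTS` is then limited to the written texts only.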
A token is the smallest unit into which a corpus is divided. Typically each word form and each punctuation mark (comma, dot, …) is a separate token (but don’t in English consists of 2 tokens). Therefore, corpora contain more tokens than words. Spaces between words are not tokens. A text is divided into tokens by a tool called a tokenizer, which is often specific to each language.
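To make the difference between tokens and words concrete, here is a toy tokenizer in Python. It is a simplified sketch, not the tokenizer used for any real corpus: the regular expression and the decision to split don't into do + n't are assumptions made for illustration, and real tokenizers handle many more cases.

```python
import re

# Toy pattern (an assumption, not a real tokenizer), tried left to right:
#   1. the clitic n't (straight or curly apostrophe)
#   2. the part of a word that comes right before n't
#   3. any other run of word characters
#   4. a single punctuation mark
TOKEN_RE = re.compile(r"n['’]t|\w+?(?=n['’]t)|\w+|[^\w\s]")

def tokenize(text):
    """Split text into tokens; whitespace is skipped and never becomes a token."""
    return TOKEN_RE.findall(text)

sentence = "I don't know, really."
tokens = tokenize(sentence)
words = sentence.split()  # whitespace-separated words, for comparison

print(tokens)
print(len(tokens), "tokens vs", len(words), "words")
```

In this sketch the sentence has 4 whitespace-separated words but 7 tokens, because the two punctuation marks and the two parts of don't are each counted separately.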
A lemma is the basic form of a word, typically the form found in dictionaries. Searching for a lemma also returns all forms of the word, e.g. searching for the lemma go will find go, goes, went, going, gone. Lemmas are case sensitive: go and Go are two different lemmas. See also lemma-lc or compare with word form.
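A lemma search can be pictured as a lookup over tokens that have already been annotated with their lemma. The Python sketch below assumes a tiny hand-made list of (word form, lemma) pairs; the data and the `forms_for_lemma` helper are hypothetical, and Go is treated here as a separate lemma (for instance the name of the board game) purely to show case sensitivity.

```python
# Hypothetical annotated tokens: (word form, lemma).
TOKENS = [
    ("go", "go"),
    ("goes", "go"),
    ("went", "go"),
    ("going", "go"),
    ("gone", "go"),
    ("Go", "Go"),       # capitalised lemma, kept distinct from "go"
    ("walked", "walk"),
]

def forms_for_lemma(tokens, lemma):
    """Return every word form whose lemma equals the given lemma (case sensitive)."""
    return [word for word, token_lemma in tokens if token_lemma == lemma]

print(forms_for_lemma(TOKENS, "go"))
print(forms_for_lemma(TOKENS, "Go"))
```

The comparison uses plain string equality, so the lemma go does not match the lemma Go.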