Lemma (morphology)

In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form,[1] dictionary form, or citation form of a set of words (headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma by which they are indexed. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish and Russian. The process of determining the lemma for a given word is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
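At its simplest, lemmatisation can be sketched as a lookup from each word form to its lemma. The Python below is a minimal illustration that assumes a small hand-written lexicon. The names `LEXEMES` and `lemmatise` are hypothetical. Practical lemmatisers, such as NLTK's WordNet-based lemmatiser, combine a much larger lexicon with morphological rules and part-of-speech information.

```python
# A toy lexicon (hypothetical): each lemma (citation form) maps to the word
# forms of its lexeme. A real lemmatiser needs a full lexicon and usually
# part-of-speech information as well.
LEXEMES = {
    "run": ["run", "runs", "ran", "running"],
    "mouse": ["mouse", "mice"],
}

# Invert the table so that every inflected form points back to its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in LEXEMES.items() for form in forms
}


def lemmatise(word: str) -> str:
    """Return the lemma of `word`, or `word` unchanged if it is not in the lexicon."""
    return FORM_TO_LEMMA.get(word.lower(), word)


print([lemmatise(w) for w in ["Ran", "running", "mice", "cat"]])
# ['run', 'run', 'mouse', 'cat']
```

Words missing from the lexicon are returned unchanged. This is a common fallback, but it means that unknown irregular forms are left unlemmatised.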

Morphology

The form of a word that is chosen to serve as the lemma is usually the least marked form, although there are exceptions. In several languages, for example, the infinitive is used as the lemma for verbs.

For English, the citation form of a noun is the singular: mouse rather than mice. For multiword lexemes that contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun one: do one's best, perjure oneself. In European languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular. If the language also has cases, the citation form is often the masculine singular nominative.

For many languages, the citation form of a verb is the infinitive: French aller, German gehen, Spanish ir. For English, that usually coincides with the uninflected, least marked form of the verb (that is, "run", not "runs" or "running"), but the present tense is used for some defective verbs (shall, can, and must have only the one form). For Latin, Ancient Greek, and Modern Greek, however, the first-person singular present tense is traditionally used, although some modern dictionaries use the infinitive instead. (For contracted verbs in Ancient Greek, an uncontracted first-person singular present tense is used to reveal the contract vowel: φιλέω philéō for φιλῶ philō "I love" [implying affection]; ἀγαπάω agapáō for ἀγαπῶ agapō "I love" [implying regard].) Finnish dictionaries list verbs not under their root but under the first infinitive, marked with -(t)a, -(t)ä.

For Japanese, the non-past (present and future) tense is used. For Arabic, which has no infinitives, the third-person masculine singular of the past tense is the least-marked form and is used for entries in modern dictionaries. Older dictionaries, which are still commonly used, instead list each word, whether a verb or a noun, under its triliteral root. Hebrew often uses the third-person masculine singular perfect, e.g., ברא bara' "create", כפר kaphar "deny". Georgian uses the verbal noun. For Korean, verbs are cited with -da attached to the stem.

In Irish, words are highly inflected: they change for case (nominative, genitive, dative and vocative) and, through initial mutations, according to their place within a sentence. The noun cainteoir, the lemma for the noun meaning "speaker", has a variety of forms: chainteoir, gcainteoir, cainteora, chainteora, cainteoirí, chainteoirí and gcainteoirí.
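One way for an Irish lemmatiser to reach cainteoir from these forms is to undo the initial mutation first and only then consult a dictionary. The Python below is a simplified sketch of that approach. It handles only the standard consonant mutations: eclipsis, in which c becomes gc, and lenition, which is written as an added h. It uses a hypothetical mini-lexicon built from the forms listed above, and it ignores vowel-initial mutations and other exceptions.

```python
# Simplified sketch: undo an Irish initial mutation, then consult a
# (hypothetical) form-to-lemma table. Vowel-initial mutations (n-, t-, h-)
# and irregular cases are deliberately not handled.

# Eclipsis prefixes mapped to the radical (unmutated) initial consonant.
# These are checked before lenition, so "bhfear" is read as eclipsed "fear".
ECLIPSIS = {"bhf": "f", "mb": "b", "gc": "c", "nd": "d", "ng": "g", "bp": "p", "dt": "t"}

# Consonants whose lenition is written as an "h" after the initial letter.
LENITABLE = set("bcdfgmpst")

# Hypothetical mini-lexicon built from the forms of cainteoir "speaker".
FORM_TO_LEMMA = {
    "cainteoir": "cainteoir",
    "cainteora": "cainteoir",
    "cainteoirí": "cainteoir",
}


def demutate(word: str) -> str:
    """Remove eclipsis or lenition from the start of a lower-case word."""
    for prefix, radical in ECLIPSIS.items():
        if word.startswith(prefix):
            return radical + word[len(prefix):]
    if len(word) > 1 and word[0] in LENITABLE and word[1] == "h":
        return word[0] + word[2:]
    return word


def irish_lemma(word: str) -> str:
    """Return the lemma for a known form, else the demutated form itself."""
    base = demutate(word.lower())
    return FORM_TO_LEMMA.get(base, base)


forms = ["chainteoir", "gcainteoir", "cainteora", "chainteora",
         "cainteoirí", "chainteoirí", "gcainteoirí"]
print({irish_lemma(f) for f in forms})  # {'cainteoir'}
```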

Some phrases are cited in a sort of lemma: Carthago delenda est (literally, "Carthage must be destroyed") is a common way of citing Cato, but what he said was nearer to censeo Carthaginem esse delendam ("I hold Carthage to be in need of destruction").

Lexicography

In a dictionary, the lemma "go" represents the inflected forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The drawback of this simplification is that a declined or conjugated form cannot be looked up directly, although some dictionaries, such as Webster's Dictionary, do list "went". Bilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list ging (< gehen), but the Cassell does.
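The difference between a headword list that contains only lemmas and one that also carries cross-references for irregular forms can be sketched as follows. This is a hypothetical Python illustration and does not model any particular dictionary's data. The entry text is a placeholder, and the names `ENTRIES`, `CROSS_REFERENCES` and `look_up` are invented for the example.

```python
from typing import Optional

# Placeholder entry text; a real dictionary entry would go here.
ENTRIES = {"go": "[entry for go]"}

# Hypothetical cross-reference entries for irregular forms, in the style of a
# dictionary that also lists "went".
CROSS_REFERENCES = {"went": "go"}


def look_up(word: str, with_cross_refs: bool) -> Optional[str]:
    """Return the entry text for `word`, or None if the headword list lacks it."""
    if word in ENTRIES:
        return ENTRIES[word]
    if with_cross_refs and word in CROSS_REFERENCES:
        lemma = CROSS_REFERENCES[word]
        # Point the reader to the lemma using the "form < lemma" notation.
        return f'"{word}" < "{lemma}"; see {lemma}'
    return None


print(look_up("went", with_cross_refs=False))  # None
print(look_up("went", with_cross_refs=True))   # "went" < "go"; see go
```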

Lemmas or word stems are often used in corpus linguistics for determining word frequency. In that usage, the exact definition of "lemma" varies with the task.
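The effect of lemmatisation on frequency counts can be shown with a short Python sketch. It counts the same tokens twice, once by surface form and once by lemma. The form-to-lemma table is hypothetical and hand-written. A real study would use a lemmatiser suited to its corpus and task.

```python
from collections import Counter

# Hypothetical form-to-lemma table; words not listed map to themselves.
FORM_TO_LEMMA = {"goes": "go", "going": "go", "went": "go", "gone": "go"}

tokens = "she goes home he went home they are going home".split()

form_counts = Counter(tokens)
lemma_counts = Counter(FORM_TO_LEMMA.get(t, t) for t in tokens)

# "went" occurs once as a form, but the lemma "go" covers goes/went/going.
print(form_counts["went"], lemma_counts["go"])  # 1 3
```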

Pronunciation

A word may have different pronunciations, depending on its phonetic environment (the neighbouring sounds) or on the degree of stress in a sentence. An example of the latter is the weak and strong forms of certain English function words like some and but (pronounced /sʌm/, /bʌt/ when stressed but /s(ə)m/, /bət/ when unstressed). Dictionaries usually give the pronunciation used when the word is pronounced alone (its isolation form) and with stress, but they may also note common weak forms of pronunciation.

Difference between stem and lemma

The stem is the part of the word that never changes, even when the word is morphologically inflected; the lemma is the base form of the word. For example, for "produced", the lemma is "produce", but the stem is "produc-", because there are related forms such as "production" and "producing".[2] In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed. When phonology is taken into account, defining the stem as the unchangeable part of the word is not useful, as the phonological forms of the words in the preceding example show: "produced" /prəˈdjuːst/ vs. "production" /prəˈdʌkʃən/.
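Taken literally, "the part that never changes" is the longest prefix shared by all written forms. The Python standard-library function `os.path.commonprefix` computes this prefix character by character. The following sketch applies it to the forms in the example above. It works on spelling only, which is exactly the limitation that the phonological forms expose.

```python
import os.path

forms = ["produce", "produced", "producing", "production"]

# os.path.commonprefix compares plain strings character by character, so it
# returns the longest written prefix common to every form.
stem = os.path.commonprefix(forms)

# The lemma is not computed from the forms: it is the citation form fixed by convention.
lemma = "produce"

print(stem, lemma)  # produc produce
```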

Some lexemes have several stems but one lemma. For instance, the verb "to go" (the lemma) has the stems "go" and "went" because of suppletion: the past tense was co-opted from a different verb, "to wend".
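Suppletion defeats the definition of the stem as an unchanging part of the word, since "went" shares no initial letters with "go". One way to record such a lexeme is to store a single lemma together with several stems, each heading its own group of forms. The Python below sketches this idea. The `GO` record is hypothetical and built by hand, because suppletion cannot be detected from spelling alone.

```python
import os.path

# Hypothetical record for the lexeme GO: one lemma, several stems, each stem
# heading the group of forms built on it.
GO = {
    "lemma": "go",
    "stems": {
        "go": ["go", "goes", "going", "gone"],
        "went": ["went"],
    },
}

all_forms = [f for forms in GO["stems"].values() for f in forms]

# No single unchanging written part covers the whole paradigm...
print(repr(os.path.commonprefix(all_forms)))  # ''
# ...but each stem group does have one.
print([os.path.commonprefix(forms) for forms in GO["stems"].values()])  # ['go', 'went']
```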

References

  1. Zgusta, Ladislav (2006). Dolezal, Fredric F. M. (ed.). Lexicography then and now. p. 202. ISBN 3484391294. "A minor... problem can arise when the canonical form of the headword, i.e. the form in which it is to be cited, is to be chosen."
  2. "Natural Language Toolkit — NLTK 3.0 documentation". Nltk.org. 2015-09-05. Retrieved 2015-09-27.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.