If you are lost after reading all sorts of literature about synsets and lemmas, you have come to the right place! Let us work through an example.
Suppose you search for the word "trunk" on WordNet.
Noun
- S: (n) trunk, tree trunk, bole (the main stem of a tree; usually covered with bark; the bole is usually the part that is commercially useful for lumber)
- S: (n) trunk (luggage consisting of a large strong case used when traveling or for storage)
- S: (n) torso, trunk, body (the body excluding the head and neck and limbs) "they moved their arms and legs and bodies"
- S: (n) luggage compartment, automobile trunk, trunk, boot (compartment in an automobile that carries luggage or shopping or tools) "he put his golf bag in the trunk"
- S: (n) proboscis, trunk (a long flexible snout as of an elephant)
Each gloss (the text in parentheses) represents a different sense, or concept, of the word trunk. Words that share a sense are grouped into a synonym set, i.e., a synset. So one sense is the stem of a tree, another is a body part, and so on, for a total of 5 senses. The words listed before each gloss are the lemmas, or synonyms, within that particular sense. So in the third synset, where trunk is used in the sense of a body part, there are 3 lemmas: torso, trunk and body.
Lemmas in a synset are sorted by how often they appear, meaning that when trunk is used in the sense of a body part, torso is the most frequently used lemma.
Let's see this through a Python program:
import nltk
from nltk.corpus import wordnet as wn
trunk_synsets = wn.synsets("trunk")
print(trunk_synsets)
Output:
[Synset('trunk.n.01'),
Synset('trunk.n.02'),
Synset('torso.n.01'),
Synset('luggage_compartment.n.01'),
Synset('proboscis.n.02')]
Program continued:
for sense in trunk_synsets:
    # Collect the name of every lemma (synonym) in this synset
    lemmas = [l.name() for l in sense.lemmas()]
    print("Lemmas for sense : " + sense.name() + " - " + str(lemmas))
Output:
Lemmas for sense : trunk.n.01 - ['trunk', 'tree_trunk', 'bole']
Lemmas for sense : trunk.n.02 - ['trunk']
Lemmas for sense : torso.n.01 - ['torso', 'trunk', 'body']
Lemmas for sense : luggage_compartment.n.01 - ['luggage_compartment', 'automobile_trunk', 'trunk']
Lemmas for sense : proboscis.n.02 - ['proboscis', 'trunk']
The word trunk is itself a lemma with different meanings or senses that we already saw above!
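To make that relationship concrete, here is a minimal sketch that uses no NLTK at all: it copies the synset-to-lemma lists from the output above into a plain dictionary and inverts it, so that each lemma maps to the senses it belongs to. The names SYNSETS and senses_by_lemma are made up for this illustration; they are not part of NLTK or WordNet.

```python
from collections import defaultdict

# Synset name -> lemmas, copied from the program output above.
SYNSETS = {
    "trunk.n.01": ["trunk", "tree_trunk", "bole"],
    "trunk.n.02": ["trunk"],
    "torso.n.01": ["torso", "trunk", "body"],
    "luggage_compartment.n.01": ["luggage_compartment", "automobile_trunk", "trunk"],
    "proboscis.n.02": ["proboscis", "trunk"],
}


def senses_by_lemma(synsets):
    """Invert the mapping: lemma -> list of synset names that contain it."""
    index = defaultdict(list)
    for name, lemmas in synsets.items():
        for lemma in lemmas:
            index[lemma].append(name)
    return dict(index)


index = senses_by_lemma(SYNSETS)
print(index["trunk"])  # one lemma, several senses
print(index["torso"])  # a lemma that belongs to a single sense here
```

Read one way, a synset is a list of lemmas that share a meaning; read the other way, a lemma such as trunk is a word that shows up in several synsets.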
These explanations are given with respect to Python's NLTK library. Literally, though, a lemma is the root form of a word, not an inflected form. WordNet generally does not store inflected forms, so even if we look up the synsets of an inflected word, we get the synsets and lemmas of its root form, along with any synset in which the inflected form is itself a lemma.
Example: "trunks" instead of "trunk"
trunk_synsets = wn.synsets("trunks")
print(trunk_synsets)
for sense in trunk_synsets:
    # Collect the name of every lemma (synonym) in this synset
    lemmas = [l.name() for l in sense.lemmas()]
    print("Lemmas for sense : " + sense.name() + " (" + sense.definition() + ")"
          + " - " + str(lemmas))
Output:
[Synset('short_pants.n.01'),
Synset('trunk.n.01'),
Synset('trunk.n.02'),
Synset('torso.n.01'),
Synset('luggage_compartment.n.01'),
Synset('proboscis.n.02')]
Lemmas for sense : short_pants.n.01 (trousers that end at or above the knee) - ['short_pants', 'shorts', 'trunks']
Lemmas for sense : trunk.n.01 (the main stem of a tree; usually covered with bark; the bole is usually the part that is commercially useful for lumber) - ['trunk', 'tree_trunk', 'bole']
Lemmas for sense : trunk.n.02 (luggage consisting of a large strong case used when traveling or for storage) - ['trunk']
Lemmas for sense : torso.n.01 (the body excluding the head and neck and limbs) - ['torso', 'trunk', 'body']
Lemmas for sense : luggage_compartment.n.01 (compartment in an automobile that carries luggage or shopping or tools) - ['luggage_compartment', 'automobile_trunk', 'trunk']
Lemmas for sense : proboscis.n.02 (a long flexible snout as of an elephant) - ['proboscis', 'trunk']
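Why does "trunks" return the "trunk" synsets plus short_pants? Here is a rough sketch of the idea, again without NLTK: look the word up as typed, then strip a plural "s" and look up what is left. This is a deliberately simplified stand-in for the morphological processing NLTK performs, whose real rules and exception lists are richer. LEMMA_INDEX, candidate_forms and lookup are hypothetical names, and the data is copied from the output above.

```python
# Lemma -> synset names, copied from the output above (toy data, not the real database).
LEMMA_INDEX = {
    "trunks": ["short_pants.n.01"],
    "trunk": [
        "trunk.n.01",
        "trunk.n.02",
        "torso.n.01",
        "luggage_compartment.n.01",
        "proboscis.n.02",
    ],
}


def candidate_forms(word):
    """Return the word as typed, plus a naive singular form (assumption: only strip a final 's')."""
    forms = [word]
    if len(word) > 1 and word.endswith("s"):
        forms.append(word[:-1])
    return forms


def lookup(word):
    """Collect the synsets of every candidate form, without duplicates."""
    found = []
    for form in candidate_forms(word):
        for name in LEMMA_INDEX.get(form, []):
            if name not in found:
                found.append(name)
    return found


print(lookup("trunks"))  # short_pants first, then the five "trunk" senses
print(lookup("trunk"))   # only the five "trunk" senses
```

The inflected form matches short_pants directly because "trunks" is itself a lemma there; every other synset comes from the root form "trunk".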
Hope you have a clear picture of synset and lemma in your mind now!
Finally! Thank you. Lemma vs synsets was making me go mad, but this article makes it so much more clear!