# Python WordNet using NLTK

**(C) 2017-2024 by [Damir Cavar](http://damir.cavar.me/)**

**Version:** 1.3, January 2024

**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))

**Prerequisites:**

In [None]:
!pip install -U nltk
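
The `pip` command installs only the library; the WordNet data itself is a separate download. The following is a minimal setup sketch, assuming network access and that the package identifiers `wordnet` and `omw-1.4` (the Open Multilingual WordNet data used in the multilingual section) match your NLTK version:

```python
import importlib.util

# Corpus packages this tutorial relies on (identifiers are an assumption
# that holds for recent NLTK releases): the English WordNet and the
# Open Multilingual WordNet.
PACKAGES = ("wordnet", "omw-1.4")

if importlib.util.find_spec("nltk") is None:
    print("NLTK is not installed; run `pip install -U nltk` first.")
else:
    import nltk

    for package in PACKAGES:
        # Fetch each package into the local nltk_data directory.
        nltk.download(package)
```

If the data is missing, the first call to a `wordnet` function raises a `LookupError` that names the package to download.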

This tutorial accompanies the discussion of word sense disambiguation and various machine learning strategies in the textbook [Machine Learning: The Art and Science of Algorithms that Make Sense of Data](https://www.cs.bris.ac.uk/~flach/mlbook/) by [Peter Flach](https://www.cs.bris.ac.uk/~flach/).

It was developed as part of the material for my course Machine Learning for Computational Linguistics in the [Computational Linguistics Program](http://cl.indiana.edu/) of the [Department of Linguistics](http://www.indiana.edu/~lingdept/) at [Indiana University](https://www.indiana.edu/).

## Using WordNet

Importing *wordnet* from the NLTK module:

In [10]:
from nltk.corpus import wordnet

Asking for all synsets of a word in WordNet:

In [11]:
wordnet.synsets('cat')

[Synset('cat.n.01'),
 Synset('guy.n.01'),
 Synset('cat.n.03'),
 Synset('kat.n.01'),
 Synset('cat-o'-nine-tails.n.01'),
 Synset('caterpillar.n.02'),
 Synset('big_cat.n.01'),
 Synset('computerized_tomography.n.01'),
 Synset('cat.v.01'),
 Synset('vomit.v.01')]

A synset is identified by a 3-part name of the form word.pos.nn. Except for the last two synsets, all synsets of *cat* above are nouns with the *part-of-speech* tag *n*. We can ask for the synsets of a word with a specific PoS:

In [12]:
wordnet.synsets('dog', pos=wordnet.VERB)

[Synset('chase.v.01')]

Besides VERB, the other part-of-speech constants are NOUN, ADJ, and ADV.
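
To make the 3-part naming scheme concrete, here is a small sketch that takes a synset name apart in plain Python. The helper `split_synset_name` and the `POS_NAMES` table are illustrative names, not part of NLTK; the one-letter tags are the values of NLTK's `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ`, `wordnet.ADJ_SAT`, and `wordnet.ADV` constants.

```python
# One-letter PoS tags as they appear in synset names.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def split_synset_name(name):
    """Split 'word.pos.nn' into (word, PoS label, sense number)."""
    # Split from the right so that a word containing dots stays intact.
    word, pos, number = name.rsplit(".", 2)
    return word, POS_NAMES.get(pos, pos), int(number)


for name in ("dog.n.01", "chase.v.01", "cat-o'-nine-tails.n.01"):
    print(split_synset_name(name))
```

For `'dog.n.01'` this yields `('dog', 'noun', 1)`. In practice the same information is available directly from a synset object through `synset.name()` and `synset.pos()`.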

We can select a specific synset from the list using the full 3-part name notation:

In [13]:
wordnet.synset('dog.n.01')

Synset('dog.n.01')

For this particular synset we can fetch the definition:

In [14]:
print(wordnet.synset('dog.n.01').definition())

a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds


Synsets might also have examples. We can count the number of examples for this concrete synset this way:

In [15]:
len(wordnet.synset('dog.n.01').examples())

1

We can print out the example using:

In [16]:
print(wordnet.synset('dog.n.01').examples()[0])

the dog barked all night


We can also output the lemmata for a specific synset:

In [17]:
wordnet.synset('dog.n.01').lemmas()

[Lemma('dog.n.01.dog'),
 Lemma('dog.n.01.domestic_dog'),
 Lemma('dog.n.01.Canis_familiaris')]

Using a list comprehension, we can reduce this list to just the lemma names:

In [18]:
[lemma.name() for lemma in wordnet.synset('dog.n.01').lemmas()]

['dog', 'domestic_dog', 'Canis_familiaris']
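
The calls shown so far (`definition()`, `examples()`, `lemmas()`) can be combined into a small overview of every sense of a word. The following is a sketch only: `describe` is a hypothetical helper, not an NLTK function, and the demo is skipped if NLTK or the WordNet data is not available.

```python
def describe(synset):
    """Summarise one synset as a plain dictionary."""
    return {
        "name": synset.name(),
        "definition": synset.definition(),
        "examples": synset.examples(),
        "lemmas": [lemma.name() for lemma in synset.lemmas()],
    }


try:
    from nltk.corpus import wordnet

    # Print one line per noun sense of 'dog'.
    for synset in wordnet.synsets("dog", pos=wordnet.NOUN):
        entry = describe(synset)
        print(entry["name"], entry["lemmas"], entry["definition"], sep=" | ")
except (ImportError, LookupError) as error:
    print("WordNet is not available:", error)
```

Returning plain dictionaries keeps the result easy to inspect or to load into a table later.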

We can also reference a concrete lemma:

In [19]:
wordnet.lemma('dog.n.01.dog')

Lemma('dog.n.01.dog')

### Multilingual Functions

The current version of WordNet in NLTK is multilingual. To see which languages are supported, use this command:

In [4]:
sorted(wordnet.langs())

['als',
 'arb',
 'bul',
 'cat',
 'cmn',
 'dan',
 'ell',
 'eng',
 'eus',
 'fas',
 'fin',
 'fra',
 'glg',
 'heb',
 'hrv',
 'ind',
 'ita',
 'jpn',
 'nld',
 'nno',
 'nob',
 'pol',
 'por',
 'qcn',
 'slv',
 'spa',
 'swe',
 'tha',
 'zsm']

We can ask for the Croatian lemma names of a synset:

In [23]:
wordnet.synset('dog.n.01').lemma_names('hrv')



['Canis_lupus_familiaris', 'domaći_pas', 'pas']

We can also look up a word in another language and fetch its lemmata, each attached to the (English-named) synset it belongs to:

In [24]:
wordnet.lemmas('cane', lang='ita')

[Lemma('dog.n.01.cane'),
 Lemma('cramp.n.02.cane'),
 Lemma('hammer.n.01.cane'),
 Lemma('bad_person.n.01.cane'),
 Lemma('incompetent.n.01.cane')]
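
Because every lemma knows its synset, a lookup in one language can be turned into a rough translation table for another. The sketch below assumes the Open Multilingual WordNet data is installed; `translations` is a hypothetical helper name, and the demo is skipped if NLTK or the data is missing.

```python
def translations(lemmas, lang="eng"):
    """Map each lemma's synset name to that synset's lemma names in `lang`."""
    return {
        lemma.synset().name(): lemma.synset().lemma_names(lang=lang)
        for lemma in lemmas
    }


try:
    from nltk.corpus import wordnet

    # English lemma names for every sense of the Italian word 'cane'.
    table = translations(wordnet.lemmas("cane", lang="ita"))
    for synset_name, names in table.items():
        print(synset_name, names)
except (ImportError, LookupError) as error:
    print("WordNet is not available:", error)
```

The result lists all senses without ranking them; choosing the right one for a given context is the word sense disambiguation problem.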

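### Hypernyms, hyponyms, holonyms, antonyms

WordNet links synsets through semantic relations such as hypernymy (more general concepts), hyponymy (more specific concepts), and holonymy (wholes or groups that a concept belongs to).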

In [25]:
dog = wordnet.synset('dog.n.01')

In [26]:
dog.hypernyms()

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

In [27]:
dog.hyponyms()

[Synset('basenji.n.01'),
 Synset('corgi.n.01'),
 Synset('cur.n.01'),
 Synset('dalmatian.n.02'),
 Synset('great_pyrenees.n.01'),
 Synset('griffon.n.02'),
 Synset('hunting_dog.n.01'),
 Synset('lapdog.n.01'),
 Synset('leonberg.n.01'),
 Synset('mexican_hairless.n.01'),
 Synset('newfoundland.n.01'),
 Synset('pooch.n.01'),
 Synset('poodle.n.01'),
 Synset('pug.n.01'),
 Synset('puppy.n.01'),
 Synset('spitz.n.01'),
 Synset('toy_dog.n.01'),
 Synset('working_dog.n.01')]

In [28]:
dog.member_holonyms()

[Synset('canis.n.01'), Synset('pack.n.06')]

In [29]:
dog.root_hypernyms()

[Synset('entity.n.01')]
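
`hypernyms()` returns only the direct parents and `root_hypernyms()` only the top of the hierarchy. The steps in between can be listed by following the links upward. A minimal sketch, assuming we always take the first hypernym when there are several (`hypernym_chain` is a hypothetical helper; the demo is skipped if NLTK or the WordNet data is missing):

```python
def hypernym_chain(synset):
    """Follow the first hypernym of each synset until none is left."""
    chain = [synset]
    while chain[-1].hypernyms():
        chain.append(chain[-1].hypernyms()[0])
    return chain


try:
    from nltk.corpus import wordnet

    chain = hypernym_chain(wordnet.synset("dog.n.01"))
    print(" -> ".join(synset.name() for synset in chain))
except (ImportError, LookupError) as error:
    print("WordNet is not available:", error)
```

NLTK also provides `synset.hypernym_paths()`, which returns every path from a root down to the synset rather than a single chain.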

In [30]:
wordnet.synset('dog.n.01').lowest_common_hypernyms(wordnet.synset('cat.n.01'))

[Synset('carnivore.n.01')]

In [31]:
good = wordnet.synset('good.a.01')

In [32]:
good.lemmas()[0].antonyms()

[Lemma('bad.a.01.bad')]
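
Antonymy in WordNet is a relation between lemmas, not synsets, which is why the call above goes through `good.lemmas()[0]`. To collect the antonyms across all senses of a word, we can loop over every lemma of every synset. This is a sketch with a hypothetical helper name, `antonym_names`; the demo is skipped if NLTK or the WordNet data is missing.

```python
def antonym_names(synsets):
    """Collect the names of all antonyms of all lemmas in the given synsets."""
    names = set()
    for synset in synsets:
        for lemma in synset.lemmas():
            names.update(antonym.name() for antonym in lemma.antonyms())
    return sorted(names)


try:
    from nltk.corpus import wordnet

    # Antonyms over all adjective senses of 'good'.
    print(antonym_names(wordnet.synsets("good", pos=wordnet.ADJ)))
except (ImportError, LookupError) as error:
    print("WordNet is not available:", error)
```

Using a set removes duplicates that arise when several senses share the same antonym.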