{ "cells": [ { "cell_type": "code", "execution_count": 39, "id": "b28adc0f", "metadata": { "collapsed": false }, "outputs": [], "source": [ "from nltk.stem import SnowballStemmer,PorterStemmer,WordNetLemmatizer" ] }, { "cell_type": "markdown", "id": "c0db7299", "metadata": {}, "source": [ "# Grammar" ] }, { "cell_type": "markdown", "id": "f108dbfd", "metadata": {}, "source": [ "Recall:\n", "\n", "- inflection - systematic alteration of words according to grammatical rules\n", "- declension - nouns, adjectives, articles, pronouns - number, gender, case\n", "- conjugation - verbs - person, number, tense, gender, aspect, mood, voice\n", "\n", "Some of the terms:\n", "\n", "- person, number tense, gender... pretty obvious\n", "- voice - relationship between verb and its arguments (subject, object, ...)\n", "- aspect - ongoing, completed, habitual, consequential, ...\n", "- mood - actual, hypothetical, counterfactual, wished for, conditional, command, question, ..." ] }, { "cell_type": "markdown", "id": "5dcc0d8c", "metadata": {}, "source": [ "# Porter Stemmer on English" ] }, { "cell_type": "code", "execution_count": 43, "id": "b431c369", "metadata": { "collapsed": false }, "outputs": [], "source": [ "en_nouns = \\\n", "[\n", " \"house houses house's\",\n", " \"child children\",\n", "]\n", "en_verbs = \\\n", "[\n", " \"walk walked walking walks\",\n", " \"see saw sees seen seeing\",\n", "]\n", "en_cases = en_nouns + en_verbs" ] }, { "cell_type": "code", "execution_count": 44, "id": "5801e1c4", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hous hous house'\n", "child children\n", "walk walk walk walk\n", "see saw see seen see\n" ] } ], "source": [ "pen = PorterStemmer()\n", "for c in en_cases:\n", " for w in c.split():\n", " print pen.stem(w),\n", " print" ] }, { "cell_type": "markdown", "id": "10cfb39d", "metadata": {}, "source": [ "# Snowball Stemmer on German" ] }, { "cell_type": "code", "execution_count": 36, "id": "e4ff6b4a", "metadata": { "collapsed": false }, "outputs": [], "source": [ "de_nouns = \\\n", "[\n", " u\"Bruder Bruders Brüder Brüdern\",\n", " u\"Leuchte Leuchten\",\n", " u\"Haus Hauses Hause Häuser Häusern\",\n", "]\n", "de_verbs = \\\n", "[\n", " u\"geb geben gebe gibst gibt gebt gab gabst gaben gabt gegeben gäbe gäbst gäb gäben gäbet\",\n", " u\"fangen fang fange fängst fängt fangen fangt fing fingst fingen fingt\",\n", " u\"backen backe backst backt backte backtest backten backtet gebackt gebackte\",\n", " u\"bäckst bäckt bukest bükest\",\n", " \n", "]\n", "de_cases = de_nouns+de_verbs" ] }, { "cell_type": "code", "execution_count": 38, "id": "eff68904", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "brud brud brud brud\n", "leucht leucht\n", "haus haus haus haus haus\n", "geb geb geb gibst gibt gebt gab gabst gab gabt gegeb gab gabst gab gab gabet\n", "fang fang fang fang fangt fang fangt fing fing fing fingt\n", "back back back backt backt backt backt backtet gebackt gebackt\n", "back backt buk buk\n" ] } ], "source": [ "des = SnowballStemmer(\"german\")\n", "for c in de_cases:\n", " for w in c.split():\n", " print des.stem(w),\n", " print" ] }, { "cell_type": "markdown", "id": "0ffdca22", "metadata": {}, "source": [ "# WordNet Lemmatization" ] }, { "cell_type": "code", "execution_count": 40, "id": "a766109f", "metadata": { "collapsed": false }, "outputs": [], "source": [ "wnl = WordNetLemmatizer()" ] }, { "cell_type": "code", "execution_count": 45, "id": "f5e43ff7", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "house house house's\n", "child child\n" ] } ], "source": [ "for c in en_nouns:\n", " for w in c.split():\n", " print wnl.lemmatize(w,pos='n'),\n", " print" ] }, { "cell_type": "code", "execution_count": 46, "id": "08bc0780", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "walk walk walk walk\n", "see saw see see see\n" ] } ], "source": [ "for c in en_verbs:\n", " for w in c.split():\n", " print wnl.lemmatize(w,pos='v'),\n", " print" ] }, { "cell_type": "code", "execution_count": 47, "id": "623e970f", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "walk walked walking walk\n", "see saw see seen seeing\n", "house house house's\n", "child child\n" ] } ], "source": [ "for c in en_verbs+en_nouns:\n", " for w in c.split():\n", " print wnl.lemmatize(w),\n", " print" ] }, { "cell_type": "code", "execution_count": null, "id": "f48f8926", "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 5 }