{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Recipe generator\n", "\n", "In this notebook we use [TextBlob](https://textblob.readthedocs.io/en/dev/) to extract nouns, verbs, and sentences from the OCRd text of a 19th century cookery book. We try to clean things up a bit, using regular expressions to discard likely OCR errors. Then we recombine the various parts in random combinations to create delicious recipes for all occasions. Enjoy!\n", "\n", "Inspired by [*Australian Plain Cookery by a Practical Cook*](https://nla.gov.au/nla.obj-579917051), 1882." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "from textblob import TextBlob\n", "import re\n", "import random\n", "import pandas as pd\n", "from IPython.display import display, HTML\n", "import nltk\n", "nltk.download('stopwords')\n", "nltk.download('punkt')\n", "nltk.download('averaged_perceptron_tagger')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# The Cloudstor URL links to the repository of OCRd text from Trove digitised books\n", "CLOUDSTOR_URL = 'https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL'\n", "# File name of the cookery book\n", "text_file = 'australian-plain-cookery-by-a-practical-cook-nla.obj-579917051.txt'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we procure a recipe book." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Download the text of the book\n", "response = requests.get(f'{CLOUDSTOR_URL}/download?files={text_file}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we slice and dice the words to create a new TextBlob." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Create a TextBlob using the text\n", "blob = TextBlob(response.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Carefully we remove the nouns and the verbs, discarding any that are spoiled." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Get the verbs filtering out short words and those including non-alpha characters.\n", "# 'VBD' is the part of speech tag for a past tense verb\n", "verbs = [w.title() for w, t in blob.tags if t == 'VBD' and len(w) > 3 and w.isalpha()]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Get the nouns filtering out short words and those including non-alpha characters.\n", "# NNP is the POS tag for proper nouns\n", "nouns = [w.title() for w, t in blob.tags if t.startswith('NNP') and len(w) > 3 and w.isalpha()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it is necessary to prepare the sentences. First extract them from the blob. Discard any that seem ill-formed." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Get the sentences from the blob\n", "# Uses a regexp to exclude those that include anything other than standard letters, numbers, and punctuation.\n", "sentences = [str(s).replace('\\n', ' ') for s in blob.sentences if re.match(r'^[a-zA-Z\\s\\-,\\.;0-9\\'&\\(\\):]*$', str(s))]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sentences now need to be divided, to separate out the titles, which are recognised by their case." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Titles in this cookbook are in uppercase, so we can separate them out from the rest of the sentences.\n", "titles = [s for s in sentences if s.strip('.').isupper()]\n", "sentences = [s for s in sentences if not s.strip('.').isupper()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are ready to start cooking!" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def recipe_maker(num=5):\n", "    html = ''\n", "    # Get a random title\n", "    title = random.choice(titles)\n", "    html = f'{\" \".join(random.sample(sentences, num))}'\n", "    display(HTML(html))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "In carving a large fish, as here engraved, cut thin slices, as from A to B, and help with it pieces of the belly, in the direction marked from C to D ; the best flavoured is the upper or thick part. Take two pounds of fat bacon, and a pound and a half of beef suet. When you put in the vegetables, cover all closely, and do not use for at least six weeks. Lay in cold water, and when it boils simmer for eight or ten minutes. Add one onion, three sage leaves, some whole pepper, and a little salt in three pints of water. Edge and cover with short crust, and ornament the edges.
" ], "text/plain": [ "\n", "

|  | title | url | contributors | date | fulltext_url | trove_id | language | rights | pages | form | volume | parent | children | text_downloaded | text_file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1888 | The Kingswood cookery book / by H. F. Wicken | https://trove.nla.gov.au/work/12721516 | Wicken, H | 1885-1950 | https://nla.gov.au/nla.obj-43987239 | nla.obj-43987239 | English | Out of Copyright\|http://rightsstatements.org/v... | 278 | Book | NaN | NaN | NaN | True | the-kingswood-cookery-book-by-h-f-wicken-nla.o... |
| 2582 | Electric cookery book : being an indispensable... | https://trove.nla.gov.au/work/16383834 | State Electricity Commission of Victoria | 1940-1949 | http://nla.gov.au/nla.obj-52836472 | nla.obj-52836472 | English | No known copyright restrictions\|http://rightss... | 73 | Book | NaN | NaN | NaN | True | electric-cookery-book-being-an-indispensable-h... |
| 2654 | The English and Australian cookery book : cook... | https://trove.nla.gov.au/work/16551115 | Abbott, Edward, 1801-1869 | 1864-2014 | https://nla.gov.au/nla.obj-9562000 | nla.obj-9562000 | English | Out of Copyright\|http://rightsstatements.org/v... | 356 | Book | NaN | NaN | NaN | True | the-english-and-australian-cookery-book-cooker... |
| 4431 | Australian plain cookery / by a Practical Cook... | https://trove.nla.gov.au/work/18493439 | Old housekeeper | 1882-1897 | http://nla.gov.au/nla.obj-579917051 | nla.obj-579917051 | NaN | NaN | 148 | Book | NaN | NaN | NaN | True | australian-plain-cookery-by-a-practical-cook-r... |
| 7688 | The Armidale Red Cross cookery book of tested ... | https://trove.nla.gov.au/work/20631441 | Australian Red Cross Society. Armidale Branch | 1920 | https://nla.gov.au/nla.obj-52792201 | nla.obj-52792201 | English | Out of Copyright\|http://rightsstatements.org/v... | 82 | Book | NaN | NaN | NaN | True | the-armidale-red-cross-cookery-book-of-tested-... |
| 8173 | The Kandy Koola cookery book and housewife's c... | https://trove.nla.gov.au/work/21067450 | Kandy Koola Tea | 1898 | https://nla.gov.au/nla.obj-2409723409 | nla.obj-2409723409 | English | Out of Copyright\|http://rightsstatements.org/v... | 76 | Book | NaN | NaN | NaN | True | the-kandy-koola-cookery-book-and-housewife-s-c... |
| 8491 | The Hawkesbury and Shoalhaven calendar, cultur... | https://trove.nla.gov.au/work/21309432 | Woodhill & Co | 1905 | http://nla.gov.au/nla.obj-28658844 | nla.obj-28658844 | English | Out of Copyright\|http://rightsstatements.org/v... | 200 | Book | NaN | NaN | NaN | True | the-hawkesbury-and-shoalhaven-calendar-cultura... |
| 9457 | Hebrew cookery / by an Australian | https://trove.nla.gov.au/work/22242397 | Australian | 1867 | http://nla.gov.au/nla.obj-52864954 | nla.obj-52864954 | English | No known copyright restrictions\|http://rightss... | 25 | Book | NaN | NaN | NaN | True | hebrew-cookery-by-an-australian-nla.obj-528649... |
| 9472 | Recipes given by Mrs. Wicken at cookery class,... | https://trove.nla.gov.au/work/22249810 | Wicken, H | 1888 | http://nla.gov.au/nla.obj-533356312 | nla.obj-533356312 | English | Out of Copyright\|http://rightsstatements.org/v... | 16 | Book | NaN | NaN | NaN | True | recipes-given-by-mrs-wicken-at-cookery-class-w... |
| 13145 | Southland Red Cross cookery book, 1916 | https://trove.nla.gov.au/work/237279068 | NaN | 1916 | https://nla.gov.au/nla.obj-49498371 | nla.obj-49498371 | English | Out of Copyright\|http://rightsstatements.org/v... | 187 | Book | NaN | NaN | NaN | True | southland-red-cross-cookery-book-1916-nla.obj-... |
| 19740 | Barossa cookery book : 400 tried recipes | https://trove.nla.gov.au/work/237367083 | NaN | 1917 | https://nla.gov.au/nla.obj-497806529 | nla.obj-497806529 | English | No known copyright restrictions\|http://rightss... | 60 | Book | NaN | NaN | NaN | True | barossa-cookery-book-400-tried-recipes-nla.obj... |
| 19823 | Australian plain cookery / by a practical cook | https://trove.nla.gov.au/work/237367586 | NaN | 1882 | https://nla.gov.au/nla.obj-579917051 | nla.obj-579917051 | English | Out of Copyright\|http://rightsstatements.org/v... | 148 | Book | NaN | NaN | NaN | True | australian-plain-cookery-by-a-practical-cook-n... |
| 22262 | The Australian women's weekly cookery book : p... | https://trove.nla.gov.au/work/237539542 | NaN | 1948 | https://nla.gov.au/nla.obj-2122602128 | nla.obj-2122602128 | English | No known copyright restrictions\|http://rightss... | 68 | Book | NaN | NaN | NaN | True | the-australian-women-s-weekly-cookery-book-pri... |
| 29983 | The Banner cookery book : over 300 tested recipes | https://trove.nla.gov.au/work/24494136 | Dimboola Bush Nursing Hospital | 1953 | https://nla.gov.au/nla.obj-43445961 | nla.obj-43445961 | English | Out of Copyright\|http://rightsstatements.org/v... | 48 | Book | NaN | NaN | NaN | True | the-banner-cookery-book-over-300-tested-recipe... |
| 30410 | The War chest cookery book | https://trove.nla.gov.au/work/26653596 | Citizens' War Chest Fund (N.S.W.) | 1917 | https://nla.gov.au/nla.obj-37545603 | nla.obj-37545603 | English | Out of Copyright\|http://rightsstatements.org/v... | 156 | Book | NaN | NaN | NaN | True | the-war-chest-cookery-book-nla.obj-37545603.txt |
| 30637 | Southland Red Cross cookery book, 1916 | https://trove.nla.gov.au/work/26863907 | NaN | 1916 | http://nla.gov.au/nla.obj-49498371 | nla.obj-49498371 | NaN | NaN | 187 | Book | NaN | NaN | NaN | True | southland-red-cross-cookery-book-1916-nla.obj-... |
| 32264 | Flinders Island : souvenir : cookery book | https://trove.nla.gov.au/work/35649557 | Country Women's Association in Tasmania. Flind... | 1946 | https://nla.gov.au/nla.obj-2531663107 | nla.obj-2531663107 | English | No known copyright restrictions\|http://rightss... | 84 | Book | NaN | NaN | NaN | True | flinders-island-souvenir-cookery-book-nla.obj-... |
| 32955 | Barossa cookery book : 400 tried recipes | https://trove.nla.gov.au/work/6619781 | Tanunda Australia Day Celebrations Committee (... | 1917 | http://nla.gov.au/nla.obj-497806529 | nla.obj-497806529 | NaN | NaN | 60 | Book | NaN | NaN | NaN | True | barossa-cookery-book-400-tried-recipes-nla.obj... |
| 32963 | "Caroona" cookery book : over 240 favourite re... | https://trove.nla.gov.au/work/6663148 | North Coast Methodist Homes for the Aged. Lism... | 1900 | http://nla.gov.au/nla.obj-52837739 | nla.obj-52837739 | English | No known copyright restrictions\|http://rightss... | 54 | Book | NaN | NaN | NaN | True | caroona-cookery-book-over-240-favourite-recipe... |