{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# textgenrnn 1.1 Demo\n",
    "\n",
    "by [Max Woolf](http://minimaxir.com)\n",
    "\n",
    "*Max's open-source projects are supported by his [Patreon](https://www.patreon.com/minimaxir). If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Intro\n",
    "\n",
    "textgenrnn is a Python module on top of Keras/TensorFlow which can easily generate text using a pretrained recurrent neural network:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "from textgenrnn import textgenrnn\n",
    "\n",
    "textgen = textgenrnn()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Text\n",
    "\n",
    "The `generate` function generates `n` text documents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Some seeing settings for the most of a story of the Nexus 6 comments to be a book in a trip in the Ballshore Price of the Streets of Reddit\n",
      "\n",
      "What is this crappy? I want to see this in the game. Any ideas to recover the bathroom and this is looking at the day of this game of the most story\n",
      "\n",
      "[PC] [H] 5 GTX 1080 [W] PayPal\n",
      "\n",
      "New Culture is in the team for the screenshots of the top of a holiday control and realistically a speaker to the trip in 2017 and there was a super of the world?\n",
      "\n",
      "TIFU by telling someone please add this with a card\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen.generate(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition, you can set the `temperature` to modify the amount of creativity (default 0.5; I do not recommend setting to more than 1.0), set a `prefix` to force the document to start with certain characters and generate characters accordingly, and set a `return_as_list` flag (default False) to use the generated texts elsewhere in your application (e.g. as an API)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\"Trump is the best thing and I don't know what to do anymore.\", 'Trump is a lot of products and some of the subreddits is the best state of the price of the same to stop the only one of the starts and started the dead of the police of the state of the same time in the comments to be a lot of the same time in the state of the game to the season to a community in', 'Trump starting a post and a big state and what is the day?', 'Trump is a bit of the same subreddit and a bad story of the states of the state of the state of the party in the same to the state of the state of the same series is a good day.', 'Trump control the state of the season to the story of the state of the state of the state of the states of the state of the new company that has a state of the story of the state of the state of the season to the state of the state of the story of the state of the state of the state of the same ti']\n"
     ]
    }
   ],
   "source": [
    "generated_texts = textgen.generate(n=5, prefix=\"Trump\", temperature=0.2, return_as_list=True)\n",
    "print(generated_texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using `generate_samples()` is a great way to test the model at different temperatures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "The same car can be a good day!\n",
      "\n",
      "[Specific] Can someone please remove the state of the same community and send me the state of the most poster is a stream to the computer to the state of the parents in the problems in the same state with a stream on the same controller of the state of the same second of the background of the same\n",
      "\n",
      "The best state of the same series is a community where the control is to show off the state of the state of the state of the state of the same second things and the most poster with a girl in the results.\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "Me irl\n",
      "\n",
      "An Apparent Confirm is a card straight of the health store and the most beautiful story of the desk today. Need advice to start in the state today.\n",
      "\n",
      "Why does this sub mean the man like the content was out of the personal. (Artiscouns the box)\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "(18f) when your dead spider child doors without ribs!\n",
      "\n",
      "50 in fair to painful regions fush of X3410\n",
      "\n",
      "I was a little 1993 [Rario Rock] (1879)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen.generate_samples()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may also `generate_to_file()` to make the generated texts easier to copy/paste to other sources (e.g. blog/social media):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "textgen.generate_to_file('textgenrnn_texts.txt', n=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train on New Text\n",
    "\n",
    "As shown above, the results on the pretrained model vary greatly, since it's a lot of data compressed in a small form. The `train_on_texts` function fine-tunes the model on a new dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training on 174 character sequences.\n",
      "Epoch 1/2\n",
      "1/1 [==============================] - 1s 1s/step - loss: 1.3786\n",
      "Epoch 2/2\n",
      "1/1 [==============================] - 0s 152ms/step - loss: 1.0988\n",
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "Never down and gonna gonna gonna you gonnally you cry and gonnalland you gonnallen you cry you gonnallen you and gonna gonna gonna you and gonna let you down\n",
      "\n",
      "Never down and gonna gonna gonna you and desert you and gonna let yount you\n",
      "\n",
      "Never down and gonna gonna gonna you cry and gonna gonna gonna you to gonna let you up gonna you gonnally you to gonna gonna you gonnally you to gonna gonna gonna gonna you cry and dennat you and gonna let yount you\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "Never dond genny gonnally granglet and gonnall you gonnallen you and gonna gonna year you and gonna let you\n",
      "\n",
      "Never down and gonna gonna you gonnally you\n",
      "\n",
      "Never down around you up, and gonna gonna revent you and dungeon crying\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "Gay you son nugete in today, and gonna givan>e gonnallacu gonna gonna gonna love you (gonnely grown)\n",
      "\n",
      "Nevertanly gonna you by and gonna giveawake you cry thingun, you that prowater and dental you and gonna let you supemy and live you unde around you on tray letter toxianical yell\n",
      "\n",
      "Nevet gont constain gonna gonnally guy never gonna let you gonna gonna you gonna centain\n",
      "\n"
     ]
    }
   ],
   "source": [
    "texts = ['Never gonna give you up, never gonna let you down',\n",
    "            'Never gonna run around and desert you',\n",
    "            'Never gonna make you cry, never gonna say goodbye',\n",
    "            'Never gonna tell a lie and hurt you']\n",
    "\n",
    "textgen.train_on_texts(texts, num_epochs=2,  gen_epochs=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although the network was only trained on 4 texts, the original network still transfers the latent knowledge of all modern grammar and can incorporate that knowledge into generated texts, which becomes evident at higher temperatures or when using a prefix containign a character not present in the original dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can reset a trained model back to the original state by calling `reset()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "textgen.reset()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Included in the repository is a `hacker-news-2000.txt` file containing a list of the Top 2000 [Hacker News](https://news.ycombinator.com/news) submissions by score. Let's retrain the model using that dataset.\n",
    "\n",
    "For this example, I only will use a single epoch to demonstrate how easily the model learns with just one pass of the data: I recommend leaving the default of 50 epochs, or set it even higher for complex datasets. On my 2016 15\" MacBook Pro (quad-core Skylake CPU), the dataset trains at about 1.5 minutes per epoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2,000 texts collected.\n",
      "Training on 83,491 character sequences.\n",
      "Epoch 1/1\n",
      "652/652 [==============================] - 104s 160ms/step - loss: 1.8845\n",
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "How to start to start to make a defallen to the top defenderes\n",
      "\n",
      "The Firefox and Startup Continument\n",
      "\n",
      "A Free Reserver\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "Tesla is a boa to die\n",
      "\n",
      "GitHub is now opening to the protest and the internet piece you one on a show in the the sight on a crime to transformanet\n",
      "\n",
      "How to Favon Problem and Operman News\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "Open Mission\n",
      "\n",
      "\"Gomain Freath Boots\n",
      "\n",
      "How Perrating Interativation Zh.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen.train_from_file('../datasets/hacker_news_2000.txt', num_epochs=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can create very distinctly-HN titles, even with the very little amount of training, thanks to the pre-trained nature of the textgenrnn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Apple Dedicter Has to Portal Asked A George To Be Winter About An AP Ask Defenderation\n",
      "\n",
      "Apple Card Continumes Code\n",
      "\n",
      "Apple and Hell Care Contersonal Program Consistent of The Port\n",
      "\n",
      "Apple Honest used a tech probered your browser\n",
      "\n",
      "Apple says he supports do your favount\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen.generate(5, prefix=\"Apple\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Other runtime parameters for `train_on_text` and `train_from_file` are:\n",
    "\n",
    "* `num_epochs`: Number of epochs to train for (default: 50)\n",
    "* `gen_epochs`: Number of epochs to run between generating sample outputs; good for measuring model progress (default: 1)\n",
    "* `batch_size`: Batch size for training; may want to increase if running on a GPU for faster training (default: 128)\n",
    "* `train_size`: Random proportion of sequence samples to keep: good for controlling overfitting. The rest will be used to train as the validation set. (default: 1.0/all). To disable training on the validation set (for speed), set `validation=False`.\n",
    "* `dropout`: Random number of tokens to ignore each epoch. Good for controlling overfitting/making more resilient against typos, but setting too high will cause network to converge prematurely. (default: 0.0)\n",
    "* `is_csv`: Use with `train_from_file` if the source file is a one-column CSV (e.g. an export from BigQuery or Google Sheets) for proper quote/newline escaping."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Save and Load the Model\n",
    "\n",
    "The model saves the weights automatically after each epoch, or you can call `save()` and give a HDF5 filename. Those weights can then be loaded into a new textgenrnn model by specifying a path to the weights on construction. (Or use `load()` for an existing textgenrnn object)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "Google is a broke and down\n",
      "\n",
      "A Starter in Amazering in a Google Startup\n",
      "\n",
      "The Web Defines Supports For Conternation\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "Good Dropbox Down to die\n",
      "\n",
      "Playing Source Aid of Adding Program For Operate\n",
      "\n",
      "Will Claim To So Formate For Founder\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "We was direction factiongend fine and GoDake\n",
      "\n",
      "John no over relied from new Stude\n",
      "\n",
      "Leaching A “Yeuari Drawn”\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen_2 = textgenrnn('textgenrnn_weights.hdf5')\n",
    "textgen_2.generate_samples()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ True,  True,  True, ...,  True,  True,  True],\n",
       "       [ True,  True,  True, ...,  True,  True,  True],\n",
       "       [ True,  True,  True, ...,  True,  True,  True],\n",
       "       ...,\n",
       "       [ True,  True,  True, ...,  True,  True,  True],\n",
       "       [ True,  True,  True, ...,  True,  True,  True],\n",
       "       [ True,  True,  True, ...,  True,  True,  True]])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "textgen.model.get_layer('rnn_1').get_weights()[0] == textgen_2.model.get_layer('rnn_1').get_weights()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Indeed, the weights between the original model and the new model are equivalent.\n",
    "\n",
    "You can use this functionality to load models from others which have been trained on larger datasets with many more epochs (and the model weights are small enough to fit in an email!)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "A startup’s Firebase bill suddenly increased from $25 to $1750 per month\n",
      "\n",
      "A Sister’s Eulogy for Steve Jobs\n",
      "\n",
      "The Website That Got Me Expelled\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "How to Sleep\n",
      "\n",
      "Ask HN: What are some examples of successful single-person businesses?\n",
      "\n",
      "Solar Roof\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "Germanā was asked a disappeared to dismay explosions to just work\n",
      "\n",
      "“Gat Vueal Cities, Weakphad\n",
      "\n",
      "The New Internet Talk\n",
      "\n",
      "####################\n",
      "Temperature: 1.2\n",
      "####################\n",
      "In Iraq, for jhilious backg program in a comparn\n",
      "\n",
      "Solar Rook Point: The NSA-indiepatorogamerace\n",
      "\n",
      "Ask.jecket shilk\n",
      "\n",
      "####################\n",
      "Temperature: 1.5\n",
      "####################\n",
      "Nodelover Misses wringless for ununticed processing on Ach destroyer is a watched sysemer \n",
      "\n",
      "Police Leeh was ruins to work RTA-a litrame related\n",
      "\n",
      "A $10K tiny homepage in RSH galace\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen = textgenrnn('../weights/hacker_news.hdf5')\n",
    "textgen.generate_samples(temperatures=[0.2, 0.5, 1.0, 1.2, 1.5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training a New Model\n",
    "\n",
    "You can train a new model using any modern RNN architecture you want by calling `train_new_model` if supplying texts, or adding a `new_model=True` parameter if training from a file. If you do, the model will save a `config` file and a `vocab` file in addition to the weights, and those must be also loaded into a `textgenrnn` instances.\n",
    "\n",
    "The config parameters available are:\n",
    "\n",
    "* `word_level`: Whether to train the model at the word level (default: False)\n",
    "* `rnn_layers`: Number of recurrent LSTM layers in the model (default: 2)\n",
    "* `rnn_size`: Number of cells in each LSTM layer (default: 128)\n",
    "* `rnn_bidirectional`: Whether to use Bidirectional LSTMs, which account for sequences both forwards and backwards. Recommended if the input text follows a specific schema. (default: False)\n",
    "* `max_length`: Maximum number of previous characters/words to use before predicting the next token. This value should be reduced for word-level models (default: 40)\n",
    "* `max_words`: Maximum number of words (by frequency) to consider for training (default: 10000)\n",
    "* `dim_embeddings`: Dimensionality of the character/word embeddings (default: 100)\n",
    "\n",
    "You can also specify a `name` when creating a textgenrnn instance which will help name the output weights/config/vocab appropriately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "textgen = textgenrnn(name=\"new_model\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2,000 texts collected.\n",
      "Training new model w/ 2-layer, 64-cell Bidirectional LSTMs\n",
      "Training on 83,491 character sequences.\n",
      "Epoch 1/1\n",
      "652/652 [==============================] - 167s 256ms/step - loss: 2.6925\n",
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "A Stork HN: A Scroder the Introder and the for and for mage a the from dead sourcing from for croder a contion coner the from coner source\n",
      "\n",
      "Chomer the Introder a decricting the and dercroder\n",
      "\n",
      "Ander Crode Stor Start Staker Stor Storter\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "Google Deackic lame in I eried diter from from for dust for loce a coner the compler\n",
      "\n",
      "A in Ancricting from ine gepleater\n",
      "\n",
      "Engingractome to Beam Start the accent conertion from stict and conting honer and muniter sagers funce\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "I webshs FrrMack A PSRoerraubtal the Thinsed\n",
      "\n",
      "Ja ComestoulR, mucting shespuled compernuctions iplionta sitok\n",
      "\n",
      "Enot wart  Daters\n",
      "\n",
      "__________________________________________________________________________________________________\n",
      "Layer (type)                    Output Shape         Param #     Connected to                     \n",
      "==================================================================================================\n",
      "input (InputLayer)              (None, 40)           0                                            \n",
      "__________________________________________________________________________________________________\n",
      "embedding (Embedding)           (None, 40, 300)      31500       input[0][0]                      \n",
      "__________________________________________________________________________________________________\n",
      "rnn_1 (Bidirectional)           (None, 40, 128)      186880      embedding[0][0]                  \n",
      "__________________________________________________________________________________________________\n",
      "rnn_2 (Bidirectional)           (None, 40, 128)      98816       rnn_1[0][0]                      \n",
      "__________________________________________________________________________________________________\n",
      "rnn_concat (Concatenate)        (None, 40, 556)      0           embedding[0][0]                  \n",
      "                                                                 rnn_1[0][0]                      \n",
      "                                                                 rnn_2[0][0]                      \n",
      "__________________________________________________________________________________________________\n",
      "attention (AttentionWeightedAve (None, 556)          556         rnn_concat[0][0]                 \n",
      "__________________________________________________________________________________________________\n",
      "output (Dense)                  (None, 105)          58485       attention[0][0]                  \n",
      "==================================================================================================\n",
      "Total params: 376,237\n",
      "Trainable params: 376,237\n",
      "Non-trainable params: 0\n",
      "__________________________________________________________________________________________________\n",
      "None\n"
     ]
    }
   ],
   "source": [
    "textgen.reset()\n",
    "textgen.train_from_file('../datasets/hacker_news_2000.txt',\n",
    "                        new_model=True,\n",
    "                        rnn_bidirectional=True,\n",
    "                        rnn_size=64,\n",
    "                        dim_embeddings=300,\n",
    "                        num_epochs=1)\n",
    "\n",
    "print(textgen.model.summary())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "An Corentict the wath and dester the and from deaction\n",
      "\n",
      "A Inderer from conter died deand and deresiction\n",
      "\n",
      "A Croder Introder a the wath a the wash and and and deacting gand a deard daser and deread and deand degring the source\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "Andorac Appler Fand Stolk Intricer Stor\n",
      "\n",
      "Thot HN: Wak Cade to Stork Intreed the and a for darnes of frond the what a counting enerningers\n",
      "\n",
      "Lict The Scrabe Stor New\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "SOvarmezSQLP Cutiders HTest Epane Bat Speare now\n",
      "\n",
      "I CChlod's &tites Show HN: S halt Gefl, CPo”\n",
      "\n",
      "V. from.0 act, a giment midiodup Snters you 19 peflity fram gocapes cnaninald for hnowser daing xpoict a frel legiens equm deer’s, ronet causter\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen_2 = textgenrnn(weights_path='new_model_weights.hdf5',\n",
    "                       vocab_path='new_model_vocab.json',\n",
    "                       config_path='new_model_config.json')\n",
    "\n",
    "textgen_2.generate_samples()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train on Single Large Text\n",
    "\n",
    "Although textgenrnn is intended to be trained on text documents, you can train it on a large text block using `train_from_largetext_file` (which loads the entire file and processes it as if it were a single document) and it should work fine. This is akin to more traditional char-rnn tutorials.\n",
    "\n",
    "Training a new model is recommended (and is the default). When calling `generate`, you may want to increase the value of `max_gen_length`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training new model w/ 2-layer, 128-cell LSTMs\n",
      "Training on 600,852 character sequences.\n",
      "Epoch 1/1\n",
      "4694/4694 [==============================] - 692s 147ms/step - loss: 2.0114\n",
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "self-desire of the stronger of the most of the strong successions of the such and so such a self--and which have any the strength of the strong and in the sense of the longer and which is a self-sense of the successions of the specilar of the specilar of the strong and sense of the strength, and sup\n",
      "\n",
      "f-superious nature of the stronger and self-superious of the superious and so success of the specilar of the strength and stronge of the sacribility of the sense of the strength and in the strength and in the strength, and sense of the stronger success of the spections of the species and self-desire\n",
      "\n",
      "ny success of the sacrifice of the strength and stronger of the stronger of the stronger to the sacrifice of the specilar of the sacrifice of the sense of the specilar of the specilar of the superious of the sense of the specilar of the strength and in the sense of the strength, and a self--the stro\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "dual stronge of the states of things and successionsly every longer to same one one way are which an altently only the sacrifical course when the the say and success of the such and maker who serve of success and\n",
      "a desires and the little that far superstitions, which all the sapritions of supposite,\n",
      "\n",
      " of the heeds not in the every one with the \"surches.=--In a fact as is is the soul as resality of sensions of the churpence of the sense of any the nature of dread have one any any man charaction of the nature of the inchiver of the sensions of knowled and can for a fully and of the sense. In virtu\n",
      "\n",
      "he new sees of means of the streagn for human to staice, and sense of the sense and with a nature and desire itself of the most through that the sacrible of the most constitute obvery should so the stronger is in the quality to the proparty and revelop one can one is spectific servations, and it alm\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      " condugiaments, sun of a proved orgaine! Many sacrist--have, injuriblars, seacticully satist..-we our lughs sufference\n",
      "which of\n",
      "it that barth, lusally power result, its out only\n",
      "selvence ones paily, inpurisow, is a cupreons or all a her ands, overwhoile upon that maghesh, of the hacileny admants not\n",
      "\n",
      "st, day Wick sirences as about were, lought in should homal, one is langistial clinsion that now\n",
      "be\n",
      "attempts? Then acts age, who\n",
      "experient, orestor butner, they shulf. The shustaish\n",
      "illins so in sense ijout when\n",
      "when accesple\" curron: it wrough have obesily esamms of are ,ameble;--through one at als\n",
      "\n",
      "y constart of wild from for takless. The storle can chusistenionsly, would some\n",
      "satical hoase of develop pition commands. Inded in whether only\n",
      "any shbiaw\n",
      "juthful.=--Accordialar all the\n",
      "crustion (one men, she like is their spirit he, a sider. As Ire farth--this Pantaction\n",
      "phyporomity of varrible his\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from keras.utils.data_utils import get_file\n",
    "\n",
    "fulltext_path = get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')\n",
    "\n",
    "textgen.reset()\n",
    "textgen.train_from_largetext_file(fulltext_path, new_model=True, num_epochs=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Training on text at the word level works great, although it's strongly recommended to reduce the `max_length` and `max_gen_length` during training!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training new model w/ 2-layer, 128-cell LSTMs\n",
      "Training on 131,763 word sequences.\n",
      "Epoch 1/1\n",
      "1029/1029 [==============================] - 109s 106ms/step - loss: 5.3123\n",
      "####################\n",
      "Temperature: 0.2\n",
      "####################\n",
      "- the the of the truth . the\n",
      "\n",
      " has not a thing of the fact that which is\n",
      "to be a thing of the \" truth \" \" man \" \" - - \" the \" \" man \" \" is a\n",
      "\" modern ideas \"\n",
      "\n",
      "- the the of the \" soul \" is a\n",
      "\n",
      " of the \" modern ideas , \" \" and \" \"\n",
      "\" modern ideas , \" \" and \" \" modern ideas , \" \" and \"\n",
      "\" modern ideas , \" \" and \" \"\n",
      "\n",
      "- the of the soul . = - - the\n",
      "most sense of the world , which is the most\n",
      "of the \" modern ideas , \" \" \" the \" \" modern ideas , \"\n",
      "the \" spirit \" \" and \" \" modern ideas ,\n",
      "\n",
      "####################\n",
      "Temperature: 0.5\n",
      "####################\n",
      "a man . = - - - - the first of the\n",
      "sense of the value of the world , which must be\n",
      "very much , the , in all , as the\n",
      "of the \" modern ideas , \" \" \" - - \" this ,\n",
      "\n",
      "- the and that the old knowledge has the\n",
      "\n",
      " of his head and his own will to be\n",
      "with the same time , as the problem is at\n",
      "the same time that the most heart are the most\n",
      "delicate and of the world , which\n",
      "\n",
      "difficult to be no remained of the\n",
      "\n",
      " \" modern ideas , \" \" i mean \" \" i mean \" \" - - - - and\n",
      "now , \" the \" \" \" modern ideas , \" \" and \" the \"\n",
      "\" man \" \"\n",
      "\n",
      "####################\n",
      "Temperature: 1.0\n",
      "####################\n",
      "profound desire for badness is a\n",
      "certain hand for the hang of cause individuals : how\n",
      "i he now , begin to appear what knowledge ! it was what\n",
      "have to our origin morality , this more house is made in logic - causing among universal \n",
      "\n",
      "\n",
      "and estimate of the sea into\n",
      "instincts would have been insolence in the rule : recollection\n",
      "and siegfried knowledge can be regarded as the cowardly who for such\n",
      "let this view of the mental place , it was not .\n",
      "\n",
      " 11 . to gratification desires\n",
      "\n",
      "and finger ! - - and not proud of artists that\n",
      "the case , i could say that he can be still any other\n",
      "man upon what it is to the synthetic ego of personal deceived , a\n",
      "bearing lack of germans , moral , and feelings\n",
      "\n"
     ]
    }
   ],
   "source": [
    "textgen.reset()\n",
    "textgen.train_from_largetext_file(fulltext_path, new_model=True, num_epochs=1,\n",
    "                                  word_level=True,\n",
    "                                  max_length=10,\n",
    "                                  max_gen_length=50,\n",
    "                                  max_words=5000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LICENSE\n",
    "\n",
    "MIT License\n",
    "\n",
    "Copyright (c) 2018 Max Woolf\n",
    "\n",
    "Permission is hereby granted, free of charge, to any person obtaining a copy\n",
    "of this software and associated documentation files (the \"Software\"), to deal\n",
    "in the Software without restriction, including without limitation the rights\n",
    "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n",
    "copies of the Software, and to permit persons to whom the Software is\n",
    "furnished to do so, subject to the following conditions:\n",
    "\n",
    "The above copyright notice and this permission notice shall be included in all\n",
    "copies or substantial portions of the Software.\n",
    "\n",
    "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n",
    "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n",
    "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n",
    "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n",
    "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n",
    "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n",
    "SOFTWARE."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}