{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# @ObviousOstrich Generation Experiment\n", "- @obviousostrich is a Twitter account, found [here](https://twitter.com/search?q=obvious%20ostrich&src=tyah).\n", "- I collected all of its 17k tweets using the Twitter API.\n", "- I wanted to do a small text generation experiment anyway.\n", "- **I was skeptical about how well a bigram-trigram model would work on a dataset this small. It works nicely on the Reuters dataset.**\n", "- **Surprisingly, it does okay.** The reason is the same one that made me think it would fail: the dataset has very little repetition of words, so the model merges two or at most three tweets into one, generating some funny, obvious results.\n", "- The **biggest downside** of working with a dataset this small is that the model sometimes just reproduces an exact tweet.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('procpos.csv')" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | tweets | \n", "
---|---|
0 | \n", "You were born on your birthday. | \n", "
1 | \n", "In a year from now, you'll be a year older tha... | \n", "
2 | \n", "Your chances of getting into an accident incre... | \n", "