{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "For a workshop for non-technical people, I need a dataset that is out of the area of software development. But I also really like the datasets I have so far. Especially the Linux kernel Git history has some nice characteristics.\n", "\n", "So in this notebook, I take an existing dataset and transform it to a dataset non-technical people can understand. We take a look at how we can \"pseudumnize\" names and shift time-based data.\n", "\n", "So let's go!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Idea\n", "The idea is to take the simple Git log output and transform it so a fun dataset. What we have are commit timestamps and the authors who committed something into the Linux kernel Git repository:\n", "\n", "```\n", "timestamp,author\n", "2017-12-31 14:47:43,Linus Torvalds\n", "2017-12-31 13:13:56,Linus Torvalds\n", "2017-12-31 13:03:05,Linus Torvalds\n", "2017-12-31 12:30:34,Linus Torvalds\n", "2017-12-31 12:29:02,Linus Torvalds\n", "```\n", "\n", "It's a really nice dataset to get students of computer science into the first steps with Pandas and Co. . For non-technical people, we change the theme of the dataset to something more comprehensible and also more futuristic. This is the new scenario I have in my mind:\n", "\n", "> It's 2028, smart assistants are integrated into our bodies and can record certain events of the owner. In this dataset, you'll find each the curse\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Swear data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['anus',\n", " 'arse',\n", " 'arsehole',\n", " 'ass',\n", " 'ass-hat',\n", " 'ass-jabber',\n", " 'ass-pirate',\n", " 'assbag',\n", " 'assbandit',\n", " 'assbanger',\n", " 'assbite',\n", " 'assclown',\n", " 'asscock',\n", " 'asscracker',\n", " 'asses',\n", " 'assface',\n", " 'assfuck',\n", " 'assfucker',\n", " 'assgoblin',\n", " 'asshat',\n", " 'asshead',\n", " 'asshole',\n", " 'asshopper',\n", " 'assjacker',\n", " 'asslick',\n", " 'asslicker',\n", " 'assmonkey',\n", " 'assmunch',\n", " 'assmuncher',\n", " 'assnigger',\n", " 'asspirate',\n", " 'assshit',\n", " 'assshole',\n", " 'asssucker',\n", " 'asswad',\n", " 'asswipe',\n", " 'axwound',\n", " 'bampot',\n", " 'bastard',\n", " 'beaner',\n", " 'bitch',\n", " 'bitchass',\n", " 'bitches',\n", " 'bitchtits',\n", " 'bitchy',\n", " 'blow job',\n", " 'blowjob',\n", " 'bollocks',\n", " 'bollox',\n", " 'boner',\n", " 'brotherfucker',\n", " 'bullshit',\n", " 'bumblefuck',\n", " 'butt plug',\n", " 'butt-pirate',\n", " 'buttfucka',\n", " 'buttfucker',\n", " 'camel toe',\n", " 'carpetmuncher',\n", " 'chesticle',\n", " 'chinc',\n", " 'chink',\n", " 'choad',\n", " 'chode',\n", " 'clit',\n", " 'clitface',\n", " 'clitfuck',\n", " 'clusterfuck',\n", " 'cock',\n", " 'cockass',\n", " 'cockbite',\n", " 'cockburger',\n", " 'cockface',\n", " 'cockfucker',\n", " 'cockhead',\n", " 'cockjockey',\n", " 'cockknoker',\n", " 'cockmaster',\n", " 'cockmongler',\n", " 'cockmongruel',\n", " 'cockmonkey',\n", " 'cockmuncher',\n", " 'cocknose',\n", " 'cocknugget',\n", " 'cockshit',\n", " 'cocksmith',\n", " 'cocksmoke',\n", " 'cocksmoker',\n", " 'cocksniffer',\n", " 'cocksucker',\n", " 'cockwaffle',\n", " 'coochie',\n", " 'coochy',\n", " 'coon',\n", " 'cooter',\n", " 'cracker',\n", " 'cum',\n", " 'cumbubble',\n", " 'cumdumpster',\n", " 'cumguzzler',\n", " 'cumjockey',\n", " 'cumslut',\n", " 'cumtart',\n", " 'cunnie',\n", " 'cunnilingus',\n", " 'cunt',\n", " 'cuntass',\n", " 'cuntface',\n", " 'cunthole',\n", " 'cuntlicker',\n", " 'cuntrag',\n", " 'cuntslut',\n", " 'dago',\n", " 'damn',\n", " 'deggo',\n", " 'dick',\n", " 'dick-sneeze',\n", " 'dickbag',\n", " 'dickbeaters',\n", " 'dickface',\n", " 'dickfuck',\n", " 'dickfucker',\n", " 'dickhead',\n", " 'dickhole',\n", " 'dickjuice',\n", " 'dickmilk',\n", " 'dickmonger',\n", " 'dicks',\n", " 'dickslap',\n", " 'dicksucker',\n", " 'dicksucking',\n", " 'dicktickler',\n", " 'dickwad',\n", " 'dickweasel',\n", " 'dickweed',\n", " 'dickwod',\n", " 'dike',\n", " 'dildo',\n", " 'dipshit',\n", " 'doochbag',\n", " 'dookie',\n", " 'douche',\n", " 'douche-fag',\n", " 'douchebag',\n", " 'douchewaffle',\n", " 'dumass',\n", " 'dumb ass',\n", " 'dumbass',\n", " 'dumbfuck',\n", " 'dumbshit',\n", " 'dumshit',\n", " 'dyke',\n", " 'fag',\n", " 'fagbag',\n", " 'fagfucker',\n", " 'faggit',\n", " 'faggot',\n", " 'faggotcock',\n", " 'fagtard',\n", " 'fatass',\n", " 'fellatio',\n", " 'feltch',\n", " 'flamer',\n", " 'fuck',\n", " 'fuckass',\n", " 'fuckbag',\n", " 'fuckboy',\n", " 'fuckbrain',\n", " 'fuckbutt',\n", " 'fuckbutter',\n", " 'fucked',\n", " 'fucker',\n", " 'fuckersucker',\n", " 'fuckface',\n", " 'fuckhead',\n", " 'fuckhole',\n", " 'fuckin',\n", " 'fucking',\n", " 'fucknut',\n", " 'fucknutt',\n", " 'fuckoff',\n", " 'fucks',\n", " 'fuckstick',\n", " 'fucktard',\n", " 'fucktart',\n", " 'fuckup',\n", " 'fuckwad',\n", " 'fuckwit',\n", " 'fuckwitt',\n", " 'fudgepacker',\n", " 'gay',\n", " 'gayass',\n", " 'gaybob',\n", " 'gaydo',\n", " 'gayfuck',\n", " 'gayfuckist',\n", " 'gaylord',\n", " 'gaytard',\n", " 'gaywad',\n", " 'goddamn',\n", " 'goddamnit',\n", " 'gooch',\n", " 'gook',\n", " 'gringo',\n", " 'guido',\n", " 'handjob',\n", " 'hard on',\n", " 'heeb',\n", " 'hell',\n", " 'ho',\n", " 'hoe',\n", " 'homo',\n", " 'homodumbshit',\n", " 'honkey',\n", " 'humping',\n", " 'jackass',\n", " 'jagoff',\n", " 'jap',\n", " 'jerk off',\n", " 'jerkass',\n", " 'jigaboo',\n", " 'jizz',\n", " 'jungle bunny',\n", " 'junglebunny',\n", " 'kike',\n", " 'kooch',\n", " 'kootch',\n", " 'kraut',\n", " 'kunt',\n", " 'kyke',\n", " 'lameass',\n", " 'lardass',\n", " 'lesbian',\n", " 'lesbo',\n", " 'lezzie',\n", " 'mcfagget',\n", " 'mick',\n", " 'minge',\n", " 'mothafucka',\n", " \"mothafuckin\\\\\\\\\\\\'\",\n", " 'motherfucker',\n", " 'motherfucking',\n", " 'muff',\n", " 'muffdiver',\n", " 'munging',\n", " 'negro',\n", " 'nigaboo',\n", " 'nigga',\n", " 'nigger',\n", " 'niggers',\n", " 'niglet',\n", " 'nut sack',\n", " 'nutsack',\n", " 'paki',\n", " 'panooch',\n", " 'pecker',\n", " 'peckerhead',\n", " 'penis',\n", " 'penisbanger',\n", " 'penisfucker',\n", " 'penispuffer',\n", " 'piss',\n", " 'pissed',\n", " 'pissed off',\n", " 'pissflaps',\n", " 'polesmoker',\n", " 'pollock',\n", " 'poon',\n", " 'poonani',\n", " 'poonany',\n", " 'poontang',\n", " 'porch monkey',\n", " 'porchmonkey',\n", " 'prick',\n", " 'punanny',\n", " 'punta',\n", " 'pussies',\n", " 'pussy',\n", " 'pussylicking',\n", " 'puto',\n", " 'queef',\n", " 'queer',\n", " 'queerbait',\n", " 'queerhole',\n", " 'renob',\n", " 'rimjob',\n", " 'ruski',\n", " 'sand nigger',\n", " 'sandnigger',\n", " 'schlong',\n", " 'scrote',\n", " 'shit',\n", " 'shitass',\n", " 'shitbag',\n", " 'shitbagger',\n", " 'shitbrains',\n", " 'shitbreath',\n", " 'shitcanned',\n", " 'shitcunt',\n", " 'shitdick',\n", " 'shitface',\n", " 'shitfaced',\n", " 'shithead',\n", " 'shithole',\n", " 'shithouse',\n", " 'shitspitter',\n", " 'shitstain',\n", " 'shitter',\n", " 'shittiest',\n", " 'shitting',\n", " 'shitty',\n", " 'shiz',\n", " 'shiznit',\n", " 'skank',\n", " 'skeet',\n", " 'skullfuck',\n", " 'slut',\n", " 'slutbag',\n", " 'smeg',\n", " 'snatch',\n", " 'spic',\n", " 'spick',\n", " 'splooge',\n", " 'spook',\n", " 'suckass',\n", " 'tard',\n", " 'testicle',\n", " 'thundercunt',\n", " 'tit',\n", " 'titfuck',\n", " 'tits',\n", " 'tittyfuck',\n", " 'twat',\n", " 'twatlips',\n", " 'twats',\n", " 'twatwaffle',\n", " 'unclefucker',\n", " 'va-j-j',\n", " 'vag',\n", " 'vagina',\n", " 'vajayjay',\n", " 'vjayjay',\n", " 'wank',\n", " 'wankjob',\n", " 'wetback',\n", " 'whore',\n", " 'whorebag',\n", " 'whoreface',\n", " 'wop']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from string import ascii_lowercase\n", "from bs4 import BeautifulSoup\n", "import requests\n", "\n", "URL = \"https://www.noswearing.com/dictionary/\"\n", "\n", "swear_words = []\n", "\n", "for letter in ascii_lowercase:\n", " html_content = str(requests.get(URL + letter).content)\n", " soup = BeautifulSoup(html_content, 'lxml')\n", " for tag in soup.find_all(\"table\")[2].find_all(\"a\"):\n", "\n", " if \"name\" in tag.attrs:\n", " swear_words.append(tag['name'])\n", " \n", "swear_words" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'p' is not defined", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mswear_word_dfs\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 4\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mletter\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mstring\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mascii_lowercase\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 5\u001b[1;33m \u001b[0mp\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mNameError\u001b[0m: name 'p' is not defined" ] } ], "source": [ "import string\n", "\n", "swear_word_dfs = []\n", "for letter in string.ascii_lowercase:\n", " p" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }