{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using PCA to visualize the MtG universe\n", "\n", "In this notebook, we're going to scrape Magic: The Gathering's Gatherer card database and then perform principal components analysis (PCA) to visualize hidden relationships between cards. Our goal is to see how much of the card-to-card variation can be captured in two dimensions, and what the resulting card groupings look like.\n", "\n", "\n", "\n", "This data set is very high-dimensional -- there are over 100 unique mechanics in the game, and the game state has many different elements (hand, battlefield, mana pool, etc.). Translating the 13,000 unique card texts into structured data is also a challenging NLP task." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Warning: This notebook is long... so, for the impatient:\n", "\n", "Here is what we will be working towards: *a programmatic mapping of every Magic card ever made* across two pseudo-axes:\n", "\n", "\n", "\n", "We will show that while Magic cards can differ in thousands of ways, they can be roughly categorized based on two simple measures: how \"creature-y\" are they, and how much do they relate to the board or non-board state?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Implementation details\n", "\n", "Pretty baller, right? We will interpret and grok this graph later, but for now, let's do this...\n", "\n", "*LEERRRROOYYY JENNNKIINNNNNSSS*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outline\n", "Here's a breakdown of the four steps that we'll go through to accomplish this task:\n", "1. 
Scrape + clean the data using `requests`, `web` from `pattern`, and `pandas`\n", "2. Extract features from the data using `fuzzywuzzy` and domain knowledge\n", "3. Perform and analyze PCA using `sklearn`\n", "4. Visualize + interpret results using the `plotly` graphing library" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*First some boring imports and settings (feel free to skip over)*" ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# boring imports\n", "\n", "%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pylab as plt\n", "\n", "import requests\n", "from pattern import web\n", "requests.packages.urllib3.disable_warnings()\n", "\n", "import re, string\n", "from sets import Set\n", "from collections import Counter\n", "from fuzzywuzzy import fuzz\n", "\n", "# global card database, keyed by card name\n", "database = {}\n", "pd.set_option('display.max_rows', 10)\n", "\n", "# Silly helper functions\n", "\n", "def isInt(s):\n", "    # True if s can be parsed as an integer (e.g. a generic mana symbol like '3')\n", "    try:\n", "        int(s)\n", "        return True\n", "    except ValueError:\n", "        return False\n", "\n", "def anyIntOrColor(l):\n", "    # True if any element of l is an integer or one of the five color names\n", "    for val in l:\n", "        if isInt(val) or (val in ['Black', 'Red', 'Green', 'Blue', 'White']): return True\n", "    return False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (1) -- Scrape baby, scrape\n", "\n", "Our first order of business is scraping the data from the Gatherer database using `requests` and `web` from `pattern`. In its simplest form, every Magic card has a name, text, type, mana cost, and power/toughness (if it's a creature). An example is Hypnotic Specter, a powerful creature in the early days of Magic:\n", "\n", "\n", "\n", "To scrape the relevant card features, we will construct card URLs using the card's `multiverse_id` and, on each page we load, look for the unique HTML elements that correspond to each of the features we want to obtain." 
] }, { "cell_type": "code", "execution_count": 173, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# grabCard scrapes:\n", "\n", "# name, types, text (lowered, alphanumeritized), mana cost,\n", "# cmc, power and toughness, and rarity.\n", "\n", "# and adds it to the global card database\n", "\n", "def grabCard(multiverse_id):\n", "    url = \"http://gatherer.wizards.com/Pages/Card/Details.aspx?multiverseid=\" + str(multiverse_id)\n", "    dom = web.Element(requests.get(url).text)\n", "    \n", "    # card name, card type\n", "    cardName = dom('div.cardImage img')[0].attributes['alt'] if dom('div.cardImage img') else ''\n", "    cardType = [element.strip() for element in \\\n", "                dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_typeRow div.value')[0].content.split(u'\\u2014')]\n", "    \n", "    # extract, parse, clean text into a list\n", "    cardText = []\n", "    pattern = re.compile('[\\W_]+')\n", "    for line in dom('div.cardtextbox'):\n", "        for element in line:\n", "            cardText.append(element)\n", "    \n", "    for i in xrange(len(cardText)):\n", "        if cardText[i].type == 'element' and cardText[i].tag == 'img':\n", "            # mana/tap symbols are images; keep their alt text\n", "            cardText[i] = cardText[i].attributes['alt']\n", "        else:\n", "            cardText[i] = str(cardText[i]).strip().lower()\n", "            # note: the result of sub() is not assigned, so punctuation and spaces are kept\n", "            pattern.sub('', cardText[i])\n", "    \n", "    # mana symbols\n", "    manaCost = [element.attributes['alt'] for element in dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_manaRow div.value img')]\n", "    cmc = int(dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_cmcRow div.value')[0].content.strip()) \\\n", "        if dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_cmcRow div.value') else np.nan\n", "    \n", "    # rarity\n", "    rarity = dom('div #ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_rarityRow div.value span')[0].content.lower()\n", "    \n", "    # p/t\n", "    power = np.nan\n", "    power = [_.strip() for _ in dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value')[0].content.split(' / ')][0] 
\\\n", "        if dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value') else np.nan\n", "    power = float(power) if power != '*' and power != np.nan else np.nan\n", "    toughness = [_.strip() for _ in dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value')[0].content.split(' / ')][1] \\\n", "        if dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value') else np.nan\n", "    toughness = float(toughness) if (toughness != '*' and toughness != '7-*' and toughness != np.nan) else np.nan\n", "    \n", "    # add data\n", "    database[cardName] = {\n", "        'cardType' : cardType,\n", "        'cardText' : cardText,\n", "        'manaCost' : manaCost,\n", "        'cmc' : cmc,\n", "        'rarity': rarity,\n", "        'power' : power,\n", "        'toughness' : toughness\n", "    }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform the scraping\n", "\n", "We'll iterate through a range of `multiverse_id`s to scrape the desired number of cards. Note that it takes around 1 minute per 500 `multiverse_id`s. Given that there are 13k+ cards (and multiple versions of each -- see below), we'll limit our scraping to the first few hundred `multiverse_id`s (`cardsToScrape = 600` below), which begin with the very first Magic set: Alpha." ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Grabbed 100\n", "Grabbed 200\n", "Grabbed 300\n", "Grabbed 400\n", "Grabbed 500\n", "Done!\n" ] } ], "source": [ "cardsToScrape = 600\n", "\n", "for i in xrange(1, cardsToScrape):\n", "    if (i % 100 == 0): print \"Grabbed \" + str(i)\n", "    grabCard(i)\n", "\n", "print \"Done!\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, we have up to `cardsToScrape` cards and their associated values in a local `dict`, using the `cardName` as the key. 
(Note that we have fewer than `cardsToScrape` cards: we're iterating over `multiverse_id`s and some ids don't actually correspond to a card page, and because the `dict` is keyed by card name, any later version of a card overwrites the earlier entry.)\n", "\n", "### Note for potential future work\n", "\n", "*There are other aspects represented in the Gatherer database, such as set and community ratings, but we leave these to future work. Annoyingly, a card that appears in multiple sets has a different page (and consequently a different set of ratings) for each set; though this would require more work, it'd be super interesting if you could predict a card's community interest (# ratings) and favorability (average rating).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Making the data usable\n", "\n", "We'll now put this into a `pandas` dataframe for cleaning, variable creation, and initial analysis/spot checking/understanding." ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " toughness power cmc rarity cardType \\\n", "Air Elemental 4 4 5 uncommon [Creature, Elemental] \n", "Ancestral Recall NaN NaN 1 rare [Instant] \n", "Animate Artifact NaN NaN 4 uncommon [Enchantment, Aura] \n", "Animate Dead NaN NaN 2 uncommon [Enchantment, Aura] \n", "Animate Wall NaN NaN 1 rare [Enchantment, Aura] \n", "... ... ... ... ... ... \n", "Winter Orb NaN NaN 2 rare [Artifact] \n", "Wooden Sphere NaN NaN 1 uncommon [Artifact] \n", "Word of Command NaN NaN 2 rare [Instant] \n", "Wrath of God NaN NaN 4 rare [Sorcery] \n", "Zombie Master 3 2 3 rare [Creature, Zombie] \n", "\n", " cardText \\\n", "Air Elemental [flying] \n", "Ancestral Recall [target player draws three cards.] \n", "Animate Artifact [enchant artifact, as long as enchanted artifa... \n", "Animate Dead [enchant creature card in a graveyard, when an... \n", "Animate Wall [enchant wall, enchanted wall can attack as th... \n", "... ... \n", "Winter Orb [players can't untap more than one land during... \n", "Wooden Sphere [whenever a player casts a green spell, you ma... \n", "Word of Command [look at target opponent's hand and choose a c... \n", "Wrath of God [destroy all creatures. they can't be regenera... \n", "Zombie Master [other zombie creatures have swampwalk., other... \n", "\n", " manaCost cardName \n", "Air Elemental [3, Blue, Blue] Air Elemental \n", "Ancestral Recall [Blue] Ancestral Recall \n", "Animate Artifact [3, Blue] Animate Artifact \n", "Animate Dead [1, Black] Animate Dead \n", "Animate Wall [White] Animate Wall \n", "... ... ... 
\n", "Winter Orb [2] Winter Orb \n", "Wooden Sphere [1] Wooden Sphere \n", "Word of Command [Black, Black] Word of Command \n", "Wrath of God [2, White, White] Wrath of God \n", "Zombie Master [1, Black, Black] Zombie Master \n", "\n", "[296 rows x 8 columns]" ] }, "execution_count": 175, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.DataFrame.from_dict(database, orient='index')\n", "data['cardName'] = data.index\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (2) -- Feature extraction" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "scrolled": false }, "source": [ "Based on our domain knowledge, we're going to extract four main types of features for each card:\n", "1. **Mana cost** and **mana amounts** of a card\n", "2. **Categorical features** -- type (i.e. Artifact, Creature, etc.) and rarity (i.e. Common, Uncommon, etc.)\n", "3. **Text features** based on the card's text (i.e. \"When this creature enters the battlefield...\")\n", "4. **Functional features** -- having a Tap ability, being a mana generator, etc." 
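] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the first feature type concrete, here is a minimal sketch of how one scraped `manaCost` list could be turned into numbers. The helper `mana_features` and the constant `COLORS` are hypothetical names introduced only for this illustration; the cells that follow build the same kind of columns directly on the dataframe." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Illustrative sketch only: `mana_features` and `COLORS` are not part of the original analysis\n", "from collections import Counter\n", "\n", "COLORS = ['Blue', 'Black', 'Red', 'Green', 'White']\n", "\n", "def mana_features(mana_cost):\n", "    # mana_cost is a list of symbols such as ['3', 'Blue', 'Blue']\n", "    counts = Counter(mana_cost)\n", "    features = dict(('mana_' + c, counts[c]) for c in COLORS)\n", "    # generic (numbered) mana symbols are summed into a single value\n", "    features['colorlessMana'] = sum(float(v) for v in mana_cost if v.isdigit())\n", "    features['Variable Colorless'] = int('Variable Colorless' in mana_cost)\n", "    return features\n", "\n", "# Air Elemental's cost as it appears in the table above\n", "mana_features(['3', 'Blue', 'Blue'])"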
] }, { "cell_type": "code", "execution_count": 176, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Which features do we want to use?\n", "# All enabled by default, mana and categorical features required\n", "\n", "textFeatures = True\n", "functionalFeatures = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (2a) -- Mana features" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create mana features\n", "\n", "colorlessMana = []\n", "\n", "# record the generic (numbered) part of each mana cost, or 0 if there is none\n", "for row in data['manaCost']:\n", "    found = 0\n", "    for val in row:\n", "        if isInt(val):\n", "            colorlessMana.append(float(val))\n", "            found = 1\n", "    if found == 0:\n", "        colorlessMana.append(0)\n", "\n", "data['colorlessMana'] = colorlessMana\n", "data['Variable Colorless'] = [1 if 'Variable Colorless' in text else 0 for text in data['manaCost']]" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Count mana symbols\n", "\n", "manaSymbols = ['Blue', 'Black', 'Red', 'Green', 'White']\n", "manaVars = ['mana_' + _ for _ in manaSymbols]\n", "\n", "for i in xrange(len(manaSymbols)):\n", "    # mana_<Color> counts the symbols; <Color> is a 0/1 indicator\n", "    data[manaVars[i]] = [text.count(manaSymbols[i]) for text in data['manaCost']]\n", "    data[manaSymbols[i]] = [1 if text.count(manaSymbols[i]) > 0 else 0 for text in data['manaCost']]" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Artifact Black Blue Green Red Variable Colorless White \\\n", "color \n", "Artifact count 62 62 62 62 62 62 62 \n", " mean 1 0 0 0 0 0 0 \n", " std 0 0 0 0 0 0 0 \n", " min 1 0 0 0 0 0 0 \n", " 25% 1 0 0 0 0 0 0 \n", "... ... ... ... ... ... ... ... \n", "White min 0 0 0 0 0 0 1 \n", " 25% 0 0 0 0 0 0 1 \n", " 50% 0 0 0 0 0 0 1 \n", " 75% 0 0 0 0 0 0 1 \n", " max 0 0 0 0 0 1 1 \n", "\n", " cmc colorlessMana mana_Black mana_Blue mana_Green \\\n", "color \n", "Artifact count 47.000000 62.000000 62 62 62 \n", " mean 2.361702 1.790323 0 0 0 \n", " std 1.673652 1.775397 0 0 0 \n", " min 0.000000 0.000000 0 0 0 \n", " 25% 1.000000 0.000000 0 0 0 \n", "... ... ... ... ... ... \n", "White min 1.000000 0.000000 0 0 0 \n", " 25% 1.000000 0.000000 0 0 0 \n", " 50% 2.000000 1.000000 0 0 0 \n", " 75% 3.000000 1.000000 0 0 0 \n", " max 6.000000 3.000000 0 0 0 \n", "\n", " mana_Red mana_White power toughness \n", "color \n", "Artifact count 62 62.00 5.000000 5.000000 \n", " mean 0 0.00 2.400000 5.000000 \n", " std 0 0.00 2.302173 1.414214 \n", " min 0 0.00 0.000000 3.000000 \n", " 25% 0 0.00 0.000000 4.000000 \n", "... ... ... ... ... 
\n", "White min 0 1.00 1.000000 1.000000 \n", " 25% 0 1.00 1.500000 1.000000 \n", " 50% 0 1.00 2.000000 2.000000 \n", " 75% 0 1.75 3.000000 4.500000 \n", " max 0 3.00 6.000000 6.000000 \n", "\n", "[48 rows x 16 columns]" ] }, "execution_count": 179, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find color (ignores multicolor)\n", "\n", "def isColorless(l):\n", "    # True if the mana cost contains no colored symbol\n", "    for val in l:\n", "        if val in manaSymbols: return False\n", "    return True\n", "\n", "data['Artifact'] = [1 if isColorless(x) else 0 for x in data['manaCost']]\n", "\n", "def findColor(l):\n", "    # first colored symbol in the cost; colorless cards are labeled 'Artifact'\n", "    for val in l:\n", "        if not isInt(val) and val != 'Variable Colorless': return val\n", "    return 'Artifact'\n", "\n", "data['color'] = [findColor(l) for l in data['manaCost']]\n", "\n", "data.groupby(data['color']).describe().to_csv('colorSummary.csv')\n", "data.groupby(data['color']).describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (2b) -- Categorical features" ] }, { "cell_type": "code", "execution_count": 180, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " toughness power cmc colorlessMana \\\n", "Primary Type \n", "Artifact count 1 1 43.000000 43.000000 \n", " mean 6 3 2.116279 2.116279 \n", " std NaN NaN 1.499354 1.499354 \n", " min 6 3 0.000000 0.000000 \n", " 25% 6 3 1.000000 1.000000 \n", "... ... ... ... ... \n", "Sorcery min NaN NaN 1.000000 0.000000 \n", " 25% NaN NaN 1.250000 0.000000 \n", " 50% NaN NaN 2.000000 1.000000 \n", " 75% NaN NaN 3.000000 2.000000 \n", " max NaN NaN 4.000000 3.000000 \n", "\n", " Variable Colorless mana_Blue Blue mana_Black Black \\\n", "Primary Type \n", "Artifact count 43 43 43 43.00 43.00 \n", " mean 0 0 0 0.00 0.00 \n", " std 0 0 0 0.00 0.00 \n", " min 0 0 0 0.00 0.00 \n", " 25% 0 0 0 0.00 0.00 \n", "... ... ... ... ... ... \n", "Sorcery min 0 0 0 0.00 0.00 \n", " 25% 0 0 0 0.00 0.00 \n", " 50% 0 0 0 0.00 0.00 \n", " 75% 1 0 0 0.75 0.75 \n", " max 1 3 1 3.00 1.00 \n", "\n", " mana_Red ... Artifact Creature Enchantment \\\n", "Primary Type ... \n", "Artifact count 43 ... 43 43 43 \n", " mean 0 ... 1 0 0 \n", " std 0 ... 0 0 0 \n", " min 0 ... 1 0 0 \n", " 25% 0 ... 1 0 0 \n", "... ... ... ... ... ... \n", "Sorcery min 0 ... 0 0 0 \n", " 25% 0 ... 0 0 0 \n", " 50% 0 ... 0 0 0 \n", " 75% 0 ... 0 0 0 \n", " max 1 ... 0 0 0 \n", "\n", " Instant Land Sorcery basic land common rare \\\n", "Primary Type \n", "Artifact count 43 43 43 43 43.00 43.000000 \n", " mean 0 0 0 0 0.00 0.604651 \n", " std 0 0 0 0 0.00 0.494712 \n", " min 0 0 0 0 0.00 0.000000 \n", " 25% 0 0 0 0 0.00 0.000000 \n", "... ... ... ... ... ... ... \n", "Sorcery min 0 0 1 0 0.00 0.000000 \n", " 25% 0 0 1 0 0.00 0.000000 \n", " 50% 0 0 1 0 0.00 0.000000 \n", " 75% 0 0 1 0 0.75 1.000000 \n", " max 0 0 1 0 1.00 1.000000 \n", "\n", " uncommon \n", "Primary Type \n", "Artifact count 43.000000 \n", " mean 0.395349 \n", " std 0.494712 \n", " min 0.000000 \n", " 25% 0.000000 \n", "... ... 
\n", "Sorcery min 0.000000 \n", " 25% 0.000000 \n", " 50% 0.000000 \n", " 75% 0.750000 \n", " max 1.000000 \n", "\n", "[48 rows x 26 columns]" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create categorical features\n", "\n", "primaryTypes = [cardType[0] for cardType in data['cardType']]\n", "\n", "# fold the compound types into their broader category\n", "for i in xrange(len(primaryTypes)):\n", "    if primaryTypes[i] == u'Basic Land':\n", "        primaryTypes[i] = u'Land'\n", "    if primaryTypes[i] == u'Artifact Creature':\n", "        primaryTypes[i] = u'Creature'\n", "\n", "data['Primary Type'] = primaryTypes\n", "\n", "data = pd.concat([data, pd.get_dummies(data['Primary Type'])], axis=1)\n", "data = pd.concat([data, pd.get_dummies(data['rarity'])], axis=1)\n", "\n", "data.groupby(data['rarity']).describe().to_csv('byRarity.csv')\n", "data.groupby(data['rarity']).describe()\n", "\n", "data.groupby(data['Primary Type']).describe().to_csv('byType.csv')\n", "data.groupby(data['Primary Type']).describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (2c) -- Text features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A helper function built on `fuzzywuzzy`'s `partial_ratio` to find partial word matches in card text boxes:" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def partialMatch(s, l, threshold=95):\n", "    # True if s partially matches any string in l with a score >= threshold\n", "    fuzzVals = [fuzz.partial_ratio(s, x) for x in l]\n", "    if not fuzzVals: fuzzVals = [0]\n", "    return max(fuzzVals) >= threshold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on domain knowledge, we'll fuzzy-match a list of important words against each card's text box; the presence of each word gives us a hint about what the card does." 
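] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get a feel for what a partial match does, here is a rough, dependency-free sketch built on the standard library's `difflib`. It only approximates `fuzz.partial_ratio` (the real scoring differs in its details), and `partial_ratio_sketch` is a hypothetical name used just for this example. The idea is to slide a window the length of the shorter string across the longer one and keep the best similarity score." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Illustrative sketch only: approximates fuzz.partial_ratio with the standard library\n", "from difflib import SequenceMatcher\n", "\n", "def partial_ratio_sketch(keyword, text):\n", "    # compare the shorter string against every same-length window of the longer one\n", "    shorter, longer = sorted([keyword.lower(), text.lower()], key=len)\n", "    if not shorter:\n", "        return 0\n", "    best = 0.0\n", "    for start in range(len(longer) - len(shorter) + 1):\n", "        window = longer[start:start + len(shorter)]\n", "        best = max(best, SequenceMatcher(None, shorter, window).ratio())\n", "    return int(round(100 * best))\n", "\n", "# an exact substring always reaches the maximum score of 100\n", "partial_ratio_sketch('damage', 'deals 3 damage to target creature')"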
] }, { "cell_type": "code", "execution_count": 182, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create text-based features\n", "\n", "if textFeatures:\n", "\n", " data['Damage'] = [1 if partialMatch('damage', l) else 0 for l in data['cardText']]\n", " data['Hand'] = [1 if partialMatch('hand', l) else 0 for l in data['cardText']]\n", " data['Draw'] = [1 if partialMatch('draw', l, 80) else 0 for l in data['cardText']]\n", " data['Upkeep'] = [1 if partialMatch('upkeep', l, 80) else 0 for l in data['cardText']]\n", " data['Library'] = [1 if partialMatch('library', l) else 0 for l in data['cardText']]\n", " data['Sacrifice'] = [1 if partialMatch('sacrifice', l) else 0 for l in data['cardText']]\n", " data['Destroy'] = [1 if partialMatch('destroy', l) else 0 for l in data['cardText']]\n", " data['Discard'] = [1 if partialMatch('discard', l) else 0 for l in data['cardText']]\n", " data['Prevent'] = [1 if partialMatch('prevent', l) else 0 for l in data['cardText']]\n", " data['Life'] = [1 if partialMatch('life', l) else 0 for l in data['cardText']]\n", " data['Attack'] = [1 if partialMatch('attack', l) else 0 for l in data['cardText']]\n", " data['Block'] = [1 if partialMatch('block', l) else 0 for l in data['cardText']]\n", " data['Search'] = [1 if partialMatch('search', l) else 0 for l in data['cardText']]\n", " data['Choose'] = [1 if partialMatch('choose', l) else 0 for l in data['cardText']]\n", " data['Copy'] = [1 if partialMatch('copy', l) else 0 for l in data['cardText']]\n", " data['Change'] = [1 if partialMatch('change', l) else 0 for l in data['cardText']]\n", " data['Turn'] = [1 if partialMatch('turn', l) else 0 for l in data['cardText']]\n", " data['End of turn'] = [1 if partialMatch('end of turn', l, 80) else 0 for l in data['cardText']]\n", " data['Beginning of turn'] = [1 if partialMatch('beginning of turn', l, 80) else 0 for l in data['cardText']]\n", " data['Spell ref'] = [1 if partialMatch('spell', l) else 0 for l in 
data['cardText']]\n", " data['Creature ref'] = [1 if partialMatch('creature', l) else 0 for l in data['cardText']]\n", " data['Land'] = [1 if partialMatch('land', l) else 0 for l in data['cardText']]\n", " data['Mana'] = [1 if partialMatch('mana', l) else 0 for l in data['cardText']]\n", " data['Battlefield'] = [1 if partialMatch('battlefield', l) else 0 for l in data['cardText']]\n", " data['Blue ref'] = [1 if partialMatch('blue', l) else 0 for l in data['cardText']]\n", " data['Black ref'] = [1 if partialMatch('black', l) else 0 for l in data['cardText']]\n", " data['Green ref'] = [1 if partialMatch('green', l) else 0 for l in data['cardText']]\n", " data['Red ref'] = [1 if partialMatch('red', l) else 0 for l in data['cardText']]\n", " data['White ref'] = [1 if partialMatch('white', l) else 0 for l in data['cardText']]\n", " data['Colorless ref'] = [1 if partialMatch('colorless', l) else 0 for l in data['cardText']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (2d) -- Functional features" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# 4. 
Special functional features\n", "\n", "def isBuff(symbol, l):\n", "    # True if symbol (e.g. '+' or '-') appears in any line of the card text\n", "    return any(symbol in val for val in l)\n", "\n", "if functionalFeatures:\n", "\n", "    data['Untap'] = [1 if partialMatch('untap', l) else 0 for l in data['cardText']]\n", "    data['All'] = [1 if partialMatch('all', l) or partialMatch('any', l) else 0 for l in data['cardText']]\n", "\n", "    data['Tap ability'] = [1 if 'Tap' in x else 0 for x in data['cardText']]\n", "    data['Mana symbol'] = [1 if anyIntOrColor(x) else 0 for x in data['cardText']]\n", "    data['Mana related'] = [1 if partialMatch('add mana', l) or partialMatch('your mana pool', l) \\\n", "                            else 0 for l in data['cardText']]\n", "\n", "    data['Buff'] = [1 if isBuff('+', l) else 0 for l in data['cardText']]\n", "    data['Debuff'] = [1 if isBuff('-', l) else 0 for l in data['cardText']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Note\n", "\n", "Some of this could probably have been automated, especially the **text features**, which could have been derived by finding the most common words in card text boxes. Again, I leave this to future work, and I'm really curious about what the literature on automatic feature creation has to say." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (3) -- Perform PCA\n", "\n", "Surprisingly, the PCA itself is the easiest part of this entire thing. We'll use `sklearn` to perform a 10-component PCA and see how much of the data's total variance those 10 dimensions capture." 
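] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For intuition about what \"explained variance\" means, here is a small NumPy-only sketch on random toy data. It standardizes the columns and reads each component's share of the variance off the singular values, which should correspond to what `sklearn` reports as `explained_variance_ratio_`. The function name `explained_variance_sketch` and the toy matrix are assumptions made for this illustration, not part of the analysis." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Illustrative sketch only: toy data, not the card features\n", "import numpy as np\n", "\n", "def explained_variance_sketch(X, n_components):\n", "    # standardize every column to mean 0, variance 1\n", "    X = np.asarray(X, dtype=float)\n", "    std = X.std(axis=0)\n", "    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by 0\n", "    Z = (X - X.mean(axis=0)) / std\n", "    # squared singular values are proportional to the variance along each component\n", "    s = np.linalg.svd(Z, compute_uv=False)\n", "    share = s ** 2 / np.sum(s ** 2)\n", "    return share[:n_components]\n", "\n", "toy = np.random.RandomState(0).rand(50, 6)  # 50 made-up rows, 6 made-up features\n", "explained_variance_sketch(toy, 2)"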
] }, { "cell_type": "code", "execution_count": 184, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "from sklearn.cluster import KMeans\n", "from sklearn.decomposition import PCA\n", "from sklearn.preprocessing import scale\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "numericData = data.copy()\n", "# keep only the numeric columns (missing values become 0), then scale to mean 0, variance 1\n", "numericData_std = scale(numericData.fillna(0).select_dtypes(include=['float64', 'int64']))\n", "\n", "pca = PCA(n_components=10)\n", "Y_pca = pca.fit_transform(numericData_std)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### So, how well did we do?\n", "\n", "Based on the explained variance vector below, it doesn't look like we did very well at first glance. The first two principal components together account for only **14% of the total variance** in the data, though the first 10 components do account for **46% of the total variance**. Considering we're working with 62 features, this is pretty decent." ] }, { "cell_type": "code", "execution_count": 185, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Variance explained by each factor:\n", "[0.08, 0.062, 0.053, 0.052, 0.042, 0.039, 0.038, 0.036, 0.033, 0.03]\n", "\n", "Variance explained by all 10 factors:\n", "0.464\n", "\n", "Num features:\n", "62\n" ] } ], "source": [ "# Analysis of PCA effectiveness\n", "\n", "print\n", "print \"Variance explained by each factor:\"\n", "print [round(x, 3) for x in pca.explained_variance_ratio_]\n", "print\n", "print \"Variance explained by all 10 factors:\"\n", "print round(sum(pca.explained_variance_ratio_), 3)\n", "print\n", "print \"Num features:\"\n", "print len(numericData_std[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# (4) -- Results\n", "\n", "Now it's time to see if it was all worth it and apply the PCA projection to our data set. 
We want a pretty scatterplot that groups the data by different attributes (color, card type, rarity, etc.), so we'll write a helper graphing function using the `plotly` library." ] }, { "cell_type": "code", "execution_count": 186, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [], "source": [ "import plotly.plotly as py\n", "py.sign_in('nhuber', 'bmopo8hk40')\n", "from plotly.graph_objs import *\n", "import plotly.tools as tls\n", "\n", "def chooseColor(group):\n", "\n", "    # card colors\n", "    if group == u'White': return '#B2B2B2'\n", "    if group == u'Artifact': return '#996633'\n", "    if group == u'Red' : return '#E50000'\n", "    if group == u'Blue': return '#0000FF'\n", "    if group == u'Green' : return '#006400'\n", "    if group == u'Black' : return '#000000'\n", "\n", "    # card types\n", "    if group == 'Instant': return '#E81A8C'\n", "    if group == 'Sorcery': return '#F2AB11'\n", "    if group == 'Creature' : return '#102DE8'\n", "    if group == 'Enchantment': return '#1BBF28'\n", "    if group == 'Land' : return '#000000'\n", "    # note: never reached, 'Artifact' is already matched in the color block above\n", "    if group == 'Artifact' : return '#82580E'\n", "\n", "    # rarities\n", "    if group == 'common' : return '#000000'\n", "    if group == 'uncommon': return '#9a9999'\n", "    if group == 'rare': return '#eae002'\n", "    if group == 'basic land': return '#ba7127'"
] }, { "cell_type": "code", "execution_count": 187, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# graphs data on pca axes grouped by type thetype\n", "\n", "def graphByType(thetype, thetitle, centers=False, fix=-1, typefilter='',\n", "                height=625, width=725, markerfontsize=9, titlefontsize=26):\n", "\n", "    # fix reflects data through y = 0 to be backwards\n", "    # compatible with previous annotated visualizations\n", "\n", "    # create graph data from pca results\n", "\n", "    traces = []\n", "\n", "    if not typefilter:\n", "        typefilter = set(data[thetype])\n", "\n", "    for group in typefilter:\n", "\n", "        matches = []\n", "        for i in xrange(len(data[thetype])):\n", "            if data[thetype].irow(i) == group:\n", "                matches.append(i)\n", "\n", "        graphColor = chooseColor(group)\n", "\n", "        trace = Scatter(\n", "            x=Y_pca[matches,0],\n", "            y=fix * Y_pca[matches,1],\n", "            mode='text',\n", "            name=group,\n", "            marker=Marker(\n", "                size=8,\n", "                color=graphColor,\n", "                opacity=0.5),\n", "            text = data['cardName'].irow(matches),\n", "            textfont = Font(\n", "                family='Georgia',\n", "                size=markerfontsize,\n", "                color=graphColor\n", "            )\n", "        )\n", "\n", "        traces.append(trace)\n", "\n", "        if centers:\n", "\n", "            # one large marker at the mean position of this group;\n", "            # x and y are wrapped in lists because a trace expects sequences\n", "            traceCentroid = Scatter(\n", "                x = [np.mean(Y_pca[matches,0])],\n", "                y = [np.mean(fix * Y_pca[matches,1])],\n", "                mode = 'markers',\n", "                name = str(thetype) + \" center\",\n", "                marker = Marker(\n", "                    size = 26,\n", "                    color=graphColor),\n", "                opacity = 0.75\n", "            )\n", "\n", "            traces.append(traceCentroid)\n", "\n", "    # Set up the scatter plot layout\n", "\n", "    dataToGraph = Data(traces)\n", "\n", "    # auto-focus on where most of the data is clustered\n", "    xRange = max(abs(np.percentile(np.array([x[0] for x in Y_pca]), 2.5)),\n", "                 abs(np.percentile(np.array([x[0] for x in Y_pca]), 97.5)))\n", "    yRange = max(abs(np.percentile(np.array([x[1] for x in Y_pca]), 2.5)),\n", "                 abs(np.percentile(np.array([x[1] for x in Y_pca]), 97.5)))\n", "\n", "    layout = Layout(title=thetitle,\n", "                    titlefont=Font(family='Georgia', size=titlefontsize),\n", "                    showlegend = True,\n", "                    autosize = False,\n", "                    height = height,\n", "                    width = width,\n", "                    xaxis=XAxis(\n", "                        range=[-xRange, +xRange],\n", "                        title='PC1', showline=False),\n", "                    yaxis=YAxis(\n", "                        range=[-yRange, +yRange],\n", "                        title='PC2', showline=False))\n", "\n", "    fig = Figure(data=dataToGraph, layout=layout)\n", "    return fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (4a) -- Grouping by color" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first visualize all of our cards on the two PCA axes, grouped by color."
] }, { "cell_type": "code", "execution_count": 188, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 188, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fig = graphByType('color', \"PCA on MtG by card color\")\n", "py.iplot(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few notes on this graph:\n", "- It's interactive: you can zoom into an area of the graph by dragging to create a rectangle\n", "- You can also click the labels on the top right to show or hide cards of different colors\n", "- It will probably mean more if you know Magic and what each of these cards does; I'll offer my analysis below, but if you do play and have alternate interpretations of how these cards are grouped, please do let me know" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Result 1: A tale of two pseudo-axes\n", "\n", "The primary result of this analysis is that a Magic card can mainly be broken down into two components: how much does it behave like a spell vs. a creature, and how much does it affect the board vs. non-board resources? Visually, we are left with two \"pseudo-axes\":\n", "- On the left downward diagonal, we have the creature axis, which represents how \"creature-y\" a card is: big creatures are very creature-y, mid-sized utility creatures are somewhat creature-y, and enchantments/spells are not creature-y at all.\n", "- On the right upward diagonal, we have the mana/hand axis, which is a spectrum of how much a card relates to the **board** (i.e. permanents in play) versus **non-board resources** such as the players' cards and mana pools."
] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going through salient examples\n", "\n", "We'll now go through the highlighted examples in the above graph, from left to right, to understand and evaluate how the model performs; my score out of 5 for each card's placement is in parentheses in the heading:\n", "\n", "### Juzam Djinn and Juggernaut (5/5)\n", " \n", "- Classic examples of \"fatties,\" large creatures that dominate the board through sheer size\n", "- *Very creature-y* and *very board-related*\n", "\n", "### Goblin King, Old Man of the Sea and White Knight (5/5)\n", " \n", "- Classic examples of \"utility creatures\": medium-sized (2/2, 2/3, and 2/2 respectively) but with an impact on the board through their abilities\n", "- *Mostly creature-y* and *very board-related*\n", "\n", "### Berserk, Raise Dead (5/5)\n", " \n", "- A combat trick that offers a one-time pump for a creature in combat, and a spell that revives a creature from the graveyard\n", "- Appropriately *somewhat creature-y* and *very board-related*\n", "\n", "### Birds of Paradise (5/5), Manabarbs (4/5)\n", " \n", "- Birds of Paradise -- a tiny, one-drop creature that provides mana ramping ability: the model correctly realizes it's a utility creature (i.e. 
medium creature-y) and is heavily related to a non-board resource, namely mana.\n", "- Manabarbs is similarly a permanent, mana-based effect, but it also impacts players' life totals, so it sits appropriately in the middle of this axis\n", "- *Medium creature-y*, *mostly mana/hand*\n", "\n", "### Balance (4/5)\n", "\n", "- A high-impact spell that equalizes both players' creature counts, cards in hand, and lands in play.\n", "- *Very spell-y* and related to *mostly non-board* resources (though it does equalize creatures as well)\n", "\n", "### Red Elemental Blast (4/5)\n", "\n", "- This is a tough one for the model because the card has two modes: destroy a blue permanent or counter a blue spell. Clearly, these are very different effects. But the model still places it as very *spell-y* and mostly *board-related*\n", "\n", "### Wheel of Fortune, Ancestral Recall (5/5)\n", " \n", "- Perfect categorization for Wheel: this is a completely unique spell in the game where each player discards their hand and draws 7 new cards. Not a creature at all, not related to the board at all; therefore, it is correctly labelled *spell-y* and *mana/hand-y*\n", "- Great categorization again for Ancestral: an instant-speed draw spell (note that it's right next to Braingeyser as well), so very *spell-y* and very *mana/hand-y*\n", "\n", "### Howling Mine (3/5)\n", "\n", "- Primarily card-related (each player draws 2 instead of 1) and a permanent effect, so somewhat *creature-y*\n", "- A strange card because it has a permanent effect on card resources, which is rare (compared with one-shot draw spells and discard spells)\n", "\n", "### Demonic Tutor, Black Lotus (5/5)\n", " \n", "- DT: The canonical tutor effect; very *spell-y* and *hand-y*\n", "- Black Lotus: the canonical Magic card; pure power in a one-time rush of mana (very *spell-y* and very *mana-y*)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "The model does a very good job categorizing cards across these two pseudo-axes. 
Unsurprisingly, the hardest cards to categorize are those that cross many axes -- Balance in its all-encompassing scope or Red Elemental Blast in its multiple modes -- or that have non-traditional effects, like Howling Mine being a card-drawing artifact." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Result 2: Exploring the color identities\n", "\n", "The PCA also tells a cool story of how differently the colors are defined. Here each card is a small point, with the large point representing the \"average\" of all of the color's cards:\n", "\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Artifacts** are somewhat creature-y, somewhat spell-y (as they can have different effects depending on the card) but generally are more related to mana/hand resources than board effects (except for artifact creatures like Juggernaut, which are correctly placed near other fatties like Juzam)\n", "- **Blue** and **White** seem to be, like all of the non-artifact colors, a mix of creatures and spells, but they skew towards the spell side, unlike **Green**, **Red**, and **Black** (the first two are very creature-y, the last a mix)\n", "- This is consistent with the intuitive color identity you have when you play the game: Blue and White are tricky, controlling decks; Red and Green are fatty/monster decks; and Black is somewhere in between, arguably the most flexible color in the game in terms of large creatures, mana effects and also removal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Result 3: Exploring the type identities" ] }, { "cell_type": "code", "execution_count": 189, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 189, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fig = graphByType('Primary Type', \"PCA on MtG by card type\")\n", "py.iplot(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model 
correctly places creatures on the creature axis, instants and sorceries on the spell axis (with mana-related ones like Dark Ritual or Contract from Below higher on the mana/hand axis), enchantments in the \"spell-like\" but permanent-effect region, and lands in the mana-related zone. Artifacts, because of their unique nature, are incredibly hard to categorize, and honestly the model probably doesn't do a great job with most of them: it somewhat arbitrarily lumps them into a group of their own and calls them mostly mana/hand-y. There's likely a lot more that could be teased out here to distribute the artifacts more appropriately across these axes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Result 4: Exploring rarity" ] }, { "cell_type": "code", "execution_count": 190, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 190, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fig = graphByType('rarity', \"PCA on MtG by card rarity\")\n", "py.iplot(fig)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Rares are allowed to have all kinds of effects (i.e. they are spread across both axes); uncommons are too, but less so (i.e. they are less scattered); commons are limited to medium-sized creatures and non-mana/hand-related spells. This is consistent with the game designers' views on what \"feels like a common\" and on what power levels different types of cards are allowed to have." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Conclusions and future work\n", "\n", "There's tons more work to do here that I'd love to do if I have time:\n", "- Play with what features to use (which are the most impactful -- almost certainly type and color, but what else? could we automatically create them from card text boxes without domain knowledge? 
of those, which are the most meaningful?)\n", "- What separates good cards from bad cards? How could we automatically detect/predict card quality?\n", "- What determines popular/interesting cards? (Note that Gatherer also has community ratings for every card ever made.)\n", "- Does this analysis hold for the entire corpus of Magic cards? How about for different sets? Does it get better or worse over time?\n", "\n", "Until then,
\n", "@nhuber | nicholas.e.huber@gmail.com" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*custom css*\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }