{ "metadata": { "name": "", "signature": "sha256:97f4c336a9201382aaed9d3e8e089873dddb5a566b498a503aced99bf837946a" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Plotly and Socrata" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Awesome datasets and graphs coming together" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Taken from both companies' Wikipedia pages:\n", "\n", "> Plotly is an online analytics and data visualization tool. Plotly provides online graphing, analytics, a Python command line, and stats tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.\n", "\n", "> Socrata is a company that provides social data discovery services for opening government data. Socrata targets non-technical Internet users who want to view and share government, healthcare, energy, education, or environment data. Its products are issued under a proprietary, closed, exclusive license.\n", "\n", "Simply put, the two are meant to work together, and this IPython notebook will show you how to turn a dataset like this one into a plot like that one." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "1. Get a Socrata application token" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need an application token to communicate with Socrata through the Socrata Open Data API (SODA for short).\n", "\n", "Register with Socrata and get your application token here." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "2. Install the SODA Ruby wrapper" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, there is no SODA Python wrapper available at the time of writing. Fortunately, IPython lets us use multiple programming languages inside the same environment (an IPython notebook). 
So, here we will use Ruby and the `soda-ruby` gem to communicate with Socrata.\n", "\n", "With Ruby and RubyGems installed on your machine, run in a terminal/command prompt:\n", "\n", "* `$ gem install soda-ruby`\n", "\n", "Add `sudo` in front of the above for a system-wide install on Unix-like machines. Information about installing gems locally can be found here.\n", "\n", "Then, add the line:\n", "\n", "    gem 'soda-ruby', :require => 'soda'\n", "\n", "to a file named `Gemfile` placed either in the current directory or in a folder on your machine's gem path (more here)." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "3. Get a dataset from Socrata with Ruby and transfer it to IPython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Head to opendata.socrata.com, browse or search for a dataset that you like, and click on its link. I chose a list of the Guardian's \"Top 1,000 Songs to Hear Before You Die\", which can be viewed here. Here is a screenshot of the web page in question:\n", "\n", "\n", "\n", "Then, \n", "\n", "1. Click on `Export`, a blue button on the upper right side of the page.\n", "\n", "2. Click on `Soda API`, the uppermost tab under `Export`.\n", "\n", "3. Copy the `API Access Endpoint`, under the `Soda API` tab.\n", "\n", "In our case the API Access Endpoint is:\n", "\n", "    http://opendata.socrata.com/resource/ed74-c6ni.json\n", " \n", "The API Access Endpoint represents the link between the dataset hosted on Socrata and the API client, in our case soda-ruby. It contains two important pieces of information: the domain name and the dataset identifier. As the official Socrata docs note, the API Access Endpoint has the form:\n", "\n", "    http://$domain/resource/$dataset_identifier\n", "\n", "So, in our case the domain name is `opendata.socrata.com` and the dataset identifier is `ed74-c6ni`. 
Note that `.json` is just the file extension; it is not needed to access the dataset.\n", "\n", "Now, use the `%%ruby` IPython cell magic to run Ruby code inside the cell below:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%ruby --out socrata_data\n", "\n", "# with --out, data written to stdout in this Ruby cell \n", "# will be mapped to a Python variable (socrata_data) after execution.\n", "\n", "require 'soda/client'\n", "require 'json' \n", "\n", "# Set up client object with domain and application token\n", "client = SODA::Client.new({:domain => \"opendata.socrata.com\", \n", " :app_token => \"eqZC5q2iEmFXdIu2qEbtZkWgP\"})\n", "\n", "# Get data with dataset identifier\n", "response = client.get(\"ed74-c6ni\")\n", "\n", "# Print the dataset to stdout as JSON \n", "puts response.to_json" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And there you go: the Socrata dataset is now inside our IPython namespace!\n", "\n", "Next, we will handle the dataset inside IPython using the popular `pandas` module:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "\n", "# Read the retrieved JSON dataset (df stands for dataframe)\n", "df = pd.read_json(socrata_data)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "df.head() # show the first 5 rows of the dataframe" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ " artist spotify_url theme \\\n", "0 ABC {u'url': u'http://open.spotify.com/track/78j3q... Love \n", "1 Badly Drawn Boy {u'url': u'http://open.spotify.com/track/2PojS... Love \n", "2 The Beach Boys {u'url': u'http://open.spotify.com/track/0ObrX... Love \n", "3 The Beach Boys {u'url': u'http://open.spotify.com/track/2oF7F... Love \n", "4 The Beach Boys {u'url': u'http://open.spotify.com/track/0cx32... Love \n", "\n", " title year \n", "0 The Look of Love 1982 \n", "1 The Shining 2000 \n", "2 God Only Knows 1966 \n", "3 Good Vibrations 1966 \n", "4 Wouldn\u2019t It Be Nice 1966 \n", "\n", "[5 rows x 5 columns]" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "df.shape # the dataframe's dimensions (rows, columns)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "(994, 5)" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "4. Get relevant data and plot it using Plotly!" ] }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "Get relevant data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make a Plotly bar chart with the following features:\n", "\n", "* Artists (on the x-axis) vs. number of songs in the *1,000 Songs* list (on the y-axis),\n", "* Plot only artists with 4 or more songs in the *1,000 Songs* list,\n", "* Plot artists in descending order, starting from the artist with the most songs in the *1,000 Songs* list."
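, "\n", "\n", "These three requirements can also be met directly in pandas: `Series.value_counts()` counts occurrences and, by default, sorts the counts in descending order. Below is a minimal sketch on a made-up toy dataframe, assuming Python 3 and a recent pandas; `toy`, `counts`, `counts_4plus`, `artists` and `n_songs` are illustrative names, not part of this notebook:\n", "\n", "
```python
import pandas as pd

# Made-up toy data standing in for the Socrata dataframe (illustration only)
toy = pd.DataFrame({'artist': ['A', 'B', 'A', 'C', 'A', 'B', 'A', 'B', 'B', 'B']})

# Number of songs per artist; value_counts() sorts counts in descending order
counts = toy['artist'].value_counts()

# Keep only artists with 4 or more songs (the descending order is preserved)
counts_4plus = counts[counts >= 4]

artists = counts_4plus.index.tolist()  # x-axis labels, most songs first
n_songs = counts_4plus.tolist()        # matching y-axis values
```
"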
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, let's first make a dictionary pairing each artist's name with their number of tracks in the *1,000 songs* list:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "song_by_artist = df.groupby('artist').size().to_dict()\n", "\n", "song_by_artist" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "{u'!!!': 1,\n", " u'(Don\\u2019t Fear) The Reaper': 1,\n", " u'808 State': 1,\n", " u'A R Rahman': 1,\n", " u'ABC': 2,\n", " u'AC/DC': 2,\n", " u'ATB': 1,\n", " u'Aaliyah': 1,\n", " u'Abba': 3,\n", " u'Abyssinians': 1,\n", " u'Aerosmith': 2,\n", " u'Afroman': 1,\n", " u'Al Green': 5,\n", " u'Alice Cooper': 1,\n", " u'Alicia Keys': 2,\n", " u'Aliotta Haynes Jeremiah': 1,\n", " u'All Saints': 1,\n", " u'Althea and Donna': 1,\n", " u'Amy Winehouse': 2,\n", " u'Andy Williams': 1,\n", " u'Ann Peebles': 1,\n", " u'Anne Briggs': 1,\n", " u'Anthony Johnson': 1,\n", " u'Antony and the Johnsons': 1,\n", " u'Aphex Twin': 1,\n", " u'Arcade Fire': 1,\n", " u'Archie Bell and the Drells': 1,\n", " u'Archie Bleyer': 1,\n", " u'Arctic Monkeys': 4,\n", " u'Aretha Franklin': 3,\n", " u'Arthur Conley': 1,\n", " u'Arthur Russell': 1,\n", " u'Artie Shaw': 1,\n", " u'Ashford and Simpson': 1,\n", " u'Astrud Gilberto': 1,\n", " u'Au Pairs': 1,\n", " u'BB King': 1,\n", " u'Baccara': 1,\n", " u'Badfinger': 1,\n", " u'Badly Drawn Boy': 1,\n", " u'Baggy Trousers': 1,\n", " u'Bappi Lahiri/Parvati Khan': 1,\n", " u'Barbra Streisand and Barry\\xa0Gibb': 1,\n", " u'Barrington Levy': 1,\n", " u'Barry McGuire': 1,\n", " u'Beastie Boys': 1,\n", " u'Bee Gees': 1,\n", " u'Belle and Sebastian': 1,\n", " u'Ben E King': 2,\n", " u'Benga and Coki': 1,\n", " u'Bessie Smith': 1,\n", " u'Bettye LaVette': 1,\n", " u'Bettye Swann': 1,\n", " u'Beyonc\\xe9': 1,\n", " u'Big Joe Turner': 1,\n", " u'Big Star': 2,\n", " u'Bill Allen and the Backbeats': 1,\n", " u'Bill Withers': 
4,\n", " u'Billie Holiday': 1,\n", " u'Billie Holliday': 1,\n", " u'Billy Bragg': 2,\n", " u'Billy Paul': 1,\n", " u'Bim Sherman': 1,\n", " u'Bing Crosby': 2,\n", " u'Bj\\xf6rk': 1,\n", " u'Black Sabbath': 2,\n", " u'Blind Alfred Reed': 1,\n", " u'Blondie': 2,\n", " u'Blue Mink': 1,\n", " u'Blur': 3,\n", " u'Bo Diddley': 1,\n", " u'Bob Andy': 1,\n", " u'Bob Dylan': 24,\n", " u'Bob Lind': 1,\n", " u'Bob Marley': 2,\n", " u'Bob Marley and the Wailers': 2,\n", " u'Bobbie Gentry': 2,\n", " u'Bobby Bland': 1,\n", " u'Bobby Darin': 2,\n", " u'Bobby Fuller Four': 1,\n", " u'Bobby \\u201cBlue\\u201d Bland': 1,\n", " u'Body Count': 1,\n", " u'Bon Iver': 1,\n", " u'Bonnie \\u2018Prince\\u2019 Billy and Matt Sweeney': 1,\n", " u'Bonzo Dog Doo-Dah Band': 1,\n", " u'Boogie Down Productions': 1,\n", " u'Boys Town Gang': 1,\n", " u'Bright Eyes': 2,\n", " u'Britney Spears': 1,\n", " u'Bronski Beat': 2,\n", " u'Bruce Springsteen': 4,\n", " u'Bruce Springsteen and the E Street Band': 1,\n", " u'Bryan Ferry': 1,\n", " u'Buffalo Springfield': 1,\n", " u'Buffy Sainte-Marie': 1,\n", " u'Burning Spear': 1,\n", " u'Buzzcocks': 2,\n", " u'CSS': 1,\n", " u'Cab Calloway': 1,\n", " u'Candi Staton': 1,\n", " u'Cannibal and the Headhunters': 1,\n", " u'Captain and Tennille ': 1,\n", " u'Carl Bean': 1,\n", " u'Carlton and the Shoes': 1,\n", " u'Carly Simon': 2,\n", " u'Carole King': 2,\n", " u'Cast of Grange Hill': 1,\n", " u'Cat Stevens': 1,\n", " u'Ce Ce Rogers': 1,\n", " u'Chairmen of the Board': 1,\n", " u'Chic': 1,\n", " u'Chris Difford': 1,\n", " u'Chris Wood': 1,\n", " u'Chuck Berry': 5,\n", " u'Class Action': 1,\n", " u'Coldplay': 2,\n", " u'Cole Porter': 1,\n", " u'Commodores': 1,\n", " u'Cornershop': 1,\n", " u'Country Joe and the Fish': 1,\n", " u'Crosby, Stills, Nash and Young': 1,\n", " u'Crystal Mansion': 1,\n", " u'Curtis Mayfield': 1,\n", " u'Cyndi Lauper': 1,\n", " u'Daft Punk': 1,\n", " u'Dan Le Sac vs Scroobius Pip': 1,\n", " u'Daniel Johnston': 1,\n", " u'David Bowie': 9,\n", 
" u'Dead Kennedys': 1,\n", " u'Deee-Lite': 1,\n", " u'Def Leppard': 1,\n", " u'Depeche Mode': 2,\n", " u'Derek and the Dominos': 1,\n", " u'Desmond Dekker': 1,\n", " u'Destiny\\u2019s Child': 1,\n", " u'Devo': 1,\n", " u'Dexys Midnight Runners': 2,\n", " u'Diana Ross': 1,\n", " u'Diana Ross and the Supremes': 1,\n", " u'Dick Gaughan': 1,\n", " u'Dillinger': 1,\n", " u'Dinah Washington': 1,\n", " u'Dion': 2,\n", " u'Dion & the Belmonts': 1,\n", " u'Dionne Warwick': 2,\n", " u'Divinyls': 1,\n", " u'Dixie Chicks': 1,\n", " u'Dolly Parton': 3,\n", " u'Don McLean': 1,\n", " u'Donna Summer': 2,\n", " u'Donny Osmond': 1,\n", " u'Donovan': 2,\n", " u'Doobie Brothers': 1,\n", " u'Dory Previn': 1,\n", " u'Doves': 1,\n", " u'Dudley Moore and Peter Cook': 1,\n", " u'Duffy ': 1,\n", " u'Dusty Springfield': 2,\n", " u'D\\u2019Angelo': 1,\n", " u'Earth, Wind and Fire': 1,\n", " u'Echo and the Bunnymen': 1,\n", " u'Eddie Cochran': 1,\n", " u'Eddie Jefferson': 1,\n", " u'Edwin Starr': 1,\n", " u'Elastica': 2,\n", " u'Elbow': 3,\n", " u'Electribe 101': 1,\n", " u'Electric Light Orchestra': 1,\n", " u'Ella Fitzgerald': 2,\n", " u'Elliott Smith': 1,\n", " u'Elton John': 2,\n", " u'Elvis Costello': 2,\n", " u'Elvis Costello ': 1,\n", " u'Elvis Costello and the Attractions': 5,\n", " u'Elvis Costello and the Imposters': 1,\n", " u'Elvis Presley': 6,\n", " u'Eminem': 1,\n", " u'Emmylou Harris ': 1,\n", " u'Eric Bogle': 1,\n", " u'Esther Williams': 1,\n", " u'Etta James': 2,\n", " u'Eurythmics': 1,\n", " u'Everything But the Girl': 1,\n", " u'Ewan MacColl': 1,\n", " u'Faith No More': 1,\n", " u'Fatman Scoop featuring the Crooklyn Clan': 1,\n", " u'Fats Domino': 1,\n", " u'Fela Kuti': 2,\n", " u'First Choice': 1,\n", " u'Flanders and Swann': 1,\n", " u'Fleetwood Mac': 1,\n", " u'Flight of the Conchords': 1,\n", " u'Frank Sinatra': 6,\n", " u'Frank Wilson': 1,\n", " u'Frankie Goes To Hollywood': 1,\n", " u'Frankie Goes to Hollywood': 2,\n", " u'Frankie Valli and the Four Seasons': 1,\n", " 
u'Funkadelic': 1,\n", " u'Gary Numan': 1,\n", " u'George Harrison': 1,\n", " u'George Kranz': 1,\n", " u'George McCrae': 1,\n", " u'George Michael': 2,\n", " u'Geto Boys': 1,\n", " u'Gil Scott-Heron': 2,\n", " u'Gilbert O\\u2019Sullivan': 1,\n", " u'Girls Aloud': 1,\n", " u'Glasvegas': 1,\n", " u'Glen Campbell': 1,\n", " u'Gloria Gaynor': 1,\n", " u'Gloria Jones': 1,\n", " u'Gorillaz': 1,\n", " u'Grace Jones': 4,\n", " u'Gram Parsons': 2,\n", " u'Gram Parsons with Emmylou Harris': 1,\n", " u'Grandmaster Flash and the Furious Five': 1,\n", " u'Grandmaster Melle Mel': 1,\n", " u'Green Day': 1,\n", " u'Gregory Isaacs': 1,\n", " u'Grinderman': 1,\n", " u'Guns N\\u2019 Roses': 1,\n", " u'Guy Clark': 1,\n", " u'Gwen McRae': 1,\n", " u'Half Man Half Biscuit': 1,\n", " u'Halls and Oates': 1,\n", " u'Hamilton Bohannon': 1,\n", " u'Hank Williams': 3,\n", " u'Happy Mondays': 2,\n", " u'Heaven 17': 1,\n", " u'Heinz': 1,\n", " u'Herbie Hancock': 1,\n", " u'Herman D\\xfcne': 1,\n", " u'Herman\\u2019s Hermits': 1,\n", " u'Hot Chip': 1,\n", " u'Hot Chocolate': 1,\n", " u'House of Pain': 1,\n", " u'H\\xfcsker D\\xfc': 1,\n", " u'Ian Campbell Folk Group': 1,\n", " u'Ian Dury': 1,\n", " u'Ian Dury and the Blockheads': 2,\n", " u'Ice Cube': 1,\n", " u'If I Could Turn Back Time': 1,\n", " u'Ike and Tina Turner': 1,\n", " u'Indeep': 1,\n", " u'Inner City': 1,\n", " u'Irene Cara': 1,\n", " u'Irma Thomas': 1,\n", " u'JC Lodge': 1,\n", " u'Jackie Brenston and His Delta Cats': 1,\n", " u'Jackie Wilson': 1,\n", " u'James Brown': 2,\n", " u'James Carr': 1,\n", " u'Jamie Principle': 1,\n", " u'Jan Bradley': 1,\n", " u'Jane Birkin and Serge Gainsbourg': 1,\n", " u'Janet Kay': 1,\n", " u'Janis Joplin': 1,\n", " u'Janis Joplin and Big Brother and the Holding Company': 1,\n", " u'Jarvis Cocker': 1,\n", " u'Jeannie C Riley ': 1,\n", " u'Jerry Lee Lewis': 1,\n", " u'Jimi Hendrix': 1,\n", " u'Jimmie Rodgers': 1,\n", " u'Jimmy Cliff': 1,\n", " u'Jimmy Reed': 1,\n", " u'Jimmy Ruffin': 1,\n", " u'Jimmy 
Webb': 1,\n", " u'Joan Baez': 1,\n", " u'Joan Jett and the Blackhearts': 1,\n", " u'Joe Jackson': 1,\n", " u'John Cale and Lou Reed': 1,\n", " u'John Coltrane': 1,\n", " u'John Coltrane Quartet': 1,\n", " u'John Lennon': 3,\n", " u'John Martyn': 1,\n", " u'John Prine': 1,\n", " u'Johnny Bristol': 1,\n", " u'Johnny Cash': 5,\n", " u'Johnny Mandel': 1,\n", " u'Jonathan Richman and the Modern Lovers': 1,\n", " u'Joni Mitchell': 3,\n", " u'Joy Division': 2,\n", " u'Judy Clay and William Bell': 1,\n", " u'Judy Garland': 2,\n", " u'Julian Cope': 1,\n", " u'Junior Murvin': 1,\n", " u'Justice v Simian': 1,\n", " u'KD Lang': 1,\n", " u'Kanye West': 2,\n", " u'Karen Dalton': 1,\n", " u'Kate Bush': 5,\n", " u'Katy Perry': 1,\n", " u'Keep Me in Your Heart': 1,\n", " u'Kelis': 2,\n", " u'Kellee Patterson': 1,\n", " u'Kelly Clarkson': 1,\n", " u'Kid Creole and the Coconuts': 1,\n", " u'Kings of Leon': 1,\n", " u'Kirsty MacColl': 1,\n", " u'Kiss': 1,\n", " u'Klaxons': 1,\n", " u'Kraftwerk': 2,\n", " u'Kris Kristofferson': 1,\n", " u'Kylie Minogue': 1,\n", " u'LCD Soundsystem': 2,\n", " u'LFO': 1,\n", " u'Labelle': 1,\n", " u'Labi Siffre': 1,\n", " u'Larry Young': 1,\n", " u'Laura Branigan': 1,\n", " u'Leadbelly': 1,\n", " u'Led Zeppelin': 2,\n", " u'Lee Dorsey': 1,\n", " u'Lemon Jelly': 1,\n", " u'Leona Lewis': 1,\n", " u'Leonard Cohen': 3,\n", " u'Leroy Hutson': 1,\n", " u'Lesley Gore': 1,\n", " u'Lethal Bizzle': 1,\n", " u'Lily Allen': 1,\n", " u'Lil\\u2019 Louis': 1,\n", " u'Lionel Richie': 1,\n", " u'Little Richard': 1,\n", " u'Live Forever': 1,\n", " u'Liz Phair': 1,\n", " u'Lloyd Price': 1,\n", " u'Look Up': 1,\n", " u'Loose Joints': 1,\n", " u'Lord Kitchener': 1,\n", " u'Loretta Lynn': 1,\n", " u'Lou Reed': 2,\n", " u'Loudon Wainwright III': 1,\n", " u'Louis Armstrong': 3,\n", " u'Louis Jordan and His Tympany Five': 1,\n", " u'Louis Prima and Keely Smith ': 1,\n", " u'Love': 1,\n", " u'Lulu': 1,\n", " u'Luther Vandross': 1,\n", " u'Lynyrd Skynyrd': 1,\n", " u'M': 1,\n", " 
u'MC5': 1,\n", " u'MGMT': 2,\n", " u'MIA': 1,\n", " u'Maceo and the Macks': 1,\n", " u'Machine': 1,\n", " u'Madonna': 6,\n", " u'Mae West': 1,\n", " u'Malcolm McLaren': 1,\n", " u'Manic Street Preachers': 1,\n", " u'Manu Chao': 2,\n", " u'Manu Dibango': 1,\n", " u'Marianne Faithfull': 1,\n", " u'Marilyn Monroe': 1,\n", " u'Mark Dinning': 1,\n", " u'Mark Ronson featuring Amy Winehouse': 1,\n", " u'Martha and the Vandellas': 1,\n", " u'Martin Carthy': 2,\n", " u'Marvin Gaye': 6,\n", " u'Marvin Gaye And Tammi Terrell': 2,\n", " u'Mary Margaret O\\u2019Hara': 1,\n", " u'Massive Attack': 1,\n", " u'Max Romeo': 1,\n", " u'Max Sedgley': 1,\n", " u'McAlmont and Butler': 1,\n", " u'Memphis Minnie': 1,\n", " u'Merle Haggard': 3,\n", " u'Merrilee and the Turnabouts': 1,\n", " u'Metallica': 1,\n", " u'Me\\u2019Shell Ndegeocello': 1,\n", " u'Michael Jackson': 2,\n", " u'Mick Hanly with Christy Moore': 1,\n", " u'Mike Berry and the Outlaws': 1,\n", " u'Millie Jackson': 1,\n", " u'Minnie Riperton': 1,\n", " u'Mississippi John Hurt': 1,\n", " u'Missy Elliott': 3,\n", " u'Mitch Ryder and the Detroit Wheels': 1,\n", " u'Mohammed Rafi': 1,\n", " u'Morrissey': 1,\n", " u'Mr Vegas': 1,\n", " u'My Favourite Girl': 1,\n", " u'My Neck, My Back (Lick It)': 1,\n", " u'NWA': 1,\n", " u'Nancy Sinatra and Lee Hazlewood': 2,\n", " u'Nas': 1,\n", " u'Nat King Cole': 1,\n", " u'Naturally 7': 1,\n", " u'Naughty by Nature': 1,\n", " u'Needle of Death': 1,\n", " u'Neil Young': 5,\n", " u'Nelly': 1,\n", " u'New Order': 3,\n", " u'Nick Cave': 1,\n", " u'Nick Cave and the Bad Seeds': 1,\n", " u'Nick Cave and the Bad\\xa0Seeds': 1,\n", " u'Nick Drake': 1,\n", " u'Nilsson': 1,\n", " u'Nina Simone': 3,\n", " u'Nine Inch Nails': 1,\n", " u'Nirvana': 1,\n", " u'Nitin Sawhney': 1,\n", " u'Norman Greenbaum': 1,\n", " u'Oasis': 1,\n", " u'Old Man': 1,\n", " u'Ol\\u2019 Dirty Bastard': 1,\n", " u'Orbital': 1,\n", " u'Otis Redding': 1,\n", " u'OutKast': 2,\n", " u'Owen and Leon': 1,\n", " u'PJ Proby': 1,\n", " 
u'Patsy Cline': 1,\n", " u'Patsy Gallant': 1,\n", " u'Patti Smith': 2,\n", " u'Paul Hardcastle': 1,\n", " u'Paul McCartney and Wings': 1,\n", " u'Paul Simon': 4,\n", " u'Paul Weller': 3,\n", " u'Paul Westerberg': 1,\n", " u'Peaches': 1,\n", " u'Peggy Seeger': 1,\n", " u'Pentangle': 1,\n", " u'Percy Sledge': 1,\n", " u'Pet Shop Boys': 2,\n", " u'Pete Seeger': 1,\n", " u'Peter Gabriel': 2,\n", " u'Peter Tosh': 1,\n", " u'Phil Collins': 1,\n", " u'Phil Ochs': 1,\n", " u'Phuture': 1,\n", " u'Pigbag': 1,\n", " u'Pink Floyd': 2,\n", " u'Pitman': 1,\n", " u'Pixies': 2,\n", " u'Pluto Shervington': 1,\n", " u'Portishead': 2,\n", " u'Primal Scream': 2,\n", " u'Prince': 6,\n", " u'Prince and the Revolution': 2,\n", " u'Public Enemy': 2,\n", " u'Pulp': 5,\n", " u'Queen': 2,\n", " u'Queens of the Stone Age': 1,\n", " u'R D Burman': 1,\n", " u'R Kelly': 1,\n", " u'REM': 3,\n", " u'Radiohead': 1,\n", " u'Rage Against the Machine': 1,\n", " u'Ralph Stanley': 1,\n", " u'Randy Newman': 8,\n", " u'Ray Charles': 4,\n", " u'Rhythim Is Rhythim': 1,\n", " u'Richard Hawley': 1,\n", " u'Richard Thompson': 1,\n", " u'Richard and Linda Thompson': 1,\n", " u'Richie Havens': 1,\n", " u'Rick James': 1,\n", " u'Rihanna': 1,\n", " u'Robbie Williams': 1,\n", " u'Robert Johnson': 1,\n", " u'Robert Wyatt': 1,\n", " u'Roberta Flack': 1,\n", " u'Rod Stewart': 4,\n", " u'Roots Manuva': 1,\n", " u'Rose Royce': 1,\n", " u'Roxy Music': 2,\n", " u'Roy Ayers': 1,\n", " u'Roy Bailey': 1,\n", " u'Roy Davis Jr': 1,\n", " u'Roy Orbison': 4,\n", " u'Rufus featuring Chaka Khan': 1,\n", " u'Ry Cooder': 1,\n", " u'Ryan Adams': 2,\n", " u'Salsoul Orchestra': 1,\n", " u'Salt-n-Pepa': 1,\n", " u'Sam Cooke': 3,\n", " u'Sam Mayo': 1,\n", " u'Sam and Dave': 1,\n", " u'Sapan Chakraborty': 1,\n", " u'Scarface': 1,\n", " u'Scott Walker': 2,\n", " u'Screamin\\u2019 Jay Hawkins': 1,\n", " u'Selector': 1,\n", " u'Sex Pistols': 2,\n", " u'Sham 69': 1,\n", " u'Shangri-Las': 1,\n", " u'Shannon': 1,\n", " u'Sheffield Socialist 
Choir': 1,\n", " u'Shirley Collins': 1,\n", " u'Shirley Ellis': 1,\n", " u'Simon and Garfunkel': 3,\n", " u'Sinead O\\u2019Connor': 1,\n", " u'Sister Sledge': 2,\n", " u'Skee-Lo': 1,\n", " u'Skip James': 1,\n", " u'Sly and the Family Stone': 2,\n", " u'Small Faces': 1,\n", " u'Smog': 2,\n", " u'Smokey Robinson and the Miracles': 1,\n", " u'Snooks Eaglin': 1,\n", " u'Soft Cell': 1,\n", " u'Solomon Burke': 1,\n", " u'Sonic Youth': 2,\n", " u'Sonny and Cher': 1,\n", " u'Soul Brothers Six': 1,\n", " u'Soul Sisters': 1,\n", " u'Spacemen 3': 1,\n", " u'Sparks': 1,\n", " u'Spike Jones and His City Slickers': 1,\n", " u'Spiritualized': 1,\n", " u'Squeeze': 1,\n", " u'Stephen Fretwell': 1,\n", " u'Steppenwolf': 2,\n", " u'Steve Earle': 4,\n", " u'Steve Goodman': 1,\n", " u'Stevie Wonder': 4,\n", " u'Stiff Little Fingers': 1,\n", " u'Suede': 2,\n", " u'Sue\\xf1o Latino': 1,\n", " u'Super Furry Animals': 1,\n", " u'Sylvester': 1,\n", " u'System Of A Down': 1,\n", " u'T Rex': 1,\n", " u'T-Connection': 1,\n", " u'TLC': 2,\n", " u'Take That': 1,\n", " u'Talib Kweli': 1,\n", " u'Talking Heads': 2,\n", " u'Tammy Wynette': 1,\n", " u'Tears for Fears': 1,\n", " u'Terry Jacks': 1,\n", " u'The Animals': 1,\n", " u'The Artful Dodger featuring Craig David': 1,\n", " u'The B-52\\u2019s': 1,\n", " u'The Beach Boys': 6,\n", " u'The Beat': 1,\n", " u'The Beatles': 19,\n", " u'The Beautiful South': 1,\n", " u'The Bee Gees': 1,\n", " u'The Blue Nile': 1,\n", " u'The Byrds': 3,\n", " u'The Cars': 1,\n", " u'The Carter Family': 1,\n", " u'The Chi-Lites': 2,\n", " u'The Clash': 5,\n", " u'The Coasters': 1,\n", " u'The Communards': 1,\n", " u'The Congos': 1,\n", " u'The Contours': 1,\n", " u'The Cramps': 1,\n", " u'The Crickets': 1,\n", " u'The Crystals': 2,\n", " u'The Cure': 3,\n", " u'The Decemberists': 1,\n", " u'The Disposable Heroes of Hiphoprisy': 1,\n", " u'The Drifters': 1,\n", " u'The Eagles': 1,\n", " u'The Everly Brothers': 1,\n", " u'The Faces': 1,\n", " u'The Fall': 1,\n", " u'The 
Flirts': 1,\n", " u'The Flying Burrito Brothers': 1,\n", " u'The Good, the Bad and the Queen': 1,\n", " u'The Handsome Family': 1,\n", " u'The Hidden Cameras': 1,\n", " u'The Hold Steady': 1,\n", " u'The House of Love': 1,\n", " u'The Human League': 1,\n", " u'The Impressions': 3,\n", " u'The Isley Brothers': 2,\n", " u'The Jackson 5': 1,\n", " u'The Jam': 3,\n", " u'The Jimi Hendrix Experience': 1,\n", " u'The Killers': 1,\n", " u'The Kingsmen': 1,\n", " u'The Kinks': 6,\n", " u'The Knack': 1,\n", " u'The Libertines': 1,\n", " u'The Louvin Brothers': 2,\n", " u'The Lovin\\u2019 Spoonful': 1,\n", " u'The Mamas and the Papas': 1,\n", " u'The Mothers of Invention': 1,\n", " u'The Mountain Goats': 1,\n", " u'The Nolans': 1,\n", " u'The Normal': 1,\n", " u'The Notorious BIG': 2,\n", " u'The Number of the Beast': 1,\n", " u'The Pogues': 4,\n", " u'The Pointer Sisters': 1,\n", " u'The Police': 4,\n", " u'The Pop Group': 1,\n", " u'The Pretenders': 2,\n", " u'The Proclaimers': 1,\n", " u'The Prodigy': 1,\n", " u'The Rapture': 1,\n", " u'The Righteous Brothers': 2,\n", " u'The Rolling Stones': 8,\n", " u'The Ronettes': 2,\n", " u'The Shangri-Las': 1,\n", " u'The Shirelles': 1,\n", " u'The Small Faces': 1,\n", " u'The Smiths': 5,\n", " u'The Special AKA': 1,\n", " u'The Specials': 2,\n", " u'The Spencer Davis Group': 1,\n", " u'The Spice Girls': 1,\n", " u'The Spinners': 1,\n", " u'The Stanley Brothers': 1,\n", " u'The Staple Singers': 1,\n", " u'The Stone Roses': 1,\n", " u'The Stooges': 1,\n", " u'The Streets': 1,\n", " u'The Strokes': 1,\n", " u'The Sugarhill Gang': 1,\n", " u'The Sundown Playboys': 1,\n", " u'The Supremes': 5,\n", " u'The Surfaris': 1,\n", " u'The Teenagers featuring Frankie Lymon': 1,\n", " u'The Temptations': 2,\n", " u'The The': 1,\n", " u'The Ting Tings': 1,\n", " u'The Trammps': 1,\n", " u'The Troggs': 1,\n", " u'The Undertones': 1,\n", " u'The Vapors': 1,\n", " u'The Velvet Underground': 2,\n", " u'The Velvet Underground and Nico': 1,\n", " u'The 
Wailers': 1,\n", " u'The Walker Brothers': 1,\n", " u'The Waterboys': 1,\n", " u'The Who': 4,\n", " u'Them': 1,\n", " u'Them/Van Morrison': 1,\n", " u'Thin Lizzy': 1,\n", " u'Thom Yorke': 1,\n", " u'Tim Buckley': 1,\n", " u'Tim Hardin': 1,\n", " u'Tito Puente': 1,\n", " u'Todd Rundgren': 1,\n", " u'Tom Jones': 1,\n", " u'Tom Robinson': 1,\n", " u'Tom Waits': 5,\n", " u'Tommy James and the Shondells': 1,\n", " u'Tone Loc': 1,\n", " u'Toni Braxton': 1,\n", " u'Tony Bennett': 1,\n", " u'Toots and the Maytals': 1,\n", " u'Townes Van Zandt': 1,\n", " u'Tullio De Piscopo': 1,\n", " u'Tupac': 2,\n", " u'Tupac - as Makaveli': 1,\n", " u'Tweet': 1,\n", " u'Twinkle': 1,\n", " u'U2': 5,\n", " u'USA for Africa': 1,\n", " u'Vampire Weekend': 1,\n", " u'Van Morrison': 3,\n", " u'Vic Chesnutt': 1,\n", " u'Wanda Jackson': 1,\n", " u'War': 1,\n", " u'Warren G and Nate Dogg': 1,\n", " u'Wayne Smith': 1,\n", " u'West Street Mob': 1,\n", " u'Wham!': 1,\n", " u'Whitney Houston': 2,\n", " u'William Blake, Charles Hubert and Hastings Parry': 1,\n", " u'Willie Nelson': 1,\n", " u'Wilson Pickett': 2,\n", " u'Woody Guthrie': 2,\n", " u'World Domination Enterprises': 1,\n", " u'X-Ray Spex': 1}" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, loop through that dictionary and select the key-value pairs corresponding to the artists with 4 or more songs in the *1,000 songs* list:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "song_by_artist_4plus = {k:v for k,v in song_by_artist.items() if v>=4}\n", "\n", "song_by_artist_4plus" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "{u'Al Green': 5,\n", " u'Arctic Monkeys': 4,\n", " u'Bill Withers': 4,\n", " u'Bob Dylan': 24,\n", " u'Bruce Springsteen': 4,\n", " u'Chuck Berry': 5,\n", " u'David Bowie': 9,\n", " u'Elvis Costello and the Attractions': 5,\n", " u'Elvis Presley': 6,\n", " u'Frank Sinatra': 6,\n", " u'Grace 
Jones': 4,\n", " u'Johnny Cash': 5,\n", " u'Kate Bush': 5,\n", " u'Madonna': 6,\n", " u'Marvin Gaye': 6,\n", " u'Neil Young': 5,\n", " u'Paul Simon': 4,\n", " u'Prince': 6,\n", " u'Pulp': 5,\n", " u'Randy Newman': 8,\n", " u'Ray Charles': 4,\n", " u'Rod Stewart': 4,\n", " u'Roy Orbison': 4,\n", " u'Steve Earle': 4,\n", " u'Stevie Wonder': 4,\n", " u'The Beach Boys': 6,\n", " u'The Beatles': 19,\n", " u'The Clash': 5,\n", " u'The Kinks': 6,\n", " u'The Pogues': 4,\n", " u'The Police': 4,\n", " u'The Rolling Stones': 8,\n", " u'The Smiths': 5,\n", " u'The Supremes': 5,\n", " u'The Who': 4,\n", " u'Tom Waits': 5,\n", " u'U2': 5}" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, since a Python dictionary cannot be sorted in place, make separate lists of keys and values from the `song_by_artist_4plus` dictionary and sort them in descending order of song count:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np \n", "\n", "# Lists of keys and values (list() keeps them indexable in Python 3 too,\n", "# where keys() and values() return views; the two lists stay aligned)\n", "my_keys = list(song_by_artist_4plus.keys())\n", "my_vals = list(song_by_artist_4plus.values())\n", "\n", "# Indices of the values in descending order (argsort is ascending, so reverse it)\n", "i_sorted = np.argsort(np.array(my_vals))[::-1]\n", "\n", "# Sort both the keys and values lists \n", "my_keys_sorted = [my_keys[i] for i in i_sorted]\n", "my_vals_sorted = [my_vals[i] for i in i_sorted]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "heading", "level": 5, "metadata": {}, "source": [ "Plot it using Plotly!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have a Plotly account and a credentials file set up on your machine, signing in to Plotly's servers is done automatically when importing `plotly.plotly`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import plotly.plotly as py " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more info on how to sign up or sign in to Plotly, see Plotly's Python API User Guide.\n", "\n", "Next, import a few graph objects needed to make our Plotly plot:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from plotly.graph_objs import Figure, Data, Layout\n", "from plotly.graph_objs import Bar\n", "from plotly.graph_objs import XAxis, YAxis, Marker, Font, Margin" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make instances of the `Bar` and `Data` objects:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "my_bar = Bar(x=my_keys_sorted, # labels of the x-axis\n", " y=my_vals_sorted, # values of the y-axis\n", " marker= Marker(color='#2ca02c')) # a nice green color\n", "\n", "my_data = Data([my_bar]) # make a Data object (Data accepts only a list)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make an instance of the `Layout` object:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "my_title = 'Number of songs listed in the Guardian\\'s<br> \\\n", "Top 1,000 Songs to Hear Before You Die per artist with 4 or more songs'\n", "my_ytitle = 'Number of songs per artist'\n", "\n", "my_layout = Layout(title=my_title, # set plot title\n", " showlegend=False, # remove legend \n", " font= Font(family='Georgia, serif', # set global font family\n", " color='#635F5D'), # and color \n", " plot_bgcolor='#EFECEA', # set plot background to light grey\n", " xaxis= XAxis(title='', # no x-axis title\n", " tickangle=45, # tick labels' angle\n", " ticks='outside', # draw ticks outside axes \n", " ticklen=8, # tick length\n", " tickwidth=1.5,), # and width, \n", " yaxis= YAxis(title=my_ytitle, # y-axis title\n", " gridcolor='#FFFFFF', # white grid lines\n", " ticks='outside', \n", " ticklen=8, \n", " tickwidth=1.5),\n", " autosize=False, # manual figure size\n", " width=700, \n", " height=500,\n", " margin= Margin(b=140) # increase bottom margin, \n", " ) # to fit long x-axis tick labels" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make an instance of the `Figure` object, send it to Plotly, and get the plot back inside this IPython notebook:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "my_fig = Figure(data=my_data, layout=my_layout)\n", "\n", "py.iplot(my_fig, filename='socrata1')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not bad, but let's try to improve our plot by making use of Plotly's hover capabilities.\n", "\n", "Next, we add hover text to each of the bars so that hovering the cursor over a bar shows, in chronological order, the titles and release years of that artist's songs in the *1,000 songs* list.\n", "\n", "First, we need to trim the original dataframe so that it contains only the artists with 4 or more songs in the *1,000 songs* list:" ] }, { 
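"cell_type": "markdown", "metadata": {}, "source": [ "The hover-text idea described above can also be sketched compactly with `groupby`. The snippet below is a minimal, hypothetical sketch, not the code this notebook runs: it assumes Python 3 and a recent pandas, and `songs`, `MAX_SONGS`, `hover_text` and `texts` are illustrative names applied to made-up rows. Each artist's songs are sorted by year and joined with `<br>`, which Plotly renders as a line break in hover text:\n", "\n", "
```python
import pandas as pd

# Made-up rows standing in for df_good (illustration only)
songs = pd.DataFrame({
    'artist': ['X', 'X', 'X', 'Y'],
    'title': ['Late', 'Early', 'Middle', 'Solo'],
    'year': [1990, 1960, 1975, 2000],
})

MAX_SONGS = 2  # hypothetical cap on the number of titles shown per hover box


def hover_text(group, max_songs=MAX_SONGS):
    # Sort one artist's songs chronologically
    group = group.sort_values('year')
    lines = [f'{t} ({y})' for t, y in zip(group['title'], group['year'])]
    shown = lines[:max_songs]
    extra = len(lines) - len(shown)
    if extra > 0:
        # Truncate long lists, noting how many titles were left out
        shown.append(f'and {extra} more ...')
    # Plotly treats <br> as a line break inside hover text
    return '<br>'.join(shown)


# One hover string per artist, keyed by artist name
texts = {artist: hover_text(g) for artist, g in songs.groupby('artist')}
```
" ] }, {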
"cell_type": "code", "collapsed": false, "input": [ "# Boolean mask: rows whose 'artist' is in song_by_artist_4plus\n", "i_good = (df['artist'].isin(song_by_artist_4plus)) \n", "\n", "df_good = df[i_good] # a new dataframe" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "df_good.shape # a much smaller dataframe than the original" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "(222, 5)" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, loop through the sorted artist names, building a list of hover-text strings to be linked to the `'text'` key in the data object.\n", "\n", "Unfortunately, the longest song lists have to be truncated to fit inside the Plotly figure:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "my_text = [] # init. the hover-text list\n", "\n", "# Loop through the sorted artist names, so that my_text\n", "# will have the same ordering as the values linked to 'x' and 'y' in my_data\n", "for k in my_keys_sorted:\n", "    \n", "    # Slice dataframe to artist name and sort songs by year\n", "    i_artist = (df_good['artist']==k)\n", "    df_tmp = df_good[i_artist].sort_values(by='year')\n", "    \n", "    my_text_tmp = '' # init. string \n", "    cnt_song = 0 # song counter for given artist\n", "    N_song = len(df_tmp['title']) # total number of songs for given artist\n", "    \n", "    # Loop through songs\n", "    for i_song, song in df_tmp.iterrows():\n", "        \n", "        # Add song (title and year) to string, one per line, and update counter\n", "        my_text_tmp += song['title']+' ('+str(song['year'])+')<br>'\n", "        cnt_song += 1\n", "        \n", "        # Stop if song list is too long to fit on figure,\n", "        # noting how many songs were left out\n", "        if cnt_song>12 and cnt_song<N_song:\n", "            diff = N_song - cnt_song\n", "            my_text_tmp += ' and '+str(diff)+' more ...'\n", "            break\n", "    \n", "    # Append hover-text list\n", "    my_text += [my_text_tmp]\n", "    \n", "# Update figure object \n", "my_fig['data'][0].update(text=my_text)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, add a text annotation citing our data source to our plot:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from plotly.graph_objs import Annotation\n", "\n", "my_anno_text = 'Open Data by Socrata
\\\n", "Hover over the bars to see the list of songs'\n", "\n", "my_anno = Annotation(text=my_anno_text, # annotation text\n", " x=0.95, # position's x-coord\n", " y=0.95, # and y-coord\n", " xref='paper', # use paper coords\n", " yref='paper', # for both coordinates\n", " font= Font(size=14), # increase font size (default is 12)\n", " showarrow=False, # remove arrow \n", " bgcolor='#FFFFFF', # white background\n", " borderpad=4) # space between border and text (in px)\n", "\n", "# Update figure object\n", "my_fig['layout'].update(annotations=[my_anno])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "All that is left to do now is to send the updated figure object to Plotly:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(my_fig, filename='socrata1-hover')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Spend some time hovering over the bars and admire Plotly's interactivity!\n", "\n", "\n", "*Great data and beautiful visualization, at your fingertips.*\n", "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "
\n", "\n", "

Got Questions or Feedback?

\n", "\n", "About Plotly\n", "\n", "* email: feedback@plot.ly \n", "* tweet: \n", "@plotlygraphs\n", "\n", "

Notebook styling ideas

\n", "\n", "Big thanks to\n", "\n", "* Cam Davidson-Pilon\n", "* Lorena A. Barba\n", "\n", "
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# CSS styling within IPython notebook\n", "from IPython.core.display import HTML\n", "import urllib2\n", "def css_styling():\n", " url = 'https://raw.githubusercontent.com/plotly/python-user-guide/master/custom.css'\n", " styles = urllib2.urlopen(url).read()\n", " return HTML(styles)\n", "\n", "css_styling()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 } ], "metadata": {} } ] }