## 04 Grabbing HTML tables with Pandas
What if you saw a table you wanted on a web page? For example: https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions. Can Python help us download those data?

Why yes. Yes it can.

Specifically, we use pandas' `read_html` function, which identifies the tables in an HTML page and returns each one as a dataframe object.

In [None]:
#Import pandas
import pandas

In [None]:
#We'll need a package called lxml; install it if it is not already available
try:
    import lxml
except ImportError:
    !pip install lxml

In [None]:
#Here, the read_html function collects every table it finds at the URL we provide into a list of dataframes.
the_url = 'https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions'
tableList = pandas.read_html(the_url)
print("{} tables were found".format(len(tableList)))
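To see what `read_html` returns without depending on a live web page, here is a minimal sketch that parses a small, made-up HTML string instead of a URL. The table contents are invented for illustration, and the example assumes `lxml` (or another supported parser) is installed. It also shows the `match` argument, which keeps only the tables whose text matches a given pattern.

```python
import io
import pandas

# A made-up HTML snippet containing two tables (illustrative values only)
html = """
<table>
  <tr><th>Country</th><th>1990</th><th>2017</th></tr>
  <tr><td>A</td><td>1.0</td><td>2.0</td></tr>
</table>
<table>
  <tr><th>Region</th><th>Total</th></tr>
  <tr><td>X</td><td>3</td></tr>
</table>
"""

# read_html always returns a list, one dataframe per table found
tables = pandas.read_html(io.StringIO(html))
print("{} tables were found".format(len(tables)))

# The match argument keeps only tables containing text that matches the pattern
region_tables = pandas.read_html(io.StringIO(html), match="Region")
print("{} tables mention 'Region'".format(len(region_tables)))
```

On a page with many tables, `match` can be a handier way to find the one you want than counting positions in the list.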

In [None]:
#Let's grab the 2nd table and display its first five rows
df = tableList[1]
df.head()

As an aside, the resulting table has multiple levels of column headers. MultiIndex dataframes are powerful, but also quite confusing. We'll simply drop the first header row using the `droplevel()` method.

In [None]:
#Drop the top column level (e.g. "Fossil CO2 emissions(Mt CO2)"), keeping only the second header row
df_fixed = df.droplevel(
 level=0, #drops the first header row
 axis='columns') #tells pandas we are dealing with columns, not rows
df_fixed.head()
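The effect of `droplevel()` is easier to see on a tiny dataframe. The sketch below builds a made-up table with a two-level column header, similar in shape to the scraped one (the names and values are invented for illustration), and then drops the top level.

```python
import pandas

# Hypothetical two-level header: (top level, second level)
columns = pandas.MultiIndex.from_tuples([
    ("Country", "Country"),
    ("Emissions", "1990"),
    ("Emissions", "2017"),
])
toy = pandas.DataFrame(
    [["A", 1.0, 2.0], ["B", 3.0, 4.0]],  # made-up values
    columns=columns,
)

# Drop the top header level so only the second level remains
flat = toy.droplevel(level=0, axis="columns")
print(list(flat.columns))  # ['Country', '1990', '2017']
```

After the drop, columns can be selected with a single label such as `flat['1990']` rather than a tuple of labels.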

In [None]:
#Now we can save the cleaned table to a local file using to_csv()
df_fixed.to_csv("Carbon.csv", # The output filename
 index=False, # We opt not to write out the index
 encoding='utf8') # Keeps country names with non-ASCII characters intact
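As a quick check of what `index=False` does, the sketch below writes a small made-up dataframe to an in-memory buffer instead of a file and reads it back. The numbers are invented for illustration; the country names were chosen only because they contain non-ASCII characters.

```python
import io
import pandas

# Made-up values; the names include non-ASCII characters
toy = pandas.DataFrame({
    "Country": ["Côte d'Ivoire", "São Tomé and Príncipe"],
    "2017": [1.5, 2.5],
})

# Write to an in-memory text buffer rather than a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Read the CSV text back in; no extra index column appears
buffer.seek(0)
restored = pandas.read_csv(buffer)
print(restored)
```

Had the index been written out, the restored dataframe would carry an extra unnamed column holding the old row numbers.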

In [None]:
#...or we can examine it
#Here is a quick preview of pandas' plotting capability
%matplotlib inline
df_fixed.iloc[3:,].plot.scatter(x='1990',y='2017');