{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Preliminary data cleaning
1.1  Year 2007
1.2  Year 2008
1.3  Year 2009
1.4  Year 2010
1.5  Year 2011
1.6  Year 2012
1.7  Year 2013
1.7.1  Data Source 1
1.7.2  Data Source 2: supplementary data
1.7.3  Data Source 3: Excel data source for 2013
1.8  Year 2014
1.9  Year 2015
1.10  Year 2016
1.11  Year 2017
2  Merging the data
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Preliminary data cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2007" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Data for 2007; the raw figures are in billions of US dollars" ] }, { "cell_type": "code", "execution_count": 205, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 9)\n", "年份 int64\n", "排名(Rank) int64\n", "公司名称(Company) object\n", "所在国家或地区(Country) object\n", "所在行业(Industry) object\n", "销售收入(Sales) object\n", "利润(Profits) object\n", "总资产(Assets) object\n", "市值(Market Vaue) float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 年份 排名(Rank) 公司名称(Company) 所在国家或地区(Country) 所在行业(Industry) \\\n", "0 2007 1 Citigroup /花旗集团 美国(US) 银行 \n", "1 2007 2 Bank of America /美国银行 美国(US) 银行 \n", "2 2007 3 HSBC Holdings/汇丰集团 英国(UK) 银行 \n", "\n", " 销售收入(Sales) 利润(Profits) 总资产(Assets) 市值(Market Vaue) \n", "0 146.56 21.54 1,884.32 247.42 \n", "1 116.57 21.13 1,459.74 226.61 \n", "2 121.51 16.63 1,860.76 202.29 " ] }, "execution_count": 205, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007 = pd.read_csv('./data/data_forbes_2007.csv', encoding='gbk', thousands=',')\n", "print('the shape of DataFrame: ', df_2007.shape)\n", "print(df_2007.dtypes)\n", "df_2007.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Rename the columns" ] }, { "cell_type": "code", "execution_count": 206, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "0 2007 1 Citigroup /花旗集团 美国(US) 银行 146.56 \n", "1 2007 2 Bank of America /美国银行 美国(US) 银行 116.57 \n", "2 2007 3 HSBC Holdings/汇丰集团 英国(UK) 银行 121.51 \n", "\n", " Profits Assets Market_value \n", "0 21.54 1,884.32 247.42 \n", "1 21.13 1,459.74 226.61 \n", "2 16.63 1,860.76 202.29 " ] }, "execution_count": 206, "metadata": {}, "output_type": "execute_result" } ], "source": [ "column_update = ['Year', 'Rank', 'Company_cn_en', 'Country_cn_en', \n", " 'Industry_cn', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "df_2007.columns = column_update\n", "df_2007.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **The dtypes printed earlier show that, of the four financial columns, only “Market_value” was parsed as numeric; find the non-numeric entries in 'Sales', 'Profits' and 'Assets'**" ] }, { "cell_type": "code", "execution_count": 207, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "117 2007 118 Repsol-YPF /瑞普索 西班牙(SP) 炼油 64.20 E \n", "616 2007 617 Inpex Holdings 日本(JA) 炼油 6.49 E \n", "880 2007 881 Asahi Breweries/朝日啤酒 日本(JA) 食品、饮料和烟草 7.97 E \n", "\n", " Profits Assets Market_value \n", "117 4.12 58.43 38.75 \n", "616 1.02 E 10.77 E 19.65 \n", "880 0.38 10.66 7.71 " ] }, "execution_count": 207, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007[df_2007['Sales'].str.contains('.*[A-Za-z]', regex=True)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Use replace() to strip the letters from the “Sales” column" ] }, { "cell_type": "code", "execution_count": 208, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "df_2007['Sales'] = df_2007['Sales'].replace('([A-Za-z])', '', regex=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Check the result of the replacement" ] }, { "cell_type": "code", "execution_count": 209, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "117 2007 118 Repsol-YPF /瑞普索 西班牙(SP) 炼油 64.20 \n", "616 2007 617 Inpex Holdings 日本(JA) 炼油 6.49 \n", "880 2007 881 Asahi Breweries/朝日啤酒 日本(JA) 食品、饮料和烟草 7.97 \n", "\n", " Profits Assets Market_value \n", "117 4.12 58.43 38.75 \n", "616 1.02 E 10.77 E 19.65 \n", "880 0.38 10.66 7.71 " ] }, "execution_count": 209, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007.loc[[117,616,880], :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Find the non-numeric entries in the “Assets” column**" ] }, { "cell_type": "code", "execution_count": 210, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales Profits \\\n", "616 2007 617 Inpex Holdings 日本(JA) 炼油 6.49 1.02 E \n", "\n", " Assets Market_value \n", "616 10.77 E 19.65 " ] }, "execution_count": 210, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007[df_2007['Assets'].str.contains('.*[A-Za-z]', regex=True)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Strip the non-numeric characters and remove the thousands separators" ] }, { "cell_type": "code", "execution_count": 211, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Year 2007\n", "Rank 617\n", "Company_cn_en Inpex Holdings\n", "Country_cn_en 日本(JA)\n", "Industry_cn 炼油\n", "Sales 6.49 \n", "Profits 1.02 E\n", "Assets 10.77 \n", "Market_value 19.65\n", "Name: 616, dtype: object" ] }, "execution_count": 211, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Strip the letters that follow the numbers\n", "df_2007['Assets'] = df_2007['Assets'].replace('([A-Za-z])', '', regex=True)\n", "\n", "# The column was read as strings, so the thousands-separator commas are still present and must be removed\n", "df_2007['Assets'] = df_2007['Assets'].replace(',', '', regex=True)\n", "df_2007.loc[616, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **The “Profits” column contains NaN values, which need to be filled first**" ] }, { "cell_type": "code", "execution_count": 212, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "958 2007 959 UAL/美国联合航空公司 美国(US) 运输 19.34 \n", "1440 2007 1441 Owens Corning/欧文斯科宁 美国(US) 建筑 6.46 \n", "1544 2007 1545 Parmalat/帕玛拉特公司 意大利(IT) 食品、饮料和烟草 4.83 \n", "1912 2007 1912 Winn-Dixie Stores 美国(US) 食品市场 6.96 \n", "\n", " Profits Assets Market_value \n", "958 NaN 25.86 4.43 \n", "1440 NaN 8.47 4.19 \n", "1544 NaN 4.90 7.02 \n", "1912 NaN 1.62 1.05 " ] }, "execution_count": 212, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007[pd.isnull(df_2007['Profits'])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Fill the NaN values with 0" ] }, { "cell_type": "code", "execution_count": 213, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "958 2007 959 UAL/美国联合航空公司 美国(US) 运输 19.34 \n", "1440 2007 1441 Owens Corning/欧文斯科宁 美国(US) 建筑 6.46 \n", "1544 2007 1545 Parmalat/帕玛拉特公司 意大利(IT) 食品、饮料和烟草 4.83 \n", "1912 2007 1912 Winn-Dixie Stores 美国(US) 食品市场 6.96 \n", "\n", " Profits Assets Market_value \n", "958 0 25.86 4.43 \n", "1440 0 8.47 4.19 \n", "1544 0 4.90 7.02 \n", "1912 0 1.62 1.05 " ] }, "execution_count": 213, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Assign the result back instead of calling fillna(inplace=True) on a column selection\n", "df_2007['Profits'] = df_2007['Profits'].fillna(0)\n", "df_2007.loc[[958,1440,1544,1912], :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Strip the non-numeric characters from the “Profits” column and check the result" ] }, { "cell_type": "code", "execution_count": 214, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "117 2007 118 Repsol-YPF /瑞普索 西班牙(SP) 炼油 64.20 \n", "616 2007 617 Inpex Holdings 日本(JA) 炼油 6.49 \n", "880 2007 881 Asahi Breweries/朝日啤酒 日本(JA) 食品、饮料和烟草 7.97 \n", "\n", " Profits Assets Market_value \n", "117 4.12 58.43 38.75 \n", "616 1.02 10.77 19.65 \n", "880 0.38 10.66 7.71 " ] }, "execution_count": 214, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007['Profits'] = df_2007['Profits'].replace('([A-Za-z])', '', regex=True)\n", "df_2007.loc[[117,616,880], :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Convert the numeric strings to a numeric dtype with pd.to_numeric()**" ] }, { "cell_type": "code", "execution_count": 215, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Country_cn_en object\n", "Industry_cn object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object" ] }, "execution_count": 215, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007['Sales'] = pd.to_numeric(df_2007['Sales'])\n", "df_2007['Profits'] = pd.to_numeric(df_2007['Profits'])\n", "df_2007['Assets'] = pd.to_numeric(df_2007['Assets'])\n", "df_2007.dtypes" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "* **Split the \"Company_cn_en\" column** into two new columns holding the English and the Chinese company name" ] }, { "cell_type": "code", "execution_count": 216, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 Citigroup \n", "1 Bank of America \n", "2 HSBC Holdings\n", "3 General Electric \n", "4 JPMorgan Chase \n", "Name: Company_en, dtype: object\n", "1995 NaN\n", "1996 NaN\n", "1997 NaN\n", "1998 NaN\n", "1999 NaN\n", "Name: Company_cn, dtype: object\n" ] } ], "source": [ "# Split on the first '/' only; rows without a '/' get a missing Chinese name\n", "df_2007[['Company_en', 'Company_cn']] = df_2007['Company_cn_en'].str.split('/', n=1, expand=True)\n", 
"print(df_2007['Company_en'][:5])\n", "print(df_2007['Company_cn'][-5:])" ] }, { "cell_type": "code", "execution_count": 217, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "1997 2007 1998 CBOT Holdings 美国(US) 综合金融 0.64 \n", "1998 2007 1998 Singapore Petroleum 新加坡(SI) 炼油 5.59 \n", "1999 2007 2000 DVB Bank 德国(GE) 银行 0.77 \n", "\n", " Profits Assets Market_value Company_en Company_cn \n", "1997 0.17 0.81 8.54 CBOT Holdings NaN \n", "1998 0.19 2.05 1.50 Singapore Petroleum NaN \n", "1999 0.06 12.74 1.26 DVB Bank NaN " ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007.tail(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Split the \"Country_cn_en\" column** into two new columns holding the Chinese country name and the English country code" ] }, { "cell_type": "code", "execution_count": 218, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 美国\n", "1 美国\n", "2 英国\n", "3 美国\n", "4 美国\n", "Name: Country_cn, dtype: object\n", "1995 US)\n", "1996 US)\n", "1997 US)\n", "1998 SI)\n", "1999 GE)\n", "Name: Country_en, dtype: object\n" ] } ], "source": [ "# Split on the first '(' only; the English part keeps its trailing ')' for now\n", "df_2007[['Country_cn', 'Country_en']] = df_2007['Country_cn_en'].str.split('(', n=1, expand=True)\n", "print(df_2007['Country_cn'][:5])\n", "print(df_2007['Country_en'][-5:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The English country code still ends with a closing parenthesis, which is removed with Series.str.slice()\n", "* The arguments (0, -1) keep everything from the start up to, but not including, the last character, which drops the \")\"" ] }, { "cell_type": "code", "execution_count": 219, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "0 2007 1 Citigroup /花旗集团 美国(US) 银行 146.56 \n", "1 2007 2 Bank of America /美国银行 美国(US) 银行 116.57 \n", "2 2007 3 HSBC Holdings/汇丰集团 英国(UK) 银行 121.51 \n", "\n", " Profits Assets Market_value Company_en Company_cn Country_cn \\\n", "0 21.54 1884.32 247.42 Citigroup 花旗集团 美国 \n", "1 21.13 1459.74 226.61 Bank of America 美国银行 美国 \n", "2 16.63 1860.76 202.29 HSBC Holdings 汇丰集团 英国 \n", "\n", " Country_en \n", "0 US \n", "1 US \n", "2 UK " ] }, "execution_count": 219, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007['Country_en'] = df_2007['Country_en'].str.slice(0,-1)\n", "df_2007.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Chinese companies are listed under three regions: 中国大陆 (mainland China), 中国香港 (Hong Kong) and 中国台湾 (Taiwan)\n", "* The corresponding English country codes need to be adjusted as well\n", "* Mainland China: CN; Hong Kong: CN-HK; Taiwan: CN-TA" ] }, { "cell_type": "code", "execution_count": 220, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
YearRankCompany_cn_enCountry_cn_enIndustry_cnSalesProfitsAssetsMarket_valueCompany_enCompany_cnCountry_cnCountry_en
40200741PetroChina /中国石油中国大陆(CN)炼油68.4316.5396.42208.76PetroChina中国石油中国大陆CN
52200753ICBC /中国工商银行中国大陆(CN)银行31.984.65800.04176.03ICBC中国工商银行中国大陆CN
68200769CCB-China Construction Bank /中国建设银行中国大陆(CN)银行23.185.84568.21126.55CCB-China Construction Bank中国建设银行中国大陆CN
70200771Sinopec-China Petroleum /中石化中国大陆(CN)炼油99.035.0765.8393.57Sinopec-China Petroleum中石化中国大陆CN
81200782Bank of China /中国银行中国大陆(CN)银行23.103.41585.55143.80Bank of China中国银行中国大陆CN
88200789China Mobile /中国移动中国香港(HK)/中国大陆(CN)电信运营商29.796.5651.35185.31China Mobile中国移动中国香港HK)/中国大陆(CN
1752007176Hutchison Whampoa/和记黄埔中国香港(HK)/中国大陆(CN)多元化23.551.8574.9740.57Hutchison Whampoa和记黄埔中国香港HK)/中国大陆(CN
1802007181China Telecom/中国电信中国大陆(CN)电信运营商20.983.4650.3437.50China Telecom中国电信中国大陆CN
2422007243China Life Insurance /中国人寿中国大陆(CN)保险11.181.1569.30109.96China Life Insurance中国人寿中国大陆CN
3072007308Bank of Communications/中国交通银行中国大陆(CN)银行6.641.15176.2746.14Bank of Communications中国交通银行中国大陆CN
3092007310Taiwan Semiconductor/台积电中国台湾(TA)半导体9.743.9018.0254.32Taiwan Semiconductor台积电中国台湾TA
3402007341Hon Hai Precision Ind /鸿海精密中国台湾(TA)技术硬件和装备27.781.2413.9934.83Hon Hai Precision Ind鸿海精密中国台湾TA
3652007366Baoshan Iron & Steel /上海宝钢集团中国大陆(CN)材料15.631.5717.5921.42Baoshan Iron & Steel上海宝钢集团中国大陆CN
3882007389Cathay Financial/国泰金融中国台湾(TA)保险10.090.6693.2919.87Cathay Financial国泰金融中国台湾TA
3942007395Cnooc /中海油中国香港(HK)/中国大陆(CN)炼油8.513.1014.2234.94Cnooc中海油中国香港HK)/中国大陆(CN
4002007401China Netcom Group /中国网通中国香港(HK)/中国大陆(CN)电信运营商10.691.7024.7015.70China Netcom Group中国网通中国香港HK)/中国大陆(CN
4222007423China Shenhua Energy/中国神华能源股份有限公司中国大陆(CN)材料6.471.9417.0845.94China Shenhua Energy中国神华能源股份有限公司中国大陆CN
4292007430BOC Hong Kong/中银香港中国香港(HK)/中国大陆(CN)银行4.131.74106.0325.58BOC Hong Kong中银香港中国香港HK)/中国大陆(CN
4362007437Formosa Petrochemical/台塑石化中国台湾(TA)炼油13.561.7412.3519.28Formosa Petrochemical台塑石化中国台湾TA
4392007440Ping An Insurance Group/平安保险中国大陆(CN)保险7.950.5239.6239.60Ping An Insurance Group平安保险中国大陆CN
4512007452Jardine Matheson/香港怡和集团中国香港(HK)/中国大陆(CN)食品市场11.961.2518.3413.59Jardine Matheson香港怡和集团中国香港HK)/中国大陆(CN
5102007511Sun Hung Kai Properties /新鸿基房地产中国香港(HK)/中国大陆(CN)综合金融3.302.5629.7229.49Sun Hung Kai Properties新鸿基房地产中国香港HK)/中国大陆(CN
5412007542China Unicom /中国联通中国香港(HK)/中国大陆(CN)电信运营商10.670.6017.6316.03China Unicom中国联通中国香港HK)/中国大陆(CN
5512007552CLP Holdings /中电控股中国香港(HK)/中国大陆(CN)公用事业5.871.2716.4217.65CLP Holdings中电控股中国香港HK)/中国大陆(CN
5752007576Chunghwa Telecom/中华电信中国台湾(TA)电信运营商5.591.4513.9818.22Chunghwa Telecom中华电信中国台湾TA
6002007601China Steel/台湾中钢公司中国台湾(TA)材料8.661.5410.3512.24China Steel台湾中钢公司中国台湾TA
6032007604China Merchants Bank/招商银行中国大陆(CN)银行3.530.4690.7633.19China Merchants Bank招商银行中国大陆CN
6172007617Nan Ya Plastic/南亚塑胶工业中国台湾(TA)化学制品7.641.2211.4713.37Nan Ya Plastic南亚塑胶工业中国台湾TA
6272007628Cheung Kong/长江集团中国香港(HK)/中国大陆(CN)综合金融0.801.8028.0128.39Cheung Kong长江集团中国香港HK)/中国大陆(CN
7362007737Swire Pacific /太古集团中国香港(HK)/中国大陆(CN)多元化2.442.4216.0517.32Swire Pacific太古集团中国香港HK)/中国大陆(CN
..........................................
163620071637Champion REIT中国香港(HK)/中国大陆(CN)综合金融0.051.162.951.54Champion REITNaN中国香港HK)/中国大陆(CN
164120071642Noble Group中国香港(HK)/中国大陆(CN)运输13.750.133.812.14Noble GroupNaN中国香港HK)/中国大陆(CN
166120071662Taiwan Mobile中国台湾(TA)电信运营商1.810.503.594.84Taiwan MobileNaN中国台湾TA
168120071682Evergreen Marine中国台湾(TA)运输4.290.373.961.90Evergreen MarineNaN中国台湾TA
169220071693China Southern Airlines中国大陆(CN)运输4.64-0.238.841.97China Southern AirlinesNaN中国大陆CN
170520071706Cosco Pacific中国香港(HK)/中国大陆(CN)运输0.300.342.855.94Cosco PacificNaN中国香港HK)/中国大陆(CN
171020071711China Shipping Container中国大陆(CN)运输3.520.443.592.26China Shipping ContainerNaN中国大陆CN
173620071737China Resources Power Holdings中国香港(HK)/中国大陆(CN)公用事业0.760.373.675.37China Resources Power HoldingsNaN中国香港HK)/中国大陆(CN
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en \\\n", "40 2007 41 PetroChina /中国石油 中国大陆(CN) \n", "52 2007 53 ICBC /中国工商银行 中国大陆(CN) \n", "68 2007 69 CCB-China Construction Bank /中国建设银行 中国大陆(CN) \n", "70 2007 71 Sinopec-China Petroleum /中石化 中国大陆(CN) \n", "81 2007 82 Bank of China /中国银行 中国大陆(CN) \n", "88 2007 89 China Mobile /中国移动 中国香港(HK)/中国大陆(CN) \n", "175 2007 176 Hutchison Whampoa/和记黄埔 中国香港(HK)/中国大陆(CN) \n", "180 2007 181 China Telecom/中国电信 中国大陆(CN) \n", "242 2007 243 China Life Insurance /中国人寿 中国大陆(CN) \n", "307 2007 308 Bank of Communications/中国交通银行 中国大陆(CN) \n", "309 2007 310 Taiwan Semiconductor/台积电 中国台湾(TA) \n", "340 2007 341 Hon Hai Precision Ind /鸿海精密 中国台湾(TA) \n", "365 2007 366 Baoshan Iron & Steel /上海宝钢集团 中国大陆(CN) \n", "388 2007 389 Cathay Financial/国泰金融 中国台湾(TA) \n", "394 2007 395 Cnooc /中海油 中国香港(HK)/中国大陆(CN) \n", "400 2007 401 China Netcom Group /中国网通 中国香港(HK)/中国大陆(CN) \n", "422 2007 423 China Shenhua Energy/中国神华能源股份有限公司 中国大陆(CN) \n", "429 2007 430 BOC Hong Kong/中银香港 中国香港(HK)/中国大陆(CN) \n", "436 2007 437 Formosa Petrochemical/台塑石化 中国台湾(TA) \n", "439 2007 440 Ping An Insurance Group/平安保险 中国大陆(CN) \n", "451 2007 452 Jardine Matheson/香港怡和集团 中国香港(HK)/中国大陆(CN) \n", "510 2007 511 Sun Hung Kai Properties /新鸿基房地产 中国香港(HK)/中国大陆(CN) \n", "541 2007 542 China Unicom /中国联通 中国香港(HK)/中国大陆(CN) \n", "551 2007 552 CLP Holdings /中电控股 中国香港(HK)/中国大陆(CN) \n", "575 2007 576 Chunghwa Telecom/中华电信 中国台湾(TA) \n", "600 2007 601 China Steel/台湾中钢公司 中国台湾(TA) \n", "603 2007 604 China Merchants Bank/招商银行 中国大陆(CN) \n", "617 2007 617 Nan Ya Plastic/南亚塑胶工业 中国台湾(TA) \n", "627 2007 628 Cheung Kong/长江集团 中国香港(HK)/中国大陆(CN) \n", "736 2007 737 Swire Pacific /太古集团 中国香港(HK)/中国大陆(CN) \n", "... ... ... ... ... 
\n", "1636 2007 1637 Champion REIT 中国香港(HK)/中国大陆(CN) \n", "1641 2007 1642 Noble Group 中国香港(HK)/中国大陆(CN) \n", "1661 2007 1662 Taiwan Mobile 中国台湾(TA) \n", "1681 2007 1682 Evergreen Marine 中国台湾(TA) \n", "1692 2007 1693 China Southern Airlines 中国大陆(CN) \n", "1705 2007 1706 Cosco Pacific 中国香港(HK)/中国大陆(CN) \n", "1710 2007 1711 China Shipping Container 中国大陆(CN) \n", "1736 2007 1737 China Resources Power Holdings 中国香港(HK)/中国大陆(CN) \n", "1739 2007 1740 Citic Securities 中国大陆(CN) \n", "1780 2007 1781 Far EasTone Telecom 中国台湾(TA) \n", "1786 2007 1787 E.Sun Financial 中国台湾(TA) \n", "1824 2007 1825 Minmetals Development 中国大陆(CN) \n", "1840 2007 1841 Shanghai Automotive 中国大陆(CN) \n", "1846 2007 1847 HK Exchanges & Clearing 中国香港(HK)/中国大陆(CN) \n", "1852 2007 1853 Link REIT 中国香港(HK)/中国大陆(CN) \n", "1860 2007 1861 Kweichow Moutai 中国大陆(CN) \n", "1892 2007 1892 Yanzhou Coal Mining 中国大陆(CN) \n", "1908 2007 1909 China Shipping Develop 中国大陆(CN) \n", "1920 2007 1920 Wing Lung Bank 中国香港(HK)/中国大陆(CN) \n", "1922 2007 1923 Delta Electronics 中国台湾(TA) \n", "1945 2007 1946 China Airlines 中国台湾(TA) \n", "1948 2007 1949 Wing Hang Bank 中国香港(HK)/中国大陆(CN) \n", "1959 2007 1959 PCCW 中国香港(HK)/中国大陆(CN) \n", "1960 2007 1961 Benq 中国台湾(TA) \n", "1963 2007 1964 TCL Corp 中国大陆(CN) \n", "1970 2007 1971 Wuliangye Yibin 中国大陆(CN) \n", "1973 2007 1974 CNPC (Hong Kong) 中国香港(HK)/中国大陆(CN) \n", "1975 2007 1976 K Wah International 中国香港(HK)/中国大陆(CN) \n", "1986 2007 1987 China Overseas Land & Inv 中国香港(HK)/中国大陆(CN) \n", "1989 2007 1989 Nine Dragons Paper Holdings 中国香港(HK)/中国大陆(CN) \n", "\n", " Industry_cn Sales Profits Assets Market_value \\\n", "40 炼油 68.43 16.53 96.42 208.76 \n", "52 银行 31.98 4.65 800.04 176.03 \n", "68 银行 23.18 5.84 568.21 126.55 \n", "70 炼油 99.03 5.07 65.83 93.57 \n", "81 银行 23.10 3.41 585.55 143.80 \n", "88 电信运营商 29.79 6.56 51.35 185.31 \n", "175 多元化 23.55 1.85 74.97 40.57 \n", "180 电信运营商 20.98 3.46 50.34 37.50 \n", "242 保险 11.18 1.15 69.30 109.96 \n", "307 银行 6.64 1.15 176.27 46.14 \n", "309 半导体 9.74 
3.90 18.02 54.32 \n", "340 技术硬件和装备 27.78 1.24 13.99 34.83 \n", "365 材料 15.63 1.57 17.59 21.42 \n", "388 保险 10.09 0.66 93.29 19.87 \n", "394 炼油 8.51 3.10 14.22 34.94 \n", "400 电信运营商 10.69 1.70 24.70 15.70 \n", "422 材料 6.47 1.94 17.08 45.94 \n", "429 银行 4.13 1.74 106.03 25.58 \n", "436 炼油 13.56 1.74 12.35 19.28 \n", "439 保险 7.95 0.52 39.62 39.60 \n", "451 食品市场 11.96 1.25 18.34 13.59 \n", "510 综合金融 3.30 2.56 29.72 29.49 \n", "541 电信运营商 10.67 0.60 17.63 16.03 \n", "551 公用事业 5.87 1.27 16.42 17.65 \n", "575 电信运营商 5.59 1.45 13.98 18.22 \n", "600 材料 8.66 1.54 10.35 12.24 \n", "603 银行 3.53 0.46 90.76 33.19 \n", "617 化学制品 7.64 1.22 11.47 13.37 \n", "627 综合金融 0.80 1.80 28.01 28.39 \n", "736 多元化 2.44 2.42 16.05 17.32 \n", "... ... ... ... ... ... \n", "1636 综合金融 0.05 1.16 2.95 1.54 \n", "1641 运输 13.75 0.13 3.81 2.14 \n", "1661 电信运营商 1.81 0.50 3.59 4.84 \n", "1681 运输 4.29 0.37 3.96 1.90 \n", "1692 运输 4.64 -0.23 8.84 1.97 \n", "1705 运输 0.30 0.34 2.85 5.94 \n", "1710 运输 3.52 0.44 3.59 2.26 \n", "1736 公用事业 0.76 0.37 3.67 5.37 \n", "1739 综合金融 0.14 0.04 2.52 14.29 \n", "1780 电信运营商 2.19 0.45 3.01 4.45 \n", "1786 银行 0.73 0.14 19.36 2.19 \n", "1824 贸易公司 8.25 0.04 3.46 1.50 \n", "1840 耐用消费品 0.79 0.14 1.81 11.10 \n", "1846 综合金融 0.35 0.17 2.96 10.97 \n", "1852 综合金融 0.43 0.27 5.24 5.00 \n", "1860 食品、饮料和烟草 0.43 0.14 1.00 10.69 \n", "1892 材料 1.43 0.36 2.63 4.52 \n", "1908 运输 1.06 0.33 1.66 4.61 \n", "1920 银行 0.66 0.21 10.92 2.43 \n", "1922 技术硬件和装备 2.46 0.23 2.49 6.40 \n", "1945 运输 3.61 0.02 7.63 1.85 \n", "1948 银行 0.66 0.17 13.45 3.33 \n", "1959 电信运营商 2.90 0.21 6.87 3.98 \n", "1960 技术硬件和装备 5.39 -0.16 5.04 1.27 \n", "1963 技术硬件和装备 6.40 -0.04 3.77 1.39 \n", "1970 食品、饮料和烟草 0.70 0.10 1.19 8.81 \n", "1973 炼油 0.44 0.47 2.07 2.30 \n", "1975 综合金融 0.04 0.47 1.29 0.98 \n", "1986 综合金融 0.90 0.20 3.24 7.05 \n", "1989 材料 0.99 0.17 1.86 8.61 \n", "\n", " Company_en Company_cn Country_cn Country_en \n", "40 PetroChina 中国石油 中国大陆 CN \n", "52 ICBC 中国工商银行 中国大陆 CN \n", "68 CCB-China Construction Bank 中国建设银行 中国大陆 
CN \n", "70 Sinopec-China Petroleum 中石化 中国大陆 CN \n", "81 Bank of China 中国银行 中国大陆 CN \n", "88 China Mobile 中国移动 中国香港 HK)/中国大陆(CN \n", "175 Hutchison Whampoa 和记黄埔 中国香港 HK)/中国大陆(CN \n", "180 China Telecom 中国电信 中国大陆 CN \n", "242 China Life Insurance 中国人寿 中国大陆 CN \n", "307 Bank of Communications 中国交通银行 中国大陆 CN \n", "309 Taiwan Semiconductor 台积电 中国台湾 TA \n", "340 Hon Hai Precision Ind 鸿海精密 中国台湾 TA \n", "365 Baoshan Iron & Steel 上海宝钢集团 中国大陆 CN \n", "388 Cathay Financial 国泰金融 中国台湾 TA \n", "394 Cnooc 中海油 中国香港 HK)/中国大陆(CN \n", "400 China Netcom Group 中国网通 中国香港 HK)/中国大陆(CN \n", "422 China Shenhua Energy 中国神华能源股份有限公司 中国大陆 CN \n", "429 BOC Hong Kong 中银香港 中国香港 HK)/中国大陆(CN \n", "436 Formosa Petrochemical 台塑石化 中国台湾 TA \n", "439 Ping An Insurance Group 平安保险 中国大陆 CN \n", "451 Jardine Matheson 香港怡和集团 中国香港 HK)/中国大陆(CN \n", "510 Sun Hung Kai Properties 新鸿基房地产 中国香港 HK)/中国大陆(CN \n", "541 China Unicom 中国联通 中国香港 HK)/中国大陆(CN \n", "551 CLP Holdings 中电控股 中国香港 HK)/中国大陆(CN \n", "575 Chunghwa Telecom 中华电信 中国台湾 TA \n", "600 China Steel 台湾中钢公司 中国台湾 TA \n", "603 China Merchants Bank 招商银行 中国大陆 CN \n", "617 Nan Ya Plastic 南亚塑胶工业 中国台湾 TA \n", "627 Cheung Kong 长江集团 中国香港 HK)/中国大陆(CN \n", "736 Swire Pacific 太古集团 中国香港 HK)/中国大陆(CN \n", "... ... ... ... ... 
\n", "1636 Champion REIT NaN 中国香港 HK)/中国大陆(CN \n", "1641 Noble Group NaN 中国香港 HK)/中国大陆(CN \n", "1661 Taiwan Mobile NaN 中国台湾 TA \n", "1681 Evergreen Marine NaN 中国台湾 TA \n", "1692 China Southern Airlines NaN 中国大陆 CN \n", "1705 Cosco Pacific NaN 中国香港 HK)/中国大陆(CN \n", "1710 China Shipping Container NaN 中国大陆 CN \n", "1736 China Resources Power Holdings NaN 中国香港 HK)/中国大陆(CN \n", "1739 Citic Securities NaN 中国大陆 CN \n", "1780 Far EasTone Telecom NaN 中国台湾 TA \n", "1786 E.Sun Financial NaN 中国台湾 TA \n", "1824 Minmetals Development NaN 中国大陆 CN \n", "1840 Shanghai Automotive NaN 中国大陆 CN \n", "1846 HK Exchanges & Clearing NaN 中国香港 HK)/中国大陆(CN \n", "1852 Link REIT NaN 中国香港 HK)/中国大陆(CN \n", "1860 Kweichow Moutai NaN 中国大陆 CN \n", "1892 Yanzhou Coal Mining NaN 中国大陆 CN \n", "1908 China Shipping Develop NaN 中国大陆 CN \n", "1920 Wing Lung Bank NaN 中国香港 HK)/中国大陆(CN \n", "1922 Delta Electronics NaN 中国台湾 TA \n", "1945 China Airlines NaN 中国台湾 TA \n", "1948 Wing Hang Bank NaN 中国香港 HK)/中国大陆(CN \n", "1959 PCCW NaN 中国香港 HK)/中国大陆(CN \n", "1960 Benq NaN 中国台湾 TA \n", "1963 TCL Corp NaN 中国大陆 CN \n", "1970 Wuliangye Yibin NaN 中国大陆 CN \n", "1973 CNPC (Hong Kong) NaN 中国香港 HK)/中国大陆(CN \n", "1975 K Wah International NaN 中国香港 HK)/中国大陆(CN \n", "1986 China Overseas Land & Inv NaN 中国香港 HK)/中国大陆(CN \n", "1989 Nine Dragons Paper Holdings NaN 中国香港 HK)/中国大陆(CN \n", "\n", "[131 rows x 13 columns]" ] }, "execution_count": 220, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007[df_2007['Country_cn'].str.contains('中国',regex=True)]" ] }, { "cell_type": "code", "execution_count": 221, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en \\\n", "40 2007 41 PetroChina /中国石油 中国大陆(CN) \n", "52 2007 53 ICBC /中国工商银行 中国大陆(CN) \n", "68 2007 69 CCB-China Construction Bank /中国建设银行 中国大陆(CN) \n", "70 2007 71 Sinopec-China Petroleum /中石化 中国大陆(CN) \n", "81 2007 82 Bank of China /中国银行 中国大陆(CN) \n", "88 2007 89 China Mobile /中国移动 中国香港(HK)/中国大陆(CN) \n", "175 2007 176 Hutchison Whampoa/和记黄埔 中国香港(HK)/中国大陆(CN) \n", "180 2007 181 China Telecom/中国电信 中国大陆(CN) \n", "242 2007 243 China Life Insurance /中国人寿 中国大陆(CN) \n", "307 2007 308 Bank of Communications/中国交通银行 中国大陆(CN) \n", "309 2007 310 Taiwan Semiconductor/台积电 中国台湾(TA) \n", "340 2007 341 Hon Hai Precision Ind /鸿海精密 中国台湾(TA) \n", "365 2007 366 Baoshan Iron & Steel /上海宝钢集团 中国大陆(CN) \n", "388 2007 389 Cathay Financial/国泰金融 中国台湾(TA) \n", "394 2007 395 Cnooc /中海油 中国香港(HK)/中国大陆(CN) \n", "400 2007 401 China Netcom Group /中国网通 中国香港(HK)/中国大陆(CN) \n", "422 2007 423 China Shenhua Energy/中国神华能源股份有限公司 中国大陆(CN) \n", "429 2007 430 BOC Hong Kong/中银香港 中国香港(HK)/中国大陆(CN) \n", "436 2007 437 Formosa Petrochemical/台塑石化 中国台湾(TA) \n", "439 2007 440 Ping An Insurance Group/平安保险 中国大陆(CN) \n", "451 2007 452 Jardine Matheson/香港怡和集团 中国香港(HK)/中国大陆(CN) \n", "510 2007 511 Sun Hung Kai Properties /新鸿基房地产 中国香港(HK)/中国大陆(CN) \n", "541 2007 542 China Unicom /中国联通 中国香港(HK)/中国大陆(CN) \n", "551 2007 552 CLP Holdings /中电控股 中国香港(HK)/中国大陆(CN) \n", "575 2007 576 Chunghwa Telecom/中华电信 中国台湾(TA) \n", "600 2007 601 China Steel/台湾中钢公司 中国台湾(TA) \n", "603 2007 604 China Merchants Bank/招商银行 中国大陆(CN) \n", "617 2007 617 Nan Ya Plastic/南亚塑胶工业 中国台湾(TA) \n", "627 2007 628 Cheung Kong/长江集团 中国香港(HK)/中国大陆(CN) \n", "736 2007 737 Swire Pacific /太古集团 中国香港(HK)/中国大陆(CN) \n", "... ... ... ... ... 
\n", "1636 2007 1637 Champion REIT 中国香港(HK)/中国大陆(CN) \n", "1641 2007 1642 Noble Group 中国香港(HK)/中国大陆(CN) \n", "1661 2007 1662 Taiwan Mobile 中国台湾(TA) \n", "1681 2007 1682 Evergreen Marine 中国台湾(TA) \n", "1692 2007 1693 China Southern Airlines 中国大陆(CN) \n", "1705 2007 1706 Cosco Pacific 中国香港(HK)/中国大陆(CN) \n", "1710 2007 1711 China Shipping Container 中国大陆(CN) \n", "1736 2007 1737 China Resources Power Holdings 中国香港(HK)/中国大陆(CN) \n", "1739 2007 1740 Citic Securities 中国大陆(CN) \n", "1780 2007 1781 Far EasTone Telecom 中国台湾(TA) \n", "1786 2007 1787 E.Sun Financial 中国台湾(TA) \n", "1824 2007 1825 Minmetals Development 中国大陆(CN) \n", "1840 2007 1841 Shanghai Automotive 中国大陆(CN) \n", "1846 2007 1847 HK Exchanges & Clearing 中国香港(HK)/中国大陆(CN) \n", "1852 2007 1853 Link REIT 中国香港(HK)/中国大陆(CN) \n", "1860 2007 1861 Kweichow Moutai 中国大陆(CN) \n", "1892 2007 1892 Yanzhou Coal Mining 中国大陆(CN) \n", "1908 2007 1909 China Shipping Develop 中国大陆(CN) \n", "1920 2007 1920 Wing Lung Bank 中国香港(HK)/中国大陆(CN) \n", "1922 2007 1923 Delta Electronics 中国台湾(TA) \n", "1945 2007 1946 China Airlines 中国台湾(TA) \n", "1948 2007 1949 Wing Hang Bank 中国香港(HK)/中国大陆(CN) \n", "1959 2007 1959 PCCW 中国香港(HK)/中国大陆(CN) \n", "1960 2007 1961 Benq 中国台湾(TA) \n", "1963 2007 1964 TCL Corp 中国大陆(CN) \n", "1970 2007 1971 Wuliangye Yibin 中国大陆(CN) \n", "1973 2007 1974 CNPC (Hong Kong) 中国香港(HK)/中国大陆(CN) \n", "1975 2007 1976 K Wah International 中国香港(HK)/中国大陆(CN) \n", "1986 2007 1987 China Overseas Land & Inv 中国香港(HK)/中国大陆(CN) \n", "1989 2007 1989 Nine Dragons Paper Holdings 中国香港(HK)/中国大陆(CN) \n", "\n", " Industry_cn Sales Profits Assets Market_value \\\n", "40 炼油 68.43 16.53 96.42 208.76 \n", "52 银行 31.98 4.65 800.04 176.03 \n", "68 银行 23.18 5.84 568.21 126.55 \n", "70 炼油 99.03 5.07 65.83 93.57 \n", "81 银行 23.10 3.41 585.55 143.80 \n", "88 电信运营商 29.79 6.56 51.35 185.31 \n", "175 多元化 23.55 1.85 74.97 40.57 \n", "180 电信运营商 20.98 3.46 50.34 37.50 \n", "242 保险 11.18 1.15 69.30 109.96 \n", "307 银行 6.64 1.15 176.27 46.14 \n", "309 半导体 9.74 
3.90 18.02 54.32 \n", "340 技术硬件和装备 27.78 1.24 13.99 34.83 \n", "365 材料 15.63 1.57 17.59 21.42 \n", "388 保险 10.09 0.66 93.29 19.87 \n", "394 炼油 8.51 3.10 14.22 34.94 \n", "400 电信运营商 10.69 1.70 24.70 15.70 \n", "422 材料 6.47 1.94 17.08 45.94 \n", "429 银行 4.13 1.74 106.03 25.58 \n", "436 炼油 13.56 1.74 12.35 19.28 \n", "439 保险 7.95 0.52 39.62 39.60 \n", "451 食品市场 11.96 1.25 18.34 13.59 \n", "510 综合金融 3.30 2.56 29.72 29.49 \n", "541 电信运营商 10.67 0.60 17.63 16.03 \n", "551 公用事业 5.87 1.27 16.42 17.65 \n", "575 电信运营商 5.59 1.45 13.98 18.22 \n", "600 材料 8.66 1.54 10.35 12.24 \n", "603 银行 3.53 0.46 90.76 33.19 \n", "617 化学制品 7.64 1.22 11.47 13.37 \n", "627 综合金融 0.80 1.80 28.01 28.39 \n", "736 多元化 2.44 2.42 16.05 17.32 \n", "... ... ... ... ... ... \n", "1636 综合金融 0.05 1.16 2.95 1.54 \n", "1641 运输 13.75 0.13 3.81 2.14 \n", "1661 电信运营商 1.81 0.50 3.59 4.84 \n", "1681 运输 4.29 0.37 3.96 1.90 \n", "1692 运输 4.64 -0.23 8.84 1.97 \n", "1705 运输 0.30 0.34 2.85 5.94 \n", "1710 运输 3.52 0.44 3.59 2.26 \n", "1736 公用事业 0.76 0.37 3.67 5.37 \n", "1739 综合金融 0.14 0.04 2.52 14.29 \n", "1780 电信运营商 2.19 0.45 3.01 4.45 \n", "1786 银行 0.73 0.14 19.36 2.19 \n", "1824 贸易公司 8.25 0.04 3.46 1.50 \n", "1840 耐用消费品 0.79 0.14 1.81 11.10 \n", "1846 综合金融 0.35 0.17 2.96 10.97 \n", "1852 综合金融 0.43 0.27 5.24 5.00 \n", "1860 食品、饮料和烟草 0.43 0.14 1.00 10.69 \n", "1892 材料 1.43 0.36 2.63 4.52 \n", "1908 运输 1.06 0.33 1.66 4.61 \n", "1920 银行 0.66 0.21 10.92 2.43 \n", "1922 技术硬件和装备 2.46 0.23 2.49 6.40 \n", "1945 运输 3.61 0.02 7.63 1.85 \n", "1948 银行 0.66 0.17 13.45 3.33 \n", "1959 电信运营商 2.90 0.21 6.87 3.98 \n", "1960 技术硬件和装备 5.39 -0.16 5.04 1.27 \n", "1963 技术硬件和装备 6.40 -0.04 3.77 1.39 \n", "1970 食品、饮料和烟草 0.70 0.10 1.19 8.81 \n", "1973 炼油 0.44 0.47 2.07 2.30 \n", "1975 综合金融 0.04 0.47 1.29 0.98 \n", "1986 综合金融 0.90 0.20 3.24 7.05 \n", "1989 材料 0.99 0.17 1.86 8.61 \n", "\n", " Company_en Company_cn Country_cn Country_en \n", "40 PetroChina 中国石油 中国大陆 CN \n", "52 ICBC 中国工商银行 中国大陆 CN \n", "68 CCB-China Construction Bank 中国建设银行 中国大陆 
CN \n", "70 Sinopec-China Petroleum 中石化 中国大陆 CN \n", "81 Bank of China 中国银行 中国大陆 CN \n", "88 China Mobile 中国移动 中国香港 CN-HK \n", "175 Hutchison Whampoa 和记黄埔 中国香港 CN-HK \n", "180 China Telecom 中国电信 中国大陆 CN \n", "242 China Life Insurance 中国人寿 中国大陆 CN \n", "307 Bank of Communications 中国交通银行 中国大陆 CN \n", "309 Taiwan Semiconductor 台积电 中国台湾 CN-TA \n", "340 Hon Hai Precision Ind 鸿海精密 中国台湾 CN-TA \n", "365 Baoshan Iron & Steel 上海宝钢集团 中国大陆 CN \n", "388 Cathay Financial 国泰金融 中国台湾 CN-TA \n", "394 Cnooc 中海油 中国香港 CN-HK \n", "400 China Netcom Group 中国网通 中国香港 CN-HK \n", "422 China Shenhua Energy 中国神华能源股份有限公司 中国大陆 CN \n", "429 BOC Hong Kong 中银香港 中国香港 CN-HK \n", "436 Formosa Petrochemical 台塑石化 中国台湾 CN-TA \n", "439 Ping An Insurance Group 平安保险 中国大陆 CN \n", "451 Jardine Matheson 香港怡和集团 中国香港 CN-HK \n", "510 Sun Hung Kai Properties 新鸿基房地产 中国香港 CN-HK \n", "541 China Unicom 中国联通 中国香港 CN-HK \n", "551 CLP Holdings 中电控股 中国香港 CN-HK \n", "575 Chunghwa Telecom 中华电信 中国台湾 CN-TA \n", "600 China Steel 台湾中钢公司 中国台湾 CN-TA \n", "603 China Merchants Bank 招商银行 中国大陆 CN \n", "617 Nan Ya Plastic 南亚塑胶工业 中国台湾 CN-TA \n", "627 Cheung Kong 长江集团 中国香港 CN-HK \n", "736 Swire Pacific 太古集团 中国香港 CN-HK \n", "... ... ... ... ... 
\n", "1636 Champion REIT NaN 中国香港 CN-HK \n", "1641 Noble Group NaN 中国香港 CN-HK \n", "1661 Taiwan Mobile NaN 中国台湾 CN-TA \n", "1681 Evergreen Marine NaN 中国台湾 CN-TA \n", "1692 China Southern Airlines NaN 中国大陆 CN \n", "1705 Cosco Pacific NaN 中国香港 CN-HK \n", "1710 China Shipping Container NaN 中国大陆 CN \n", "1736 China Resources Power Holdings NaN 中国香港 CN-HK \n", "1739 Citic Securities NaN 中国大陆 CN \n", "1780 Far EasTone Telecom NaN 中国台湾 CN-TA \n", "1786 E.Sun Financial NaN 中国台湾 CN-TA \n", "1824 Minmetals Development NaN 中国大陆 CN \n", "1840 Shanghai Automotive NaN 中国大陆 CN \n", "1846 HK Exchanges & Clearing NaN 中国香港 CN-HK \n", "1852 Link REIT NaN 中国香港 CN-HK \n", "1860 Kweichow Moutai NaN 中国大陆 CN \n", "1892 Yanzhou Coal Mining NaN 中国大陆 CN \n", "1908 China Shipping Develop NaN 中国大陆 CN \n", "1920 Wing Lung Bank NaN 中国香港 CN-HK \n", "1922 Delta Electronics NaN 中国台湾 CN-TA \n", "1945 China Airlines NaN 中国台湾 CN-TA \n", "1948 Wing Hang Bank NaN 中国香港 CN-HK \n", "1959 PCCW NaN 中国香港 CN-HK \n", "1960 Benq NaN 中国台湾 CN-TA \n", "1963 TCL Corp NaN 中国大陆 CN \n", "1970 Wuliangye Yibin NaN 中国大陆 CN \n", "1973 CNPC (Hong Kong) NaN 中国香港 CN-HK \n", "1975 K Wah International NaN 中国香港 CN-HK \n", "1986 China Overseas Land & Inv NaN 中国香港 CN-HK \n", "1989 Nine Dragons Paper Holdings NaN 中国香港 CN-HK \n", "\n", "[131 rows x 13 columns]" ] }, "execution_count": 221, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Normalize region codes via regex replacement: 'HK...' -> 'CN-HK', 'TA' -> 'CN-TA'\n", "df_2007['Country_en'] = df_2007['Country_en'].replace(['HK.*','TA'],['CN-HK', 'CN-TA'],regex=True)\n", "df_2007[df_2007['Country_en'].str.contains('CN',regex=True)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Some other years list each company's industry in English, so an English industry-name column is added here for consistency; for 2007 it is left blank." ] }, { "cell_type": "code", "execution_count": 222, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_cn_en Industry_cn Sales \\\n", "1995 2007 1995 Fremont General 美国(US) 综合金融 1.25 \n", "1996 2007 1997 United Rentals 美国(US) 商业服务和供应 3.64 \n", "1997 2007 1998 CBOT Holdings 美国(US) 综合金融 0.64 \n", "1998 2007 1998 Singapore Petroleum 新加坡(SI) 炼油 5.59 \n", "1999 2007 2000 DVB Bank 德国(GE) 银行 0.77 \n", "\n", " Profits Assets Market_value Company_en Company_cn \\\n", "1995 0.17 12.80 0.69 Fremont General NaN \n", "1996 0.22 5.37 2.32 United Rentals NaN \n", "1997 0.17 0.81 8.54 CBOT Holdings NaN \n", "1998 0.19 2.05 1.50 Singapore Petroleum NaN \n", "1999 0.06 12.74 1.26 DVB Bank NaN \n", "\n", " Country_cn Country_en Industry_en \n", "1995 美国 US \n", "1996 美国 US \n", "1997 美国 US \n", "1998 新加坡 SI \n", "1999 德国 GE " ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2007['Industry_en'] = ''\n", "df_2007.tail(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **将列名进行重新排序**" ] }, { "cell_type": "code", "execution_count": 223, "metadata": { "collapsed": true }, "outputs": [], "source": [ "columns_sort = ['Year', 'Rank', 'Company_cn_en','Company_en',\n", " 'Company_cn', 'Country_cn_en', 'Country_cn', \n", " 'Country_en', 'Industry_cn', 'Industry_en',\n", " 'Sales', 'Profits', 'Assets', 'Market_value']" ] }, { "cell_type": "code", "execution_count": 224, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n", "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn \\\n", "0 2007 1 Citigroup /花旗集团 Citigroup 花旗集团 \n", "1 2007 2 Bank of America /美国银行 Bank of America 美国银行 \n", "2 2007 3 HSBC Holdings/汇丰集团 HSBC Holdings 汇丰集团 \n", "\n", " Country_cn_en Country_cn Country_en Industry_cn Industry_en Sales \\\n", "0 美国(US) 美国 US 银行 146.56 \n", "1 美国(US) 美国 US 银行 116.57 \n", "2 英国(UK) 英国 UK 银行 121.51 \n", "\n", " Profits Assets Market_value \n", "0 21.54 1884.32 247.42 \n", "1 21.13 1459.74 226.61 \n", "2 16.63 1860.76 202.29 " ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 按指定list重新将columns进行排序\n", "df_2007 = df_2007.reindex(columns=columns_sort)\n", "print(df_2007.shape)\n", "print(df_2007.dtypes)\n", "df_2007.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2008" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 数据加载" ] }, { "cell_type": "code", "execution_count": 225, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 10)\n", "年份 int64\n", "Rank int64\n", "公司名称(英文) object\n", "公司名称(中文) object\n", "Country/area(国家或地区) object\n", "Industry(行业) object\n", "Sales (销售额)($bil十亿美元) object\n", "Profits (利润)($bil) object\n", "Assets 资产($bil) object\n", "Market Value 市值($bil) float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 年份 Rank 公司名称(英文) 公司名称(中文) Country/area(国家或地区) \\\n", "0 2008 1 HSBC Holdings 汇丰集团 United Kingdom \n", "1 2008 2 General Electric 通用电气公司 United States \n", "2 2008 3 Bank of America 美国银行 United States \n", "3 2008 4 JPMorgan Chase 摩根大通公司 United States \n", "4 2008 5 ExxonMobil 埃克森美孚公司 United States \n", "\n", " Industry(行业) Sales (销售额)($bil十亿美元) Profits (利润)($bil) \\\n", "0 Banking 146.5 19.13 \n", "1 Conglomerates 172.74 22.21 \n", "2 Banking 119.19 14.98 \n", "3 Banking 116.35 15.37 \n", "4 Oil & Gas Operations 358.6 40.61 \n", "\n", " Assets 资产($bil) Market Value 市值($bil) \n", "0 2,348.98 180.81 \n", "1 795.34 330.93 \n", "2 1,715.75 176.53 \n", "3 1,562.15 136.88 \n", "4 242.08 465.51 " ] }, "execution_count": 225, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2008 = pd.read_csv('./data/data_forbes_2008.csv', encoding='gbk', thousands=',')\n", "print('the shape of DataFrame: ', df_2008.shape)\n", "print(df_2008.dtypes)\n", "df_2008.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 更新columns的名称" ] }, { "cell_type": "code", "execution_count": 226, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_en Company_cn Country_en \\\n", "0 2008 1 HSBC Holdings 汇丰集团 United Kingdom \n", "1 2008 2 General Electric 通用电气公司 United States \n", "2 2008 3 Bank of America 美国银行 United States \n", "3 2008 4 JPMorgan Chase 摩根大通公司 United States \n", "4 2008 5 ExxonMobil 埃克森美孚公司 United States \n", "\n", " Industry_en Sales Profits Assets Market_value \n", "0 Banking 146.5 19.13 2,348.98 180.81 \n", "1 Conglomerates 172.74 22.21 795.34 330.93 \n", "2 Banking 119.19 14.98 1,715.75 176.53 \n", "3 Banking 116.35 15.37 1,562.15 136.88 \n", "4 Oil & Gas Operations 358.6 40.61 242.08 465.51 " ] }, "execution_count": 226, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2008.columns = ['Year', 'Rank', 'Company_en', 'Company_cn','Country_en', 'Industry_en', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "df_2008.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 添加空白列,使之与其他年份的格式保持一致" ] }, { "cell_type": "code", "execution_count": 227, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_en Company_cn Country_en \\\n", "0 2008 1 HSBC Holdings 汇丰集团 United Kingdom \n", "1 2008 2 General Electric 通用电气公司 United States \n", "2 2008 3 Bank of America 美国银行 United States \n", "3 2008 4 JPMorgan Chase 摩根大通公司 United States \n", "4 2008 5 ExxonMobil 埃克森美孚公司 United States \n", "\n", " Industry_en Sales Profits Assets Market_value Company_cn_en \\\n", "0 Banking 146.5 19.13 2,348.98 180.81 \n", "1 Conglomerates 172.74 22.21 795.34 330.93 \n", "2 Banking 119.19 14.98 1,715.75 176.53 \n", "3 Banking 116.35 15.37 1,562.15 136.88 \n", "4 Oil & Gas Operations 358.6 40.61 242.08 465.51 \n", "\n", " Country_cn_en Country_cn Industry_cn \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 227, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2008['Company_cn_en'], df_2008['Country_cn_en'], df_2008['Country_cn'], df_2008['Industry_cn'] = ['','','','']\n", "df_2008.head()" ] }, { "cell_type": "code", "execution_count": 228, "metadata": { "collapsed": true }, "outputs": [], "source": [ "col_digit = ['Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "for col in col_digit:\n", " # 将数字后面的字母进行替换\n", " df_2008[col] = df_2008[col].replace('([A-Za-z])', '', regex=True)\n", "\n", " # 千分位数字的逗号被识别为string了,需要替换\n", " df_2008[col] = df_2008[col].replace(',', '', regex=True)\n", " \n", " #将数字型字符串转换为可进行计算的数据类型\n", " df_2008[col] = pd.to_numeric(df_2008[col])" ] }, { "cell_type": "code", "execution_count": 229, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# df_2008['Sales'] = pd.to_numeric(df_2008['Sales'])\n", "# df_2008['Profits'] = pd.to_numeric(df_2008['Profits'])\n", "# df_2008['Assets'] = pd.to_numeric(df_2008['Assets'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 按指定list重新将columns进行排序" ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n", "Year 
int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn Country_cn_en \\\n", "0 2008 1 HSBC Holdings 汇丰集团 \n", "1 2008 2 General Electric 通用电气公司 \n", "2 2008 3 Bank of America 美国银行 \n", "3 2008 4 JPMorgan Chase 摩根大通公司 \n", "4 2008 5 ExxonMobil 埃克森美孚公司 \n", "\n", " Country_cn Country_en Industry_cn Industry_en Sales \\\n", "0 United Kingdom Banking 146.50 \n", "1 United States Conglomerates 172.74 \n", "2 United States Banking 119.19 \n", "3 United States Banking 116.35 \n", "4 United States Oil & Gas Operations 358.60 \n", "\n", " Profits Assets Market_value \n", "0 19.13 2348.98 180.81 \n", "1 22.21 795.34 330.93 \n", "2 14.98 1715.75 176.53 \n", "3 15.37 1562.15 136.88 \n", "4 40.61 242.08 465.51 " ] }, "execution_count": 230, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 按指定list重新将columns进行排序\n", "df_2008 = df_2008.reindex(columns=columns_sort)\n", "print(df_2008.shape)\n", "print(df_2008.dtypes)\n", "df_2008.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2009" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 数据加载" ] }, { "cell_type": "code", "execution_count": 231, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 9)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 年份 排名(Rank) 公司名称(Company) 所在国家或地区(Country) \\\n", "0 2009 1 General Electric/通用电气公司 United States \n", "1 2009 2 Royal Dutch Shell/英荷壳牌集团 Netherlands \n", "2 2009 3 Toyota Motor/丰田汽车公司 Japan \n", "3 2009 4 ExxonMobil/埃克森美孚公司 United States \n", "4 2009 5 BP/英国石油公司 United Kingdom \n", "\n", " 所在行业(Industry) 销售收入(Sales) ($bil) 利润(Profits) 总资产(Assets) ($bil) \\\n", "0 Conglomerates 182.52 17.41 797.77 \n", "1 Oil & Gas Operations 458.36 26.28 278.44 \n", "2 Consumer Durables 263.42 17.21 324.98 \n", "3 Oil & Gas Operations 425.7 45.22 228.05 \n", "4 Oil & Gas Operations 361.14 21.16 228.24 \n", "\n", " 市值(Market Vaue) ($bil) \n", "0 89.87 \n", "1 135.10 \n", "2 102.35 \n", "3 335.54 \n", "4 119.70 " ] }, "execution_count": 231, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2009 = pd.read_csv('./data/data_forbes_2009.csv', encoding='gbk')\n", "print('the shape of DataFrame: ', df_2009.shape)\n", "df_2009.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 更新columns名称" ] }, { "cell_type": "code", "execution_count": 232, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_en Industry_en \\\n", "0 2009 1 General Electric/通用电气公司 United States Conglomerates \n", "1 2009 2 Royal Dutch Shell/英荷壳牌集团 Netherlands Oil & Gas Operations \n", "2 2009 3 Toyota Motor/丰田汽车公司 Japan Consumer Durables \n", "3 2009 4 ExxonMobil/埃克森美孚公司 United States Oil & Gas Operations \n", "4 2009 5 BP/英国石油公司 United Kingdom Oil & Gas Operations \n", "\n", " Sales Profits Assets Market_value \n", "0 182.52 17.41 797.77 89.87 \n", "1 458.36 26.28 278.44 135.10 \n", "2 263.42 17.21 324.98 102.35 \n", "3 425.7 45.22 228.05 335.54 \n", "4 361.14 21.16 228.24 119.70 " ] }, "execution_count": 232, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2009.columns = ['Year', 'Rank', 'Company_cn_en', 'Country_en', 'Industry_en', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "df_2009.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **拆分\"Company_cn_en\"列**,新生成两列,分别为公司英文名称和中文名称" ] }, { "cell_type": "code", "execution_count": 233, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 General Electric\n", "1 Royal Dutch Shell\n", "2 Toyota Motor\n", "3 ExxonMobil\n", "4 BP\n", "Name: Company_en, dtype: object\n", "1995 NaN\n", "1996 NaN\n", "1997 NaN\n", "1998 NaN\n", "1999 NaN\n", "Name: Company_cn, dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_en Industry_en \\\n", "0 2009 1 General Electric/通用电气公司 United States Conglomerates \n", "1 2009 2 Royal Dutch Shell/英荷壳牌集团 Netherlands Oil & Gas Operations \n", "2 2009 3 Toyota Motor/丰田汽车公司 Japan Consumer Durables \n", "3 2009 4 ExxonMobil/埃克森美孚公司 United States Oil & Gas Operations \n", "4 2009 5 BP/英国石油公司 United Kingdom Oil & Gas Operations \n", "\n", " Sales Profits Assets Market_value Company_en Company_cn \n", "0 182.52 17.41 797.77 89.87 General Electric 通用电气公司 \n", "1 458.36 26.28 278.44 135.10 Royal Dutch Shell 英荷壳牌集团 \n", "2 263.42 17.21 324.98 102.35 Toyota Motor 丰田汽车公司 \n", "3 425.7 45.22 228.05 335.54 ExxonMobil 埃克森美孚公司 \n", "4 361.14 21.16 228.24 119.70 BP 英国石油公司 " ] }, "execution_count": 233, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2009['Company_en'],df_2009['Company_cn'] = df_2009['Company_cn_en'].str.split('/', 1).str\n", "print(df_2009['Company_en'][:5])\n", "print(df_2009['Company_cn'] [-5:])\n", "df_2009.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 添加空白列" ] }, { "cell_type": "code", "execution_count": 234, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Country_en Industry_en \\\n", "0 2009 1 General Electric/通用电气公司 United States Conglomerates \n", "1 2009 2 Royal Dutch Shell/英荷壳牌集团 Netherlands Oil & Gas Operations \n", "2 2009 3 Toyota Motor/丰田汽车公司 Japan Consumer Durables \n", "3 2009 4 ExxonMobil/埃克森美孚公司 United States Oil & Gas Operations \n", "4 2009 5 BP/英国石油公司 United Kingdom Oil & Gas Operations \n", "\n", " Sales Profits Assets Market_value Company_en Company_cn \\\n", "0 182.52 17.41 797.77 89.87 General Electric 通用电气公司 \n", "1 458.36 26.28 278.44 135.10 Royal Dutch Shell 英荷壳牌集团 \n", "2 263.42 17.21 324.98 102.35 Toyota Motor 丰田汽车公司 \n", "3 425.7 45.22 228.05 335.54 ExxonMobil 埃克森美孚公司 \n", "4 361.14 21.16 228.24 119.70 BP 英国石油公司 \n", "\n", " Country_cn_en Country_cn Industry_cn \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 234, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2009['Country_cn_en'], df_2009['Country_cn'], df_2009['Industry_cn'] = ['','','']\n", "df_2009.head()" ] }, { "cell_type": "code", "execution_count": 235, "metadata": { "collapsed": true }, "outputs": [], "source": [ "col_digit = ['Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "for col in col_digit:\n", " # 将数字后面的字母进行替换\n", " df_2009[col] = df_2009[col].replace('([A-Za-z])', '', regex=True)\n", "\n", " # 千分位数字的逗号被识别为string了,需要替换\n", " df_2009[col] = df_2009[col].replace(',', '', regex=True)\n", " \n", " df_2009[col] = pd.to_numeric(df_2009[col])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 将columns重新排序" ] }, { "cell_type": "code", "execution_count": 236, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n", "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits 
float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn \\\n", "0 2009 1 General Electric/通用电气公司 General Electric 通用电气公司 \n", "1 2009 2 Royal Dutch Shell/英荷壳牌集团 Royal Dutch Shell 英荷壳牌集团 \n", "2 2009 3 Toyota Motor/丰田汽车公司 Toyota Motor 丰田汽车公司 \n", "3 2009 4 ExxonMobil/埃克森美孚公司 ExxonMobil 埃克森美孚公司 \n", "4 2009 5 BP/英国石油公司 BP 英国石油公司 \n", "\n", " Country_cn_en Country_cn Country_en Industry_cn Industry_en \\\n", "0 United States Conglomerates \n", "1 Netherlands Oil & Gas Operations \n", "2 Japan Consumer Durables \n", "3 United States Oil & Gas Operations \n", "4 United Kingdom Oil & Gas Operations \n", "\n", " Sales Profits Assets Market_value \n", "0 182.52 17.41 797.77 89.87 \n", "1 458.36 26.28 278.44 135.10 \n", "2 263.42 17.21 324.98 102.35 \n", "3 425.70 45.22 228.05 335.54 \n", "4 361.14 21.16 228.24 119.70 " ] }, "execution_count": 236, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 按指定list重新将columns进行排序\n", "df_2009 = df_2009.reindex(columns=columns_sort)\n", "print(df_2009.shape)\n", "print(df_2009.dtypes)\n", "df_2009.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2010" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 数据加载,单位为十亿美元" ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2001, 10)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 \\\n", "0 2010 1 摩根大通公司 JPMorgan Chase United States Banking \n", "1 2010 2 通用电气公司 General Electric United States Conglomerates \n", "2 2010 3 美国银行 Bank of America United States Banking \n", "3 2010 4 埃克森美孚公司 ExxonMobil United States Oil & Gas Operations \n", "4 2010 5 中国工商银行 ICBC China Banking \n", "\n", " 6 7 8 9 \n", "0 115.63 11.65 2,031.99 166.19 \n", "1 156.78 11.03 781.82 169.65 \n", "2 150.45 6.28 2,223.30 167.63 \n", "3 275.56 19.28 233.32 308.77 \n", "4 71.86 16.27 1,428.46 242.23 " ] }, "execution_count": 238, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2010 = pd.read_csv('./data/data_forbes_2010.csv', encoding='gbk', header=None)\n", "print('the shape of DataFrame: ', df_2010.shape)\n", "df_2010.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 添加columns的名称" ] }, { "cell_type": "code", "execution_count": 239, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn Company_en Country_en \\\n", "0 2010 1 摩根大通公司 JPMorgan Chase United States \n", "1 2010 2 通用电气公司 General Electric United States \n", "2 2010 3 美国银行 Bank of America United States \n", "3 2010 4 埃克森美孚公司 ExxonMobil United States \n", "4 2010 5 中国工商银行 ICBC China \n", "\n", " Industry_en Sales Profits Assets Market_value \n", "0 Banking 115.63 11.65 2,031.99 166.19 \n", "1 Conglomerates 156.78 11.03 781.82 169.65 \n", "2 Banking 150.45 6.28 2,223.30 167.63 \n", "3 Oil & Gas Operations 275.56 19.28 233.32 308.77 \n", "4 Banking 71.86 16.27 1,428.46 242.23 " ] }, "execution_count": 239, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2010.columns = ['Year', 'Rank', 'Company_cn','Company_en', 'Country_en', \n", " 'Industry_en', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "df_2010.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 添加空白列" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn Company_en Country_en \\\n", "0 2010 1 摩根大通公司 JPMorgan Chase United States \n", "1 2010 2 通用电气公司 General Electric United States \n", "2 2010 3 美国银行 Bank of America United States \n", "3 2010 4 埃克森美孚公司 ExxonMobil United States \n", "4 2010 5 中国工商银行 ICBC China \n", "\n", " Industry_en Sales Profits Assets Market_value Company_cn_en \\\n", "0 Banking 115.63 11.65 2,031.99 166.19 \n", "1 Conglomerates 156.78 11.03 781.82 169.65 \n", "2 Banking 150.45 6.28 2,223.30 167.63 \n", "3 Oil & Gas Operations 275.56 19.28 233.32 308.77 \n", "4 Banking 71.86 16.27 1,428.46 242.23 \n", "\n", " Country_cn_en Country_cn Industry_cn \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2010['Company_cn_en'], df_2010['Country_cn_en'], df_2010['Country_cn'], df_2010['Industry_cn'] = ['','','','']\n", "df_2010.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 1600行的标题重复,需要删除" ] }, { "cell_type": "code", "execution_count": 241, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_2010 = df_2010.drop(1600)\n", "# df_2010.drop(1600, inplace=True)" ] }, { "cell_type": "code", "execution_count": 242, "metadata": { "collapsed": false }, "outputs": [], "source": [ "col_digit = ['Sales', 'Profits', 'Assets', 'Market_value', 'Rank']\n", "\n", "for col in col_digit:\n", " # 将数字后面的字母进行替换\n", " df_2010[col] = df_2010[col].replace('([A-Za-z])', '', regex=True)\n", "\n", " # 千分位数字的逗号被识别为string了,需要替换\n", " df_2010[col] = df_2010[col].replace(',', '', regex=True)\n", " \n", " df_2010[col] = pd.to_numeric(df_2010[col])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 将columns重新排序" ] }, { "cell_type": "code", "execution_count": 243, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n", "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en 
object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn Country_cn_en \\\n", "0 2010 1 JPMorgan Chase 摩根大通公司 \n", "1 2010 2 General Electric 通用电气公司 \n", "2 2010 3 Bank of America 美国银行 \n", "3 2010 4 ExxonMobil 埃克森美孚公司 \n", "4 2010 5 ICBC 中国工商银行 \n", "\n", " Country_cn Country_en Industry_cn Industry_en Sales \\\n", "0 United States Banking 115.63 \n", "1 United States Conglomerates 156.78 \n", "2 United States Banking 150.45 \n", "3 United States Oil & Gas Operations 275.56 \n", "4 China Banking 71.86 \n", "\n", " Profits Assets Market_value \n", "0 11.65 2031.99 166.19 \n", "1 11.03 781.82 169.65 \n", "2 6.28 2223.30 167.63 \n", "3 19.28 233.32 308.77 \n", "4 16.27 1428.46 242.23 " ] }, "execution_count": 243, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 按指定list重新将columns进行排序\n", "df_2010 = df_2010.reindex(columns=columns_sort)\n", "print(df_2010.shape)\n", "print(df_2010.dtypes)\n", "df_2010.head()" ] }, { "cell_type": "code", "execution_count": 244, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# df_2010.to_csv('data_forbes_2010_update.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2011" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 加载数据\n", "* 原始数据的单位为 亿美元,需要统一成十亿美元" ] }, { "cell_type": "code", "execution_count": 246, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 10)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9\n", "0 2011 1 JPMorgan Chase 摩根大通 美国 银行 1155 174 21176 1822\n", "1 2011 2 HSBC Holdings 汇丰控股 英国 银行 1033 133 24679 1865\n", "2 2011 3 General Electric 通用电气 美国 企业集团 1502 116 7512 2162\n", "3 2011 4 ExxonMobil 埃克森美孚 美国 石油天然气 3416 305 3025 4072\n", "4 2011 5 Royal Dutch Shell 皇家荷兰壳牌集团 荷兰 石油天然气 3691 201 3172 2129" ] }, "execution_count": 246, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011 = pd.read_csv('./data/data_forbes_2011.csv', encoding='gbk', header=None, thousands=',')\n", "print('the shape of DataFrame: ', df_2011.shape)\n", "df_2011.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 添加colunmns名称" ] }, { "cell_type": "code", "execution_count": 247, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_en Company_cn Country_cn Industry_cn Sales \\\n", "0 2011 1 JPMorgan Chase 摩根大通 美国 银行 1155 \n", "1 2011 2 HSBC Holdings 汇丰控股 英国 银行 1033 \n", "2 2011 3 General Electric 通用电气 美国 企业集团 1502 \n", "3 2011 4 ExxonMobil 埃克森美孚 美国 石油天然气 3416 \n", "4 2011 5 Royal Dutch Shell 皇家荷兰壳牌集团 荷兰 石油天然气 3691 \n", "\n", " Profits Assets Market_value \n", "0 174 21176 1822 \n", "1 133 24679 1865 \n", "2 116 7512 2162 \n", "3 305 3025 4072 \n", "4 201 3172 2129 " ] }, "execution_count": 247, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011.columns = ['Year', 'Rank', 'Company_en','Company_cn', 'Country_cn',\n", " 'Industry_cn', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "df_2011.head()" ] }, { "cell_type": "code", "execution_count": 248, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Year int64\n", "Rank int64\n", "Company_en object\n", "Company_cn object\n", "Country_cn object\n", "Industry_cn object\n", "Sales int64\n", "Profits int64\n", "Assets int64\n", "Market_value int64\n", "dtype: object" ] }, "execution_count": 248, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 修改货币单位为 十亿美元" ] }, { "cell_type": "code", "execution_count": 249, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_en Company_cn Country_cn Industry_cn Sales \\\n", "0 2011 1 JPMorgan Chase 摩根大通 美国 银行 115.5 \n", "1 2011 2 HSBC Holdings 汇丰控股 英国 银行 103.3 \n", "2 2011 3 General Electric 通用电气 美国 企业集团 150.2 \n", "3 2011 4 ExxonMobil 埃克森美孚 美国 石油天然气 341.6 \n", "4 2011 5 Royal Dutch Shell 皇家荷兰壳牌集团 荷兰 石油天然气 369.1 \n", "\n", " Profits Assets Market_value \n", "0 17.4 2117.6 182.2 \n", "1 13.3 2467.9 186.5 \n", "2 11.6 751.2 216.2 \n", "3 30.5 302.5 407.2 \n", "4 20.1 317.2 212.9 " ] }, "execution_count": 249, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011[['Sales','Profits','Assets','Market_value']] =df_2011[['Sales','Profits','Assets','Market_value']].apply(lambda x: x/10)\n", "df_2011.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 添加空白列" ] }, { "cell_type": "code", "execution_count": 250, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_en Company_cn Country_cn Industry_cn Sales \\\n", "0 2011 1 JPMorgan Chase 摩根大通 美国 银行 115.5 \n", "1 2011 2 HSBC Holdings 汇丰控股 英国 银行 103.3 \n", "2 2011 3 General Electric 通用电气 美国 企业集团 150.2 \n", "3 2011 4 ExxonMobil 埃克森美孚 美国 石油天然气 341.6 \n", "4 2011 5 Royal Dutch Shell 皇家荷兰壳牌集团 荷兰 石油天然气 369.1 \n", "\n", " Profits Assets Market_value Company_cn_en Country_cn_en Country_en \\\n", "0 17.4 2117.6 182.2 \n", "1 13.3 2467.9 186.5 \n", "2 11.6 751.2 216.2 \n", "3 30.5 302.5 407.2 \n", "4 20.1 317.2 212.9 \n", "\n", " Industry_en \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 250, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011['Company_cn_en'], df_2011['Country_cn_en'], df_2011['Country_en'], df_2011['Industry_en'] = ['','','','']\n", "df_2011.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 按指定list重新将columns进行排序" ] }, { "cell_type": "code", "execution_count": 251, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n", "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn Country_cn_en \\\n", "0 2011 1 JPMorgan Chase 摩根大通 \n", "1 2011 2 HSBC Holdings 汇丰控股 \n", "2 2011 3 General Electric 通用电气 \n", "3 2011 4 ExxonMobil 埃克森美孚 \n", "4 2011 5 Royal Dutch Shell 皇家荷兰壳牌集团 \n", "\n", " Country_cn Country_en Industry_cn Industry_en Sales Profits Assets \\\n", "0 美国 银行 115.5 17.4 2117.6 \n", "1 英国 银行 103.3 13.3 2467.9 \n", "2 美国 企业集团 150.2 11.6 751.2 \n", "3 美国 石油天然气 341.6 30.5 302.5 \n", "4 荷兰 石油天然气 369.1 20.1 317.2 \n", "\n", " Market_value \n", "0 182.2 \n", "1 186.5 \n", "2 216.2 \n", "3 407.2 \n", "4 212.9 " ] }, "execution_count": 251, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 按指定list重新将columns进行排序\n", "df_2011 = df_2011.reindex(columns=columns_sort)\n", "print(df_2011.shape)\n", "print(df_2011.dtypes)\n", "df_2011.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2012" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 采集的原始数据有异常,用Pandas不能顺利加载,需要进行手动调整;\n", "* 2012年原始数据的单位是 亿美元" ] }, { "cell_type": "code", "execution_count": 252, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 9)\n", "0 int64\n", "1 int64\n", "2 float64\n", "3 object\n", "4 object\n", "5 int64\n", "6 int64\n", "7 int64\n", "8 float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8\n", "0 2012 1 NaN 埃克森美孚/Exxon Mobil 美国 4335 411 3311 4074.0\n", "1 2012 2 NaN 摩根大通/JPMorgan Chase 美国 1108 190 22658 1701.0\n", "2 2012 3 NaN 通用电气/General Electric 美国 1473 142 7172 2137.0\n", "3 2012 4 NaN 皇家荷兰壳牌集团/Royal Dutch Shell 荷兰 4702 309 3405 2276.0\n", "4 2012 5 NaN 中国工商银行/ICBC 中国 826 251 20391 2374.0" ] }, "execution_count": 252, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2012 = pd.read_csv('./data/data_forbes_2012.csv', encoding='gbk',header=None)\n", "print('the shape of DataFrame: ', df_2012.shape)\n", "print(df_2012.dtypes)\n", "df_2012.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 数据初步整理" ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n", "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn \\\n", "0 2012 1 埃克森美孚/Exxon Mobil Exxon Mobil 埃克森美孚 \n", "1 2012 2 摩根大通/JPMorgan Chase JPMorgan Chase 摩根大通 \n", "2 2012 3 通用电气/General Electric General Electric 通用电气 \n", "3 2012 4 皇家荷兰壳牌集团/Royal Dutch Shell Royal Dutch Shell 皇家荷兰壳牌集团 \n", "4 2012 5 中国工商银行/ICBC ICBC 中国工商银行 \n", "\n", " Country_cn_en Country_cn Country_en Industry_cn Industry_en Sales Profits \\\n", "0 美国 433.5 41.1 \n", "1 美国 110.8 19.0 \n", "2 美国 147.3 14.2 \n", "3 荷兰 470.2 30.9 \n", "4 中国 82.6 25.1 \n", "\n", " Assets Market_value \n", "0 331.1 407.4 \n", "1 2265.8 170.1 \n", "2 717.2 213.7 \n", "3 340.5 227.6 \n", "4 2039.1 237.4 " ] }, "execution_count": 253, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 更新列名\n", "df_2012.columns = ['Year', 'Rank', 'Company_cn','Company_cn_en', \n", " 'Country_cn', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "# 拆分\"Company_cn_en\"列,新生成两列,分别为公司英文名称和中文名称\n", "df_2012['Company_cn'],df_2012['Company_en'] = df_2012['Company_cn_en'].str.split('/', 1).str\n", "# print(df_2012['Company_cn'][:5])\n", "# print(df_2012['Company_en'] [-5:])\n", "\n", "# 将数据单位转换成十亿美元\n", "df_2012[['Sales','Profits','Assets','Market_value']] =df_2012[['Sales','Profits','Assets','Market_value']].apply(lambda x: x/10)\n", "\n", "# 添加空白列\n", "df_2012['Country_en'],df_2012['Country_cn_en'], df_2012['Industry_cn'], df_2012['Industry_en'] = ['','','','']\n", "\n", "# 按指定list重新将columns进行排序\n", "df_2012 = df_2012.reindex(columns=columns_sort)\n", "print(df_2012.shape)\n", "print(df_2012.dtypes)\n", "df_2012.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2013" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Source 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 2013年原始数据的单位是 亿美元" ] }, { "cell_type": "code", "execution_count": 254, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "the shape of DataFrame: (1991, 9)\n", "0 int64\n", "1 int64\n", "2 float64\n", "3 object\n", "4 object\n", "5 object\n", "6 object\n", "7 object\n", "8 object\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8\n", "0 2013 1 NaN 中国工商银行/ICBC 中国大陆 1348 378 28135 2373\n", "1 2013 2 NaN 中国建设银行/China Construction Bank 中国大陆 1131 306 22410 2020\n", "2 2013 3 NaN 摩根大通/JPMorgan Chase 美国 1082 213 23591 1914\n", "3 2013 4 NaN 通用电气/General Electric 美国 1474 136 6853 2437\n", "4 2013 5 NaN 埃克森美孚/Exxon Mobil 美国 4207 449 3338 4004" ] }, "execution_count": 254, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2013 = pd.read_csv('./data/data_forbes_2013.csv', encoding='gbk',header=None)\n", "print('the shape of DataFrame: ', df_2013.shape)\n", "print(df_2013.dtypes)\n", "df_2013.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 2013年的数据只有1991条记录,数据可能有缺失,待进一步核实。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Source 2 补充数据" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 2013年的数据在网上继续寻找,发现Economy Watch网站有相关数据,于是进行数据爬取\n", "* Economy Watch:Forbes Global 2000: China's Largest Companies\n", "* http://www.economywatch.com/companies/forbes-list/china.html" ] }, { "cell_type": "code", "execution_count": 255, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (1984, 7)\n", "Global Rank int64\n", "Company object\n", "Country object\n", "Sales\\n ($billion) float64\n", "Profits\\n ($billion) float64\n", "Assets\\n ($billion) float64\n", "Market Value\\n ($billion) float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Global Rank Company Country \\\n", "0 1 ICBC China \n", "1 2 China Construction Bank China \n", "2 8 Agricultural Bank of China China \n", "3 9 PetroChina China \n", "4 11 Bank of China China \n", "\n", " Sales\\n ($billion) Profits\\n ($billion) \\\n", "0 134.8 37.8 \n", "1 113.1 30.6 \n", "2 103.0 23.0 \n", "3 308.9 18.3 \n", "4 98.1 22.1 \n", "\n", " Assets\\n ($billion) Market Value\\n ($billion) \n", "0 2813.5 237.3 \n", "1 2241.0 202.0 \n", "2 2124.2 150.8 \n", "3 347.8 261.2 \n", "4 2033.8 131.7 " ] }, "execution_count": 255, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2013_economy = pd.read_csv('./data/data_forbes_2013_economywatch.csv', encoding='gbk')\n", "print('the shape of DataFrame: ', df_2013_economy.shape)\n", "print(df_2013_economy.dtypes)\n", "df_2013_economy.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 发现数据只有1984条记录,也缺少相关记录\n", "* 继续寻找其他记录" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Source 3: 2013年使用excel数据源" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 后来,在找到一个excel文件,发现数据记录是完整的,如下:" ] }, { "cell_type": "code", "execution_count": 256, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 7)\n", "排名 int64\n", "公司名 object\n", "国家(地区) object\n", "销售额(亿美元) int64\n", "利润(亿美元) int64\n", "资产(亿美元) int64\n", "市值(亿美元) int64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 排名 公司名 国家(地区) 销售额(亿美元) 利润(亿美元) 资产(亿美元) \\\n", "0 1 中国工商银行/ICBC 中国大陆 1348 378 28135 \n", "1 2 中国建设银行/China Construction Bank 中国大陆 1131 306 22410 \n", "2 3 摩根大通/JPMorgan Chase 美国 1082 213 23591 \n", "3 4 通用电气/General Electric 美国 1474 136 6853 \n", "4 5 埃克森美孚/Exxon Mobil 美国 4207 449 3338 \n", "\n", " 市值(亿美元) \n", "0 2373 \n", "1 2020 \n", "2 1914 \n", "3 2437 \n", "4 4004 " ] }, "execution_count": 256, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2013_all = pd.read_excel('./data/data_forbes_2013_all.xlsx')\n", "print('the shape of DataFrame: ', df_2013_all.shape)\n", "print(df_2013_all.dtypes)\n", "df_2013_all.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 数据整理" ] }, { "cell_type": "code", "execution_count": 257, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n", "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en \\\n", "0 2013 1 中国工商银行/ICBC ICBC \n", "1 2013 2 中国建设银行/China Construction Bank China Construction Bank \n", "2 2013 3 摩根大通/JPMorgan Chase JPMorgan Chase \n", "3 2013 4 通用电气/General Electric General Electric \n", "4 2013 5 埃克森美孚/Exxon Mobil Exxon Mobil \n", "\n", " Company_cn Country_cn_en Country_cn Country_en Industry_cn Industry_en \\\n", "0 中国工商银行 中国大陆 \n", "1 中国建设银行 中国大陆 \n", "2 摩根大通 美国 \n", "3 通用电气 美国 \n", "4 埃克森美孚 美国 \n", "\n", " Sales Profits Assets Market_value \n", "0 134.8 37.8 2813.5 237.3 \n", "1 113.1 30.6 2241.0 202.0 \n", "2 108.2 21.3 2359.1 191.4 \n", "3 147.4 13.6 685.3 243.7 \n", "4 420.7 44.9 333.8 400.4 " ] }, "execution_count": 257, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 更新列名\n", "df_2013_all.columns = ['Rank', 'Company_cn_en', \n", " 'Country_cn', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "# 拆分\"Company_cn_en\"列,新生成两列,分别为公司英文名称和中文名称\n", "df_2013_all['Company_cn'],df_2013_all['Company_en'] = df_2013_all['Company_cn_en'].str.split('/', 1).str\n", "# print(df_2013_all['Company_cn'][:5])\n", "# print(df_2013_all['Company_en'] [-5:])\n", "\n", "# 将数据单位转换成十亿美元\n", "df_2013_all[['Sales','Profits','Assets','Market_value']] =df_2013_all[['Sales','Profits','Assets','Market_value']].apply(lambda x: x/10)\n", "\n", "# 添加年份2013\n", "df_2013_all['Year'] = 2013\n", "\n", "# 添加空白列\n", "df_2013_all['Country_en'],df_2013_all['Country_cn_en'], df_2013_all['Industry_cn'], df_2013_all['Industry_en'] = ['','','','']\n", "\n", "df_2013_all['Rank'] = pd.to_numeric(df_2013_all['Rank'])\n", "\n", "# 按指定list重新将columns进行排序\n", "df_2013_all = df_2013_all.reindex(columns=columns_sort)\n", "print(df_2013_all.shape)\n", "print(df_2013_all.dtypes)\n", "df_2013_all.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2014" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "* 采集的原始数据有异常,用Pandas不能顺利加载,需要进行手动调整;\n", "* 2014年原始数据的单位是 亿美元" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# columns_sort = ['Year', 'Rank', 'Company_cn_en','Company_en','Company_cn', 'Country_cn_en', 'Country_cn', 'Country_en', 'Industry_cn', 'Industry_en','Sales', 'Profits', 'Assets', 'Market_value']" ] }, { "cell_type": "code", "execution_count": 258, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 9)\n", "0 int64\n", "1 int64\n", "2 float64\n", "3 object\n", "4 object\n", "5 int64\n", "6 int64\n", "7 int64\n", "8 int64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 \\\n", "0 2014 1 NaN 中国工商银行/ICBC 中国大陆 1487 427 31249 \n", "1 2014 2 NaN 中国建设银行/China Construction Bank 中国大陆 1213 342 24495 \n", "2 2014 3 NaN 中国农业银行/Agricultural Bank of China 中国大陆 1364 270 24054 \n", "3 2014 4 NaN 摩根大通/JPMorgan Chase 美国 1057 173 24353 \n", "4 2014 5 NaN 伯克希尔哈撒韦/Berkshire Hathaway 美国 1788 195 4934 \n", "\n", " 8 \n", "0 2156 \n", "1 1744 \n", "2 1411 \n", "3 2297 \n", "4 3091 " ] }, "execution_count": 258, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2014 = pd.read_csv('./data/data_forbes_2014.csv', encoding='gbk',header=None)\n", "print('the shape of DataFrame: ', df_2014.shape)\n", "print(df_2014.dtypes)\n", "df_2014.head()" ] }, { "cell_type": "code", "execution_count": 259, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cn_enCompany_enCompany_cnCountry_cn_enCountry_cnCountry_enIndustry_cnIndustry_enSalesProfitsAssetsMarket_value
020141中国工商银行/ICBCICBC中国工商银行中国大陆148.742.73124.9215.6
120142中国建设银行/China Construction BankChina Construction Bank中国建设银行中国大陆121.334.22449.5174.4
220143中国农业银行/Agricultural Bank of ChinaAgricultural Bank of China中国农业银行中国大陆136.427.02405.4141.1
320144摩根大通/JPMorgan ChaseJPMorgan Chase摩根大通美国105.717.32435.3229.7
420145伯克希尔哈撒韦/Berkshire HathawayBerkshire Hathaway伯克希尔哈撒韦美国178.819.5493.4309.1
\n", "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en \\\n", "0 2014 1 中国工商银行/ICBC ICBC \n", "1 2014 2 中国建设银行/China Construction Bank China Construction Bank \n", "2 2014 3 中国农业银行/Agricultural Bank of China Agricultural Bank of China \n", "3 2014 4 摩根大通/JPMorgan Chase JPMorgan Chase \n", "4 2014 5 伯克希尔哈撒韦/Berkshire Hathaway Berkshire Hathaway \n", "\n", " Company_cn Country_cn_en Country_cn Country_en Industry_cn Industry_en \\\n", "0 中国工商银行 中国大陆 \n", "1 中国建设银行 中国大陆 \n", "2 中国农业银行 中国大陆 \n", "3 摩根大通 美国 \n", "4 伯克希尔哈撒韦 美国 \n", "\n", " Sales Profits Assets Market_value \n", "0 148.7 42.7 3124.9 215.6 \n", "1 121.3 34.2 2449.5 174.4 \n", "2 136.4 27.0 2405.4 141.1 \n", "3 105.7 17.3 2435.3 229.7 \n", "4 178.8 19.5 493.4 309.1 " ] }, "execution_count": 259, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 更新列名\n", "df_2014.columns = ['Year', 'Rank', 'Company_cn','Company_cn_en', \n", " 'Country_cn', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "# 拆分\"Company_cn_en\"列,新生成两列,分别为公司英文名称和中文名称\n", "df_2014['Company_cn'],df_2014['Company_en'] = df_2014['Company_cn_en'].str.split('/', 1).str\n", "# print(df_2014['Company_cn'][:5])\n", "# print(df_2014['Company_en'] [-5:])\n", "\n", "# 将数据单位转换成十亿美元\n", "df_2014[['Sales','Profits','Assets','Market_value']] =df_2014[['Sales','Profits','Assets','Market_value']].apply(lambda x: x/10)\n", "\n", "# 添加空白列\n", "df_2014['Country_en'],df_2014['Country_cn_en'], df_2014['Industry_cn'], df_2014['Industry_en'] = ['','','','']\n", "\n", "# 按指定list重新将columns进行排序\n", "df_2014 = df_2014.reindex(columns=columns_sort)\n", "print(df_2014.shape)\n", "df_2014.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2015" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 2015年原始数据的单位是 亿美元\n", "* 2015年,有部分企业是重复的" ] }, { "cell_type": "code", "execution_count": 260, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of 
DataFrame: (2020, 9)\n", "0 int64\n", "1 int64\n", "2 float64\n", "3 object\n", "4 object\n", "5 float64\n", "6 float64\n", "7 float64\n", "8 float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012345678
020151NaN中国工商银行/ICBC中国大陆1668.0448.033220.02783.0
120152NaN中国建设银行/China Construction Bank中国大陆1305.0370.026989.02129.0
220153NaN中国农业银行/Agricultural Bank of China中国大陆1292.0291.025748.01899.0
320154NaN中国银行/Bank of China中国大陆1203.0275.024583.01991.0
420155NaN伯克希尔哈撒韦/Berkshire Hathaway美国1947.0199.05346.03548.0
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 \\\n", "0 2015 1 NaN 中国工商银行/ICBC 中国大陆 1668.0 448.0 \n", "1 2015 2 NaN 中国建设银行/China Construction Bank 中国大陆 1305.0 370.0 \n", "2 2015 3 NaN 中国农业银行/Agricultural Bank of China 中国大陆 1292.0 291.0 \n", "3 2015 4 NaN 中国银行/Bank of China 中国大陆 1203.0 275.0 \n", "4 2015 5 NaN 伯克希尔哈撒韦/Berkshire Hathaway 美国 1947.0 199.0 \n", "\n", " 7 8 \n", "0 33220.0 2783.0 \n", "1 26989.0 2129.0 \n", "2 25748.0 1899.0 \n", "3 24583.0 1991.0 \n", "4 5346.0 3548.0 " ] }, "execution_count": 260, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2015 = pd.read_csv('./data/data_forbes_2015.csv', encoding='gbk',header=None)\n", "print('the shape of DataFrame: ', df_2015.shape)\n", "print(df_2015.dtypes)\n", "df_2015.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 整理数据" ] }, { "cell_type": "code", "execution_count": 261, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2020, 14)\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cn_enCompany_enCompany_cnCountry_cn_enCountry_cnCountry_enIndustry_cnIndustry_enSalesProfitsAssetsMarket_value
020151中国工商银行/ICBCICBC中国工商银行中国大陆166.844.83322.0278.3
120152中国建设银行/China Construction BankChina Construction Bank中国建设银行中国大陆130.537.02698.9212.9
220153中国农业银行/Agricultural Bank of ChinaAgricultural Bank of China中国农业银行中国大陆129.229.12574.8189.9
320154中国银行/Bank of ChinaBank of China中国银行中国大陆120.327.52458.3199.1
420155伯克希尔哈撒韦/Berkshire HathawayBerkshire Hathaway伯克希尔哈撒韦美国194.719.9534.6354.8
\n", "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en \\\n", "0 2015 1 中国工商银行/ICBC ICBC \n", "1 2015 2 中国建设银行/China Construction Bank China Construction Bank \n", "2 2015 3 中国农业银行/Agricultural Bank of China Agricultural Bank of China \n", "3 2015 4 中国银行/Bank of China Bank of China \n", "4 2015 5 伯克希尔哈撒韦/Berkshire Hathaway Berkshire Hathaway \n", "\n", " Company_cn Country_cn_en Country_cn Country_en Industry_cn Industry_en \\\n", "0 中国工商银行 中国大陆 \n", "1 中国建设银行 中国大陆 \n", "2 中国农业银行 中国大陆 \n", "3 中国银行 中国大陆 \n", "4 伯克希尔哈撒韦 美国 \n", "\n", " Sales Profits Assets Market_value \n", "0 166.8 44.8 3322.0 278.3 \n", "1 130.5 37.0 2698.9 212.9 \n", "2 129.2 29.1 2574.8 189.9 \n", "3 120.3 27.5 2458.3 199.1 \n", "4 194.7 19.9 534.6 354.8 " ] }, "execution_count": 261, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 更新列名\n", "df_2015.columns = ['Year', 'Rank', 'Company_cn','Company_cn_en', \n", " 'Country_cn', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "# 拆分\"Company_cn_en\"列,新生成两列,分别为公司英文名称和中文名称\n", "df_2015['Company_cn'],df_2015['Company_en'] = df_2015['Company_cn_en'].str.split('/', 1).str\n", "# print(df_2014['Company_cn'][:5])\n", "# print(df_2014['Company_en'] [-5:])\n", "\n", "# 将数据单位转换成十亿美元\n", "df_2015[['Sales','Profits','Assets','Market_value']] =df_2015[['Sales','Profits','Assets','Market_value']].apply(lambda x: x/10)\n", "\n", "# 添加空白列\n", "df_2015['Country_en'],df_2015['Country_cn_en'], df_2015['Industry_cn'], df_2015['Industry_en'] = ['','','','']\n", "\n", "\n", "# 按指定list重新将columns进行排序\n", "df_2015 = df_2015.reindex(columns=columns_sort)\n", "print(df_2015.shape)\n", "df_2015.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 数据有2020行,有重复行,需要去除重复行" ] }, { "cell_type": "code", "execution_count": 262, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "# 数据有2020行,有重复行,需要去除重复行\n", "# inplace=True,使去重生效\n", "df_2015.drop_duplicates('Company_cn_en', inplace=True)" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "* 查看'Company_cn_en'是否还有重复行" ] }, { "cell_type": "code", "execution_count": 263, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cn_enCompany_enCompany_cnCountry_cn_enCountry_cnCountry_enIndustry_cnIndustry_enSalesProfitsAssetsMarket_value
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Year, Rank, Company_cn_en, Company_en, Company_cn, Country_cn_en, Country_cn, Country_en, Industry_cn, Industry_en, Sales, Profits, Assets, Market_value]\n", "Index: []" ] }, "execution_count": 263, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 查看'Company_cn_en'是否还有重复行\n", "df_2015[df_2015['Company_cn_en'].duplicated()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 查看数据行数" ] }, { "cell_type": "code", "execution_count": 264, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 14)\n" ] } ], "source": [ "print('the shape of DataFrame: ', df_2015.shape)\n", "# print(df_2015.dtypes)\n", "# df_2015.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2016" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 2016年原始数据的单位,各个单元格不一样" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 通过查看原始数据,发现有些行的数据可能不准确\n", "* 对比网站的数据,进行了手动修改\n", "* http://www.askci.com/news/finance/20160530/13364822511_7.shtml\n", "* 修改的情况如下:" ] }, { "cell_type": "code", "execution_count": 265, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2001, 9)\n", "Year int64\n", "Rank int64\n", "Company_cn float64\n", "Company_en object\n", "Country_en object\n", "Sales object\n", "Profits object\n", "Assets object\n", "Market_value object\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cnCompany_enCountry_enSalesProfitsAssetsMarket_value
020161NaNICBCChina$171.1 B$44.2 B$3,420.3 B$198 B
120162NaNChina Construction BankChina$146.8 B$36.4 B$2,826 B$162.8 B
220163NaNAgricultural Bank of ChinaChina$131.9 B$28.8 B$2,739.8 B$152.7 B
320164NaNBerkshire HathawayUnited States$210.8 B$24.1 B$561.1 B$360.1 B
420165NaNJPMorgan ChaseUnited States$99.9 B$23.5 B$2,423.8 B$234.2 B
\n", "
" ], "text/plain": [ " Year Rank Company_cn Company_en Country_en \\\n", "0 2016 1 NaN ICBC China \n", "1 2016 2 NaN China Construction Bank China \n", "2 2016 3 NaN Agricultural Bank of China China \n", "3 2016 4 NaN Berkshire Hathaway United States \n", "4 2016 5 NaN JPMorgan Chase United States \n", "\n", " Sales Profits Assets Market_value \n", "0 $171.1 B $44.2 B $3,420.3 B $198 B \n", "1 $146.8 B $36.4 B $2,826 B $162.8 B \n", "2 $131.9 B $28.8 B $2,739.8 B $152.7 B \n", "3 $210.8 B $24.1 B $561.1 B $360.1 B \n", "4 $99.9 B $23.5 B $2,423.8 B $234.2 B " ] }, "execution_count": 265, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2016 = pd.read_csv('./data/data_forbes_2016.csv', encoding='gbk',header=None)\n", "print('the shape of DataFrame: ', df_2016.shape)\n", "\n", "# 更新列名\n", "df_2016.columns = ['Year', 'Rank', 'Company_cn','Company_en', \n", " 'Country_en', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "print(df_2016.dtypes)\n", "df_2016.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "* 编写自定义处理函数" ] }, { "cell_type": "code", "execution_count": 266, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def pro_col(df, col): \n", " # 替换相关字符串,如有更多的替换情形,可以自行添加\n", " df[col] = df[col].str.replace('$','')\n", " df[col] = df[col].str.replace('^[A-Za-z]+$','')\n", " df[col] = df[col].str.replace('B','')\n", " \n", " # 注意这里是'-$',即以'-'结尾,而不是'-',因为有负数\n", " df[col] = df[col].str.replace('-$','') \n", " df[col] = df[col].str.replace(',','')\n", " \n", " # 处理含有百万“M”为单位的数据,即以“M”结尾的数据\n", " # 思路:\n", " # (1)设定查找条件mask;\n", " # (2)替换字符串“M”为空值\n", " # (3)用pd.to_numeric()转换为数字\n", " # (4)除以1000,转换为十亿美元,与其他行的数据一致\n", " mask = df[col].str.endswith('M')\n", " df.loc[mask, col] = pd.to_numeric(df.loc[mask, col].str.replace('M',''))/1000\n", " \n", " # 将字符型的数字转换为数字类型\n", " df[col] = pd.to_numeric(df[col])\n", " return df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 应用自定义函数进行数据处理" 
] }, { "cell_type": "code", "execution_count": 267, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2001, 9)\n", "Year int64\n", "Rank int64\n", "Company_cn float64\n", "Company_en object\n", "Country_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cnCompany_enCountry_enSalesProfitsAssetsMarket_value
020161NaNICBCChina171.144.23420.3198.0
120162NaNChina Construction BankChina146.836.42826.0162.8
220163NaNAgricultural Bank of ChinaChina131.928.82739.8152.7
320164NaNBerkshire HathawayUnited States210.824.1561.1360.1
420165NaNJPMorgan ChaseUnited States99.923.52423.8234.2
\n", "
" ], "text/plain": [ " Year Rank Company_cn Company_en Country_en Sales \\\n", "0 2016 1 NaN ICBC China 171.1 \n", "1 2016 2 NaN China Construction Bank China 146.8 \n", "2 2016 3 NaN Agricultural Bank of China China 131.9 \n", "3 2016 4 NaN Berkshire Hathaway United States 210.8 \n", "4 2016 5 NaN JPMorgan Chase United States 99.9 \n", "\n", " Profits Assets Market_value \n", "0 44.2 3420.3 198.0 \n", "1 36.4 2826.0 162.8 \n", "2 28.8 2739.8 152.7 \n", "3 24.1 561.1 360.1 \n", "4 23.5 2423.8 234.2 " ] }, "execution_count": 267, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pro_col(df_2016, 'Sales')\n", "# pro_col(df_2016, 'Profits')\n", "# pro_col(df_2016, 'Assets')\n", "# pro_col(df_2016, 'Market_value')\n", "\n", "cols = ['Sales', 'Profits', 'Assets', 'Market_value']\n", "for col in cols:\n", " pro_col(df_2016, col)\n", "\n", "\n", "print('the shape of DataFrame: ', df_2016.shape)\n", "print(df_2016.dtypes)\n", "df_2016.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 添加空白列" ] }, { "cell_type": "code", "execution_count": 268, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2001, 14)\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cn_enCompany_enCompany_cnCountry_cn_enCountry_cnCountry_enIndustry_cnIndustry_enSalesProfitsAssetsMarket_value
020161ICBCNaNChina171.144.23420.3198.0
120162China Construction BankNaNChina146.836.42826.0162.8
220163Agricultural Bank of ChinaNaNChina131.928.82739.8152.7
320164Berkshire HathawayNaNUnited States210.824.1561.1360.1
420165JPMorgan ChaseNaNUnited States99.923.52423.8234.2
\n", "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn \\\n", "0 2016 1 ICBC NaN \n", "1 2016 2 China Construction Bank NaN \n", "2 2016 3 Agricultural Bank of China NaN \n", "3 2016 4 Berkshire Hathaway NaN \n", "4 2016 5 JPMorgan Chase NaN \n", "\n", " Country_cn_en Country_cn Country_en Industry_cn Industry_en Sales \\\n", "0 China 171.1 \n", "1 China 146.8 \n", "2 China 131.9 \n", "3 United States 210.8 \n", "4 United States 99.9 \n", "\n", " Profits Assets Market_value \n", "0 44.2 3420.3 198.0 \n", "1 36.4 2826.0 162.8 \n", "2 28.8 2739.8 152.7 \n", "3 24.1 561.1 360.1 \n", "4 23.5 2423.8 234.2 " ] }, "execution_count": 268, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 添加空白列\n", "df_2016['Company_cn_en'],df_2016['Country_cn_en'], df_2016['Country_cn'], df_2016['Industry_cn'], df_2016['Industry_en'] = ['','','','','']\n", "\n", "\n", "# 按指定list重新将columns进行排序\n", "df_2016 = df_2016.reindex(columns=columns_sort)\n", "print(df_2016.shape)\n", "df_2016.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Year 2017" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 2017年原始数据的单位,各个单元格不一样" ] }, { "cell_type": "code", "execution_count": 269, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 9)\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cnCompany_enCountry_enSalesProfitsAssetsMarket_value
020171中国工商银行ICBCChina$151.4 B$42 B$3,473.2 B$229.8 B
120172中国建设银行China Construction BankChina$134.2 B$35 B$3,016.6 B$200.5 B
220173伯克希尔哈撒韦Berkshire HathawayUnited States$222.9 B$24.1 B$620.9 B$409.9 B
320174摩根大通JPMorgan ChaseUnited States$102.5 B$24.2 B$2,513 B$306.6 B
420175富国银行Wells FargoUnited States$97.6 B$21.9 B$1,943.4 B$274.4 B
\n", "
" ], "text/plain": [ " Year Rank Company_cn Company_en Country_en Sales \\\n", "0 2017 1 中国工商银行 ICBC China $151.4 B \n", "1 2017 2 中国建设银行 China Construction Bank China $134.2 B \n", "2 2017 3 伯克希尔哈撒韦 Berkshire Hathaway United States $222.9 B \n", "3 2017 4 摩根大通 JPMorgan Chase United States $102.5 B \n", "4 2017 5 富国银行 Wells Fargo United States $97.6 B \n", "\n", " Profits Assets Market_value \n", "0 $42 B $3,473.2 B $229.8 B \n", "1 $35 B $3,016.6 B $200.5 B \n", "2 $24.1 B $620.9 B $409.9 B \n", "3 $24.2 B $2,513 B $306.6 B \n", "4 $21.9 B $1,943.4 B $274.4 B " ] }, "execution_count": 269, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2017 = pd.read_csv('./data/data_forbes_2017.csv', encoding='gbk')\n", "\n", "# 更新列名\n", "df_2017.columns = ['Year', 'Rank', 'Company_cn','Company_en', \n", " 'Country_en', 'Sales', 'Profits', 'Assets', 'Market_value']\n", "\n", "print('the shape of DataFrame: ', df_2017.shape)\n", "df_2017.head()" ] }, { "cell_type": "code", "execution_count": 270, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (2000, 9)\n", "Year int64\n", "Rank int64\n", "Company_cn object\n", "Company_en object\n", "Country_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cnCompany_enCountry_enSalesProfitsAssetsMarket_value
020171中国工商银行ICBCChina151.442.03473.2229.8
120172中国建设银行China Construction BankChina134.235.03016.6200.5
220173伯克希尔哈撒韦Berkshire HathawayUnited States222.924.1620.9409.9
320174摩根大通JPMorgan ChaseUnited States102.524.22513.0306.6
420175富国银行Wells FargoUnited States97.621.91943.4274.4
\n", "
" ], "text/plain": [ " Year Rank Company_cn Company_en Country_en Sales \\\n", "0 2017 1 中国工商银行 ICBC China 151.4 \n", "1 2017 2 中国建设银行 China Construction Bank China 134.2 \n", "2 2017 3 伯克希尔哈撒韦 Berkshire Hathaway United States 222.9 \n", "3 2017 4 摩根大通 JPMorgan Chase United States 102.5 \n", "4 2017 5 富国银行 Wells Fargo United States 97.6 \n", "\n", " Profits Assets Market_value \n", "0 42.0 3473.2 229.8 \n", "1 35.0 3016.6 200.5 \n", "2 24.1 620.9 409.9 \n", "3 24.2 2513.0 306.6 \n", "4 21.9 1943.4 274.4 " ] }, "execution_count": 270, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols = ['Sales', 'Profits', 'Assets', 'Market_value']\n", "for col in cols:\n", " pro_col(df_2017, col)\n", "\n", "\n", "print('the shape of DataFrame: ', df_2017.shape)\n", "print(df_2017.dtypes)\n", "df_2017.head()" ] }, { "cell_type": "code", "execution_count": 271, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2000, 14)\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cn_enCompany_enCompany_cnCountry_cn_enCountry_cnCountry_enIndustry_cnIndustry_enSalesProfitsAssetsMarket_value
020171ICBC中国工商银行China151.442.03473.2229.8
120172China Construction Bank中国建设银行China134.235.03016.6200.5
220173Berkshire Hathaway伯克希尔哈撒韦United States222.924.1620.9409.9
320174JPMorgan Chase摩根大通United States102.524.22513.0306.6
420175Wells Fargo富国银行United States97.621.91943.4274.4
\n", "
" ], "text/plain": [ " Year Rank Company_cn_en Company_en Company_cn Country_cn_en \\\n", "0 2017 1 ICBC 中国工商银行 \n", "1 2017 2 China Construction Bank 中国建设银行 \n", "2 2017 3 Berkshire Hathaway 伯克希尔哈撒韦 \n", "3 2017 4 JPMorgan Chase 摩根大通 \n", "4 2017 5 Wells Fargo 富国银行 \n", "\n", " Country_cn Country_en Industry_cn Industry_en Sales Profits Assets \\\n", "0 China 151.4 42.0 3473.2 \n", "1 China 134.2 35.0 3016.6 \n", "2 United States 222.9 24.1 620.9 \n", "3 United States 102.5 24.2 2513.0 \n", "4 United States 97.6 21.9 1943.4 \n", "\n", " Market_value \n", "0 229.8 \n", "1 200.5 \n", "2 409.9 \n", "3 306.6 \n", "4 274.4 " ] }, "execution_count": 271, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 添加空白列\n", "df_2017['Company_cn_en'],df_2017['Country_cn_en'], df_2017['Country_cn'], df_2017['Industry_cn'], df_2017['Industry_en'] = ['','','','','']\n", "\n", "\n", "# 按指定list重新将columns进行排序\n", "# columns_sort = ['Year', 'Rank', 'Company_cn_en','Company_en',\n", "# 'Company_cn', 'Country_cn_en', 'Country_cn', \n", "# 'Country_en', 'Industry_cn', 'Industry_en',\n", "# 'Sales', 'Profits', 'Assets', 'Market_value']\n", "df_2017 = df_2017.reindex(columns=columns_sort)\n", "print(df_2017.shape)\n", "df_2017.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# df_2017.to_csv('data_forbes_2017_update.csv')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# 数据合并" ] }, { "cell_type": "code", "execution_count": 272, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YearRankCompany_cn_enCompany_enCompany_cnCountry_cn_enCountry_cnCountry_enIndustry_cnIndustry_enSalesProfitsAssetsMarket_value
020071Citigroup /花旗集团Citigroup花旗集团美国(US)美国US银行146.56021.5401884.32247.420
120072Bank of America /美国银行Bank of America美国银行美国(US)美国US银行116.57021.1301459.74226.610
220073HSBC Holdings/汇丰集团HSBC Holdings汇丰集团英国(UK)英国UK银行121.51016.6301860.76202.290
320074General Electric /通用电气General Electric通用电气美国(US)美国US多元化163.39020.830697.24358.980
420075JPMorgan Chase /JP摩根大通JPMorgan ChaseJP摩根大通美国(US)美国US银行99.30014.4401351.52170.970
520076American Intl Group /美国国际集团American Intl Group美国国际集团美国(US)美国US保险113.19014.010979.41174.470
620077ExxonMobil /埃克森美孚ExxonMobil埃克森美孚美国(US)美国US炼油335.09039.500223.95410.650
720078Royal Dutch Shell /皇家荷兰壳牌集团Royal Dutch Shell皇家荷兰壳牌集团荷兰(NL)荷兰NL炼油318.85025.440232.31208.250
820079UBS /瑞士银行UBS瑞士银行瑞士(SZ)瑞士SZ综合金融105.5909.7801776.89116.840
9200710ING Group /荷兰国际集团ING Group荷兰国际集团荷兰(NL)荷兰NL保险153.4409.6501615.0593.990
10200711BP /英国石油BP英国石油英国(UK)英国UK炼油265.91022.290217.60198.140
11200712Toyota Motor /丰田汽车Toyota Motor丰田汽车日本(JA)日本JA耐用消费品179.02011.680243.60217.690
12200713Royal Bank of Scotland /苏格兰皇家银行Royal Bank of Scotland苏格兰皇家银行英国(UK)英国UK银行77.41012.5101705.35124.130
13200714BNP Paribas /法国巴黎银行BNP Paribas法国巴黎银行法国(FR)法国FR银行89.1609.6401898.1997.030
14200715Allianz /安联Allianz安联德国(GE)德国GE保险125.3308.8101380.8887.220
15200716Berkshire Hathaway /伯克夏·哈萨威Berkshire Hathaway伯克夏·哈萨威美国(US)美国US综合金融98.54011.020248.44163.790
16200717Wal-Mart Stores /沃尔玛Wal-Mart Stores沃尔玛美国(US)美国US零售348.65011.290151.19201.360
17200718Barclays /巴克莱Barclays巴克莱英国(UK)英国UK银行67.7108.9501949.1794.790
18200719Chevron /雪佛龙Chevron雪佛龙美国(US)美国US炼油195.34017.140132.63149.370
19200719Total /道达尔菲纳埃尔夫Total道达尔菲纳埃尔夫法国(FR)法国FR炼油175.05015.530138.82152.620
20200721HBOS /苏格兰哈里法克斯银行HBOS苏格兰哈里法克斯银行英国(UK)英国UK银行84.2807.5901156.6179.830
21200722ConocoPhillips /大陆菲利普斯ConocoPhillips大陆菲利普斯美国(US)美国US炼油167.58015.550164.78107.390
22200723AXA Group /安盛保险集团AXA Group安盛保险集团法国(FR)法国FR保险98.8506.380666.4787.640
23200724Société Générale Group /法国兴业银行Société Générale Group法国兴业银行法国(FR)法国FR银行84.4706.5501259.3277.620
24200725Goldman Sachs Group /高盛集团Goldman Sachs Group高盛集团美国(US)美国US综合金融69.3509.540838.2083.310
25200725Morgan Stanley /摩根士丹利Morgan Stanley摩根士丹利美国(US)美国US综合金融76.5507.4701120.6579.760
26200727Banco Santander /桑坦德银行Banco Santander桑坦德银行西班牙(SP)西班牙SP银行62.3407.370945.86115.750
27200727Deutsche Bank /德意志银行Deutsche Bank德意志银行德国(GE)德国GE综合金融95.5007.4501485.5865.150
28200729AT&T /美国电话电报公司AT&T美国电话电报公司美国(US)美国US电信运营商63.0607.360270.63229.780
29200730Electricité de France /法国电力公司Electricité de France法国电力公司法国(FR)法国FR公用事业77.7507.390233.40133.370
.............................................
2197120171971Tian An China InvestmentsNaNHong Kong0.2440.7364.301.200
2197220171972HollyFrontierNaNUnited States10.600-0.2619.604.700
2197320171973Dah Sing Financial HoldingsNaNHong Kong0.8680.24428.802.600
2197420171973Nanya TechnologyNaNTaiwan1.3000.7364.304.300
2197520171973Shanxi Taigang StainlessNaNChina8.500-0.32211.004.100
2197620171976First Horizon NationalNaNUnited States1.4000.22728.704.300
2197720171977Brother IndustriesNaNJapan6.0000.4085.805.200
2197820171977Chicago Bridge & IronNaNNetherlands10.700-0.3137.803.100
2197920171979Belle International HoldingsNaNHong Kong6.3000.3844.705.400
2198020171979F5 NetworksNaNUnited States2.0000.3702.408.900
2198120171979ValsparNaNUnited States4.2000.3414.208.800
2198220171982DKSH HoldingNaNSwitzerland10.7000.2124.305.100
2198320171982Grupo GaliciaNaNArgentina3.7000.40715.305.100
2198420171984Bank of IwateNaNJapan0.4270.09728.300.721
2198520171984Konica Minolta Business SolutionsNaNJapan8.9000.2808.804.300
2198620171986Synnex Technology IntlNaNTaiwan10.6000.1514.001.800
2198720171987EurazeoNaNFrance3.4000.57510.904.500
2198820171988BankunitedNaNUnited States1.2000.21728.103.900
2198920171988Barry CallebautNaNSwitzerland6.8000.2215.807.300
2199020171988InchcapeNaNUnited Kingdom10.6000.2495.404.400
2199120171991Yamanashi Chuo BankNaNJapan0.5220.06228.000.739
2199220171992Guangxi Guiguan Electric PowerNaNChina2.0000.6436.006.000
2199320171992Live Nation EntertainmentNaNUnited States8.400-0.0476.806.400
2199420171992Shaanxi Coal IndustryNaNChina3.800-0.02413.609.000
2199520171995AurubisNaNGermany10.6000.2494.503.100
2199620171996BEKB-BCBENaNSwitzerland0.5550.13127.901.700
2199720171996Fastighets BalderNaNSweden0.6300.63910.203.800
2199820171998Akamai TechnologiesNaNUnited States2.3000.3164.4010.100
2199920171998Oita BankNaNJapan0.5230.07127.900.595
2200020171998Tech MahindraNaNIndia4.2000.4693.606.700
\n", "

22001 rows × 14 columns

\n", "
" ], "text/plain": [ " Year Rank Company_cn_en \\\n", "0 2007 1 Citigroup /花旗集团 \n", "1 2007 2 Bank of America /美国银行 \n", "2 2007 3 HSBC Holdings/汇丰集团 \n", "3 2007 4 General Electric /通用电气 \n", "4 2007 5 JPMorgan Chase /JP摩根大通 \n", "5 2007 6 American Intl Group /美国国际集团 \n", "6 2007 7 ExxonMobil /埃克森美孚 \n", "7 2007 8 Royal Dutch Shell /皇家荷兰壳牌集团 \n", "8 2007 9 UBS /瑞士银行 \n", "9 2007 10 ING Group /荷兰国际集团 \n", "10 2007 11 BP /英国石油 \n", "11 2007 12 Toyota Motor /丰田汽车 \n", "12 2007 13 Royal Bank of Scotland /苏格兰皇家银行 \n", "13 2007 14 BNP Paribas /法国巴黎银行 \n", "14 2007 15 Allianz /安联 \n", "15 2007 16 Berkshire Hathaway /伯克夏·哈萨威 \n", "16 2007 17 Wal-Mart Stores /沃尔玛 \n", "17 2007 18 Barclays /巴克莱 \n", "18 2007 19 Chevron /雪佛龙 \n", "19 2007 19 Total /道达尔菲纳埃尔夫 \n", "20 2007 21 HBOS /苏格兰哈里法克斯银行 \n", "21 2007 22 ConocoPhillips /大陆菲利普斯 \n", "22 2007 23 AXA Group /安盛保险集团 \n", "23 2007 24 Société Générale Group /法国兴业银行 \n", "24 2007 25 Goldman Sachs Group /高盛集团 \n", "25 2007 25 Morgan Stanley /摩根士丹利 \n", "26 2007 27 Banco Santander /桑坦德银行 \n", "27 2007 27 Deutsche Bank /德意志银行 \n", "28 2007 29 AT&T /美国电话电报公司 \n", "29 2007 30 Electricité de France /法国电力公司 \n", "... ... ... ... 
\n", "21971 2017 1971 \n", "21972 2017 1972 \n", "21973 2017 1973 \n", "21974 2017 1973 \n", "21975 2017 1973 \n", "21976 2017 1976 \n", "21977 2017 1977 \n", "21978 2017 1977 \n", "21979 2017 1979 \n", "21980 2017 1979 \n", "21981 2017 1979 \n", "21982 2017 1982 \n", "21983 2017 1982 \n", "21984 2017 1984 \n", "21985 2017 1984 \n", "21986 2017 1986 \n", "21987 2017 1987 \n", "21988 2017 1988 \n", "21989 2017 1988 \n", "21990 2017 1988 \n", "21991 2017 1991 \n", "21992 2017 1992 \n", "21993 2017 1992 \n", "21994 2017 1992 \n", "21995 2017 1995 \n", "21996 2017 1996 \n", "21997 2017 1996 \n", "21998 2017 1998 \n", "21999 2017 1998 \n", "22000 2017 1998 \n", "\n", " Company_en Company_cn Country_cn_en Country_cn \\\n", "0 Citigroup 花旗集团 美国(US) 美国 \n", "1 Bank of America 美国银行 美国(US) 美国 \n", "2 HSBC Holdings 汇丰集团 英国(UK) 英国 \n", "3 General Electric 通用电气 美国(US) 美国 \n", "4 JPMorgan Chase JP摩根大通 美国(US) 美国 \n", "5 American Intl Group 美国国际集团 美国(US) 美国 \n", "6 ExxonMobil 埃克森美孚 美国(US) 美国 \n", "7 Royal Dutch Shell 皇家荷兰壳牌集团 荷兰(NL) 荷兰 \n", "8 UBS 瑞士银行 瑞士(SZ) 瑞士 \n", "9 ING Group 荷兰国际集团 荷兰(NL) 荷兰 \n", "10 BP 英国石油 英国(UK) 英国 \n", "11 Toyota Motor 丰田汽车 日本(JA) 日本 \n", "12 Royal Bank of Scotland 苏格兰皇家银行 英国(UK) 英国 \n", "13 BNP Paribas 法国巴黎银行 法国(FR) 法国 \n", "14 Allianz 安联 德国(GE) 德国 \n", "15 Berkshire Hathaway 伯克夏·哈萨威 美国(US) 美国 \n", "16 Wal-Mart Stores 沃尔玛 美国(US) 美国 \n", "17 Barclays 巴克莱 英国(UK) 英国 \n", "18 Chevron 雪佛龙 美国(US) 美国 \n", "19 Total 道达尔菲纳埃尔夫 法国(FR) 法国 \n", "20 HBOS 苏格兰哈里法克斯银行 英国(UK) 英国 \n", "21 ConocoPhillips 大陆菲利普斯 美国(US) 美国 \n", "22 AXA Group 安盛保险集团 法国(FR) 法国 \n", "23 Société Générale Group 法国兴业银行 法国(FR) 法国 \n", "24 Goldman Sachs Group 高盛集团 美国(US) 美国 \n", "25 Morgan Stanley 摩根士丹利 美国(US) 美国 \n", "26 Banco Santander 桑坦德银行 西班牙(SP) 西班牙 \n", "27 Deutsche Bank 德意志银行 德国(GE) 德国 \n", "28 AT&T 美国电话电报公司 美国(US) 美国 \n", "29 Electricité de France 法国电力公司 法国(FR) 法国 \n", "... ... ... ... ... 
\n", "21971 Tian An China Investments NaN \n", "21972 HollyFrontier NaN \n", "21973 Dah Sing Financial Holdings NaN \n", "21974 Nanya Technology NaN \n", "21975 Shanxi Taigang Stainless NaN \n", "21976 First Horizon National NaN \n", "21977 Brother Industries NaN \n", "21978 Chicago Bridge & Iron NaN \n", "21979 Belle International Holdings NaN \n", "21980 F5 Networks NaN \n", "21981 Valspar NaN \n", "21982 DKSH Holding NaN \n", "21983 Grupo Galicia NaN \n", "21984 Bank of Iwate NaN \n", "21985 Konica Minolta Business Solutions NaN \n", "21986 Synnex Technology Intl NaN \n", "21987 Eurazeo NaN \n", "21988 Bankunited NaN \n", "21989 Barry Callebaut NaN \n", "21990 Inchcape NaN \n", "21991 Yamanashi Chuo Bank NaN \n", "21992 Guangxi Guiguan Electric Power NaN \n", "21993 Live Nation Entertainment NaN \n", "21994 Shaanxi Coal Industry NaN \n", "21995 Aurubis NaN \n", "21996 BEKB-BCBE NaN \n", "21997 Fastighets Balder NaN \n", "21998 Akamai Technologies NaN \n", "21999 Oita Bank NaN \n", "22000 Tech Mahindra NaN \n", "\n", " Country_en Industry_cn Industry_en Sales Profits Assets \\\n", "0 US 银行 146.560 21.540 1884.32 \n", "1 US 银行 116.570 21.130 1459.74 \n", "2 UK 银行 121.510 16.630 1860.76 \n", "3 US 多元化 163.390 20.830 697.24 \n", "4 US 银行 99.300 14.440 1351.52 \n", "5 US 保险 113.190 14.010 979.41 \n", "6 US 炼油 335.090 39.500 223.95 \n", "7 NL 炼油 318.850 25.440 232.31 \n", "8 SZ 综合金融 105.590 9.780 1776.89 \n", "9 NL 保险 153.440 9.650 1615.05 \n", "10 UK 炼油 265.910 22.290 217.60 \n", "11 JA 耐用消费品 179.020 11.680 243.60 \n", "12 UK 银行 77.410 12.510 1705.35 \n", "13 FR 银行 89.160 9.640 1898.19 \n", "14 GE 保险 125.330 8.810 1380.88 \n", "15 US 综合金融 98.540 11.020 248.44 \n", "16 US 零售 348.650 11.290 151.19 \n", "17 UK 银行 67.710 8.950 1949.17 \n", "18 US 炼油 195.340 17.140 132.63 \n", "19 FR 炼油 175.050 15.530 138.82 \n", "20 UK 银行 84.280 7.590 1156.61 \n", "21 US 炼油 167.580 15.550 164.78 \n", "22 FR 保险 98.850 6.380 666.47 \n", "23 FR 银行 84.470 6.550 1259.32 \n", "24 US 综合金融 
69.350 9.540 838.20 \n", "25 US 综合金融 76.550 7.470 1120.65 \n", "26 SP 银行 62.340 7.370 945.86 \n", "27 GE 综合金融 95.500 7.450 1485.58 \n", "28 US 电信运营商 63.060 7.360 270.63 \n", "29 FR 公用事业 77.750 7.390 233.40 \n", "... ... ... ... ... ... ... \n", "21971 Hong Kong 0.244 0.736 4.30 \n", "21972 United States 10.600 -0.261 9.60 \n", "21973 Hong Kong 0.868 0.244 28.80 \n", "21974 Taiwan 1.300 0.736 4.30 \n", "21975 China 8.500 -0.322 11.00 \n", "21976 United States 1.400 0.227 28.70 \n", "21977 Japan 6.000 0.408 5.80 \n", "21978 Netherlands 10.700 -0.313 7.80 \n", "21979 Hong Kong 6.300 0.384 4.70 \n", "21980 United States 2.000 0.370 2.40 \n", "21981 United States 4.200 0.341 4.20 \n", "21982 Switzerland 10.700 0.212 4.30 \n", "21983 Argentina 3.700 0.407 15.30 \n", "21984 Japan 0.427 0.097 28.30 \n", "21985 Japan 8.900 0.280 8.80 \n", "21986 Taiwan 10.600 0.151 4.00 \n", "21987 France 3.400 0.575 10.90 \n", "21988 United States 1.200 0.217 28.10 \n", "21989 Switzerland 6.800 0.221 5.80 \n", "21990 United Kingdom 10.600 0.249 5.40 \n", "21991 Japan 0.522 0.062 28.00 \n", "21992 China 2.000 0.643 6.00 \n", "21993 United States 8.400 -0.047 6.80 \n", "21994 China 3.800 -0.024 13.60 \n", "21995 Germany 10.600 0.249 4.50 \n", "21996 Switzerland 0.555 0.131 27.90 \n", "21997 Sweden 0.630 0.639 10.20 \n", "21998 United States 2.300 0.316 4.40 \n", "21999 Japan 0.523 0.071 27.90 \n", "22000 India 4.200 0.469 3.60 \n", "\n", " Market_value \n", "0 247.420 \n", "1 226.610 \n", "2 202.290 \n", "3 358.980 \n", "4 170.970 \n", "5 174.470 \n", "6 410.650 \n", "7 208.250 \n", "8 116.840 \n", "9 93.990 \n", "10 198.140 \n", "11 217.690 \n", "12 124.130 \n", "13 97.030 \n", "14 87.220 \n", "15 163.790 \n", "16 201.360 \n", "17 94.790 \n", "18 149.370 \n", "19 152.620 \n", "20 79.830 \n", "21 107.390 \n", "22 87.640 \n", "23 77.620 \n", "24 83.310 \n", "25 79.760 \n", "26 115.750 \n", "27 65.150 \n", "28 229.780 \n", "29 133.370 \n", "... ... 
\n", "21971 1.200 \n", "21972 4.700 \n", "21973 2.600 \n", "21974 4.300 \n", "21975 4.100 \n", "21976 4.300 \n", "21977 5.200 \n", "21978 3.100 \n", "21979 5.400 \n", "21980 8.900 \n", "21981 8.800 \n", "21982 5.100 \n", "21983 5.100 \n", "21984 0.721 \n", "21985 4.300 \n", "21986 1.800 \n", "21987 4.500 \n", "21988 3.900 \n", "21989 7.300 \n", "21990 4.400 \n", "21991 0.739 \n", "21992 6.000 \n", "21993 6.400 \n", "21994 9.000 \n", "21995 3.100 \n", "21996 1.700 \n", "21997 3.800 \n", "21998 10.100 \n", "21999 0.595 \n", "22000 6.700 \n", "\n", "[22001 rows x 14 columns]" ] }, "execution_count": 272, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_concat = pd.concat([df_2007, df_2008, df_2009, \n", " df_2010, df_2011, df_2012, \n", " df_2013_all, df_2014, \n", " df_2015, df_2016, df_2017], ignore_index=True)\n", "df_concat" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# df_concat.to_csv('data_forbes_concat.csv')" ] }, { "cell_type": "code", "execution_count": 273, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the shape of DataFrame: (22001, 14)\n", "Year int64\n", "Rank int64\n", "Company_cn_en object\n", "Company_en object\n", "Company_cn object\n", "Country_cn_en object\n", "Country_cn object\n", "Country_en object\n", "Industry_cn object\n", "Industry_en object\n", "Sales float64\n", "Profits float64\n", "Assets float64\n", "Market_value float64\n", "dtype: object\n" ] } ], "source": [ "print('the shape of DataFrame: ', df_concat.shape)\n", "print(df_concat.dtypes)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "scrolled": true }, "source": [ "* If the \"Country_en\" column is empty (an empty string), fill it with the value from the \"Country_cn\" column" ] }, { "cell_type": "code", "execution_count": 274, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_concat.loc[df_concat['Country_en']=='', 'Country_en'] = 
df_concat.loc[df_concat['Country_en']=='', 'Country_cn']" ] }, { "cell_type": "code", "execution_count": 275, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "# df_concat[df_concat['Year']==2011]" ] }, { "cell_type": "code", "execution_count": 276, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# df_concat.to_csv('data_forbes_concat-1.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Normalize the country names to English; this is done mainly with the replace() method" ] }, { "cell_type": "code", "execution_count": 277, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "list_origin = [']','AR', 'AS','AU', 'US', 'BE', 'BR', 'BS', 'BU', 'CA', 'CH', 'CI',\n", " 'CN-HK', 'CN-TA', 'CO', 'CZ', 'DE', 'EG', 'FI', 'FR', 'GE',\n", " 'GR', 'Hong Kong', 'Hong Kong/China', 'HU', 'IC', 'ID', 'IN', 'IR',\n", " 'IS','IT', 'JA', 'JO', 'KO','LI', 'LU', 'MX', 'NL', 'NO', 'NZ',\n", " 'PA', 'PE', 'PH', 'PK', 'PL', 'PO', 'RU', 'SI', 'SP', 'SU',\n", " 'SW', 'SZ', 'Taiwan', 'TH', 'TU', 'UK', 'VE', '阿根廷', '阿联酋',\n", " '阿曼', '埃及', '爱尔兰', '奥地利', '澳大利亚', '巴基斯坦', '巴林',\n", " '巴拿马', '巴西', '百慕大', '比利时', '波多黎各', '波兰', '丹麦',\n", " '德国', '多哥', '俄罗斯', '法国', '菲律宾', '芬兰', '哥伦比亚',\n", " '哈萨克斯坦', '海峡群岛', '韩国', '荷兰', '加拿大', '捷克共和国',\n", " '卡塔尔', '开曼群岛', '科威特', '克罗地亚', '黎巴嫩', '利比里亚',\n", " '列支敦士登', '列支敦斯登', '卢森堡', '马来西亚', '毛里求斯',\n", " '美国', '秘鲁', '摩洛哥', '墨西哥', '南非', '尼日利亚', '挪威',\n", " '葡萄牙', '日本', '瑞典', '瑞士', '塞浦路斯', '沙特$',\n", " '沙特阿拉伯', '斯洛伐克', '泰国', '土耳其', '委内瑞拉', '西班牙',\n", " '希腊', '新加坡', '新西兰', '匈牙利', '以色列', '意大利',\n", " '印度$', '印度尼西亚', '印尼', '英国', '约旦', '越南', '智利',\n", " '中国大陆', '中国台湾', '中国香港', 'CN$', '中国$', 'Netherlands']\n", "\n", "list_replace = ['','Argentina', 'Austria','Australia', 'United States', 'Belgium',\n", " 'Brazil', 'Bahamas', 'Bermuda', 'Canada', 'Chile', 'Cayman Islands',\n", " 'China-HongKong', 'China-Taiwan', 'Colombia', 'Czech Republic',\n", " 'Denmark', 'Egypt', 'Finland', 'France', 'Germany', 'Greece', 'China-HongKong',\n", " 
'China-HongKong', 'Hungary', 'Iceland', 'Indonesia', 'India', 'Ireland', 'Israel',\n", " 'Italy', 'Japan', 'Jordan', 'South Korea','Liberia', 'Luxembourg', 'Mexico',\n", " 'Netherland', 'Norway', 'New Zealand', 'Panama', 'Peru', 'Philippines',\n", " 'Pakistan', 'Poland', 'Portugal', 'Russia', 'Singapore', 'Spain',\n", " 'Saudi Arabia', 'Sweden', 'Switzerland', 'China-Taiwan',\n", " 'Thailand', 'Turkey', 'United Kingdom', 'Venezuela', 'Argentina',\n", " 'United Arab Emirates', 'Oman', 'Egypt', 'Ireland', 'Austria', 'Australia',\n", " 'Pakistan', 'Bahrain', 'Panama', 'Brazil', 'Bermuda', 'Belgium',\n", " 'Puerto Rico', 'Poland', 'Denmark', 'Germany', 'Togo', 'Russia',\n", " 'France', 'Philippines', 'Finland', 'Colombia', 'Kazakhstan',\n", " 'Channel Islands', 'South Korea', 'Netherland', 'Canada',\n", " 'Czech Republic', 'Qatar', 'Cayman Islands', 'Kuwait', 'Croatia', 'Lebanon',\n", " 'Liberia', 'Liechtenstein', 'Liechtenstein', 'Luxembourg', 'Malaysia',\n", " 'Mauritius', 'United States', 'Peru', 'Morocco', 'Mexico', 'South Africa',\n", " 'Nigeria', 'Norway', 'Portugal', 'Japan', 'Sweden', 'Switzerland', 'Cyprus',\n", " 'Saudi Arabia', 'Saudi Arabia', 'Slovakia', 'Thailand', 'Turkey', 'Venezuela',\n", " 'Spain', 'Greece', 'Singapore', 'New Zealand', 'Hungary', 'Israel', 'Italy',\n", " 'India', 'Indonesia', 'Indonesia', 'United Kingdom', 'Jordan', 'Vietnam',\n", " 'Chile', 'China', 'China-Taiwan', 'China-HongKong', 'China', 'China',\n", " 'Netherland']\n", "\n", "df_concat['Country_en'] = df_concat['Country_en'].replace(list_origin, list_replace, regex=True)\n", "\n", "# df_concat['Country_en'] = df_concat['Country_en'].replace(['CN','中国'],['China', 'China'], regex=True)\n", "\n", "# df_concat.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# df_concat.to_csv('data_forbes_concat-2.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Further replacements, including companies listed under more than one country" ] }, { "cell_type": 
"code", "execution_count": 278, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Further replacements, including companies listed under more than one country\n", "# Note: regex=True is deliberately omitted here, so only exact whole-value matches are replaced\n", "df_concat['Country_en'] = df_concat['Country_en'].replace(['China-China-Taiwan',r'China-HongKong/China', r'Australia)/United Kingdom(United Kingdom',\n", " r'Netherland)/United Kingdom(United Kingdom', r'Panama)/United Kingdom(United Kingdom',\n", " r'SA)/United Kingdom(United Kingdom', r'United Kingdom)/Australia(SA', r'United Kingdom)/Netherland(Netherland',\n", " r'United Kingdom)/South Africa(SA' , 'Netherland/United Kingdom', 'Panama/United Kingdom',\n", " 'Australia/United Kingdom'],\n", " ['China-Taiwan', 'China-HongKong', 'United Kingdom/Australia', 'United Kingdom/Netherland',\n", " 'United Kingdom/Panama', 'United Kingdom/Australia', 'United Kingdom/Australia',\n", " 'United Kingdom/Netherland', 'United Kingdom/South Africa', 'United Kingdom/Netherland',\n", " 'United Kingdom/Panama', 'United Kingdom/Australia'])\n", "\n", "\n", "# df_concat.head()" ] }, { "cell_type": "code", "execution_count": 279, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# df_concat.to_csv('data_forbes_concat-3.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* There are two special cases where the same English abbreviation can stand for different countries\n", "* These need further handling\n", "* SA can stand for Australia or South Africa\n", "* MA can stand for Malaysia or Morocco\n", "* As shown in the figure below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The approach for handling these special cases is:\n", "\n", "(a) replace these abbreviations with an empty string,\n", "\n", "(b) replace each empty string with the Chinese country name from the same row,\n", "\n", "(c) replace the Chinese country names with their English equivalents" ] }, { "cell_type": "code", "execution_count": 280, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "# Note: each pattern must end with \"$\" so that it is anchored to the end of the string\n", "df_concat['Country_en'] = df_concat['Country_en'].replace(['MA$','SA$'],['', ''], regex=True)\n", "\n", "# df_concat[df_concat['Country_en']=='']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* If the \"Country_en\" column is empty (an empty string), fill it with the value from the \"Country_cn\" column" ] }, { "cell_type": "code", 
"execution_count": 281, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_concat.loc[df_concat['Country_en']=='', 'Country_en'] = df_concat.loc[df_concat['Country_en']=='', 'Country_cn']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# df_concat.to_csv('data_forbes_concat-4.csv')" ] }, { "cell_type": "code", "execution_count": 282, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_concat['Country_en'] = df_concat['Country_en'].replace(['澳大利亚','马来西亚', '摩洛哥', '南非'],\n", " ['Australia', 'Malaysia', 'Morocco', 'South Africa'])\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_concat.to_csv('data_forbes_concat.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda root]", "language": "python", "name": "conda-root-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "toc": { "colors": { "hover_highlight": "#DAA520", "navigate_num": "#000000", "navigate_text": "#333333", "running_highlight": "#FF0000", "selected_highlight": "#FFD700", "sidebar_border": "#EEEEEE", "wrapper_background": "#FFFFFF" }, "moveMenuLeft": true, "nav_menu": { "height": "12px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": true, "toc_position": { "height": "668px", "left": "0px", "right": "1154px", "top": "106px", "width": "212px" }, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 1 }