{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Project: Investigate The Movie Database (TMDb) \n",
    "\n",
    "--by Lu Tang\n",
    "\n",
    "## Table of Contents\n",
    "<ul>\n",
    "<li><a href=\"#intro\">Introduction</a></li>\n",
    "<li><a href=\"#wrangling\">Data Wrangling</a></li>\n",
    "<li><a href=\"#eda\">Exploratory Data Analysis</a></li>\n",
    "<li><a href=\"#conclusions\">Conclusions</a></li>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='intro'></a>\n",
    "## Introduction\n",
    "\n",
    "> **Dataset**: This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue. Data can be download from[here](https://d17h27t6h515a5.cloudfront.net/topher/2017/October/59dd1c4c_tmdb-movies/tmdb-movies.csv).\n",
    "> The final two columns ending with “_adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.\n",
    "\n",
    "> **The project aims to explore the following questions:**\n",
    "> - Question 1: What are the trend for movie industry? Are movie industry making more money over years\n",
    "> - Question 2: Are newer movies more popular?\n",
    "> - Question 3: What are the top 5 most common movie generes that associated with high revenue?\n",
    "> - Question 4. Is it possible to make extremely high profit movies with low budget?\n",
    "> - Question 5: What are the top 10 rated movies? and how is their profitibility?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import library that will be used in this project \n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='wrangling'></a>\n",
    "## Data Wrangling\n",
    "\n",
    "### General Properties"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(10866, 21)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>imdb_id</th>\n",
       "      <th>popularity</th>\n",
       "      <th>budget</th>\n",
       "      <th>revenue</th>\n",
       "      <th>original_title</th>\n",
       "      <th>cast</th>\n",
       "      <th>homepage</th>\n",
       "      <th>director</th>\n",
       "      <th>tagline</th>\n",
       "      <th>keywords</th>\n",
       "      <th>overview</th>\n",
       "      <th>runtime</th>\n",
       "      <th>genres</th>\n",
       "      <th>production_companies</th>\n",
       "      <th>release_date</th>\n",
       "      <th>vote_count</th>\n",
       "      <th>vote_average</th>\n",
       "      <th>release_year</th>\n",
       "      <th>budget_adj</th>\n",
       "      <th>revenue_adj</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>135397</td>\n",
       "      <td>tt0369610</td>\n",
       "      <td>32.985763</td>\n",
       "      <td>150000000</td>\n",
       "      <td>1513528810</td>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...</td>\n",
       "      <td>http://www.jurassicworld.com/</td>\n",
       "      <td>Colin Trevorrow</td>\n",
       "      <td>The park is open.</td>\n",
       "      <td>monster|dna|tyrannosaurus rex|velociraptor|island</td>\n",
       "      <td>Twenty-two years after the events of Jurassic ...</td>\n",
       "      <td>124</td>\n",
       "      <td>Action|Adventure|Science Fiction|Thriller</td>\n",
       "      <td>Universal Studios|Amblin Entertainment|Legenda...</td>\n",
       "      <td>6/9/15</td>\n",
       "      <td>5562</td>\n",
       "      <td>6.5</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.379999e+08</td>\n",
       "      <td>1.392446e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>76341</td>\n",
       "      <td>tt1392190</td>\n",
       "      <td>28.419936</td>\n",
       "      <td>150000000</td>\n",
       "      <td>378436354</td>\n",
       "      <td>Mad Max: Fury Road</td>\n",
       "      <td>Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...</td>\n",
       "      <td>http://www.madmaxmovie.com/</td>\n",
       "      <td>George Miller</td>\n",
       "      <td>What a Lovely Day.</td>\n",
       "      <td>future|chase|post-apocalyptic|dystopia|australia</td>\n",
       "      <td>An apocalyptic story set in the furthest reach...</td>\n",
       "      <td>120</td>\n",
       "      <td>Action|Adventure|Science Fiction|Thriller</td>\n",
       "      <td>Village Roadshow Pictures|Kennedy Miller Produ...</td>\n",
       "      <td>5/13/15</td>\n",
       "      <td>6185</td>\n",
       "      <td>7.1</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.379999e+08</td>\n",
       "      <td>3.481613e+08</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       id    imdb_id  popularity     budget     revenue      original_title  \\\n",
       "0  135397  tt0369610   32.985763  150000000  1513528810      Jurassic World   \n",
       "1   76341  tt1392190   28.419936  150000000   378436354  Mad Max: Fury Road   \n",
       "\n",
       "                                                cast  \\\n",
       "0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...   \n",
       "1  Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...   \n",
       "\n",
       "                        homepage         director             tagline  \\\n",
       "0  http://www.jurassicworld.com/  Colin Trevorrow   The park is open.   \n",
       "1    http://www.madmaxmovie.com/    George Miller  What a Lovely Day.   \n",
       "\n",
       "                                            keywords  \\\n",
       "0  monster|dna|tyrannosaurus rex|velociraptor|island   \n",
       "1   future|chase|post-apocalyptic|dystopia|australia   \n",
       "\n",
       "                                            overview  runtime  \\\n",
       "0  Twenty-two years after the events of Jurassic ...      124   \n",
       "1  An apocalyptic story set in the furthest reach...      120   \n",
       "\n",
       "                                      genres  \\\n",
       "0  Action|Adventure|Science Fiction|Thriller   \n",
       "1  Action|Adventure|Science Fiction|Thriller   \n",
       "\n",
       "                                production_companies release_date  vote_count  \\\n",
       "0  Universal Studios|Amblin Entertainment|Legenda...       6/9/15        5562   \n",
       "1  Village Roadshow Pictures|Kennedy Miller Produ...      5/13/15        6185   \n",
       "\n",
       "   vote_average  release_year    budget_adj   revenue_adj  \n",
       "0           6.5          2015  1.379999e+08  1.392446e+09  \n",
       "1           7.1          2015  1.379999e+08  3.481613e+08  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# loading data\n",
    "tmdb=pd.read_csv('tmdb-movies.csv')\n",
    "# show number of rows and columns\n",
    "print(tmdb.shape)\n",
    "# to avoid truncated output \n",
    "pd.options.display.max_columns = 150 \n",
    "# show first 2 rows\n",
    "tmdb.head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Initial observation**: \n",
    ">- Our focus will be analyzing movie properties associated with high revenue, some columns are irrelavant for our analysis, e.g `id`,`imdb_id`, `homepage`, `tagline`, `keywords, overview, production_companies, release_date` (since we already have `release_year`).\n",
    ">- we can also remove `budget` and `revenue`, since we have `budget_adj` and `revenue_adj` to analyze."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>popularity</th>\n",
       "      <th>original_title</th>\n",
       "      <th>cast</th>\n",
       "      <th>director</th>\n",
       "      <th>runtime</th>\n",
       "      <th>genres</th>\n",
       "      <th>vote_count</th>\n",
       "      <th>vote_average</th>\n",
       "      <th>release_year</th>\n",
       "      <th>budget_adj</th>\n",
       "      <th>revenue_adj</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>32.985763</td>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...</td>\n",
       "      <td>Colin Trevorrow</td>\n",
       "      <td>124</td>\n",
       "      <td>Action|Adventure|Science Fiction|Thriller</td>\n",
       "      <td>5562</td>\n",
       "      <td>6.5</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.379999e+08</td>\n",
       "      <td>1.392446e+09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   popularity  original_title  \\\n",
       "0   32.985763  Jurassic World   \n",
       "\n",
       "                                                cast         director  \\\n",
       "0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  Colin Trevorrow   \n",
       "\n",
       "   runtime                                     genres  vote_count  \\\n",
       "0      124  Action|Adventure|Science Fiction|Thriller        5562   \n",
       "\n",
       "   vote_average  release_year    budget_adj   revenue_adj  \n",
       "0           6.5          2015  1.379999e+08  1.392446e+09  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop extraneous columns  \n",
    "drop_col=['id','imdb_id','homepage','tagline','keywords','overview','production_companies','release_date','budget','revenue']\n",
    "tmdb = tmdb.drop(drop_col, axis=1)\n",
    "# check the result\n",
    "tmdb.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 10866 entries, 0 to 10865\n",
      "Data columns (total 11 columns):\n",
      "popularity        10866 non-null float64\n",
      "original_title    10866 non-null object\n",
      "cast              10790 non-null object\n",
      "director          10822 non-null object\n",
      "runtime           10866 non-null int64\n",
      "genres            10843 non-null object\n",
      "vote_count        10866 non-null int64\n",
      "vote_average      10866 non-null float64\n",
      "release_year      10866 non-null int64\n",
      "budget_adj        10866 non-null float64\n",
      "revenue_adj       10866 non-null float64\n",
      "dtypes: float64(4), int64(3), object(4)\n",
      "memory usage: 933.9+ KB\n"
     ]
    }
   ],
   "source": [
    "# check data type and missing values\n",
    "tmdb.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>popularity</th>\n",
       "      <th>original_title</th>\n",
       "      <th>cast</th>\n",
       "      <th>director</th>\n",
       "      <th>runtime</th>\n",
       "      <th>genres</th>\n",
       "      <th>vote_count</th>\n",
       "      <th>vote_average</th>\n",
       "      <th>release_year</th>\n",
       "      <th>budget_adj</th>\n",
       "      <th>revenue_adj</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>10866.000000</td>\n",
       "      <td>10866</td>\n",
       "      <td>10790</td>\n",
       "      <td>10822</td>\n",
       "      <td>10866.000000</td>\n",
       "      <td>10843</td>\n",
       "      <td>10866.000000</td>\n",
       "      <td>10866.000000</td>\n",
       "      <td>10866.000000</td>\n",
       "      <td>1.086600e+04</td>\n",
       "      <td>1.086600e+04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>NaN</td>\n",
       "      <td>10571</td>\n",
       "      <td>10719</td>\n",
       "      <td>5067</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2039</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Hamlet</td>\n",
       "      <td>Louis C.K.</td>\n",
       "      <td>Woody Allen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Drama</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>712</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>0.646441</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>102.070863</td>\n",
       "      <td>NaN</td>\n",
       "      <td>217.389748</td>\n",
       "      <td>5.974922</td>\n",
       "      <td>2001.322658</td>\n",
       "      <td>1.755104e+07</td>\n",
       "      <td>5.136436e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>1.000185</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>31.381405</td>\n",
       "      <td>NaN</td>\n",
       "      <td>575.619058</td>\n",
       "      <td>0.935142</td>\n",
       "      <td>12.812941</td>\n",
       "      <td>3.430616e+07</td>\n",
       "      <td>1.446325e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000065</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10.000000</td>\n",
       "      <td>1.500000</td>\n",
       "      <td>1960.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.207583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>90.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>5.400000</td>\n",
       "      <td>1995.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>0.383856</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>99.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>38.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>2006.000000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>0.713817</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>111.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>145.750000</td>\n",
       "      <td>6.600000</td>\n",
       "      <td>2011.000000</td>\n",
       "      <td>2.085325e+07</td>\n",
       "      <td>3.369710e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>32.985763</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>900.000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9767.000000</td>\n",
       "      <td>9.200000</td>\n",
       "      <td>2015.000000</td>\n",
       "      <td>4.250000e+08</td>\n",
       "      <td>2.827124e+09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          popularity original_title        cast     director       runtime  \\\n",
       "count   10866.000000          10866       10790        10822  10866.000000   \n",
       "unique           NaN          10571       10719         5067           NaN   \n",
       "top              NaN         Hamlet  Louis C.K.  Woody Allen           NaN   \n",
       "freq             NaN              4           6           45           NaN   \n",
       "mean        0.646441            NaN         NaN          NaN    102.070863   \n",
       "std         1.000185            NaN         NaN          NaN     31.381405   \n",
       "min         0.000065            NaN         NaN          NaN      0.000000   \n",
       "25%         0.207583            NaN         NaN          NaN     90.000000   \n",
       "50%         0.383856            NaN         NaN          NaN     99.000000   \n",
       "75%         0.713817            NaN         NaN          NaN    111.000000   \n",
       "max        32.985763            NaN         NaN          NaN    900.000000   \n",
       "\n",
       "       genres    vote_count  vote_average  release_year    budget_adj  \\\n",
       "count   10843  10866.000000  10866.000000  10866.000000  1.086600e+04   \n",
       "unique   2039           NaN           NaN           NaN           NaN   \n",
       "top     Drama           NaN           NaN           NaN           NaN   \n",
       "freq      712           NaN           NaN           NaN           NaN   \n",
       "mean      NaN    217.389748      5.974922   2001.322658  1.755104e+07   \n",
       "std       NaN    575.619058      0.935142     12.812941  3.430616e+07   \n",
       "min       NaN     10.000000      1.500000   1960.000000  0.000000e+00   \n",
       "25%       NaN     17.000000      5.400000   1995.000000  0.000000e+00   \n",
       "50%       NaN     38.000000      6.000000   2006.000000  0.000000e+00   \n",
       "75%       NaN    145.750000      6.600000   2011.000000  2.085325e+07   \n",
       "max       NaN   9767.000000      9.200000   2015.000000  4.250000e+08   \n",
       "\n",
       "         revenue_adj  \n",
       "count   1.086600e+04  \n",
       "unique           NaN  \n",
       "top              NaN  \n",
       "freq             NaN  \n",
       "mean    5.136436e+07  \n",
       "std     1.446325e+08  \n",
       "min     0.000000e+00  \n",
       "25%     0.000000e+00  \n",
       "50%     0.000000e+00  \n",
       "75%     3.369710e+07  \n",
       "max     2.827124e+09  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check statistical information \n",
    "tmdb.describe(include='all')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "> **Insights**:\n",
    ">- Some columns contain NaN values, but the amount is not significant; we don't need to drop all the nulls at the beginning.\n",
    ">- Data type are all correct.\n",
    ">- The minimum `runtime` is 0, which is impossible, and some movies have extremely long runtime, we will investigate the outlier data\n",
    ">- `budget_adj` and `revenue_adj` have minimum and median value as 0 too, which is odd, and the difference from 75% to maximum is huge, we need to investigate in the later anaysis process\n",
    ">-  `popularity ` , `vote_count` has very uneven distribution, with some extreme high value data.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Cleaning "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This dataseat is generally clean, column names are also clear and with preferred snakecase. For some string columns that contains '|', we will clean and analyze in the later part specific to the question we want to answer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1. Remove duplicated data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check how many rows are duplicated\n",
    "sum(tmdb.duplicated())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop duplicated rows\n",
    "tmdb.drop_duplicates(inplace=True)\n",
    "\n",
    "# douch check the results\n",
    "sum(tmdb.duplicated())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2. Cleaning abnormal data for runtime**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "31"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find out how many rows are 0 for runtime\n",
    "sum(tmdb[\"runtime\"]==0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Since it is impossible to have runtime as 0, we will remove these.\n",
    "tmdb=tmdb[tmdb[\"runtime\"]>0]\n",
    "\n",
    "#double check the result\n",
    "sum(tmdb[\"runtime\"]==0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3. Cleaning abnormal data for budget**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5668"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sum(tmdb[\"budget_adj\"]==0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# It is impossible to make a movie without any budget, we will remove these data\n",
    "tmdb=tmdb[tmdb[\"budget_adj\"]>0]\n",
    "\n",
    "# Double check the result\n",
    "sum(tmdb[\"budget_adj\"]==0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**4. Cleaning abnormal data for revenue**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1312"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sum(tmdb[\"revenue_adj\"]==0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Since we will analyze moview revenue, movies with zero revenue may contain incorrect information, we will remove these data\n",
    "tmdb=tmdb[tmdb[\"revenue_adj\"]>0]\n",
    "\n",
    "# Double check the result\n",
    "sum(tmdb[\"revenue_adj\"]==0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(3854, 11)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>popularity</th>\n",
       "      <th>original_title</th>\n",
       "      <th>cast</th>\n",
       "      <th>director</th>\n",
       "      <th>runtime</th>\n",
       "      <th>genres</th>\n",
       "      <th>vote_count</th>\n",
       "      <th>vote_average</th>\n",
       "      <th>release_year</th>\n",
       "      <th>budget_adj</th>\n",
       "      <th>revenue_adj</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>32.985763</td>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...</td>\n",
       "      <td>Colin Trevorrow</td>\n",
       "      <td>124</td>\n",
       "      <td>Action|Adventure|Science Fiction|Thriller</td>\n",
       "      <td>5562</td>\n",
       "      <td>6.5</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.379999e+08</td>\n",
       "      <td>1.392446e+09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   popularity  original_title  \\\n",
       "0   32.985763  Jurassic World   \n",
       "\n",
       "                                                cast         director  \\\n",
       "0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...  Colin Trevorrow   \n",
       "\n",
       "   runtime                                     genres  vote_count  \\\n",
       "0      124  Action|Adventure|Science Fiction|Thriller        5562   \n",
       "\n",
       "   vote_average  release_year    budget_adj   revenue_adj  \n",
       "0           6.5          2015  1.379999e+08  1.392446e+09  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Double check the cleaning result\n",
    "print(tmdb.shape)\n",
    "tmdb.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='eda'></a>\n",
    "## Exploratory Data Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Find pattern and visualize relationship"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1_1. Explore relations with `revenue_adj`**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5, 1.0, 'Correlation heatmap for whole movie data')"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 576x360 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# plot a heatmap to see correlation with `revenue_adj` for each columns\n",
    "plt.figure(figsize=(8,5))\n",
    "sns.heatmap(tmdb.corr(),annot=True,cmap='coolwarm')\n",
    "plt.xticks(rotation=45)\n",
    "plt.title('Correlation heatmap for whole movie data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis:**\n",
    ">-  `revenue_adj` is positive-correlated with `popularity`, `vote_count` and `budget_adj`, which makes sense, the more popular, the more vote_count and and more revenues. And high budget movies are expected with high revenue too.\n",
    ">- `popularity` and `vote_count` are strongly correlated eith each other.\n",
    ">- `runtime`, `vote_average` and `release_year` do not have strong relation with any other columns. In fact `release_year` is slighly negative-correlated with `revenue_adj`. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1_2. Plotting charts to find out the distribution for the variables that do not have strong correlation with movie revenue, i.e. `runtime, vote_average, release_year`.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# plotting distribution for 'runtime'\n",
    "tmdb['runtime'].plot.hist(title='Runtime distribution for the whole movies data')\n",
    "plt.xlabel('Movie Runtime')\n",
    "plt.show()\n",
    "\n",
    "# plotting distribution for 'vote_average'\n",
    "tmdb['vote_average'].plot.hist(title='Vote_average distribution for the whole movies data')\n",
    "plt.xlabel('Movie Vote_average')\n",
    "plt.show()\n",
    "\n",
    "# plotting distribution for 'release_year'\n",
    "tmdb['release_year'].plot.hist(title=('Movie Release_year Distribution'))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Conclusion:**\n",
    ">- Most movies have median length from about 100 minutes to 180 minutes.\n",
    ">- `vote_average` has a distribution that looks like normal with average around 6.\n",
    ">- There are more movies produced over time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1_3. Plotting scatter chart to explore detailed relationship between `popularity` and `vote_count`, and find out outliers.** "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1a15a02400>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# plotting relation for 'popularity' and 'vote_count'\n",
    "tmdb.plot.scatter(x='popularity',y='vote_count')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis:**\n",
    ">- From the scatter chart, we can see `popularity` and `vote_count` have strong positive correlation, same result as the heatmap; however, we can also notice there are three movies rated extremely high popularity, but vote count is not extremely high.\n",
    ">- From the heatmap, both `popularity` and `vote_count` are positively correlated with moview revenue. If we run regression model to predict movie revenues, we need to choose of one of them as an independent variable, but this is beyond the scope of this project."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1_4. Plotting scatter charts to furthur explore the relation with `revenue_adj` for the varibles of `popularity`, `vote_count` and `budget_adj`.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1a15f83240>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAESCAYAAAD5d3KwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztvXmcXGWV8P8991Z1dzaSkIQlGwHCMklMovQQMIgsjoMQwowsyqKv81PRV1BmBML4KiLmNyqb76BBEREVRRSDSgjoiBKEIAYaTGISEJo1nSCENoSs1V1V5/2j6lbqVt3auuvW0nW+n09/uurWrXtP3Vt1zvOc5yyiqhiGYRiGh1NvAQzDMIzGwgyDYRiG4cMMg2EYhuHDDINhGIbhwwyDYRiG4cMMg2EYhuGjaQ2DiNwmIq+LyLoy9j1IRH4vImtF5CERmVwLGQ3DMJqRpjUMwA+AU8rc93rgdlWdDXwZ+GpYQhmGYTQ7TWsYVPVh4O/Z20TkUBH5jYg8KSKPiMiR6ZdmAL9PP14BnFFDUQ3DMJqKpjUMBbgF+LSqHgVcBnwrvX0NcGb68b8Co0RkXB3kMwzDaHgi9RagWojISOCdwM9FxNvcnv5/GbBERD4CPAxsAuK1ltEwDKMZGDKGgdTs501VnZv7gqpuBt4PGQNypqpuq7F8hmEYTcGQcSWp6lvAiyJyNoCkmJN+PF5EvM/6OeC2OolpGIbR8DStYRCRO4HHgCNEpEdEPgqcD3xURNYA69m7yHwC8FcReRbYH/ivOohsGIbRFIiV3TYMwzCyadoZg2EYhhEOTbn4PH78eJ02bVq9xTAMw2gqnnzyyTdUdUKp/ZrSMEybNo2urq56i2EYhtFUiMjL5exnriTDMAzDhxkGwzAMw4cZBsMwDMOHGQbDMAzDhxkGwzAMw4cZBsNoYHp3xFiz8U16d8TqLYrRQjRluKphtAL3rN7EFXevJeo49CeTXHvmbBbOnVRvsYwWwGYMhtGA9O6IccXda9nTn2R7LM6e/iSL7l5rMwejJphhCBlzBRgDoWfrbqKO/+cZdRx6tu6uk0RGK2GupBAxV4AxUCaPHUZ/Munb1p9MMnnssDpJZLQSNmMICXMFGINh3Mh2rj1zNh1Rh1HtETqiDteeOZtxI9tLv9kwBonNGELCcwXsYe+oz3MF2I/bKIeFcycxf/p4erbuZvLYYfa9MWqGGYaQMFeAUQ3GjWw3g2DUHHMlhYS5AgzDaFZsxhAi5gowDKMZMcMQMuYKMAyj2TBXkmEYhuHDDINhGIbhwwyDYRiG4cMMg2EYhuHDDINhGIbhwwyDYRiG4cMMg2EYhuHDDINhGIbhwwyDYRiG4cMMg2EYhuHDDINhGIbhwwyDYRiG4SNUwyAiU0RkhYg8LSLrReSSgH1OEJFtIrI6/ffFMGUyDMN6kRvFCbu6ahy4VFWfEpFRwJMi8oCqbsjZ7xFVXRCyLIZhYL3IjdKEOmNQ1VdV9an04+3A04B9Aw2jTlgvcqMcarbGICLTgLcDqwJePlZE1ojIr0VkZoH3XygiXSLStWXLlhAlNYyhi9eLPBuvF7lheNTEMIjISOBu4N9V9a2cl58CDlLVOcA3gV8FHUNVb1HVTlXtnDBhQrgCG8YQxXqRG+UQumEQkSgpo3CHqv4i93VVfUtVd6Qf3w9ERWR82HIZRitivciNcgh18VlEBPge8LSqfr3APgcAr6mqisjRpIxVb5hyGUYrY73IjVKEHZU0H/gQ8BcRWZ3e9n+AqQCqejNwFvC/RSQO7AY+qKoaslyG0dJYL3KjGKEaBlVdCUiJfZYAS8KUwzAMwygfy3w2DMMwfJhhMAzDMHyYYTAMwzB8mGEwDMMwfJhhMAzDMHyYYTAMwzB8mGEwDMMwfJhhMAzDMHyYYTAMwzB8mGEwDMMwfJhhMAzDMHyYYTAMwzB8mGEwDMMwfJhhMAzDMHyY
YQiR3h0x1mx80xqtG4bRVITdqKdluWf1Jq64ey1Rx6E/meTaM2ezcO6keotlGIZREpsxhEDvjhhX3L2WPf1Jtsfi7OlPsujutTZzMAyjKTDDEAI9W3cTdfyXNuo49GzdXSeJDMMwyscMQwhMHjuM/mTSt60/mWTy2GF1ksgwDKN8zDCEwLiR7Vx75mw6og6j2iN0RB2uPXO2NV83DKMpsMXnkFg4dxLzp4+nZ+tuJo8dZkbBMIymwQxDiIwb2W4GwTCMpsNcSYZhGIYPMwyGYRiGDzMMhmEYhg8zDIZhGIaPUA2DiEwRkRUi8rSIrBeRSwL2ERH5hoh0i8haEXlHmDIZhmEYxQk7KikOXKqqT4nIKOBJEXlAVTdk7fM+4LD03zzg2+n/hmEYRh0Idcagqq+q6lPpx9uBp4HcSnJnALdrij8BY0TkwDDlMgzDMApTszUGEZkGvB1YlfPSJGBj1vMe8o0HInKhiHSJSNeWLVvCEtMwDKPlqYlhEJGRwN3Av6vqW7kvB7xF8zao3qKqnaraOWHChDDENAzDMKiBYRCRKCmjcIeq/iJglx5gStbzycDmsOUyDMMwggk7KkmA7wFPq+rXC+y2DPhwOjrpGGCbqr4aplyGYRhGYcKOSpoPfAj4i4isTm/7P8BUAFW9GbgfOBXoBnYB/xayTFWld0fMCuU1KXbvDCOYUA2Dqq4keA0hex8FLgpTjrCw9p3Ni907wyiMZT4PEGvf2bzYvTOM4phhGCDWvrN5sXtnGMUxwzBArH1n82L3zjCKY4ZhgFj7zubF7p1hFEdSa7/NRWdnp3Z1ddVbDMAiW5oZu3dGqyEiT6pqZ6n9rLXnILH2nc2L3TvDCMZcSRXSuyPGmo1vWgSLYRhDFpsxVIDFvhuG0QqUNAwiskhVrxWRb5Jf3E6BvwM/VtXnwxCwUciOfd9DKqJl0d1rmT99vLkjDMMYUpQzY3g6/b/Qau844BfAnKpI1KB4se+eUYC9se9mGAzDGEqUNAyqem/6/w8L7SMiO6spVCNise+GYbQK5biS7iWgP4KHqi5U1e9UVaoGxIt9X5SzxmCzBcMwhhrluJKuT/9/P3AA8OP083OBl0KQqWFZOHcS86ePt9h3wzCGNOW4kv4AICKLVfX4rJfuFZGHQ5OsQbHYd8MwhjqV5DFMEJFDvCcicjBgPTYNwzCGGJXkMfwH8JCIvJB+Pg34RNUlMgzDMOpK2YZBVX8jIocBR6Y3PaOqlv5rGIYxxKg08/kw4AigA5gjIqjq7dUXyzAMw6gXZRsGEbkKOAGYQapP8/uAlYAZBsMwjCFEJYvPZwEnA39T1X8jlels4TmGYRhDjEoMw25VTQJxEdkHeB04pMR7WgarumoYxlChkjWGLhEZA3wXeBLYATweilRNRO+OGHeseoWbVjxHm+ta1VXDMJqeSqKSPpV+eLOI/AbYR1XXeq+LyExVXV9tARuZe1ZvYtHStcTiqRpKsXgcsKqrhmE0NwNq1KOqL2UbhTQ/qoI8TYNXhtszCtl4VVcNwzCakWp2cJMqHqvh8cpwB2FVVw3DaGaqaRgKVmAdigSV4QZoj4hVXTUMo6kJteeziNwmIq+LyLoCr58gIttEZHX674thylNNvDLcHVGHUe0R2iMOl/7T4fzxP0+2hWfDMJqaavZ87gvY9gNgCcWT4B5R1QVVlKNmWBluwzCGIpVkPgtwPnCIqn5ZRKYCB6jq4wCqekzue1T1YRGZViVZGxIrw20YxlCjElfSt4BjSTXoAdgO3FQFGY4VkTUi8msRmVloJxG5UES6RKRry5YtVTitYRiGEUQlhmGeql4E7AFQ1a1A2yDP/xRwkKrOAb4J/KrQjqp6i6p2qmrnhAnWBsJoHCzr3RhqVLLG0C8iLunoIxGZAOSH5VSAqr6V9fh+EfmWiIxX1TcGc9xGp3dHzNYlhgj3rN7EFTl9wC34wGh2KjEM3wB+CewnIv9FqqjeFwZzchE5AHhNVVVEjiY1
g+kdzDEbHVMkQwcvyXFPf5I96TGSZb0bQ4FKSmLcISJPkqqwKsC/qOrTxd4jIneSKtU9XkR6gKuAaPp4N5MyLv9bROLAbuCDqjpk8yFMkQwtvCTHPVkTZy/r3e6n0cxUEpU0FdgF3Ju9TVVfKfQeVT230Gvp15eQCmdtCRpNkdTTpVXLc4d1rqAkR8t6N4YClbiS7iO1viCkOrgdDPwVKBhJZPhpJEVST5dWLc8d5rm8JMdFOce32YLR7MhAPTci8g7gE6r6ieqKVJrOzk7t6uqq9WmrwrLVm/IUSa3XGHp3xJh/zYPs6d9rpDqiDo9ecVJNRu+1OnetzmXBBEazICJPqmpnqf0GnPmsqk+JyD8O9P3NzkCVQSNkS9fTpVXLc9fqXJbkaAw1Kllj+GzWUwd4B9CSmWaDdU/UW5EM1KVVjZFxLd1pjeS6M4xmopIEt1FZf+2k1hzOCEOoRiY7smh7LM6e/iSL7l7bVMlNuQUAO6JOSd/4Pas3Mf+aB7ng1lXMv+ZBlq3eVLNzD5RansswhhKVhKteHaYgzUKjRRYNlEpcWtUOs62lO60RXHeG0WxU4ko6HLgMmJb9PlU9qfpiNS5DyT1RrkurHGNYqZuplu60ervuaokthBvVoJLF558DNwO3AolwxGl8CoUoAqzZ+OaQ/EGWMoaWzd0Y2H0wqkXZ4arpMKejQpanLBohXDV7ZLay+42a/yBrPTIsFGZbz9BXYy92H4xyCCNc9V4R+RSpekmZlVZV/fsA5Gt6PPdEPcpc1GNkWMhXP1TWXJoduw9GNanEMPyv9P/Ls7YpcEj1xGk+av2DrGe9pSBf/VBac2lm7D4Y1aTscFVVPTjgr6WNAtT+B+kZomw8Q1QPwgwJtT4H5WOhuUY1qSQqaTjwWWCqql4oIocBR6jq8tCkawJqXS+nEUeGYYSE5rrLrjxtBrMmjR6Si/vVwkJzjWpRyeLzz4AngQ+r6iwRGQY8pqpzwxQwiEZYfM6llovBjVBvqVoEXbeghVSAke0u8aQ29ec1jHoSxuLzoar6ARE5F0BVd4uIDFjCIUYtY+WHysiw0CJ60LoNwI5YKkraelgYRrhUYhj60rMEr7XnoWRFJxnhkzu6bmbFWGwRPchdlk0zRNtYopnRzFRiGL4E/AaYIiJ3APOBj4QgkxHAUEteKhbNNWfKmMy6jSvCzj5/PmW111SqrcSH2r0yWo9KaiX9Nt3a8xhSzXouUdU3QpOsRSnkcw8rRLWQUgx7xFtqET3bXbZu8zYWL98QyuJ+tZW4tW81hgKVRCUtA+4ElqnqzvBEal0q8blXw51S6Hy1GPGWE83lucvmTBnDKTMPqLqhCkOJW6KZMRSoxJV0A/AB4Gsi8jjwM2C5qu4JRbIWo1Kf+2DdKYXON+PAfWo24q1kET2MNZUwlHgjhhMbRqVUkuD2B1X9FKlM51uAc4DXwxKs1SiWuFYoeQkYcAJYofOt3vhmTRPovBlBPUbTQUo8lkgyos0d8DEt0cwYClTU2jMdlXQ6qZnDO4AfhiFUK1KJz90r3Df/mgcH7O4pdL65U8aUHPEOlYibbHcWwJ7+JKLKgiUrB+U+GyrhxEbrUvaMIZ3g9jRwEnATqbyGT4clWKtRzkhz3Mh2Jo8dxvrNb7Fo6eC6yBU63/T9RxWVo1Ant0YsX1GOTAvnTmL5xceRTKYSPWMJLXg9K/mM9ZwJGcZgqWTG8H3gPFVt2V4MYVNqpOktCjsIsbh/VD8Q33ih8xXaXmhdYvueOIvv29BQ4ZmVLKDv7EuQm6mpSfVdTwtBNVqJSno+Pwx8TkRuARCRw0RkQThitS6FRprZSnlXf75tHugCZ6HzBW1fv/ktnBwV6opw9fINDdUDu9K+3CPaXGIJf2mYWEIzaw217vPdiLMvo7WoxDB8H+gD3pl+3gP8/1WXyAgkaLEYYHibW5MFzntWb+Ljt3flGaX+RJI2128s6lntFSqvQLuzL0F7xL9/R9TJJNYVel8Y
n7GQq24oYYav8Qm1VpKI3AYsAF5X1VkBrwtwI3AqsAv4iKo+VYFMLUPQYnF7RLj5gncwc+LoUJPTvBFzrvuqPSJ8ccFMFt+3wbe9XuGZvTtirN+8jbd2x+lLlA4Z9a7Tqhd68z4bkNl/RJubV9BvT//gopcKyT/Uk+PMJdcchF0r6QfAEuD2Aq+/Dzgs/TcP+Hb6v5FDoYSw4w/fz7dfGD+8oHj/4W0uN1/wDo4/fD9GdURqVna8kNG7Z/UmLvv5GvrTLiEBoq7QEXEDZfKuU8SRTHG+bK5cMCOz/86+BO2u+NxN7W5+qY7BMtST4xrd8A2VaLtqUJZhSI/sb6bCWkmq+rCITCuyyxnA7Zqq/f0nERkjIgeq6qvlyNVqFFuc9kbLi5auIRbXqv7wgmYrSVVmThxdUq5qUsjo9e6IsWjp2oxRgNToRVBuOv/tgTMqT0EFMaLNZVb6s0Hq84sjkHV8caTqs6KhnhzXyIbPZjJ+ylpjSCvuS4D3kzIGdwKdqvrQIM8/CdiY9bwnvS0PEblQRLpEpGvLli2DPG3zErQo7PmlP/njp4jF/Yuo1fD3lxtKG2Z4ZrEF4J6tu3GdfK9mxHEZPawtT6ZC6zUeCVWfMq5V0tq4ke2cc9Rk37ZzOicHnqcZ/fSNavhqHVzQDFTiSvoTcIiq3lfF8wetUQR2DlLVW0hlXNPZ2Vled6EWoNTot1o/vEpnBdWelhcbbU4eO4xEMv8rkdDgz16orPeINpeEaqDSr8WsqHdHjLue7PFtu6urh0tOPjzQDZbpbrdgBrMmNn53u1p3OyyXRp7J1ItKDMOJwCdE5GVgJymlrqo6exDn7wGmZD2fDGwexPEalrD8l4Wa2gyPuiQJVnIDpdx6RWFMy4uNNseNbOe6s2ZzadYaQ8SB686aEyhvkIIqR7mG3QOjHAUV5Kf//C/X+YxaOde6Xv70RswKb9SZTD2pxDC8L4TzLwMuFpGfklp03jYU1xfCUJTeD3tEmxsQreRw84eOYubEfWr+w6vmAmOu8io22vQUzvrN2wAp+dmbVUEVGgh4C+HlXOt6+9MbrclUo85k6kkl/RhervTgInIncAIwXkR6gKuAaPp4NwP3kwpV7SYVrvpvlZ6j0QkjEiP3h31O52Tu6urJiVaaUFSmsBRitablhZRXMWU+bmR7XpRWMRpFQWUb+YtOmM6SFc/R5gZHUw22u12jRwbVi0YcKNSTioroVYqqnlvidQUuClOGelOue6AS333uD/uurh6WX3wcO/sSJY8R9mixGtPyUsprKP1o7/jTy1y9fAMCxOJJ2l0BES48/hDOmzc10Phluts5ws5YZd3tzJ9emKH23RoMoRqGoUoliryUoqxUURf6Ye/sSzBnypiScoc9WqzGtDws5dVocep3/OllPv+rdb5tqVwJ5aaHujlv3tTA9/m6223allenqthnM3+6UQ5mGHIopTwqVeTFFOVAFPVgfti1Gi0OdloehvKqt189l94dMa6+d33B10vdF290O2fKGE6ZVX53O/OnG+VghiGLUsqjEkWebWAKKcqBKOrB/LBrOVoczLQ89zP2JZJcdML0st5b657ZA6Vn626irkNfIjh7upL7Uum1Nn+6UQozDGnKUR7lKvJCBib3BzhQRV3JDztXUV552gyuvnc9UdcpGLPfCHif8Y5Vr3DTiue45eEXuOmh7sy17H5tO6s3vsncKWOYvv8oIPye2dV0RU0eO4yE5udetLmC40jo98X86UYxzDCkKUd5lKPIKxmdDmb0X84Pu1D0UlvEoS+hXHX6jJq7UypVrt96qJtYXInF40DqWq7sfoO7uvYmgn342KlccvLhofbMrqYryrsGVy6YweLlG3BF6E8kuey9RzDvkHE2ijfqjhmGNOUoj0KKHFK9l73uark9C4qNTsOa1gcZqNsfe8W3z+LlGzhl5gFVPWc112eCjLUDPqMAqc919LR9Cxr2OVPGlDTAxWSvpisqL2v5tBnMmtT4
WctGa2GGIU25o/dsRT6izeX+dX/j8qW/p8112d0fR0R8xdwgZWBGtLkZ4xE0c6hFPkEuA114DlKid/zp5TwX1UDWZ7KPHWSsc8tpe7yxo6+intnluP48Ag2UCOs3b6sobyLoGiy+bwOPXnGSGQWjoTDDkMXCuZOYceA+eb7rbDzFtW7TNr68fEOmjr/n6sgt9RR1hYVzJrJgycqaRsSUSoSCvYqzEvdOkBLdvieeCbv0FlMHsj4TdOxcY/3Z9xzOV379TJ5cW3f2pZPDumlzgw17kAEux2AFXctdfQk+fnsX1501p+x7aTkERrNghiGLUiNH73VXyq/Fn+36qGVETNAMKChDemX3G2W7d4KU6OVL1xCwhoorUtH6TCEF/egVJ/HoFSf5DFf3lh157qQbH+xON87RgslhQZSjrL1reflSf7OiWFwrupeWQ2A0C2YY0pQaOZaqYlqI3F7CULtRYpD75JKTD888B5h/zYNl+86DlKgrDipAIr/lZznrM8VmFK4IK555nROP3C+TvNe7I8bR0/bl3tWb2J1TYtwz1oWSw4JmRpPHDssLGQ1S1gvnTmLM8DY++aMnfe1NK7mXlkNgNAtmGNKUGjmW47N3BRxHiLoOu4rMKModJaaa77wFaF6zmXLJdZ9kP1+z8c2inzlXkQaNeBOaJKh6+lWnzyy6PpPrtgo69s6+BF+6dz1fuGcd1545G4VM17Vco5CN60iesi40G1zZ/QbZFbsjDgWV9cyJ+5Akf/2okhG/5RAYzYAZhjSlpvkFa/i3u8QTypnvmMT/N/9gxo5oY/3mbXz89q68pjkdUQfNyR0o1qry0rtW43kuoq5ww9nl+7OzKXSOYp+5kCItFJXl1e7pT4fBnj/voEBZPMPkNZrxZPLVAMpy1XltNy9fugaQwN7MufQn/I12Cs0GZxy4D1fc7e/85joO86ePLyh7toz9iVRUUaXK3XIIjEbHDEOaUtP8oNevPG0GvTv7uGnFc9y75lV+8edNmbr+l/7TEVz/278SdR1i8QQigiOS3R2yoPLtfm07l/18Ddk6sD+hXL40pczKKZZX6hzFPjNQ0K1WaMRbySi4kEzesVc88zpfune9rxezan5E0vA2l75EkniOu+6q0/3KutBscHXAjKnNLe4aWjh3Etv3xLl6+QbaIg6L79vAqI5IS7eBNIYeZhiyKKT0vBH3/OnjfQuhkPLRZydgff6X62hzoC+Z6ovQn1BUIZ5U+rMidrzRaq7y3b4nztX3rs8LeYWUcjz1G4/QntXgvphCKifiJugzB7mYst0z2esCsHcEXKwsSLHyFJcvXcuY4W2ZHgonHrkfX7jHX1yuL+B6xBNJrj59Jl/Kul6uwKh2/9e60Mxo7pQx7In7XX574omirqHeHTEW37eBvniSvnQg2mCCCRqtsJ9hgBmGPHIVXLERd5AChZRRAAq6PbzRaiSnT7ErwtXLNwQqQdg7Yu5L7M0CLqaQyg2PzFX0gf7+WIJ1m7YxZ8qYshLVKilPEYsnufD2LhACXVaxRBJRzVvITyogkH0ZE5p/XQrNjMaOaENzQqpynw/0mpZDoxX2MwwPMwxpBlJ8rZxcgSBiiSQb/77L5yqBVCRPW8TJjESziTiCK/4oJwdh/ea3CjblKTc8MkhBXblgBp//pX/Uvvi+Dcw7eN+Ss5Bi163QNdsT9/v+Dxo3ItNjYkSby4IlKyHHMMSTytX3pkpOxygeKVRoZjQsGmF7bO8FHxaNFFXy1Qo5bcTCfobh4dRbgEbgntWbmH/Ng1xw6yrmX/Mgy1ZvAvaODrPxlA7sHYlG3fyonFza3NTiM4CocuOD3Xn7XPbPRxDPaWofdWDJuW/nN5e8C8mZYezqTyVZefLm4snXEXUY1R6hI+r4Sn6v2fgm3a9tzyio7bE4e/qTLLp7LVPGDk/nBfg/u+eXL3RNIPi6uU4q9BRSs4L2SPBXLx5Pcuo3VnLBratYsGQlL/fuZPr+o+g8aGzg/hFH
iMXLa1YzbmSqTLWneEe0ucQSlSn5Yte0Ekp9twyjnrT8jKHS0W2u4pg/fTxOabvAeUcfxE8eT9UqCsptGNHmMu/gcYEujwVzJgJw5Wkz+NK968gKoycWT2ZG2UGL0kEj5ewZguemycZBeGt3X171z75Ego6omxf3H4snfEakkCvqqmV7Q09vOHs2l961Nk8xxxVIJDNus8uXrmVnLM7K7t7A67qrL0F7RIgnlXZXkDIrk3rXwPvsntEu9t5Ca00DGeFbspvRyEgpn2oj0tnZqV1dXVU51pqNb3LBrat87oRR7RF+/LF5zJkyhmWrN/kjkdJRR94PeMUzr3PVsvUlM6EdYHi7m+c+8mhzhfs/8y6m7z/K59baurOP1RvfpHdnH//3d88iwO6cJLt2V1AR2t3ivmovLyIVSlvcBdYecfjAP+7NlPbqQHVEXPbEE6gqrgixhNIeEVRTuQvnH5MKU/WumyOwq89/rqgrCEqZyeNEHCgjSpW2iMP9nz4usJRJNr07YpnEvsx7s65/EOX06qjUUHjXyAt9zb5+hhEGIvKkqnaW2q/lZwylRm7eiPux59/gwWde5+pl62nPUo4dUbes8hhJghejvQgmxxEWLFnp693wxV/9hdv/9Er+wXLw2kH25fjpN2/bg5cc55W+cAJyATqiDolkMm8m4vWS3rxtT8aY9KcXvtsjDsn0oMLL1/j8r9axMxbnwncfmgnr/FJAl7KgiKtilGMUANpdp6x7EbSA3B4pfB9LrQcMdBHZQl+NRqXlDUM5ZQpu/N2zPgXtRQUBvhnA8KjDriIlM959+ARWdr/hKwh3/QPPQjKZGb16Cmfrzr6SRqHYSPqf//vhzFpt1BVUtaiC/b/nzGXR3X/xZWx7vaRHD4vS5jo+g+I6giShPycT+Cu/foYR7RFOmXUAi+/bUJYRiDpQYaWRQMp1xVTqxilUrsNbD8g1Gpel803KmblUM/Q199gWBmsMlJY3DFC8TEH3a9vLGrWPaHf56PyDue3RFwu6ix55bgv3ffpdmbWAnq27aXedzEgf9i5APvfa9pLnLKToc+s5lVLOn33P4Rx76PjMDCDzvixlmatI44lk3v4eV9+7nin7DgsM5XUAJ8egDdQodEQdkkn15XWA+F+gAAAdVUlEQVSEUbOoULmOdZu3AeR9zr54klO/8QjXl8hUD6vaqoXBGoOlZaOSvKic3h0xID9ixWNl9xtlHS+RVBbOmZgXVZRNm5tyV3jnCe43kGDb7j6mjRseeIxC0TweXlhrJVz/27/yaPcbvmib9oiT6bOcHYmTiawqcryo6wAS2DvBdYRF/3wkHVEnL+ppINz/mXfx44/N49ErTmLh3Em++5p7j7NZOHcSj15xku+9hRg3MtUSNZer7lnHiDY3MPy2L5GqvBp0bo8wFqCz3V7ZUWbF5DCMXFrSMBQKTw3a7yv3Px34Wu6FO6dzMtP3H1U0FLMv4c+qzQ19jDippK2L7vgzF9z2OO+aPs5/zhIKP6WONTfcH0i5kwrp4b50uY0xw9tYfvFxfPz4Q1BVvv2H5zn2q7/nm79/jvnTx7P84uNIpg1fLJFyTQUZoYQqw6MOJwTkV/Qnlet/+1euXDCDqxfOZGR7ZcahPeIPE52+/6hM5dVv/P453vm133PBras45qu/Z95Xflf0HhcaDED+wGHKvvmGOp6Ezdv2cO2Zs2kLuOelwk+rFfqajYXBGtWg5VxJ5ZTX9rqzXXH32sAs5PccOYFHut/wFcm7q6uHS04+PFOe+RM/6sqLHrr4xMMKVhzNjhbyFnifeHkrSz9xDOs2v8VXf/0MsXiyaDRREgjKt/MK8M2fPp6frHqFbzz4XJ57KRZP8skfPUlCk8STqRmQF5Z6wwPPsmRFNxefOD1vphBxhAuPO5hbV76IK6kctDmTR3PWd/5UUM6+hLJ4+QaWX3xcft6GKziSKuedXd4aUtFX3/1wJ6OHRX0uv3tWb2JRVq+EvU2TyESbBSXh
FfLBB7lixgyPFvg0mmnwdOo3HvF9X8oZ/Ve72qqFwRrVoOUMQzG/rhe5E3GEPfEkrvjVYHvE4YazZzNl3xGsenGrTwFlj8r++Pwb+SGlEYeDxw/n4We3MHF0hy/nYNzI9sAF3qiTUo4j2yNEXSEWkBFdjGFRlytOOZzT50zKKJyp44YjBLu7chVxNrF4km8++GxeiGksoYwd3kYyqfSnD7vqxa0lZfMWtoN8/bmG0kMcydRT8vAMfanw2+ws8WI++EIDh+UXH0fU9bdtjbrCzImjAZi+/yiuP3vOgHot5JZhGczCsfV8MKpB6IZBRE4BbgRc4FZV/VrO6x8BrgO8uf4SVb01LHkKjai8GUL2wm08R4GKwLGHjs+8J5tYPMGqF3o554FnA5VUfyLJxXeuzjzPTqhaOHcS/fFEYEG3j9/eRcQpLwwzl6QmOWTCSCC1iL6y+w2+cv/TZecP5OKKQ5vrr3LaHnG49n+eCXRfFcMbxc6ZMiZwxHz84RO47qzSCq5n6+6y1lR29Sf42A+f4KPvOpjbVr5ILK6+LnRjhkeZOXF0wYHDzr4EN5w9h8uXpsqLJ5LKdWf55anG6L8aC8fW88EYLKEmuImICzwL/BPQAzwBnKuqG7L2+QjQqaoXl3vcwSa45SatXXvmbA4aN4Lzb/1TYETRiHaXRNLf4N47hiY1neTllNUrIJeOqMPCORN9rSpT7hQhkUyWHcMfhCMwoi3Crv4EiQKL4h1Rh0QiWXZkkOcuKvS8FMOjLglNcvGJh/nab2aPkoHAx0Gj6lUv9Ab2gK6U4W0uSVWuXDCDxcs3+AYIHVGHR684KbB5UTUJSrzLPrdhDJZGSXA7GuhW1RfSQv0UOAPYUPRdIRM0oup+bXumkFs2w6MOl7/3cA6ZMIqZE/fxHWPGgftw6jdXAjogowAp5Z3bv7g/oXz1X2fylfuf8WVkV0pSKev9X1o4i8X3bfD1hv7p468QT5DXry7XKDiOkKjAMpzTOZk7n9jILQ+/wE0Pdfs6s0UdJ5M4OCwaKVm5NeJIwdDgSvHyNxYv38CVp83wXY/cvhxhKemwwlcNo1LCNgyTgI1Zz3uAeQH7nSkix5OaXfyHqm7M3UFELgQuBJg6Nb+fb6Vkl5r+zbq/sfi+Dbgiee6j/qTylfufJuKkRrrXnbU3Nn1nXyIvD6FS9vQFv7frpa0DqtxaCa7ARSdM55RZB3DKrAMyhnJl9xv87IkeopHCpcMhZRQcESiwZpFLmyv85ImN9GUtomd3ZstWiJ5Bu3zpmoKVWwdLu+vk1WqKOg6zJo0edC2kgWALx0ajEHa4apD3N1eL3AtMU9XZwO+AHwYdSFVvUdVOVe2cMCG4zHQl3PKH55n3ld/xwe88yud/tY49/f6In2ERoT3iEE+kavrs6k8QiyufvWt1JkZ+2+7+wFj9SnAK3IF7127mygUzUvH+FYZ0lktC4eY/PM/8ax7k0e43MmGf3mJuqVlQf6LymVKbm9uDwsEtEocbiys/WZVKMOzdEePeNcGhxdnH/96Hj+JLp88omvPRHnG44Zw5eesTu/vjGWNQKJQ1LMIIXzWMgRD2jKEHmJL1fDKwOXsHVc0um/ld4JqQZWLR0jUZ900hvdavcMHRU/jBYy/7tseT8N1HXuAHf3yJqDO42QKk6v/PnjSaR1/wVw9tj7jMmjia5Rcfx7I1m/neyhfYWWB2kUuq9pGiqrRFXF+Zi1y8RW0vnHP95m3pWYCfNlcKNhAqhJAqbNfm7m2Fuvg+vxcxoUnQ4qvHS1Z0s++INl+ntlyiDrhuSpGePOMAAPYd0cZlS9cG3qP3HLkfxx46Ls8VJgGfPSyC1isafeHYSm20BmEbhieAw0TkYFJRRx8EzsveQUQOVNVX008XAsEZZVWi+7XteT79IOIJ5cergkthfG/li/QnNK/cQyGibqoqaV8imbeg3J9McvUZMznlxod923f1x1n1Qi9f/92zOFC0BlO+7El+c8nxjB3R
xq2PvMC3//BCaRkdhztWvcJNK57z5Wd4VGoUIDU1vOOjRxONuBlFMqojEthj+tKfrylSukP54j3rii5y9yfhUyccwkHjRtC7I8a4ke2ZdaD3feNhciNxf/fM63xg8zY6Im4mbwSgI+KG4tPPVailenE3otK1UhutQ6iGQVXjInIx8D+kwlVvU9X1IvJloEtVlwGfEZGFQBz4O/CRMGVavfHNsvcNKm8RcSQ1Ak6Uv+jpCHz1/W8DYP2r2/jeIy9k1iyuXDCDzdt2kxsclkgy4GibeBJ+ve5vnDdvKt//40tlvacvkSxoFMrBBYKuyLnfXcUN58zJuKmyR8Qj2lIlQvrjxa9luTLd+GA3t658kXhSufjE6Zw3byrT9x/FZ046nBseeNa3b1u6bEctfPq5CjU78qlZurdZx7nWIvQ8BlW9H7g/Z9sXsx5/Dvhc2HJ4zE0rqEoZFnVIKnzx9NSPuhi5SjKeUC756Z99I15N/7iuumcdbRG34jyAUixZ0c2cKaMDC9kF8Q8HjOL5LTt9SXuVUEi19yc1sAezl0wI+UX/BoPnGktlaz/HFxfMZM6UMXnhxP3JJDMn7sO1Z87m8qVrcMUhnkxk6kNViyCFmmpH6ndZNXr0kUVMtRYtl/k8ff9RfPjYqdz+WOmKqR5tDiw+YxYnHrlf6kegqd4DhchVkkFKP9ttEh9oxlkRIq4QNCIuxOqebQVrKQ2WXAVSzciiYsTiyud/tY6R7S79OUEC53ROZtzI9nQkRKoDXF8Cljz4XCaMtpCbpBI/e6BCdYX++OBnKrX091vEVGvRkkX0vnzG21j6iWMoUag0Q18yNdPwfnzllMSuNztjCZ559S3O6Zzs217sI5991FQ6og7Do9X9WuQqkEIL3OXiCnzk2IPK/vLuiCXI9Qre1dWT6Xcdi+9tJRpLKHv6k1y+NLgiabkFGD2CFGoiqVx1+sxBRR9VKsdgsYip1qLlZgwe0YhLR7Rwq81sOqKpkgi9O2IsWrqG3z+zpQYSDp6v/PqZvHDMYmP0f5s/jc++93BWPPM6n//VugEn7eXijc7BK3i3pqx1g0JZ1Rcefwjf/+NLZS79BxN1HFZvfJNIgVDZWDzJT1a9wqdPPiyzbSB+9kK1ixbOneTLHalEwdbL39/oEVNG9Wgpw5A99V63aVvZWbOqyp9e6OX63/614raU9aYScb//6Ev869snst8+HVWVwas8C16ORGGhXEcYHk31ODjhsAn8ZsNrefvMnFj+2kkh+pNJ/r6zr+h3YMmK53xlOwbqZy+kUAcafVRPf3+jRkwZ1aVlDEN2ZEhfIt+1UIx4Ar5ahXo8jc4dj7/CHY+/kurrIKkIrGKNh8olkVTWb97G6GFtBUfoHhEHbjr/7Tzz6vaCUVl3PfFKWWsnArhOKlckt9RGUE5FLm2uP3R1RJvLrj7/4ryXEFeKaipU8/cbYdMShiFo6h2E66TCRHNJhFhoEFKLkVC6BWetSAIoSJU+d39C+dgPn+Cq02eVzIdoc1Mr4Nf8prAhfri7l0tOns63H3q+6PEUcB2Hm85/BzMn7sPWnX2s3vgmc6eMYWdfouSsI1vZej0fck9Xy4Q4DyutbYRNSxiGoKl3EIOsbjFgzj16Ch8+Zhrvu/HhAfc/DgMlNequhnnoS8Di+zbwvpn7c8+aVwvut6svzlu74yVdYEtWPJ/pJleMNtdh9LBoJjw2k0tw2oy8UXfESRkSL1PbU7bdr23n8p+vCTRClSTEVTOKyPz9Rpi0hGGYPHZYphtZLsOjTkVZxWFwV1cPR+y/T9VzGapBNUUSVX69Pn/NIJuEwsru10seq1AZcUfwuQlze214g4PF923IJJrlNgnKzVC+fGlwJz/v+OW4cKqVNZxrXMwgGGHQEoZh3Mh2Lj7xsLzs12FRl/OPmcJPVm0cUCOcaqFJuGrZuorWPXKp1sg+THbHFVdKS/nTJwYeevmJdx/C9x99yaeAg9xGUcdh1sTgKqq5+RaF
6mG1R6QsF06QKzO7OVC5yt1KUhi1oiUMA8B586ayZEW3LwRzd3+CH/7xJRIliriFTW7p54FQb6Nw+H4j6H59Z8k4obBnRcceMp6PHXeIT9n37oixuz940bjYqLuQCzLqCp85yd9oqBhBx4nFlU/++CmSqmUpeCtJYdSSlklwGzeynS8umJGX1NaXKOyWMMrn2TKMwkBocwXXSUUElTLfEYdMT+jcktm5i8SlFo0LlVVvizj8+jPv4tMnH1bWTGHNxjcZ0eYGRlHt6kuwpz/JoruDk+my8YxLNtl9xg2jmrTMjOGe1Zu4atm6QbXKNGpPUuFnHz+GXf1JPn57V9Gku6sXzgpU1j1bd1dURdVLwnPFIZ5IZqrjeu6b6fuPKil3rtvnnM7J3NXVg4OwK6fUazk5CBaiatSSlpgxeBnLZhSaj3hSOf97q1iz8c10RdRgRrS7zJo02ret2Ii9kFLt3RHj0rtWE4trqle2QjKp3HT+23n0ipPK8ulnu322x+Ls6U9yV1cPyy8+jps/dFReA6FyFLyVpDBqSUvMGHq25pe1NupH1E2FhZZbRC8WV5aseI7ghoAp+hPKiKwqgP6ExiTvOXI/Hnj6NSKOkyp3ftoM1m9+C1DfAvD6zW/lDSBS6yJSthIulJm8sy/B8YdP4LqzBpaDYCGqRq1oCcMwos0dUKMZo/q4Apf/85F8PSdCrBQRx+GT7z6Ubz74HCKp6qSuKwhCXyKJqLJgycpMyGnuQu196/6WPpKiwBd+tS6zYB91hRvO9np5F24WVC6l3D6DUfAWomrUgpZwJe3sS9CeW03OqAsJha8/8CyfPP6Qit63sy/Byu4t9KX7TCdJzRJyq6Iuunst6zdvy1uo9ehLJOlPqE/N9yc0U0115sTRmUx0j6grzJzod1MVoxy3Tz16ShtGubTEjGHy2GGIU6BUp1Fzoo4zoJyNVS9uLevYlfSh8HAdoWfrbuZMGcMNZ8/h8qVrcR0hkVSuO6tyX765fYxmpiUMw7iR7Vy5YAaf/2Xh5jpG7ehLJEqGng6U3f1xJo7u4KITpuflrRQjkdSquHqyMbeP0ay0hGEAmDJ2eL1FMNIkkqnWo2GgCKd985F0MT7l1Lftz++ffh005W7qiDokkko8y50UdSVvVmBK3WhlWsYw1D832PAot5T3OZ2TWbZmcyZ65x8PGssj3b2Z14PKgCSSSiJJpnf1g89s4b5Pv4udfQlGtLns7EtkZgZBUUmGYbSQYZg42hKBmokRbS4LZh/IgtkHApLJaO5+bXumdDbAqd9cWbCWEewNE52T3j+b4w+fUBVZw+i9XMt+zoaRS8sYhmf+1vh9mhudiEPNkgT7EqlM5zbX9RWMm77/qEzmce+OGJ8+cTpLVjxHxHXYGdCNrS8RbnZwGIXtrFieUW9aIlwV4OXeHfUWoek5+cj9Qjt21EnVIRrVHqE94qCqxOKayRzOrSd0z+pNzL/mQW55+AVAOGPORF+Cm8fFJ04PbcQdlOFcTt2jWh/TMCqlZWYMG/9uxcYGy4RRA+sFHUmHfRZbWRARfvLRo3mpdxcdUYfP/WId22N7axtl1xMKqjR691ObyF1xaI8I582bOiCZyyGM3sv17OdsGB4tYRh6d8S4q6un3mI0PXc+8QpRh4q7zImkIn9cR+hPJFOZyzk5JafM2p8Lbnu8YE/u7MzhIOXZ5jpccMxUvrfyRaKuQyJdzjpMZRpGYTsrlmc0Ai3hSlq/eVsoJaFbjUQSkhVkIAyPuhm3UF9C2d2fJJ4EVaU94mSKybW5wrI1f8u4T2JxzewTlDkcpDx398e57dGXaIs49CeVKxfMCN0vH0ZhOyuWZxTDKwwZtmsx9BmDiJwC3Ai4wK2q+rWc19uB24GjgF7gA6r6UjVleOrl0hmzRnmU02cZUjOEmz90FAAX3fGUzy00LBrhq+9/G5+9azVAYB2rYdEIN53/DkYPi+ZF5njKc1GmSF5qhhGLJ/FOs3j5Bk6ZeUDo
CjWMDGfLmjaCqGVQQqiGQURc4Cbgn4Ae4AkRWaaqG7J2+yiwVVWni8gHgWuAD1RTjj8+31t6J6Msys0Gufy9R3D84RPo3RELdI3sMyxCe8SlLxEPfH9/MpkJUQ0iW3lu293HRXf82ddvoZZ++TCS4SzBzsim1h38wnYlHQ10q+oLqtoH/BQ4I2efM4Afph8vBU6WUu21KuTxl2zGUEvaI8K8Q8YBhV0jMyeODqxnNKLNLdt94hWiCzqW+eWNoUStO/iF7UqaBGzMet4DzCu0j6rGRWQbMA54I3snEbkQuBBg6tTwIk2MvQxvc+lPJFFVhkUjGZdN7sJxLiLiU8qFXCPZ7qD+ZJIrF8xg1sTRAypFnXss88sbQ4laByWEbRiCRv65WqWcfVDVW4BbADo7O62+RRW55KTpfOPBbt9FP6dzEufPm+aLBJo8dhiPdr/BorvXommffi7tEQlUykGukWr60s0vbwxlaj34Cdsw9ABTsp5PBjYX2KdHRCLAaODv1RTipa+dxrT/vK+ahwwFB0im/7sOxBXQVMezvgQcMmE486aNZfbksYzqiPLqtt389W/bGdUR4aiD9mXbnn4WL9/g6zP8syc24opDLJ5AJBVZlG0APnzsVP7jvUfw4XdO47Hne3ljxx6Omz4hr6+x9wXMVsCrXujl+t/+lajrEE8qF584nfPmTa1b4xnzyxtDmVoOfkRD7HmZVvTPAicDm4AngPNUdX3WPhcBb1PVT6YXn9+vqucUO25nZ6d2dXVVLE81jEObQEebSzyRAIGpY0fwzunjOGz/fXi5dyddL2/lsP1GMu/gfel6eSu9O/uIxZPMnTyaQyaMBGD7njh/39nHviPaGNURYZ9hUSaOHuYr8Obd/OzH5XwRcmvsZD/3jtUfT/BS7y7mThlTVmP7Ss5nGEbjIiJPqmpnyf3CNAxpQU4F/ptUuOptqvpfIvJloEtVl4lIB/Aj4O2kZgofVNUXih1zoIbBMAyjlSnXMISex6Cq9wP352z7YtbjPcDZYcthGIZhlEdLZD4bhmEY5WOGwTAMw/BhhsEwDMPwYYbBMAzD8BF6VFIYiMgW4OUBvn08OVnVhg+7PsWx61MYuzbFaYTrc5Cqluxp25SGYTCISFc54Vqtil2f4tj1KYxdm+I00/UxV5JhGIbhwwyDYRiG4aMVDcMt9RagwbHrUxy7PoWxa1Ocprk+LbfGYBiGYRSnFWcMhmEYRhHMMBiGYRg+WsowiMgpIvJXEekWkf+stzyNhIjcJiKvi8i6esvSaIjIFBFZISJPi8h6Ebmk3jI1EiLSISKPi8ia9PW5ut4yNRoi4orIn0Vkeb1lKYeWMQwi4gI3Ae8DZgDnisiM+krVUPwAOKXeQjQoceBSVf0H4BjgIvvu+IgBJ6nqHGAucIqIHFNnmRqNS4Cn6y1EubSMYQCOBrpV9QVV7QN+CpxRZ5kaBlV9mCp3zhsqqOqrqvpU+vF2Uj/wSfWVqnHQFDvST6PpP4tqSSMik4HTgFvrLUu5tJJhmARszHreg/24jQoRkWmkmkqtqq8kjUXaVbIaeB14QFXt+uzlv4FFpDr3NgWtZBgkYJuNaoyyEZGRwN3Av6vqW/WWp5FQ1YSqziXV1/1oEZlVb5kaARFZALyuqk/WW5ZKaCXD0ANMyXo+GdhcJ1mMJkNEoqSMwh2q+ot6y9OoqOqbwEPYepXHfGChiLxEyn19koj8uL4ilaaVDMMTwGEicrCItAEfBJbVWSajCRARAb4HPK2qX6+3PI2GiEwQkTHpx8OA9wDP1FeqxkBVP6eqk1V1Gimd86CqXlBnsUrSMoZBVePAxcD/kFo8vEtV19dXqsZBRO4EHgOOEJEeEflovWVqIOYDHyI12lud/ju13kI1EAcCK0RkLakB2AOq2hRhmUYwVhLDMAzD8NEyMwbDMAyjPMwwGIZhGD7MMBiGYRg+zDAYhmEYPswwGIZhNDiVFLkUkanpoo9/FpG1A4mgM8NgGIbR
+PyA8pMGv0AqHP/tpHInvlXpycwwGC2BiEwbTElxEXlJRMYP8L3/Uq1qrCKyI/1/oogsrcYxjcYnqMiliBwqIr8RkSdF5BEROdLbHdgn/Xg0A6jwYIbBMMLnX0iVeq8aqrpZVc+q5jGNpuMW4NOqehRwGXtnBl8CLhCRHuB+4NOVHtgMg9FKRETkh2m/61IRGZ49ExCRThF5KP14nIj8Nu2n/Q5ZRRhF5EoReUZEHhCRO0XksvT2vBGciLwTWAhcl86YPjRIMBH5uIg8kW52c7eIDE9vP1hEHku/tjhr/0HNgIzmJl3Q8Z3Az9NVbb9DKgMd4FzgB6o6GTgV+JGIVKTrzTAYrcQRwC2qOht4C/hUkX2vAlam/bTLgKmQMh7AmaRKb78f6Mx6T94ITlX/mH7/5ao6V1WfL3C+X6jqP6ab3TwNeCVJbgS+rar/CPyt4k9sDFUc4M30d8r7+4f0ax8F7gJQ1ceADqAiN6gZBqOV2Kiqj6Yf/xg4rsi+x6f3QVXvA7amtx8H3KOqu9NNe+6FkiO4cpiVnmX8BTgfmJnePh+4M/34RxUczxjCpMu+vygiZ0Oq0KOIzEm//Apwcnr7P5AyDFsqOX6kirIaRqOTWxhMSbXt9AZIHSX2h+C+HpA1ghugbD8A/kVV14jIR4ATSshhtBDpIpcnAOPTawdXkRpAfFtEvkCqa95PgTXApcB3ReQ/SH13PqIVFsWzGYPRSkwVkWPTj88FVgIvAUelt52Zte/DpH54iMj7gLHp7SuB00WkIz1LOA1KjuC2A6NKyDYKeDXd9+H8rO2Pkgo5JGe70UKo6rmqeqCqRtNlvL+nqi+q6imqOkdVZ6jql9P7blDV+entc1X1t5WezwyD0Uo8DfyvdHnofYFvA1cDN4rII0Aia9+rgeNF5CngvaSm56jqE6TWDNYAvwC6gG3p95wPfFRE1gDr2dtT/KfA5emF7MDFZ+BKUu1CH8Dfy+AS4CIReYJU6GE2NpMwQsHKbhtGhYjISFXdkY4cehi4UFWfqrEMRwFfV9V31/K8RmtgawyGUTm3pBPWOoAf1sEodAI/Af6zluc1WgebMRhGDRGRm0hFGmVzo6p+vx7yGEYQZhgMwzAMH7b4bBiGYfgww2AYhmH4MMNgGIZh+DDDYBiGYfj4f6bCrVfOO+9DAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "tmdb.plot.scatter(x='popularity', y='revenue_adj')\n",
    "tmdb.plot.scatter(x='vote_count',y='revenue_adj')\n",
    "tmdb.plot.scatter(x='budget_adj',y='revenue_adj')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis:**\n",
    ">- In general, the three variabels(`popularity`, `vote_count` and `budget_adj`) are all positively correlated with `revenue_adj`, but the correlation is not very strong, which is the same conclusion from the heatmap;\n",
    ">- There are many outlier data, some movies with extremely high popularity and high vote_count do not have extremely high revenue. These movies maybe controversial, and popularity and vote_count alone are not good indicator for movie success.\n",
    ">- Also, some extremely high budget movies do not have very high revenue, which means they maybe losing money."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Explore Answers for research questions "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question 1. What are the profitibility trend for movie industry? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1a15f79908>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEDCAYAAADdpATdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAFLZJREFUeJzt3X9sXWd9x/HPJ8aN0yYrJfFI2jQNYxXccWlTsAqFbKqllRY2pWyA1nSCwu6IwsAbo1WSxRKslZI23VQEDsPKlop2gwsqoOJBAuPHheKxVnH6M61XCGSoXqrVoWnSNHHr2N/94RvHcezmXt/T+8Pn/ZKse+45T87zjdR+fPKc5znHESEAQLrMqXUBAIDqI/wBIIUIfwBIIcIfAFKI8AeAFCL8ASCF6j78bd9p+xnbe0poe5HtH9p+1PaPbS+tRo0A0GjqPvwlfUnSNSW2/UdJd0fEJZJukXTrK1UUADSyug//iLhP0rMT99l+ve3v2t5t+6e231g89HuSfljcLki6toqlAkDDqPvwn8Y2SR0R8VZJN0n6p+L+RyS9r7j9J5IW2F5Yg/oAoK69qtYFlMv2fEnvkHSP7RO75xY/b5K01faHJd0n6X8lHa92jQBQ7xou/DX2r5XnImLF5AMRsV/Sn0rjvyTeFxGHqlwfANS9hhv2iYjDkvbZ/oAkecylxe1Ftk/8nf5O0p01KhMA6lrdh7/tvKT/kvQG2wO2c5L+XFLO9iOSHtfJG7tXSnrS9s8lvVbSphqUDAB1zzzSGQDSp+6v/AEAyavbG76LFi2K5cuX17oMAGgou3fvPhARrWdqV7fhv3z5cvX19dW6DABoKLZ/XUo7hn0AIIUIfwBIIcIfAFKI8AeAFCL8ASCFCH+gDPl8XtlsVk1NTcpms8rn87UuCZiRup3qCdSbfD6vzs5Obd++XStXrlRvb69yuZwkafXq1TWuDihP3T7eoa2tLZjnj3qSzWbV1dWl9vb28X2FQkEdHR3as+eMbxkFqsL27ohoO2M7wh8oTVNTk4aGhtTc3Dy+b3h4WC0tLRoZGalhZcBJpYY/Y/5AiTKZjHp7e0/Z19vbq0wmU6OKgJkj/IESdXZ2KpfLqVAoaHh4WIVCQblcTp2dnbUuDSgbN3yBEp24qdvR0aH+/n5lMhlt2rSJm71oSIz5A8Aswpg/AGBahD8ApBDhD5SBFb6YLbjhC5SIFb6YTbjhC5SIFb5oBKzwBRLGCl80Amb7AAljhS9mE8IfKBErfDGbVHzD1/aFku6WtFjSqKRtEfG5SW2ulPQtSfuKu74ZEbdU2jdQTazwxWxS8Zi/7SWSlkTEg7YXSNot6b0R8cSENldKuiki/rjU8zLmDwDlq9qYf0Q8HREPFrefl9Qv6YJKzwsAeOUkOuZve7mkyyQ9MMXhK2w/Ynun7TdN8+fX2O6z3Tc4OJhkaQCACRILf9vzJX1D0icj4vCkww9KuigiLpXUJeneqc4REdsioi0i2lpbW5MqDQAwSSLhb7tZY8H/5Yj45uTjEXE4Io4Ut3dIara9KIm+AQDlqzj8bVvSdkn9EXHHNG0WF9vJ9uXFfn9Tad8AgJlJ4tk+75T0QUmP2X64uG+jpGWSFBHdkt4v6WO2j0s6Jum6qNelxQCQAhWHf0T0SvIZ2myVtLXSvgAAyWCFLwCkEOEPAClE+ANAChH+AJBChD8ApBDhDwApRPgDQAoR/kAZ8vm8stmsmpqalM1mlc/na10SMCNJrPAFUiGfz6uzs1Pbt2/XypUr1dvbq1wuJ0m80AUNhxe4AyXKZrPq6upSe3v7+L5CoaCOjg7t2bOnhpUBJ5X6MhfCHyhRU1OThoaG1NzcPL5veHhYLS0tGhkZqWFlwElVe5MXkBaZTEa9vb2n7Ovt7VUmk6lRRcDMEf5AiTo7O5XL5VQoFDQ8PKxCoaBcLqfOzs5alwaUjRu+QIlO3NTt6OhQ
f3+/MpmMNm3axM1eNCSu/AEghbjyB0rEVE/MJsz2AUrEVE80AqZ6AgljqicaAVM9gYQx1ROzCeEPlIipnphNuOELlIipnphNGPMHgFmkamP+ti+0XbDdb/tx238zRRvb/rztvbYftf2WSvsFAMxcEsM+xyXdGBEP2l4gabft70fEExPavFvSxcWft0n6YvETAFADFV/5R8TTEfFgcft5Sf2SLpjU7FpJd8eY+yW92vaSSvsGAMxMorN9bC+XdJmkByYdukDSUxO+D+j0XxCyvcZ2n+2+wcHBJEsDAEyQWPjbni/pG5I+GRGHJx+e4o+cdqc5IrZFRFtEtLW2tiZVGgBgkkTC33azxoL/yxHxzSmaDEi6cML3pZL2J9E3AKB8Scz2saTtkvoj4o5pmvVI+lBx1s/bJR2KiKcr7RsAMDNJzPZ5p6QPSnrM9sPFfRslLZOkiOiWtEPSeyTtlXRU0kcS6BcAMENJzPbpjQhHxCURsaL4syMiuovBr+Isn49HxOsj4s0RweotNKR8Pq9sNqumpiZls1nl8/lalwTMCI93AErE8/wxm/B4B6BEPM8fjYDn+QMJ43n+aAQ8zx9IGM/zx2xC+AMl4nn+mE244QuUiOf5Yzbhyh8AUogrf6BETPXEbMJsH6BETPVEI2C2D5Cw/v5+DQwMnLLCd2BgQP39/bUuDSgbwz5Aic4//3ytW7dOX/nKV8aHfa6//nqdf/75tS4NKBtX/kAZxh5iO/13oFEQ/kCJ9u/fry1btqijo0MtLS3q6OjQli1btH8/r6ZA42HYByhRJpPR0qVLT7m5WygUWOGLhsSVP1AiVvhiNuHKHygRK3wxmzDPHwBmEeb5AwCmRfgDQAoR/kAZTkzztD0+3RNoRIQ/UKKOjg51d3dr8+bNeuGFF7R582Z1d3fzCwANiRu+QIlaWlq0efNmfepTnxrfd8cdd2jjxo0aGhqqYWXASVW94Wv7TtvP2J7y0Ya2r7R9yPbDxZ9PJ9EvUE0vvvii1q5de8q+tWvX6sUXX6xRRcDMJTXs8yVJ15yhzU8jYkXx55aE+gWqZu7cueru7j5lX3d3t+bOnVujioCZS2SRV0TcZ3t5EucC6tVHP/pRrV+/XtLYFX93d7fWr19/2r8GgEZQzRW+V9h+RNJ+STdFxOOTG9heI2mNJC1btqyKpQFn1tXVJUnauHGjbrzxRs2dO1dr164d3w80ksRu+Bav/L8dEdkpjv2WpNGIOGL7PZI+FxEXv9z5uOELAOWrqxW+EXE4Io4Ut3dIara9qBp9AwBOV5Xwt73Yxbde2L682O9vqtE3AOB0iYz5285LulLSItsDkj4jqVmSIqJb0vslfcz2cUnHJF0X9brAAABSIJEr/4hYHRFLIqI5IpZGxPaI6C4GvyJia0S8KSIujYi3R8TPkugXqLZ8Pn/KC9zz+XytSwJmhOf5AyXK5/Pq7OzU9u3bx1/gnsvlJIln+qPh8HgHoETZbFZdXV1qb28f31coFNTR0XHKqx2BWip1tg/hD5SoqalJQ0NDam5uHt83PDyslpYWjYyM1LAy4KS6muoJzAaZTEY333zzKWP+N998My9wR0Mi/IEStbe369Zbb9WBAwcUETpw4IBuvfXWU4aBgEZB+AMluvfee7VgwQLNmzdPkjRv3jwtWLBA9957b40rA8pH+AMlGhgY0D333KN9+/ZpdHRU+/bt0z333KOBgYFalwaUjfAHyrB169ZTXuO4devWWpcEzAjhD5TonHPOUU9Pj84++2xJ0tlnn62enh6dc845Na4MKB/hD5To2LFjkqSDBw+e8nliP9BICH+gRKOjo7KtxYsXa86cOVq8eLFsa3R0tNalAWUj/IEyrFixQgsXLpQkLVy4UCtWrKhxRcDM8GwfoAwPPfSQzjvvPI2Ojmr//v3jQz9Ao+HKHyjT4cOHT/kEGhHhD5Rp0aJFsq1Fi3gZHRoX4Q+UYdWqVXruuecUEXruuee0atWqWpcEzAjh
D5Ro6dKl2rVrl3bu3KmXXnpJO3fu1K5du7R06dJalwaUjfAHSnT77bfryJEjuvrqq3XWWWfp6quv1pEjR3T77bfXujSgbIQ/AKQQ4Q+UaN26dae9tGVkZETr1q2rUUXAzDHPHyjRiad3NjU1SRpb8Xv06FEdPXq0lmUBM8KVP1CmE1f/vLoRjSyR8Ld9p+1nbE/5FmuP+bztvbYftf2WJPoFamHVqlUaHBxkmicaWlLDPl+StFXS3dMcf7eki4s/b5P0xeIn0FDmzJmjnTt3qrW1Vc3NzZozZw4PdkNDSuTKPyLuk/TsyzS5VtLdMeZ+Sa+2vSSJvoFqGh0d1fz58zVnzhzNnz+f4EfDqtaY/wWSnprwfaC4D2g4Bw8e1OjoKA91Q0OrVvh7in1xWiN7je0+232Dg4NVKAsA0qla4T8g6cIJ35dK2j+5UURsi4i2iGhrbW2tUmlAaebMmfp/l+n2A/WsWv/V9kj6UHHWz9slHYqIp6vUN5CIE+P7J+b5T5zvDzSaRGb72M5LulLSItsDkj4jqVmSIqJb0g5J75G0V9JRSR9Jol+gFpjnj9kgkfCPiNVnOB6SPp5EX0CtzZ8/X0eOHBn/BBoRg5VAmU4EPsGPRkb4A0AKEf4AkEKEPwCkEOEPAClE+ANlOrGoi8VdaGT81wuU6cSiLhZ3oZER/gCQQoQ/AKQQ4Q8AKUT4A0AKEf4AkEKEPwCkEOEPAClE+ANAChH+AJBChD8ApBDhDwApRPgDQAoR/gCQQoQ/AKQQ4Q8AKUT4A0AKJRL+tq+x/aTtvbY3THH8w7YHbT9c/PnLJPoFAMzMqyo9ge0mSV+QdJWkAUm7bPdExBOTmn4tIj5RaX8AgMolceV/uaS9EfGriHhJ0lclXZvAeQEAr5Akwv8CSU9N+D5Q3DfZ+2w/avvrti+c6kS219jus903ODiYQGkAgKkkEf6eYl9M+v7vkpZHxCWSfiDprqlOFBHbIqItItpaW1sTKA0AMJUkwn9A0sQr+aWS9k9sEBG/iYgXi1//WdJbE+gXADBDSYT/LkkX236d7bMkXSepZ2ID20smfF0lqT+BfgEAM1TxbJ+IOG77E5K+J6lJ0p0R8bjtWyT1RUSPpL+2vUrScUnPSvpwpf0CAGbOEZOH5+tDW1tb9PX11boMYJw91e2tMfX6/xHSx/buiGg7UztW+AJAChH+AJBChD8ApBDhDwApRPgDQAoR/gCQQoQ/AKQQ4Q8AKVTxCl+g0b3c4q0kz8FCMNQTwh+pV2oos8IXswnDPgCQQoQ/UKLpru656kcjIvyBMkSEIkIXrf/2+DbQiAh/AEghwh8AUojwB4AUIvwBIIUIfwBIIRZ5Yda59Ob/0KFjw694P8s3fOcVPf+585r1yGfe9Yr2gfQi/DHrHDo2rP+57Y9qXUbFXulfLkg3hn0AIIUIfwBIoUTC3/Y1tp+0vdf2himOz7X9teLxB2wvT6JfAMDMVDzmb7tJ0hckXSVpQNIu2z0R8cSEZjlJByPid21fJ2mLpD+rtG9gKgsyG/Tmu067Bmk4CzKS1Pj3LlCfkrjhe7mkvRHxK0my/VVJ10qaGP7XSvr74vbXJW217eDBKHgFPN9/Gzd8gTNIIvwvkPTUhO8Dkt42XZuIOG77kKSFkg5MbGR7jaQ1krRs2bIESkNazYbgPHdec61LwCyWRPhP9YaLyVf0pbRRRGyTtE2S2tra+FcBZqQaV/3LN3xnVvzrAumVxA3fAUkXTvi+VNL+6drYfpWkcyU9m0DfAIAZSCL8d0m62PbrbJ8l6TpJPZPa9Ei6obj9fkk/YrwfAGqn4mGf4hj+JyR9T1KTpDsj4nHbt0jqi4geSdsl/avtvRq74r+u0n4BADOXyOMdImKHpB2T9n16wvaQpA8k0RcAoHKs8AWAFCL8ASCFCH8ASCHCHwBSiOf5I/XsqdYglvDntpTXntnNqCeEP1KPUEYaMewDAClE+ANAChH+AJBChD8ApBDhDwAp
RPgDQAoR/gCQQoQ/AKQQ4Q8AKUT4A0AKEf4AkEKEP1CGfD6vbDarpqYmZbNZ5fP5WpcEzAgPdgNKlM/n1dnZqe3bt2vlypXq7e1VLpeTJK1evbrG1QHlcb0+0bCtrS36+vpqXQYwLpvNqqurS+3t7eP7CoWCOjo6tGfPnhpWBpxke3dEtJ2xHeEPlKapqUlDQ0Nqbm4e3zc8PKyWlhaNjIzUsDLgpFLDnzF/oESZTEa9vb2n7Ovt7VUmk6lRRcDMVRT+tl9j+/u2f1H8PG+adiO2Hy7+9FTSJ1ArnZ2dyuVyKhQKGh4eVqFQUC6XU2dnZ61LA8pW6Q3fDZJ+GBG32d5Q/L5+inbHImJFhX0BNXXipm5HR4f6+/uVyWS0adMmbvaiIVU05m/7SUlXRsTTtpdI+nFEvGGKdkciYn4552bMHwDKV60x/9dGxNOSVPz87Wnatdjus32/7fdOdzLba4rt+gYHByssDQAwnTMO+9j+gaTFUxwqZ6BzWUTst/07kn5k+7GI+OXkRhGxTdI2aezKv4zzAwDKcMbwj4g/nO6Y7f+zvWTCsM8z05xjf/HzV7Z/LOkySaeFPwCgOiod9umRdENx+wZJ35rcwPZ5tucWtxdJeqekJyrsFwBQgUrD/zZJV9n+haSrit9lu832vxTbZCT12X5EUkHSbRFB+ANADdXtCl/bg5J+Xes6gGksknSg1kUAU7goIlrP1Khuwx+oZ7b7SplOB9QrHu8AAClE+ANAChH+wMxsq3UBQCUY8weAFOLKHwBSiPAHgBQi/IEK2X5j8V0VD9l+ve2fFfcvt319resDpkL4AyWw3fQyh98r6VsRcVlE/DIi3lHcv1wS4Y+6xA1fpJ7t5ZK+K+kBjT108OeSPqSxZ1DdKeldkrZK+m9J3ZLO1tiDCf9C0hXFNiOSfh4R7SfeX2H7fo093mSfpLsi4rNV/GsBL4srf2DMGyRti4hLJB2W9FfF/UMRsTIivirpbknri20ek/SZiNihsV8In42I9knn3CDppxGxguBHvSH8gTFPRcR/Frf/TdLK4vbXJMn2uZJeHRE/Ke6/S9IfVLdEIDmEPzBm8vjnie8vVLsQoBoIf2DMMttXFLdXS+qdeDAiDkk6aPv3i7s+KOknennPS1qQaJVAQgh/YEy/pBtsPyrpNZK+OEWbGyT9Q7HNCkm3nOGcj0o6bvsR23+baLVAhZjtg9Qrzvb5dkRka1wKUDVc+QNACnHlDwApxJU/AKQQ4Q8AKUT4A0AKEf4AkEKEPwCk0P8DMeGPWF3nii4AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Create a profit column: inflation-adjusted revenue minus inflation-adjusted budget\n",
    "tmdb['profit'] = tmdb['revenue_adj'] - tmdb['budget_adj']\n",
    "# Draw a box plot of the profit column\n",
    "tmdb['profit'].plot.box()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis:**\n",
    ">- Some movies lose money, while others earn very large profits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>popularity</th>\n",
       "      <th>runtime</th>\n",
       "      <th>vote_count</th>\n",
       "      <th>vote_average</th>\n",
       "      <th>budget_adj</th>\n",
       "      <th>revenue_adj</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>release_year</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1960</th>\n",
       "      <td>1.324513</td>\n",
       "      <td>130.0</td>\n",
       "      <td>372.6</td>\n",
       "      <td>7.40</td>\n",
       "      <td>3.068179e+07</td>\n",
       "      <td>1.902299e+08</td>\n",
       "      <td>1.595481e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1961</th>\n",
       "      <td>0.787718</td>\n",
       "      <td>132.5</td>\n",
       "      <td>191.4</td>\n",
       "      <td>6.62</td>\n",
       "      <td>2.818516e+07</td>\n",
       "      <td>2.463622e+08</td>\n",
       "      <td>2.181770e+08</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              popularity  runtime  vote_count  vote_average    budget_adj  \\\n",
       "release_year                                                                \n",
       "1960            1.324513    130.0       372.6          7.40  3.068179e+07   \n",
       "1961            0.787718    132.5       191.4          6.62  2.818516e+07   \n",
       "\n",
       "               revenue_adj        profit  \n",
       "release_year                              \n",
       "1960          1.902299e+08  1.595481e+08  \n",
       "1961          2.463622e+08  2.181770e+08  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Group the data by release year and compute the mean of each numeric column\n",
    "year_mean = tmdb.groupby('release_year').mean(numeric_only=True)\n",
    "year_mean.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.legend.Legend at 0x1a15cba828>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot a line chart of the average profit for each release year\n",
    "plt.plot(year_mean.index, year_mean['profit'], label='Profit')\n",
    "plt.xlabel('Release year')\n",
    "plt.ylabel('Average profit (2010 dollars)')\n",
    "plt.title('Average profit per movie by release year')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis:**\n",
    ">- Surprisingly, the average profit per movie has been declining since about 1980; making a movie is less profitable now than it was three decades ago.\n",
    ">- From 1960 to 1980, the film industry earned higher average profits, but with large year-to-year fluctuations. Since about 1980, profits have been lower but more stable. One possible explanation is that the industry was still relatively young in the earlier years, so higher profits came with higher risk."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question 2. Are newer movies more popular?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.legend.Legend at 0x1a15fd7d30>"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY8AAAEWCAYAAACe8xtsAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzs3Xd8W/W5+PHPY8l7zzh2EjtOgrM3IexA2KWXUeBCe8sot7SFS+H2Vwr0dg8u5ba3lPZ20AKl7Ja2QFktZY8A2QlZJHGWYzvee8r6/v44R7ZsS5bkJdl+3q+XXpGOjs75SnLOo+96vmKMQSmllApFVLgLoJRSavzR4KGUUipkGjyUUkqFTIOHUkqpkGnwUEopFTINHkoppUKmwUOFjYicKiJ7wl0OpVToNHgov0TkoIh0ikhWv+1bRMSISOFwjm+MedsYUzycY4RKRNaISOlYnnOkiEih/bk7w10WpTR4qEAOAFd5HojIIiA+fMVR4SaWsFw7NHBGDg0eKpBHgKu9Hl8D/MF7BxFJFZE/iEiViBwSkW+ISJSIxIpIvYgs9No3W0TaRCSnfy1ARPJE5M/2cQ6IyJd9FUhEVotIhYg4vLZdIiLb7PuxInKviJTZt3vtbYnAS0CeiDTbtzy7rHeIyH4RqRGRP4pIRqAPRkROEZH37Pd4RESuHezzsJ/7jog86nWMPrUJEXlDRL4vIu+KSJOI/MOr5veW/W+9XfYTfZTJ53u3n9slIhd67esUkWoRWe71uXrez1YRWeO17xsi8kMReRdoBYr6nfc2Eflzv20/F5F7vT6TB0SkXESOisgPPN+fiMwSkdfsz75aRB4TkTSv4xwUkdvt77fFLvft9nGaRGSPiKwN9H2pEWaM0ZvefN6Ag8BZwB5gHuAAjgAFgAEK7f3+ADwLJAOFwMfA9fZzDwI/9DrmTcDL9v01QKl9PwrYCHwLiMG6OJUA5/op237gbK/HfwLusO9/D3gfyAGygfeA7/c/p9drb7X3nwbEAr8Bngjw2cwAmrBqZdFAJrA0iM/jO8CjXscptD9Lp/34Dfu9HYdVw3sDuNvXvn7KNdh7/xbwmNe+nwB22/fzgRrgAvu7ONt+nO1VrsPAAsAJRPc771SgBUizHzuBSmCF/fgZ+3NNtMv2IfAF+7nZ9vli7TK/Bdzb7+9wCzDd/kyKsf4O87w+l1nh/v8y2W5hL4DeIvdGb/D4BvDfwHnAK/aFwdj/aR1ABzDf63VfAN6w758FlHg99y5wtX1/Db3B4wTgcL/z3wk85KdsPwAetO8n2xeuAvvxfuACr33PBQ72P6fX87uAtV6PpwJdAS7SdwJ/9bE90OfxHQIHj294PX8jvcG2z75+yjXYe5+NFfAS7MePAd+y798OPNLvWH8HrvEq1/cC/L28BHzevn8hsNO+P8X+TOK99r0KeN3PcS4GNvf7O/yc1+PZWIHpLPoFMb2N3U3bD1UwHsH6NTiTfk1WQBZWTeGQ17ZDWL9kAV4D4kXkBKACWAr81cc5CrCak+q9tjmAt/2U6XHgPRH5EnApsMkY4ylDno/y5Pl9d9a5/yoibq9t3VgXvaN+XjMd60LdX6DPIxgVXvdbgaQQXuv3vRtj9onILuCTIvI34F+AZfZ+BcDlIvJJr9dGA697PT4S4NwPA18Cfgv8G9bfjefY0UC5iHj2jfIcT0RygPuAU7F+CEQBdf2O3XNu+33cihWIF4jI34GvGGPKApRPjSDt81AB2RflA1hNGn/p93Q11q/0Aq9tM7AvusYYN/BHrF+anwaeN8Y0+TjNEeCAMSbN65ZsjLnAT5l2Yl0Yz7eP+7jX02U+yuO5sPhKI30EOL/fueOMMf4Ch+c1s3xsH/TzwKohJXg9lzvIOfoLJgX2YO8d4Ams7+IirJrBPnv7Eayah/dnkGiMuTuE8z8DLLb7uC7Eqtl4jt0BZHkdO8UYs8B+/r/t
Yy82xqRgBR7pd+w+5zbGPG6MOYXeJtQfBSibGmEaPFSwrgfONMa0eG80xnRjBYcfikiyiBQAXwEe9drtceBfgc/Q9yLv7UOg0e4IjRcRh4gsFJHjBynT48CXgdOw+jw8ngC+YXfOZ2G19XvKcwzIFJFUr/1/bZe/AHo69S8a5LxgXRjPEpEr7A7cTBFZGsTnsQU4TURm2GW4M8B5vFUBbvp1Vvcz2HsHeBI4B6uG4P1dPIpVIznX/uzjxBrQMC3Ywhlj2oGn7eN+aIw5bG8vB/4B/EREUuwBCrNE5HT7pclAM9ZAgHzgtsHOIyLFInKmPRCgHWjDqimqMaTBQwXFGLPfGLPBz9M3Y/2iLgHewbp4POj12g/s5/Ow2sV9Hb8b+CRWs9YBrF/wvwNSfe1vewKrD+M1Y0y11/YfABuAbcB2YJO9DWPMbvt1JfaoojzgZ8BzwD9EpAmrw/mEQc6LfWG8APh/QC1WUFgS6PMwxrwCPGWXbSPw/GDn6XfOVuCHwLt22Vf72M3ve7ePUQ6sA06yy+HZfgSrNvJ1rCB1BOsiHuo14mFgEb1NVh5XYzXn7cRqknoaq28J4LvAcqABeIGBtdv+YoG7sf5GKrA64L8eYjnVMIkxuhiUUmpkiMgMYDeQa4xpDHd51OjRmodSakTYc1m+AjypgWPi09FWSvkhIp/BmpvQ3yGvzl4FiDUB8xjWIIbzwlwcNQa02UoppVTItNlKKaVUyCZss1VWVpYpLCwMdzGUUmpc2bhxY7UxJjvQfhM2eBQWFrJhg7+RpUoppXwRkUOB99JmK6WUUkOgwUMppVTINHgopZQK2YTt8/Clq6uL0tJS2tvbw12UCS0uLo5p06YRHR0d7qIopUbJpAoepaWlJCcnU1hYiFdqaDWCjDHU1NRQWlrKzJkzw10cpdQomVTNVu3t7WRmZmrgGEUiQmZmptbulJrgJlXwADRwjAH9jJWa+CZd8FBKqfGuvaubP64/wpHa1rCVQYNHhDp48CCPP+5v3aTwu+uuu8JdBKUmHbfb8OyWo6z9yZt87c/b+Ppft4etLBo8IpQGD6WUtw8P1HLJL9/llie3kJYQzaeWT+PtvdVsL20IS3k0eIyh22+/nV/+8pc9j7/zne/wk5/8hNtuu42FCxeyaNEinnrKWtztjjvu4O2332bp0qX89Kc/pbu7m9tuu43jjz+exYsX85vf+MoU3uuee+5h0aJFLFmyhDvuuAOALVu2sHr1ahYvXswll1xCXV0dAGvWrOlJ5VJdXY0nJ9jvf/97Lr30Us477zzmzJnD1772tZ6ytbW1sXTpUj7zmc+M6Gek1GTldhuufvBDLvz523z2gQ+45cnNfOe5Hdz36l6+8MgGrvjNOo41dvCTy5fwt/84he/8y3yS45z8+s39YSnvqA7VFZEHgQuBSmPMQntbBtbyl4XAQeAKY0ydWL2sP8Na2rMVuNYYs8l+zTXAN+zD/sAY8/Bwy/bdv+1gZ9nIrlczPy+Fb3/S/zIPV155Jbfeeis33ngjAH/84x+5/fbbefnll9m6dSvV1dUcf/zxnHbaadx99938+Mc/5vnnrVVK77//flJTU1m/fj0dHR2cfPLJnHPOOT6Hw7700ks888wzfPDBByQkJFBbWwvA1Vdfzc9//nNOP/10vvWtb/Hd736Xe++9d9D3tGXLFjZv3kxsbCzFxcXcfPPN3H333fziF79gy5YtQ/2olFL91LV28tbHVczJSaKp3cWhmlbqWjpp6nCRGOPg/519HP9+ahHxMQ4AkuOi+ezqAn715n4OVLcwMytxTMs72vM8fg/8AviD17Y7gFeNMXeLyB3249uB84E59u0E4FfACXaw+TawEjDARhF5zhhTN8plH3HLli2jsrKSsrIyqqqqSE9PZ8uWLVx11VU4HA6mTJnC6aefzvr160lJSenz2n/84x9s27aNp59+GoCGhgb27t3rM3j885//5LrrriMhIQGAjIwMGhoaqK+v5/TT
Twfgmmuu4fLLLw9Y5rVr15Kaai0jPn/+fA4dOsT06dOH9TkopQaqb+sC4KYzZnPxsvye7Z0uNwZDrNMx4DXXnTyT371zgPvf2s9/X7p4zMoKoxw8jDFviUhhv80XAWvs+w8Db2AFj4uAPxhrdar3RSRNRKba+75ijKkFEJFXsFYqe2I4ZRushjCaLrvsMp5++mkqKiq48sor2b8/uCqnMYaf//znnHvuuUHtG8pwWafTidvtBhgwPyM2NrbnvsPhwOVyBX1cpVTwGuzgkRrfNzNDjNN/70J2cixXrJzGH9eXcutZxzElJW5Uy+gtHH0eU4wx5QD2vzn29nzgiNd+pfY2f9sHEJEbRGSDiGyoqqoa8YKPhCuvvJInn3ySp59+mssuu4zTTjuNp556iu7ubqqqqnjrrbdYtWoVycnJNDU19bzu3HPP5Ve/+hVdXdYf2Mcff0xLS4vPc5xzzjk8+OCDtLZaw/hqa2tJTU0lPT2dt99+G4BHHnmkpxZSWFjIxo0bAXpqNoFER0f3lEUpNXw9wSMhtLQ+N5w6C5fbzYPvHBiNYvkVSelJfP1UNoNsH7jRmPuB+wFWrlwZkevrLliwgKamJvLz85k6dSqXXHIJ69atY8mSJYgI99xzD7m5uWRmZuJ0OlmyZAnXXnstt9xyCwcPHmT58uUYY8jOzuaZZ57xeY7zzjuPLVu2sHLlSmJiYrjgggu46667ePjhh/niF79Ia2srRUVFPPTQQwB89atf5YorruCRRx7hzDPPDOp93HDDDSxevJjly5fz2GOPjdjno9Rk1dDqu+YRyIzMBC5cnMej7x/ixjWzQw4+QzXqa5jbzVbPe3WY7wHWGGPK7WapN4wxxSLyG/v+E977eW7GmC/Y2/vs58/KlStN/8Wgdu3axbx580bw3Sl/9LNWKjQPv3eQbz+3gw3fOIuspNjAL/Cys6yRC+57m6+ecxz/ceacYZVDRDYaY1YG2i8czVbPAdfY968BnvXafrVYVgMNdrPW34FzRCRdRNKBc+xtSik1Yfjr8wjG/LwU1hRn89C7B2nr7B7povk0qsFDRJ4A1gHFIlIqItcDdwNni8he4Gz7McCLQAmwD/gtcCOA3VH+fWC9ffuep/N8stu+fTtLly7tczvhhBPCXSyl1BA0tHWRGOMg2jG0y/KNa2ZT09LJnzYeCbzzCBjt0VZX+XlqrY99DXCTn+M8CDw4gkWbEBYtWqRzLZSaIOpbu4ZU6/A4vjCdFQXp/ObNEq5aNWPIQShYk26G+Wj38Sj9jJUaioa2LlITYob8ehHhS6fPIis5lqqmjhEsmW+RNNpq1MXFxVFTU6Nreowiz2JQcXFjN95cqYmgsa2L1PjhXZLXzsth7bycMbm+TargMW3aNEpLS4nUOSAThWcZWqVU8OrbOoedYmQsfxRPquARHR2tS6MqpSJSQ1sXafFDb7Yaa5Ouz0MppSKR1ecxNhP8RoIGD6WUCrP2rm7au9zDGm011jR4KKVUmDXaEwRTNHgopZQKlmd2eZoGD6WUUsGqH0ZqknDR4KGUUmE21Iy64aTBQymlwqyn2UpHWymllAqWNlsppZQKmafmkRynwUMppVSQGtu6SIlz4ogaPzn3NHgopVSY1bd2jqvZ5aDBQymlwq6hbXhreYSDBg+llAqz8ZYUETR4KKVU2GnNQymlVMga2rrGVV4r0OChlFJhZYyxmq20w1wppVSw2rq66eo22myllFIqePXjMK8VaPBQSqmwGo/p2EGDh1JKhVXDOMxrBRo8lFIqrDzNVjraSimlVNAax2E6dtDgoZRSYaXNVkoppUJW39aJI0pIinWGuygh0eChlFJh1GCnYxcZP+nYQYOHUkqFVUObi7SE8ZUUETR4KKVUWNW3do67kVYQxuAhIv8pIjtE5CMReUJE4kRkpoh8ICJ7ReQpEYmx9421H++zny8MV7mVUmokNY7DjLoQpuAhIvnAl4GVxpiFgAO4EvgR8FNjzBygDrje
fsn1QJ0xZjbwU3s/pZQa96y1PDR4hMIJxIuIE0gAyoEzgaft5x8GLrbvX2Q/xn5+rYy33iWllPKhXmsewTPGHAV+DBzGChoNwEag3hjjsncrBfLt+/nAEfu1Lnv/zP7HFZEbRGSDiGyoqqoa3TehlFLD5HYbbbYKhYikY9UmZgJ5QCJwvo9djeclgzzXu8GY+40xK40xK7Ozs0equEopNSqaO124zfibXQ7ha7Y6CzhgjKkyxnQBfwFOAtLsZiyAaUCZfb8UmA5gP58K1I5tkZVSamQ1jNO8VhC+4HEYWC0iCXbfxVpgJ/A6cJm9zzXAs/b95+zH2M+/ZowZUPNQSqnxZLymJoHw9Xl8gNXxvQnYbpfjfuB24Csisg+rT+MB+yUPAJn29q8Ad4x5oZVSaoSN17U8wBrxFBbGmG8D3+63uQRY5WPfduDysSiXUkqNlZ6ah/Z5KKWUCtZ4XYIWNHgopVTY9DZbaW4rpZRSQWpo6yLGEUVc9Pi7FI+/Eiul1ATR0GYlRRyPCTM0eCilVJg0tHWNywmCoMFDKaXCpmGcpiYBDR5KKRU29a0aPJRSSoVIax5KKaVCpsFDKaVUSLrdhqZ2lwYPpZRSwWscx0kRQYOHUkqFRc/sch2qq5RSKljjOR07aPBQSqmwqNfgoZRSKlTabKWUUipknuAxHpegBQ0eSikVFg2tnYA2WymllApBQ1sX8dEOYp2OcBdlSDR4KKVUGIzn2eWgwUMppcJiPCdFBA0eSikVFg1tXaSO05FWoMFDKaXCYsI3W4nII8FsU0opFbwJHzyABd4PRMQBrBid4iil1MRijPG5vaGti7SJGDxE5E4RaQIWi0ijfWsCKoFnx6yESik1TlU0tLP2J2/yzOajfbZ3uty0dnZPzJqHMea/jTHJwP8YY1LsW7IxJtMYc+cYllEppcalX7+5n5LqFv7rr9s5XNPas70nKeJE7jA3xtwpIvkicpKInOa5jUXhlFJqvDrW2M7jHx5m7dwcokT4yh+30O22mrDGe0ZdAGegHUTkbuBKYCfQbW82wFujWC6llBrXfv3mfrrdhm9/cgEbDtXylT9u5f63SvjSmlmTI3gAlwDFxpiO0S6MUkpNBJWN7Tz+wWEuWZbPjMwEpmfE889dx/jfV/Zw2nFZNLSN77xWENxoqxJg/L5DpZQaY795qwSX2/AfZ8wGQET44cWLSEuI4StPbaWqyfotPp6DRzA1j1Zgi4i8CvTUPowxXx61Uiml1DhV1dTBYx8c4qKleRRmJfZsT0+M4Z7LFnPdQ+v56St7AUhLiAlXMYctmJrHc8D3gfeAjV63YRGRNBF5WkR2i8guETlRRDJE5BUR2Wv/m27vKyJyn4jsE5FtIrJ8uOdXSqnR8Nu3S+h0ubn5zDkDnjujOId/Wz2DisZ2AFLigvn9HpkCltwY8/AonftnwMvGmMtEJAZIAL4OvGqMuVtE7gDuAG4Hzgfm2LcTgF/Z/yqlVMSobu7gkXWHuGhpPjO9ah3evn7BPN7dV0NNcwdOx/jNEBXMaKsDWKOr+jDGFA31pCKSApwGXGsfqxPoFJGLgDX2bg8Db2AFj4uAPxhrqub7dq1lqjGmfKhlUEqpkfbbt0vocHXzH2fO9rtPQoyT3193PAe95n2MR8HUmVZ63Y8DLgcyhnneIqAKeEhElmA1g90CTPEEBGNMuYjk2PvnA0e8Xl9qb9PgoZSKCLUtnTyy7hCfXJLHrOykQfctyEykINN3zWS8CGaSYI3X7agx5l7gzGGe1wksB35ljFkGtGA1Ufkjvoo2YCeRG0Rkg4hsqKqqGmYRlVIqeA+8U0JbVzc3D1LrmEiCabby7pyOwqqJJA/zvKVAqTHmA/vx01jB45inOUpEpmLl0fLsP93r9dOAsv4HNcbcD9wPsHLlSt/ZyFRQjDG4DTiifMVtpZS39q5uHv/gMOfMn8LsnOFeHseHYHprfuJ1+2+sjLpXDOekxpgK4IiIFNub1mLNYH8O
uMbedg29CRifA662R12tBhq0v2N0vbGniqXf/UfPTFillH8vbCunrrWLq08sDHdRxkwwo63OGKVz3ww8Zo+0KgGuwwpmfxSR64HDWP0rAC8CFwD7sOadXDdKZVK2neWNNHW4KK1rJTU+NdzFUSqiPfL+IYqyEzlpVma4izJmgmm2SgW+jTU6CuBN4HvGmIbhnNgYs4W+nfEea33sa4CbhnM+FRrPDFjPv0op37aXNrDlSD3f/uR8RCZPM28wzVYPAk1YTVVXAI3AQ6NZKBV+1c3DDx5PfniYF7Zp66Ka2B55/yDx0Q4uXT4t3EUZU8EM1Z1ljPmU1+PvisiW0SqQigw9waN56MHjV2/uJzcljk8snjpSxVIqojS0dvHsljIuXT5tXOepGopgah5tInKK54GInAy0jV6RVCSobrayflY2Di14uN2G8vr2YQUfpSLdnzYeocPl5rOrC8JdlDEXTM3jS8DDdt8HQB32zHA1cQ235lHd0kFnt5tq7TNRE5TbbXj0/UOsLEhnfl5KuIsz5oIZbbUFWGKnFMEY0zjqpVJh1dXtpr7VGqI71D6Psnor8Vtju4sOVzexTseIlU+pSPDOvmoO1rTyn2cfF+6ihEXAZisRuUtE0owxjcaYRhFJF5EfjEXhVHjU2E1WwJBrDkfrels2vY+n1ETxyPuHyEyM4byFueEuSlgE0+dxvjGm3vPAGFOHNedCTVCeJqv8tPhh1Dx6g0e19nuoCeZofRuv7jrGlaumT9padTDBwyEisZ4HIhIPxA6yvxrnPP0c86Ym09Thoq2zO8ArBjrqFTx0roiaaB7/4BAAV62aEeaShE8wweNR4FURuV5EPge8gpUuXU1QnqaqeVOtTsChXPzL6ttIjrW61LTmEbme31bGL17bG+5ijCsdrm6e/PAIZ86dwrT0hHAXJ2yCyap7D/ADYB6wAPi+vU1NUJ5huj3Bo7k95GMcrW9jYX5qn+OpyPOnDaX83+v76XZrHtFgPb+1nJqWTq4+cfINz/UW1BqIxpiXgZd9PSci64wxJ45oqVRYVTd3EB/toCDT+lU11JrHkkVpfHS0QZutIlhFQzttXd0crGkJuAaFsrJNP/DOAebkJHHqnKxwFyesRmINxLgROIaKINXNHWQlx5CdbHVthXrxb+10UdfaRX5aPNnJsTpRMIKVN1h9UzvLdAR+MN4vqWVneSPXnzJzUuWx8mUkgofWdyeY6uYOspJiyUyMJUpCDx6eOR75afFkJcXqRMEI1dLhorHdBVhZlMezf+48xovby0e9+e2Bdw6QkRjDxcvyR/U840FQzVZqcqlu6mRGZgKOKCEzKfSag2ekVV5aPFnJMeypaBqNYqphqmjs7csazzWP6uYObnx8E50uN0XZidy0ZjYXLc3D6RiJ38a9Dla38OruY9x8xmzioifn8FxvI/HpTu662wTkqXkAZCfFhpzfyjPHIz/drnloh3lEKrdriNPS48d1zeORdYfodLn51oXziXFE8f/+tJUzfvIGT3x4mA5X6MPM/Xno3QNER0Xxb5O8o9wjqOAhIgUicpZ9P15EvNdZ/OyolEyFhavbTW1rJ9lJMQBD6rMoq28jSmBKcixZSbE0tHWN6H9iNTI8/R1r5+ZQ1dRBZVPoo+rCrb2rm0feP8TauTl87pSZvPjlU/nt1SvJSIjhzr9s56z/fZOaEehza2jr4k8bS/nkkjxykrWbF4JLT/J5rDXGf2NvmgY843neGPPR6BRNhUNtayfGQJbdWZ6dHBtyn8fR+jZyU+JwOqJ6Ot01RUnkqWiwgsWauTkA7Coff82Lf9l0lNqWTv791CIAoqKEs+dP4ZmbTuaXn1nOkdo2/rHz2LDP8+SHh2nt7Ob6U2YO+1gTRTA1j5uAk7EWgcIYsxfIGc1CqfCpbrIu8j3NVsmxVDd34A6hI/JoXRt5afF9jqMTBSNPeWM7mYkxLJueBoy/fg+32/C7d0pYmJ/C6qKMPs+JCOcvzCU/LZ7X
d1cO6zxd3W4efu8gJxZlTsrsuf4EEzw6jDE9PxtFxImOsJqwPBd5z0U/JzmWrm5DQ1tX0Mcoa2gjP90TPGL6HFdFjoqGdnJT40hLiCE/LfL6PR59/xDv7av2+/zreyopqWrh86cW+Rw2KyKsKc7m3X3Vw2o2femjCsoa2rXW0U8wweNNEfk6EC8iZwN/Av42usVS4dIbPHr7PAAqg2y66nYbKhraB9Y8mrTZKtKUN7QzNdVqv583NYWdZQ1hLlGvysZ2vvnsR1z3+/VsPlznc5/73yohLzWOCxb5X6nyzLk5tHR2s/6A72ME4pkUODMrkTPnaoOLt2CCxx1AFbAd+ALwojHmv0a1VCpseoJHcu9oKwh+rkd1cwdd3aYnePRMNNSaR8Qpb2hjaqr1Pc3PS6GkuoXWTleYS2X5+44KjIGU+Gj+/eENHKlt7fP89tIGPjhQy3UnzyR6kCG5J87KJMYZxet7htZ0telwHVuP1HPdyYVERenAUm/BBI+bjTG/NcZcboy5zBjzWxG5ZdRLpsKiurmTGGdUT1LD3ot/cCNxSu11PPLTrF+0cdEOkmOdmqIkwrR1dlPf2kWuXfNYkJeCMUTMnJwXtpczOyeJJ29YjcttuPahD2lo7W06/e3bJSTFOvnXVdMHPU5CjJPVRZlDDh4PvXuQ1PhoLlsxbUivn8iCCR7X+Nh27QiXQ0WI6qYOspNie9qQQ01R0jPHI60322iW3emuIodngqCn2Wq+nQQzEvo9qpo6+PBALRcszGVWdhK/+ewKDte28sVHN9LpcnO0vo0Xtpdz5fHTSYmLDni8M4qzKalq4VBNS0jlaGrv4pWdx7h4aR4JMTqfuj+/wUNErhKRvwEzReQ5r9sbQM2YlVCNqarmjp7+DoCkWCdx0VEhB4+8tN6x8FlJMRo8Ioxnjoen5jEtPZ7kOGdEjLj6x84K3AbOt/syVhdlcs9li1lXUsOdf9nO7989AMB1QXZgn1Fs9VW8sacqpHL8fccxOlxuLtJUJD4NFk7fA8qBLOAnXtubgG2jWSgVPtXNneSl9l74RYSc5LiQgkdynJNkr1+EWUmx7K1sHrEyHq1vo761kwV5qSN2zMnGM8fD0+chIsyfmhIRNY+XtlcwMyuRubm1nODAAAAgAElEQVS9c5EvWTaNQzWt3PvPvYjAJxfnkW/3qwVSmJVIUVYir+2u5JqTCoMux7NbjjIjI6FnKLPqy2/NwxhzyBjzhp1ufTeQbN9KjTGR0aumRpx3ahKP7OTYoEdbHa1vG/CfeigTDQdz1wu7+NSv3uPjY5HRPj8eldvBIzel94fC/LwUdpc3hXVtj9qWTtaV1HD+wtwBw29vWTuHS5fnI8Dn7UmBwVpTnMO6kpqgV8WsbGrn3X3VXLQ0b9Jnz/UnmBnmlwMfApcDVwAfiMhlo10wNfbcbkNtSydZyTF9tmcnBX/xP1rfPiB4eFKUdLrcI1LOQ7UttHe5+Y/HNw1piVxlNVulJ0QTH9Ob4G/+1JSetT3C5ZWdFXS7jc/htyLCjy9bwpu3ncGiaaHVOs+Ym02ny826Ev/zRrw9v7Uct4GLluaFdJ7JJJgO828AxxtjrjHGXA2sAr45usVS4VDX2km32/iseQQ71Lasvnd2uYfneDUtI1P7KKtvZ25uMnsrm/ne8ztG5JiTjTVBsO/35Jk9Hc5+jxe3VzAjI4EFfmZyR0UJ0zNCX/p11cwM4qMdvL47uH6PZ7eWsSAvhdk5yYF3nqSCCR5RxhjvcW41Qb5OjTOe7Le+gkd9a+Dkhs0dLhraunwED3uW+QhMFGzr7Ka2pZNPLsnjS6fP4okPj/Dc1rJhH3ey8Z4g6DEnJ5loh4St36OhtYt391Vz/qKBTVbDFet0cPLsLF7bXYkxgzfLHaxuYeuReq11BBBMEHhJRP4uIteKyLXAC8CLo1ssFQ79U5N4BJvc0NdIK+idcDgSI67KGnrP8ZWzj2NFQTpf/8t2DlaHr6llPPKkJvEW
44xidk5y2Goer+w6hsttuGCh/xnjw3Hm3ByO1rexL8DgjWe3lFmd8ks0eAwmmOBRATwKLAIWA/cbY24f1VKpsPBc3LP79XnkBDnXw7MI1LT0fh3mIc5SH/Qc9iTEvNR4nI4o7rtqGY4o4eYnNmva9yC1d3VT09LJ1JSBqcXDOeLqpe3l5KfFszjE/oxgrSnOBhh0wqAxhme3HuWEmRk9I9GUb8EEj2SsFCWrgP1YQ3jVBOS5uPureQQacVXmtYKgr9ePRIoS74WmwFrq9p7LFrP9aAM/emnPsI8/GXgW9+pf8wCr3yPUtT3+vqMiqDUzmjtcfOK+t7njz9sGDHRobO/i7b3VPkdZjZS8tHjm5iYP2u/x0dFGSqpauHipzu0IJGDwMMZ81xizACs1ex5WosR/DvfEIuIQkc0i8rz9eKaIfCAie0XkKRGJsbfH2o/32c8XDvfcyrfq5k6iHUJqfN9Zu8HOMi+rb8MZJQMWy4mLdpAU6xyZZivPQlNev5rPXZDLtScV8uC7B/jwQO2wzzHR9Tb9Dfxl7ZlpHuzaHpVN7XzhkY3c+tSWgH0J977yMTvKGnlqwxEu/r93+zQfvbarks5ud8/EwNGypjiH9QdraWz3nSX62S1HiXFEcf4oNZ1NJKF0fFdiNWHVMDLredwC7PJ6/CPgp8aYOUAdcL29/XqgzhgzG/ipvZ8aBdXNHWQmxg745ZeZGGSzVV0bualxOHwkkLNmmQ+/w7y0vo0pKXEDkuF99dxiANYf1OARiGeCoM+ax9TQRlzttoPM23ureXpjqd/9dpU38tB7B/n0CTN4+LpVVDV38C+/eIdnNh8F4MXt5eSmxI36hLwzirNxuQ3v7h04ZLfbbXhuaxlrirNJTQic9mSyC2aex5fslCSvYs02/7wxZvFwTioi04BPAL+zHwtwJtaKhQAPAxfb9y+yH2M/v1Z01s6oqG7uGDDHA6yO1PSE6IDJEcvq233+mgXPRMHhL3PqaygwWGlUspJiQ85fNBn5miDokZoQHdLaHrsrrP0W5qfw/ed3+mzucrsN33jmI1Ljo/naucWcdlw2L375VBbmpXLrU1v42tNbeePjKs5bmDvqmWtXFKSTHOfkwXcPDJhk+kFJDZVNHVykTVZBCabmUQDcaoxZYIz5tjFm5wic917ga4Bn1lgmUO81c70U8HyD+cARAPv5Bnv/AUTkBhHZICIbqqpCy2OjfM8u9whmlriv2eUeWUmxI1LzKPMxCdGjMDOBQzWtPp9TvSoa2kiJc5IY6zs70fy84Nf22F3RxJSUWO67chntLjfffnbgvJs/bTzCxkN13Hn+XNISrB8nualxPP75E/jSmln8cUMpnS73oOtyjBSnI4r/POs4tpU2cM5P3+KzD3zA63sqcbsNz2w5SlKsk7XzdN2OYATT53GHMWbLSJ1QRC4EKo0xG703+zp1EM/13WjM/caYlcaYldnZ2cMs6eRT3dTpN3gEym/V7TZUNLYPGKbrYQWP4fV5uN2G8gbfNQ+AGRo8gmLN8fA/kmj+1ODX9thT0URxbgpF2UncetYcXvqogpe2l/c8X9vSyX+/tJtVhRkD0po7HVHcft5cHrrueL54+ixWFKQP/U2F4HOnzGTdnWu57dxi9lQ0cd1D6zn7p2/y4vYKzl2QS1y0I/BBVFgm+50M/IuIHASexGquuhdIs5e4BZgGeGZ+lQLToWcJ3FRAG7ZHmDGGmpbBax6DjbY61thOt9v0ScXuLSvJmmjY1T30FCVV9kJT+X4CVEFGIhWN7bR36ZDdwVQ0Dpzj4W1+kGt7uLrd7K1s7klg+PlTi1iQl8I3n93Rs/bGPS/vprndxfcvXuh3FNUZxTnccf5cn31loyUjMYabzpjNO7efyc+uXEpirJPmDheXr9R1O4I15sHDGHOnMWaaMaYQuBJ4zRjzGeB1wJMz6xrgWfv+c/SuKXKZvb+uoT7CGttcdHWbPunYvXmarfx99P4mCHp4+lICTTQc
zFE/Q4E9CrOswNV/1TnVl9U35T94eFKDfBSg0/xgTQudLndP8Ih2RPGjTy2mrrWTH7ywk42Hanly/RGuP2UmxbmRmeYjxhnFRUvzefamk9n4jbNYXeSzRVz5EElpRm4HviIi+7D6NB6wtz8AZNrbv4I152TU/PatEh7/4PBoniIiVfVMEPRT80iKpcPlpqnDd1PG0Z5FoPx0mI/ARMH+czz6m2HnPDqoTVd+dbrcVDd3kJviv9kqPy2erKRYv2uHe+y2aybegWFhfipfOK2IP20s5abHNjM1NY4vr50zMoUfRSJCpp9at/ItrMtjGWPeAN6w75dgTUTsv087VkbfMfH3HRVERQmfPmHGWJ0yIvhLTeLhPdfD1+ptZfXWKBt/tYJAKUo2Ha7D1W1YNTPDbxn9TUL0KMxMBAhqxNWWI/XMzU2edO3bx/qtIOiLiLBsRhqbD9cPeqzd5U04ooTZOUl9tn957Rxe/qiCkuoWfv1vy/12zKvxLZJqHhGhKDuRkqrJN9wzlODhy9H6VtISov1eKHpqHn6Cx51/3s7X/7p90DIerWsjOdbpd+nRtIRokuOcATvNa5o7uPSX7/K7t0sG3W8i8iw/O1ifB8DyGekcqG6hrsV/M+PuiiaKshKJdfYNwHHRDu6/eiV3XbKIcxfkDr/QKiJp8OinKDuJ6uYOvzNQJ6rqntQkvvs8cgKkKCmrbydvkBE8nqDkq+ZR2dTOnmNNlFQ1D9rZfbS+3W+TFVi/mAsyEzgUoM9jV3kTbgPv7AtubYeJpLwhcM0DYNkMa7Le5iP+m672HGv025cxOyeJT58wQxdSmsA0ePRTlGU1fUy22kd1cyeOKCE9wX+HOfivefibvOcRH+MgMcbhMy37e/tqAHAbBs14GugcAAWZiRwO0Gzlmdi26XD9pBuZVWGnJpka4HNcPC0VR5T4bbpq7nBxpLatz1KxanLR4NFPUbbVfltSNXJrbo8H1c0dZCTG+J3hmxofTbRDBmm2avM7hNbD36JS7+yrxmmfd/cgw0OP1rcNOkoIoCAjgdK6NlyDDAn2DEHtdLkDtutHitK61p7+iuEoq28nOdZJUoB+iIQYJ3Nzk9nkp9Pc8xnOzfW9aJOa+DR49DMjIwFHlIx4zePljyp4+L2DI3rMkTTY7HKwmoT8LUfb2N5FU7tr0CYlsCcK9nu9MYZ391Vz1rwpxDqj2O0nLYa/hab6K8xMxOU2PR34vuw51sTiaalECawrqRn0eJGgvKGNC3/+Dp/53Qe4h7m+uK91PPxZPiOdrUcafK5pvsfHSCs1uWjw6CfGGcX09HhKqkeu5mGM4a4Xd3HPy7t9/kccjj0VTTyy7uCwj1PV3Om3v8PDX80h0CgoD1+zzEuqWyhvaOe047I5bkqy35pHeYChwB4zMj3DdX0H/2634eNjTawoSGdhfirvR3jw6HYb/vOpLTS0dbGvspl/7KwY1vHKA0wQ9LZsRhrNHS72Vg78TnZXNJIU6xywdouaPDR4+FCUnTSiNY9tpQ0crm2lpbM74KzdUD3wTgnffHZHUOspDKa6qaNnRJQ//vJb7T1mBdqAwSM5ZkDweNfutD5ldhZzc5N7+iP6CzSPxKPADh7+Os0P17bS3uVmXm4KJxZlsiXC+z1+/eZ+3i+p5e5LF1GQmcAv39gfMPX5YCoa2gJ2lnssn2GlC/HVtLe7ooni3GTtEJ/ENHj4UJSVyIHqlmE3EXg8v81a1hJgY4CJV6HyrLuw8dDQj2uMsTPqBgoecQMy47Z2urjn77spzEzomZns9/VJcdT1S1Hyzt5qpmfEMyMzgblTU6hu7vQZoALNLveYkhxHrDPKb6f5Hjs4Fecms3pWJp3d7mF9dqNp0+E6/veVj/nkkjyuWDmdL54+i22lDUMeJdbV7aayqYPcIFfIK8hMICMxhk39Ph9jjJ3TSpusJjMNHj4UZSfR4XL3XLCGw+02vLCtnDOKc8hOjh3wH3E4XN1u9thppTcM47jN
HS46XO6gmq1qWjr7dEb/+O8fc6S2jR99avGA8f799U9R4up2s66khlNmZwEwz74Y+aqdldW34YiSniHD/kRFCTMyEvzOMt9d0YQIHDclmeMLM3BESUQ2XTW2d3HLk5vJTYnjB3ZeqEuX5zMlJZZfvr5/SMe00stAXpA1DxFh2fQ0Nh/pW/OoaGynoa1LR1pNcho8fCjKtofrVg+/6WrzkTrKGtq5cPFUVsxIH9FfuSXVVm4hkeEtguRJlT5YhzlYwcMYK1MqwMZDtTz03gGuPrGAE4LICdR/rsf2ow00tbs42Q4enl+yvpquyurbyU2Jw+kI/CdrDdf1HTz2VDRRkJFAfIy1uuGi/FTW7Y+84PGtZz6irL6d+65a2rOyY6zTwedPLWJdSY3fUVCDKbeH6Qbb5wFWv8e+yuaeRIfQOyJOR1pNbho8fOgJHiMwXPdvW8uJcUZx9vwprChI53Bta0jrQw/Gs9rb2rk5fHS0YcC60MEKNLvcw9MnUtnUQXtXN7c9vY281Hi+dt7coM6T1W+Wuae/46RZVvDITIolOznW5xKog60V0p81UbDFZ99A/+aW1UWZbC2tDyr9+Fj5y6ZSntlSxi1r57CioG+6lqtWzSAtIXpItY/eCYLBd3J7+j22lPbWPnpGWk3RmsdkpsHDh+ykWJJjncPuNO92G17cXs6a47JJjotmub1ewaZDIzO3YGd5IzHOKK5YOZ2ubsPW0qEd1zN8NjOIZiuwLv4/e3UvJVUt3P2pRQHnDPS83lPzsM/3zr5qFuSlkJHYe965ucnsOTaw5nG0LvAcD4+CzATau9wDZsO3d3VzsKaFYq9fzCfOyqSr20RMv0dpXSvffOYjVhVmcNMZswc8nxjr5NqTCvnnrmMhD74YbPlZfxZPTyNK6NPcuru8kbzUOF2qdZLT4OGDiFg5roY5XHf9wVoqmzq4cEkeYC3VGeOIGlKTgy+7yhspnpLMCTOtJqMNQ2y68tQ8Ao228vQ3vL67kvvfKuFfV07n1DnBL7rl6fOoau6gtdPFpkP1Pf0dHvOmpvDxseY+/SqehaYCzSPxKOhJkNi36WrvsWbchj5t9SsL0nFGyaBNV3uPNQ25VheqF7eX09LZzf9cvtjv+hbXnlRIQoyDX72xL6Rjlze0kxDjICUu+ESFSbFOjpvSd7Lgbu0sV2jw8Gskhus+v62MuOgo1s61lrWMdTpYNC11RH7lGmPYWdbIvKnJpCZEUzwlmfUHh3bcquZOROhTA/DFU/P4w7pDZCXF8PVPzAvpPAkxzp4UJesP1tHZ7e7p7/CYm5tMp8vdZ55GZZO10FSgkVYeBRm+53rs9hpp5ZEY62TxtFS/kwU/OtrAeT97mxsf2zisIbLBWre/hqLsxJ4A6EtaQgyfOWEGz20t89u344tngmCow2uXzUhny5F63G5DV7eb/VXNfWpvanLS4OFHUVYi5Q3tQ24Ld3W7eWl7BWvnTumTaXZFQTrbSxvocA3vl2xVUwc1LZ3Mn2r9J15ZmM6mQ3VDmoRY3dxBekJMwM7ouGgHyfav1rsuWdTTkRuKrGRrouC7+6qJcURxfGHfNn3Phd273+NoXXDDdD3y0+NxRMmAC+ueiiZinVE9qds9TpyVybbSBlr6rVXS7Tb811+3I8Dre6p4bmsZo8nV7Wb9wTpODGLwwb+fWoQzKopfvxV830d5Q9ugySv9WT4jjaZ2F/urmimpaqGr2zBvqtY8JjsNHn54clwdGOKIq/dLaqlp6eTCxVP7bF8+I53ObjcfHR18lbZAdthpPObZweP4wgyaOlxDmoRY3dQRcJiux9LpaVx5/HTWzpsS8nmgd5b5O3urWVGQTnxM3+G9s3OScERJn/cR7ARBj2hHFPlp8QNqHnuONTFnStKA5qDVRZl0u82AEWuPfXCIraUN/M/li1k6PY3v/m1nz0iz0bD9aAPNHS5OnBU4eExJieNTK6bx9IZSbn1yM994Zjs/enk3v3xjH4+8f4j3
9lUPyO9VHkJqEm/LvCYL+qq9qclJV2nxo3fEVQsL8lJDfv3z28pIjHFwht1k5bG8wEp1velQHSvsDvSh8Iy0mpfXW/MAq59lfoDJev0Fymvl7Q+fG7BeV0iykmLYdLieqqYObju3eMDzsU4Hs7IT+wzXDbTQlC8FmQkc7jfLfHdFE6f56KNZUZBOtENYV1LDmmLr+zrW2M49L+/h1DlZXLw0n/lTU/nEfW/z/ed38tN/XRp0OULhaToLdinUm8+czcHqFjYdrqfJzi/m8qp5ZiXFcN7CXC5cnMfyGelUNnUEPbvcW1FWIqnx0Ww6XEdGYgzOKKEoKynwC9WEpsHDj5lZiYgMLTV7V7ebl3dUcNb8KQNWqstJjmNGRgIbD9Xx+WGUb1d5I9Mz4nsWRspPi2dqahzrD9ZyzUmFIR2rurmTpdPTgtp3uOkovFOc9O/v8Jibm9KnX6isvo3U+OigR3WBFTz+trW853FtizVz3dfEtoQYJ0umpfF+SW/N43t/20lnt5vvX2RN0CvOTebGNbO477V9XLQ0ryfIjKR1+2s4bkpS0IE8Ly2eJ25Y3fPYGEOHy01jWxcbD9Xx/PZynt5YyqPvHyYzMYZutxlSzSMqSlg63VpZMD89ntk5ScQ4tdFistO/AD/ioh3kpQ4tQeI7+6qpb+3iwsV5Pp9fUZDOxsN1w+qA3VneyDyvTksRYWVhBusP1oZ0XFe3m4rG9iH9Ih0Kz4UxOc6aoOdLcW4yR+vbehbkOhrEOh79FWQk0tDWRX2r1cwUqLnlxFmZfHS0gab2Ll7fXckL28u5+YzZFGb19o/cdOZsZuck8V9//YhmP2u5D1Wny82GIPs7/BER4qId5KTEcf6iqfzfp5ez6Ztn8/OrlnF8YQaZiTFB/0job/mMdD6ubGLrkXptslKABo9BDXVJ2ue3lpMc5+S043z/sl5ekE5VUweldUNLf9La6eJAdcuA5qnjC9M51hjacT2z1OeOUQeoJ3icNCvT71BUT2fsx3a/R1kQa4X015Mg0e40711/wk/wsPs93t5bzTef/YhZ2YnccHpRn31inQ5+9KlFlDW08eO/7wmpPIFsK62nras7qP6OUCTEOPnkkjx+/dkVbPzm2UNqggVrprkxUNPSqcFDARo8BjUrO4mSquaQfsl3uLr5x84Kzpmf6zfX0wq7A3KoQ3b3VDRhTG9nucdKezbyhkPBz/fY1a/jfbR5gkf/+R3ePGkvdtkX/FBml3t4hrp6Os33VDSRnhDdM9y4v+UF6cQ4ovj6X7dTWtfGXZcs8vn9rSjI4LOrC3h43cERnVi4bn8NIvTM2Yk0S2ek9ST3nKfDdBUaPAZVlJ1IS2e333W7ffnhC7toandx6fJ8v/sU5yaTGOMY8sVnp33Bn9/vgl+cm0xyrDOk+R67ypuIdoxdB+iKgnTOKM7m3IW5fveZmhpHSpyT3eWNPQtNhdpsNcOe6+EZrhsohXhctIOlM9Kob+3i8hXTBs3V9bXz5pKbEscdf94WVDr390tq+NKjG2lq7/K7z7qSGubmppAeYK5NuKTERTPbHoGoNQ8FGjwG5bmg7g8yx9Uf1h3kD+sOccNpRX47gwEcUcKyYSRJ3FXeSHLcwIV4HFHC8oL0kGaa7ypvZHZO8ph1gGYnx/LQdavISfbfDCUizJ2awu6KpqAXmuovPsbBlJRYDtW24rYXgAqUyO+8BblMTY3jzgsGn/yYFOvk7k8tZl9VM//1148GrZmW1bfxpUc38tJHFTz+wWGf+7R3dbPhUB0njXCT1Ug7oSiDrKTYMesfU5FNg8cgvIfrBvL23iq++7edrJ2bw+1BJApcXpDO7orGIXW8WjPLU3z+ij6+MJ2PjzX3dBQHsqu8MSInfM3NTWZPRVPPBMFgU5N4K8hI5FBNC6V1bbR2dgf8xfy5U2by7u1nBpxpD3D6cdl8+cw5/HlTKY9/6DsodLrc3PjYJjpdbhbkpfDAOwd8Tg7dfLieTpd7WJ3l
Y+H28+by1xtP0gWgFKDBY1C5KXHERzsCBo99lc3c+Ngm5uQk8bOrlvntCPa2oiAdt4GtR0JLZuh2G3ZXNA1osvJYac/YDqZWU9PcQWVTh99jhdPc3BSaO1x8aNeiQu3zADu7bk1rSBPbooL47jxuWTuHNcXZfOe5HWz2ka/srhd3seVIPf9z+RJuP28ulU0dPLt54Cz1dSU1RAmsKsoY8FwkSY6LZrrdHKiUBo9BREUJhVmDJ0isa+nk+ofXE+uM4nfXrAx6LsIyuwMy1KarQ7WttHZ2+73gL5mWRrRDgur38KQAGavO8lB4Rn+9uquSaIcETNroS0FmApVNHWyxA/RxI5xCPCpKuPdflzIlJY4bH9vUZyng57aW8fv3DvK5k2dywaKpnDoniwV5Kfz6rf0DVqh8f38NC/NTe+bsKDUeaPAIYLDhup0uN198dCPl9e385rMrmZYe/K+ylDgrmWGowcMzs9zfLPL4GAcL81OD6vcY65FWofCsFbGvspnc1LiQagQenhFXr+w8xvSM+JAmGQYrLSGGX//bCmpaOrn5ic24ut3sq2zijj9vY2VBOndeYDVhighfOH0WJVUtvLLrWM/r2zq72XxkePM7lAoHDR4BzMpKpLSu1Wdb9fee38EHB2q557LFQ0o1srwgnU2H60JaK31XeSPOKGF2jv/RUasKM9hW2hBwJNCu8kampMQG1cY/1hJjnT0jpobSZAW9cz32VjZTPGX0AuTC/FR+cPFC3ttfww9e2MUXH91EQoyDX3x6OdFeySYvWJjL9Ix4fv3m/p5O9o2H6ujqNqyO8M5ypfrT4BFAUXYSbjNwbYgXt5fz6PuH+cJpRVy8zP+w3MGsmJFOU7uLfSGsWLizvJFZ2UkD0p54W1mYQWe3m22lDQGPFYm1Dg/PhL5QR1p5FGT0zg4f7UEBV6yczlWrZvD79w5SUtXMfVcuG5AKxOmI4oZTi9h8uJ4PD1g1w3Ul1TiiZEB2YaUinQaPAHwtSXu0vo07/ryNJdPT+KqP5H7BWjbDShWx5XDwnea7yhsDJj701II+POB/gaNOl7UuQ0QHD7tsQ615pCZEk2avdjcWcxO+8y/z+cSiqXz3ooWc5Geo9mUrppORGMNv3ioBrMmBi6eljkqTmlKjSYNHADPt3Eb77X4PV7ebW5/cjNvAfVcu7dMsEaqCzERiHFFBzyOpbemkvKE94K/ojMQYFuan8MaeKr/77KtsttdliNzgMW+YNQ/oXRjKX1qSkRTrdPB/n1nOZ1cX+N0nPsbBtScV8truSjYdrmNbaYP2d6hxKSzBQ0Smi8jrIrJLRHaIyC329gwReUVE9tr/ptvbRUTuE5F9IrJNRJaPVVmT46LJSY7tWdfj56/tY/3BOn5w8cJBV3sLhiNKmJmVGHTw2NUzszxwfqIzi3PYdLiOOj/rT/R0lkfwbOHjZ2awMD+FVTOH3qTjCdD9F4AKp6tPLCAhxsGtT27B5TYjns9KqbEQrpqHC/h/xph5wGrgJhGZD9wBvGqMmQO8aj8GOB+YY99uAH41loW1Rlw18+GBWn7+2l4uXZ4/5H6O/mblJPbUagLpHR0V+IJ/5rwpuA28+bHv2seu8kZinFE9NatIlJUUy/M3n8qs7KGnTvn8qUXcdemigKskjqW0hBiuPH4Gh2tbiXZIT04ypcaTsPyPMsaUG2M22febgF1APnAR8LC928PAxfb9i4A/GMv7QJqITGWMFGUnsbeymVuf3MyMjAS+d9HCETv2rOwkDte20ulyB9x3Z5k1OioziDkPi/NTyUqK4dXdlT6f31XRSPGU5Ii6qI6GRdNSuWzFtHAXY4B/P3UmTnudjP6rKSo1HoS9l05ECoFlwAfAFGNMOVgBRkQ8K+7kA0e8XlZqbyv32oaI3IBVM2HGjBkjVsairESa2l20d3Xz5y+dNKKdm7Oyk+h2Gw7XtjA7Z/Aaxc7yxqBng0dFCWuKc/jHjgpc3e4+QcIYw67yJs6aN/IL
Gqng5KXF8+PLlwyrP0epcArrz04RSQL+DNxqjBlsUW9fM8QGTI4wxtxvjFlpjFmZnT1wudGh8iTUu8dwV6oAAAxjSURBVO3cYhZPG9piOv54mmT2VQ7edNXh6mZfZWijo9bOzaGx3TVgImJlUwe1LZ0R3Vk+GVy8LH9Y/TlKhVPYgoeIRGMFjseMMX+xNx/zNEfZ/3raXEqB6V4vnwYMTBI0Sk6enclfbzyJz59aFHjnEHmGAgfqNN97rBmX24S0Pvkpc7KIdgiv7enbdBXJM8uVUuNDuEZbCfAAsMsY879eTz0HXGPfvwZ41mv71faoq9VAg6d5a4zKy7IZ6aOSTTQx1snU1LiAwcOTliSUleCS46JZNTOD13b1Dx52Titd1EcpNUThqnmcDHwWOFNEtti3C4C7gbNFZC9wtv0Y4EWgBNgH/Ba4MQxlHjWzspMCjrjaUdZAUqyzZ95CsM4ozmFvZTNHantnyO8qbyQ/LZ7UBE3Ep5QamrB0mBtj3sF3PwbAWh/7G+CmUS1UGM3KTuQvm45ijPFbu9lRZq27EWqCwLXzpvCDF3bx2u5KrjmpEIjcNTyUUuPHxB6nOU7MykmiqcNFlZ/lbt1uY6UlGUIfxcysRGZmJfKaPWS3vaubkuoW7e9QSg2LBo8I0DPiyk+/x8GaFlo6u0Pq7/B25twc1pXU0NrpYu+xZrrdkZ2WRCkV+TR4RABP8PDX77EjwBoegaydm0Ony827+2p6RlqNRa4npdTEFfZJggqmpMSSGONgf6XvmsfO8kaiHTLklfBWFmaQFOvktd3HiHU6iI92DDsvl1JqctPgEQFEhKLsJL/DdXeUNTInJ5kY59AqijHOKE47LovXdldSkJlIcW5yUOusK6WUP9psFSFm+Vnu1hjDzrKGITdZeZxRnMOxxg42HKzV/g6l1LBp8IgQs7KTOFrfRmunq8/2yqYOqps7WTDM4LGmOAcRcBuYr8N0lVLDpMEjQsyy1yTvX/vYUWYtJTvUkVYe2cmxLLHzcmnNQyk1XBo8IkTviKu+/R47jga/hkcg5y/MJT7a0bO8q1JKDZV2mEeIgswEomTgcN2d5Y0UZiaQHDf8VCLXnzKTi5bm63rZSqlh05pHhIiLdjA9I2FgzaOscdhNVh5ORxS5qXEjciyl1OSmwSOCzMpO6jPXo7G9i8O1rcMeaaWUUiNNg0cEmZWdyIHqFrrd1jpXO4c5s1wppUaLBo8IMis7iQ6Xm7L6NqA3Lclwh+kqpdRI0+ARQTzDdT0JEneUNZCdHEtOsvZTKKUiiwaPCNIzXNfu99hZ1qi1DqVURNLgEUEyEmNIT4hmf1ULHa5u9lU2D2kND6WUGm0aPCLMLDtB4scVzbjcZsSG6Sql1EjS4BFhZmUnUVLV7JWWRGseSqnIo8EjwszKSaS6uZN1JTUkxTqZkZEQ7iIppdQAGjwiTFGW1Wn+ys5jzJ+aQpSuu6GUikAaPCKMZ7hua2e3Tg5USkUsDR4RZnp6PNEOq7ahwUMpFak0eEQYpyOKQnt9ce0sV0pFKg0eEWhWdhLRDmFOjq74p5SKTLqwQwT63CkzOWl2JjFOje1KqcikwSMCrZqZwaqZGeEuhlJK+aU/bZVSSoVMg4dSSqmQafBQSikVMg0eSimlQjZugoeInCcie0Rkn4jcEe7yKKXUZDYugoeIOID/A84H5gNXicj88JZKKaUmr3ERPIBVwD5jTIkxphN4ErgozGVSSqlJa7wEj3zgiNfjUntbHyJyg4hsEJENVVVVY1Y4pZSabMbLJEFfecnNgA3G3A/cDyAiVSJyaIjnywKqh/jaSDeR3xtM7Pen7238Gk/vryCYncZL8CgFpns9ngaUDfYCY0z2UE8mIhuMMSuH+vpINpHfG0zs96fvbfyaiO9vvDRbrQfmiMhMEYkBrgSeC3OZlFJq0hoXNQ9jjEtE/gP4O+AAHjTG7AhzsZRSatIaF8EDwBjzIvDiGJ3u/jE6TzhM5PcGE/v9
6Xsbvybc+xNjBvQ7K6WUUoMaL30eSimlIogGD6WUUiGbFMFDRB4UkUoR+chr2xIRWSci20XkbyKS4vXcYvu5Hfbzcfb2FfbjfSJyn4j4mn8y5kJ5fyLyGRHZ4nVzi8hS+7mIe38hvrdoEXnY3r5LRO70ek3E5UYL8b3FiMhD9vatIrLG6zWR+L1NF5HX7e9hh4jcYm/PEJFXRGSv/W+6vV3ssu8TkW0istzrWNfY++8VkWvC9Z68DeH9zbW/1w4R+Wq/Y0Xc32ZQjDET/gacBiwHPvLath443b7/OeD79n0nsA1YYj/OBBz2/Q+BE7EmLb4EnB/u9xbq++v3ukVAidfjiHt/IX53nwaetO8nAAeBQqwRevuBIiAG2ArMH2fv7SbgIft+DrARiIrg720qsNy+nwx8/P/bu7sQK+owjuPfXynkS6V7oZlEGmTmRWBtpiAkWhYGmYRoL1gZVFBYUoZUNymSiUQXXRSYoCRBpaWUaLZkQrpiWlqpZUaIsLS0aquJ6MrTxf/Z9qA7u2dc3TOnfT5wOLP/mTPMb2c4/zNvz5Dq0i0B5nv7fOBNH57iyy5gLLDd22uA3/19oA8PrMJ8g4DbgUXASyXzKeS2Wc6rR+x5mNkW4Mg5zTcBW3x4E/CgD08G9pjZbv9sk5mdlTQEuMrMtlla6yuBBy790ncuZ75SDwEfAhQ1X85sBvST1AvoA5wGmilobbSc2UYBdf65RuAYUFvg9dZgZrt8+Diwj1RSaCqwwidbQduyTgVWWlIPDPBs9wCbzOyImR0l/U/u7cYo7cqbz8wazWwHcOacWRVy2yxHj+g8MvwE3O/D02m7g30EYJI2Stol6WVvH0q6071Vu/W1CiQrX6kZeOdBdeXLyvYJ8A/QABwClprZEcqsjVYQWdl2A1Ml9ZI0HLjNxxV+vUkaBowGtgODzawB0hcw6Rc5ZK+jwq+7MvNlKXy+LD2585gNPCtpJ2m387S39wLGA4/4+zRJkyizvlaBZOUDQNIdwEkzaz3eXk35srKNAc4C1wLDgRcl3cD/I9ty0hfLd8DbwFaghYJnk9QfWA28YGbNHU3aTpt10F4IOfJlzqKdtsLk60jV3CR4sZnZftIhKiSNAO7zUYeBb8zsLx+3nnRc+gNSTa1WndbXqqQO8rWaSdteB6TcVZGvg2wPAxvM7AzQKOlboJb0yy5XbbRKycpmZi3A3NbpJG0FDgBHKeh6k9Sb9MW6yszWePOfkoaYWYMflmr09qz6dYeBCee0b76Uy12unPmy5K7bVxQ9ds9D0iB/vwx4DXjXR20EbpHU14+d3wns9V3Q45LG+tUss4C1FVj0snSQr7VtOun4KvDfLnZV5Osg2yFgol+504904nU/VVQbLSubb4/9fPhuoMXMCrtd+rK8D+wzs7dKRq0DWq+Yeoy2ZV0HzPJ1Nxb427NtBCZLGuhXLk32toq6gHxZqmbbPE+lz9h3x4v0C7uBdLLqMPAk8DzpColfgcX43fY+/aPAz6Tjz0tK2mu97SDwTulnqizfBKC+nfkULl+ebEB/4GNfd3uBeSXzmeLTHwRerXSuC8g2DPiFdGL2K+D6gq+38aTDL3uAH/w1hXT1Yh1pr6kOqPHpRXpa6EHgR6C2ZF6zgd/89USls11gvmt8HTeTLnY4TLrQoZDbZjmvKE8SQgghtx572CqEEMKFi84jhBBCbtF5hBBCyC06jxBCCLlF5xFCCCG36DxCCCHkFp1HCAUj6fJKL0MInYnOI4QukLSw9VkO/vciSXMkzZO0w59N8XrJ+M8k7fRnQDxV0n5C0gJJ24FxkhZL2uufX9rNsULoVNwkGEIXeEXVNWZ2q5cUOQC8AkwCnibdOb2OVKlgi6QaMzsiqQ9tz+5okmTADDP7SFINsA0YaWYmaYCZHatAvBAy9djCiCFcDGb2h6QmSaOBwcD3pIf+TPZhSGVTbiQ9p2OOpGnefp23N5GqAa/29mbgFLBM0hfA592RJYQ8ovMI
oeuWAY+T6hctJ+11vGFm75VOpPTo2LuAcWZ2UtJm4AoffcrMzkKqoCtpjM9nJvAcMPHSxwihfNF5hNB1nwILgN6ksvAtwEJJq8zshKShpOKHVwNHveMYSar6ex5/RkRfM1svqZ5UEDCEQonOI4QuMrPTkr4Gjvnew5eSbga2pcrdnCBVat4APCNpD6lCbn3GLK8E1kq6gnTOZG7GdCFUTJwwD6GL/ET5LmC6mR2o9PKE0B3iUt0QukDSKNJhpbroOEJPEnseIYQQcos9jxBCCLlF5xFCCCG36DxCCCHkFp1HCCGE3KLzCCGEkNu//NFL9R5kfokAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Since the popularity variable has some outliers, we use vote_count as a proxy for popularity.\n",
    "plt.plot(year_mean.index, year_mean['vote_count'], label='vote_count')\n",
    "plt.xlabel('years')\n",
    "plt.ylabel('vote_count')\n",
    "plt.title('Movie vote_count over years')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis:**\n",
    ">- Yes. There is a clear trend: newer movies receive more votes on average, which suggests they are more popular. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question 3. What are the top 5 most common movie genres associated with high revenue?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1. Find out the movies with very high revenues**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "min    2.370705e+00\n",
       "50%    6.173068e+07\n",
       "80%    2.033811e+08\n",
       "90%    3.538761e+08\n",
       "max    2.827124e+09\n",
       "Name: revenue_adj, dtype: float64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check the distribution for 'revenue_adj'\n",
    "tmdb.revenue_adj.describe([.8,.9]).iloc[3:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "low          1550\n",
       "very_low     1271\n",
       "very_high     517\n",
       "high          516\n",
       "Name: revenue_level, dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create an ordinal data column to categorize movies with different levels of revenues\n",
    "bin_edge=[0, 2.872138e+07, 1.496016e+08, 2.880722e+08, 3e+09]\n",
    "bin_names=['very_low','low','high','very_high']\n",
    "tmdb['revenue_level']=pd.cut(tmdb.revenue_adj,bin_edge,labels=bin_names)\n",
    "tmdb['revenue_level'].value_counts()"
   ]
  },
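  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bin edges in the cell above are typed in by hand. As a minimal sketch of an alternative, the edges could be derived from quantiles of the data so that they follow the distribution. The toy `revenue` values and the quartile cut points below are assumptions for illustration; they are not the edges or the data used in this notebook.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy revenues standing in for tmdb['revenue_adj']\n",
    "revenue = pd.Series([5e6, 2e7, 6e7, 1.5e8, 3e8, 9e8, 1.2e9, 2.5e9])\n",
    "\n",
    "# Illustrative cut points at the quartiles (an assumption, not the notebook's edges)\n",
    "edges = revenue.quantile([0, 0.25, 0.5, 0.75, 1]).tolist()\n",
    "labels = ['very_low', 'low', 'high', 'very_high']\n",
    "\n",
    "# include_lowest=True keeps the minimum value inside the first bin\n",
    "levels = pd.cut(revenue, bins=edges, labels=labels, include_lowest=True)\n",
    "print(levels.value_counts())\n",
    "```\n",
    "\n",
    "With quartile edges each bin holds roughly the same number of movies, so hand-picked edges remain the better choice when the bins should reflect fixed dollar thresholds."
   ]
  },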
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">- About a third of the movies (1,271 of 3,854) fall in the very_low revenue bin, and nearly three quarters are in very_low or low combined.\n",
    ">- There are **517 movies with very_high revenue**; let's focus on these movies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>popularity</th>\n",
       "      <th>original_title</th>\n",
       "      <th>cast</th>\n",
       "      <th>director</th>\n",
       "      <th>runtime</th>\n",
       "      <th>genres</th>\n",
       "      <th>vote_count</th>\n",
       "      <th>vote_average</th>\n",
       "      <th>release_year</th>\n",
       "      <th>budget_adj</th>\n",
       "      <th>revenue_adj</th>\n",
       "      <th>profit</th>\n",
       "      <th>revenue_level</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>32.985763</td>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...</td>\n",
       "      <td>Colin Trevorrow</td>\n",
       "      <td>124</td>\n",
       "      <td>Action|Adventure|Science Fiction|Thriller</td>\n",
       "      <td>5562</td>\n",
       "      <td>6.5</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.379999e+08</td>\n",
       "      <td>1.392446e+09</td>\n",
       "      <td>1.254446e+09</td>\n",
       "      <td>very_high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>28.419936</td>\n",
       "      <td>Mad Max: Fury Road</td>\n",
       "      <td>Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...</td>\n",
       "      <td>George Miller</td>\n",
       "      <td>120</td>\n",
       "      <td>Action|Adventure|Science Fiction|Thriller</td>\n",
       "      <td>6185</td>\n",
       "      <td>7.1</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.379999e+08</td>\n",
       "      <td>3.481613e+08</td>\n",
       "      <td>2.101614e+08</td>\n",
       "      <td>very_high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11.173104</td>\n",
       "      <td>Star Wars: The Force Awakens</td>\n",
       "      <td>Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...</td>\n",
       "      <td>J.J. Abrams</td>\n",
       "      <td>136</td>\n",
       "      <td>Action|Adventure|Science Fiction|Fantasy</td>\n",
       "      <td>5292</td>\n",
       "      <td>7.5</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.839999e+08</td>\n",
       "      <td>1.902723e+09</td>\n",
       "      <td>1.718723e+09</td>\n",
       "      <td>very_high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9.335014</td>\n",
       "      <td>Furious 7</td>\n",
       "      <td>Vin Diesel|Paul Walker|Jason Statham|Michelle ...</td>\n",
       "      <td>James Wan</td>\n",
       "      <td>137</td>\n",
       "      <td>Action|Crime|Thriller</td>\n",
       "      <td>2947</td>\n",
       "      <td>7.3</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.747999e+08</td>\n",
       "      <td>1.385749e+09</td>\n",
       "      <td>1.210949e+09</td>\n",
       "      <td>very_high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>9.110700</td>\n",
       "      <td>The Revenant</td>\n",
       "      <td>Leonardo DiCaprio|Tom Hardy|Will Poulter|Domhn...</td>\n",
       "      <td>Alejandro González Iñárritu</td>\n",
       "      <td>156</td>\n",
       "      <td>Western|Drama|Adventure|Thriller</td>\n",
       "      <td>3929</td>\n",
       "      <td>7.2</td>\n",
       "      <td>2015</td>\n",
       "      <td>1.241999e+08</td>\n",
       "      <td>4.903142e+08</td>\n",
       "      <td>3.661143e+08</td>\n",
       "      <td>very_high</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   popularity                original_title  \\\n",
       "0   32.985763                Jurassic World   \n",
       "1   28.419936            Mad Max: Fury Road   \n",
       "3   11.173104  Star Wars: The Force Awakens   \n",
       "4    9.335014                     Furious 7   \n",
       "5    9.110700                  The Revenant   \n",
       "\n",
       "                                                cast  \\\n",
       "0  Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vi...   \n",
       "1  Tom Hardy|Charlize Theron|Hugh Keays-Byrne|Nic...   \n",
       "3  Harrison Ford|Mark Hamill|Carrie Fisher|Adam D...   \n",
       "4  Vin Diesel|Paul Walker|Jason Statham|Michelle ...   \n",
       "5  Leonardo DiCaprio|Tom Hardy|Will Poulter|Domhn...   \n",
       "\n",
       "                         director  runtime  \\\n",
       "0                 Colin Trevorrow      124   \n",
       "1                   George Miller      120   \n",
       "3                     J.J. Abrams      136   \n",
       "4                       James Wan      137   \n",
       "5  Alejandro González Iñárritu      156   \n",
       "\n",
       "                                      genres  vote_count  vote_average  \\\n",
       "0  Action|Adventure|Science Fiction|Thriller        5562           6.5   \n",
       "1  Action|Adventure|Science Fiction|Thriller        6185           7.1   \n",
       "3   Action|Adventure|Science Fiction|Fantasy        5292           7.5   \n",
       "4                      Action|Crime|Thriller        2947           7.3   \n",
       "5           Western|Drama|Adventure|Thriller        3929           7.2   \n",
       "\n",
       "   release_year    budget_adj   revenue_adj        profit revenue_level  \n",
       "0          2015  1.379999e+08  1.392446e+09  1.254446e+09     very_high  \n",
       "1          2015  1.379999e+08  3.481613e+08  2.101614e+08     very_high  \n",
       "3          2015  1.839999e+08  1.902723e+09  1.718723e+09     very_high  \n",
       "4          2015  1.747999e+08  1.385749e+09  1.210949e+09     very_high  \n",
       "5          2015  1.241999e+08  4.903142e+08  3.661143e+08     very_high  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a dataframe that only contains movies with very high revenues\n",
    "very_high_revenue = tmdb[tmdb['revenue_level']=='very_high']\n",
    "# View the first 5 rows of the very_high revenue movies\n",
    "very_high_revenue.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2. Find out which genres are the most common among the very_high revenue movies.** "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Action</td>\n",
       "      <td>Adventure</td>\n",
       "      <td>Science Fiction</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Action</td>\n",
       "      <td>Adventure</td>\n",
       "      <td>Science Fiction</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Action</td>\n",
       "      <td>Adventure</td>\n",
       "      <td>Science Fiction</td>\n",
       "      <td>Fantasy</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Action</td>\n",
       "      <td>Crime</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Western</td>\n",
       "      <td>Drama</td>\n",
       "      <td>Adventure</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         0          1                2         3     4\n",
       "0   Action  Adventure  Science Fiction  Thriller  None\n",
       "1   Action  Adventure  Science Fiction  Thriller  None\n",
       "3   Action  Adventure  Science Fiction   Fantasy  None\n",
       "4   Action      Crime         Thriller      None  None\n",
       "5  Western      Drama        Adventure  Thriller  None"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# split the genres on '|' and expand them into a new table\n",
    "generes = very_high_revenue['genres'].str.split('|', expand=True)\n",
    "# view the first 5 rows of the new table\n",
    "generes.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Action</th>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Adventure</th>\n",
       "      <td>116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Animation</th>\n",
       "      <td>44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Comedy</th>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Crime</th>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Drama</th>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Family</th>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fantasy</th>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>History</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Horror</th>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Music</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mystery</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Romance</th>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Science Fiction</th>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thriller</th>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>War</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Western</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   0\n",
       "Action           121\n",
       "Adventure        116\n",
       "Animation         44\n",
       "Comedy            63\n",
       "Crime             12\n",
       "Drama             61\n",
       "Family            12\n",
       "Fantasy           26\n",
       "History            3\n",
       "Horror             9\n",
       "Music              6\n",
       "Mystery            3\n",
       "Romance            6\n",
       "Science Fiction   21\n",
       "Thriller           9\n",
       "War                3\n",
       "Western            2"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Count the frequency of each genre in column 0 of the generes table, sorted by index.\n",
    "col_0 = generes.loc[:,0].value_counts().sort_index()\n",
    "# Convert the pandas Series to a data frame \n",
    "df_0 = pd.DataFrame(data=col_0, index=col_0.index)\n",
    "# display the new dataframe\n",
    "df_0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# do the same for other columns:\n",
    "col_1=generes.loc[:,1].value_counts().sort_index()\n",
    "df_1=pd.DataFrame(data=col_1, index=col_1.index)\n",
    "\n",
    "col_2=generes.loc[:,2].value_counts().sort_index()\n",
    "df_2=pd.DataFrame(data=col_2, index=col_2.index)\n",
    "\n",
    "col_3=generes.loc[:,3].value_counts().sort_index()\n",
    "df_3=pd.DataFrame(data=col_3, index=col_3.index)\n",
    "\n",
    "col_4=generes.loc[:,4].value_counts().sort_index()\n",
    "df_4=pd.DataFrame(data=col_4, index=col_4.index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Action</th>\n",
       "      <td>121</td>\n",
       "      <td>79</td>\n",
       "      <td>33</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Adventure</th>\n",
       "      <td>116</td>\n",
       "      <td>74</td>\n",
       "      <td>36</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Animation</th>\n",
       "      <td>44</td>\n",
       "      <td>23</td>\n",
       "      <td>9</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Comedy</th>\n",
       "      <td>63</td>\n",
       "      <td>44</td>\n",
       "      <td>37</td>\n",
       "      <td>13.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Crime</th>\n",
       "      <td>12</td>\n",
       "      <td>16</td>\n",
       "      <td>19</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Drama</th>\n",
       "      <td>61</td>\n",
       "      <td>52</td>\n",
       "      <td>32</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Family</th>\n",
       "      <td>12</td>\n",
       "      <td>36</td>\n",
       "      <td>40</td>\n",
       "      <td>28.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fantasy</th>\n",
       "      <td>26</td>\n",
       "      <td>42</td>\n",
       "      <td>23</td>\n",
       "      <td>15.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>History</th>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Horror</th>\n",
       "      <td>9</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Music</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mystery</th>\n",
       "      <td>3</td>\n",
       "      <td>13</td>\n",
       "      <td>7</td>\n",
       "      <td>10.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Romance</th>\n",
       "      <td>6</td>\n",
       "      <td>29</td>\n",
       "      <td>20</td>\n",
       "      <td>10.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Science Fiction</th>\n",
       "      <td>21</td>\n",
       "      <td>18</td>\n",
       "      <td>37</td>\n",
       "      <td>32.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thriller</th>\n",
       "      <td>9</td>\n",
       "      <td>38</td>\n",
       "      <td>66</td>\n",
       "      <td>28.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>War</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Western</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   0   1   2     3     4\n",
       "Action           121  79  33   3.0   4.0\n",
       "Adventure        116  74  36  14.0   1.0\n",
       "Animation         44  23   9   3.0   2.0\n",
       "Comedy            63  44  37  13.0   5.0\n",
       "Crime             12  16  19  11.0   2.0\n",
       "Drama             61  52  32   5.0   1.0\n",
       "Family            12  36  40  28.0   9.0\n",
       "Fantasy           26  42  23  15.0  10.0\n",
       "History            3   5   4   1.0   NaN\n",
       "Horror             9   5   3   1.0   1.0\n",
       "Music              6   6   3   2.0   4.0\n",
       "Mystery            3  13   7  10.0   2.0\n",
       "Romance            6  29  20  10.0   5.0\n",
       "Science Fiction   21  18  37  32.0   6.0\n",
       "Thriller           9  38  66  28.0   7.0\n",
       "War                3   1  11   4.0   1.0\n",
       "Western            2   2   3   NaN   NaN"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# join the five per-column dataframes together on the genre index\n",
    "generes_join=df_0.join(df_1).join(df_2).join(df_3).join(df_4)\n",
    "generes_join"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Action</th>\n",
       "      <td>121</td>\n",
       "      <td>79</td>\n",
       "      <td>33</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Adventure</th>\n",
       "      <td>116</td>\n",
       "      <td>74</td>\n",
       "      <td>36</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Animation</th>\n",
       "      <td>44</td>\n",
       "      <td>23</td>\n",
       "      <td>9</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Comedy</th>\n",
       "      <td>63</td>\n",
       "      <td>44</td>\n",
       "      <td>37</td>\n",
       "      <td>13.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Crime</th>\n",
       "      <td>12</td>\n",
       "      <td>16</td>\n",
       "      <td>19</td>\n",
       "      <td>11.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Drama</th>\n",
       "      <td>61</td>\n",
       "      <td>52</td>\n",
       "      <td>32</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Family</th>\n",
       "      <td>12</td>\n",
       "      <td>36</td>\n",
       "      <td>40</td>\n",
       "      <td>28.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fantasy</th>\n",
       "      <td>26</td>\n",
       "      <td>42</td>\n",
       "      <td>23</td>\n",
       "      <td>15.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>History</th>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Horror</th>\n",
       "      <td>9</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Music</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mystery</th>\n",
       "      <td>3</td>\n",
       "      <td>13</td>\n",
       "      <td>7</td>\n",
       "      <td>10.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Romance</th>\n",
       "      <td>6</td>\n",
       "      <td>29</td>\n",
       "      <td>20</td>\n",
       "      <td>10.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Science Fiction</th>\n",
       "      <td>21</td>\n",
       "      <td>18</td>\n",
       "      <td>37</td>\n",
       "      <td>32.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thriller</th>\n",
       "      <td>9</td>\n",
       "      <td>38</td>\n",
       "      <td>66</td>\n",
       "      <td>28.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>War</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Western</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   0   1   2     3     4\n",
       "Action           121  79  33   3.0   4.0\n",
       "Adventure        116  74  36  14.0   1.0\n",
       "Animation         44  23   9   3.0   2.0\n",
       "Comedy            63  44  37  13.0   5.0\n",
       "Crime             12  16  19  11.0   2.0\n",
       "Drama             61  52  32   5.0   1.0\n",
       "Family            12  36  40  28.0   9.0\n",
       "Fantasy           26  42  23  15.0  10.0\n",
       "History            3   5   4   1.0   0.0\n",
       "Horror             9   5   3   1.0   1.0\n",
       "Music              6   6   3   2.0   4.0\n",
       "Mystery            3  13   7  10.0   2.0\n",
       "Romance            6  29  20  10.0   5.0\n",
       "Science Fiction   21  18  37  32.0   6.0\n",
       "Thriller           9  38  66  28.0   7.0\n",
       "War                3   1  11   4.0   1.0\n",
       "Western            2   2   3   0.0   0.0"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fill null values with 0 and keep the result (fillna returns a new DataFrame)\n",
    "generes_join = generes_join.fillna(0)\n",
    "generes_join"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>sum_generes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Adventure</th>\n",
       "      <td>116</td>\n",
       "      <td>74</td>\n",
       "      <td>36</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>241.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Action</th>\n",
       "      <td>121</td>\n",
       "      <td>79</td>\n",
       "      <td>33</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>240.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Comedy</th>\n",
       "      <td>63</td>\n",
       "      <td>44</td>\n",
       "      <td>37</td>\n",
       "      <td>13.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>162.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Drama</th>\n",
       "      <td>61</td>\n",
       "      <td>52</td>\n",
       "      <td>32</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>151.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thriller</th>\n",
       "      <td>9</td>\n",
       "      <td>38</td>\n",
       "      <td>66</td>\n",
       "      <td>28.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>148.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             0   1   2     3    4  sum_generes\n",
       "Adventure  116  74  36  14.0  1.0        241.0\n",
       "Action     121  79  33   3.0  4.0        240.0\n",
       "Comedy      63  44  37  13.0  5.0        162.0\n",
       "Drama       61  52  32   5.0  1.0        151.0\n",
       "Thriller     9  38  66  28.0  7.0        148.0"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# calculate the total frequency of each genre across all positions and store it in a new column\n",
    "generes_join['sum_generes'] = generes_join.sum(axis=1)\n",
    "# sort by total frequency and show the top 5 genres\n",
    "generes_join.sort_values('sum_generes', ascending=False).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis**\n",
    ">- Among these very_high revenue movies, **'Action', 'Adventure', 'Comedy', 'Drama', and 'Thriller'** are the five most common genres."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question 4. Is it possible to make high profit movies with low budget?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30016111.9054567"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# find out the median value of budget for the whole data\n",
    "tmdb.budget_adj.median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of low budget movies 1927\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "very_low     1020\n",
       "low           746\n",
       "high          100\n",
       "very_high      61\n",
       "Name: revenue_level, dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create a table for low_budget movies (budget below the median of the whole data)\n",
    "low_budget = tmdb[tmdb['budget_adj'] < tmdb['budget_adj'].median()]\n",
    "# check how many rows are in this table\n",
    "print(f'Number of low budget movies {low_budget.shape[0]}')\n",
    "# view the distribution of 'revenue_level' in low_budget movie table\n",
    "low_budget['revenue_level'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0316554229372081"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Fraction of low_budget movies that have very_high revenue\n",
    "61/1927"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis:**\n",
    ">- Only about 3% of low budget movies (61 of 1,927) yielded very high revenue, so most low budget movies do not yield high revenue. The heatmap earlier also shows that budget and revenue are positively correlated: higher budget movies generally yield higher revenue."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question 5. What are the top 10 rated movies, and how profitable are they?"
   ]
  },
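  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because vote_count varies widely between movies, we cannot directly compare a movie rated 10 by only 3 voters with a movie rated 7 by 100 voters. We will use IMDb's weighted rating formula to calculate a **weighted average rating score**: $\\frac{v}{v+m} R + \\frac{m}{v+m} C$, where $v$ is the movie's vote_count, $R$ is its vote_average, $m$ is the minimum number of votes required to be listed, and $C$ is the mean rating across the whole dataset."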
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The minimum votes required to be listed in the chart is: 1371.7000000000003\n"
     ]
    }
   ],
   "source": [
    "# m is the minimum votes required to be listed in the chart;\n",
    "m= tmdb['vote_count'].quantile(0.9)\n",
    "print(f'The minimum votes required to be listed in the chart is: {m}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "# C is the mean rating (vote_average) across the whole dataset\n",
    "C = tmdb['vote_average'].mean()\n",
    "print(f'The mean rating across the whole dataset is {C}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Row and column count for the table of movies in the top 10% by vote count: (386, 13)\n"
     ]
    }
   ],
   "source": [
    "# Create a table of the movies in the top 10% by vote count (vote_count >= m)\n",
    "top_10_percent_movies = tmdb.copy().loc[tmdb['vote_count'] >= m]\n",
    "print(f'Row and column count for the table of movies in the top 10% by vote count: {top_10_percent_movies.shape}')"
   ]
  },
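  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "def weighted_rating(x, m=m, C=C):\n",
    "    \"\"\"Return the weighted rating score for one movie row x.\"\"\"\n",
    "    v = x['vote_count']    # number of votes for this movie\n",
    "    R = x['vote_average']  # average rating of this movie\n",
    "    # Calculation based on the IMDb formula: pulls R toward C when v is small relative to m\n",
    "    return (v/(v+m) * R) + (m/(m+v) * C)\n",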
    "\n",
    "# Define a new feature 'score' and calculate its value with `weighted_rating()`\n",
    "top_10_percent_movies['score'] = top_10_percent_movies.apply(weighted_rating, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>popularity</th>\n",
       "      <th>original_title</th>\n",
       "      <th>cast</th>\n",
       "      <th>director</th>\n",
       "      <th>runtime</th>\n",
       "      <th>genres</th>\n",
       "      <th>vote_count</th>\n",
       "      <th>vote_average</th>\n",
       "      <th>release_year</th>\n",
       "      <th>budget_adj</th>\n",
       "      <th>revenue_adj</th>\n",
       "      <th>profit</th>\n",
       "      <th>revenue_level</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2912</th>\n",
       "      <td>1.499784</td>\n",
       "      <td>Cloverfield</td>\n",
       "      <td>Lizzy Caplan|Jessica Lucas|Odette Annable|Mich...</td>\n",
       "      <td>Matt Reeves</td>\n",
       "      <td>85</td>\n",
       "      <td>Action|Thriller|Science Fiction</td>\n",
       "      <td>1373</td>\n",
       "      <td>6.4</td>\n",
       "      <td>2008</td>\n",
       "      <td>2.531967e+07</td>\n",
       "      <td>1.729475e+08</td>\n",
       "      <td>1.476279e+08</td>\n",
       "      <td>high</td>\n",
       "      <td>266.936686</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2643</th>\n",
       "      <td>2.449323</td>\n",
       "      <td>The Mummy Returns</td>\n",
       "      <td>Brendan Fraser|Rachel Weisz|John Hannah|Arnold...</td>\n",
       "      <td>Stephen Sommers</td>\n",
       "      <td>130</td>\n",
       "      <td>Action|Adventure|Drama|Fantasy|Horror</td>\n",
       "      <td>1372</td>\n",
       "      <td>5.8</td>\n",
       "      <td>2001</td>\n",
       "      <td>1.206858e+08</td>\n",
       "      <td>5.332507e+08</td>\n",
       "      <td>4.125649e+08</td>\n",
       "      <td>very_high</td>\n",
       "      <td>266.731612</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6980</th>\n",
       "      <td>2.175284</td>\n",
       "      <td>Ocean's Twelve</td>\n",
       "      <td>George Clooney|Brad Pitt|Catherine Zeta-Jones|...</td>\n",
       "      <td>Steven Soderbergh</td>\n",
       "      <td>125</td>\n",
       "      <td>Thriller|Crime</td>\n",
       "      <td>1376</td>\n",
       "      <td>6.4</td>\n",
       "      <td>2004</td>\n",
       "      <td>1.269890e+08</td>\n",
       "      <td>4.187685e+08</td>\n",
       "      <td>2.917795e+08</td>\n",
       "      <td>very_high</td>\n",
       "      <td>266.652226</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7884</th>\n",
       "      <td>2.484654</td>\n",
       "      <td>Ghostbusters</td>\n",
       "      <td>Bill Murray|Dan Aykroyd|Sigourney Weaver|Harol...</td>\n",
       "      <td>Ivan Reitman</td>\n",
       "      <td>107</td>\n",
       "      <td>Fantasy|Action|Comedy|Science Fiction|Family</td>\n",
       "      <td>1383</td>\n",
       "      <td>7.2</td>\n",
       "      <td>1984</td>\n",
       "      <td>6.297126e+07</td>\n",
       "      <td>6.196634e+08</td>\n",
       "      <td>5.566921e+08</td>\n",
       "      <td>very_high</td>\n",
       "      <td>266.392537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>5.337064</td>\n",
       "      <td>Southpaw</td>\n",
       "      <td>Jake Gyllenhaal|Rachel McAdams|Forest Whitaker...</td>\n",
       "      <td>Antoine Fuqua</td>\n",
       "      <td>123</td>\n",
       "      <td>Action|Drama</td>\n",
       "      <td>1386</td>\n",
       "      <td>7.3</td>\n",
       "      <td>2015</td>\n",
       "      <td>2.759999e+07</td>\n",
       "      <td>8.437300e+07</td>\n",
       "      <td>5.677302e+07</td>\n",
       "      <td>low</td>\n",
       "      <td>266.160831</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>658</th>\n",
       "      <td>3.813740</td>\n",
       "      <td>Exodus: Gods and Kings</td>\n",
       "      <td>Christian Bale|Joel Edgerton|John Turturro|Aar...</td>\n",
       "      <td>Ridley Scott</td>\n",
       "      <td>153</td>\n",
       "      <td>Adventure|Drama|Action</td>\n",
       "      <td>1377</td>\n",
       "      <td>5.6</td>\n",
       "      <td>2014</td>\n",
       "      <td>1.289527e+08</td>\n",
       "      <td>2.468817e+08</td>\n",
       "      <td>1.179290e+08</td>\n",
       "      <td>high</td>\n",
       "      <td>266.156773</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3416</th>\n",
       "      <td>1.499109</td>\n",
       "      <td>Rango</td>\n",
       "      <td>Johnny Depp|Isla Fisher|Ned Beatty|Bill Nighy|...</td>\n",
       "      <td>Gore Verbinski</td>\n",
       "      <td>107</td>\n",
       "      <td>Animation|Comedy|Family|Western|Adventure</td>\n",
       "      <td>1385</td>\n",
       "      <td>6.5</td>\n",
       "      <td>2011</td>\n",
       "      <td>1.308687e+08</td>\n",
       "      <td>2.382049e+08</td>\n",
       "      <td>1.073362e+08</td>\n",
       "      <td>high</td>\n",
       "      <td>265.852803</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6631</th>\n",
       "      <td>0.890909</td>\n",
       "      <td>The Pursuit of Happyness</td>\n",
       "      <td>Will Smith|Jaden Smith|Thandie Newton|Brian Ho...</td>\n",
       "      <td>Gabriele Muccino</td>\n",
       "      <td>117</td>\n",
       "      <td>Drama</td>\n",
       "      <td>1392</td>\n",
       "      <td>7.5</td>\n",
       "      <td>2006</td>\n",
       "      <td>5.949180e+07</td>\n",
       "      <td>3.321560e+08</td>\n",
       "      <td>2.726642e+08</td>\n",
       "      <td>very_high</td>\n",
       "      <td>265.699578</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6578</th>\n",
       "      <td>1.603140</td>\n",
       "      <td>Blood Diamond</td>\n",
       "      <td>Leonardo DiCaprio|Djimon Hounsou|Jennifer Conn...</td>\n",
       "      <td>Edward Zwick</td>\n",
       "      <td>143</td>\n",
       "      <td>Drama|Thriller|Action</td>\n",
       "      <td>1394</td>\n",
       "      <td>7.2</td>\n",
       "      <td>2006</td>\n",
       "      <td>1.081669e+08</td>\n",
       "      <td>1.848334e+08</td>\n",
       "      <td>7.666646e+07</td>\n",
       "      <td>high</td>\n",
       "      <td>265.361653</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10094</th>\n",
       "      <td>0.142486</td>\n",
       "      <td>Home Alone</td>\n",
       "      <td>Macaulay Culkin|Joe Pesci|Daniel Stern|John He...</td>\n",
       "      <td>Chris Columbus</td>\n",
       "      <td>103</td>\n",
       "      <td>Comedy|Family</td>\n",
       "      <td>1393</td>\n",
       "      <td>7.0</td>\n",
       "      <td>1990</td>\n",
       "      <td>3.004017e+07</td>\n",
       "      <td>7.955384e+08</td>\n",
       "      <td>7.654982e+08</td>\n",
       "      <td>very_high</td>\n",
       "      <td>265.354260</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       popularity            original_title  \\\n",
       "2912     1.499784               Cloverfield   \n",
       "2643     2.449323         The Mummy Returns   \n",
       "6980     2.175284            Ocean's Twelve   \n",
       "7884     2.484654              Ghostbusters   \n",
       "21       5.337064                  Southpaw   \n",
       "658      3.813740    Exodus: Gods and Kings   \n",
       "3416     1.499109                     Rango   \n",
       "6631     0.890909  The Pursuit of Happyness   \n",
       "6578     1.603140             Blood Diamond   \n",
       "10094    0.142486                Home Alone   \n",
       "\n",
       "                                                    cast           director  \\\n",
       "2912   Lizzy Caplan|Jessica Lucas|Odette Annable|Mich...        Matt Reeves   \n",
       "2643   Brendan Fraser|Rachel Weisz|John Hannah|Arnold...    Stephen Sommers   \n",
       "6980   George Clooney|Brad Pitt|Catherine Zeta-Jones|...  Steven Soderbergh   \n",
       "7884   Bill Murray|Dan Aykroyd|Sigourney Weaver|Harol...       Ivan Reitman   \n",
       "21     Jake Gyllenhaal|Rachel McAdams|Forest Whitaker...      Antoine Fuqua   \n",
       "658    Christian Bale|Joel Edgerton|John Turturro|Aar...       Ridley Scott   \n",
       "3416   Johnny Depp|Isla Fisher|Ned Beatty|Bill Nighy|...     Gore Verbinski   \n",
       "6631   Will Smith|Jaden Smith|Thandie Newton|Brian Ho...   Gabriele Muccino   \n",
       "6578   Leonardo DiCaprio|Djimon Hounsou|Jennifer Conn...       Edward Zwick   \n",
       "10094  Macaulay Culkin|Joe Pesci|Daniel Stern|John He...     Chris Columbus   \n",
       "\n",
       "       runtime                                        genres  vote_count  \\\n",
       "2912        85               Action|Thriller|Science Fiction        1373   \n",
       "2643       130         Action|Adventure|Drama|Fantasy|Horror        1372   \n",
       "6980       125                                Thriller|Crime        1376   \n",
       "7884       107  Fantasy|Action|Comedy|Science Fiction|Family        1383   \n",
       "21         123                                  Action|Drama        1386   \n",
       "658        153                        Adventure|Drama|Action        1377   \n",
       "3416       107     Animation|Comedy|Family|Western|Adventure        1385   \n",
       "6631       117                                         Drama        1392   \n",
       "6578       143                         Drama|Thriller|Action        1394   \n",
       "10094      103                                 Comedy|Family        1393   \n",
       "\n",
       "       vote_average  release_year    budget_adj   revenue_adj        profit  \\\n",
       "2912            6.4          2008  2.531967e+07  1.729475e+08  1.476279e+08   \n",
       "2643            5.8          2001  1.206858e+08  5.332507e+08  4.125649e+08   \n",
       "6980            6.4          2004  1.269890e+08  4.187685e+08  2.917795e+08   \n",
       "7884            7.2          1984  6.297126e+07  6.196634e+08  5.566921e+08   \n",
       "21              7.3          2015  2.759999e+07  8.437300e+07  5.677302e+07   \n",
       "658             5.6          2014  1.289527e+08  2.468817e+08  1.179290e+08   \n",
       "3416            6.5          2011  1.308687e+08  2.382049e+08  1.073362e+08   \n",
       "6631            7.5          2006  5.949180e+07  3.321560e+08  2.726642e+08   \n",
       "6578            7.2          2006  1.081669e+08  1.848334e+08  7.666646e+07   \n",
       "10094           7.0          1990  3.004017e+07  7.955384e+08  7.654982e+08   \n",
       "\n",
       "      revenue_level       score  \n",
       "2912           high  266.936686  \n",
       "2643      very_high  266.731612  \n",
       "6980      very_high  266.652226  \n",
       "7884      very_high  266.392537  \n",
       "21              low  266.160831  \n",
       "658            high  266.156773  \n",
       "3416           high  265.852803  \n",
       "6631      very_high  265.699578  \n",
       "6578           high  265.361653  \n",
       "10094     very_high  265.354260  "
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# show the top 10 rated movies\n",
    "top_10_percent_movies.sort_values('score',ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">- Above are the movies with the highest rating scores. Most of them have high or very_high revenue too, but the list differs from the top_10_profit movies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "very_high    242\n",
       "high          75\n",
       "low           64\n",
       "very_low       5\n",
       "Name: revenue_level, dtype: int64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top_10_percent_movies['revenue_level'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fraction of these movies with low or very_low revenue\n",
    "(64+5)/top_10_percent_movies.shape[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Analysis**\n",
    ">- About 18% of the highly rated movies (69 of 386) have low or very_low revenue; most (242 of 386) have very_high revenue.\n",
    ">- This suggests that a good rating score is associated with high revenue."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='conclusions'></a>\n",
    "## Conclusions\n",
    "\n",
    "In this project we did a comprehensive analysis of the movie database with a focus on movie revenue and related properties such as genres and rating score. We did initial data exploration and answered all the questions.\n",
    "\n",
    "**Summary of featured findings:**\n",
    ">- Although many more movies are produced over time, the movie industry is getting more stable, and the annual average profit is lower than for movies made three decades ago.\n",
    ">- Popular movies have more vote_count and also have higher revenues, and newer movies are more popular.\n",
    ">- 'Action', 'Adventure', 'Comedy', 'Drama', and 'Thriller' are the most common genres for movies with very high revenues.\n",
    ">- In general, a higher budget can yield higher revenue; movies made with a low budget can have moderately high revenue, but the success rate is lower.\n",
    ">- After calculating the weighted average rating score, we found the top 10 percent highest rated movies. Most of these movies have high revenues; about 18% have low revenues.\n",
    ">- Higher popularity, higher budget, higher rating score and certain genres ('Action', 'Adventure', 'Comedy', 'Drama', 'Thriller') are all associated with movies that have high revenues.\n",
    "\n",
    "**Limitations of the project**\n",
    ">- As we saw in the data cleaning process, about half of the data was deleted because it contained 0 values for runtime, budget or revenue. With more complete data, the analysis could be more accurate.\n",
    ">- The project mainly focuses on movie revenue. We found that popularity, budget, rating score and genres are associated with revenue, but this information is not enough to predict revenue, as other factors not included in this dataset may also affect it."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}