{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Licensed under the Apache License, Version 2.0 (the \"License\"); you may\n", "# not use this file except in compliance with the License. You may obtain\n", "# a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT\n", "# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the\n", "# License for the specific language governing permissions and limitations\n", "# under the License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Watson Studioで文字データを可視化しよう\n", "\n", "twitterから気になる情報を取得し、 Watson Natural Languege UnderstandingでKeyword抽出、WorldCloudをPixieDust を使用して表示してみましょう!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Part 1 - 分析データ作成\n", "\n", "## 1. Setup\n", "### 1.1 最新の Watson Developer Cloud, requests-oauthlib パッケージの導入\n", "Natural Languge Understandingに使用するWatson Developer CloudとTwitterのOAuth認証に使用する requests-oauthlibパッケージを導入します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade ibm-watson\n", "\n", "!pip install requests requests_oauthlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 1.2 PixieDust Libraryの導入\n", "このノートブックでは、PixieDustライブラリを使用してデータセットを分析および視覚化します。\n", "\n", "PixieDustの詳細は[Introductory Notebook](https://dataplatform.cloud.ibm.com/exchange/public/entry/view/5b000ed5abda694232eb5be84c3dd7c1) または [PixieDust Github](https://ibm-cds-labs.github.io/pixiedust/) を参照してください。\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次のセルを実行して、最新バージョンのPixieDustを実行していることを確認します。 ローカルのjupyter notyebookを使用し、PixieDustをローカルにインストール済みで、それを使用したい場合は、このセルを実行しないでください。\n", "\n", "尚、正式リリース前の`https://github.com/pixiedust/pixiedust.git@va-working-branch#egg=pixiedust`はフォント指定が可能なモジュールで、正式リリースまで一時的に利用しています。日本語表示を可能にするために使用しています。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To confirm you have the latest version of PixieDust on your system, run this cell\n", "#!pip install -U --no-deps pixiedust\n", "!pip install --upgrade --no-deps git+https://github.com/pixiedust/pixiedust.git@va-working-branch#egg=pixiedust" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PixieDustをインポートし、カーネルをRestartが必要な場合はRestartさせます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pixiedust" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " #### 1.4 Option\n", " Pixiedust runtime updated. 
Please restart kernel と表示された場合は、上のメニューの`Kernel`-> `Restart`からカーネルをRestartさせてください。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 1.3 wordcloud Libraryの導入\n", "\n", "日本語が表示できるように、日本語フォントも導入します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --user wordcloud" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#日本語フォントの導入\n", "jp_font_path ='/home/dsxuser/work/ipaexg00301/ipaexg.ttf'\n", "\n", "import os\n", "if not os.path.exists(jp_font_path):\n", " !wget https://oscdl.ipa.go.jp/IPAexfont/ipaexg00301.zip\n", " !unzip ipaexg00301.zip\n", "else:\n", " print('IPA font has been already installed')\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 1.4 PixieDust にWordCloudの設定\n", "PixieDustにWordCloud形式でデータが表示できる チャートを追加しましょう。\n", "PixieDustは自分で設定したフォーマットでデータ表示するチャートを設定することができます。\n", "\n", "今回は1列目にwordcloudに表示する文字、2列目にその表示Volume数を入れたpandasのDataFrameを渡すと、wordcloudを表示するチャートを設定します。\n", "例えば下記のようなデータです:\n", "\n", "```\n", "import pandas as pd\n", "df = pd.DataFrame([[\"四月\", 26],[\"May\", 10],[\"June\", 5]], columns=['key', 'value'])\n", "```\n", "|   | Key | Value |\n", "| ---- | ---- | ---- |\n", "| 0 | 四月 | 26 |\n", "| 1 | May | 10 |\n", "| 2 | June | 5 |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pixiedust.display.display import *\n", "import io\n", "import base64\n", "from wordcloud import WordCloud\n", "\n", "class SimpleWordCloudDisplay(Display):\n", " def doRender(self, handlerId):\n", " # convert from dataframe to dict\n", " dfdict = {}\n", " # df = self.entity.toPandas()\n", " df = self.entity\n", " for x in range(len(df)):\n", " currentid = df.iloc[x,0] or 'NoKey'\n", " currentvalue = df.iloc[x,1]\n", " dfdict.setdefault(currentid, 0)\n", " dfdict[currentid] = dfdict[currentid] + currentvalue\n", " \n", " ##remove 'Others' stopwordsオプションが効かないのでマニュアル削除\n", " #if ('Others' in dfdict.keys())==True:\n", " # dfdict.pop('Others' )\n", "\n", " # create word cloud from dict\n", " wc = WordCloud(background_color=\"white\", width=800, height=400, max_font_size=140, font_path=jp_font_path).fit_words(dfdict)\n", " #wc = WordCloud(background_color=\"white\", max_font_size=140, font_path=jp_font_path).fit_words(dfdict)\n", "\n", "\n", " # encode word cloud image to base64 string\n", " img = wc.to_image()\n", " buffer =io.BytesIO()\n", " img.save(buffer,format=\"JPEG\") #Enregistre l'image dans le buffer\n", " myimage = buffer.getvalue() \n", " img_str = base64.b64encode(myimage)\n", " \n", "\n", " self._addHTMLTemplateString(\n", "\"\"\"\n", "
<center><img src=\"data:image/jpeg;base64,{0}\"></center>\n",
    "
\n", "\"\"\".format(img_str.decode(\"ascii\"))\n", " \n", " )\n", " \n", "@PixiedustDisplay()\n", "class SimpleWordCloudMeta(DisplayHandlerMeta):\n", " @addId\n", " def getMenuInfo(self,entity,dataHandler):\n", " if entity.__class__.__name__ == \"DataFrame\":\n", " return [\n", " {\n", " \"categoryId\": \"Chart\",\n", " \"title\": \"Simple Word Cloud\",\n", " \"icon\": \"fa-cloud\",\n", " \"id\": \"mySimpleWordCloud\"\n", " }\n", " ]\n", " else:\n", " return []\n", "\n", " def newDisplayHandler(self,options,entity):\n", " return SimpleWordCloudDisplay(options,entity)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5 WordCloudのTest\n", "表示できるか確認してみましょう!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pixiedust": { "displayParams": { "handlerId": "mySimpleWordCloud", "keyFields": "key", "rendererId": "matplotlib", "valueFields": "value" } } }, "outputs": [], "source": [ "#Test Code\n", "import pandas as pd\n", "\n", "df = pd.DataFrame([[\"四月\", 26],[\"May\", 10],[\"June\", 5]], columns=['key', 'value'])\n", "display(df, font_path='/home/dsxuser/work/ipaexg00301/ipaexg.ttf')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2. 環境準備\n", "\n", "以下の3つができるように環境の設定を行います。\n", "\n", "1: Twitter APIを使用してTweet Dataを取得します。\n", "Twitter API KEYの取得が必要です。ここではその取得方法は説明しませんので、お持ちでない方は「Twitter API KEY 取得」等で検索し、取得お願いします。以下の4つの値が必要です。\n", "\n", "* AccessToken\n", "* AccessTokenSecret\n", "* ConsumerKey\n", "* ConsumerSecret\n", "\n", "2: IBM Cloud Natural Language Understanding サービスでTweetを分析します。\n", "Natural Language Understanding サービスの作成しAPIKEY,URLの値が必要です。\n", "Natural Language Understanding サービスが未作成の方は、[こちら](https://cloud.ibm.com/catalog/services/natural-language-understanding)より作成してください。\n", "\n", "3: 設定ファイルや作成したファイルをIBM Cloud Object Strageを使用してて読み込み、保存します。\n", "IBM Cloud Object Strageを使用する設定を以下で行います。\n", "\n", "### 2.1 apikeys.iniの作成\n", "[apikeys.in](https://raw.githubusercontent.com/kyokonishito/pixiedust-twitter-analysis/master/apikeys.ini)をダウンロードします。\n", "リンクを右クリックし、「リンク先を別名で保存」をしてください。\n", "\n", "ダウンロードしたapikeys.iniを以下のように準備した1、2の値を記入して保存してください。\n", "[ ]でかこまれた部分を自分のKEYに変更します。[ ]は不要ですので残さないようにしてください。\n", "```\n", "[TWITTER]\n", "TWITTER_AccessToken=[AccessTokenの値]\n", "TWITTER_AccessTokenSecret=[AccessTokenSecretの値]\n", "TWITTER_ConsumerKey=[ConsumerKeyの値]\n", "TWITTER_ConsumerSecret=[ConsumerSecreの値]\n", "[IBM]\n", "NLU_APIKEY=[Natural Language UnderstandingサービスのAPIの値]\n", "NLU_URL=[Natural Language UnderstandingサービスのURLの値]\n", "```\n", "\n", "### 2.2 apikeys.iniのアップロード\n", "Watson Studioのjupyter note book上、右上の`0101`というアイコンをクリックし、Fileアップロードの画面を出します。`Drop your file here or browse your files to add a new file` と書いてある場所に、2.1で作成したファイルをドロップし、Watson Studio上のProjectにアップロードします。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "### 2.3 apikeys.iniの読み込み\n", "\n", " 1. **下のセルを選択して、空の行にカーソルを置いてください。** \n", "\n", "2. アップロードしたTweeterAPIKey.iniファイルの下にある(見えない場合は右上の10/01アイコンをクリック) `Insert to code`の下にある`Insert StreamingBody object`をクリックしてください。\n", "\n", "3. ファイルを読み込むストリーム`streaming_body_2`をセットするコードが挿入されます。\n", "\n", "4. 4箇所ある`streaming_body_2`は 全て`streaming_body_1`に変更します。(後のコードで使用するため)\n", "\n", "5. 編集が終わったらセルを実行します。\n", "\n", "

\n", "# Your data file was loaded into a botocore.response.StreamingBody object.
\n", "# Please read the documentation of ibm_boto3 and pandas to learn more about your possibilities to load the data.
\n", "# ibm_boto3 documentation: https://ibm.github.io/ibm-cos-sdk-python/
\n", "# pandas documentation: http://pandas.pydata.org/
\n", "streaming_body_2 = client_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.get_object(Bucket='wordcloud-donotdelete-pr-wyztdevipqhfkt', Key='apikeys.ini')['Body']

\n", "

\n", " # add missing __iter__ method, so pandas accepts body as file-like object\n", "if not hasattr(streaming_body_2, \"__iter__\"): streaming_body_2.__iter__ = types.MethodType( __iter__, streaming_body_2 ) \n", "

\n", "\n", "\n", "## この下に入力 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# この行の下にカーソルを置いて、Insert StreamingBody objectをクリック\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 ConfigParserへの設定情報の読み込み" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import configparser\n", "\n", "inifile = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())\n", "inifile.read_string(streaming_body_1.read().decode('utf-8'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5 IBM Objerct StrageのCredentialセット\n", "結果のファイル保存に使用するため、IBM Objerct StrageのCredentialをセットします。\n", "\n", " 1. **下のセルを選択して、空の行にカーソルを置いてください。** \n", "\n", "2. アップロードしたTweeterAPIKey.iniファイルの下にある(見えない場合は右上の10/01アイコンをクリック) `Insert to code`の下にある`Insert Credentials`をクリックしてください。\n", "\n", "3. ファイルを読み込むストリーム`credentials_2`をセットするコードが挿入されます。\n", "\n", "4. `credentials_2`は 全てcredentials_1`に変更します。(後のコードで使用するため)\n", "5. 編集が終わったらセルを実行します。\n", "\n", "

\n", "# @hidden_cell
\n", "# The following code contains the credentials for a file in your IBM Cloud Object Storage.
\n", "# You might want to remove those credentials before you share your notebook.
\n", "
\n", " credentials_2 = {\n", " 'IAM_SERVICE_ID': 'iam-ServiceId-xxxxxxxx-xxxx-xxxx-xxxx-1234567890xx',
\n", " 'IBM_API_KEY_ID': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
\n", " 'ENDPOINT': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
\n", " 'IBM_AUTH_ENDPOINT': 'https://iam.bluemix.net/oidc/token',
\n", " 'BUCKET': 'wordcloud-donotdelete-pr-wyztdevipqhfkt',
\n", " 'FILE': 'apikeys.ini'
\n", "}

\n", "\n", "## この下に入力 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# この行の下にカーソルを置いて、Insert Credentialsをクリック\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3. Tweet情報の取得と保存" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 パラメータの設定\n", "検索したい文字列、取得したいTweet数、検索の繰り返し回数、結果をサマリーするKeyword出現数をセット。\n", "1回の検索で最大100 件しか取得できないので、それ以上取得したい場合は、繰り返し回数をセットする。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#以下のパラメータをセットする\n", "#検索したい文字列\n", "#基本は最新1週間分\n", "search_key = \"IBM\"\n", "\n", "#1回の検索で取得したいTweet数, 最大は100\n", "search_count=100\n", "\n", "#検索の繰り返し回数\n", "repeat_count=5\n", "\n", "#以下の数以下のKeywordはOthersでまとめる。Othersはwordcloudには表示しない\n", "summary_num = 5\n", "\n", "#search_until=\"2019-04-07\" #期間を指定したい場合はコメントをはずす" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Tweet情報の取得\n", "Twitter Search APIを使って、Tweet情報を取得します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import urllib\n", "import calendar\n", "import time\n", "from requests_oauthlib import OAuth1Session\n", "\n", "CK = inifile['TWITTER']['TWITTER_ConsumerKey']\n", "CS = inifile['TWITTER']['TWITTER_ConsumerSecret']\n", "AT = inifile['TWITTER']['TWITTER_AccessToken']\n", "ATS = inifile['TWITTER']['TWITTER_AccessTokenSecret']\n", "twitter = OAuth1Session(CK, CS, AT, ATS) \n", "search_endpoint = \"https://api.twitter.com/1.1/search/tweets.json\"\n", "\n", "\n", "\n", "def search_tw(tw, params, search_endpoint):\n", "\n", " url = \"https://api.twitter.com/1.1/search/tweets.json\" # 取得エンドポイント\n", " res = tw.get(search_endpoint, params = params)\n", "\n", " if res.status_code == 200: # 正常通信出来た場合\n", " timelines = json.loads(res.text) # レスポンスからタイムラインリストを取得\n", " df_tweets = pd.DataFrame.from_dict(timelines.get('statuses'), dtype='object')\n", " next_param = timelines.get('search_metadata').get('next_results')\n", " if next_param is not None:\n", " next_param = urllib.parse.parse_qs(next_param.replace('?', ''))\n", " return df_tweets, next_param\n", "\n", " else: # 正常通信出来なかった場合\n", " print(\"Failed: %d\" % res.status_code)\n", " raise ValueError(\"Failed: %d\" % res.status_code)\n", " \n", "def YmdHMS(created_at):\n", " time_utc = time.strptime(created_at, '%a %b %d %H:%M:%S +0000 %Y')\n", " unix_time = calendar.timegm(time_utc)\n", " time_local = time.localtime(unix_time)\n", " return str(time.strftime(\"%Y-%m-%d %H:%M:%S\", time_local))\n", " \n", "param_dict = {} \n", "param_dict['q'] = search_key\n", "#param_dict['until'] = search_until #期間を指定したい場合はコメントをはずす\n", "param_dict['result_type'] = 'recent'\n", "param_dict['count'] = search_count\n", "param_dict['lang'] = 'ja'\n", "\n", "next_params = param_dict\n", "df_tweets= pd.DataFrame()\n", "last_id = ''\n", "for i in range(int(repeat_count)):\n", " print('repaet#:{}'.format(i))\n", " df, next_params = search_tw(twitter, next_params, search_endpoint)\n", " df_tweets = pd.concat([df_tweets, df], ignore_index=True,)\n", " if len(df) == 0:\n", " break\n", " if next_params is None:\n", " before_last_id = last_id\n", " last_id = df.tail(1)['id'].astype(str).values[0]\n", " if before_last_id == last_id:\n", " break\n", " param_dict['max_id'] = last_id\n", " next_params = param_dict\n", "if len(df_tweets) > 0:\n", " df_tweets['create_at_jst'] = df_tweets.apply(lambda x: YmdHMS(x['created_at']) if x.dtype == \"object\" else x, axis=1)\n", " df_tweets = 
df_tweets.sort_values(by=[\"create_at_jst\"], ascending=True)\n", " col_list = list(df_tweets.columns.values)\n", " del col_list[-1]\n", " col_list.insert(0, 'create_at_jst')\n", " df_tweets = df_tweets.reindex(columns=col_list)\n", "else:\n", " print('No Data')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Tweet情報の取得\n", "結果を一旦ファイルに保存します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#IBM Object Storageへの検索保存\n", "tweet_result_filename = 'tweets_results.csv'\n", "\n", "cos = ibm_boto3.client(service_name='s3',\n", " ibm_api_key_id=credentials_1['IBM_API_KEY_ID'],\n", " ibm_service_instance_id=credentials_1['IAM_SERVICE_ID'],\n", " ibm_auth_endpoint=credentials_1['IBM_AUTH_ENDPOINT'],\n", " config=Config(signature_version='oauth'),\n", " endpoint_url=credentials_1['ENDPOINT'])\n", "\n", "\n", "# Write a CSV file from the enriched pandas DataFrame.\n", "df_tweets.to_csv(tweet_result_filename, index=False)\n", "\n", "# Use the above put_file method with credentials to put the file in Object Storage.\n", "cos.upload_file(tweet_result_filename, Bucket=credentials_1['BUCKET'],Key=tweet_result_filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 4. TweetからNatural Language Understandingを使用してKeyword抽出\n", "\n", "### 4.1 前処理後NLU呼び出しと合計処理" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from ibm_watson import NaturalLanguageUnderstandingV1\n", "from ibm_watson.natural_language_understanding_v1 import Features, KeywordsOptions\n", "from ibm_watson import ApiException\n", "\n", "\n", "df_tweets['refineText']=df_tweets['text']\n", "\n", "# 改行削除\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('\\n', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].strip(), axis=1)\n", "\n", "# RTマーク削除\n", "df_tweets['refineText'] = df_tweets['refineText'].replace(r'^RT.*?:', '', regex=True)\n", "\n", "# hasshutag記号(#)削除\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('#', ' '), axis=1)\n", "\n", "# linkの削除\n", "df_tweets['refineText'] = df_tweets['refineText'].replace(r'(http:\\/\\/[\\x21-\\x7e]+)', '', regex=True)\n", "df_tweets['refineText'] = df_tweets['refineText'].replace(r'(https:\\/\\/[\\x21-\\x7e]+)', '', regex=True)\n", "\n", "# カッコ削除\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('(', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace(')', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('【', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('】', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('[', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace(']', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('「', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('」', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('『', ' '), axis=1)\n", "df_tweets['refineText'] = df_tweets.apply(lambda x: x['refineText'].replace('』', ' '), axis=1)\n", "\n", "\n", "nlu = NaturalLanguageUnderstandingV1(version='2018-11-16', url=inifile['IBM']['NLU_URL'], 
iam_apikey=inifile['IBM']['NLU_APIKEY'])\n", "\n", "features = Features(keywords=KeywordsOptions())\n", "keywords = []\n", "df_pixiedust = pd.DataFrame(columns=['Key', 'Value'])\n", "\n", "\n", "for text, i in zip(df_tweets['refineText'], df_tweets.index):\n", " if not text:\n", " keywords.append(' ')\n", " continue\n", "\n", " try:\n", " enriched_json = nlu.analyze(text=text, features=features, language='ja') #NLU 呼び出し\n", " except ApiException as ex:\n", " print('{}:text: {} -- Method failed with status code {}: {}'.format(i, text, str(ex.code), ex.message))\n", " keywords.append(' ')\n", " continue\n", "\n", "\n", " # Iterate and get KEYWORDS with a confidence of over 70%\n", " if 'keywords' in enriched_json.result:\n", " tmpkw = []\n", " for kw in enriched_json.result[\"keywords\"]:\n", " if (float(kw[\"relevance\"]) >= 0.7):\n", " tmpkw.append(kw[\"text\"])\n", " df_pixiedust = df_pixiedust.append({'Key': kw[\"text\"], 'Value': 1}, ignore_index=True)\n", " # Convert multiple keywords in a list to a string and append the string\n", " keywords.append(', '.join(tmpkw))\n", " else:\n", " keywords.append(\"\")\n", " print('{}:keyword: {}'.format(i, tmpkw))\n", "\n", "df_tweets['Keywords'] = keywords\n", "\n", "df_pixiedust = df_pixiedust.groupby('Key').count()\n", "df_pixiedust = df_pixiedust.sort_values(by=[\"Value\"], ascending=False)\n", "df_pixiedust = df_pixiedust.reset_index()\n", "\n", "# Write a CSV file from the enriched pandas DataFrame.\n", "df_tweets.to_csv(tweet_result_filename, index=False)\n", "\n", "# Use the above put_file method with credentials to put the file in Object Storage.\n", "cos.upload_file(tweet_result_filename, Bucket=credentials_1['BUCKET'],Key=tweet_result_filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 抽出データ確認" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "df_pixiedust" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 <<オプション>> 抽出データのうち、小さなものはまとめる\n", "Keywordの出現回数が少ないものが多数を占めると、グラフが見にくくなる場合は、`3.1 パラメータの設定の設定`で設定したsummary_num以下の場合はOthersにまとめる。\n", " 以下を実行し表示は`4.3 <<オプション>>WordCloudで表示してみましょう! :Summary`の方を実施\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_pixiedust_sum = df_pixiedust[df_pixiedust['Value'] > summary_num -1]\n", "# サマリーしたデータをOthersとして表示させたい場合は以下のコメントを外す\n", "#df_pixiedust_others = df_pixiedust[df_pixiedust['Value']< summary_num]\n", "#df_pixiedust_others = df_pixiedust_others['Value'].sum()\n", "#df_pixiedust_sum = df_pixiedust_sum.append({'Key': 'Others', 'Value': df_pixiedust_others}, ignore_index=True)\n", "df_pixiedust_sum = df_pixiedust_sum.sort_values(by=[\"Value\"], ascending=False)\n", "df_pixiedust_sum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.3 WordCloudで表示してみましょう!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display(df_pixiedust, font_path = '/home/dsxuser/work/ipaexg00301/ipaexg.ttf')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.3 <<オプション>>WordCloudで表示してみましょう! 
:Summary" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display(df_pixiedust_sum, font_path = '/home/dsxuser/work/ipaexg00301/ipaexg.ttf') #font_path指定はtemporary fix\n" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3.5", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.5" } }, "nbformat": 4, "nbformat_minor": 1 }