{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Licensed under the Apache License, Version 2.0 (the \"License\"); you may\n", "# not use this file except in compliance with the License. You may obtain\n", "# a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT\n", "# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the\n", "# License for the specific language governing permissions and limitations\n", "# under the License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Watson Studioで文字データを可視化しよう\n", "\n", "twitterから気になる情報を取得し、 Watson Natural Languege UnderstandingでKeyword抽出、WorldCloudをPixieDust を使用して表示してみましょう!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Part 1 - 分析データ作成\n", "\n", "## 1. 
Setup\n", "### 1.1 最新の Watson Developer Cloud, requests-oauthlib パッケージの導入\n", "Natural Languge Understandingに使用するWatson Developer CloudとTwitterのOAuth認証に使用する requests-oauthlibパッケージを導入します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade ibm-watson\n", "\n", "!pip install requests requests_oauthlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 1.2 PixieDust Libraryの導入\n", "このノートブックでは、PixieDustライブラリを使用してデータセットを分析および視覚化します。\n", "\n", "PixieDustの詳細は[Introductory Notebook](https://dataplatform.cloud.ibm.com/exchange/public/entry/view/5b000ed5abda694232eb5be84c3dd7c1) または [PixieDust Github](https://ibm-cds-labs.github.io/pixiedust/) を参照してください。\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "次のセルを実行して、最新バージョンのPixieDustを実行していることを確認します。 ローカルのjupyter notyebookを使用し、PixieDustをローカルにインストール済みで、それを使用したい場合は、このセルを実行しないでください。\n", "\n", "尚、正式リリース前の`https://github.com/pixiedust/pixiedust.git@va-working-branch#egg=pixiedust`はフォント指定が可能なモジュールで、正式リリースまで一時的に利用しています。日本語表示を可能にするために使用しています。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To confirm you have the latest version of PixieDust on your system, run this cell\n", "#!pip install -U --no-deps pixiedust\n", "!pip install --upgrade --no-deps git+https://github.com/pixiedust/pixiedust.git@va-working-branch#egg=pixiedust" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PixieDustをインポートし、カーネルをRestartが必要な場合はRestartさせます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pixiedust" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " #### 1.4 Option\n", " Pixiedust runtime updated. 
Please restart kernel is displayed, restart the kernel from the menu above via `Kernel` -> `Restart`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 1.3 Install the wordcloud library\n", "\n", "A Japanese font is also installed so that Japanese text can be displayed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --user wordcloud" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install the IPA Japanese font\n", "jp_font_path = '/home/dsxuser/work/ipaexg00301/ipaexg.ttf'\n", "\n", "import os\n", "if not os.path.exists(jp_font_path):\n", "    !wget https://oscdl.ipa.go.jp/IPAexfont/ipaexg00301.zip\n", "    !unzip ipaexg00301.zip\n", "else:\n", "    print('IPA font is already installed')\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 1.4 Add a WordCloud chart to PixieDust\n", "Let's add a chart that lets PixieDust display data as a word cloud.\n", "PixieDust allows you to register charts that display data in a format you define yourself.\n", "\n", "Here we register a chart that renders a word cloud from a pandas DataFrame whose first column holds the words to display and whose second column holds their weights.\n", "For example, data like the following:\n", "\n", "```\n", "import pandas as pd\n", "df = pd.DataFrame([[\"四月\", 26],[\"May\", 10],[\"June\", 5]], columns=['key', 'value'])\n", "```\n", "| | key | value |\n", "| ---- | ---- | ---- |\n", "| 0 | 四月 | 26 |\n", "| 1 | May | 10 |\n", "| 2 | June | 5 |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pixiedust.display.display import *\n", "import io\n", "import base64\n", "from wordcloud import WordCloud\n", "\n", "class SimpleWordCloudDisplay(Display):\n", "    def doRender(self, handlerId):\n", "        # convert from dataframe to dict\n", "        dfdict = {}\n", "        # df = self.entity.toPandas()\n", "        df = self.entity\n", "        for x in range(len(df)):\n", "            currentid = df.iloc[x,0] or 'NoKey'\n", "            currentvalue = df.iloc[x,1]\n", "            dfdict.setdefault(currentid, 0)\n", "            dfdict[currentid] = dfdict[currentid] + currentvalue\n", "\n", "        ##remove 'Others' 
stopwords option does not work here, so remove it manually\n", "        #if 'Others' in dfdict:\n", "        #    dfdict.pop('Others')\n", "\n", "        # create word cloud from dict\n", "        wc = WordCloud(background_color=\"white\", width=800, height=400, max_font_size=140, font_path=jp_font_path).fit_words(dfdict)\n", "        #wc = WordCloud(background_color=\"white\", max_font_size=140, font_path=jp_font_path).fit_words(dfdict)\n", "\n", "        # encode word cloud image to base64 string\n", "        img = wc.to_image()\n", "        buffer = io.BytesIO()\n", "        img.save(buffer, format=\"JPEG\")  # save the image into the buffer\n", "        myimage = buffer.getvalue()\n", "        img_str = base64.b64encode(myimage)\n", "\n", "        # embed the base64-encoded image in the rendered HTML\n", "        self._addHTMLTemplateString(\n", "\"\"\"\n", "<center><img src=\"data:image/jpeg;base64,{0}\"></center>\n", "\"\"\".format(img_str.decode(\"ascii\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "# Your data file was loaded into a botocore.response.StreamingBody object.\n", "# Please read the documentation of ibm_boto3 and pandas to learn more about your possibilities to load the data.\n", "# ibm_boto3 documentation: https://ibm.github.io/ibm-cos-sdk-python/\n", "# pandas documentation: http://pandas.pydata.org/\n", "streaming_body_2 = client_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.get_object(Bucket='wordcloud-donotdelete-pr-wyztdevipqhfkt', Key='apikeys.ini')['Body']\n", "# add missing __iter__ method, so pandas accepts body as file-like object\n", "if not hasattr(streaming_body_2, \"__iter__\"): streaming_body_2.__iter__ = types.MethodType( __iter__, streaming_body_2 )\n", "```
\n", "\n", "\n", "## この下に入力 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# この行の下にカーソルを置いて、Insert StreamingBody objectをクリック\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 ConfigParserへの設定情報の読み込み" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import configparser\n", "\n", "inifile = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())\n", "inifile.read_string(streaming_body_1.read().decode('utf-8'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5 IBM Objerct StrageのCredentialセット\n", "結果のファイル保存に使用するため、IBM Objerct StrageのCredentialをセットします。\n", "\n", " 1. **下のセルを選択して、空の行にカーソルを置いてください。** \n", "\n", "2. アップロードしたTweeterAPIKey.iniファイルの下にある(見えない場合は右上の10/01アイコンをクリック) `Insert to code`の下にある`Insert Credentials`をクリックしてください。\n", "\n", "3. ファイルを読み込むストリーム`credentials_2`をセットするコードが挿入されます。\n", "\n", "4. `credentials_2`は 全てcredentials_1`に変更します。(後のコードで使用するため)\n", "5. 編集が終わったらセルを実行します。\n", "\n", "\n",
"# @hidden_cell
\n",
"# The following code contains the credentials for a file in your IBM Cloud Object Storage.
\n",
"# You might want to remove those credentials before you share your notebook.
\n",
"\n",
" credentials_2 = {\n",
" 'IAM_SERVICE_ID': 'iam-ServiceId-xxxxxxxx-xxxx-xxxx-xxxx-1234567890xx',
\n",
" 'IBM_API_KEY_ID': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
\n",
" 'ENDPOINT': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
\n",
" 'IBM_AUTH_ENDPOINT': 'https://iam.bluemix.net/oidc/token',
\n",
" 'BUCKET': 'wordcloud-donotdelete-pr-wyztdevipqhfkt',
\n",
" 'FILE': 'apikeys.ini'
\n",
"}