{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "accelerator": "GPU", "colab": { "name": "final-sentiment-analysis_v2", "provenance": [], "collapsed_sections": [], "toc_visible": true, "include_colab_link": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vpaLrN0mteAS" }, "source": [ "## Bengali Sentiment Analysis with TensorFlow Hub\n", "\n", "Still at the draft stage." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "GhN2WtIrBQ4y" }, "source": [ "We have almost reached the end of the book. This final tutorial draws on a little of everything we have learned so far. Here we discuss how the ordinary libraries we used in the earlier sentiment analysis can be improved with pre-trained models and transfer learning. Along the way we worked with TensorFlow Serving and its API. Alongside that, we discussed TensorFlow Hub and the formats in which models can be stored as a “SavedModel” for transfer learning. TensorFlow Hub is also where pre-trained models are kept as modules that can be connected through various APIs. Most of today's story is about TensorFlow Hub." ] }, { "cell_type": "markdown", "metadata": { "id": "4pJhKIERYWTl", "colab_type": "text" }, "source": [ "## Our tasks for today\n", "\n", "1. Download the pre-trained word2vec embeddings.\n", "\n", "2. Download two labeled datasets for training: one file of positive sentiment and one of negative sentiment. Then look at what is inside.\n", "\n", "3. Download the ready-made word embedding converter/exporter script from TensorFlow Hub, which exports the word embeddings downloaded in step 1 to a TensorFlow Hub text embedding module.\n", "\n",
"4. Keep the pre-trained word2vec embedding file and the converter/exporter script in the same directory. The exporter script reads the embedding vectors from our .txt or .vec file (.vec especially for fastText) and exports them to a “SavedModel”.\n", "\n", "5. TensorFlow Hub loads this “SavedModel” as a module, which our model then uses for sentiment analysis.\n", "\n", "6. Build a Sequential model. Because the dataset is large, use a generator function to shuffle it randomly and batch it as needed, through the tf.data.Dataset.from_generator method. This is needed for training.\n", "\n", "7. Train the model and save it. The text embedding module can be added to this small Sequential model with model.add, like any other layer.\n", "\n", "8. Finally, testing. When we pass a few sentences to the predict method, it tells us which of the two classes (positive/negative) each one belongs to." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Q4DN769E2O_R" }, "source": [ "## Load or install the required libraries\n" ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "zA07b51AGF5l", "outputId": "341d2107-69a9-4314-fda8-2ae168d7eefa", "colab": { "base_uri": "https://localhost:8080/", "height": 70 } }, "source": [ "!pip install -q tensorflow-gpu==2.0.0-beta1\n", "# !pip install -q tensorflow-gpu==1.15" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "\u001b[K |████████████████████████████████| 348.9MB 44kB/s \n", "\u001b[K |████████████████████████████████| 3.1MB 51.2MB/s \n", "\u001b[K |████████████████████████████████| 501kB 43.4MB/s \n", "\u001b[?25h" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "zSeyZMq-BYsu", "outputId": "4267a2aa-a82c-40e6-cda9-6503aaea0a93", "colab": { "base_uri": "https://localhost:8080/", "height": 534 } }, "source": [ "import tensorflow as tf\n", "import tensorflow_hub as 
hub\n", "import numpy as np\n", "import os\n", "from sklearn.metrics import classification_report\n", "from gensim.models import Word2Vec\n", "\n", "# Check what is actually available\n", "print(\"Version: \", tf.__version__)\n", "print(\"Eager mode: \", tf.executing_eagerly())\n", "print(\"Hub version: \", hub.__version__)\n", "print(\"GPU is\", \"available\" if tf.test.is_gpu_available() else \"NOT AVAILABLE\")" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint32 = np.dtype([(\"qint32\", 
np.int32, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: 
Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" ], "name": "stderr" }, { "output_type": "stream", "text": [ "Version: 2.0.0-beta1\n", "Eager mode: True\n", "Hub version: 0.7.0\n", "GPU is NOT AVAILABLE\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "_SF78GWg74cW", "colab_type": "code", "outputId": "037b9f29-13e1-4fbd-93b5-1827a6bd6302", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "tf.__version__" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'2.0.0-beta1'" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "cell_type": "code", "metadata": { "id": "_qRk_Ff7EGGc", "colab_type": "code", "colab": {} }, "source": [ "# Suppress the extra warnings; you will not need this in your own work\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "KaQVUacCFMhE", "colab_type": "text" }, "source": [ "In this sentiment analysis we use a “SavedModel” inside a Keras layer so that we can reuse pre-trained models whose embeddings have already been built. The TensorFlow Hub library (tensorflow_hub) provides the hub.KerasLayer class, which retrieves the computation and pre-trained weights of a “SavedModel” from a URL or from the file system. Here we reuse a TensorFlow 2 “SavedModel” through the low-level hub.load() API and its hub.KerasLayer wrapper.\n", "\n", "To keep things simple, we use pre-trained word embeddings for Bengali. We have used word2vec and fastText for pre-trained word embeddings before. As you know, Facebook released pre-trained fastText word vectors for 157 languages, including Bengali, quite some time ago. Still, word2vec is not bad at all. You can also try fastText in place of word2vec in this model.\n", "\n", "We could have used BERT or Flair if we wanted, but once you understand these frameworks, working with the others becomes easier." ] }, { "cell_type": "code", "metadata": { "id": "En9nhQ0elBEQ", "colab_type": "code", "outputId": "f1c2ee64-ea6a-48f5-ef9f-7b737a95da94", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "hub_url = \"https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1\"\n", "embed = hub.KerasLayer(hub_url)\n", "embeddings = embed([\"A long sentence.\", \"single-word\", \"http://example.com\"])\n", "print(embeddings.shape, embeddings.dtype)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "(3, 128) <dtype: 'float32'>\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "ldyAlgRHlI2R", "colab_type": "text" }, "source": [ "From here, building a text classifier with ordinary Keras layers is straightforward." ] },
{ "cell_type": "code", "metadata": { "id": "UIVnQxk_lDEP", "colab_type": "code", "colab": {} }, "source": [ "model = tf.keras.Sequential([\n", "    embed,\n", "    tf.keras.layers.Dense(16, activation=\"relu\"),\n", "    tf.keras.layers.Dense(1, activation=\"sigmoid\"),\n", "])" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9FB7gLU4F54l" }, "source": [ "# Downloading the word2vec word embeddings\n", "\n", "I am trying to find somewhere to host such a large file." ] }, { "cell_type": "code", "metadata": { "id": "hqwIWFs_BAvb", "colab_type": "code", "outputId": "5268cbf4-8518-4526-c719-5a1f5d873818", "colab": { "base_uri": "https://localhost:8080/", "height": 194 } }, "source": [ "!wget http://119.81.77.70:8090/bn-wiki-word2vec-300.txt" ], "execution_count": 46, "outputs": [ { "output_type": "stream", "text": [ "--2019-11-22 17:44:01-- http://119.81.77.70:8090/bn-wiki-word2vec-300.txt\n", "Connecting to 119.81.77.70:8090... 
connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 2496996336 (2.3G) [text/plain]\n", "Saving to: ‘bn-wiki-word2vec-300.txt’\n", "\n", "bn-wiki-word2vec-30 100%[===================>] 2.33G 18.3MB/s in 2m 13s \n", "\n", "2019-11-22 17:46:14 (17.9 MB/s) - ‘bn-wiki-word2vec-300.txt’ saved [2496996336/2496996336]\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "wWPjDiES0i-S", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "26a20419-9ae0-4fd7-ef0a-7b9b153eed49" }, "source": [ "!tar cvzf - bn-wiki-word2vec-300.txt | split -b 800m - \"downloads-part\"" ], "execution_count": 1, "outputs": [ { "output_type": "stream", "text": [ "bn-wiki-word2vec-300.txt\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "I_0i81K91wFH", "colab_type": "code", "colab": {} }, "source": [ "!cat downloads-parta* >backup.tar.gz" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "HJf2_-uN4IIR", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 318 }, "outputId": "94c9236f-0169-4a0e-9399-fcecd1045094" }, "source": [ "!ls -al\n" ], "execution_count": 49, "outputs": [ { "output_type": "stream", "text": [ "total 9806840\n", "drwxr-xr-x 1 root root 4096 Nov 22 17:48 .\n", "drwxr-xr-x 1 root root 4096 Nov 22 15:53 ..\n", "-rw-r--r-- 1 root root 979579412 Nov 22 16:09 backup.tar.gz\n", "-rw-r--r-- 1 root root 2496996336 Nov 14 09:58 bn-wiki-word2vec-300.txt\n", "-rw-r--r-- 1 root root 2496996336 Nov 14 09:58 bn-wiki-word2vec-300.txt-cat\n", "-rw-r--r-- 1 root root 2496996336 Nov 14 09:58 bn-wiki-word2vec-300-txt.original\n", "-rw-r--r-- 1 root root 492830720 Nov 22 16:38 bn-wiki-word2vec-300.txt.tgz.aa\n", "-rw-r--r-- 1 root root 486748692 Nov 22 16:40 bn-wiki-word2vec-300.txt.tgz.ab\n", "drwxr-xr-x 1 root root 4096 Nov 20 16:17 .config\n", "drwxr-xr-x 4 root root 4096 Nov 22 17:06 datasets\n", "-rw-r--r-- 1 
root root 140718612 Nov 22 16:03 downloads-partab\n", "drwxr-xr-x 8 root root 4096 Nov 22 17:19 .git\n", "-rw-r----- 1 root root 82 Nov 22 17:09 .gitattributes\n", "drwxr-xr-x 2 root root 4096 Nov 22 16:30 .ipynb_checkpoints\n", "drwxr-xr-x 1 root root 4096 Nov 15 16:31 sample_data\n", "-rw-r--r-- 1 root root 451264512 Nov 22 15:59 word2vec.backup.tar.gz.aa\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "4dbCZN5T64Ox", "colab_type": "code", "colab": {} }, "source": [ "!cat backup.tar.gz.* | tar xzf -" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "okscMsIc7ZNR", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "3d72f05e-df85-41f7-f476-862119cc52f6" }, "source": [ "!tar cvzf - bn-wiki-word2vec-300.txt | split -b 470m - bn-wiki-word2vec-300.txt.tgz." ], "execution_count": 9, "outputs": [ { "output_type": "stream", "text": [ "bn-wiki-word2vec-300.txt\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "qfuPtnBDMLSy", "colab_type": "code", "colab": {} }, "source": [ "!cat bn-wiki-word2vec-300.txt.tgz.* | tar xzf -" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "lgYBx3SdD5E1", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 301 }, "outputId": "46cae57c-54a5-46d9-d69a-a1953dd64f8e" }, "source": [ "!apt install git-lfs" ], "execution_count": 13, "outputs": [ { "output_type": "stream", "text": [ "Reading package lists... Done\n", "Building dependency tree \n", "Reading state information... 
Done\n", "The following NEW packages will be installed:\n", " git-lfs\n", "0 upgraded, 1 newly installed, 0 to remove and 29 not upgraded.\n", "Need to get 2,129 kB of archives.\n", "After this operation, 7,662 kB of additional disk space will be used.\n", "Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]\n", "Fetched 2,129 kB in 1s (1,640 kB/s)\n", "Selecting previously unselected package git-lfs.\n", "(Reading database ... 134923 files and directories currently installed.)\n", "Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...\n", "Unpacking git-lfs (2.3.4-1) ...\n", "Setting up git-lfs (2.3.4-1) ...\n", "Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "Nh8KXLgqEMFX", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "outputId": "65f6ce1d-08a8-4190-bc11-ed70715219eb" }, "source": [ "!git lfs install" ], "execution_count": 51, "outputs": [ { "output_type": "stream", "text": [ "Updated git hooks.\n", "Git LFS initialized.\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "E-1qN6I0Ekpi", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "8ea32be4-e81a-4aa9-fdd2-b2735d235d34" }, "source": [ "!git init" ], "execution_count": 52, "outputs": [ { "output_type": "stream", "text": [ "Reinitialized existing Git repository in /content/.git/\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "NWT1vkwsFFC4", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "08187786-2492-4462-eab8-4ddbfcdabfcc" }, "source": [ "!git clone https://r_hassan@bitbucket.org/r_hassan/datasets.git" ], "execution_count": 54, "outputs": [ { "output_type": "stream", "text": [ "fatal: destination path 'datasets' already exists and is not an empty directory.\n" ], "name": "stdout" } ] }, { 
"cell_type": "code", "metadata": { "id": "ANwjuEhZFcdq", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 301 }, "outputId": "364407bb-ac29-4ea4-913c-d15107088026" }, "source": [ "!git pull" ], "execution_count": 55, "outputs": [ { "output_type": "stream", "text": [ "remote: Counting objects: 7, done.\u001b[K\n", "remote: Compressing objects: 33% (1/3) \u001b[K\rremote: Compressing objects: 66% (2/3) \u001b[K\rremote: Compressing objects: 100% (3/3) \u001b[K\rremote: Compressing objects: 100% (3/3), done.\u001b[K\n", "remote: Total 7 (delta 0), reused 0 (delta 0)\u001b[K\n", "Unpacking objects: 14% (1/7) \rUnpacking objects: 28% (2/7) \rUnpacking objects: 42% (3/7) \rUnpacking objects: 57% (4/7) \rUnpacking objects: 71% (5/7) \rUnpacking objects: 85% (6/7) \rUnpacking objects: 100% (7/7) \rUnpacking objects: 100% (7/7), done.\n", "From https://bitbucket.org/r_hassan/datasets\n", " * [new branch] master -> origin/master\n", "There is no tracking information for the current branch.\n", "Please specify which branch you want to merge with.\n", "See git-pull(1) for details.\n", "\n", " git pull \n", "\n", "If you wish to set tracking information for this branch you can do so with:\n", "\n", " git branch --set-upstream-to=/ master\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "rCqSg45GFozY", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "c47c990a-deeb-4bd6-c313-e506e0cb28ce" }, "source": [ "!git lfs track \"*.aa\"" ], "execution_count": 77, "outputs": [ { "output_type": "stream", "text": [ "\"*.aa\" already supported\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "lyvy0qGHFvpk", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "1b4ea154-7eaf-4f91-9f91-7891b9ea7577" }, "source": [ "!git lfs track \"*.ab\"" ], "execution_count": 78, "outputs": [ { "output_type": "stream", "text": 
[ "\"*.ab\" already supported\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "_M2oUpHRU-m_", "colab_type": "code", "colab": {} }, "source": [ "!git add bn-wiki-word2vec-300.txt.tgz.aa" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "1jln1SsQVPSb", "colab_type": "code", "colab": {} }, "source": [ "!git add bn-wiki-word2vec-300.txt.tgz.ab" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "Rgk1RQOaF9Ac", "colab_type": "code", "colab": {} }, "source": [ "!git add .gitattributes" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "GIPtdoA-GDRr", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 87 }, "outputId": "bd4368e1-80a0-4939-ad1d-422d31f349ca" }, "source": [ "!git commit -m \"Track BN-wiki with Git LFS\"" ], "execution_count": 82, "outputs": [ { "output_type": "stream", "text": [ "[master 9615e4c] Track BN-wiki with Git LFS\n", " 2 files changed, 6 insertions(+)\n", " create mode 100644 bn-wiki-word2vec-300.txt.tgz.aa\n", " create mode 100644 bn-wiki-word2vec-300.txt.tgz.ab\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "dnx3sp86GQ7v", "colab_type": "code", "colab": {} }, "source": [ "!git config --global user.email \"wideangle@gmail.com\"" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "zvlpzvffGbUh", "colab_type": "code", "colab": {} }, "source": [ "!git config --global user.name \"R Hassan\"" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "Dn7Uky3XGniC", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 70 }, "outputId": "25326ad6-3d10-4106-b2bb-dc2f4da22a9f" }, "source": [ "!git lfs track" ], "execution_count": 60, "outputs": [ { "output_type": "stream", "text": [ "Listing tracked patterns\n", " *.aa (.gitattributes)\n", " *.ab (.gitattributes)\n" ], "name": 
"stdout" } ] }, { "cell_type": "code", "metadata": { "id": "lP2GB2CDG615", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 105 }, "outputId": "0fc217d6-2cf7-45fc-894b-08536a8faf47" }, "source": [ "!git push" ], "execution_count": 67, "outputs": [ { "output_type": "stream", "text": [ "fatal: The current branch master has no upstream branch.\n", "To push the current branch and set the remote as upstream, use\n", "\n", " git push --set-upstream origin master\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "4YnFchc5RPmj", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "6a4a3abe-c569-4289-ee7d-1736a9e26dd8" }, "source": [ "!git push --set-upstream origin master" ], "execution_count": 68, "outputs": [ { "output_type": "stream", "text": [ "fatal: could not read Password for 'https://r_hassan@bitbucket.org': No such device or address\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "MZo2aPJ6RsWW", "colab_type": "code", "colab": {} }, "source": [ "!git config credential.helper store" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "iGfFiYlrHAwE", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "a309d689-a509-4caa-c242-3f08ca958b96" }, "source": [ "!git remote add origin https://r_hassan@bitbucket.org/r_hassan/datasets.git" ], "execution_count": 61, "outputs": [ { "output_type": "stream", "text": [ "fatal: remote origin already exists.\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "VJxV8AA4H-jv", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "c28ce1e6-096e-4456-cb7b-30286b533904" }, "source": [ "!git push -f origin master" ], "execution_count": 83, "outputs": [ { "output_type": "stream", "text": [ "fatal: could not read Password for 'https://r_hassan@bitbucket.org': No 
such device or address\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "YDRvEdYiITAB", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "outputId": "c1383f86-2f58-4706-907e-641e1dab8a3b" }, "source": [ "!git remote" ], "execution_count": 35, "outputs": [ { "output_type": "stream", "text": [ "origin\n", "rhassa\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "1ejuj0B5JGn2", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "1ac66a72-8577-460a-f71f-705ca128b9eb" }, "source": [ "!git push -u origin --all" ], "execution_count": 41, "outputs": [ { "output_type": "stream", "text": [ "fatal: could not read Password for 'https://r_hassan@bitbucket.org': No such device or address\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "JcR75qVrVzfL", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "outputId": "7e45bc8c-0630-4054-e1d4-3befb1c2e394" }, "source": [ "!git push origin master" ], "execution_count": 85, "outputs": [ { "output_type": "stream", "text": [ "fatal: could not read Password for 'https://r_hassan@bitbucket.org': No such device or address\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "QAzqijJnKi3n", "colab_type": "code", "colab": {} }, "source": [ "!git config --global user.name 'r_hassan'" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ILCGF_G9Kvmm", "colab_type": "code", "colab": {} }, "source": [ "!git config --global user.password 'fakepass'" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "IiDG7FmvUKPD", "colab_type": "code", "colab": {} }, "source": [ "!git config --global core.askpass" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ZQd4OyHJUoCz", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", 
"height": 283 }, "outputId": "2ae6c340-cdad-46dd-d093-13520a2ebc24" }, "source": [ "!git status" ], "execution_count": 84, "outputs": [ { "output_type": "stream", "text": [ "On branch master\n", "Untracked files:\n", " (use \"git add <file>...\" to include in what will be committed)\n", "\n", "\t\u001b[31m.config/\u001b[m\n", "\t\u001b[31mbackup.tar.gz\u001b[m\n", "\t\u001b[31mbn-wiki-word2vec-300-txt.original\u001b[m\n", "\t\u001b[31mbn-wiki-word2vec-300.txt\u001b[m\n", "\t\u001b[31mbn-wiki-word2vec-300.txt-cat\u001b[m\n", "\t\u001b[31mdatasets/\u001b[m\n", "\t\u001b[31mdownloads-partab\u001b[m\n", "\t\u001b[31msample_data/\u001b[m\n", "\t\u001b[31mword2vec.backup.tar.gz.aa\u001b[m\n", "\n", "nothing added to commit but untracked files present (use \"git add\" to track)\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "pCdVTe2Yu8GI", "colab_type": "code", "outputId": "3d247295-20e4-43d6-fcfc-9fe69ea3f5be", "colab": { "base_uri": "https://localhost:8080/", "height": 161 } }, "source": [ "# The first few lines show what I mean: word embedding vectors\n", "# Each line starts with the Bengali word, followed by its vector\n", "\n", "!head -7 bn-wiki-word2vec-300.txt" ], "execution_count": 64, "outputs": [ { "output_type": "stream", "text": [ "669605 300\n", "এবং -0.93040687 0.60418844 0.6206399 0.15345214 -0.5920706 1.4105053 -0.19125229 1.9424365 -0.28456318 0.86637765 -0.34657523 0.008341969 0.91945136 0.33013958 -1.6456839 -1.6953105 1.9161752 1.1476667 0.17091753 0.3958588 1.0207202 -0.8163486 0.32261878 -0.30720857 -0.6554219 -1.7145324 -1.6113459 0.29473424 -0.8452265 0.18330733 1.047255 0.22511762 0.822286 0.16025306 0.66336554 1.0438149 0.6023638 -0.64874256 1.5032426 1.5895689 0.75842565 -1.2870961 0.079544045 0.3080709 0.32782224 -0.7009649 0.15249959 -1.027652 0.8451291 -0.32714248 0.42230263 -1.4003234 -0.59839815 -0.67217594 1.072765 0.2526819 0.16195725 -1.2569925 -0.5837513 -1.1979657 -0.6138971 0.79471904 -0.9409709 1.2761021 0.89106756 0.53292865 
2.2675922 -0.13259064 0.15469204 1.3745763 -0.5177524 0.41830626 0.5299528 -0.40102947 -0.42628673 -1.0313057 0.55274475 -0.88331276 0.21075027 1.387416 0.5721329 0.35013482 -0.21881458 -2.7000587 -1.14341 1.7165354 1.2415577 -0.13076034 0.9847175 -1.4681516 0.35087734 0.7275639 -0.9640771 0.3047465 1.379611 -0.7907444 0.60839903 1.2384896 -0.28551388 -1.2486242 0.43692696 1.3337344 1.4157426 -0.5497216 -1.0586624 1.1485555 -0.2848548 -0.052209064 -0.27139845 -0.9404369 -0.5050181 -0.41314003 -0.28034905 -1.5697598 -0.59607816 -0.7769144 -1.5711054 -0.590155 0.04364686 0.10375001 0.97881234 0.44856653 1.7305022 1.0183886 0.78831923 1.2242231 0.4937382 -1.0819402 -1.3215535 -1.01117 0.206751 0.6223832 1.0083629 -0.59937185 1.3992922 -1.7688543 -0.62926817 0.9333828 -0.20508951 -0.18137959 1.6263956 -0.26525566 -0.70443696 -1.6312447 -1.3012207 1.5594385 0.18196549 0.10253664 -0.27073783 -0.57916814 0.08798229 0.7529285 0.27043918 0.13748497 -0.7652544 1.4045902 -0.2284309 -0.017713083 -0.16723397 1.4385971 1.075745 1.4567398 -0.3620912 -0.049863935 0.21818072 0.494385 0.26240674 0.47549942 -0.40527856 -2.5260077 -0.93502015 -0.6997547 0.66674054 -0.32764944 -0.51164323 0.15243222 -0.2710508 1.4720447 -0.9499978 1.5680584 -0.49819544 0.9979104 1.3278234 0.28267214 -1.9825058 -0.5250971 -1.3529805 1.0538665 -2.9742591 1.3853692 -1.0246351 -0.45788342 -0.58545464 -0.59052104 -0.07944148 0.3599861 -0.09542372 0.48706493 -0.30248284 0.91800874 0.22113384 -1.6033583 0.611133 -1.6167171 0.8257279 1.511753 0.07948396 -0.42566258 1.413907 1.2193689 -2.1537566 0.9580351 0.88082844 0.85505676 -0.097767904 1.9956189 -1.4581429 -0.40138552 -1.5197046 0.895531 0.43802926 1.9927098 0.18570288 -0.1933191 0.37472582 1.6219916 -1.3226445 1.504367 0.7569192 0.3736352 1.9224497 -0.81562334 1.1996975 0.81546986 1.5816469 0.37639666 -1.4780647 0.60364085 -0.2537704 0.10160284 -1.1119928 0.9420247 0.33656976 -0.6338852 0.2734657 -0.3280644 1.0076349 0.31781343 -1.4982619 0.24992752 
1.289109 -1.1146048 -0.38495338 -1.235198 0.69176793 -0.73397833 0.8294925 -0.3333072 -0.75034577 -0.30954805 0.51477766 -0.3312381 1.8786335 -0.95206314 1.3874136 0.13236618 -2.1544163 -1.9921167 -0.68547404 -1.6311249 -2.8701363 1.0035746 0.84304595 -0.046721794 -0.5423261 -1.3923548 1.0062038 1.3781087 0.6179311 0.1195227 1.5426967 1.2713842 -0.87397003 1.1154585 -1.1934513 0.54639715 0.25530308 -0.8559359 -1.4125086 1.0636417 0.4095158 -0.21225604 0.14483044 0.043718226 0.3257524 0.74142253 0.6205646 1.7895927 0.6101109 0.2586147\n", "ও -0.58524036 0.45447958 -0.271059 0.903281 -0.93572026 0.781395 0.13197729 2.583347 0.017617458 0.9133457 -0.20172201 0.5319752 0.045481898 0.48530668 -2.8445883 -2.1672294 1.7199044 1.5424261 -0.23204179 -0.47215548 0.23906806 -2.0918553 0.3764463 0.11041565 0.9503702 -1.0620172 -1.6744974 0.51625335 -0.26014885 -0.9991527 0.34031266 -0.43819448 -0.21267189 -0.7827624 0.5974515 -0.671627 1.1221253 -0.16315356 0.66311425 0.79359466 0.36301354 -0.1322366 1.1541231 0.0052844356 -0.16176642 0.24536197 0.26942456 -0.8911422 1.5041916 -0.099623024 0.13907145 0.6387255 -0.97795665 -0.20738949 1.0964372 1.3952187 1.1587632 1.845541 -0.61203593 0.031217841 -0.15978587 0.82115567 -0.5647423 0.99255794 -0.3691354 -0.43174398 1.0422002 0.2552028 0.91485304 1.1568944 -0.49558777 0.84087986 0.17164658 1.0703882 -0.8077708 -2.308471 0.29645053 -1.1976753 -0.1050389 0.48258373 0.7099251 -0.3150525 -0.52110106 -1.5863286 -0.5118045 2.1650927 1.2914363 -0.94399244 0.78685063 -0.57375664 -0.015934566 0.53221756 -2.271201 -0.20666255 0.6977974 -1.0001872 0.19057953 1.5204624 1.2299569 -1.5117029 0.7973736 1.0480093 1.1071234 0.10898301 -0.3522087 2.949563 -0.66522825 0.5497931 -0.06854835 -0.45992488 -0.38099083 -1.4241983 1.7252287 -0.7620656 0.90273964 0.01332941 -1.9926864 -1.3374759 -0.030660542 -0.12929039 1.8411474 1.0526581 2.9473522 1.3345705 0.5947804 1.2005383 0.07082191 -0.23855963 -0.92356974 -0.7214136 0.09630342 1.3055235 1.1290928 
0.16470808 1.0403022 -0.33776775 -0.9134136 0.44647256 0.6388682 -0.5466339 0.8665062 -1.8086394 -1.3353461 -0.5004335 -1.0874145 1.2467664 -0.94416803 0.14157577 0.5646803 -1.2637936 -2.4817054 -0.43978366 -0.9711801 -0.11627038 0.2651847 0.22482847 -0.646065 0.6627505 0.18591863 1.3862606 0.26508737 1.4358656 -0.1356123 0.30023885 -1.0071484 0.14609484 -0.40067545 0.11614575 -1.1211672 -2.9948976 -1.1640134 -0.8893208 0.69647986 0.5226507 0.43557674 0.4416255 -0.1976004 1.3055359 -0.35186562 0.96851355 0.6488393 1.613607 0.6422151 0.039758135 -1.3041908 -1.0441707 -0.33136213 0.42566493 -1.8446702 0.8682943 -0.4355695 -0.05710234 -1.3763376 0.44872513 -0.18696697 0.7705693 0.20046796 -0.49683523 -0.9298872 0.19058146 -1.049943 -0.9615622 0.43946669 -1.7700518 0.18221591 1.9623158 1.4556637 -1.4133968 2.1814926 -0.36896977 -0.26596007 0.7020955 0.7601751 0.9486567 0.12267504 1.7857215 1.3161366 -0.007625853 -0.19064592 2.14113 -0.06724504 1.9534035 0.3407787 0.12543564 0.71115464 0.050714646 -0.59355587 -0.03792645 0.23806724 -0.26249117 1.2605692 -1.0072167 0.9429875 0.76088065 1.6205138 -1.8962756 -2.7197406 0.089888565 0.5498877 -1.2580627 -1.8090764 -0.91368014 0.72881436 -0.22046834 -0.69258153 -1.5590016 0.33359516 0.3590895 -0.19181904 0.7086726 0.8025603 -1.6143092 -1.9493469 0.31151736 0.8470133 -1.691301 1.0816429 0.71139693 -0.2311525 -1.1807052 0.87001324 -0.3568379 1.6725506 -0.50945103 -0.27646524 0.37701982 -2.2516382 -2.0910823 0.4179324 -1.0593878 -1.7708141 0.87476987 -0.030148244 1.4041723 -1.1686617 -0.19899243 0.8683499 1.4706208 0.95210135 0.8617375 1.2374552 0.26896238 -0.35295638 1.6243384 0.05577474 1.4881486 -0.75438017 1.0217514 -1.022703 0.840268 0.9699244 -0.30531976 -0.9636556 0.18336996 0.85473806 1.6347116 -0.83497536 0.63057977 1.2308702 -0.32897514\n", "হয় -1.8247691 -2.0915492 1.5521196 0.9817031 -0.71198153 -1.1406553 0.20706734 2.002229 0.4222637 1.0221027 1.9313294 0.20992161 -0.22667134 0.5778148 -1.9811527 -1.6068003 
0.51935846 4.190498 1.6591264 3.696415 -0.8972124 -0.9695677 -1.0911344 -2.5159888 0.64899313 -3.9777596 1.2032437 1.5543182 -0.5659898 -1.1380624 0.7247645 -0.81457055 2.3773313 -0.10186057 0.2803823 -1.8328085 1.8070668 0.49927655 3.7146556 2.3095276 1.4903845 0.24114813 -1.6105928 2.377896 -1.8299401 -1.6903756 -0.045043427 -3.1909945 -0.9577706 0.89430904 2.3162541 1.4072512 0.15841399 -0.17395592 1.276816 0.14795898 0.7110485 0.8493568 -0.89166087 1.3985461 -0.2607159 -1.2707427 -0.5525633 1.288889 -1.2450587 2.4014404 -1.647329 0.3628347 -0.36183968 1.8255006 -0.49935928 -3.7446706 3.0025551 1.1856798 -1.0313323 -1.7755738 -3.3956897 -0.0071685165 0.46419826 0.79227895 1.6122632 -1.1335179 0.69419193 -2.092566 -1.2973521 3.7132552 -0.12574294 1.7159435 0.5323431 -1.177301 -0.5910223 1.8957894 1.3004191 0.043679 -2.1573641 -1.2192295 -0.5794048 0.5401791 0.09259804 -0.884269 -0.8885932 -1.6349372 2.9712481 -0.5136787 0.47402793 3.2185366 -1.0303059 1.8862349 -1.2980344 1.3100715 2.224686 2.2730892 0.2538476 -0.10844603 -0.13062906 2.7270248 0.6029394 0.61437994 2.1098824 0.831568 1.8434451 -0.96293265 -1.2397857 1.6735845 -1.2131585 0.26517543 0.81909645 0.6247328 -1.4168934 1.3242841 -0.7107275 1.4562027 0.64761376 2.464252 -1.7311655 -1.0090224 0.26995316 0.8978482 1.0040481 -0.9774874 -0.14152434 0.90454286 1.0976329 -0.26051864 -0.67981464 -0.14273185 -0.59774244 -1.2385356 -0.5086274 1.9503305 -0.4984986 3.2457983 -2.5104663 0.71335363 1.2941341 0.24610522 2.3594325 -2.3504922 1.334544 1.430389 2.6565323 2.0589359 0.16488439 1.20627 0.86995834 -0.9385664 -0.70905566 -1.2333251 0.31907684 -2.1176155 -0.5103191 0.038472503 0.6015968 -1.3714979 0.21941918 -2.0026355 0.5943965 -2.8778977 0.27973834 1.6905153 0.015640106 -1.3693813 0.27766258 1.8742485 -5.211662 0.48750126 -0.27569988 1.3009589 -3.8204072 1.46969 1.0037109 -2.82306 -0.746158 1.1147379 -3.249578 2.260624 1.354588 -1.1860611 -0.014181979 -0.19925839 -0.3954283 -0.6466107 -1.9974515 0.4805865 
-2.211764 0.52255344 -0.18123792 -1.6121762 3.2712665 1.7167118 1.1680586 1.1729398 -0.28861085 1.5550202 1.0895033 2.2843733 0.7231674 0.7465411 -1.0611109 -0.49648127 0.4654247 2.1616533 -1.2523402 1.0771954 0.74913055 1.0157934 2.9522753 -0.76349473 -0.84537077 0.82048655 1.1640248 0.2368476 0.34759012 2.1007953 0.7149294 -0.21828659 -0.11407231 -0.1643775 1.2118605 3.0619822 -1.5080625 -0.93636864 0.0018499214 1.2086605 0.26163357 0.56321836 -1.9514295 2.2925124 -0.45035234 -3.6912563 -2.088268 0.5641842 -2.1283545 -2.0851738 0.36083636 0.95293146 -0.7159297 0.26238683 -0.0973097 0.28054848 -0.34115073 -0.52036345 1.0155922 -1.0751973 0.42513663 -0.052052822 -1.1211832 -0.41907045 0.707179 -0.69412476 -2.3132749 -1.1080768 -0.09723622 -2.4235508 -3.4284568 0.8030779 1.1016521 -2.4200463 -1.4302769 0.3497601 1.2805206 0.71379256 0.8369469 1.64142 0.6716727 -1.4271523 0.48001695 -3.4348445 -0.2225095 1.2764535 -0.8893114 -0.31440404 -1.0537965 -0.5511297 -0.7446707 -2.1130662 1.9561703 -0.7298195 -0.45455173 0.49335495\n", "করে 1.4638041 0.46013883 0.4770293 2.1610985 -0.09711245 -0.5382342 -1.3283437 0.70485425 -0.5950621 1.2856623 -0.8577408 -0.70547193 -1.6236017 0.6296531 -1.9744449 -2.508508 1.4924573 1.1503576 1.5973461 0.5769804 -0.6069721 0.13309194 -2.1690218 -0.3986674 -0.1678412 -2.250409 -0.9179335 -1.0952939 -0.88789827 0.09512321 1.7706733 0.11340081 2.1630971 -1.0658263 -0.24395598 -1.1115603 1.5150295 1.1675327 2.7483194 3.5537095 -0.59933996 0.81488013 -0.6739365 1.773015 1.0559928 0.6247622 -0.3478868 -2.965228 -0.21492709 0.89353067 2.9318707 1.9227598 -0.11769063 -2.4075592 0.6904535 2.3752434 -0.037284724 -0.63309926 -3.128802 2.9791856 1.5833476 0.6449195 -0.32738364 2.6401837 0.44933286 1.1283821 2.565787 1.5990252 2.102574 1.1839719 -0.08801095 -1.3336484 0.4691175 2.7238932 -0.07542253 -1.1330373 -1.0942403 -1.2601542 1.1726574 0.8933551 1.5275543 0.8165056 -0.5239747 -2.4554758 0.36391234 0.6503322 2.8399084 1.4098556 -0.51982903 
-2.1042693 -0.05306528 -0.56123155 1.1982063 1.2563752 0.7620389 -1.2276828 -0.84249467 1.4393364 0.77440304 -2.104103 -2.059731 -0.21191682 -0.19533132 0.9014223 -0.13766639 1.7979478 -1.5070478 -0.44969973 0.33912385 0.5111161 0.42914236 2.8019922 -0.593548 1.031023 -0.35696626 1.9381701 0.61078405 -0.1258579 1.0306083 0.8604857 -0.13915882 -1.4226767 3.048097 1.00559 -0.22615053 1.128138 1.1582717 -2.166985 -0.65041953 -0.08019211 1.2970815 1.0033172 0.635273 1.8792266 -1.8587356 -0.4351632 -1.0432365 0.3433936 1.877641 -0.6134908 1.7805911 0.44998482 0.6516786 0.08570023 -0.25780004 1.9028347 1.5127552 0.44377133 -1.0347323 0.073662594 0.6528598 1.7802001 -0.08616957 0.10228003 0.90827394 1.8746697 3.0865862 -1.5397671 -2.0126138 1.3180072 2.6859086 2.988848 -0.47898462 1.8480692 -0.8682913 1.2637501 0.38463044 -2.0548644 1.1856201 -2.8460965 -2.0303233 -0.34533274 2.5564435 0.85207915 -0.07946124 -2.363868 -1.1239123 -1.2606779 -0.930255 1.6201055 -0.9506335 -1.3052654 1.1424946 2.498399 -2.8332803 2.473882 -1.9490753 1.0908501 -2.2823808 1.3368375 1.2825803 -1.5106692 1.9418434 0.6338872 -4.218184 0.1354068 0.54564655 -0.613364 0.8976207 -0.33982697 0.43815103 -1.8296937 0.3118065 -1.1050173 -1.0681049 3.0734577 -0.06152384 0.04139191 2.8266747 1.7856406 -0.37226483 -1.1455252 0.07255044 3.1156726 -0.9318064 2.2944505 -0.4834573 -0.22268529 -1.4305617 -0.71662635 -1.4057091 3.9090807 0.3661221 -2.7341263 0.43841648 -0.7899564 0.7970126 1.2823138 -0.35770118 0.21682079 2.389844 -3.4076505 2.0682027 -0.01840818 -1.5271113 0.3105775 -1.7938234 1.9284834 -1.203957 0.54004896 -0.6391245 -0.95948666 0.7867698 1.0515132 -1.1418717 0.61196977 0.50458574 -0.9365569 -0.6977027 -1.6735121 -1.8234655 -0.65726954 0.51447946 -0.74598676 0.1281155 -1.5880537 1.9741507 0.3338909 0.6053169 -1.1810766 0.7080426 -0.96108526 1.6863154 -0.9004534 1.7303089 -0.5598361 -1.5497233 0.27824214 0.29501814 -1.0096308 -2.839824 -0.3321661 1.1058911 -1.7745312 -2.4715776 -0.32964313 
1.6028548 2.7005281 -0.5799925 0.34175193 2.3922627 1.4408594 -0.21198568 2.0125268 0.73433226 0.11654882 -1.5469825 0.052762344 -0.81855565 0.42038527 1.0864041 -1.083027 -0.4222766 0.56002355 -1.7107468 -1.171553 2.6199453 -0.6048778 1.6012655 -0.3749908\n", "তিনি -1.1738678 0.37016252 -0.19781187 -0.16410504 -0.753571 -0.4210705 0.01682054 -0.26354325 1.439832 1.9638042 -0.18513128 0.21441272 -0.369283 0.7091637 -1.6222771 -1.0937014 -0.4071048 -1.758798 1.9133385 -1.0898916 0.5836275 1.6742417 1.6410733 -0.34128645 1.5230937 0.21261688 -1.420016 -2.4655473 -1.3401339 0.018569082 -0.034572214 -2.9821966 -0.91251194 0.7565992 0.45008317 0.99885845 -0.6392834 -0.44748396 1.8189455 1.6601168 0.24081528 -0.15245743 1.1164213 -0.60606116 1.1505754 -0.5297351 -1.8496084 -0.7881723 0.8304548 -1.2648691 -0.39852887 1.6382383 0.0046687606 -2.6636164 1.0559351 -0.14164782 0.47628072 0.52358997 -0.18506156 0.5733556 -0.7096374 -0.1830227 -1.5379386 0.60600984 -1.4188213 2.2663279 0.99508315 -0.9390181 -0.06525691 0.7604507 0.40174294 -0.936488 0.581012 0.8795788 1.8943496 -1.1697103 0.046171933 -0.9759555 0.64223045 -0.30895722 1.4993834 1.8423985 -0.5992019 1.4912438 -0.6711824 0.67837816 2.2673793 0.6526759 -0.2134826 1.2043437 -1.3382409 2.451539 -0.5747329 -0.4633366 1.9642861 -2.5575483 2.2893333 -1.731404 -2.2986598 -1.814027 -0.20896345 0.7783642 1.1070861 -0.6239703 1.4139019 1.3592714 0.6100163 2.6752806 0.5042902 -0.0152576035 0.2540001 -1.4778836 -0.08349057 -1.4187082 1.3294297 0.06202054 0.5349524 0.6188579 0.011822534 1.363343 1.0386257 0.7101702 -0.48664105 -1.2881335 1.5449836 2.4187858 1.3230757 -1.7545801 -1.2919799 -1.8938507 0.88096726 -0.3899972 -0.051461186 0.95661414 -0.9052122 -2.28024 -0.8510748 -0.36411715 0.55613613 0.7228644 2.2993028 1.2776306 0.57384104 0.0894299 -1.6698265 -0.76141024 -0.2615726 0.22468947 -0.90536505 0.8760995 -1.0942718 -1.7030396 1.5688227 -0.015965315 3.0786073 2.171664 -1.9684086 -0.71591437 0.84308565 -0.023123678 
1.0272353 1.6805394 0.9659495 -0.693886 -0.19121507 0.69738144 -0.81845325 1.4621712 -0.64663017 -0.48601997 -1.8073735 1.9365724 1.0728244 -1.675593 -2.48825 -2.2549763 0.99883205 0.011717423 -0.968143 2.3676248 1.1581804 -1.1019763 -0.4052338 0.259193 -1.6883879 0.5301176 1.6926105 -0.27319026 -0.227297 -0.4368029 -0.7653405 -1.510421 -0.08304167 0.7327884 -0.72956216 0.6924532 0.12786134 -0.44742504 0.68064266 0.46439362 0.13899907 -0.03652357 -0.78007084 -1.0838907 0.9138987 1.2062742 -0.2172192 0.4082591 1.4207231 2.244966 -1.5546291 0.605055 -1.2843417 -1.4083458 -0.21145965 -0.23477666 -2.2664113 -0.28401485 0.66756046 0.76596564 -0.7703791 1.983706 1.670782 0.95740604 0.98075515 -0.97029024 0.8108358 0.14063218 1.0512149 0.65278953 -0.098545246 -1.505401 1.5409887 0.7402301 2.480546 0.08127695 -0.9955454 1.9439696 -2.7986643 -0.8857946 -1.4971809 -2.7917898 -0.047041085 -0.7761789 0.13264154 1.9617302 -1.6107641 -1.6877475 1.0520048 0.12970483 -0.24276163 -0.41984153 -0.15497293 0.9001811 -0.3241018 -1.3436453 0.5630198 -0.3871916 0.71962655 0.23657644 -0.84953403 -0.42115983 1.8571 0.9960627 -0.78075564 -1.2766498 0.030295808 -1.1820985 -0.26983517 -1.1656342 2.0044823 0.17986952 -0.1600661 -0.20971248 0.38795295 1.4175266 0.21813197 -0.25860766 -1.3707263 2.8051105 1.9417973 -1.3433391 -0.5040571 0.2990665 -1.5229963 -0.37078992 0.70896035 -0.019158987 0.23528913 1.5369213 1.0056368 -1.689032 -0.5990417 1.3212885 -0.85864425 -0.46737942 -0.8508794 -0.39553878 -0.35939643 2.2599432\n", "করেন 0.19318587 -0.9337364 0.5890889 1.1284809 -0.85182375 -1.8623679 -1.0837123 0.8337486 -1.4050269 2.0234854 0.36548066 0.61430967 -2.1424985 0.93722534 -1.5154264 -0.38403997 0.61476415 0.43370208 4.0862966 0.13589169 -1.4761454 0.43438548 -1.9887968 -1.0147327 1.4183593 -3.1219664 -0.8659663 -1.6563139 -1.2397468 -1.0541365 0.16262296 0.38919505 3.6184697 0.5404631 0.45725846 -0.3386037 0.83069044 1.3954304 2.4237938 2.018734 -0.30144882 1.15298 0.26642394 0.88249487 
0.2900979 0.3875404 -1.026831 -3.002415 0.11618454 -0.6558714 3.1870656 3.3008626 -0.15362315 -2.0912647 2.6626966 2.197552 0.48283696 -0.026623448 -4.7704725 2.4603713 -0.71151155 -0.12804835 1.4783677 3.0028138 -1.366305 2.4201086 2.1641617 -0.50529355 1.9294802 2.5331008 -0.33527365 -0.54821765 -0.66211706 2.3656108 1.3793545 -1.7108806 -1.4458313 -0.48866358 0.75904554 -1.0844674 2.1751697 -0.2692254 -1.6566877 -1.1986219 1.2887886 -0.21022478 2.4642634 1.1712825 -1.7327492 -0.3192753 -0.80545455 -0.08996122 0.98601866 1.9941684 0.4085885 0.17396548 1.135127 0.38655835 0.7577267 -2.6354032 -1.3291075 0.8205332 -0.8832689 1.5700159 -0.14703543 2.9848032 -1.8170913 0.25200543 1.4294518 -0.7820231 0.10771451 1.7591317 -1.1159538 0.89680964 1.0042003 0.785278 1.7250232 0.19296032 1.5285679 2.7183332 0.78775007 -0.92460287 1.057212 0.3105081 0.2049064 2.5118272 1.5450809 -2.0605302 -3.5001185 -0.68298286 1.9003531 1.4539047 -0.8866524 1.5103813 -3.7033958 0.41399 -0.9688876 2.19605 0.39015073 0.92533594 3.3174067 1.1528869 2.2846165 0.89071685 -0.7841825 1.0828366 1.044191 0.32648712 -1.9048945 0.37079397 -0.01011078 -0.7614668 -0.5495317 1.1998264 1.8840824 1.5695889 3.3184748 0.4852694 1.0154525 -0.021934023 4.4868765 1.6185771 -0.3153896 -0.36514628 -1.9076306 1.3816717 -2.1277153 -1.9272645 -1.3931488 -0.57421815 -1.1602892 -0.20142998 1.9718063 1.2130727 -2.745902 -3.8931425 -0.1855184 -2.4099195 -0.8878476 2.6704414 0.5670274 0.38503766 0.35302347 4.551226 -2.0581865 0.9459339 0.337567 0.21248995 -2.9210923 -0.36402667 0.63711524 -0.952968 1.1995943 1.3776119 -3.9117785 0.34826854 2.3526456 0.92054784 2.5165944 -0.98528296 -1.0495394 0.52338076 -0.78099805 -0.33064157 0.33150956 1.875219 1.2452867 -1.6603845 2.65914 3.361523 -0.9170144 -1.004725 -0.31350675 2.7215428 -1.635537 1.1541646 1.7256192 0.12370112 0.48722214 -1.6641299 -2.1762908 4.199462 -0.2883563 -0.16597326 -0.553747 -2.022318 1.4750097 1.1440145 0.46725643 1.014653 3.361021 -3.4376462 1.2877408 
-0.5148107 -1.4415078 -0.7464355 -1.5502417 1.7877142 -0.5613597 2.2749197 -0.9590551 -0.6896642 -0.6645099 0.541241 0.5105197 0.9254783 -0.7677737 -0.45314562 0.72046924 -1.971568 -2.613329 -0.49380887 -0.83681905 0.29184273 0.7707044 -0.73559433 2.3954475 -0.07767087 -0.30188093 -2.1767392 2.1759086 -2.0669603 -0.019899957 -0.19439699 0.54339635 0.1168371 -0.6867408 0.21940033 0.90612113 -1.8674718 -0.22612645 -1.3948021 -0.25322914 -2.3842711 -2.5536478 -0.26761347 2.2381825 1.4699225 -0.17046195 0.07034144 1.3627189 0.8321233 1.3198925 2.1762233 1.9735737 0.19121808 -0.40528092 1.0960783 -3.6836426 0.32600078 -2.6550748 -1.9942055 -2.7730727 0.3876345 -0.06702569 -1.1657287 2.536466 -2.1727517 2.0637708 0.57837164\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "AT0UqnHW-MlC", "colab_type": "text" }, "source": [ "## Training files for sentiment analysis\n", "\n", "There are two files, one negative and one positive. bangla-sentiment.neg contains only negative sentences, and bangla-sentiment.pos contains only positive ones. The labeling was done by people like us. I am still looking for a better dataset, though. For now, this dataset was kindly provided by Socian (সোসিয়ান), in particular Tarek Al Muntasir (তারেক আল মুনতাসির). We have no control over what people write on social media, so we will not concern ourselves with what the sentences in this dataset actually say. We treat it as material for research use." ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "bYv6LqlEChO1", "outputId": "2717b091-0500-41db-d7cd-125aa8a9feed", "colab": { "base_uri": "https://localhost:8080/", "height": 585 } }, "source": [ "!wget https://github.com/raqueeb/datasets/raw/master/bangla-sentiment.pos\n", "!wget https://github.com/raqueeb/datasets/raw/master/bangla-sentiment.neg\n" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "--2019-11-22 06:47:27-- https://github.com/raqueeb/datasets/raw/master/bangla-sentiment.pos\n", "Resolving github.com (github.com)... 
52.74.223.119\n", "Connecting to github.com (github.com)|52.74.223.119|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://raw.githubusercontent.com/raqueeb/datasets/master/bangla-sentiment.pos [following]\n", "--2019-11-22 06:47:27-- https://raw.githubusercontent.com/raqueeb/datasets/master/bangla-sentiment.pos\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 220062 (215K) [text/plain]\n", "Saving to: ‘bangla-sentiment.pos’\n", "\n", "\rbangla-sentiment.po 0%[ ] 0 --.-KB/s \rbangla-sentiment.po 100%[===================>] 214.90K --.-KB/s in 0.003s \n", "\n", "2019-11-22 06:47:28 (79.9 MB/s) - ‘bangla-sentiment.pos’ saved [220062/220062]\n", "\n", "--2019-11-22 06:47:31-- https://github.com/raqueeb/datasets/raw/master/bangla-sentiment.neg\n", "Resolving github.com (github.com)... 52.74.223.119\n", "Connecting to github.com (github.com)|52.74.223.119|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://raw.githubusercontent.com/raqueeb/datasets/master/bangla-sentiment.neg [following]\n", "--2019-11-22 06:47:31-- https://raw.githubusercontent.com/raqueeb/datasets/master/bangla-sentiment.neg\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 363162 (355K) [text/plain]\n", "Saving to: ‘bangla-sentiment.neg’\n", "\n", "bangla-sentiment.ne 100%[===================>] 354.65K --.-KB/s in 0.006s \n", "\n", "2019-11-22 06:47:33 (53.6 MB/s) - ‘bangla-sentiment.neg’ saved [363162/363162]\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "FE_7dSquwfvm", "colab_type": "code", "outputId": "cdeb4a01-0f4d-4a56-b822-e87945c937b3", "colab": { "base_uri": "https://localhost:8080/", "height": 105 } }, "source": [ "# What is in the first 5 lines of this file?\n", "\n", "!head -5 bangla-sentiment.pos" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "বাংলাদেশের সবাই শান্তিতে আছে থাকবে\n", "ভারতে সব বাংলাদেশী বৈধ ভাবে থাকে\n", "ওদের দেশে সোনার অভাব নাই\n", "গ্রামীণফোন এর মত সু্বিধা পাই নি সবসময় অাছি গ্রামীণফোন এর সাথে ভালবাসি গ্রামীণফোন কে\n", "গ্রামীণফোন থেকে বিভিন্ন সময় বিভিন্ন অফার দেয়া হয়ে থাকে\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "B1tqdoSXw0_0", "colab_type": "code", "outputId": "5ddfb1b7-2e47-4554-b8a7-46d9a7292e0e", "colab": { "base_uri": "https://localhost:8080/", "height": 105 } }, "source": [ "# And in the other file?\n", "\n", "!head -5 bangla-sentiment.neg" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "আর দোষিরা কোনদিন বিচার পাবে না\n", "সীমের এমন জটিল সমস্যায় রীতিমত হয়রানির শিকার\n", "নেটওয়ার্ক ভাল না আপনাদের\n", "আমার তো এখন নেটওয়ার্ক খুবই কম....\n", "কোন বিদ্বেষের কারনেই কাউকে খুন করার লাইসেন্স দেওয়া হয়না\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "0SA-_IImNerZ", "colab_type": "code", "outputId": "5f19c81f-2db7-43ac-cfaf-de2a37874099", "colab": { "base_uri": "https://localhost:8080/", "height": 158 } }, "source": [ "# What have we downloaded so far?\n", "\n", "!ls -al" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "total 2439068\n", "drwxr-xr-x 1 root root 4096 Nov 22 06:47 .\n", "drwxr-xr-x 1 root root 4096 Nov 22 06:45 
..\n", "-rw-r--r-- 1 root root 363162 Nov 22 06:47 bangla-sentiment.neg\n", "-rw-r--r-- 1 root root 220062 Nov 22 06:47 bangla-sentiment.pos\n", "-rw-r--r-- 1 root root 2496996336 Nov 14 09:58 bn-wiki-word2vec-300.txt\n", "drwxr-xr-x 1 root root 4096 Nov 20 16:17 .config\n", "drwxr-xr-x 1 root root 4096 Nov 15 16:31 sample_data\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "QIviAyFUJkRi", "colab_type": "text" }, "source": [ "## How many sentences do the two files contain?\n", "\n", "We can check this in two ways. The first uses ordinary Python file operations." ] }, { "cell_type": "code", "metadata": { "id": "m3su_fMhBIKR", "colab_type": "code", "colab": {} }, "source": [ "preprocessed_text_file_path = 'bangla-sentiment.pos'" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "8SEVl_jQBBEd", "colab_type": "code", "colab": {} }, "source": [ "lines_from_file = []\n", "with open(preprocessed_text_file_path, encoding='utf8') as text_file:\n", "    for line in text_file:\n", "        lines_from_file.append(line)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "QUQpFBqFCALn", "colab_type": "code", "outputId": "91c8675b-2dd5-4e94-c5b2-1b11de19c96a", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "# Number of lines in the positive file\n", "\n", "len(lines_from_file)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "2039" ] }, "metadata": { "tags": [] }, "execution_count": 15 } ] }, { "cell_type": "code", "metadata": { "id": "8bWPdVWfBUzD", "colab_type": "code", "colab": {} }, "source": [ "preprocessed_text_file_path = 'bangla-sentiment.neg'" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "yBKXVx4gBU9B", "colab_type": "code", "colab": {} }, "source": [ "lines_from_file = []\n", "with open(preprocessed_text_file_path, encoding='utf8') as text_file:\n", "    for line in text_file:\n", "        lines_from_file.append(line)" ], 
"execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "bLm0slDbCB_h", "colab_type": "code", "outputId": "53b15247-f710-418a-d81b-197c5c5ac730", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "# Number of lines in the negative file\n", "\n", "len(lines_from_file)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "2520" ] }, "metadata": { "tags": [] }, "execution_count": 18 } ] }, { "cell_type": "code", "metadata": { "id": "vSpOZEzgwf5E", "colab_type": "code", "colab": {} }, "source": [ "# Bring everything together in one list of (sentence, label) pairs\n", "\n", "all_sentences = []\n", "with open('bangla-sentiment.pos', encoding='utf8') as f:\n", "    all_sentences.extend([(line.strip(), 'positive') for line in f])\n", "\n", "with open('bangla-sentiment.neg', encoding='utf8') as f:\n", "    all_sentences.extend([(line.strip(), 'negative') for line in f])" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ifzqGZ88LD_K", "colab_type": "code", "outputId": "3e1fd54d-9205-4778-d30f-5cfb9d01f412", "colab": { "base_uri": "https://localhost:8080/", "height": 123 } }, "source": [ "# First five entries of all_sentences\n", "\n", "all_sentences[:5]" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[('বাংলাদেশের সবাই শান্তিতে আছে থাকবে', 'positive'),\n", " ('ভারতে সব বাংলাদেশী বৈধ ভাবে থাকে', 'positive'),\n", " ('ওদের দেশে সোনার অভাব নাই', 'positive'),\n", " ('গ্রামীণফোন এর মত সু্বিধা পাই নি সবসময় অাছি গ্রামীণফোন এর সাথে ভালবাসি গ্রামীণফোন কে',\n", "  'positive'),\n", " ('গ্রামীণফোন থেকে বিভিন্ন সময় বিভিন্ন অফার দেয়া হয়ে থাকে', 'positive')]" ] }, "metadata": { "tags": [] }, "execution_count": 20 } ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "4BNXFrkotAYu", "outputId": "3803c04b-1741-4e23-e8e8-49516963d176", "colab": { "base_uri": "https://localhost:8080/", "height": 52 } }, "source": [ "# How many positive sentences are there, and at which 
index do the negative ones start?\n", "\n", "pos_count = 0\n", "neg_count = 0\n", "for sentence, label in all_sentences:\n", "    if label == 'positive':\n", "        pos_count += 1\n", "    else:\n", "        neg_count += 1\n", "print(pos_count)\n", "print(neg_count)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "2039\n", "2520\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "tpJczOxSLjgg", "colab_type": "code", "outputId": "7d989108-73db-4837-f575-b4f0c130968d", "colab": { "base_uri": "https://localhost:8080/", "height": 105 } }, "source": [ "# What are the first five negative sentences? The positives fill indices 0 to 2038,\n", "# so the negative block starts at index pos_count (2039).\n", "\n", "all_sentences[pos_count:pos_count + 5]" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[('আর দোষিরা কোনদিন বিচার পাবে না', 'negative'),\n", " ('সীমের এমন জটিল সমস্যায় রীতিমত হয়রানির শিকার', 'negative'),\n", " ('নেটওয়ার্ক ভাল না আপনাদের', 'negative'),\n", " ('আমার তো এখন নেটওয়ার্ক খুবই কম....', 'negative'),\n", " ('কোন বিদ্বেষের কারনেই কাউকে খুন করার লাইসেন্স দেওয়া হয়না', 'negative')]" ] }, "metadata": { "tags": [] }, "execution_count": 22 } ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "js75OARBF_B8" }, "source": [ "## Downloading the pre-trained embedding exporter script" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "-uAicYA6vLsf" }, "source": [ "A major advantage of TensorFlow is that learning can be transferred between models. A module of this kind (see below) is a piece of a model's graph saved to disk, which can be exported and reused elsewhere.\n", "\n", "As mentioned at the start, we will use a pre-trained embedding exporter script from TensorFlow Hub to build a text embedding module from the word embeddings, and then pass that module on to train the classifier. The link below also has another example that uses fastText (for text classification rather than sentiment analysis). To use fastText here, it is enough to swap in the fastText vector file.\n", "\n", 
"Our exporter script is available at https://github.com/tensorflow/hub/tree/master/examples/text_embeddings_v2 and we download it into the same directory.\n", "\n", "What does a SavedModel contain? The data TensorFlow needs, together with the model's weights and graph, so that the model can be rebuilt. The word embeddings will come from this SavedModel. TensorFlow Hub's job is to load the SavedModel as a module, which we then use through [hub.KerasLayer](https://www.tensorflow.org/hub/api_docs/python/hub/KerasLayer). This Keras layer works well inside a Sequential model.\n" ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "5DY5Ze6pO1G5", "outputId": "81f7e7b0-5a65-43cc-fe09-19210463d9ed", "colab": { "base_uri": "https://localhost:8080/", "height": 212 } }, "source": [ "!wget https://raw.githubusercontent.com/tensorflow/hub/master/examples/text_embeddings_v2/export_v2.py\n", "# !wget https://raw.githubusercontent.com/tensorflow/hub/master/examples/text_embeddings/export.py\n" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "--2019-11-22 06:47:54-- https://raw.githubusercontent.com/tensorflow/hub/master/examples/text_embeddings_v2/export_v2.py\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 7603 (7.4K) [text/plain]\n", "Saving to: ‘export_v2.py’\n", "\n", "\rexport_v2.py 0%[ ] 0 --.-KB/s \rexport_v2.py 100%[===================>] 7.42K --.-KB/s in 0s \n", "\n", "2019-11-22 06:47:54 (107 MB/s) - ‘export_v2.py’ saved [7603/7603]\n", "\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PAzdNZaHmdl1" }, "source": [ "Since the header of a word2vec or fastText file can be quite large, we drop it when handing the embedding file to the exporter (the `--num_lines_to_ignore=1` flag below skips the first line). The header is an extra burden, especially on a local machine or on Google Colab." ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "Tkv5acr_Q9UU", "outputId": "84a24153-1e51-45c3-e20a-b6b8b134f646", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 } }, "source": [ "!python export_v2.py --embedding_file=/content/bn-wiki-word2vec-300.txt --export_path=text_embedding --num_lines_to_ignore=1 \n", "# !python export.py --embedding_file=/content/bn-wiki-word2vec-300.txt --export_path=text_embedding --num_lines_to_ignore=1 --preprocess_text=True" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / 
'(1,)type'.\n", " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", 
"/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", "tcmalloc: large alloc 1607057408 bytes == 0x89642000 @ 0x7fbe30e291e7 0x7fbe2d3c8f71 0x7fbe2d42c55d 0x7fbe2d42fe28 0x7fbe2d4303e5 0x7fbe2d4c6fc2 0x50abc5 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x5081d5 0x509647 0x5951c1 0x54a11f 0x551761 0x5aa69c 0x50ab53 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x509ce8 0x50aa1d 0x50c549 0x5081d5 0x50a020 0x50aa1d\n", "2019-11-22 06:52:20.037253: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1\n", "2019-11-22 06:52:20.043509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2019-11-22 06:52:20.044094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: \n", "name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59\n", "pciBusID: 0000:00:04.0\n", "2019-11-22 06:52:20.044322: I 
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia\n", "2019-11-22 06:52:20.044424: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia\n", "2019-11-22 06:52:20.044512: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia\n", "2019-11-22 06:52:20.044594: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia\n", "2019-11-22 06:52:20.044697: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia\n", "2019-11-22 06:52:20.044784: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia\n", "2019-11-22 06:52:20.058836: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7\n", "2019-11-22 06:52:20.058873: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. 
Skipping registering GPU devices...\n", "2019-11-22 06:52:20.059405: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA\n", "2019-11-22 06:52:20.187284: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", "2019-11-22 06:52:20.188009: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f79f80 executing computations on platform CUDA. Devices:\n", "2019-11-22 06:52:20.188051: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla T4, Compute Capability 7.5\n", "2019-11-22 06:52:20.190321: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz\n", "2019-11-22 06:52:20.190826: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f7af40 executing computations on platform Host. 
Devices:\n", "2019-11-22 06:52:20.190869: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>\n", "2019-11-22 06:52:20.190971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:\n", "2019-11-22 06:52:20.190990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] \n", "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/lookup_ops.py:1159: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", "W1122 06:52:20.825026 140454845077376 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/lookup_ops.py:1159: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", "INFO:tensorflow:Assets written to: text_embedding/assets\n", "I1122 06:52:27.513698 140454845077376 builder_impl.py:770] Assets written to: text_embedding/assets\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "pDGGOY0EQ3XQ", "colab_type": "code", "outputId": "b3fcfe50-522b-42a2-a01b-b36dd83f10f5", "colab": { "base_uri": "https://localhost:8080/", "height": 658 } }, "source": [ "# SavedModel ships with a command-line interface for inspecting an export; not much shows up here yet\n", "\n", "!saved_model_cli show --dir text_embedding --all" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint8 = np.dtype([(\"qint8\", 
np.int8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' 
as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", "/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", "\n", "MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:\n", "\n", "signature_def['__saved_model_init_op']:\n", " The given SavedModel SignatureDef contains the following input(s):\n", " The given SavedModel SignatureDef contains the following output(s):\n", " outputs['__saved_model_init_op'] tensor_info:\n", " dtype: DT_INVALID\n", " shape: unknown_rank\n", " name: NoOp\n", " Method name is: \n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "iiRXv5VexLWN", "colab_type": "text" }, "source": [ 
"hub.KerasLayer ব্যবহার হচ্ছে, তবে আমাদের এমবেডিং মডিউলে trainable=False সেট করা হয়েছে যাতে এমবেডিং ওয়েটগুলো আপডেট না হয় ট্রেনিং এর সময়। তবে আমরা দুটোই টেস্ট করবো। " ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "k9WEpmedF_3_", "colab": {} }, "source": [ "# এই মডিউলটা ফ্রিজ করা আছে, মনে আছে ট্রান্সফার লার্নিং এর কথা?\n", "# পাশাপাশি hub.KerasLayer এর আর্গুমেন্টগুলো দেখুন \n", "# __init__(\n", "# spec,\n", "# trainable=False,\n", "# name='module',\n", "# tags=None\n", "#)\n", "\n", "embedding_path = \"text_embedding\"\n", "embedding_layer = hub.KerasLayer(embedding_path, trainable=True)\n", "# embedding_layer = hub.KerasLayer(embedding_path, trainable=False)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "p6ZuhfQWPFzv", "colab_type": "code", "outputId": "85a12b65-cbe1-46de-dd27-a52574de618f", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "print(embedding_layer)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "fQHbmS_D4YIo" }, "source": [ "## বাংলা শব্দকে embedding_layerয়ে পাঠিয়ে দিয়ে দেখি\n", "\n", "বাংলা শব্দগুলোকে কিভাবে পাঠাবো এই নতুন মডিউলে? 
নিচের উদাহরন দেখুন। বাক্যের মধ্যে শব্দগুলোকে ভাগ করছে স্পেস দেখে। একটা বাক্যের ব্যাচ করে এক ডাইমেনশনের টেন্সর দেখাচ্ছে আমাদের shape এট্রিবিউটে। এখানে বাক্য আর শব্দ এমবেডিং নিয়ে একটা চিন্তা আছে তবে সেটা আসবে নিচের উদাহরন থেকে।\n", "\n", "```\n", "tf.nn.embedding_lookup_sparse(\n", " params,\n", " sp_ids,\n", " sp_weights,\n", " combiner=None,\n", " max_norm=None,\n", " name=None\n", ")\n", "```\n", "এর মানে হচ্ছে embedding_layer ইনপুট হিসেবে বাংলা শব্দ নিয়ে এমবেডিং বের করে দিচ্ছে ঠিকমতো। " ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "Z1MBnaBUihWn", "outputId": "fd9a382a-a181-453a-bba8-e416a3fe40bc", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "embedding_layer(['ভালো আছি'], ['আমরা']).shape" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "TensorShape([1, 300])" ] }, "metadata": { "tags": [] }, "execution_count": 28 } ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "4KY8LiFOHmcd" }, "source": [ "# টেন্সর-ফ্লো এর জন্য তৈরি করি ডেটাসেট \n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "pNguCDNe6bvz" }, "source": [ "আপনার মনে আছে, আমাদের all_sentences এর মধ্যে প্রথম দিকে পজিটিভ আর শেষের দিকে নেগেটিভ সেন্টিমেন্টের বাক্য ছিলো। এখন এই ডেটা দিয়ে ট্রেনিং করালে ভারসাম্য থাকবে না। তাই শুরুতে দৈবচয়নের মাধ্যমে শাফল করে নেই। আমরা কাজ করবো একটা জেনারেটর নিয়ে। " ] }, { "cell_type": "markdown", "metadata": { "id": "gdiu6WojE_QI", "colab_type": "text" }, "source": [ "জেনারেটর দিয়ে ডেটাসেট তৈরিতে শুরুতে আমরা একটা জেনারেটর ফাংশন বানিয়ে দিলে সেটা প্রতিটা বাক্য এবং তার করেসপন্ডিং লেবেল থেকে একটা পুরো এক্সাম্পল (ডেটা + লেবেল) তৈরি করে দেবে। এরপর সেটাকে tf.data.Dataset.from_generator পাঠালে সেটার কি ধরনের আউটপুট চাই সেটা বললে হবে। জেনারেটরের একটা উদাহরণ দেখি। generator বানিয়ে সেটা পাঠাচ্ছি from_generator এর মধ্যে। \n", "\n", "\n", "```\n", "@staticmethod\n", "from_generator(\n", " generator,\n", " output_types,\n", " 
output_shapes=None,\n",
" args=None\n",
")\n",
"```\n",
"How is it used?\n",
"\n",
"```\n",
"import itertools\n",
"import tensorflow as tf\n",
"\n",
"# Only needed on TF 1.x; eager execution is the default in TF 2.\n",
"tf.compat.v1.enable_eager_execution()\n",
"\n",
"def gen():\n",
"    for i in itertools.count(1):\n",
"        yield (i, [1] * i)\n",
"\n",
"ds = tf.data.Dataset.from_generator(\n",
"    gen, (tf.int64, tf.int64), (tf.TensorShape([]), tf.TensorShape([None])))\n",
"\n",
"for value in ds.take(2):\n",
"    print(value)\n",
"# yields (1, [1]) and then (2, [1, 1]), as tensors\n",
"```\n",
"\n",
"\n",
"\n" ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "eZRGTzEhUi7Q", "colab": {} }, "source": [
"import random\n",
"\n",
"def generator():\n",
"    # Mix positives and negatives. Note that this runs on every pass over\n",
"    # the dataset, so the order changes each time the generator is called.\n",
"    random.shuffle(all_sentences)\n",
"    for sentence, label in all_sentences:\n",
"        # One-hot encode the label: positive -> [0, 1], negative -> [1, 0].\n",
"        if label == 'positive':\n",
"            label = tf.keras.utils.to_categorical(1, num_classes=2)\n",
"        else:\n",
"            label = tf.keras.utils.to_categorical(0, num_classes=2)\n",
"        sentence_tensor = tf.constant(sentence, dtype=tf.dtypes.string)\n",
"        yield sentence_tensor, label" ], "execution_count": 0, "outputs": [] },
{ "cell_type": "markdown", "metadata": { "id": "A7zmgGSiIte0", "colab_type": "text" }, "source": [
"Each example here is a tuple of a sentence, with dtype=tf.dtypes.string, and a one-hot encoded label. Building the dataset requires a training set and a validation set. How can we get them?\n",
"```\n",
"train_data = data.take(train_size)\n",
"validation_data = data.skip(train_size)\n",
"```\n",
"\n" ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "2g4nRflB7fbF", "colab": {} }, "source": [
"def make_dataset(train_size):\n",
"    data = tf.data.Dataset.from_generator(generator=generator,\n",
"                                          output_types=(tf.string, tf.float32))\n",
"    # The train_size argument is overridden here: the first 4000 examples\n",
"    # go to training and everything after them goes to validation.\n",
"    train_size = 4000\n",
"    train_data = data.take(train_size)\n",
"    validation_data = data.skip(train_size)\n",
"    return train_data, validation_data" ], "execution_count": 0, "outputs": [] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "8PuuN6el8tv9", "outputId": 
"eed3c3ce-fef0-4ac8-ebef-fb222019f432", "colab": { "base_uri": "https://localhost:8080/", "height": 498 } }, "source": [ "# ৮০-২০% ভাগ করে ডেটাসেট তৈরি\n", "\n", "train_data, validation_data = make_dataset(0.80)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:505: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "tf.py_func is deprecated in TF V2. Instead, there are two\n", " options available in V2.\n", " - tf.py_function takes a python function which manipulates tf eager\n", " tensors instead of numpy arrays. It's easy to convert a tf eager tensor to\n", " an ndarray (just call tensor.numpy()) but having access to eager tensors\n", " means `tf.py_function`s can use accelerators such as GPUs as well as\n", " being differentiable using a gradient tape.\n", " - tf.numpy_function maintains the semantics of the deprecated tf.py_func\n", " (it is not differentiable, and manipulates numpy arrays). It drops the\n", " stateful argument making all functions stateful.\n", " \n" ], "name": "stdout" }, { "output_type": "stream", "text": [ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:505: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "tf.py_func is deprecated in TF V2. Instead, there are two\n", " options available in V2.\n", " - tf.py_function takes a python function which manipulates tf eager\n", " tensors instead of numpy arrays. 
It's easy to convert a tf eager tensor to\n", " an ndarray (just call tensor.numpy()) but having access to eager tensors\n", " means `tf.py_function`s can use accelerators such as GPUs as well as\n", " being differentiable using a gradient tape.\n", " - tf.numpy_function maintains the semantics of the deprecated tf.py_func\n", " (it is not differentiable, and manipulates numpy arrays). It drops the\n", " stateful argument making all functions stateful.\n", " \n" ], "name": "stderr" } ] }, { "cell_type": "code", "metadata": { "id": "G0CyNOl1yajF", "colab_type": "code", "outputId": "e344b2c8-d104-439d-9e1b-be011b21a194", "colab": { "base_uri": "https://localhost:8080/", "height": 143 } }, "source": [ "# একটা ব্যাচ দেখি, যেখানে ২টা এলিমেন্ট থাকবে train_data থেকে \n", "# এরকম বেশ কয়েকটা উদাহরন দেখি নিচে\n", "\n", "next(iter(train_data.batch(2)))" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(, )" ] }, "metadata": { "tags": [] }, "execution_count": 32 } ] }, { "cell_type": "code", "metadata": { "id": "5cRCljsMCDmP", "colab_type": "code", "colab": {} }, "source": [ "sentences_in_a_single_batch, labels_in_a_single_batch = next(iter(train_data.batch(2)))" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "s9yZYRItCNWR", "colab_type": "code", "outputId": "80346bff-7dea-477d-e2fb-eda4604d6361", "colab": { "base_uri": "https://localhost:8080/", "height": 107 } }, "source": [ "sentences_in_a_single_batch" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 34 } ] }, { "cell_type": "code", "metadata": { "id": "gU2HDStGCbfi", "colab_type": "code", "outputId": "f9596900-0e5e-4ebd-8e53-9737bfecb84c", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "sentences_in_a_single_batch.shape" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", 
"data": { "text/plain": [ "TensorShape([2])" ] }, "metadata": { "tags": [] }, "execution_count": 35 } ] }, { "cell_type": "code", "metadata": { "id": "igZFiVMqChDo", "colab_type": "code", "outputId": "06cc49e9-539a-4de6-d2c1-3f701742e855", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "labels_in_a_single_batch.shape" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "TensorShape([2, 2])" ] }, "metadata": { "tags": [] }, "execution_count": 36 } ] }, { "cell_type": "code", "metadata": { "id": "zFcaGNrIzABb", "colab_type": "code", "colab": {} }, "source": [ "sentence, label = next(iter(train_data.take(1)))" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "eu12m7AS2YVC", "colab_type": "code", "outputId": "bf98e9ce-f6a3-4f72-c104-96b0d89c9019", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "# numpy()কে ডিকোড করতে হবে ইউনিকোডে, তা না হলে স্ট্রিংকে বাইট হিসেবে পাঠাবে\n", "\n", "sentence.numpy().decode('utf8')" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'তুমি বাংলার গৌরব তুমি বাংলার প্রতিটা মানুষের গর্ভ তোমায় হাজার সালাম'" ] }, "metadata": { "tags": [] }, "execution_count": 38 } ] }, { "cell_type": "code", "metadata": { "id": "q7KcfmLC2ceA", "colab_type": "code", "outputId": "76749c43-8030-46b0-bce3-029c553b65d5", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "source": [ "# to_categorical() এর কনভার্সনের পর লেবেল\n", "label.numpy() " ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([0., 1.], dtype=float32)" ] }, "metadata": { "tags": [] }, "execution_count": 39 } ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MrdZI6FqPJNP" }, "source": [ "## মডেল ট্রেনিং " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "jgr7YScGVS58" }, "source": [ 
"এই মডেল আগেও তৈরি করেছি আমরা। এখানে \n", "এমবেডিং লেয়ারকে ঢুকিয়ে দিয়েছি শুরুতেই। \n", "```\n", "model.add(embedding_layer)\n", "```\n", "tf.data থেকে স্যাম্পলকে ব্যাচ করে পাঠানো হবে মডেলে।\n", "LSTM নিয়ে কাজ করবো সামনে।" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "WhCqbDK2uUV5" }, "source": [ "## ডেন্স লেয়ার দিয়ে মডেল\n", "\n", "LSTM দিয়ে কাজ করানোর চেষ্টা চলছে।" ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "nHUw807XPPM9", "colab": {} }, "source": [ "def create_model():\n", " model = tf.keras.Sequential()\n", " model.add(embedding_layer)\n", " # model.add(tf.keras.layers.Flatten())\n", " # model.add(tf.keras.layers.SpatialDropout1D(0.2))\n", " # model.add(tf.keras.layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2))\n", " # model.add(Dense(13, activation='softmax'))\n", " model.add(tf.keras.layers.Dense(256, activation=\"relu\"))\n", " model.add(tf.keras.layers.Dense(128, activation=\"relu\"))\n", " model.add(tf.keras.layers.Dense(2, activation=\"softmax\"))\n", " model.compile(optimizer=\"adam\",loss=\"categorical_crossentropy\",metrics=['acc'])\n", " return model" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "CUyeg6WhhliR", "colab_type": "code", "colab": {} }, "source": [ "from tensorflow.keras.callbacks import TensorBoard\n", "log_dir=\"logs/fit/\"\n", "tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "5J4EXJUmPVNG", "colab": {} }, "source": [ "model = create_model()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ZZ7XJLg2u2No" }, "source": [ "## ট্রেনিং ১০ ইপক দিয়ে" ] }, { "cell_type": "code", "metadata": { "colab_type": "code", "id": "OoBkN2tAaXWD", "outputId": "1797d491-f4bb-46e3-a226-fd7fb5fccb49", "colab": { "base_uri": 
"https://localhost:8080/", "height": 498 } }, "source": [ "batch_size = 256\n", "history = model.fit(train_data.batch(batch_size), \n", " validation_data=validation_data.batch(batch_size), \n", " epochs=10,callbacks=[tensorboard_callback])" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Epoch 1/10\n", "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" ], "name": "stdout" }, { "output_type": "stream", "text": [ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" ], "name": "stderr" }, { "output_type": "stream", "text": [ "16/16 [==============================] - 96s 6s/step - loss: 0.5697 - acc: 0.6261 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00\n", "Epoch 2/10\n", "16/16 [==============================] - 89s 6s/step - loss: 0.3639 - acc: 0.8343 - val_loss: 0.2472 - val_acc: 0.8962\n", "Epoch 3/10\n", "16/16 [==============================] - 90s 6s/step - loss: 0.2856 - acc: 0.8787 - val_loss: 0.2356 - val_acc: 0.8980\n", "Epoch 4/10\n", "16/16 [==============================] - 90s 6s/step - loss: 0.2314 - acc: 0.9159 - val_loss: 0.1696 - val_acc: 0.9374\n", "Epoch 5/10\n", "16/16 [==============================] - 91s 6s/step - loss: 0.1828 - acc: 0.9320 - val_loss: 0.1381 - val_acc: 0.9571\n", "Epoch 6/10\n", "16/16 [==============================] - 90s 6s/step - loss: 0.1387 - acc: 0.9537 - val_loss: 0.1194 - val_acc: 0.9642\n", "Epoch 7/10\n", "16/16 
[==============================] - 91s 6s/step - loss: 0.1040 - acc: 0.9679 - val_loss: 0.0681 - val_acc: 0.9803\n",
"Epoch 8/10\n",
"16/16 [==============================] - 91s 6s/step - loss: 0.0769 - acc: 0.9737 - val_loss: 0.0534 - val_acc: 0.9911\n",
"Epoch 9/10\n",
"16/16 [==============================] - 90s 6s/step - loss: 0.0503 - acc: 0.9885 - val_loss: 0.0484 - val_acc: 0.9893\n",
"Epoch 10/10\n",
"16/16 [==============================] - 89s 6s/step - loss: 0.0345 - acc: 0.9930 - val_loss: 0.0205 - val_acc: 0.9928\n" ], "name": "stdout" } ] },
{ "cell_type": "code", "metadata": { "id": "lrR7SmiTk9t3", "colab_type": "code", "outputId": "26a59cf5-4d7b-4a5c-8d3a-07d28194bc2f", "colab": { "base_uri": "https://localhost:8080/", "height": 301 } }, "source": [ "model.summary()" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [
"Model: \"sequential_1\"\n",
"_________________________________________________________________\n",
"Layer (type)                 Output Shape              Param #   \n",
"=================================================================\n",
"keras_layer_1 (KerasLayer)   multiple                  200881800 \n",
"_________________________________________________________________\n",
"dense_2 (Dense)              multiple                  77056     \n",
"_________________________________________________________________\n",
"dense_3 (Dense)              multiple                  32896     \n",
"_________________________________________________________________\n",
"dense_4 (Dense)              multiple                  258       \n",
"=================================================================\n",
"Total params: 200,992,010\n",
"Trainable params: 200,992,010\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n" ], "name": "stdout" } ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9DeGZFXsJt5g" }, "source": [ "## Saving the model for later use" ] },
{ "cell_type": "code", "metadata": { "colab_type": "code", "id": "rIO_CseWJtJP", "outputId": "40eb3c3e-3108-48e2-8e90-369bd7d244f5", 
"colab": { "base_uri": "https://localhost:8080/", "height": 52 } }, "source": [ "tf.saved_model.save(model, export_dir=\"my_model\")" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "INFO:tensorflow:Assets written to: my_model/assets\n" ], "name": "stdout" }, { "output_type": "stream", "text": [ "INFO:tensorflow:Assets written to: my_model/assets\n" ], "name": "stderr" } ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "D54IXLqcG8Cq" }, "source": [ "## প্রেডিকশন\n", "\n", "দেখুন আমাদের প্রেডিক্ট মেথড কি বের করে নিয়ে আসে? চেষ্টা করুন নতুন নতুন শব্দ দিয়ে। নিচের sents এর মধ্যে আপনার পছন্দের বাক্যটা লিখে চেষ্টা করুন। ১ হচ্ছে পজিটিভ ০ হচ্ছে নেগেটিভ।" ] }, { "cell_type": "code", "metadata": { "id": "ISbX3GzPoth8", "colab_type": "code", "outputId": "3992be0a-2bc6-4021-aafe-973fa453bd6c", "colab": { "base_uri": "https://localhost:8080/", "height": 212 } }, "source": [ "sents = ['আমারা খুবি খুশি অফারটির জন্য', 'বই পড়তে অনেক পছন্দ করি', 'আজকের ঘটনা আমাকে মনে কষ্ট দিয়েছে', 'কাজটা খুব খারাপ হয়েছে', \n", " 'আমি দেশকে খুব ভালবাসি', 'এই বইটা বেশ ভালো লাগছে', 'একটা দুর্ঘটনা ঘটে গেল',\n", " 'আজকে একটা অসাধারণ অভিজ্ঞতা হলো', 'আমাদের কাজ করতে বেশ কষ্ট হয়', 'বিদ্যুতের ঘাটতি হলে কারখানার কাজ কমে যায়',\n", " 'ঢাকা-সিলেটসহ আশপাশের সড়কের যানবাহন চলাচল বন্ধ হয়ে যায়',]\n", "pred_dataset = tf.data.Dataset.from_tensor_slices(sents)\n", "prediction = model.predict(np.array(sents))\n", "\n", "for sentence, pred_sentiment in zip(sents, prediction.argmax(axis=1)):\n", " print(\"Sentence:{} - predicted: {}\".format(sentence, pred_sentiment))" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Sentence:আমারা খুবি খুশি অফারটির জন্য - predicted: 1\n", "Sentence:বই পড়তে অনেক পছন্দ করি - predicted: 1\n", "Sentence:আজকের ঘটনা আমাকে মনে কষ্ট দিয়েছে - predicted: 0\n", "Sentence:কাজটা খুব খারাপ হয়েছে - predicted: 0\n", "Sentence:আমি দেশকে খুব ভালবাসি - predicted: 1\n", "Sentence:এই বইটা বেশ ভালো লাগছে - predicted: 
1\n",
"Sentence:একটা দুর্ঘটনা ঘটে গেল - predicted: 0\n",
"Sentence:আজকে একটা অসাধারণ অভিজ্ঞতা হলো - predicted: 1\n",
"Sentence:আমাদের কাজ করতে বেশ কষ্ট হয় - predicted: 0\n",
"Sentence:বিদ্যুতের ঘাটতি হলে কারখানার কাজ কমে যায় - predicted: 0\n",
"Sentence:ঢাকা-সিলেটসহ আশপাশের সড়কের যানবাহন চলাচল বন্ধ হয়ে যায় - predicted: 0\n" ], "name": "stdout" } ] },
{ "cell_type": "markdown", "metadata": { "id": "b07jbrPw-1-_", "colab_type": "text" }, "source": [ "## Launching TensorBoard" ] },
{ "cell_type": "code", "metadata": { "id": "CvY_M886_DOK", "colab_type": "code", "colab": {} }, "source": [ "%reload_ext tensorboard\n", "%tensorboard --logdir logs/fit" ], "execution_count": 0, "outputs": [] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "p5e9m3bV6oXK" }, "source": [
"This notebook draws on ideas from many other notebooks. The three listed below are worth a look in particular. They deal with text classification, but the underlying work is much the same. You can bookmark them for study and adapt them to your own tasks. \n",
"\n",
"1. https://github.com/tensorflow/hub/blob/master/examples/colab/bangla_article_classifier.ipynb\n",
"\n",
"2. https://github.com/rezacsedu/BengFastText/blob/master/SentimentAnalysis_Multichannel_CNN_LSTM/Multichannel_CNN_Bengali_Sentiment.ipynb\n",
"\n",
"3. https://github.com/tanvirfahim15/BARD-Bangla-Article-Classifier/\n",
"\n" ] } ] }