{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "dLgke1AI8ZQo" }, "source": [ "
In the name of God
\n", "\"class.vision\"\n", "

Cryptocurrency Price Prediction

" ] }, { "cell_type": "markdown", "metadata": { "id": "bcgQMhIg8ZQp" }, "source": [ "\n", "##
Dataset
\n", "\n", "
You can download the dataset from the link below
\n", "\n", "https://github.com/Alireza-Akhavan/datasets_and_models/raw/main/crypto_data.zip" ] }, { "cell_type": "code", "source": [ "!mkdir crypto_data\n", "!wget https://raw.githubusercontent.com/Alireza-Akhavan/datasets_and_models/main/crypto_data.zip\n", "!unzip crypto_data.zip -d crypto_data" ], "metadata": { "id": "qc0xZ1LC8lx2", "outputId": "361cb9ba-6791-4b6e-f04c-03c495da9b2d", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "--2025-07-17 15:13:32-- https://raw.githubusercontent.com/Alireza-Akhavan/datasets_and_models/main/crypto_data.zip\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 5819754 (5.5M) [application/zip]\n", "Saving to: ‘crypto_data.zip’\n", "\n", "crypto_data.zip 100%[===================>] 5.55M --.-KB/s in 0.06s \n", "\n", "2025-07-17 15:13:32 (97.7 MB/s) - ‘crypto_data.zip’ saved [5819754/5819754]\n", "\n", "Archive: crypto_data.zip\n", " inflating: crypto_data/BCH-USD.csv \n", " inflating: crypto_data/BTC-USD.csv \n", " inflating: crypto_data/ETH-USD.csv \n", " inflating: crypto_data/LTC-USD.csv \n" ] } ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "x4CU4eIi8ZQp", "outputId": "8dd92fc8-c132-41f5-a62d-35142758c257", "colab": { "base_uri": "https://localhost:8080/", "height": 206 } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " time low high open close volume\n", "0 1528968660 96.580002 96.589996 96.589996 96.580002 9.647200\n", "1 1528968720 96.449997 96.669998 96.589996 96.660004 314.387024\n", "2 1528968780 96.470001 96.570000 96.570000 96.570000 77.129799\n", "3 1528968840 96.449997 96.570000 96.570000 96.500000 7.216067\n", "4 1528968900 
96.279999 96.540001 96.500000 96.389999 524.539978" ], "text/html": [ "\n", "
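<table>
<thead>
<tr><th></th><th>time</th><th>low</th><th>high</th><th>open</th><th>close</th><th>volume</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>1528968660</td><td>96.580002</td><td>96.589996</td><td>96.589996</td><td>96.580002</td><td>9.647200</td></tr>
<tr><th>1</th><td>1528968720</td><td>96.449997</td><td>96.669998</td><td>96.589996</td><td>96.660004</td><td>314.387024</td></tr>
<tr><th>2</th><td>1528968780</td><td>96.470001</td><td>96.570000</td><td>96.570000</td><td>96.570000</td><td>77.129799</td></tr>
<tr><th>3</th><td>1528968840</td><td>96.449997</td><td>96.570000</td><td>96.570000</td><td>96.500000</td><td>7.216067</td></tr>
<tr><th>4</th><td>1528968900</td><td>96.279999</td><td>96.540001</td><td>96.500000</td><td>96.389999</td><td>524.539978</td></tr>
</tbody>
</table>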
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "df" } }, "metadata": {}, "execution_count": 2 } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv(\"crypto_data/LTC-USD.csv\", names=['time', 'low', 'high', 'open', 'close', 'volume'])\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "LZ9fSp908ZQq", "outputId": "a666ce43-369e-4ace-95c1-bec0a20e8386", "colab": { "base_uri": "https://localhost:8080/", "height": 361 } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "BTC-USD\n", "LTC-USD\n", "BCH-USD\n", "ETH-USD\n" ] }, { "output_type": "stream", "name": "stderr", "text": [ "/tmp/ipython-input-3-119421751.py:20: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.\n", " main_df.fillna(method=\"ffill\", inplace=True) # if there are gaps in data, use previously known values\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " BTC-USD_close BTC-USD_volume LTC-USD_close LTC-USD_volume \\\n", "time \n", "1528968720 6487.379883 7.706374 96.660004 314.387024 \n", "1528968780 6479.410156 3.088252 96.570000 77.129799 \n", "1528968840 6479.410156 1.404100 96.500000 7.216067 \n", "1528968900 6479.979980 0.753000 96.389999 524.539978 \n", "1528968960 6480.000000 1.490900 96.519997 16.991997 \n", "\n", " BCH-USD_close BCH-USD_volume ETH-USD_close ETH-USD_volume \n", "time \n", "1528968720 870.859985 26.856577 486.01001 26.019083 \n", "1528968780 870.099976 1.124300 486.00000 8.449400 \n", "1528968840 870.789978 1.749862 485.75000 26.994646 \n", "1528968900 870.000000 1.680500 486.00000 77.355759 \n", "1528968960 869.989990 1.669014 486.00000 7.503300 " ], "text/html": [ "\n", "
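<table>
<thead>
<tr><th>time</th><th>BTC-USD_close</th><th>BTC-USD_volume</th><th>LTC-USD_close</th><th>LTC-USD_volume</th><th>BCH-USD_close</th><th>BCH-USD_volume</th><th>ETH-USD_close</th><th>ETH-USD_volume</th></tr>
</thead>
<tbody>
<tr><th>1528968720</th><td>6487.379883</td><td>7.706374</td><td>96.660004</td><td>314.387024</td><td>870.859985</td><td>26.856577</td><td>486.01001</td><td>26.019083</td></tr>
<tr><th>1528968780</th><td>6479.410156</td><td>3.088252</td><td>96.570000</td><td>77.129799</td><td>870.099976</td><td>1.124300</td><td>486.00000</td><td>8.449400</td></tr>
<tr><th>1528968840</th><td>6479.410156</td><td>1.404100</td><td>96.500000</td><td>7.216067</td><td>870.789978</td><td>1.749862</td><td>485.75000</td><td>26.994646</td></tr>
<tr><th>1528968900</th><td>6479.979980</td><td>0.753000</td><td>96.389999</td><td>524.539978</td><td>870.000000</td><td>1.680500</td><td>486.00000</td><td>77.355759</td></tr>
<tr><th>1528968960</th><td>6480.000000</td><td>1.490900</td><td>96.519997</td><td>16.991997</td><td>869.989990</td><td>1.669014</td><td>486.00000</td><td>7.503300</td></tr>
</tbody>
</table>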
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "main_df", "summary": "{\n \"name\": \"main_df\",\n \"rows\": 97723,\n \"fields\": [\n {\n \"column\": \"time\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1812105,\n \"min\": 1528968720,\n \"max\": 1535215200,\n \"num_unique_values\": 97723,\n \"samples\": [\n 1529419080,\n 1535151120,\n 1534230060\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 641.3567724151566,\n \"min\": 5778.109863,\n \"max\": 8482.799805,\n \"num_unique_values\": 40971,\n \"samples\": [\n 8137.649902,\n 6232.939941,\n 6502.839844\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16.6404266337796,\n \"min\": 0.001915,\n \"max\": 801.442993,\n \"num_unique_values\": 96210,\n \"samples\": [\n 4.310447,\n 1.282316,\n 1.032321\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.675627898158666,\n \"min\": 49.560001,\n \"max\": 103.040001,\n \"num_unique_values\": 3970,\n \"samples\": [\n 79.07,\n 79.449997,\n 92.190002\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 278.2014639364785,\n \"min\": 6e-05,\n \"max\": 10263.191406,\n \"num_unique_values\": 95665,\n \"samples\": [\n 302.852661,\n 9.371292,\n 4.923814\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 111.40178065421344,\n \"min\": 473.209991,\n \"max\": 927.0,\n \"num_unique_values\": 24657,\n \"samples\": [\n 496.01001,\n 849.47998,\n 714.77002\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31.89530583329891,\n \"min\": 2e-06,\n \"max\": 1520.833862,\n \"num_unique_values\": 77904,\n \"samples\": [\n 0.2304,\n 3.869892,\n 0.233977\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 75.78239513466816,\n \"min\": 251.0,\n \"max\": 547.0,\n \"num_unique_values\": 17630,\n \"samples\": [\n 282.029999,\n 436.470001,\n 445.649994\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 152.6161975813692,\n \"min\": 3e-05,\n \"max\": 9310.024414,\n \"num_unique_values\": 96872,\n \"samples\": [\n 11.978044,\n 76.828804,\n 9.442243\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 3 } ], "source": [ "main_df = pd.DataFrame() # begin empty\n", "\n", "ratios = [\"BTC-USD\", \"LTC-USD\", \"BCH-USD\", \"ETH-USD\"] # the 4 ratios we want to consider\n", "for ratio in ratios: # begin iteration\n", "    print(ratio)\n", "    dataset = f'crypto_data/{ratio}.csv' # get the full path to the file.\n", "    df = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', 'close', 'volume']) # read in specific file\n", "\n", "    # rename volume and close to include the ticker so we can still tell which close/volume is which:\n", "    df.rename(columns={\"close\": f\"{ratio}_close\", \"volume\": f\"{ratio}_volume\"}, inplace=True)\n", "\n", "    df.set_index(\"time\", inplace=True) # set time as index so we can join them on this shared time\n", "    df = df[[f\"{ratio}_close\", f\"{ratio}_volume\"]] # ignore the other columns besides price and volume\n", "\n", "    if len(main_df)==0: # if the dataframe is empty\n", "        main_df = df # then it's just the current df\n", "    else: # otherwise, join this data to the main 
one\n", "        main_df = main_df.join(df)\n", "\n", "main_df.ffill(inplace=True) # if there are gaps in data, use previously known values\n", "main_df.dropna(inplace=True)\n", "main_df.head() # how did we do??" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "MysngZ8N8ZQq" }, "outputs": [], "source": [ "SEQ_LEN = 60 # how many preceding time steps to collect as one input sequence for the RNN\n", "FUTURE_PERIOD_PREDICT = 3 # how many time steps into the future we are trying to predict\n", "RATIO_TO_PREDICT = \"LTC-USD\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "E7m-5xf28ZQq" }, "outputs": [], "source": [ "main_df['future'] = main_df[f'{RATIO_TO_PREDICT}_close'].shift(-FUTURE_PERIOD_PREDICT)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "6F3cpbsx8ZQq", "outputId": "36425aef-be54-47ca-d05e-26342f9db654", "colab": { "base_uri": "https://localhost:8080/", "height": 255 } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " BTC-USD_close BTC-USD_volume LTC-USD_close LTC-USD_volume \\\n", "time \n", "1528968720 6487.379883 7.706374 96.660004 314.387024 \n", "1528968780 6479.410156 3.088252 96.570000 77.129799 \n", "1528968840 6479.410156 1.404100 96.500000 7.216067 \n", "1528968900 6479.979980 0.753000 96.389999 524.539978 \n", "1528968960 6480.000000 1.490900 96.519997 16.991997 \n", "\n", " BCH-USD_close BCH-USD_volume ETH-USD_close ETH-USD_volume \\\n", "time \n", "1528968720 870.859985 26.856577 486.01001 26.019083 \n", "1528968780 870.099976 1.124300 486.00000 8.449400 \n", "1528968840 870.789978 1.749862 485.75000 26.994646 \n", "1528968900 870.000000 1.680500 486.00000 77.355759 \n", "1528968960 869.989990 1.669014 486.00000 7.503300 \n", "\n", " future \n", "time \n", "1528968720 96.389999 \n", "1528968780 96.519997 \n", "1528968840 96.440002 \n", "1528968900 96.470001 \n", "1528968960 96.400002 " ], "text/html": [ "\n", "
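<table>
<thead>
<tr><th>time</th><th>BTC-USD_close</th><th>BTC-USD_volume</th><th>LTC-USD_close</th><th>LTC-USD_volume</th><th>BCH-USD_close</th><th>BCH-USD_volume</th><th>ETH-USD_close</th><th>ETH-USD_volume</th><th>future</th></tr>
</thead>
<tbody>
<tr><th>1528968720</th><td>6487.379883</td><td>7.706374</td><td>96.660004</td><td>314.387024</td><td>870.859985</td><td>26.856577</td><td>486.01001</td><td>26.019083</td><td>96.389999</td></tr>
<tr><th>1528968780</th><td>6479.410156</td><td>3.088252</td><td>96.570000</td><td>77.129799</td><td>870.099976</td><td>1.124300</td><td>486.00000</td><td>8.449400</td><td>96.519997</td></tr>
<tr><th>1528968840</th><td>6479.410156</td><td>1.404100</td><td>96.500000</td><td>7.216067</td><td>870.789978</td><td>1.749862</td><td>485.75000</td><td>26.994646</td><td>96.440002</td></tr>
<tr><th>1528968900</th><td>6479.979980</td><td>0.753000</td><td>96.389999</td><td>524.539978</td><td>870.000000</td><td>1.680500</td><td>486.00000</td><td>77.355759</td><td>96.470001</td></tr>
<tr><th>1528968960</th><td>6480.000000</td><td>1.490900</td><td>96.519997</td><td>16.991997</td><td>869.989990</td><td>1.669014</td><td>486.00000</td><td>7.503300</td><td>96.400002</td></tr>
</tbody>
</table>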
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "main_df", "summary": "{\n \"name\": \"main_df\",\n \"rows\": 97723,\n \"fields\": [\n {\n \"column\": \"time\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1812105,\n \"min\": 1528968720,\n \"max\": 1535215200,\n \"num_unique_values\": 97723,\n \"samples\": [\n 1529419080,\n 1535151120,\n 1534230060\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 641.3567724151566,\n \"min\": 5778.109863,\n \"max\": 8482.799805,\n \"num_unique_values\": 40971,\n \"samples\": [\n 8137.649902,\n 6232.939941,\n 6502.839844\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16.6404266337796,\n \"min\": 0.001915,\n \"max\": 801.442993,\n \"num_unique_values\": 96210,\n \"samples\": [\n 4.310447,\n 1.282316,\n 1.032321\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.675627898158666,\n \"min\": 49.560001,\n \"max\": 103.040001,\n \"num_unique_values\": 3970,\n \"samples\": [\n 79.07,\n 79.449997,\n 92.190002\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 278.2014639364785,\n \"min\": 6e-05,\n \"max\": 10263.191406,\n \"num_unique_values\": 95665,\n \"samples\": [\n 302.852661,\n 9.371292,\n 4.923814\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 111.40178065421344,\n \"min\": 473.209991,\n \"max\": 927.0,\n \"num_unique_values\": 24657,\n \"samples\": [\n 496.01001,\n 849.47998,\n 714.77002\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31.89530583329891,\n \"min\": 2e-06,\n \"max\": 1520.833862,\n \"num_unique_values\": 77904,\n \"samples\": [\n 0.2304,\n 3.869892,\n 0.233977\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 75.78239513466816,\n \"min\": 251.0,\n \"max\": 547.0,\n \"num_unique_values\": 17630,\n \"samples\": [\n 282.029999,\n 436.470001,\n 445.649994\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 152.6161975813692,\n \"min\": 3e-05,\n \"max\": 9310.024414,\n \"num_unique_values\": 96872,\n \"samples\": [\n 11.978044,\n 76.828804,\n 9.442243\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"future\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.675390733502079,\n \"min\": 49.560001,\n \"max\": 103.040001,\n \"num_unique_values\": 3970,\n \"samples\": [\n 79.07,\n 79.449997,\n 92.190002\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 6 } ], "source": [ "main_df.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "1QIwQwYG8ZQq" }, "outputs": [], "source": [ "def classify(current, future):\n", "    # label 1 if the future price is higher than the current price, else 0\n", "    if float(future) > float(current):\n", "        return 1\n", "    else:\n", "        return 0" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "dJ9sj3E68ZQq" }, "outputs": [], "source": [ "main_df['target'] = list(map(classify, main_df[f'{RATIO_TO_PREDICT}_close'], main_df['future']))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "GHdcn5Cs8ZQq", "outputId": "e51a4d5c-822e-465f-90a5-f1a20acaccd4", "colab": { "base_uri": "https://localhost:8080/", "height": 255 } }, "outputs": [ { 
"output_type": "execute_result", "data": { "text/plain": [ " BTC-USD_close BTC-USD_volume LTC-USD_close LTC-USD_volume \\\n", "time \n", "1528968720 6487.379883 7.706374 96.660004 314.387024 \n", "1528968780 6479.410156 3.088252 96.570000 77.129799 \n", "1528968840 6479.410156 1.404100 96.500000 7.216067 \n", "1528968900 6479.979980 0.753000 96.389999 524.539978 \n", "1528968960 6480.000000 1.490900 96.519997 16.991997 \n", "\n", " BCH-USD_close BCH-USD_volume ETH-USD_close ETH-USD_volume \\\n", "time \n", "1528968720 870.859985 26.856577 486.01001 26.019083 \n", "1528968780 870.099976 1.124300 486.00000 8.449400 \n", "1528968840 870.789978 1.749862 485.75000 26.994646 \n", "1528968900 870.000000 1.680500 486.00000 77.355759 \n", "1528968960 869.989990 1.669014 486.00000 7.503300 \n", "\n", " future target \n", "time \n", "1528968720 96.389999 0 \n", "1528968780 96.519997 0 \n", "1528968840 96.440002 0 \n", "1528968900 96.470001 1 \n", "1528968960 96.400002 0 " ], "text/html": [ "\n", "
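<table>
<thead>
<tr><th>time</th><th>BTC-USD_close</th><th>BTC-USD_volume</th><th>LTC-USD_close</th><th>LTC-USD_volume</th><th>BCH-USD_close</th><th>BCH-USD_volume</th><th>ETH-USD_close</th><th>ETH-USD_volume</th><th>future</th><th>target</th></tr>
</thead>
<tbody>
<tr><th>1528968720</th><td>6487.379883</td><td>7.706374</td><td>96.660004</td><td>314.387024</td><td>870.859985</td><td>26.856577</td><td>486.01001</td><td>26.019083</td><td>96.389999</td><td>0</td></tr>
<tr><th>1528968780</th><td>6479.410156</td><td>3.088252</td><td>96.570000</td><td>77.129799</td><td>870.099976</td><td>1.124300</td><td>486.00000</td><td>8.449400</td><td>96.519997</td><td>0</td></tr>
<tr><th>1528968840</th><td>6479.410156</td><td>1.404100</td><td>96.500000</td><td>7.216067</td><td>870.789978</td><td>1.749862</td><td>485.75000</td><td>26.994646</td><td>96.440002</td><td>0</td></tr>
<tr><th>1528968900</th><td>6479.979980</td><td>0.753000</td><td>96.389999</td><td>524.539978</td><td>870.000000</td><td>1.680500</td><td>486.00000</td><td>77.355759</td><td>96.470001</td><td>1</td></tr>
<tr><th>1528968960</th><td>6480.000000</td><td>1.490900</td><td>96.519997</td><td>16.991997</td><td>869.989990</td><td>1.669014</td><td>486.00000</td><td>7.503300</td><td>96.400002</td><td>0</td></tr>
</tbody>
</table>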
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "main_df", "summary": "{\n \"name\": \"main_df\",\n \"rows\": 97723,\n \"fields\": [\n {\n \"column\": \"time\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1812105,\n \"min\": 1528968720,\n \"max\": 1535215200,\n \"num_unique_values\": 97723,\n \"samples\": [\n 1529419080,\n 1535151120,\n 1534230060\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 641.3567724151566,\n \"min\": 5778.109863,\n \"max\": 8482.799805,\n \"num_unique_values\": 40971,\n \"samples\": [\n 8137.649902,\n 6232.939941,\n 6502.839844\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16.6404266337796,\n \"min\": 0.001915,\n \"max\": 801.442993,\n \"num_unique_values\": 96210,\n \"samples\": [\n 4.310447,\n 1.282316,\n 1.032321\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.675627898158666,\n \"min\": 49.560001,\n \"max\": 103.040001,\n \"num_unique_values\": 3970,\n \"samples\": [\n 79.07,\n 79.449997,\n 92.190002\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 278.2014639364785,\n \"min\": 6e-05,\n \"max\": 10263.191406,\n \"num_unique_values\": 95665,\n \"samples\": [\n 302.852661,\n 9.371292,\n 4.923814\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 111.40178065421344,\n \"min\": 473.209991,\n \"max\": 927.0,\n \"num_unique_values\": 24657,\n \"samples\": [\n 496.01001,\n 849.47998,\n 714.77002\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 31.89530583329891,\n \"min\": 2e-06,\n \"max\": 1520.833862,\n \"num_unique_values\": 77904,\n \"samples\": [\n 0.2304,\n 3.869892,\n 0.233977\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 75.78239513466816,\n \"min\": 251.0,\n \"max\": 547.0,\n \"num_unique_values\": 17630,\n \"samples\": [\n 282.029999,\n 436.470001,\n 445.649994\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 152.6161975813692,\n \"min\": 3e-05,\n \"max\": 9310.024414,\n \"num_unique_values\": 96872,\n \"samples\": [\n 11.978044,\n 76.828804,\n 9.442243\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"future\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.675390733502079,\n \"min\": 49.560001,\n \"max\": 103.040001,\n \"num_unique_values\": 3970,\n \"samples\": [\n 79.07,\n 79.449997,\n 92.190002\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"target\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 9 } ], "source": [ "main_df.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "Mb653Wnt8ZQq" }, "source": [ "##
Splitting the training and validation data
\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "YWa8rej-8ZQq" }, "outputs": [], "source": [ "times = sorted(main_df.index.values) # all timestamps, in chronological order\n", "last_5pct = times[-int(0.05*len(times))] # the timestamp at which the last 5% of the data begins\n", "\n", "validation_main_df = main_df[(main_df.index >= last_5pct)] # validation data: rows whose timestamp falls in the last 5%\n", "main_df = main_df[(main_df.index < last_5pct)] # training data: everything before that cutoff" ] }, { "cell_type": "markdown", "metadata": { "id": "l9wnJN8z8ZQr" }, "source": [ "Next, we need to **balance** and **normalize** this data.\n", "\n", "By **balance**, we mean making sure both classes have the same number of examples during training, so our model doesn't just always predict one class.\n", "\n", "One way to counteract imbalance is to use class weights, which weight the loss more heavily for the less frequent class. That said, I've never personally seen this work as well as a truly balanced dataset.\n", "\n", "We also need to take our data and make sequences from it.\n", "\n", "So... we've got some work to do! 
We'll start by making a function that will process the dataframes, so we can just do something like:\n", "\n", "    train_x, train_y = preprocess_df(main_df)\n", "    validation_x, validation_y = preprocess_df(validation_main_df)" ] }, { "cell_type": "markdown", "metadata": { "id": "SiAQnFUV8ZQr" }, "source": [ "Let's start by **removing the `future` column** (the actual label is the `target` column; we only needed `future` temporarily to create it).\n", "\n", "Then, we need to **scale our data**:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "0pQ0kxPM8ZQr", "outputId": "44d186b3-4d57-40c5-afb3-920c32dd845a", "colab": { "base_uri": "https://localhost:8080/", "height": 472 } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " BTC-USD_close BTC-USD_volume LTC-USD_close LTC-USD_volume \\\n", "time \n", "1528969200 -0.242319 0.002191 0.003795 -0.008003 \n", "1528969260 -0.000806 -0.076684 0.003795 -0.006258 \n", "1528969320 0.122635 -0.051822 0.003795 -0.007615 \n", "1528969380 0.001056 -0.084845 0.003795 -0.005315 \n", "1528969440 0.115439 -0.069695 0.416534 -0.007590 \n", "... ... ... ... ... \n", "1534921800 -0.000806 -0.051173 0.142175 -0.002851 \n", "1534921860 -0.000806 -0.068348 -0.134560 -0.007805 \n", "1534921920 -0.000806 0.067019 0.142175 -0.007603 \n", "1534921980 -0.000806 -0.090402 0.003795 -0.006195 \n", "1534922040 -0.220776 0.670964 0.003795 -0.008134 \n", "\n", " BCH-USD_close BCH-USD_volume ETH-USD_close ETH-USD_volume \\\n", "time \n", "1528969200 0.273203 -0.005525 0.004318 -0.048874 \n", "1528969260 0.281555 -0.005861 0.021991 -0.035993 \n", "1528969320 0.483944 -0.005540 0.004318 -0.053161 \n", "1528969380 0.559611 -0.005867 0.004318 -0.036009 \n", "1528969440 0.002924 -0.005758 0.004318 -0.015918 \n", "... ... ... ... ... 
\n", "1534921800 0.016266 -0.005801 0.275133 -0.024191 \n", "1534921860 0.002924 -0.005885 0.786397 -0.050777 \n", "1534921920 0.002924 0.041620 0.755693 -0.051294 \n", "1534921980 0.002924 -0.005798 0.785094 -0.026056 \n", "1534922040 -0.010418 -0.005884 -0.445685 -0.052437 \n", "\n", " target \n", "time \n", "1528969200 0 \n", "1528969260 1 \n", "1528969320 1 \n", "1528969380 0 \n", "1528969440 0 \n", "... ... \n", "1534921800 0 \n", "1534921860 1 \n", "1534921920 0 \n", "1534921980 0 \n", "1534922040 0 \n", "\n", "[92829 rows x 9 columns]" ], "text/html": [ "\n", "
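<table>
<thead>
<tr><th>time</th><th>BTC-USD_close</th><th>BTC-USD_volume</th><th>LTC-USD_close</th><th>LTC-USD_volume</th><th>BCH-USD_close</th><th>BCH-USD_volume</th><th>ETH-USD_close</th><th>ETH-USD_volume</th><th>target</th></tr>
</thead>
<tbody>
<tr><th>1528969200</th><td>-0.242319</td><td>0.002191</td><td>0.003795</td><td>-0.008003</td><td>0.273203</td><td>-0.005525</td><td>0.004318</td><td>-0.048874</td><td>0</td></tr>
<tr><th>1528969260</th><td>-0.000806</td><td>-0.076684</td><td>0.003795</td><td>-0.006258</td><td>0.281555</td><td>-0.005861</td><td>0.021991</td><td>-0.035993</td><td>1</td></tr>
<tr><th>1528969320</th><td>0.122635</td><td>-0.051822</td><td>0.003795</td><td>-0.007615</td><td>0.483944</td><td>-0.005540</td><td>0.004318</td><td>-0.053161</td><td>1</td></tr>
<tr><th>1528969380</th><td>0.001056</td><td>-0.084845</td><td>0.003795</td><td>-0.005315</td><td>0.559611</td><td>-0.005867</td><td>0.004318</td><td>-0.036009</td><td>0</td></tr>
<tr><th>1528969440</th><td>0.115439</td><td>-0.069695</td><td>0.416534</td><td>-0.007590</td><td>0.002924</td><td>-0.005758</td><td>0.004318</td><td>-0.015918</td><td>0</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>1534921800</th><td>-0.000806</td><td>-0.051173</td><td>0.142175</td><td>-0.002851</td><td>0.016266</td><td>-0.005801</td><td>0.275133</td><td>-0.024191</td><td>0</td></tr>
<tr><th>1534921860</th><td>-0.000806</td><td>-0.068348</td><td>-0.134560</td><td>-0.007805</td><td>0.002924</td><td>-0.005885</td><td>0.786397</td><td>-0.050777</td><td>1</td></tr>
<tr><th>1534921920</th><td>-0.000806</td><td>0.067019</td><td>0.142175</td><td>-0.007603</td><td>0.002924</td><td>0.041620</td><td>0.755693</td><td>-0.051294</td><td>0</td></tr>
<tr><th>1534921980</th><td>-0.000806</td><td>-0.090402</td><td>0.003795</td><td>-0.006195</td><td>0.002924</td><td>-0.005798</td><td>0.785094</td><td>-0.026056</td><td>0</td></tr>
<tr><th>1534922040</th><td>-0.220776</td><td>0.670964</td><td>0.003795</td><td>-0.008134</td><td>-0.010418</td><td>-0.005884</td><td>-0.445685</td><td>-0.052437</td><td>0</td></tr>
</tbody>
</table>
<p>92829 rows × 9 columns</p>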
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "df", "summary": "{\n \"name\": \"df\",\n \"rows\": 92829,\n \"fields\": [\n {\n \"column\": \"time\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1712270,\n \"min\": 1528969200,\n \"max\": 1534922040,\n \"num_unique_values\": 92829,\n \"samples\": [\n 1530378840,\n 1533438600,\n 1529859120\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0000290206837568,\n \"min\": -81.74742137212566,\n \"max\": 30.833765086360454,\n \"num_unique_values\": 62394,\n \"samples\": [\n 0.001083358172406219,\n 0.32649950205090406,\n -0.027413790772085188\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0000375521647347,\n \"min\": -0.09254911124520915,\n \"max\": 231.8030755034392,\n \"num_unique_values\": 92829,\n \"samples\": [\n 0.07254300245049788,\n -0.08844888316947014,\n -0.0860664578428688\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0000172433819452,\n \"min\": -128.09219631258736,\n \"max\": 15.708381633194122,\n \"num_unique_values\": 37741,\n \"samples\": [\n 0.16887432104182262,\n 1.4743344072269449,\n -0.5910706568907677\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LTC-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0000269301420412,\n \"min\": -0.008143839415498483,\n \"max\": 304.36041505943,\n \"num_unique_values\": 91713,\n \"samples\": [\n -0.006920036036649803,\n -0.006699914879485025,\n -0.007518286895726806\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n 
\"std\": 1.000019302113942,\n \"min\": -98.49733511955756,\n \"max\": 21.64482878612181,\n \"num_unique_values\": 58953,\n \"samples\": [\n 0.7581976053685282,\n 7.108844077472595,\n 0.019755840185531934\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BCH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.000016158537327,\n \"min\": -0.005887759727394058,\n \"max\": 302.978764063213,\n \"num_unique_values\": 83053,\n \"samples\": [\n -0.005869617863052636,\n -0.005678845282680639,\n -0.0035226276471193247\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0000107724534322,\n \"min\": -96.45607613252089,\n \"max\": 18.894490412812004,\n \"num_unique_values\": 62857,\n \"samples\": [\n 0.3174567653088463,\n 0.6670546843048006,\n -0.9565119431587871\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ETH-USD_volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0000053862913514,\n \"min\": -0.054497721680660645,\n \"max\": 210.60128943887352,\n \"num_unique_values\": 92556,\n \"samples\": [\n 0.1071780730647814,\n -0.05067479336528091,\n -0.049156553341491875\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"target\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 13 } ], "source": [ "from sklearn import preprocessing # pip install scikit-learn ... if you don't have it!\n", "\n", "def preprocess_df(df):\n", "    df = df.drop(\"future\", axis=1) # don't need this anymore.\n", "\n", "    for col in df.columns: # go through all of the columns\n", "        if col != \"target\": # normalize all ... 
except for the target itself!\n", "            df[col] = df[col].pct_change()  # percent change puts the coins on a common scale (raw prices differ widely; we care about relative movements)\n", "            df.dropna(inplace=True)  # drop the NaNs created by pct_change\n", "            df[col] = preprocessing.scale(df[col].values)  # standardize to zero mean and unit variance (not a 0-1 range)\n", "\n", "    df.dropna(inplace=True)  # clean up again, just in case any NaNs remain\n", "    return df\n", "\n", "df = preprocess_df(main_df)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "VYh6PrWV8ZQr" }, "source": [ "Alright, we've normalized and standardized the data!\n", "\n", "Next up, **we need to create our actual sequences**.\n", "\n", "To do this, we use a `deque` with a fixed `maxlen`: once it is full, appending a new value pushes out the oldest one." ] }, { "cell_type": "code", "source": [ "from collections import deque\n", "\n", "a = deque(maxlen=3)\n", "a" ], "metadata": { "id": "q4PTlMU9BbNX", "outputId": "37877293-9e9e-4053-b7bb-94e9925cb406", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": 15, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "deque([], maxlen=3)" ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "code", "source": [ "a.append(5)\n", "a.append(2)\n", "a.append(7)\n", "a" ], "metadata": { "id": "KwbFJU1KBeei", "outputId": "e7743919-2e27-4451-f975-079113264a49", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": 17, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "deque([5, 2, 7], maxlen=3)" ] }, "metadata": {}, "execution_count": 17 } ] }, { "cell_type": "code", "source": [ "a.append(100)" ], "metadata": { "id": "MEcUwvBfBl5U" }, "execution_count": 18, "outputs": [] }, { "cell_type": "code", "source": [ "a.append(200)\n", "a" ], "metadata": { "id": "XciLf_K2Byho", "outputId": "98fe673a-2b82-41ed-8ea2-585465dd2a50", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": 20, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "deque([7, 100, 200], maxlen=3)" ] }, "metadata": {}, "execution_count": 20 } ] }, { 
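"cell_type": "markdown", "metadata": {}, "source": [ "Before running this on the real data, it may help to see the sliding-window idea on a tiny example. The sketch below is only an illustration: the helper `build_sequences`, the rows in `toy_rows`, and the window length `TOY_SEQ_LEN` are all hypothetical and not part of the original notebook. It assumes, as in the notebook, that the last column of each row is the target.

```python
from collections import deque

# Hypothetical toy data: two feature columns followed by a target column.
toy_rows = [
    [0.1, 1.0, 0],
    [0.2, 2.0, 1],
    [0.3, 3.0, 0],
    [0.4, 4.0, 1],
    [0.5, 5.0, 1],
]
TOY_SEQ_LEN = 3  # hypothetical window length; the notebook uses SEQ_LEN


def build_sequences(rows, seq_len):
    # Slide a fixed-length deque over the rows and collect (window, target) pairs.
    window = deque(maxlen=seq_len)
    pairs = []
    for row in rows:
        window.append(row[:-1])  # keep the features, leave out the target
        if len(window) == seq_len:  # emit only once the window is full
            # list(window) copies the current contents, because the deque keeps changing
            pairs.append((list(window), row[-1]))
    return pairs


toy_pairs = build_sequences(toy_rows, TOY_SEQ_LEN)
print(len(toy_pairs))   # one window per position where a full window fits
print(toy_pairs[0])     # first window of features and the target of its last row
```

With 5 rows and a window of 3, only 3 full windows fit, and each one is paired with the target of its most recent row. Copying the deque before storing it matters here: storing the deque itself would leave every pair pointing at the same, constantly changing object." ] }, {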
"cell_type": "code", "execution_count": 21, "metadata": { "id": "h0Jcsbb98ZQr" }, "outputs": [], "source": [ "import numpy as np\n", "from collections import deque\n", "import random\n", "\n", "\n", "sequential_data = []  # list that will hold [sequence, target] pairs\n", "prev_days = deque(maxlen=SEQ_LEN)  # sliding window of the last SEQ_LEN rows; a full deque drops its oldest item when a new one is appended\n", "\n", "for i in df.values:  # iterate over the rows\n", "    prev_days.append(list(i[:-1]))  # store every column except the target\n", "    if len(prev_days) == SEQ_LEN:  # only emit once the window holds SEQ_LEN time steps\n", "        sequential_data.append([np.array(prev_days), i[-1]])  # the window plus the target of its last row\n", "\n", "random.shuffle(sequential_data)  # shuffle so the samples are no longer in chronological order" ] }, { "cell_type": "markdown", "metadata": { "id": "uh2AZ9nH8ZQr" }, "source": [ "##
Source:
\n", "\n", "https://becominghuman.ai/recurrent-neural-networks-rnn-deep-learning-w-python-tensorflow-keras-p-7-c21bc374d4dc" ] }, { "cell_type": "markdown", "metadata": { "id": "UC4ggXkO8ZQr" }, "source": [ "
\n", "
Advanced Deep Learning Course
Alireza Akhavanpour
Aban and Azar 1399 (October to December 2020)
\n", "
\n", "Class.Vision - AkhavanPour.ir - GitHub\n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "tf2-GPU", "language": "python", "name": "tf2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" }, "colab": { "provenance": [], "gpuType": "T4" }, "accelerator": "GPU" }, "nbformat": 4, "nbformat_minor": 0 }