{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get BTC data (FGI, GST, BHP)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For my project to predict BTC price, i had to get data from different sources with different methods (using package, webscraping and API).
\n",
"So here are the lines of code i'm using to **get a CSV file** with :\n",
"- Crypto Fear & Greed Index (FGI)\n",
"- Bitcoin Google Searches Trend (GST)\n",
"- Bitcoin Historical Price (BHP)"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## Librairies Importation"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"# Main\n",
"import pandas as pd\n",
"import numpy as np\n",
"import math\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"%matplotlib inline\n",
"\n",
"# Web\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"#from IPython.display import display_html\n",
"\n",
"# More\n",
"from pandas.plotting import register_matplotlib_converters\n",
"register_matplotlib_converters()\n",
"pd.set_option('display.max_rows', 400)\n",
"\n",
"# Date format (ex: 190913)\n",
"from datetime import date\n",
"#today = date.today().strftime(\"%Y%m%d\")[2:]\n",
"\n",
"# PyTrends\n",
"from pytrends.request import TrendReq\n",
"pytrend = TrendReq()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataframe Preparation"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"### Dataframe 1: Crypto fear & greed index (fgi)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Since February 2018, [Alternative.me](https://alternative.me/crypto/fear-and-greed-index/) provides a fear and greed index based on:
\n",
"- Volatility (25 %)\n",
"- Market Momentum/Volume (25%)\n",
"- Social Media (15%)\n",
"- Surveys (15%)\n",
"- BTC Dominance (10%)\n",
"- Trends (10%)\n",
"
\n",
"\n",
">Note: The API returns a CSV code, but i didn't find how to easily read/convert it.
\n",
">PS: There are some duplicates and missing values in these data. See below in 1.2.4
"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Score | \n",
" Category | \n",
"
\n",
" \n",
" Date | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2018-02-01 | \n",
" 30 | \n",
" Fear | \n",
"
\n",
" \n",
" 2018-02-02 | \n",
" 15 | \n",
" Extreme Fear | \n",
"
\n",
" \n",
" 2018-02-03 | \n",
" 40 | \n",
" Fear | \n",
"
\n",
" \n",
" 2018-02-04 | \n",
" 24 | \n",
" Extreme Fear | \n",
"
\n",
" \n",
" 2018-02-05 | \n",
" 11 | \n",
" Extreme Fear | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Score Category\n",
"Date \n",
"2018-02-01 30 Fear\n",
"2018-02-02 15 Extreme Fear\n",
"2018-02-03 40 Fear\n",
"2018-02-04 24 Extreme Fear\n",
"2018-02-05 11 Extreme Fear"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url = 'https://api.alternative.me/fng/?limit=0&format=csv&date_format=cn'\n",
"r = requests.get(url)\n",
"fgi = BeautifulSoup(r.text, 'lxml')\n",
"\n",
"fgi = str(fgi)\n",
"fgi = fgi[fgi.find('date\\n')+5:fgi.find('\\n\\t]')] \n",
"fgi = fgi.replace('\\n',',').split(',')\n",
"\n",
"fgi = pd.DataFrame({'Date':fgi[::3],'Score':fgi[1:][::3],'Category':fgi[2:][::3]})\n",
"fgi['Date'] = pd.to_datetime(fgi['Date'])\n",
"fgi = fgi.set_index('Date').sort_index()\n",
"\n",
"fgi.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dataframe 2: Bitcoin Google searches trend (gst)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a recent notebook, i discover **Pytrends** (https://github.com/GeneralMills/pytrends) which is a great tool for web scraping Google Trends.
\n",
"(Thanks to its team to share it)\n",
"\n",
"> You can get data from 2004. But note that if you choose a **too large period (~8 months)**, Google Trends returns weekly score, not daily.
\n",
"\n",
"The period from today (2019-09-13) to first Fear&Greed Index data (2018-02-01) exceeds this limit to get daily data. So the function ```interest_over_time()``` returns weekly data. But here is the magic and what we interest in. Pytrends gives a function `build_payload()` in order to set timeframe, localisation, etc.. (more on the [documentation](https://github.com/GeneralMills/pytrends)).
\n",
"
(More: PyTrends has a function ```get_historical_interest()``` to get hours data, and many others possibilities.)
\n",
"**What we need to do**, it to set period time of 8 months in order to have our daily trend score."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" bitcoin trading | \n",
" isPartial | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2018-02-01 | \n",
" 82 | \n",
" False | \n",
"
\n",
" \n",
" 2018-02-02 | \n",
" 98 | \n",
" False | \n",
"
\n",
" \n",
" 2018-02-03 | \n",
" 64 | \n",
" False | \n",
"
\n",
" \n",
" 2018-02-04 | \n",
" 67 | \n",
" False | \n",
"
\n",
" \n",
" 2018-02-05 | \n",
" 83 | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" bitcoin trading isPartial\n",
"date \n",
"2018-02-01 82 False\n",
"2018-02-02 98 False\n",
"2018-02-03 64 False\n",
"2018-02-04 67 False\n",
"2018-02-05 83 False"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Keywords list (max 5 words/expressions)\n",
"kw_list = ['bitcoin trading']\n",
"\n",
"# 2018-02-01 to 2018-05-11\n",
"pytrend.build_payload(kw_list, cat=0, timeframe='2018-02-01 2018-05-11', geo='', gprop='')\n",
"gst_1 = pytrend.interest_over_time()\n",
"\n",
"# 2018-05-12 to 2019-01-22\n",
"pytrend.build_payload(kw_list, cat=0, timeframe='2018-05-12 2019-01-22', geo='', gprop='')\n",
"gst_2 = pytrend.interest_over_time()\n",
"\n",
"# 2019-01-23 to 2019-09-23\n",
"pytrend.build_payload(kw_list, cat=0, timeframe='2019-01-23 2019-09-23', geo='', gprop='')\n",
"gst_3 = pytrend.interest_over_time()\n",
"\n",
"# Concat\n",
"frames = [gst_1, gst_2, gst_3]\n",
"gst = pd.concat(frames)\n",
"\n",
"gst.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"### Dataframe 3: Bitcoin historical price (bhp)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"No surprise here, i already webscrap a few times the BTC price on [coinmarketcap.com](https://coinmarketcap.com/).
"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"def get_btc_price(start_date='20170101', end_date='20190913'):\n",
" ''' \n",
" Webscraping BTC price on coinmarketcap.com\n",
" ''' \n",
" url = f'https://coinmarketcap.com/currencies/bitcoin/historical-data/?start={start_date}&end={end_date}'\n",
" btc = BeautifulSoup(requests.get(url).text, 'lxml')\n",
" btc = pd.read_html(str(btc.find_all('table', class_='table')[0]))[0]\n",
" btc['Date'] = pd.to_datetime(btc['Date'])\n",
" \n",
" return btc"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Open* | \n",
" High | \n",
" Low | \n",
" Close** | \n",
" Volume | \n",
" Market Cap | \n",
"
\n",
" \n",
" Date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2018-02-01 | \n",
" 10237.30 | \n",
" 10288.80 | \n",
" 8812.28 | \n",
" 9170.54 | \n",
" 9959400448 | \n",
" 154428564694 | \n",
"
\n",
" \n",
" 2018-02-02 | \n",
" 9142.28 | \n",
" 9142.28 | \n",
" 7796.49 | \n",
" 8830.75 | \n",
" 12726899712 | \n",
" 148725283812 | \n",
"
\n",
" \n",
" 2018-02-03 | \n",
" 8852.12 | \n",
" 9430.75 | \n",
" 8251.63 | \n",
" 9174.91 | \n",
" 7263790080 | \n",
" 154540000411 | \n",
"
\n",
" \n",
" 2018-02-04 | \n",
" 9175.70 | \n",
" 9334.87 | \n",
" 8031.22 | \n",
" 8277.01 | \n",
" 7073549824 | \n",
" 139433682759 | \n",
"
\n",
" \n",
" 2018-02-05 | \n",
" 8270.54 | \n",
" 8364.84 | \n",
" 6756.68 | \n",
" 6955.27 | \n",
" 9285289984 | \n",
" 117184385122 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Open* High Low Close** Volume Market Cap\n",
"Date \n",
"2018-02-01 10237.30 10288.80 8812.28 9170.54 9959400448 154428564694\n",
"2018-02-02 9142.28 9142.28 7796.49 8830.75 12726899712 148725283812\n",
"2018-02-03 8852.12 9430.75 8251.63 9174.91 7263790080 154540000411\n",
"2018-02-04 9175.70 9334.87 8031.22 8277.01 7073549824 139433682759\n",
"2018-02-05 8270.54 8364.84 6756.68 6955.27 9285289984 117184385122"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bhp = get_btc_price(start_date='20180201', end_date='20190923')\n",
"bhp = bhp.set_index('Date').sort_index()\n",
"\n",
"bhp.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"### Dataframe 1 (fgi) + Dataframe 2 (gst) + Dataframe 3 (bhp)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Let's join our 3 dataframes. For each, we need to check the period, and the shape.
\n",
"A necessary step, because we will find duplicates and missing values in one dataframe."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"hidden": true,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Score | \n",
" Category | \n",
" bitcoin trading | \n",
" isPartial | \n",
" Open* | \n",
" High | \n",
" Low | \n",
" Close** | \n",
" Volume | \n",
" Market Cap | \n",
"
\n",
" \n",
" Date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2018-02-01 | \n",
" 30 | \n",
" Fear | \n",
" 82 | \n",
" False | \n",
" 10237.30 | \n",
" 10288.80 | \n",
" 8812.28 | \n",
" 9170.54 | \n",
" 9959400448 | \n",
" 154428564694 | \n",
"
\n",
" \n",
" 2018-02-02 | \n",
" 15 | \n",
" Extreme Fear | \n",
" 98 | \n",
" False | \n",
" 9142.28 | \n",
" 9142.28 | \n",
" 7796.49 | \n",
" 8830.75 | \n",
" 12726899712 | \n",
" 148725283812 | \n",
"
\n",
" \n",
" 2018-02-03 | \n",
" 40 | \n",
" Fear | \n",
" 64 | \n",
" False | \n",
" 8852.12 | \n",
" 9430.75 | \n",
" 8251.63 | \n",
" 9174.91 | \n",
" 7263790080 | \n",
" 154540000411 | \n",
"
\n",
" \n",
" 2018-02-04 | \n",
" 24 | \n",
" Extreme Fear | \n",
" 67 | \n",
" False | \n",
" 9175.70 | \n",
" 9334.87 | \n",
" 8031.22 | \n",
" 8277.01 | \n",
" 7073549824 | \n",
" 139433682759 | \n",
"
\n",
" \n",
" 2018-02-05 | \n",
" 11 | \n",
" Extreme Fear | \n",
" 83 | \n",
" False | \n",
" 8270.54 | \n",
" 8364.84 | \n",
" 6756.68 | \n",
" 6955.27 | \n",
" 9285289984 | \n",
" 117184385122 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Score Category bitcoin trading isPartial Open* \\\n",
"Date \n",
"2018-02-01 30 Fear 82 False 10237.30 \n",
"2018-02-02 15 Extreme Fear 98 False 9142.28 \n",
"2018-02-03 40 Fear 64 False 8852.12 \n",
"2018-02-04 24 Extreme Fear 67 False 9175.70 \n",
"2018-02-05 11 Extreme Fear 83 False 8270.54 \n",
"\n",
" High Low Close** Volume Market Cap \n",
"Date \n",
"2018-02-01 10288.80 8812.28 9170.54 9959400448 154428564694 \n",
"2018-02-02 9142.28 7796.49 8830.75 12726899712 148725283812 \n",
"2018-02-03 9430.75 8251.63 9174.91 7263790080 154540000411 \n",
"2018-02-04 9334.87 8031.22 8277.01 7073549824 139433682759 \n",
"2018-02-05 8364.84 6756.68 6955.27 9285289984 117184385122 "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_fgi = fgi.copy()\n",
"df_gst = gst.copy()\n",
"df_bhp = bhp.copy()\n",
"\n",
"#print(df_fgi.index[-1]) >>> 2019-09-23 (+3 days)\n",
"#print(df_gst.index[-1]) >>> 2019-09-20\n",
"#print(df_bhp.index[-1]) >>> 2019-09-22 (+2 day)\n",
"\n",
"df_fgi = df_fgi[:-3]\n",
"df_gst = df_gst\n",
"df_bhp = df_bhp[:-2]\n",
"\n",
"#print(df_fgi.shape) >>> (596, 2) (Something weird with this dataframe)\n",
"#print(df_gst.shape) >>> (597, 2)\n",
"#print(df_bhp.shape) >>> (597, 6)\n",
"\n",
"#df_fgi.reset_index().shape >>> (596, 3)\n",
"#df_fgi.reset_index().drop_duplicates(subset='Date', keep='last').shape >>> (594, 3)\n",
"\n",
"# Found two duplicate values: 2019-02-17, 2019-01-30\n",
"df_fgi = df_fgi.reset_index().drop_duplicates(subset='Date', keep='last')\n",
"\n",
"# Found three missing values: 2018-04-14, 2018-04-15, 2018-04-16\n",
"# I had to create realistic values\n",
"# And reported the bug to API team, maybe it will be fix\n",
"val = pd.DataFrame({'Date':['2018-04-14', '2018-04-15', '2018-04-16'],'Score':['25', '26', '24'],'Category':['Extreme Fear', 'Fear', 'Extreme Fear']})\n",
"val['Date'] = pd.to_datetime(val['Date'])\n",
"df_fgi = df_fgi.append(val).set_index('Date').sort_index()\n",
"\n",
"# Okay\n",
"df = df_fgi.join(df_gst).join(df_bhp)\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## Export dataframe as CSV file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"# Let's save this code (date format 190913)\n",
"today = date.today().strftime(\"%Y%m%d\")[2:]\n",
"\n",
"df.to_csv(f'{today}_bitcoin_predictions_data.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}