{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.8.5 64-bit ('bigquery': conda)", "display_name": "Python 3.8.5 64-bit ('bigquery': conda)", "metadata": { "interpreter": { "hash": "8e6f8fd53d913fe50345f9e659ed342277121f637d2311273da0eef260503de3" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# Correlation and covariance from scratch" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "In this post we examine covariance and a correlation a bit closer.\n", "\n", "We will use them to examine the relationship between Ethereum transaction value and gas price.\n", "\n", "Again, most of the time, we break down the steps into standard Python data types and operations (i.e. we use numpy mostly for verification of our results)." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Libraries and data load" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "We pull the data from Google's public datasets with BigQuery, use pandas and numpy to manipulate it, and altair to plot their relationship." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"]=os.path.expanduser(\"~/.credentials/Notebook bigquery-c422e406404b.json\")\n", "\n", "from google.cloud import bigquery\n", "client = bigquery.Client()\n", "\n", "import altair as alt\n", "alt.data_transformers.disable_max_rows()\n", "\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "query =\"\"\"\n", "SELECT\n", " EXTRACT(DATE FROM block_timestamp) AS date,\n", " AVG(value) AS value,\n", " AVG(gas_price) AS gas_price, \n", "FROM `bigquery-public-data.ethereum_blockchain.transactions`\n", "WHERE\n", " EXTRACT(YEAR FROM block_timestamp) = 2019\n", "GROUP BY date\n", "ORDER BY date\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " date value gas_price\n0 2019-01-01 3.719103e+18 1.431514e+10\n1 2019-01-02 4.649915e+18 1.349952e+10\n2 2019-01-03 4.188781e+18 1.269504e+10\n3 2019-01-04 6.958368e+18 1.418197e+10\n4 2019-01-05 8.167590e+18 2.410475e+10", "text/html": "
\n | date | \nvalue | \ngas_price | \n
---|---|---|---|
0 | \n2019-01-01 | \n3.719103e+18 | \n1.431514e+10 | \n
1 | \n2019-01-02 | \n4.649915e+18 | \n1.349952e+10 | \n
2 | \n2019-01-03 | \n4.188781e+18 | \n1.269504e+10 | \n
3 | \n2019-01-04 | \n6.958368e+18 | \n1.418197e+10 | \n
4 | \n2019-01-05 | \n8.167590e+18 | \n2.410475e+10 | \n