{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from datetime import date\n", "import dateutil.parser\n", "import time\n", "from pandas import DataFrame\n", "import datetime\n", "\n", "import os\n", "\n", "from scipy.special import gamma\n", "\n", "import matplotlib.pyplot as plt\n", "import Visualization" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluating the jump statistic\n", "---\n", "\n", "In this notebook we evaluate the jump statistic defined in Huang & Tauchen [HT] (2005) as:\n", "\n", "$$z_{TP, lm, t} = \\frac{\\log(RV_t) - \\log(BV_t)}{\\sqrt{(v_{bb}-v_{qq})\\frac{1}{M}\\max\\left\\{1, \\frac{TP_t}{BV_t^2}\\right\\}}}$$\n", "\n", "# Parameters of the statistic\n", "---\n", "\n", "Following the HT paper, we assume that the log-price of an asset evolves over time as:\n", "\n", "$$dp(t) = \\mu(t)dt + \\sigma(t)dw(t) + dL_J(t)$$\n", "\n", "so the intraday geometric return is defined as:\n", "\n", "$$r_{t, j} = p_{t-1+\\frac{j}{M}} - p_{t-1+\\frac{j-1}{M}}$$\n", "\n", "With these returns we can compute the quantities needed to evaluate the statistic: the realized variance ($RV_t$), the *bi-power variation* ($BV_t$), and the *tri-power quarticity* ($TP_t$).\n", "\n", "## Realized variance\n", "---\n", "\n", "The realized variance is defined as:\n", "\n", "$$RV_t = \\sum_{j=1}^M r_{t, j}^2$$\n", "\n", "## Bi-power variation\n", "---\n", "\n", "The *bi-power* variation is defined as:\n", "\n", "$$BV_t = \\mu_1^{-2}\\left(\\frac{M}{M-1}\\right)\\sum_{j=2}^M|r_{t, j}||r_{t, j-1}|$$\n", "\n", "or, since $\\mu_1^{-2} = \\pi/2$,\n", "\n", "$$BV_t = \\frac{\\pi}{2}\\left(\\frac{M}{M-1}\\right)\\sum_{j=2}^M|r_{t, j}||r_{t, j-1}|$$\n", "\n", "## Tri-power quarticity\n", "---\n", "\n", "The *tri-power* quarticity is defined as:\n", "\n", "$$TP_t = M \\ \\mu_{4/3} \\left(\\frac{M}{M-2}\\right)\\sum_{j=3}^M|r_{t, j-2}|^{4/3}\\ |r_{t, j-1}|^{4/3}\\ |r_{t, j}|^{4/3}$$\n", "\n", "where $$\\mu_k \\equiv \\frac{2^{k/2}}{\\Gamma(1/2)}\\Gamma\\left[ \\frac{(k+1)}{2} \\right]$$" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Computing the parameters\n", "---\n", "\n", "Since the statistic takes one value per day, before computing the parameters we must first split our data by day." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Function that returns a list whose entries are one day's data for a stock\n", "\n", "def sep_date(stockdata):\n", " \n", " # Build an array with the dates\n", " days = stockdata[\"dia\"].drop_duplicates(keep=\"first\").values\n", " \n", " # List that will hold the daily dataframes\n", " daily_dfs = []\n", " \n", " # Fill the list\n", " for i in days:\n", " daily_dfs.append(stockdata.loc[stockdata[\"dia\"] == i])\n", " \n", " # Return the list\n", " return daily_dfs" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Computing the returns\n", "---\n", "The first step is to compute the returns following the statistic's assumptions:\n", "\n", "$$r_{t, j} = p_{t-1+\\frac{j}{M}} - p_{t-1+\\frac{j-1}{M}}$$" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Function that resamples prices at the time interval of interest\n", "\n", "def set_freq(stockdata, freq=\"15T\"):\n", " \n", " # Use date_time as the index\n", " stockdata = stockdata.set_index(\"date_time\")\n", " \n", " # Resample prices at the given frequency, keeping trading hours only\n", " price_resamp = stockdata.precio.resample(freq).mean()\n", " price_resamp = price_resamp.between_time(\"9:30\", \"15:45\")\n", " price_resamp = pd.DataFrame(price_resamp)\n", " \n", " return price_resamp" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Function that returns an array of returns at a given frequency\n", "\n", "def get_returns(stockdailydata, lag=1):\n", " \n", " # Resample prices at the chosen frequency\n", " price_15min = set_freq(stockdailydata)\n", " \n", " # Take the logarithm of the prices\n", " log_p = np.log(price_15min[\"precio\"].copy())\n", " \n", " # Return the log-price differences (log returns); the first entry is NaN\n", " return log_p.diff(periods=lag).values" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Computation: realized variance\n", "\n", "The realized variance is: $$RV_t = \\sum_{j=1}^M r_{t, j}^2$$\n", "where $$r_{t+j\\delta, \\delta} = p_{t+j\\delta}-p_{t+(j-1)\\delta}$$" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Realized variance\n", "\n", "def RV(stockdailydata, lag=1):\n", " \n", " # Get the returns from the daily data\n", " ret = get_returns(stockdailydata, lag=lag)\n", " \n", " # Return the sum of squared returns, ignoring the leading NaN\n", " return np.nansum(np.square(ret))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Computation: bi-power variation\n", "\n", "The *bi-power* variation is:\n", "\n", "$$BV_t = \\frac{\\pi}{2}\\left(\\frac{M}{M-1}\\right)\\sum_{j=2}^M|r_{t, j}||r_{t, j-1}|$$" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Bi-power variation\n", "\n", "def BV(stockdailydata):\n", " \n", " # Compute the returns\n", " ret = get_returns(stockdailydata)\n", " \n", " # Define the constants\n", " M = len(ret)\n", " coef = (np.pi/2.)*float(M/(M-1.))\n", " \n", " # List that stores the terms of the summation\n", " sums = []\n", " \n", " # Compute the terms\n", " for i in range(2, M):\n", " sums.append(np.abs(ret[i] * ret[i-1]))\n", " \n", " # Return the scaled sum of the terms, ignoring NaNs\n", " return coef*np.nansum(np.array(sums))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Computation: tri-power quarticity\n", "\n", "Finally, the tri-power quarticity is:\n", "\n", "$$TP_t = M \\ \\mu_{4/3} \\left(\\frac{M}{M-2}\\right)\\sum_{j=3}^M|r_{t, j-2}|^{4/3}\\ |r_{t, j-1}|^{4/3}\\ |r_{t, j}|^{4/3}$$\n", "\n", "where $$\\mu_k \\equiv \\frac{2^{k/2}}{\\Gamma(1/2)}\\Gamma\\left[ \\frac{(k+1)}{2} \\right]$$" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Tri-power quarticity\n", "\n", "def TP(stockdailydata):\n", " \n", " # Compute the absolute returns\n", " stockdailyret = np.abs(get_returns(stockdailydata))\n", " \n", " # Define the constants\n", " M = len(stockdailyret)\n", " mu43 = np.float_power(2., 2./3.)*gamma(7./6.)/gamma(0.5)\n", " coef = mu43*M*(M/(M-2))\n", " p = 4./3.\n", " \n", " # List that stores the terms of the summation\n", " sums = []\n", " \n", " # Compute the terms of the summation\n", " for i in range(3, M):\n", " temp1 = stockdailyret[i]*stockdailyret[i-1]*stockdailyret[i-2]\n", " temp2 = np.float_power(temp1, p)\n", " \n", " # Store the term in the list\n", " sums.append(temp2)\n", " \n", " # Return the scaled sum, ignoring NaNs\n", " return coef*np.nansum(np.array(sums))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Computing the jump statistic\n", "---\n", "Having computed its ingredients, we can now evaluate the statistic:\n", "\n", "$$z_{TP, lm, t} = \\frac{\\log(RV_t) - \\log(BV_t)}{\\sqrt{(v_{bb}-v_{qq})\\frac{1}{M}\\max\\left\\{1, \\frac{TP_t}{BV_t^2}\\right\\}}}$$\n", "\n", "with $$v_{qq}=2$$ $$v_{bb}=\\left(\\frac{\\pi}{2}\\right)^2+\\pi-3$$" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Jump statistic\n", "\n", "def JS(stockdata):\n", " \n", " # Save the dates to index the results at the end\n", " days = stockdata[\"dia\"].drop_duplicates(keep=\"first\").values\n", " \n", " # List that stores the results\n", " js_result = []\n", " \n", " # Define the constants\n", " vbb = (np.pi/2)**2 + np.pi - 3.\n", " vqq = 2.\n", " \n", " # Split the data by day\n", " stockdailydata = sep_date(stockdata)\n", " \n", " # Compute the daily values\n", " for df in stockdailydata:\n", " \n", " # Define the DAILY constants\n", " M = len(get_returns(df))\n", " \n", " # Compute the parameters\n", " rv = RV(df)\n", " bv = BV(df)\n", " tp = TP(df)\n", " \n", " # Numerator of the statistic\n", " num = np.log(rv) - np.log(bv)\n", " \n", " # Denominator of the statistic\n", " temp = tp/(bv*bv)\n", " den = np.sqrt( (vbb-vqq) * 1./M * max(1., temp) )\n", " \n", " # Store the result as a ratio to the 1.96 critical value\n", " js_result.append((num/den)/1.96)\n", " \n", " # Flag the days that exceed the critical value\n", " crit_val = (np.array(js_result) >= 1.).astype(int)\n", " \n", " # Return the result\n", " return DataFrame({\"JS_statistic\": js_result, \"OverCritValue\": crit_val}, index=days)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Application - ECOPETL\n", "---\n", "\n", "We can now apply the procedure to the ECOPETL stock data. First, we import the data. Then, we apply the functions presented above. Finally, we plot all the parameters computed here."
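As a sanity check on the construction above, the z-statistic can be evaluated on synthetic returns: under pure diffusion it should be of order one, while a single large jump inflates $RV_t$ much more than $BV_t$ and pushes z past the 1.96 critical value. The sketch below is a minimal standalone version of RV, BV, TP, and z (the 26 returns, the 0.001 volatility, and the 0.02 jump size are illustrative assumptions, not values taken from the data):

```python
import numpy as np
from scipy.special import gamma

def z_stat(r):
    # RV, BV, TP and the HT z-statistic for one day of M intraday returns r
    M = len(r)
    rv = np.sum(r ** 2)
    bv = (np.pi / 2.0) * (M / (M - 1.0)) * np.sum(np.abs(r[1:]) * np.abs(r[:-1]))
    mu43 = 2.0 ** (2.0 / 3.0) * gamma(7.0 / 6.0) / gamma(0.5)
    tp = mu43 * M * (M / (M - 2.0)) * np.sum(
        (np.abs(r[2:]) * np.abs(r[1:-1]) * np.abs(r[:-2])) ** (4.0 / 3.0))
    vbb, vqq = (np.pi / 2.0) ** 2 + np.pi - 3.0, 2.0
    return (np.log(rv) - np.log(bv)) / np.sqrt(
        (vbb - vqq) / M * max(1.0, tp / bv ** 2))

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.001, 26)   # 26 purely diffusive 15-minute returns
r_jump = r.copy()
r_jump[10] += 0.02               # inject one large jump

z_diff, z_jump = z_stat(r), z_stat(r_jump)
```

The jump day should produce z_jump well above 1.96, while z_diff stays near zero, mirroring the OverCritValue flag computed by JS.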
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Set the data directory path\n", "pathTesis = os.getcwd()\n", "filepath = pathTesis + \"/depth_data.csv\"" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 565 ms, sys: 27.4 ms, total: 592 ms\n", "Wall time: 591 ms\n" ] } ], "source": [ "# Import the data\n", "data = %time pd.read_csv(filepath, parse_dates=[\"date_time\"])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2.39 s, sys: 10.4 ms, total: 2.4 s\n", "Wall time: 2.4 s\n" ] } ], "source": [ "# Compute the statistic\n", "js_test = %time JS(data)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check whether there are any NaNs in the results\n", "js_test.isnull().values.sum()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | JS_statistic | \n", "OverCritValue | \n", "
---|---|---|
2017-03-03 | \n", "0.334168 | \n", "0 | \n", "
2017-03-06 | \n", "-0.382157 | \n", "0 | \n", "
2017-03-07 | \n", "-0.493153 | \n", "0 | \n", "
2017-03-08 | \n", "-0.494512 | \n", "0 | \n", "
2017-03-09 | \n", "-0.281593 | \n", "0 | \n", "
2017-03-10 | \n", "-0.404463 | \n", "0 | \n", "
2017-03-13 | \n", "1.105760 | \n", "1 | \n", "
2017-03-14 | \n", "1.911395 | \n", "1 | \n", "
2017-03-15 | \n", "-0.008483 | \n", "0 | \n", "
2017-03-16 | \n", "-0.254607 | \n", "0 | \n", "