{ "cells": [ { "cell_type": "markdown", "id": "59bd68c1", "metadata": {}, "source": [ "# Custom Factors\n", "\n", "The previous chapter ([lecture/Factors.ipynb](https://github.com/tejtw/TQuant-Lab/blob/main/lecture/Factors.ipynb)) introduced what factors are and how to use them, and TQuant Lab ships with many built-in factors. Factor research keeps expanding, however, and new price-volume factors appear all the time; you may also have factors specific to your own strategy. This chapter therefore shows how to build a custom factor and use it in TQuant Lab.\n", "\n", "Conceptually, custom factors work just like built-in factors. Both take _inputs_, _window_length_, and _mask_ as arguments, and both produce a _factor_ object.\n", "\n", "To compute a rolling standard deviation ([standard deviation](https://zh.wikipedia.org/zh-tw/%E6%A8%99%E6%BA%96%E5%B7%AE)) for each stock on each day, we can subclass `zipline.pipeline.CustomFactor` and implement its `compute` method.\n", "\n", "### _class_ zipline.pipeline.CustomFactor \n", "\n", "#### Parameters:\n", "* inputs: _iterable_, optional \n", " \n", "    Input data.\n", " \n", "* outputs: _iterable[str]_, optional\n", " \n", "    Names of the factor's outputs.\n", "\n", "* window_length: _int_, optional\n", " \n", "    Length of the lookback window for the input data.\n", " \n", "* mask: _zipline.pipeline.Filter_, optional\n", " \n", "    Determines which assets the factor is computed for.\n", "\n", "#### def compute(self, today, assets, out, *inputs)\n", "\n", "- today: a pandas.Timestamp, the date for which the Pipeline is computing the factor.\n", "- assets: a numpy array of length N holding the sids (assets).\n", "- *inputs: M x N numpy arrays, where M is window_length and N is the number of assets; more than one input can be supplied.\n", "- out: a numpy array of length N. The day's CustomFactor results are written into out." ] }, { "cell_type": "markdown", "id": "a5d69b40", "metadata": {}, "source": [ "## Load price-volume data and required modules" ] }, { "cell_type": "code", "execution_count": 1, "id": "9b1e34cb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Merging daily equity files:\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[2023-10-25 08:14:55.609378] INFO: zipline.data.bundles.core: Ingesting tquant.\n" ] } ], "source": [ "import os\n", "import pandas as pd\n", "import numpy as np\n", "import tejapi\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "os.environ['TEJAPI_BASE'] = 'https://api.tej.com.tw'\n", "os.environ['TEJAPI_KEY'] = 'YOUR KEY'\n", "\n", "# date range (start end) and tickers for the bundle\n", "os.environ['mdate'] = '20080401 20230702'\n", "os.environ['ticker'] = '2330 2409'\n", "\n", "from zipline.pipeline import Pipeline, CustomFactor\n", "from zipline.TQresearch.tej_pipeline import run_pipeline\n", "from zipline.pipeline.data import TWEquityPricing\n", "from zipline.pipeline.filters import StaticAssets,StaticSids\n", "from zipline.api import sid, symbol\n", "\n", "# ingest stock data\n", "!zipline ingest -b tquant" ] }, { "cell_type": "markdown", "id": "49e24285", "metadata": {}, "source": [ "## Build a factor that computes standard deviation\n", "\n", "In this example we use `np.nanstd` to compute the standard deviation of the input values. The input data and the lookback window are set by the __inputs__ and __window_length__ passed to `StdDev` in `make_pipeline()`. To compute the 7-day standard deviation of closing prices for TSMC (2330) and AUO (2409), set:\n", "\n", "1. inputs = [TWEquityPricing.close]; TWEquityPricing holds the price-volume data for every stock in the bundle.\n", "2. window_length = 7\n", "\n", "Next, `run_pipeline` runs the `Pipeline`, computing the factor day by day over the backtest period and returning a DataFrame. The DataFrame has a MultiIndex of date and asset, with one 7-day closing-price standard deviation per asset per day.\n", "\n", "### zipline.TQresearch.tej_pipeline.run_pipeline\n", "\n", "Runs a Pipeline and returns the resulting table.\n", "\n", "#### Parameters:\n", "* pipeline: _zipline.pipeline.Pipeline_\n", "    The pipeline to run.\n", "* start_date: _pd.Timestamp_\n", "    First date of the pipeline run. It must fall within the bundle's date range.\n", "* end_date: _pd.Timestamp_\n", "    Last date of the pipeline run. It must fall within the bundle's date range.\n", " \n", "#### Returns\n", "    _pd.DataFrame_, the result of running the Pipeline." ] }, { "cell_type": "code", "execution_count": 2, "id": "6bf90d88", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " std_dev\n", "2013-01-03 00:00:00+00:00 Equity(0 [2330]) 1.375737\n", " Equity(1 [2409]) 0.350946\n", "2013-01-04 00:00:00+00:00 Equity(0 [2330]) 2.024644\n", " Equity(1 [2409]) 0.410947\n", "2013-01-07 00:00:00+00:00 Equity(0 [2330]) 2.287053\n", "... ...\n", "2022-12-29 00:00:00+00:00 Equity(1 [2409]) 0.314772\n", "2022-12-30 00:00:00+00:00 Equity(0 [2330]) 6.326975\n", " Equity(1 [2409]) 0.277562\n", "2023-01-03 00:00:00+00:00 Equity(0 [2330]) 6.689163\n", " Equity(1 [2409]) 0.184888\n", "\n", "[4904 rows x 1 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class StdDev(CustomFactor):\n", "    def compute(self, today, assets, out, values):\n", "        # values is a (window_length x N) array; reduce over the window axis\n", "        out[:] = np.nanstd(values, axis=0)\n", "\n", "def make_pipeline():\n", "    std_dev = StdDev(inputs=[TWEquityPricing.close], window_length=7)\n", "    return Pipeline(\n", "        columns={\n", "            'std_dev': std_dev\n", "        }\n", "    )\n", "result = run_pipeline(make_pipeline(), pd.Timestamp('2013-01-03', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))\n", "result" ] }, { "cell_type": "markdown", "id": "3793747c", "metadata": {}, "source": [ "## Default input arguments\n", "\n", "A custom factor can also declare default arguments. Here we build a factor that computes the 10-day mean of the close-minus-open price difference; in `TenDayMeanDifference` we declare `inputs = [TWEquityPricing.close, TWEquityPricing.open]` and `window_length = 10` as class-level defaults." ] }, { "cell_type": "code", "execution_count": 3, "id": "b99ec74f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " close_open_diff\n", "2013-01-03 00:00:00+00:00 Equity(0 [2330]) 0.100\n", " Equity(1 [2409]) -0.040\n", "2013-01-04 00:00:00+00:00 Equity(0 [2330]) 0.090\n", " Equity(1 [2409]) -0.065\n", "2013-01-07 00:00:00+00:00 Equity(0 [2330]) 0.200\n", "... ...\n", "2022-12-29 00:00:00+00:00 Equity(1 [2409]) -0.060\n", "2022-12-30 00:00:00+00:00 Equity(0 [2330]) -0.150\n", " Equity(1 [2409]) -0.050\n", "2023-01-03 00:00:00+00:00 Equity(0 [2330]) -1.250\n", " Equity(1 [2409]) -0.055\n", "\n", "[4904 rows x 1 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class TenDayMeanDifference(CustomFactor):\n", "    # class-level defaults, used when no arguments are passed\n", "    inputs = [TWEquityPricing.close, TWEquityPricing.open]\n", "    window_length = 10\n", "    def compute(self, today, assets, out, c_price, o_price):\n", "        out[:] = np.nanmean(c_price - o_price, axis=0)\n", "\n", "def make_pipeline():\n", "    close_open_diff = TenDayMeanDifference()\n", "    return Pipeline(\n", "        columns={\n", "            'close_open_diff': close_open_diff\n", "        }\n", "    )\n", "\n", "result = run_pipeline(make_pipeline(), pd.Timestamp('2013-01-03', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))\n", "result" ] }, { "cell_type": "markdown", "id": "87c04e8c", "metadata": {}, "source": [ "Arguments passed to `TenDayMeanDifference` in `make_pipeline` override the class defaults. Below we pass `inputs=[TWEquityPricing.high, TWEquityPricing.low]`, so the factor becomes the 10-day mean of high minus low (the column keeps the name `close_open_diff`), and the resulting table differs from the one above." ] }, { "cell_type": "code", "execution_count": 4, "id": "6da187cb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " close_open_diff\n", "2013-01-03 00:00:00+00:00 Equity(0 [2330]) 1.540\n", " Equity(1 [2409]) 0.520\n", "2013-01-04 00:00:00+00:00 Equity(0 [2330]) 1.630\n", " Equity(1 [2409]) 0.515\n", "2013-01-07 00:00:00+00:00 Equity(0 [2330]) 1.600\n", "... ...\n", "2022-12-29 00:00:00+00:00 Equity(1 [2409]) 0.375\n", "2022-12-30 00:00:00+00:00 Equity(0 [2330]) 5.850\n", " Equity(1 [2409]) 0.370\n", "2023-01-03 00:00:00+00:00 Equity(0 [2330]) 6.100\n", " Equity(1 [2409]) 0.360\n", "\n", "[4904 rows x 1 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def make_pipeline():\n", "    # inputs passed here override the class-level default\n", "    close_open_diff = TenDayMeanDifference(inputs=[TWEquityPricing.high, TWEquityPricing.low])\n", "    return Pipeline(\n", "        columns={\n", "            'close_open_diff': close_open_diff\n", "        }\n", "    )\n", "\n", "result = run_pipeline(make_pipeline(), pd.Timestamp('2013-01-03', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))\n", "result" ] }, { "cell_type": "markdown", "id": "8e964b36", "metadata": {}, "source": [ "## The window_length lookback window\n", "\n", "`Pipeline` computes the factor's actual value on every trading day.\n", "\n", "Note that a factor's window always ends on the previous trading day. To build a factor for the lowest closing price over the previous 10 days, we define `TenDaysLowest`. The resulting table holds, for each stock and day, the lowest close over the preceding ten trading days. Taking 2013-03-19 as an example, the window starts at 2013-03-18 and extends back ten trading days." ] }, { "cell_type": "code", "execution_count": 5, "id": "d9b4bdad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " TenDaysLowest\n", "2013-03-18 00:00:00+00:00 Equity(0 [2330]) 102.00\n", " Equity(1 [2409]) 12.70\n", "2013-03-19 00:00:00+00:00 Equity(0 [2330]) 100.50\n", " Equity(1 [2409]) 12.65\n", "2013-03-20 00:00:00+00:00 Equity(0 [2330]) 100.00\n", "... ...\n", "2022-12-29 00:00:00+00:00 Equity(1 [2409]) 14.65\n", "2022-12-30 00:00:00+00:00 Equity(0 [2330]) 446.00\n", " Equity(1 [2409]) 14.65\n", "2023-01-03 00:00:00+00:00 Equity(0 [2330]) 446.00\n", " Equity(1 [2409]) 14.65\n", "\n", "[4814 rows x 1 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class TenDaysLowest(CustomFactor):\n", "    inputs = [TWEquityPricing.close]\n", "    window_length = 10\n", "    def compute(self, today, assets, out, close):\n", "        # lowest close in the window, per asset\n", "        out[:] = np.nanmin(close, axis=0)\n", "\n", "def make_pipeline():\n", "    tendl = TenDaysLowest()\n", "    return Pipeline(\n", "        columns={\n", "            'TenDaysLowest': tendl\n", "        }\n", "    )\n", "results = run_pipeline(make_pipeline(), pd.Timestamp('2013-03-18', tz='UTC'), pd.Timestamp('2023-01-03', tz='UTC'))\n", "results" ] }, { "cell_type": "markdown", "id": "5a248da7", "metadata": {}, "source": [ "The table above shows that TSMC's `TenDaysLowest` on 2013-03-19 is 100.5. The table below confirms that the lowest close between 2013-03-05 and 2013-03-18 is 100.5, not the 100 recorded on 2013-03-19. The pipeline therefore starts its window from the previous day, which avoids look-ahead bias." ] }, { "cell_type": "code", "execution_count": 6, "id": "97dc0b2f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " date close\n", "66 2013-03-04 00:00:00+00:00 102.0\n", "68 2013-03-05 00:00:00+00:00 104.0\n", "70 2013-03-06 00:00:00+00:00 104.0\n", "72 2013-03-07 00:00:00+00:00 103.0\n", "74 2013-03-08 00:00:00+00:00 103.5\n", "76 2013-03-11 00:00:00+00:00 102.0\n", "78 2013-03-12 00:00:00+00:00 102.5\n", "80 2013-03-13 00:00:00+00:00 104.5\n", "82 2013-03-14 00:00:00+00:00 104.0\n", "84 2013-03-15 00:00:00+00:00 103.0\n", "86 2013-03-18 00:00:00+00:00 100.5\n", "88 2013-03-19 00:00:00+00:00 100.0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from zipline.data.data_portal import DataPortal, get_bundle\n", "df_bundle = get_bundle(bundle_name='tquant',\n", " calendar_name='TEJ',\n", " start_dt=pd.Timestamp('2013-01-05', tz='UTC'),\n", " end_dt=pd.Timestamp('2023-01-03', tz='UTC'))\n", "\n", "df_bundle.loc[(df_bundle['symbol']=='2330') & (df_bundle['date'].between('2013-03-04','2013-03-19'))][[\"date\", 'close']]" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:zipline-tej] *", "language": "python", "name": "conda-env-zipline-tej-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }