{ "cells": [ { "cell_type": "markdown", "id": "efeb565c", "metadata": {}, "source": [ "## Factors\n", "Factor (因子)是一個將資產在某個時間點的特徵轉換為數字的函數\n", "```\n", "F(asset, timestamp) -> float\n", "```\n", "在 Pipeline 中,Factors 是最常被使用的項目。Factors 的輸出值會是數值,輸入值會是一個以上的資料欄位與時間窗口長度。\n", "\n", "在 Pipeline 中最簡單的 Factors 為 __built-in factors__。__Built-in factors__ 是為了執行常用計算預先建立的。作為第一個例子,讓我們創建一個 factor 來計算過去十天的平均收盤價。\n", "\n", "我們可以使用 __Built-in factor__ 中的`SimpleMovingAverage`(指定窗口長度為10天)來計算平均價格。為了實現這個功能,我們需要匯入 __Built-in factor__ `SimpleMovingAverage` 以及 `TWEquityPricing` 資料集。" ] }, { "cell_type": "code", "execution_count": 1, "id": "af844c98", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[2023-10-25 08:34:14.844867] INFO: zipline.data.bundles.core: Ingesting tquant.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Merging daily equity files:\n" ] } ], "source": [ "import os\n", "import pandas as pd\n", "\n", "os.environ['TEJAPI_BASE'] = 'https://api.tej.com.tw'\n", "os.environ['TEJAPI_KEY'] = 'YOUR KEY'\n", "\n", "os.environ['ticker'] = '2330 2337'\n", "os.environ['mdate'] = '20070101 20230701'\n", "\n", "!zipline ingest -b tquant\n", "\n", "from zipline.pipeline import Pipeline\n", "from zipline.TQresearch.tej_pipeline import run_pipeline\n", "from zipline.pipeline.data import TWEquityPricing\n", "from zipline.pipeline.factors import SimpleMovingAverage" ] }, { "cell_type": "markdown", "id": "09baf957", "metadata": {}, "source": [ "### Creating a Factor\n", "讓我們回到上一堂課([lecture/Creating a Pipeline](https://github.com/tejtw/TQuant-Lab/blob/main/lecture/Creating%20a%20Pipeline.ipynb))的 `make_pipeline` function。 \n", "\n", "\n", "首先我們實例化一個 `SimpleMovingAverage` factor。為了創建一個 `SimpleMovingAverage` factor,我們可以對 `SimpleMovingAverage` 建構子輸入兩個參數:__inputs__(必須是內含一個 `BoundColumn` 物件的 `list` ),以及 __window_length__(為整數,代表要用過去多少天的資料計算平均價格)。 \n", "\n", "\n", "\n", "我們會在之後更深度地討論 `BoundColumn`,目前我們只需要知道 `BoundColumn` 代表要將何種資料傳入 `Factor` 物件就好。 \n", "\n", " \n", "下方程式碼將創建一個計算過去10天平均收盤價的 `Factor`。" ] }, { "cell_type": "code", "execution_count": 2, "id": "7dc73cfb", "metadata": {}, "outputs": [], "source": [ "mean_close_10 = SimpleMovingAverage(inputs=[TWEquityPricing.close], window_length=10)" ] }, { "cell_type": "markdown", "id": "a6ad9ae9", "metadata": {}, "source": [ "請注意,很重要的一點是,建立 factor 並不完全地等於執行計算。創建一個 factor 就像是定義一個 function。要執行一個計算,我們需要將 factor 加入 pipeline 並執行它。" ] }, { "cell_type": "markdown", "id": "06e301ce", "metadata": {}, "source": [ "### Adding a Factor to a Pipeline\n", "讓我們更新原本空的 pipeline ,並讓它計算我們新的移動平均因子。\n", "\n", "首先,先在 `make_pipeline` 中實體化剛剛建立因子,讓 pipeline 知道要如何計算它。接著,我們可以藉由設定 `Pipeline` 的 `columns` 參數(`columns`為一個將欄位名稱映射到 factors、 filters 或 classifiers 的字典),讓 pipeline 知道要輸出哪些資料。。 \n", "\n", "更新後的 `make_pipeline` 函數看起來應該像這樣:" ] }, { "cell_type": "code", "execution_count": 3, "id": "7b944bcc", "metadata": {}, "outputs": [], "source": [ "def make_pipeline():\n", " \n", " mean_close_10 = SimpleMovingAverage(inputs=[TWEquityPricing.close], window_length=10)\n", " \n", " return Pipeline(\n", " columns={\n", " '10_day_mean_close': mean_close_10\n", " }\n", " )" ] }, { "cell_type": "markdown", "id": "f48fd7de", "metadata": {}, "source": [ "再來,建立pipeline、執行,接著展示結果。\n", "\n", "現在,在我們的 pipeline 中有一個平均收盤價的欄位(`10_day_mean_close`),並且總共有兩檔證券。請注意每一列對應著特定證券在特定日期的計算結果。\n", "\n", "pipeline 的產出結果為帶有 MultiIndex 的 DataFrame,它的第一層 __index__ 為執行計算時的日期,而第二層 __index__ 為對應證券的 `Equity` 物件。舉例來說,第一列(`2023-05-05 00:00:00+00:00`, `Equity(0 [2330])`)會是在 2023/5/5 時,證券代碼為 2330 的平均收盤價計算結果。" ] }, { "cell_type": "code", "execution_count": 4, "id": "ab3ed3a2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
10_day_mean_close
2023-05-05 00:00:00+00:00Equity(0 [2330])501.100
Equity(1 [2337])32.935
\n", "
" ], "text/plain": [ " 10_day_mean_close\n", "2023-05-05 00:00:00+00:00 Equity(0 [2330]) 501.100\n", " Equity(1 [2337]) 32.935" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = run_pipeline(make_pipeline(), '2023-05-05', '2023-05-05')\n", "result" ] }, { "cell_type": "markdown", "id": "8884517a", "metadata": {}, "source": [ "如果 pipeline 計算天數多於一天(2018-01-03~2022-12-30),輸出會看起來像這樣:" ] }, { "cell_type": "code", "execution_count": 5, "id": "f3b0fe63", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
10_day_mean_close
2018-01-03 00:00:00+00:00Equity(0 [2330])226.950
Equity(1 [2337])42.500
2018-01-04 00:00:00+00:00Equity(0 [2330])228.150
Equity(1 [2337])42.565
2018-01-05 00:00:00+00:00Equity(0 [2330])229.650
.........
2022-12-28 00:00:00+00:00Equity(1 [2337])34.850
2022-12-29 00:00:00+00:00Equity(0 [2330])462.200
Equity(1 [2337])34.680
2022-12-30 00:00:00+00:00Equity(0 [2330])458.750
Equity(1 [2337])34.530
\n", "

2446 rows × 1 columns

\n", "
" ], "text/plain": [ " 10_day_mean_close\n", "2018-01-03 00:00:00+00:00 Equity(0 [2330]) 226.950\n", " Equity(1 [2337]) 42.500\n", "2018-01-04 00:00:00+00:00 Equity(0 [2330]) 228.150\n", " Equity(1 [2337]) 42.565\n", "2018-01-05 00:00:00+00:00 Equity(0 [2330]) 229.650\n", "... ...\n", "2022-12-28 00:00:00+00:00 Equity(1 [2337]) 34.850\n", "2022-12-29 00:00:00+00:00 Equity(0 [2330]) 462.200\n", " Equity(1 [2337]) 34.680\n", "2022-12-30 00:00:00+00:00 Equity(0 [2330]) 458.750\n", " Equity(1 [2337]) 34.530\n", "\n", "[2446 rows x 1 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = run_pipeline(make_pipeline(), '2018-01-03', '2022-12-30')\n", "result" ] }, { "cell_type": "markdown", "id": "7c85ebd1", "metadata": {}, "source": [ "注意:透過 `Pipeline.add` 方法,也可以把因子加入至已經存在的 `Pipeline` 實體中。使用 `add` 看起來會像這樣:\n", "\n", "```\n", "my_pipe = Pipeline()\n", "f1 = SomeFactor(...)\n", "my_pipe.add(f1, 'f1')\n", "```" ] }, { "cell_type": "markdown", "id": "562e43f5", "metadata": {}, "source": [ "### Latest\n", "`Latest` 是最常使用的 `Factors`。`Latest` 可以取得指定欄位中最近一期的資料。這個因子因為太常被使用以至於它實例化的方式與比較特別。\n", "\n", "若要取得特定欄位最近一期的資料,最好方法是呼叫它的 `.latest` 屬性。作為示範,我們更新 `Pipeline` 以創建一個代表最近一期 __收盤價__ 的因子 ,然後把它加入 pipeline:" ] }, { "cell_type": "code", "execution_count": 6, "id": "21601406", "metadata": {}, "outputs": [], "source": [ "def make_pipeline():\n", "\n", " mean_close_10 = SimpleMovingAverage(inputs=[TWEquityPricing.close], window_length=10)\n", " latest_close = TWEquityPricing.close.latest\n", "\n", " return Pipeline(\n", " columns={\n", " '10_day_mean_close': mean_close_10,\n", " 'latest_close_price': latest_close\n", " }\n", " )" ] }, { "cell_type": "markdown", "id": "b8c32c92", "metadata": {}, "source": [ "現在,當我們再次執行 pipeline 時,產出的 DataFrame 會包含兩個欄位。一個欄位(`10_day_mean_close`)記錄每隻證券的10日平均收盤價,另一個(`latest_close_price`)記錄最近收盤價。" ] }, { "cell_type": "code", "execution_count": 7, "id": "889e3139", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
10_day_mean_closelatest_close_price
2023-05-05 00:00:00+00:00Equity(0 [2330])501.100498.0
Equity(1 [2337])32.93532.6
\n", "
" ], "text/plain": [ " 10_day_mean_close \\\n", "2023-05-05 00:00:00+00:00 Equity(0 [2330]) 501.100 \n", " Equity(1 [2337]) 32.935 \n", "\n", " latest_close_price \n", "2023-05-05 00:00:00+00:00 Equity(0 [2330]) 498.0 \n", " Equity(1 [2337]) 32.6 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = run_pipeline(make_pipeline(), '2023-05-05', '2023-05-05')\n", "result.head(5)" ] }, { "cell_type": "markdown", "id": "e62a5b4d", "metadata": {}, "source": [ "`.latest` 有時可以回傳 `Factors` 以外的東西。我們會在之後的課程示範其他可能回傳的資料類型。" ] }, { "cell_type": "markdown", "id": "7bd70357", "metadata": {}, "source": [ "## Default Inputs\n", "有些 Factors 有著不可被更改的預設輸入值。例如 `VWAP` __built-in factor__ 永遠都是從`TWEquityPricing.close` 和 `TWEquityPricing.volume` 計算而來的。當一個 Factors 永遠由同樣一個 `BoundColumns` 計算而來時,我們會說這個建構子沒有指定的 `inputs`。" ] }, { "cell_type": "code", "execution_count": 8, "id": "dc36a97f", "metadata": {}, "outputs": [], "source": [ "from zipline.pipeline.factors import VWAP\n", "vwap = VWAP(window_length=10)" ] }, { "cell_type": "markdown", "id": "e7e24142", "metadata": {}, "source": [ "其餘 __built-in factor__ 請參考:[Pipeline built-in factors.ipynb](https://github.com/tejtw/TQuant-Lab/blob/main/lecture/Pipeline%20built-in%20factors.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:zipline-tej] *", "language": "python", "name": "conda-env-zipline-tej-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }