{
"cells": [
{
"cell_type": "markdown",
"id": "efeb565c",
"metadata": {},
"source": [
"## Factors\n",
"Factor (因子)是一個將資產在某個時間點的特徵轉換為數字的函數\n",
"```\n",
"F(asset, timestamp) -> float\n",
"```\n",
"在 Pipeline 中,Factors 是最常被使用的項目。Factors 的輸出值會是數值,輸入值會是一個以上的資料欄位與時間窗口長度。\n",
"\n",
"在 Pipeline 中最簡單的 Factors 為 __built-in factors__。__Built-in factors__ 是為了執行常用計算預先建立的。作為第一個例子,讓我們創建一個 factor 來計算過去十天的平均收盤價。\n",
"\n",
"我們可以使用 __Built-in factor__ 中的`SimpleMovingAverage`(指定窗口長度為10天)來計算平均價格。為了實現這個功能,我們需要匯入 __Built-in factor__ `SimpleMovingAverage` 以及 `TWEquityPricing` 資料集。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "af844c98",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[2023-10-25 08:34:14.844867] INFO: zipline.data.bundles.core: Ingesting tquant.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merging daily equity files:\n"
]
}
],
"source": [
"import os\n",
"import pandas as pd\n",
"\n",
"os.environ['TEJAPI_BASE'] = 'https://api.tej.com.tw'\n",
"os.environ['TEJAPI_KEY'] = 'YOUR KEY'\n",
"\n",
"os.environ['ticker'] = '2330 2337'\n",
"os.environ['mdate'] = '20070101 20230701'\n",
"\n",
"!zipline ingest -b tquant\n",
"\n",
"from zipline.pipeline import Pipeline\n",
"from zipline.TQresearch.tej_pipeline import run_pipeline\n",
"from zipline.pipeline.data import TWEquityPricing\n",
"from zipline.pipeline.factors import SimpleMovingAverage"
]
},
{
"cell_type": "markdown",
"id": "09baf957",
"metadata": {},
"source": [
"### Creating a Factor\n",
"讓我們回到上一堂課([lecture/Creating a Pipeline](https://github.com/tejtw/TQuant-Lab/blob/main/lecture/Creating%20a%20Pipeline.ipynb))的 `make_pipeline` function。 \n",
"\n",
"\n",
"首先我們實例化一個 `SimpleMovingAverage` factor。為了創建一個 `SimpleMovingAverage` factor,我們可以對 `SimpleMovingAverage` 建構子輸入兩個參數:__inputs__(必須是內含一個 `BoundColumn` 物件的 `list` ),以及 __window_length__(為整數,代表要用過去多少天的資料計算平均價格)。 \n",
"\n",
"\n",
"\n",
"我們會在之後更深度地討論 `BoundColumn`,目前我們只需要知道 `BoundColumn` 代表要將何種資料傳入 `Factor` 物件就好。 \n",
"\n",
" \n",
"下方程式碼將創建一個計算過去10天平均收盤價的 `Factor`。"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7dc73cfb",
"metadata": {},
"outputs": [],
"source": [
"mean_close_10 = SimpleMovingAverage(inputs=[TWEquityPricing.close], window_length=10)"
]
},
{
"cell_type": "markdown",
"id": "a6ad9ae9",
"metadata": {},
"source": [
"請注意,很重要的一點是,建立 factor 並不完全地等於執行計算。創建一個 factor 就像是定義一個 function。要執行一個計算,我們需要將 factor 加入 pipeline 並執行它。"
]
},
{
"cell_type": "markdown",
"id": "06e301ce",
"metadata": {},
"source": [
"### Adding a Factor to a Pipeline\n",
"讓我們更新原本空的 pipeline ,並讓它計算我們新的移動平均因子。\n",
"\n",
"首先,先在 `make_pipeline` 中實體化剛剛建立因子,讓 pipeline 知道要如何計算它。接著,我們可以藉由設定 `Pipeline` 的 `columns` 參數(`columns`為一個將欄位名稱映射到 factors、 filters 或 classifiers 的字典),讓 pipeline 知道要輸出哪些資料。。 \n",
"\n",
"更新後的 `make_pipeline` 函數看起來應該像這樣:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7b944bcc",
"metadata": {},
"outputs": [],
"source": [
"def make_pipeline():\n",
" \n",
" mean_close_10 = SimpleMovingAverage(inputs=[TWEquityPricing.close], window_length=10)\n",
" \n",
" return Pipeline(\n",
" columns={\n",
" '10_day_mean_close': mean_close_10\n",
" }\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "f48fd7de",
"metadata": {},
"source": [
"再來,建立pipeline、執行,接著展示結果。\n",
"\n",
"現在,在我們的 pipeline 中有一個平均收盤價的欄位(`10_day_mean_close`),並且總共有兩檔證券。請注意每一列對應著特定證券在特定日期的計算結果。\n",
"\n",
"pipeline 的產出結果為帶有 MultiIndex 的 DataFrame,它的第一層 __index__ 為執行計算時的日期,而第二層 __index__ 為對應證券的 `Equity` 物件。舉例來說,第一列(`2023-05-05 00:00:00+00:00`, `Equity(0 [2330])`)會是在 2023/5/5 時,證券代碼為 2330 的平均收盤價計算結果。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ab3ed3a2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" 10_day_mean_close | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-05-05 00:00:00+00:00 | \n",
" Equity(0 [2330]) | \n",
" 501.100 | \n",
"
\n",
" \n",
" Equity(1 [2337]) | \n",
" 32.935 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 10_day_mean_close\n",
"2023-05-05 00:00:00+00:00 Equity(0 [2330]) 501.100\n",
" Equity(1 [2337]) 32.935"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = run_pipeline(make_pipeline(), '2023-05-05', '2023-05-05')\n",
"result"
]
},
{
"cell_type": "markdown",
"id": "8884517a",
"metadata": {},
"source": [
"如果 pipeline 計算天數多於一天(2018-01-03~2022-12-30),輸出會看起來像這樣:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f3b0fe63",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" 10_day_mean_close | \n",
"
\n",
" \n",
" \n",
" \n",
" 2018-01-03 00:00:00+00:00 | \n",
" Equity(0 [2330]) | \n",
" 226.950 | \n",
"
\n",
" \n",
" Equity(1 [2337]) | \n",
" 42.500 | \n",
"
\n",
" \n",
" 2018-01-04 00:00:00+00:00 | \n",
" Equity(0 [2330]) | \n",
" 228.150 | \n",
"
\n",
" \n",
" Equity(1 [2337]) | \n",
" 42.565 | \n",
"
\n",
" \n",
" 2018-01-05 00:00:00+00:00 | \n",
" Equity(0 [2330]) | \n",
" 229.650 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 2022-12-28 00:00:00+00:00 | \n",
" Equity(1 [2337]) | \n",
" 34.850 | \n",
"
\n",
" \n",
" 2022-12-29 00:00:00+00:00 | \n",
" Equity(0 [2330]) | \n",
" 462.200 | \n",
"
\n",
" \n",
" Equity(1 [2337]) | \n",
" 34.680 | \n",
"
\n",
" \n",
" 2022-12-30 00:00:00+00:00 | \n",
" Equity(0 [2330]) | \n",
" 458.750 | \n",
"
\n",
" \n",
" Equity(1 [2337]) | \n",
" 34.530 | \n",
"
\n",
" \n",
"
\n",
"
2446 rows × 1 columns
\n",
"
"
],
"text/plain": [
" 10_day_mean_close\n",
"2018-01-03 00:00:00+00:00 Equity(0 [2330]) 226.950\n",
" Equity(1 [2337]) 42.500\n",
"2018-01-04 00:00:00+00:00 Equity(0 [2330]) 228.150\n",
" Equity(1 [2337]) 42.565\n",
"2018-01-05 00:00:00+00:00 Equity(0 [2330]) 229.650\n",
"... ...\n",
"2022-12-28 00:00:00+00:00 Equity(1 [2337]) 34.850\n",
"2022-12-29 00:00:00+00:00 Equity(0 [2330]) 462.200\n",
" Equity(1 [2337]) 34.680\n",
"2022-12-30 00:00:00+00:00 Equity(0 [2330]) 458.750\n",
" Equity(1 [2337]) 34.530\n",
"\n",
"[2446 rows x 1 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = run_pipeline(make_pipeline(), '2018-01-03', '2022-12-30')\n",
"result"
]
},
{
"cell_type": "markdown",
"id": "7c85ebd1",
"metadata": {},
"source": [
"注意:透過 `Pipeline.add` 方法,也可以把因子加入至已經存在的 `Pipeline` 實體中。使用 `add` 看起來會像這樣:\n",
"\n",
"```\n",
"my_pipe = Pipeline()\n",
"f1 = SomeFactor(...)\n",
"my_pipe.add(f1, 'f1')\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "562e43f5",
"metadata": {},
"source": [
"### Latest\n",
"`Latest` 是最常使用的 `Factors`。`Latest` 可以取得指定欄位中最近一期的資料。這個因子因為太常被使用以至於它實例化的方式與比較特別。\n",
"\n",
"若要取得特定欄位最近一期的資料,最好方法是呼叫它的 `.latest` 屬性。作為示範,我們更新 `Pipeline` 以創建一個代表最近一期 __收盤價__ 的因子 ,然後把它加入 pipeline:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "21601406",
"metadata": {},
"outputs": [],
"source": [
"def make_pipeline():\n",
"\n",
" mean_close_10 = SimpleMovingAverage(inputs=[TWEquityPricing.close], window_length=10)\n",
" latest_close = TWEquityPricing.close.latest\n",
"\n",
" return Pipeline(\n",
" columns={\n",
" '10_day_mean_close': mean_close_10,\n",
" 'latest_close_price': latest_close\n",
" }\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "b8c32c92",
"metadata": {},
"source": [
"現在,當我們再次執行 pipeline 時,產出的 DataFrame 會包含兩個欄位。一個欄位(`10_day_mean_close`)記錄每隻證券的10日平均收盤價,另一個(`latest_close_price`)記錄最近收盤價。"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "889e3139",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" 10_day_mean_close | \n",
" latest_close_price | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-05-05 00:00:00+00:00 | \n",
" Equity(0 [2330]) | \n",
" 501.100 | \n",
" 498.0 | \n",
"
\n",
" \n",
" Equity(1 [2337]) | \n",
" 32.935 | \n",
" 32.6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 10_day_mean_close \\\n",
"2023-05-05 00:00:00+00:00 Equity(0 [2330]) 501.100 \n",
" Equity(1 [2337]) 32.935 \n",
"\n",
" latest_close_price \n",
"2023-05-05 00:00:00+00:00 Equity(0 [2330]) 498.0 \n",
" Equity(1 [2337]) 32.6 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = run_pipeline(make_pipeline(), '2023-05-05', '2023-05-05')\n",
"result.head(5)"
]
},
{
"cell_type": "markdown",
"id": "e62a5b4d",
"metadata": {},
"source": [
"`.latest` 有時可以回傳 `Factors` 以外的東西。我們會在之後的課程示範其他可能回傳的資料類型。"
]
},
{
"cell_type": "markdown",
"id": "7bd70357",
"metadata": {},
"source": [
"## Default Inputs\n",
"有些 Factors 有著不可被更改的預設輸入值。例如 `VWAP` __built-in factor__ 永遠都是從`TWEquityPricing.close` 和 `TWEquityPricing.volume` 計算而來的。當一個 Factors 永遠由同樣一個 `BoundColumns` 計算而來時,我們會說這個建構子沒有指定的 `inputs`。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "dc36a97f",
"metadata": {},
"outputs": [],
"source": [
"from zipline.pipeline.factors import VWAP\n",
"vwap = VWAP(window_length=10)"
]
},
{
"cell_type": "markdown",
"id": "e7e24142",
"metadata": {},
"source": [
"其餘 __built-in factor__ 請參考:[Pipeline built-in factors.ipynb](https://github.com/tejtw/TQuant-Lab/blob/main/lecture/Pipeline%20built-in%20factors.ipynb)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:zipline-tej] *",
"language": "python",
"name": "conda-env-zipline-tej-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}