<span id="top"></span>
# <font color=#57a892>Pipeline built-in filters</font>

This article introduces the commonly used built-in `Filters`.

In [1]:
import os 
import pandas as pd
import numpy as np 

os.environ['TEJAPI_BASE'] = "https://api.tej.com.tw"
os.environ['TEJAPI_KEY'] = "your key"

os.environ['ticker'] = "1101 1301 1303 1802 2002 2101 2303 2317 2330 2337 2382 2388 2451 2454 2603 2881 2885 2890 2903 3711 IR0001"
os.environ['mdate'] = '20180101 20220330'

!zipline ingest -b tquant

Merging daily equity files:
Currently used TEJ API key call quota 126/100000 (0.13%)
Currently used TEJ API key data quota 872686/10000000 (8.73%)


[2024-07-09 05:27:50.154669] INFO: zipline.data.bundles.core: Ingesting tquant.
[2024-07-09 05:27:56.108611] INFO: zipline.data.bundles.core: Ingest tquant successfully.


In [2]:
from zipline.data import bundles
from zipline.pipeline import Pipeline
from zipline.TQresearch.tej_pipeline import run_pipeline
from zipline.pipeline.data import TWEquityPricing, TQAltDataSet, TQDataSet
from zipline.pipeline.factors import *
from zipline.pipeline.filters import *

start = pd.Timestamp("2018-02-06", tz='utc')
end = pd.Timestamp("2022-02-06", tz='utc')

bundle = bundles.load('tquant')
sids = bundle.asset_finder.equities_sids
assets = bundle.asset_finder.retrieve_all(sids)

<span id="menu"></span>
    
### Menu
* [All](#All)
* [Any](#Any)
* [AtLeastN](#AtLeastN)
* [AllPresent](#AllPresent)
* [StaticAssets](#StaticAssets)
* [StaticSids](#StaticSids)
* [SingleAsset](#SingleAsset)
* [top/bottom](#top/bottom)
* [percentile_between](#percentile_between)
* [if_else](#if_else)

<span id="All"></span>

## zipline.pipeline.filters.<font color=#57a892>All</font>

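Within an n-day window, an asset is True if it meets the condition on every day.

>### Parameters:
>* inputs _( zipline.pipeline.data.Dataset.BoundColumn_ or _boolean )_ - The asset price/volume data and the condition to test.
>* window_length _( int )_ - The number of days n in the window.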

[Go to Menu](#menu)

In [3]:
from zipline.pipeline.filters import All

def make_pipeline():
    return Pipeline(
        columns = {
            "ALL": All(
                inputs = [TWEquityPricing.close.latest > 40], # condition: True when the previous day's close > 40
                window_length = 1
            )
        }
    )

run_pipeline(make_pipeline(), start, end)

date,asset,ALL
2018-02-06 00:00:00+00:00,Equity(0 [1101]),False
2018-02-06 00:00:00+00:00,Equity(1 [1301]),True
2018-02-06 00:00:00+00:00,Equity(2 [1303]),True
2018-02-06 00:00:00+00:00,Equity(3 [1802]),False
2018-02-06 00:00:00+00:00,Equity(4 [2002]),False
...,...,...
2022-01-26 00:00:00+00:00,Equity(16 [2885]),False
2022-01-26 00:00:00+00:00,Equity(17 [2890]),False
2022-01-26 00:00:00+00:00,Equity(18 [2903]),False
2022-01-26 00:00:00+00:00,Equity(19 [3711]),True


<span id="Any"></span>

## zipline.pipeline.filters.<font color=#57a892>Any</font>

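Within an n-day window, an asset is True if it meets the condition on at least one day.

>### Parameters:
>* inputs _( zipline.pipeline.data.Dataset.BoundColumn_ or _boolean )_ - The asset price/volume data and the condition to test.
>* window_length _( int )_ - The number of days n in the window.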

[Go to Menu](#menu)

In [4]:
from zipline.pipeline.filters import Any

def make_pipeline():
    return Pipeline(
        columns = {
            "Any": Any(
                inputs = [TWEquityPricing.close.latest > 40], 
                window_length = 10
            )
        }
    )

run_pipeline(make_pipeline(), start, end)

date,asset,Any
2018-02-06 00:00:00+00:00,Equity(0 [1101]),False
2018-02-06 00:00:00+00:00,Equity(1 [1301]),True
2018-02-06 00:00:00+00:00,Equity(2 [1303]),True
2018-02-06 00:00:00+00:00,Equity(3 [1802]),False
2018-02-06 00:00:00+00:00,Equity(4 [2002]),False
...,...,...
2022-01-26 00:00:00+00:00,Equity(16 [2885]),False
2022-01-26 00:00:00+00:00,Equity(17 [2890]),False
2022-01-26 00:00:00+00:00,Equity(18 [2903]),False
2022-01-26 00:00:00+00:00,Equity(19 [3711]),True
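
To make the window semantics of `All` and `Any` concrete, here is a plain-Python sketch. It is not zipline's implementation: the helper names `rolling_all` and `rolling_any` and the closing prices are hypothetical, chosen only for illustration.

```python
def rolling_all(flags, window_length):
    """True on day t only if every flag in the trailing window ending at t is True."""
    return [
        all(flags[t - window_length + 1 : t + 1])
        for t in range(window_length - 1, len(flags))
    ]


def rolling_any(flags, window_length):
    """True on day t if at least one flag in the trailing window ending at t is True."""
    return [
        any(flags[t - window_length + 1 : t + 1])
        for t in range(window_length - 1, len(flags))
    ]


# Hypothetical closing prices for one asset over seven days.
closes = [38.0, 41.0, 42.0, 43.0, 39.5, 39.0, 38.5]
above_40 = [c > 40 for c in closes]  # the daily condition, like `close > 40`

all_3d = rolling_all(above_40, window_length=3)
any_3d = rolling_any(above_40, window_length=3)
print(all_3d)  # one entry per day that has a full 3-day window
print(any_3d)
```

With `window_length = 1`, both helpers simply return the daily condition itself, which is why the `All` example with `window_length = 1` behaves like the bare condition.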


<span id="AtLeastN"></span>

## zipline.pipeline.filters.<font color=#57a892>AtLeastN</font>

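Within an m-day window, an asset is True if it meets the condition on at least n days.

>### Parameters:
>* inputs _( zipline.pipeline.data.Dataset.BoundColumn_ or _boolean )_ - The asset price/volume data and the condition to test.
>* window_length _( int )_ - The number of days m in the window.
>* N _( int )_ - The minimum number of days n on which the condition must hold.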

[Go to Menu](#menu)

In [5]:
from zipline.pipeline.filters import AtLeastN

def make_pipeline():
    return Pipeline(
        columns = {
            "AtLeastN": AtLeastN(
                inputs = [TWEquityPricing.close.latest > 40],
                window_length = 10,
                N = 2
            )
        }
    )

run_pipeline(make_pipeline(), start, end)

date,asset,AtLeastN
2018-02-06 00:00:00+00:00,Equity(0 [1101]),False
2018-02-06 00:00:00+00:00,Equity(1 [1301]),True
2018-02-06 00:00:00+00:00,Equity(2 [1303]),True
2018-02-06 00:00:00+00:00,Equity(3 [1802]),False
2018-02-06 00:00:00+00:00,Equity(4 [2002]),False
...,...,...
2022-01-26 00:00:00+00:00,Equity(16 [2885]),False
2022-01-26 00:00:00+00:00,Equity(17 [2890]),False
2022-01-26 00:00:00+00:00,Equity(18 [2903]),False
2022-01-26 00:00:00+00:00,Equity(19 [3711]),True
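
The counting rule behind `AtLeastN` can be sketched in plain Python as follows. The helper `at_least_n` and the sample flags are hypothetical, and this is only an illustration of the semantics, not the library's code. It also suggests that `N = 1` behaves like `Any` and `N = window_length` behaves like `All`.

```python
def at_least_n(flags, window_length, n):
    """True on day t if at least n flags in the trailing window ending at t are True."""
    return [
        sum(flags[t - window_length + 1 : t + 1]) >= n
        for t in range(window_length - 1, len(flags))
    ]


# Hypothetical daily condition values for one asset (e.g. close > 40).
flags = [False, True, True, True, False, False, False]

two_of_three = at_least_n(flags, window_length=3, n=2)
like_any = at_least_n(flags, window_length=3, n=1)  # at least one day in the window
like_all = at_least_n(flags, window_length=3, n=3)  # every day in the window
print(two_of_three)
```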


<span id="AllPresent"></span>

## zipline.pipeline.filters.<font color=#57a892>AllPresent</font>

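Within an n-day window, an asset is True if the specified data is present on every day.

>### Parameters:
>* inputs _( zipline.pipeline.data.Dataset.BoundColumn_ or _boolean )_ - The asset price/volume data.
>* window_length _( int )_ - The number of days n in the window.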

[Go to Menu](#menu)

In [6]:
from zipline.pipeline.filters import AllPresent

def make_pipeline():
    return Pipeline(
        columns = {
            "AllPresent": AllPresent(
                inputs = [TWEquityPricing.close], 
                window_length = 10
            )
        }
    )

run_pipeline(make_pipeline(), start, end).loc["2018-05-04"]
# Note that 3711 was only listed on 2018-04-30, so it is False on 2018-05-04 (it falls outside the first ten rows shown below)

asset,AllPresent
Equity(0 [1101]),True
Equity(1 [1301]),True
Equity(2 [1303]),True
Equity(3 [1802]),True
Equity(4 [2002]),True
Equity(5 [2101]),True
Equity(6 [2303]),True
Equity(7 [2317]),True
Equity(8 [2330]),True
Equity(9 [2337]),True
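
As a rough illustration of why a newly listed asset stays False until it has a full window of data, the sketch below applies the same idea to a list of closes with missing values. The helpers `is_missing` and `all_present` and the prices are hypothetical, and treating `None` and NaN as "missing" is an assumption of this sketch.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing data."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def all_present(values, window_length):
    """True on day t only if no value in the trailing window ending at t is missing."""
    return [
        not any(is_missing(v) for v in values[t - window_length + 1 : t + 1])
        for t in range(window_length - 1, len(values))
    ]


nan = float("nan")
# Hypothetical closes: the asset has no price on its first two days (not yet listed).
closes = [nan, nan, 50.0, 51.0, 50.5, 52.0]

present_3d = all_present(closes, window_length=3)
print(present_3d)
```

The result only turns True once the 3-day window no longer reaches back into the days without data.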


In [7]:
# First, retrieve all assets in the bundle
from zipline.data import bundles

bundle = bundles.load('tquant')
sids = bundle.asset_finder.equities_sids
assets = bundle.asset_finder.retrieve_all(sids)
assets

[Equity(0 [1101]),
 Equity(1 [1301]),
 Equity(2 [1303]),
 Equity(3 [1802]),
 Equity(4 [2002]),
 Equity(5 [2101]),
 Equity(6 [2303]),
 Equity(7 [2317]),
 Equity(8 [2330]),
 Equity(9 [2337]),
 Equity(10 [2382]),
 Equity(11 [2388]),
 Equity(12 [2451]),
 Equity(13 [2454]),
 Equity(14 [2603]),
 Equity(15 [2881]),
 Equity(16 [2885]),
 Equity(17 [2890]),
 Equity(18 [2903]),
 Equity(19 [3711]),
 Equity(20 [IR0001])]

<span id="StaticAssets"></span>

## zipline.pipeline.filters.<font color=#57a892>StaticAssets</font>

Marks the specified assets as True.

>### Parameters:
>* assets _( zipline.assets.Asset, iterable )_ - The assets to mark.

[Go to Menu](#menu)

In [8]:
from zipline.pipeline.filters import StaticAssets
from zipline import run_algorithm
from zipline.api import symbol, attach_pipeline, pipeline_output

def make_pipeline():
    return Pipeline(
        columns = {
            "StaticAssets": StaticAssets(
                assets = assets[4:8]
            )
        }
    )

def initialize(context):
    my_pipe = attach_pipeline(make_pipeline(), 'my_pipe')
    
def handle_data(context, data):
    pipe = pipeline_output('my_pipe')
    print("=" * 100)
    print(pipe)

def analyze(context, perf):
    pass

results = run_algorithm(
    start = pd.Timestamp('2019-01-02', tz='utc'),
    end = pd.Timestamp('2019-01-02', tz='utc'),
    initialize = initialize,
    capital_base = 1e6,
    handle_data = handle_data,
    analyze = analyze, 
    bundle = 'tquant'
)

                     StaticAssets
Equity(0 [1101])            False
Equity(1 [1301])            False
Equity(2 [1303])            False
Equity(3 [1802])            False
Equity(4 [2002])             True
Equity(5 [2101])             True
Equity(6 [2303])             True
Equity(7 [2317])             True
Equity(8 [2330])            False
Equity(9 [2337])            False
Equity(10 [2382])           False
Equity(11 [2388])           False
Equity(12 [2451])           False
Equity(13 [2454])           False
Equity(14 [2603])           False
Equity(15 [2881])           False
Equity(16 [2885])           False
Equity(17 [2890])           False
Equity(18 [2903])           False
Equity(19 [3711])           False
Equity(20 [IR0001])         False


<span id="StaticSids"></span>

## zipline.pipeline.filters.<font color=#57a892>StaticSids</font>

Marks the assets with the specified sids as True.

>### Parameters:
>* sids _( int, iterable )_ - The sids of the assets to mark.

[Go to Menu](#menu)

In [9]:
from zipline.pipeline.filters import StaticSids
from zipline import run_algorithm
from zipline.api import symbol, attach_pipeline, pipeline_output

def make_pipeline():
    return Pipeline(
        columns = {
            "StaticSids": StaticSids(
                sids = range(4,8)
            )
        }
    )

def initialize(context):
    my_pipe = attach_pipeline(make_pipeline(), 'my_pipe')
    
def handle_data(context, data):
    pipe = pipeline_output('my_pipe')
    print("=" * 100)
    print(pipe)

def analyze(context, perf):
    pass

results = run_algorithm(
    start = pd.Timestamp('2019-01-01', tz='utc'),
    end = pd.Timestamp('2019-01-02', tz='utc'),
    initialize = initialize,
    capital_base = 1e6,
    handle_data = handle_data,
    analyze = analyze, 
    bundle = 'tquant'
)


                     StaticSids
Equity(0 [1101])          False
Equity(1 [1301])          False
Equity(2 [1303])          False
Equity(3 [1802])          False
Equity(4 [2002])           True
Equity(5 [2101])           True
Equity(6 [2303])           True
Equity(7 [2317])           True
Equity(8 [2330])          False
Equity(9 [2337])          False
Equity(10 [2382])         False
Equity(11 [2388])         False
Equity(12 [2451])         False
Equity(13 [2454])         False
Equity(14 [2603])         False
Equity(15 [2881])         False
Equity(16 [2885])         False
Equity(17 [2890])         False
Equity(18 [2903])         False
Equity(19 [3711])         False
Equity(20 [IR0001])       False
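
Conceptually, a static filter is a fixed membership test that does not change from day to day. The plain-Python sketch below mimics `StaticSids(sids = range(4,8))` on the 21 sids of this bundle. The helper `static_sids` is hypothetical and is not how zipline implements the filter.

```python
def static_sids(all_sids, selected):
    """Map each sid in the universe to True if it belongs to the fixed selection."""
    selected = frozenset(selected)
    return {sid: sid in selected for sid in all_sids}


universe = range(21)  # sids 0..20, matching the 21 assets in the bundle
mask = static_sids(universe, range(4, 8))  # range(4, 8) covers 4, 5, 6, 7

picked = [sid for sid, flag in mask.items() if flag]
print(picked)
```

Because `range(4, 8)` excludes 8, sids 4 through 7 are marked, which lines up with the `assets[4:8]` slice used in the `StaticAssets` example.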


<span id="SingleAsset"></span>

## zipline.pipeline.filters.<font color=#57a892>SingleAsset</font>

Marks a single specified asset as True.

>### Parameters:
>* asset _( zipline.assets.Asset )_ - The asset to mark.

[Go to Menu](#menu)

In [10]:
from zipline.pipeline.filters import SingleAsset
from zipline import run_algorithm
from zipline.api import symbol, attach_pipeline, pipeline_output

def make_pipeline():
    return Pipeline(
        columns = {
            "SingleAsset": SingleAsset(
                asset = assets[4]
            )
        }
    )

def initialize(context):
    my_pipe = attach_pipeline(make_pipeline(), 'my_pipe')
    
def handle_data(context, data):
    pipe = pipeline_output('my_pipe')
    print("=" * 100)
    print(pipe)

def analyze(context, perf):
    pass

results = run_algorithm(
    start = pd.Timestamp('2019-01-02', tz='utc'),
    end = pd.Timestamp('2019-01-02', tz='utc'),
    initialize = initialize,
    capital_base = 1e6,
    handle_data = handle_data,
    analyze = analyze, 
    bundle = 'tquant'
)

                     SingleAsset
Equity(0 [1101])           False
Equity(1 [1301])           False
Equity(2 [1303])           False
Equity(3 [1802])           False
Equity(4 [2002])            True
Equity(5 [2101])           False
Equity(6 [2303])           False
Equity(7 [2317])           False
Equity(8 [2330])           False
Equity(9 [2337])           False
Equity(10 [2382])          False
Equity(11 [2388])          False
Equity(12 [2451])          False
Equity(13 [2454])          False
Equity(14 [2603])          False
Equity(15 [2881])          False
Equity(16 [2885])          False
Equity(17 [2890])          False
Equity(18 [2903])          False
Equity(19 [3711])          False
Equity(20 [IR0001])        False


<span id="top/bottom"></span>

## <font color=#57a892>top/bottom</font>

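`Factor.top` / `Factor.bottom` mark the N largest / smallest values as True and the rest as False.

>### Parameters:
>* N _( int )_ - The number of entries to mark.
>* mask _( zipline.pipeline.Filter, optional )_ - Defaults to none. If a mask is given, only entries where mask = True are ranked.
>* groupby _( zipline.pipeline.Classifier, optional )_ -
>   * Defaults to none.
>   * Must be a `Classifier`. If given, the N largest / smallest entries are taken within each class.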

[Go to Menu](#menu)

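### Examples - top

In the example below:
* *sma_quartiles* splits the stocks into four buckets (0, 1, 2, 3) by SMA, from lowest to highest.
* *top_beta* first keeps the stocks whose average dollar volume exceeds 500 million, then picks the 2 stocks with the highest beta within each of the 4 SMA buckets.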

In [11]:
assets_ex_IR0001 = [i for i in assets if i!= bundle.asset_finder.lookup_symbol('IR0001', as_of_date=None)]

def make_pipeline():

#     quartiles
    sma = SimpleMovingAverage(inputs = [TWEquityPricing.close], window_length = 30)
    sma_quartiles = sma.quartiles(mask = StaticAssets(assets_ex_IR0001))
    
#     top  
    sbeta = SimpleBeta(target = bundle.asset_finder.lookup_symbol('IR0001', as_of_date=None),
                       regression_length = 300,
                       allowed_missing_percentage = 0.25)
    
    adv = AverageDollarVolume(window_length = 10)
    top_dollar = adv > 500000000
    top_beta = sbeta.top(N = 2, mask = top_dollar & StaticAssets(assets_ex_IR0001), groupby = sma_quartiles)
    
    return Pipeline(
        columns={
            'SMA': sma,
            'SMA Quartile': sma_quartiles,
            'Average Dollar Volume':adv,
            'Simple Beta': sbeta,
            'top_beta': top_beta
        }
    )

In the *top_beta* column, two stocks in each of the 4 SMA buckets are marked True, and each of them has an average dollar volume above 500 million ( 5e+08 ).

In [12]:
result = run_pipeline(make_pipeline(), end, end)
result.loc[:,['Average Dollar Volume', 'SMA Quartile', 'Simple Beta', 'top_beta']]\
            [result.top_beta == True].sort_values(['SMA Quartile', 'Simple Beta'], ascending=[False, False])

date,asset,Average Dollar Volume,SMA Quartile,Simple Beta,top_beta
2022-02-07 00:00:00+00:00,Equity(14 [2603]),13869230000.0,3,1.920292,True
2022-02-07 00:00:00+00:00,Equity(13 [2454]),5716348000.0,3,1.453462,True
2022-02-07 00:00:00+00:00,Equity(7 [2317]),2912853000.0,2,1.142091,True
2022-02-07 00:00:00+00:00,Equity(2 [1303]),647684300.0,2,0.881953,True
2022-02-07 00:00:00+00:00,Equity(6 [2303]),8088602000.0,1,1.774655,True
2022-02-07 00:00:00+00:00,Equity(9 [2337]),709846800.0,1,1.380687,True
2022-02-07 00:00:00+00:00,Equity(4 [2002]),1103390000.0,0,1.134731,True
2022-02-07 00:00:00+00:00,Equity(16 [2885]),614422600.0,0,0.843807,True
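
The interaction of `mask` and `groupby` can be sketched in plain Python as below. The helper `top_n_by_group`, the asset labels, and all numbers are hypothetical. The sketch only illustrates the selection logic (mask first, then rank within each group) and ignores details such as missing values.

```python
from collections import defaultdict


def top_n_by_group(values, groups, mask, n):
    """Mark the n largest masked values within each group as True."""
    by_group = defaultdict(list)
    for key, value in values.items():
        if mask[key]:  # assets outside the mask are never ranked
            by_group[groups[key]].append((value, key))

    result = {key: False for key in values}
    for members in by_group.values():
        # Sort each group from largest to smallest and keep the first n.
        for _, key in sorted(members, reverse=True)[:n]:
            result[key] = True
    return result


# Hypothetical betas, SMA quartile labels, and liquidity mask for six assets.
beta = {"A": 1.9, "B": 1.4, "C": 1.1, "D": 0.8, "E": 1.7, "F": 2.5}
quartile = {"A": 3, "B": 3, "C": 3, "D": 0, "E": 0, "F": 0}
liquid = {"A": True, "B": True, "C": True, "D": True, "E": True, "F": False}

top_beta = top_n_by_group(beta, quartile, liquid, n=2)
print(top_beta)
```

Asset `F` has the highest beta overall but is excluded by the mask, so `D` and `E` are the two picks in quartile 0.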


<span id="percentile_between"></span>

## <font color=#57a892>percentile_between</font>

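Marks values that fall between two percentiles (inclusive) as True and the rest as False.

>### Parameters:
>* min_percentile _( float )_ - Lower bound, within [0.0, 100.0].
>* max_percentile _( float )_ - Upper bound, within [0.0, 100.0].
>* mask _( zipline.pipeline.Filter, optional )_ - Defaults to none. If a mask is given, only entries where mask = True are ranked.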

[Go to Menu](#menu)

### Examples - percentile_between

In the example below:
```python
daily_r = DailyReturns()
top_r = daily_r.percentile_between(min_percentile = 80, max_percentile = 100, mask=StaticAssets(assets_ex_IR0001))
```
This selects the stocks whose daily return falls in the top 20%.

In [13]:
def make_pipeline():

#     percentile_between  
    daily_r = DailyReturns(inputs = [TWEquityPricing.close])
    top_r = daily_r.percentile_between(min_percentile = 80, max_percentile = 100, mask=StaticAssets(assets_ex_IR0001))
    
    return Pipeline(
        columns={
            'Daily Return': daily_r,
            'top_r': top_r
        }
    )

With 20 stocks in the mask, a total of 20 x ( 100% - 80% ) = 4 stocks are marked True.

In [14]:
result = run_pipeline(make_pipeline(), end, end)
result.loc[:,['Daily Return','top_r']].sort_values(by = 'Daily Return', ascending = False).head(10)

date,asset,Daily Return,top_r
2022-02-07 00:00:00+00:00,Equity(9 [2337]),0.062176,True
2022-02-07 00:00:00+00:00,Equity(14 [2603]),0.027273,True
2022-02-07 00:00:00+00:00,Equity(2 [1303]),0.010526,True
2022-02-07 00:00:00+00:00,Equity(18 [2903]),0.009615,True
2022-02-07 00:00:00+00:00,Equity(11 [2388]),0.007519,False
2022-02-07 00:00:00+00:00,Equity(4 [2002]),0.005979,False
2022-02-07 00:00:00+00:00,Equity(1 [1301]),0.004785,False
2022-02-07 00:00:00+00:00,Equity(12 [2451]),0.00431,False
2022-02-07 00:00:00+00:00,Equity(16 [2885]),0.003976,False
2022-02-07 00:00:00+00:00,Equity(17 [2890]),0.003049,False
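
The count of 4 can be checked with a small plain-Python sketch. It assumes the percentile bounds are computed with linear interpolation between ranks (NumPy's default method). The helpers `percentile` and `percentile_between` and the return values are hypothetical and serve only to illustrate the inclusive-bounds rule.

```python
def percentile(sorted_values, q):
    """q-th percentile (0-100) using linear interpolation between closest ranks."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def percentile_between(values, min_percentile, max_percentile):
    """Mark values lying between the two percentile bounds (inclusive) as True."""
    ordered = sorted(values)
    low = percentile(ordered, min_percentile)
    high = percentile(ordered, max_percentile)
    return [low <= v <= high for v in values]


# Hypothetical daily returns for 20 assets: 0.00, 0.01, ..., 0.19.
returns = [i / 100 for i in range(20)]

flags = percentile_between(returns, 80, 100)
print(sum(flags))  # number of assets marked True
```

Under this assumption the 80th-percentile bound falls between the 16th and 17th smallest values, so exactly the four largest returns pass the filter.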


<span id="if_else"></span>

## <font color=#57a892>if_else</font>(if_true, if_false)

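`if_else` is called on a condition: where the condition holds it returns the value of *if_true*, and where it does not it returns the value of *if_false*.

>### Parameters:
>* if_true _( zipline.pipeline.term.ComputableTerm )_ - The value returned where the condition holds.
>* if_false _( zipline.pipeline.term.ComputableTerm )_ - The value returned where the condition does not hold.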

[Go to Menu](#menu)

In [15]:
columns = ['Industry', 'Sub_Industry']

fields = ' '.join(columns)
os.environ['fields'] = fields

!zipline ingest -b fundamentals

Currently used TEJ API key call quota 138/100000 (0.14%)
Currently used TEJ API key data quota 992682/10000000 (9.93%)


[2024-07-09 05:28:16.975164] INFO: zipline.data.bundles.core: Ingesting fundamentals.
[2024-07-09 05:28:34.900705] INFO: zipline.data.bundles.core: Ingest fundamentals successfully.


### Examples - if_else

```python
ind = TQAltDataSet.Sub_Industry.latest.eq('').if_else(TQAltDataSet.Industry.latest, TQAltDataSet.Sub_Industry.latest)
```
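The condition in this example is whether the sub-industry ( Sub_Industry ) is empty. If it is, the main industry ( Industry ) is returned; otherwise the sub-industry ( Sub_Industry ) is returned.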

In [16]:
def make_pipeline():

    Industry = TQAltDataSet.Industry.latest
    Sub_Industry = TQAltDataSet.Sub_Industry.latest
    check = Sub_Industry.eq('')  # True where the sub-industry is empty
    ind = check.if_else(Industry, Sub_Industry)  # fall back to the main industry
    
    return Pipeline(
        columns={
            '主產業別': Industry,      # main industry
            '子產業別': Sub_Industry,  # sub-industry
            '是否符合條件': check,     # condition met?
            '回傳產業': ind            # returned industry
        }
    )

run_pipeline(make_pipeline(), end, end).head(10)

date,asset,主產業別,子產業別,是否符合條件,回傳產業
2022-02-07 00:00:00+00:00,Equity(0 [1101]),M1100 水泥工業,,True,M1100 水泥工業
2022-02-07 00:00:00+00:00,Equity(1 [1301]),M1300 塑膠工業,,True,M1300 塑膠工業
2022-02-07 00:00:00+00:00,Equity(2 [1303]),M1300 塑膠工業,,True,M1300 塑膠工業
2022-02-07 00:00:00+00:00,Equity(3 [1802]),M1800 玻璃陶瓷,,True,M1800 玻璃陶瓷
2022-02-07 00:00:00+00:00,Equity(4 [2002]),M2000 鋼鐵工業,,True,M2000 鋼鐵工業
2022-02-07 00:00:00+00:00,Equity(5 [2101]),M2100 橡膠工業,,True,M2100 橡膠工業
2022-02-07 00:00:00+00:00,Equity(6 [2303]),M2300 電子工業,M2324 半導體業,False,M2324 半導體業
2022-02-07 00:00:00+00:00,Equity(7 [2317]),M2300 電子工業,M2331 其他電子業,False,M2331 其他電子業
2022-02-07 00:00:00+00:00,Equity(8 [2330]),M2300 電子工業,M2324 半導體業,False,M2324 半導體業
2022-02-07 00:00:00+00:00,Equity(9 [2337]),M2300 電子工業,M2324 半導體業,False,M2324 半導體業
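
The element-wise selection performed by `if_else` can be pictured with a plain-Python sketch. The helper below is hypothetical and works on lists rather than pipeline terms. The industry codes are taken from three rows of the output above (1101, 2303, 2317), with the industry names omitted.

```python
def if_else(condition, if_true, if_false):
    """Pick, element by element, from if_true where condition holds, else from if_false."""
    return [t if c else f for c, t, f in zip(condition, if_true, if_false)]


# Industry codes for 1101, 2303 and 2317; an empty string means "no sub-industry".
industry = ["M1100", "M2300", "M2300"]
sub_industry = ["", "M2324", "M2331"]

is_empty = [s == "" for s in sub_industry]  # the condition, like Sub_Industry.eq('')
resolved = if_else(is_empty, industry, sub_industry)
print(resolved)
```

Only the first asset falls back to its main industry code; the other two keep their sub-industry codes.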
