For an explanation of the code in this notebook, see the post [Using Caching in a Pipeline](https://tensorflow.blog/2017/12/08/pipeline%ec%97%90%ec%84%9c-%ec%ba%90%ec%8b%b1%ec%9d%84-%ec%82%ac%ec%9a%a9%ed%95%98%ea%b8%b0/).

In [1]:
# The Boston housing dataset was deprecated in scikit-learn 1.0 and is removed in 1.2.
# The following code suppresses the resulting warning message.
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline

boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=0)
param_grid = {'polynomialfeatures__degree': [1, 2, 3, 4, 5],
              'ridge__alpha': [0.001, 0.01, 0.1, 1, 10, 100]}

In [2]:
pipe = make_pipeline(StandardScaler(), PolynomialFeatures(), Ridge())
grid = GridSearchCV(pipe, param_grid=param_grid, cv=5, n_jobs=-1)

In [3]:
%timeit grid.fit(X_train, y_train)

696 ms ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [4]:
from tempfile import mkdtemp
from shutil import rmtree

In [5]:
cache_dir = mkdtemp()
pipe2 = make_pipeline(StandardScaler(), PolynomialFeatures(), Ridge(), memory=cache_dir)
grid2 = GridSearchCV(pipe2, param_grid=param_grid, cv=5, n_jobs=-1)

In [6]:
%timeit grid2.fit(X_train, y_train)

714 ms ± 9.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
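
The cached run (714 ms) was not faster than the uncached one (696 ms). One plausible reading, which the notebook itself does not state, is that `StandardScaler` and `PolynomialFeatures` are cheap to fit on a dataset this small, so the cost of hashing inputs and reading the cache from disk offsets the saved work. The sketch below only counts how many distinct transformer fits the grid above would need. It assumes the cache key is the transformer's parameters plus its input data, it ignores the final refit on the full training set, and it ignores duplicate work that parallel workers may do when `n_jobs=-1`. The names `candidate_fits`, `scaler_keys`, and `poly_keys` are illustrative only.

```python
from itertools import product

# Same grid as param_grid above, with cv=5.
degrees = [1, 2, 3, 4, 5]
alphas = [0.001, 0.01, 0.1, 1, 10, 100]
n_folds = 5

# Without caching, every (degree, alpha, fold) combination fits all three steps.
candidate_fits = list(product(degrees, alphas, range(n_folds)))

# Assumed cache keys: StandardScaler only sees a different input per fold,
# while PolynomialFeatures also varies with its degree parameter.
scaler_keys = {fold for _, _, fold in candidate_fits}
poly_keys = {(degree, fold) for degree, _, fold in candidate_fits}

print(len(candidate_fits))  # 150 fits of each step without caching
print(len(scaler_keys))     # 5 distinct StandardScaler fits with caching
print(len(poly_keys))       # 25 distinct PolynomialFeatures fits with caching
```

Under these assumptions `Ridge`, as the final step, is still fit 150 times, because a pipeline's cache covers only its transformers. Caching tends to pay off when the transformers, rather than the final estimator, dominate the fitting time.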


In [7]:
rmtree(cache_dir)
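
As an alternative to pairing `mkdtemp` with `rmtree`, the cache directory can be managed by a context manager so that it is removed even if fitting raises an exception. The following is a minimal sketch under stated assumptions: because `load_boston` is no longer available from scikit-learn 1.2 onward, it uses synthetic data from `make_regression` as a stand-in, and a smaller grid to keep it quick. The names `small_grid`, `tmp_cache`, `cached_pipe`, and `cached_grid` are hypothetical and do not appear in the original notebook.

```python
from tempfile import TemporaryDirectory

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (an assumption; not the Boston housing dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A smaller grid than the notebook's, chosen only to keep the sketch fast.
small_grid = {'polynomialfeatures__degree': [1, 2],
              'ridge__alpha': [0.1, 1, 10]}

# The temporary cache directory is deleted when the with-block exits.
with TemporaryDirectory() as tmp_cache:
    cached_pipe = make_pipeline(StandardScaler(), PolynomialFeatures(), Ridge(),
                                memory=tmp_cache)
    cached_grid = GridSearchCV(cached_pipe, param_grid=small_grid, cv=5)
    cached_grid.fit(X_train, y_train)
    best_params = cached_grid.best_params_
    test_score = cached_grid.score(X_test, y_test)

print(best_params)
print(test_score)
```

The cache is only consulted while fitting, so the search is fitted and scored inside the `with` block and only the results are kept afterwards.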