{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# MLJ Example\n", "\n", "The closest equivalent I've seen to `scikit-learn` for Julia is `MLJ`. `MLJ` provides a unified API wrapper around a variety of ML libraries available in Julia. \n", "\n", "Here we'll walk through basic `MLJ` examples with some side notes on the quirks of this library.\n", "\n", "1. Selecting and loading a *'model'* in `MLJ`, where 'model' is a container of hyperparameters\n", "2. Create a *'machine'* which binds the *model* to *data* and checks data types\n", "3. `fit!` to fit the data\n", "4. `evaluate!` or `evaluate` the model performance using a *resampling* strategy against metrics (*measure*)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[33m\u001b[1m┌ \u001b[22m\u001b[39m\u001b[33m\u001b[1mWarning: \u001b[22m\u001b[39mbackend `GR` is not installed.\n", "\u001b[33m\u001b[1m└ \u001b[22m\u001b[39m\u001b[90m@ Plots ~/.julia/packages/Plots/UTR4H/src/backends.jl:37\u001b[39m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[36m\u001b[1m[ \u001b[22m\u001b[39m\u001b[36m\u001b[1mInfo: \u001b[22m\u001b[39mGR\n" ] } ], "source": [ "# import Pkg; Pkg.add(\"MLJ\"); Pkg.add(\"MLJDecisionTreeInterface\")\n", "using MLJ, DataFramesMeta, Plots" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NamedTuple{(:sepal_length, :sepal_width, :petal_length, :petal_width, :target), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}}}}" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# MLJ has some datasets builtin\n", "iris = load_iris();\n", "typeof(iris)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "┌──────────────┬─────────────┬──────────────┬─────────────┬──────────────────────────────────┐\n", "│\u001b[1m sepal_length \u001b[0m│\u001b[1m sepal_width \u001b[0m│\u001b[1m petal_length \u001b[0m│\u001b[1m petal_width \u001b[0m│\u001b[1m target \u001b[0m│\n", "│\u001b[90m Float64 \u001b[0m│\u001b[90m Float64 \u001b[0m│\u001b[90m Float64 \u001b[0m│\u001b[90m Float64 \u001b[0m│\u001b[90m CategoricalValue{String, UInt32} \u001b[0m│\n", "│\u001b[90m Continuous \u001b[0m│\u001b[90m Continuous \u001b[0m│\u001b[90m Continuous \u001b[0m│\u001b[90m Continuous \u001b[0m│\u001b[90m Multiclass{3} \u001b[0m│\n", "├──────────────┼─────────────┼──────────────┼─────────────┼──────────────────────────────────┤\n", "│ 5.1 │ 3.5 │ 1.4 │ 0.2 │ setosa │\n", "│ 4.9 │ 3.0 │ 1.4 │ 0.2 │ setosa │\n", "│ 4.7 │ 3.2 │ 1.3 │ 0.2 │ setosa │\n", "└──────────────┴─────────────┴──────────────┴─────────────┴──────────────────────────────────┘\n" ] } ], "source": [ "# but it comes as a special NamedTuple with 'scitypes' inside of it, so let's make it into a dataframe\n", "iris = DataFrame(iris)\n", "# and we can `pretty` print the dataframe\n", "first(iris,3) |> pretty" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# There's a univariate time series one here too\n", "sunspots = MLJ.load_sunspots() |> DataFrame;" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's run an initial model on this DataFrame. 
Since `MLJ` is a wrapper on top of a bunch of other libraries, we need a way to explore the catalog of models that are available and load specific models so that we can put them to work. \n", "\n", "* See the [Model Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/) section in the `MLJ` docs for more details\n", "* `models(\"KMeans\")`: Lists models in the catalog that match the name \"KMeans\"\n", "* `info(\"KMeans\", pkg=\"Clustering\")`: Show the parameters for the \"KMeans\" model from the \"Clustering\" package\n", "* `doc(\"KMeans\", pkg=\"Clustering\")`: Show the docs for the \"KMeans\" model from the \"Clustering\" package" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "┌─────────┬──────────────┬───────────────────┬──────────────────────────────────┬───────────────┐\n", "│\u001b[1m name \u001b[0m│\u001b[1m package_name \u001b[0m│\u001b[1m human_name \u001b[0m│\u001b[1m hyperparameters \u001b[0m│\u001b[1m is_pure_julia \u001b[0m│\n", "│\u001b[90m String \u001b[0m│\u001b[90m String \u001b[0m│\u001b[90m String \u001b[0m│\u001b[90m Tuple{Symbol, Symbol, Symbol} \u001b[0m│\u001b[90m Bool \u001b[0m│\n", "│\u001b[90m Textual \u001b[0m│\u001b[90m Textual \u001b[0m│\u001b[90m Textual \u001b[0m│\u001b[90m Tuple{Unknown, Unknown, Unknown} \u001b[0m│\u001b[90m Count \u001b[0m│\n", "├─────────┼──────────────┼───────────────────┼──────────────────────────────────┼───────────────┤\n", "│ KMeans │ Clustering │ K-means clusterer │ (:k, :metric, :init) │ true │\n", "└─────────┴──────────────┴───────────────────┴──────────────────────────────────┴───────────────┘\n" ] } ], "source": [ "# Show an example as a dataframe\n", "example = info(\"KMeans\", pkg=\"Clustering\") |> pairs |> collect |> DataFrame\n", "example[1,[\"name\", \"package_name\", \"human_name\", \"hyperparameters\", \"is_pure_julia\"]] |> DataFrame |> pretty" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now here's an example of running the KMeans model with $k=3$ to split the `iris` data into 3 clusters:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Training machine(KMeans(k = 3, …), …).\n", "└ @ MLJBase /Users/nelsontang/.julia/packages/MLJBase/9Nkjh/src/machines.jl:492\n" ] }, { "data": { "text/plain": [ "8-element Vector{Tuple{CategoricalArrays.CategoricalValue{Int64, UInt32}, CategoricalArrays.CategoricalValue{String, UInt32}}}:\n", " (1, \"setosa\")\n", " (1, \"setosa\")\n", " (1, \"setosa\")\n", " (1, \"setosa\")\n", " (1, \"setosa\")\n", " (1, \"setosa\")\n", " (1, \"setosa\")\n", " (1, \"setosa\")" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "KMeans = @load KMeans pkg=Clustering verbosity=0\n", "# Split the dataframe into y and X, where y is the column `target` and X is everything else\n", "y, X = unpack(iris, ==(:target))\n", "model = KMeans(k=3)\n", "mach = machine(model, X) |> fit!\n", "\n", "yhat = predict(mach, X)\n", "\n", "# Check cluster assignments\n", "compare = zip(yhat, y) |> collect\n", "compare[1:8] # clusters align with classes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's review what happened in the cell above:\n", "\n", "* `@load`: loads the specified model from the catalog and brings it in to the environment\n", "* `unpack(iris, ==(:target))`: horizontally split the `iris` data into X and y, where y are all columns called \"target\" (here we 
use the symbol `:target`)\n", "* `KMeans(k=3)`: Instantiate the KMeans model (from the `@load` on line 1) and set $k=3$\n", "* `machine(model, X) |> fit!`: We create a 'machine' that combines a model with training data and then `fit!` trains the model on the data\n", " * The pipe operator `|>` is similar to method chaining in python or `dplyr`'s pipe in R\n", "* `predict(mach, X)`: Provides the predicted cluster assignments for each row of $X$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This also created a fitted machine that we can call things like `fitted_params`, which in the case of `KMeans` just has a single parameter `centers` and each row corresponds to a column from the `iris` training data (sepal length, sepal width, etc)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4×3 Matrix{Float64}:\n", " 5.006 6.85385 5.88361\n", " 3.418 3.07692 2.74098\n", " 1.464 5.71538 4.38852\n", " 0.244 2.05385 1.43443" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fitted_params(mach).centers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and we can generate a report on the fitted machine with `report` to get some of the key outputs. This includes:\n", "\n", "* `assignments`: cluster assignments\n", "* `cluster_labels`: cluster labels" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(assignments = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 2, 2, 3, 2, 2, 2, 3, 2, 2, 3],\n", " cluster_labels = CategoricalArrays.CategoricalValue{Int64, UInt32}[1, 2, 3],)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "report(mach)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# A more involved example with the `boston` dataset\n", "\n", "Let's build on the clustering example with an example from supervised learning. Below we'll take the popular boston housing dataset so we can show how to use `MLJ` to do things like a train/test split and evaluating against accuracy metrics." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/latex": [ "\\begin{tabular}{r|ccccccccccc}\n", "\t& Crim & Zn & Indus & Chas & NOx & Rm & Age & Dis & Rad & Tax & \\\\\n", "\t\\hline\n", "\t& Float64 & Float64 & Float64 & Int64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n", "\t\\hline\n", "\t1 & 0.00632 & 18.0 & 2.31 & 0 & 0.538 & 6.575 & 65.2 & 4.09 & 1.0 & 296.0 & $\\dots$ \\\\\n", "\t2 & 0.02731 & 0.0 & 7.07 & 0 & 0.469 & 6.421 & 78.9 & 4.9671 & 2.0 & 242.0 & $\\dots$ \\\\\n", "\t3 & 0.02729 & 0.0 & 7.07 & 0 & 0.469 & 7.185 & 61.1 & 4.9671 & 2.0 & 242.0 & $\\dots$ \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "\u001b[1m3×14 DataFrame\u001b[0m\n", "\u001b[1m Row \u001b[0m│\u001b[1m Crim \u001b[0m\u001b[1m Zn \u001b[0m\u001b[1m Indus \u001b[0m\u001b[1m Chas \u001b[0m\u001b[1m NOx \u001b[0m\u001b[1m Rm \u001b[0m\u001b[1m Age \u001b[0m\u001b[1m Dis \u001b[0m\u001b[1m R\u001b[0m ⋯\n", " │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m F\u001b[0m ⋯\n", "─────┼──────────────────────────────────────────────────────────────────────────\n", " 1 │ 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.09 ⋯\n", " 2 │ 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671\n", " 3 │ 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671\n", "\u001b[36m 6 columns omitted\u001b[0m" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Load in boston housing market data\n", "boston = load_boston() |> DataFrame\n", "first(boston,3)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestRegressor(\n", " max_depth = -1, \n", " min_samples_leaf = 1, \n", " min_samples_split = 2, \n", " min_purity_increase = 0.0, \n", " n_subfeatures = -1, \n", " n_trees = 10, \n", " sampling_fraction = 0.7, \n", " feature_importance = :impurity, \n", " rng = Random._GLOBAL_RNG())" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Train Test Split with `partition`\n", "train, test = partition(boston, 0.8, rng=42);\n", "\n", "# As before, unpack to horizontally split into y and X\n", "y_train, X_train = unpack(train, ==(:MedV))\n", "y_test, X_test = unpack(test, ==(:MedV));\n", "\n", "# Load in our model and instantiate it\n", "Tree = @load RandomForestRegressor pkg=DecisionTree verbosity=0;\n", "tree = Tree()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we build our machine with the training data, and fit it:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Training machine(RandomForestRegressor(max_depth = -1, …), …).\n", "└ @ MLJBase /Users/nelsontang/.julia/packages/MLJBase/9Nkjh/src/machines.jl:492\n" ] }, { "data": { "text/plain": [ "trained Machine; caches model-specific representations of data\n", " model: RandomForestRegressor(max_depth = -1, …)\n", " args: \n", " 1:\tSource @128 ⏎ Table{Union{AbstractVector{Continuous}, AbstractVector{Count}}}\n", " 2:\tSource @267 ⏎ AbstractVector{Continuous}\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "mach = machine(tree, X_train, y_train) |> fit!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Evaluation\n", "\n", "Now we want to check our accuracy in terms of RMSE - instead of doing the old `predict` one time, let's use `evaluate` to see how this model performs with 5-fold cross validation:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 5 folds: 40%[==========> ] ETA: 0:00:01\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 5 folds: 100%[=========================] Time: 0:00:01\u001b[39m\u001b[K\n" ] }, { "data": { "text/plain": [ "PerformanceEvaluation object with these fields:\n", " measure, operation, measurement, per_fold,\n", " per_observation, fitted_params_per_fold,\n", " report_per_fold, train_test_rows\n", "Extract:\n", "┌────────────────────────┬───────────┬─────────────┬─────────┬──────────────────\n", "│\u001b[22m measure \u001b[0m│\u001b[22m operation \u001b[0m│\u001b[22m measurement \u001b[0m│\u001b[22m 1.96*SE \u001b[0m│\u001b[22m per_fold \u001b[0m ⋯\n", "├────────────────────────┼───────────┼─────────────┼─────────┼──────────────────\n", "│ RootMeanSquaredError() │ predict │ 3.48 │ 0.521 │ [4.22, 3.41, 3. ⋯\n", "└────────────────────────┴───────────┴─────────────┴─────────┴──────────────────\n", "\u001b[36m 1 column omitted\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "results = evaluate!(mach, resampling=CV(nfolds=5, rng=42), measure=rms, operation=predict)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(features = [:Crim, :Zn, :Indus, :Chas, :NOx, :Rm, :Age, :Dis, :Rad, :Tax, :PTRatio, :Black, :LStat],)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "report(mach)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is really cool - it lets you specify the evaluation strategy quite nicely. \n", "\n", "* See [measure list](https://alan-turing-institute.github.io/MLJ.jl/stable/performance_measures/#List-of-measures) for a list of possible performance measures" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Next, we check RMSE against the test set with `predict`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.940248270812838" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "y_pred = predict(mach, X_test)\n", "# Check RMSE accuracy:\n", "rms(y_pred, y_test)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Dealing with `scitype`s\n", "\n", "The above example works as a quickstart but as soon as we jump into linear regression, we run into an issue. 
Let's say I load in Linear Regression:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegressor(\n", " fit_intercept = true, \n", " solver = nothing)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0\n", "ols = LinearRegressor()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now, if we run `ols_mach = machine(ols, X_train, y_train) |> fit!` again, we'll run into an error:\n", "\n", "```julia\n", "ols_mach = machine(ols, X_train, y_train) |> fit!\n", "\n", ">┌ Warning: The number and/or types of data arguments do not match what the specified model\n", ">│ supports. Suppress this type check by specifying `scitype_check_level=0`.\n", "..\n", ">│ In general, data in `machine(model, data...)` is expected to satisfy\n", ">│ \n", ">│ scitype(data) <: MLJ.fit_data_scitype(model)\n", ">│ \n", ">│ In the present case:\n", ">│ \n", ">│ scitype(data) = Tuple{Table{Union{AbstractVector{Continuous}, AbstractVector{Count}}}, AbstractVector{Continuous}}\n", ">│ \n", ">│ fit_data_scitype(model) = Tuple{Table{<:AbstractVector{<:Continuous}}, AbstractVector{Continuous}}\n", "```\n", "\n", "The important part of this error is right after \"In the present case\" and it's the stuff in the `{}` brackets which specify the `scitype`.\n", "\n", "* `scitype(data) = ...` is telling us that we have `Continuous` and `Count` data\n", "* `fit_data_scitype(model) = ...` tells us that this model needs `Continuous` data only.\n", "\n", "In other words, the model needs `Continuous` data but there's `Count` data in there right now. We need to use `schema` to figure out where the problem is and `coerce` it into the right data type.\n", "\n", "### Check data `scitype` with `schema`\n", "\n", "You can inspect the data `scitype` with the `schema` command:\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "┌─────────┬────────────┬─────────┐\n", "│\u001b[22m names \u001b[0m│\u001b[22m scitypes \u001b[0m│\u001b[22m types \u001b[0m│\n", "├─────────┼────────────┼─────────┤\n", "│ Crim │ Continuous │ Float64 │\n", "│ Zn │ Continuous │ Float64 │\n", "│ Indus │ Continuous │ Float64 │\n", "│ Chas │ Count │ Int64 │\n", "│ NOx │ Continuous │ Float64 │\n", "│ Rm │ Continuous │ Float64 │\n", "│ Age │ Continuous │ Float64 │\n", "│ Dis │ Continuous │ Float64 │\n", "│ Rad │ Continuous │ Float64 │\n", "│ Tax │ Continuous │ Float64 │\n", "│ PTRatio │ Continuous │ Float64 │\n", "│ Black │ Continuous │ Float64 │\n", "│ LStat │ Continuous │ Float64 │\n", "└─────────┴────────────┴─────────┘\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "schema(X_train)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The problem here is `Chas`, which is dummy variable (0 or 1) indicating if the house is near the Charles River. According to our `schema` it's a `Count` (which is what `MLJ` does by default on integer data), and our model only takes `Continuous` so we'll need to `coerce` it to be `Continous`. 
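`coerce` returns a new table with `Chas` stored as `Float64`, which MLJ then reads as `Continuous`.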
\n", "\n", "* Reference: [MLJ Docs: scitypes and coercion](https://alan-turing-institute.github.io/MLJ.jl/dev/mlj_cheatsheet/#Scitypes-and-coercion)\n", "* Reference: [Data Science Tutorials in Julia: Linear Regression](https://juliaai.github.io/DataScienceTutorials.jl/isl/lab-3/)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "┌─────────┬────────────┬─────────┐\n", "│\u001b[22m names \u001b[0m│\u001b[22m scitypes \u001b[0m│\u001b[22m types \u001b[0m│\n", "├─────────┼────────────┼─────────┤\n", "│ Crim │ Continuous │ Float64 │\n", "│ Zn │ Continuous │ Float64 │\n", "│ Indus │ Continuous │ Float64 │\n", "│ Chas │ Continuous │ Float64 │\n", "│ NOx │ Continuous │ Float64 │\n", "│ Rm │ Continuous │ Float64 │\n", "│ Age │ Continuous │ Float64 │\n", "│ Dis │ Continuous │ Float64 │\n", "│ Rad │ Continuous │ Float64 │\n", "│ Tax │ Continuous │ Float64 │\n", "│ PTRatio │ Continuous │ Float64 │\n", "│ Black │ Continuous │ Float64 │\n", "│ LStat │ Continuous │ Float64 │\n", "└─────────┴────────────┴─────────┘\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "X_train = coerce(X_train, :Chas=>Continuous)\n", "X_test = coerce(X_test, :Chas=>Continuous)\n", "schema(X_train)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now we can run our linear regression model:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Training machine(LinearRegressor(fit_intercept = true, …), …).\n", "└ @ MLJBase /Users/nelsontang/.julia/packages/MLJBase/9Nkjh/src/machines.jl:492\n", "┌ Info: Solver: MLJLinearModels.Analytical\n", "│ iterative: Bool false\n", "│ max_inner: Int64 200\n", "└ @ MLJLinearModels /Users/nelsontang/.julia/packages/MLJLinearModels/Goxow/src/mlj/interface.jl:39\n" ] }, { "data": { "text/plain": [ "(coefs = [:Crim => -0.11324911498843833, :Zn => 0.04575441928605188, :Indus => 0.01945765891619942, :Chas => 2.0407623604133356, :NOx => -19.942391951773786, :Rm => 3.6700229309808057, :Age => -0.0004734356464829028, :Dis => -1.6863423449201385, :Rad => 0.32811707418244485, :Tax => -0.012908158787600524, :PTRatio => -1.0233533411398805, :Black => 0.008312853766953626, :LStat => -0.5495075409339456],\n", " intercept = 41.46800130672328,)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ols_mach = machine(ols, X_train, y_train) |> fit!\n", "# See fitted parameters\n", "fitted_params(ols_mach)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 5 folds: 40%[==========> ] ETA: 0:00:02\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 5 folds: 100%[=========================] Time: 0:00:01\u001b[39m\u001b[K\n" ] }, { "data": { "text/plain": [ "PerformanceEvaluation object with these fields:\n", " measure, operation, measurement, per_fold,\n", " per_observation, fitted_params_per_fold,\n", " report_per_fold, train_test_rows\n", "Extract:\n", "┌────────────────────────┬───────────┬─────────────┬─────────┬──────────────────\n", "│\u001b[22m measure \u001b[0m│\u001b[22m operation \u001b[0m│\u001b[22m measurement \u001b[0m│\u001b[22m 1.96*SE \u001b[0m│\u001b[22m per_fold \u001b[0m ⋯\n", "├────────────────────────┼───────────┼─────────────┼─────────┼──────────────────\n", "│ RootMeanSquaredError() │ predict │ 5.01 │ 0.53 │ 
[5.17, 5.58, 4. ⋯\n", "└────────────────────────┴───────────┴─────────────┴─────────┴──────────────────\n", "\u001b[36m 1 column omitted\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Check RMSE\n", "evaluate!(ols_mach, resampling=CV(nfolds=5, rng=42), measure=rms, operation=predict)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Check against the test set with `predict`\n", "\n", "Use `predict` to get predictions out:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OLS Test RMSE: 4.755466266615066" ] } ], "source": [ "y_pred = predict(ols_mach, X_test);\n", "# Calculate RMSE\n", "ols_test_error = rms(y_pred, y_test)\n", "print(\"OLS Test RMSE: $ols_test_error\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Standardizing Data\n", "\n", "Now, what if we also wanted to standardize the data so we can run a regularized regression like Lasso or ElasticNet? `MLJ` contains some data transformation helpers that must be loaded as well." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Training machine(Standardizer(features = Symbol[], …), …).\n", "└ @ MLJBase /Users/nelsontang/.julia/packages/MLJBase/9Nkjh/src/machines.jl:492\n" ] }, { "data": { "text/plain": [ "trained Machine; caches model-specific representations of data\n", " model: Standardizer(features = Symbol[], …)\n", " args: \n", " 1:\tSource @751 ⏎ Table{AbstractVector{Continuous}}\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Standardizer = @load Standardizer pkg=MLJModels verbosity=0\n", "# Just like the other models, we instantiate it and bind data to it with a machine\n", "standardizer = Standardizer()\n", "standardizer_machine = machine(standardizer, X_train) |> fit!" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Next use `transform` with this fitted machine to get data out" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/latex": [ "\\begin{tabular}{r|ccccccccc}\n", "\t& Crim & Zn & Indus & Chas & NOx & Rm & Age & Dis & \\\\\n", "\t\\hline\n", "\t& Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & Float64 & \\\\\n", "\t\\hline\n", "\t1 & -0.107552 & -0.480605 & 1.01981 & -0.292539 & -0.199694 & -0.73765 & -0.995154 & 0.143208 & $\\dots$ \\\\\n", "\t2 & -0.0264694 & -0.480605 & 1.01981 & -0.292539 & 0.216848 & 0.213194 & 0.236529 & -0.439549 & $\\dots$ \\\\\n", "\t3 & 7.5345 & -0.480605 & 1.01981 & -0.292539 & 1.07597 & -0.462962 & 1.12391 & -0.969986 & $\\dots$ \\\\\n", "\t4 & -0.236124 & -0.480605 & 1.57981 & -0.292539 & 0.598678 & -1.78428 & 1.12391 & -1.1461 & $\\dots$ \\\\\n", "\t5 & -0.39497 & -0.480605 & -0.0573407 & -0.292539 & -1.23237 & -0.457327 & -1.80444 & 0.719759 & $\\dots$ \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "\u001b[1m5×13 DataFrame\u001b[0m\n", "\u001b[1m Row \u001b[0m│\u001b[1m Crim \u001b[0m\u001b[1m Zn \u001b[0m\u001b[1m Indus \u001b[0m\u001b[1m Chas \u001b[0m\u001b[1m NOx \u001b[0m\u001b[1m Rm \u001b[0m\u001b[1m Age\u001b[0m ⋯\n", " │\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Flo\u001b[0m ⋯\n", "─────┼──────────────────────────────────────────────────────────────────────────\n", " 1 │ -0.107552 -0.480605 1.01981 -0.292539 -0.199694 -0.73765 -0. ⋯\n", " 2 │ -0.0264694 -0.480605 1.01981 -0.292539 0.216848 0.213194 0.\n", " 3 │ 7.5345 -0.480605 1.01981 -0.292539 1.07597 -0.462962 1.\n", " 4 │ -0.236124 -0.480605 1.57981 -0.292539 0.598678 -1.78428 1.\n", " 5 │ -0.39497 -0.480605 -0.0573407 -0.292539 -1.23237 -0.457327 -1. ⋯\n", "\u001b[36m 7 columns omitted\u001b[0m" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "X_train_standardized = MLJ.transform(standardizer_machine, X_train)\n", "X_test_standardized = MLJ.transform(standardizer_machine, X_test)\n", "first(X_train_standardized, 5)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Training machine(LassoRegressor(lambda = 1.0, …), …).\n", "└ @ MLJBase /Users/nelsontang/.julia/packages/MLJBase/9Nkjh/src/machines.jl:492\n", "┌ Info: Solver: MLJLinearModels.ProxGrad\n", "│ accel: Bool true\n", "│ max_iter: Int64 1000\n", "│ tol: Float64 0.0001\n", "│ max_inner: Int64 100\n", "│ beta: Float64 0.8\n", "└ @ MLJLinearModels /Users/nelsontang/.julia/packages/MLJLinearModels/Goxow/src/mlj/interface.jl:39\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 5 folds: 40%[==========> ] ETA: 0:00:00\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 5 folds: 100%[=========================] Time: 0:00:00\u001b[39m\u001b[K\n" ] }, { "data": { "text/plain": [ "PerformanceEvaluation object with these fields:\n", " measure, operation, measurement, per_fold,\n", " per_observation, fitted_params_per_fold,\n", " report_per_fold, train_test_rows\n", "Extract:\n", "┌────────────────────────┬───────────┬─────────────┬─────────┬──────────────────\n", "│\u001b[22m measure \u001b[0m│\u001b[22m operation \u001b[0m│\u001b[22m measurement \u001b[0m│\u001b[22m 1.96*SE \u001b[0m│\u001b[22m per_fold \u001b[0m ⋯\n", "├────────────────────────┼───────────┼─────────────┼─────────┼──────────────────\n", "│ RootMeanSquaredError() │ predict │ 5.58 │ 0.528 │ [6.06, 6.05, 5. 
⋯\n", "└────────────────────────┴───────────┴─────────────┴─────────┴──────────────────\n", "\u001b[36m 1 column omitted\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LassoRegressor = @load LassoRegressor pkg=MLJLinearModels verbosity=0\n", "\n", "lasso = LassoRegressor() #default λ=1\n", "lasso_mach = machine(lasso, X_train_standardized, y_train) |> fit!\n", "evaluate!(lasso_mach, resampling=CV(nfolds=5, rng=42), measure=rms, operation=predict)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As with simple linear regression we can inspect the coefficients with `fitted_params`:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(coefs = [:Crim => -0.0, :Zn => 0.0, :Indus => -0.0, :Chas => 0.0, :NOx => -0.0, :Rm => 2.4064540627537454, :Age => -0.0, :Dis => -0.0, :Rad => -0.0, :Tax => -0.0, :PTRatio => -1.6996069999970236, :Black => 0.23833314394555966, :LStat => -3.582731114009766],\n", " intercept = 22.287274937530785,)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fitted_params(lasso_mach)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Looks like with the defaults it was able to do some basic feature engineering, but can we find a more optimal set of parameters?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# XGBoostRegressor = @load XGBoostRegressor pkg=XGBoost verbosity=0\n", "# xgb = XGBoostRegressor()\n", "# xgb_mach = machine(xgb, X_train_standardized, y_train) |> fit!" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Tuning\n", "\n", "Next, what if we want to try a *bunch of models* and use a grid search to find hyperparameters, similar to `GridSearchCV` in `scikit-learn`?\n", "\n", "In `MLJ`, we create a *Tuning Model* and specify *ranges* over which to search. \n", "\n", "With our `LassoRegressor` model above, we can take a look at the default parameters:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LassoRegressor(\n", " lambda = 1.0, \n", " fit_intercept = true, \n", " penalize_intercept = false, \n", " scale_penalty_with_samples = true, \n", " solver = nothing)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lasso" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The default regularization strength lambda ($\\lambda$) is set to 1.0 - I want to try a range of different values of $\\lambda$ and see how it affects cross-validation RMSE." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Training machine(DeterministicTunedModel(model = LassoRegressor(lambda = 1.0, …), …), …).\n", "└ @ MLJBase /Users/nelsontang/.julia/packages/MLJBase/9Nkjh/src/machines.jl:492\n", "┌ Info: Attempting to evaluate 10 models.\n", "└ @ MLJTuning /Users/nelsontang/.julia/packages/MLJTuning/ZFg3R/src/tuned_models.jl:727\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 0%[> ] ETA: N/A\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 10%[==> ] ETA: 0:00:02\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 20%[=====> ] ETA: 0:00:02\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 30%[=======> ] ETA: 0:00:01\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 40%[==========> ] ETA: 0:00:01\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 50%[============> ] ETA: 0:00:01\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 60%[===============> ] ETA: 0:00:00\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 70%[=================> ] ETA: 0:00:00\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 80%[====================> ] ETA: 0:00:00\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 90%[======================> ] ETA: 0:00:00\u001b[39m\u001b[K" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r\u001b[33mEvaluating over 10 metamodels: 100%[=========================] Time: 0:00:00\u001b[39m\u001b[K\n" ] }, { "data": { "text/plain": [ "trained Machine; does not cache data\n", " model: DeterministicTunedModel(model = LassoRegressor(lambda = 1.0, …), …)\n", " args: \n", " 1:\tSource @198 ⏎ Table{AbstractVector{Continuous}}\n", " 2:\tSource @442 ⏎ AbstractVector{Continuous}\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "parameter_range = range(lasso, :lambda, lower=0.001, upper=10.0, scale=:log)\n", "tuned_lasso = TunedModel(lasso, \n", " resampling=CV(nfolds=5, rng=42), \n", " tuning=Grid(resolution=10), # Search over 10 values between 'lower' and 'upper' in `parameter_range`\n", " range=parameter_range, \n", " measure=rms)\n", "# As before we then need to take this model and bind it to data as a machine\n", "tuned_lasso_machine = machine(tuned_lasso, X_train_standardized, y_train) |> fit!" 
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(best_model = LassoRegressor(lambda = 0.02154434690031884, …),\n", " best_fitted_params = (coefs = [:Crim => -1.00621269776367, :Zn => 0.980277412099956, :Indus => 0.0, :Chas => 0.548151440210402, :NOx => -2.1972017021636727, :Rm => 2.6143569026062288, :Age => -0.0, :Dis => -3.3836879066403185, :Rad => 2.619535447511152, :Tax => -1.9458601239151805, :PTRatio => -2.158404705803287, :Black => 0.7705983695536748, :LStat => -3.923513035291781],\n", " intercept = 22.41855852728268,),)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fitted_params(tuned_lasso_machine)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Since a *model* in `MLJ` is just a container of hyperparameters, we can get the best model from the above tuple:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LassoRegressor(\n", " lambda = 0.02154434690031884, \n", " fit_intercept = true, \n", " penalize_intercept = false, \n", " scale_penalty_with_samples = true, \n", " solver = nothing)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "best_lasso_model = fitted_params(tuned_lasso_machine).best_model" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "And if you use the Julia `Plots` library, there's some convenient stuff you can do with plotting a machine:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Warning: scale log is unsupported with Plots.GRBackend().\n", "│ Choose from: [:identity, :ln, :log10, :log2]\n", "└ @ Plots /Users/nelsontang/.julia/packages/Plots/UTR4H/src/args.jl:1580\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAARMAAAD6CAIAAAAN9zQRAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3deVwTVx4A8JcAkSAQCAECERERUU4VVEAsIIh31RbQiufqStv1PlBLV61HBbUere0qXkUtAoIHVfGglXKIFkVFTrkE5YYAgZCLJPvH7KYpIiZDJIn8vp/+wbyZefkx8utM3ryDIBaLEQBATkRlBwCAWoLMAQAPyBwA8IDMAQAPyBwA8IDMAQAPyBwA8IDMAQAPyBwA8MCTOenp6dbW1q9evVJ4NACoC00c54hEorKyMkNDQ0UF4e7ubm5urqGhoagK3yuRSEQkwr1aPmp00dra2mg02vnz53s+DE/mjB071sjI6Lfffps9ezau2LoqKSlZtmyZAlPxveJwOGQyWdlRqBk1umhZWVlPnjx552F4ModEIu3bt++LL76orKz86KOP9PT0pHcNGjQIR4UzZ840NzfHEUzfa2trk/6VgSxU8KKxWKwtB/5T2MhBCI001onY/AUWoZaWVnFx8TtPx5M5jY2NK1euRAitWbOmyy4nJ6dnz57hqBOAvlRXVzfpy935ntuQpQlCKKWtPm3ZV7/99G8TExMZa8CTOfr6+idOnOh2F5VKxVEhAH1s0/7j+T67kPb/b4N6JrleOzft//Hcwe0y1oAnc8hkMnbPAUBNlbJEf6UNhqxf0tIpew14MkdCIBBUVFRUV1cbGxvb2tqqS+MJAN0jEGQ/FuffOo/H27hxo4GBgY2NjZeXl52dnYWFxc8//4yvNgD6mJUeAfHa/1bEbRuqL0c64LznfP755+fOnZs7d+6UKVOoVOqrV6+io6OXLVumqam5cOFCfHUC0GcOhn7+ZNWOAs+vkK4RQgi1N41M2/vdT1/LXgMBxzwEVVVVFhYW33///apVqySFYrE4MDAwLy+voKBA3goZDEZWVha0Sn/AVPCitbS0hB44XtCAtUpr79/8hYGBAULo6tWrUVFRV65c6fl0PPecvLw8TU3NFStWSBcSCISQkJApU6bw+XwSiYSjWgD6koGBQeTerbhPx/M9R1tbu7Ozk8VidSlnMplaWlqamr1qdQBALeDJHFdXV319/X/+858tLS2SwtLS0rCwMB8fH2hhA/0BnvuDjo7ODz/8sGzZMktLy/Hjx2MtBA8fPqRQKIcOHVJ4iACoIJz3h0WLFmVkZMycObOysjI1NZXNZq9ZsyYnJ8fOzu5tp+Tk5AQGBrq4uKxYsaKhoQFvwACoBDyZU1JSsnXr1iFDhvzyyy+FhYXV1dVPnz49dOgQg8F42yk1NTXe3t6+vr5xcXFkMvmzzz7rRcwAKB+ezCkvL4+IiNDX15f9lJs3bw4fPvzzzz+3trb+7rvvMjIycnNzcXw0ACoCT+Y4ODhoaGiUlJTIfgqHw9HR0cF+1tLS0tLSgi7VQK3haSEwMzPbtm1bSEhIdHT0kCFDZDnF29s7NDT08ePHLi4uZ8+ebWtrq6urk+xlMplubm6SMaETJkw4fvw4jsD6BpvNJsjTwQkgtbpoXC5Xlu4BeDKHyWQmJSXl5eXZ2NhYWFhIjyywsbG5ePHim6c4ODj89NNPc+fOFQqFnp6eo0aNkh4IYWBgEBsba2pqim1SKBRdXV0cgfUNsVisyuGpJjW6aNra2rIkOZ7M0dDQGDp06NChQ9/c1cOA0KVLly5duhQhxGazzc3N7e3tJbuIRKKlpaW69L4BAOHLnAEDBoSHh5uZmck1svz169eDBg0SCASbNm0aNWrU6NGjcXw0ACoCTwtBSkqKtbV1e3v7uw+VMnnyZAaDYWBg8PLly0uXLuH4XABUB557juxjtaUVFBTU1taSyWQKhYLjdABUCp57zujRo0eNGnXq1Cl5T6TT6ZA24MOA557D5XI//vjjb7755v79+15eXtKvRKlUakBAgOLCA0BF4ckcFou1a9cuhND169evX78uvcvJyQkyB/QHeDLH2NiYyWR2u0tdZrgFoJfwZA6RSFSXmWwBeE/kaCEQCoWpqalvu9sghB49enT16lVFRAWAqpMjc9rb2728vDIzM7HNZ8+eEYlE6fk64uPjt2+XdYpEANQa/pHP4v9TYDQAqAuYMwAAPCBzAMADMgcAPORulU5PT2ez2QihiooKhNDt27cl46ILCwsVGxwAKkvuzAkPD5fe3LBhg/Smo6NjbyMCQB3IkTkDBw68c+dOz8eo2tzBALwncmSOpqbm5MmT318oAKgRaCEAAA857jlRUVE3btzo+RhLS8sDBw70LiQA1IAcmdPU1FRWVibZzMvL43K5JiYmhoaGNTU1LBbL2NjY19f3PQQJgMqR42ltw4YNj/5vypQpDg4OT58+raurKywsbGlp+fXXX7W0tIKDg99frACoDjzfc169ehUeHh4fH+/s7IyVEAiEmTNn7tmzZ+PGjQoNDwAVhSdz8vPzSSTSm1Or2djYFBcX8/l8RQQGgErDkzl0Op3L5SYmJnYpj42NNTIygqUOQX+AZ0yos7Ozn59fcHDw559/PmnSJCMjo6qqqpiYmISEhG+//VbhIQKggnCu6Xn58uW1a9ceO3bs8OHDWImhoeH+/fs3bdqkuNgAUF04M0dPT+/MmTNHjhwpLi5uaGgYPHjwsGHD4DkN9B+9WkdaX1/fxcVFUaEAoEbw975JTEycM2fOiBEjgoKCsJLNmzffvXtXQYEBoNJwZs6BAwdmz57d0NBgbGwsmZq9vb396NGjiosNANWFJ3OYTOa///3vvXv3ZmRkSG44CKFJkyY9fPhQcbEBoLrwZM6TJ0+EQmGXMW0IIQaD0dTUBG9CQX+As4Wg28miampqsNVzuz2ltbU1KiqqoqLCwsJiyZIlMEsoUGt47jmjRo0iEokxMTEIIcmKimKx+Pjx425ubt2uscjj8dzd3e/fv+/k5PTnn3+OHz+ew+H0Jm4AlAvPPcfIyGj16tUhISH5+fmtra0tLS3R0dGRkZFpaWm3b9/u9pS8vLyXL18+f/5cQ0Nj0aJFVCr12bNnbm5uvQseAKXB+bQWERFBIpGOHj2K3ToyMzNNTEyio6P9/Py6PX7w4MGampoFBQUODg5FRUUikcjKygp/1AAoG86Vp6qrq7dv375t27bs7OzGxkYGgzFmzJgBAwa87RQajRYXF+ft7U2j0err66OjoyVruCOE2traNm3apKOjg22OGDFi1apVOALrG1wu923f5cDbqNFFEwgEssz5jCdzUlJSpk2bVldXZ2Ji4u3tLcsptbW1y5cv37t3r4+PT1pa2sqVK7OyshgMBrZXS0vL2dlZshDi0KFDVfkq99AKAt5GjS6ahoZGt9/Vu8C/wq4stUskJiZaWFiEhIQghIYPH37+/PkrV65Ibiza2tqLFi0yNzfHEUzf09DQgAW25KVGF41IlKnZDOcKu87OzmfOnJH9FGwkAo/HQwgJBIJXr17RaDQcHw2AisC/wu6OHTuwFXZ1dXUlu962wu6sWbOOHj06btw4T0/PzMxMOp0+d+5c/FEDoGw4V9jdvXs3QigxMbHLyN
C3rbBLIpH++OOPzMzMysrKBQsWuLu7y3hPBEA14ckcGo1WWlra7a4ehugQCAQPDw8PDw8cnwiAqsGTORoaGkOHDlV4KACoEXhkAgAPnH0IuFxuRETEnTt3ysvLuVyupNze3j4tLU1BsQGgunDecxYvXvztt99aW1uLRKKJEydOnz4dITRw4MBPP/1UoeEBoKLwZE5tbe2lS5eioqLOnTvHYDAWLFhw4cKF4uJiKpUqEAgUHiIAKghP5rx48UJDQwO7vRCJROxpzcjIaOfOnUeOHFFwgACoJJxPa0QiEXshY2pqWl1djRXS6fTq6mq47YD+AE/mDBs2rLOzs7y8HCE0atSo6Ojo1tZWoVB46tQpMzMzdenYB0Bv4Mkcc3NzDw+Pa9euIYS+/PLLqqoqY2NjIyOjM2fOhIaGKjpCAFQRzlbp9PR07Adzc/Ps7OyEhISGhgZfX19YSBT0E72a4xMzZMgQWDYH9DfQhwAAPPDccxoaGmxtbbvdBX0IQD+BJ3PIZPLKlSulS1gs1q1btwQCAfQhAP0EnszR1dUNDw/vUsjlcv39/Ts7OxURFQCqTmHfc7S1tdeuXXvgwAFFVQiAKlNkCwGJRGpqaoI+BKA/UECrNEJILBbn5eXt2rVr+PDh0IcA9AeKaVtjs9l8Pl9PTy86OlpBgQGg0hTTtjZw4EArK6tp06YZGRkpKDAAVJrC2tYA6FegDwEAeOCcb23z5s3vPGzbtm1DhgzBUT8Aqg9P5vD5/OTk5FevXgkEAgqFYmhoWFtby+Vy9fT0jI2NJYd9+eWXiosTANWC52mNRqMtXrzY3t4+KyurpaWlvLyczWYnJCTo6ur+9NNPpf/n7Oys8HABUBF47jlVVVW7d+8uLi6WrB5FJBI/+eST5ubmdevWFRQUKDRCAFQRnntOXl6elpaWpaVll3JbW9uioiJYmxr0B3gyx8TEhMvl3rhxo0v55cuXDQ0Ne5haGoAPBp6nNWdnZ29v788++2z16tWTJk0yNDSsrq6OjY2Njo7+5ptvFB4iAG8Si8UVFRV8Pt/a2lopa1rhyRwCgYCtuHbgwAHJK1F9ff3du3eHhYV1e0pnZ+ezZ8+kSxgMBp1Ox/HpAMTfuLv3l9tVBnYCjQFmzLPLvUZuXLm4j2PA2ePTwMDgwoUL33//fVFRUX19/eDBg21tbSVL5L6pvb0dW+oQ8+TJk8jIyOXLl+P7dNCf/Zn9dHVCbu2kfdhmCwraVXRL90JcyMKgvgyjV32lqVSqu7u7LEcaGBg8evQI+zk7O9vT0xNGjwJ8Is4m1Lr97dGGZTv1/L0dfZw58rUQ1NXVVVZWSpc8e/YsKCjI2dl56tSpSUlJslRy+vTpgIAAAwMDuT4aAEyzgIiIXb/YtHT29dgW+e45vr6+Dg4OMTEx2GZJScmECRM4HI6FhUVhYWFycvKtW7f8/Px6qIHL5V68ePHy5cvShXw+PzEx0dDQENscNGiQm5ubXIH1JaFQKBQKlR2FmlHgRSMTuqlHR0Nh9YtEIlkOkyNzGhsb8/Pz9+3bJynZu3cvh8NJSkry9/evra318vIKDw/vOXMSEhIMDAy8vLykC7lcbmJiIplMxjbHjBkzevRo2QPrY3w+H1tkG8hOgRdtrqfzH0/vsm3/mhNTo7HMzUJfUfULBAKxWPzOw+TInIqKCrFY7OTkhG2KxeKbN2/6+Pj4+/sjhOh0+qpVq/bs2dNzJWfOnFm+fDmBQJAu1NfXP3XqlLm5uezBKJFQKOyhLQR0S4EXbUVwYN7LI3EPS6uHTUOaJMPylEmEou8O7VTUYOQBAwZ0+fvslhyZ09HRgRDS09PDNouLi+vr6318fCQHDBs2rKGhgc/nv+1laHl5eWpq6s8//yz7hwLwpsNh69aUl1++8webw/34nx6jnPu6SRrJlTkWFhYIoZycHG9vb4TQ77//jhCaOHGi5ID6+nptbe0e+hCcOXPG398fqweA3rCystoYYqXEAOTInMGDB1tZWa1bt+748eMIoYMHD5qYmEiv0p6TkyPpA9otGxubOXPm4I4VANUhR+YQicTIyMhZs2Zh73A0NTXPnTunqfm/Gng83oULF+bNm9dDDYsXK+GuCsD7IF+rtJ+fX35+fmJiolAo9PPzk7QWIIRqa2s3bNgwc+ZMRUcIgCqSuw+BlZXV2rVr3yy3tLTcsmWLIkICQA3ADB4A4IEzc4qLiwMDA01MTAh/ByOoQT+Bp8cnl8v18fHhcDgLFy7s8vpSegYPAD5geDInNze3qqrqwYMH48ePV3hAAKgFPE9rLBaLQCCMGTNG4dEAoC7wZM64ceMoFMrDhw8VHg0A6gJP5ujo6Pz4448rVqy4du1aXV1dsxQWi6XwEAFQQThXAQkODkYIvdmVxsnJqct8AwB8kBS8lgG0rYF+Ak/mDBw4ELoLgH4O+hAAgAf+uW+eP3+enJz88uVLLpcrKWQwGNu3b1dEYACoNJyZc/z48X/96186OjrYQiCtra08Ho9CoUyYMEGx8QGgmvA8rbW3t2/atGn58uUtLS329vbff/89h8OJjY3V0dHZtm2bwkMEQAXhuee8ePGCzWaHh4dj8/l2dnYSCISgoKCampp169ZJZiQE4AOG557T1NREJpOpVCpCyMDAoLm5GSt3c3N78uRJZ2enIgMEQCXhyRwGg8HhcJqamhBCVlZW9+7dw8qfPn1KJpOVMrE8AH0Mz9PaiBEj6HT6zZs3Fy1atGzZMk9Pz5kzZ5qYmFy8eHHu3LmyzFUFgLrDc88hEok3b950cXFBCE2YMOHkyZONjY2pqamLFi368ccfFR0hAKoIZ6u09Oy1K1asWLFihYLiAUA99GoVED6fX1ZWpqmpOWzYMEUFBIBawNn7pq2tbdmyZQMHDhw5cuSaNWuwwilTphw+fFhxsQGgunBmTnBwcGJi4r59+7744gtJoZeXV3R0tIICA0Cl4cmc4uLiX3/9NS4ubtOmTSNHjpSUOzs7FxQUKC42AFQXnswpLS0dMGCA9CoGGAqFwmazYW0Z0B/gyRwKhcLj8bA3odJyc3OpVOqAAQMUERgAKg1P5ri4uJiYmGzdulUoFEree1ZWVu7bt2/GjBkKDQ8AFYWnVZpEIv3nP/8JCgq6f/++oaFhXV3dvHnzbt68qauru3fv3p7P5fF4sOYZ+ADgbFv75JNP0tLSrK2t8/LyysrKUlJSAgIC/vzzzx5WlWpqavrkk08oFIqJiQlMcQjUHf43oe7u7tevX5f9+MDAQCsrKyaTqaOjU1RUhPtzAVAFvepDILvHjx9nZ2cnJiZiz2m2trZ987kAvCdyZA6Pxzty5EjPxxgbG//jH/94szw3N3f48OGrV69OSkqiUCh79uwJDAyU7BWJRBUVFZL5DGg0mr6+vuyBAdD35MgcLpe7devWno9xcnLqNnPq6+uzsrKWLl169uzZlJSU6dOnOzs7Dx8+HNvb0tIyb948ycAed3f3EydOyB5YH2tvb1d2COpHjS4ah8MRiUTvPEyOzMFWyKHRaAsWLFiyZMmQIUPePOZtw9qMjY319fWxr
jre3t4uLi737t2TZA6VSn3w4EGXBUVUmWRpeyA7dbloZDKZSHx3y5kcbWv6+vqVlZUbN268fv36mDFjJk+efOnSJS0tLUMpb3vKcnR0FAqFklTm8/laWlqyfzQAqka+VulBgwZt2bKlpKTk0aNHLi4uGzduNDExCQoKSk5OFovFPZzo4uLi6Oi4ffv22trac+fOFRUVTZ06tXeRA6BMON/nuLi4nDhxorq6+tixYzU1NZMnT96xY0fPp8THx5eWlnp7e8fGxt69e1eNns0AeFOvWqXb2toaGhqwDmwDBw7s+WAGgxETE9ObjwNAdeDJHB6Pd+fOnfPnz1+5csXY2DgwMDAmJsbJyUnhwQGgsuTLnMePH587dy46OrqtrW3y5MnR0dFz587V1Oyj16kAqA45/uhbW1tdXV319fUDAwPnz59vaGiIEOqyzhSZTLazs1NwjACoHrlvFywW6/Tp06dPn+52L6zZBt4pv6Dowq93mlvb/D1c5kz3V9MJ+uTIHDKZ/M5X+9iUuQC8zfo9h2NeadfaBSGLgT+nZo69uC7x2DcGBgbKjktucmQOiURauXLl+wsFfPBu3L13ptGCNXY2tsm18U5jjFq5/bu473crNzAc8LzPYbFYISEhb843kJqaumvXLkVEBT5MUdfvsew//luRjkFus5Ki6R08mcPhcCIjIwUCQZfy/Pz8hIQERUQFPkxcIQG98a2GI9aQpYelqlHkOqH19fVYgxsA3RpkSEac1i6FRhp8WXpYqhr52taioqLu37/P4XAQQmvWrJHutclkMm/duiU9cSEAXXz95ZLb6w6U+e1ChP+lim5x8meeI5QbFT7yZU5+fn5ycrJQKEQI3bt3T/p/FUZGRsuWLfvqq68UHCD4gJibmcV/vXjdd2FlyEQwQN+kvfwfPg7rVixWdlx4yJc5ERERERERDQ0Ntra2ubm57+yrBkAXox3t//j5YHt7e3t7O51OV3Y4+OHpOGNsbMxkMhUeCug/dHV1dXV1lR1Fr+DvclZXV3fhwoX8/PyqqipTU1Nvb+8FCxbABJ+gn8CZOenp6R9//HFzczONRqNSqenp6efOnTt8+HBycrKJiYliQwRABeFpDezs7AwODh40aFBWVlZDQ0NRURGLxYqPj6+oqNi0aZPCQwSqo7Oz8+nTp7fu/lZbW6vsWJQMzz0nOzu7srKysLBQMm0akUj89NNP6+vrQ0NDFRoeUCFJv6d9depasbE7R4dGj7nopd9yNvyrfvt8jidzmEymlpbWmyscjhw5sr29ncfj9dur+QGrqq7+8tSdl5P2YZvV1h6xLVWaYeHnDr5jFP2HCs/TmpWVlUAguHXrVpfy69evMxgMSJsP0pGzsS/H/Uu6RGTAeFAn7OzsVFZIyoXnnmNra+vj47NgwYItW7b4+/tTqdTXr19HR0dHRkZCj88P1avGFmRq1KWwjWzKZDL7Z5sQzra1mJiY4ODgsLCwsLCw/1WkqblmzZpt27YpLjagQuiGeqijBen8bSDNQE5Dv+2piDNzTExM7t69m5ubm5WVxWazaTTaRx99BBNBfcDWLw26HHbildeWv4ra6l2own4742SvJt9wcHBwcHBQVChAlVkOHnwwaNyu2B1FjEmdA42p1VmehJIzB/+t7LiUBn/mVFVVRUVF5efn19XVGRgY+Pn5LVq0CBZj+4AFzfKf5TcxI/Phq7piz8+8bGxWKDsiZcKZOSkpKbNnz25razM3N6fRaE+ePImPjz969Ojvv/+u1t34QM/IZLLfJG9lR6ES8LRKCwSChQsXDhs27NmzZ69fv3769GlDQ8P169dra2s3bNig8BCBolRUVDx8+LC5WT2HL6sYPPecx48fV1VVpaSkSF6GEgiEGTNmRERErF+/XiwWq+k8QB+wnLzCL8JPFA+0b9UxNW1JnaDfdvrbrfBo3Rt4MqelpUVLS+vN9XNsbGzYbDafz4eXoSqFzWbP3/mfAv/92EjMV2hqLKuOv3lPwo/fKjs0NYbnac3a2logENy4caNL+dWrVy0sLCBtVM25+MRCx39IBjAjhMT6po86KK2tXacEALLDc8+xsbGZPHnywoUL169fP2XKFBqN9vr16wsXLkRFRe3bt0/hIYJeynlRLrb4uEths/7QyspKR0dHpYT0AcDZtnbx4sUlS5bs3r179+7/zTFHIpE2b968efPmt52yc+fO/Px87GcGg3H48OFuD+vs7Iy8EJeRU2Soq7Ps0+kuznj+afl8/qno+Kz8UmMDvZD5s62HWuGoRCQS3f7tXsbTAruhFrOn+uEbOi4SidIzMp8UlTrbWE309HjbapA9Ky4t++5MbEMb14Km99Xni+Xt7WLFMEWtNcjIUrpQt73a1NQLRzAAgzNzjIyMrl+//uLFi0ePHrW3t5uamnp4eBgbG/dwyh9//OHq6jpu3DiE0NsWRXxdVTV7/b4c+2WddoFIwLl04tI80zvf79goV2wvikuDwo4+t18msglAnJaLe3/5fAwl7F/drPvbg+LSsuCvjz4dPEtAn0t4Vj3y8t69i/3mTJkkVyX5hS+W7vopx3wyz2g8qbDM8cTmE1v/4eIk37vjQ6cuHMqsr3JdhbS0Eac1ccMPB4MnfjLNT/YalgV+fHzt9+WTtv9VJOCMEFb0z/5mCiNWnKSkpBUrVrxtr7e3d0JCQre7zM3Nq6qqxGLx1H+Gop9Y6CRf8h9lTczt31PlCmPiovXoRId0JSbLf8grKJSrErfP1qDjbOlKrD/bxmQyxWIxi8WSpQahUDg6cBU6wfmrkkieY+BqPp8vexj19fWWi/dIh4FO8u3nb5KrErFYnHAz2T54G+GrDHTgJXndFc/gtVXV1XLV0EsyXjRVcOXKlTlz5rzzMLlbCLhc7oMHDxISEgoKCiSFv//+u7u7+7Rp0yTPY906dOjQjBkzQkND6+rq3twrEomKOwYgLW3pwlb72VG//iZ7eM3NzWWaDET827203vmzyNhE2SspLy8voTghjb/1yCob+VnMta4DK3qQ9ehxkfkkRJR6PCMQiixnpKRlyF7JtTv3KobO6FJYSp/w9OlT2StBCH0yzffhibB4t9Zvte/eCTBNu3DE3MxMrhpAF/I9rZWWlvr6+lZUVCCECARCaGhoWFjYF1988csvv5iZmR0+fDgkJORt586fP9/U1FRLS+v8+fPjxo3LycmhUCjYLiaT6ebmRiQSq4bP6noaUaOdK2hvb5cxwpqaGi7pjUdBbb16ZqvslVRUVLDIXZ9kxBR64cskbN6SHn5NiaKylx16g7sU8imD8oqz3MfJGkltIxNpd10Mna+lV1ffIPuvI+HvM9EfIYQQjnN7QyQS/fDDD2vXru3LD8WNy+WKe1wuGiNf5mzcuLGuru6rr74aPnz47du39+/fn5aWlpOTExERsXr1ajKZ3MO5kr+2adOm2dnZXb9+PTg4GCsxMDCIjY01NTWdue2Hgi6ntdQ4DKHLPsOQjY2NUXtM098LiTX5E0aPlL0SOzs72vno6r8XatYXebjaMZnMkydPbtz47q9eo+xH6Kc/Z1mO
kS7UYZaMnTRC9kimTBh34FwWy+hvGWjWkD1+3FI1mnWpra0tPDxcMiBFxWlra8vyKl++p7X79++Hhobu3bt3yZIl0dHRHh4emZmZt2/fDg0N7Tlt/vaRRCKdTpd+mUAkEi0tLYcOHbrcd7RekdQTUSdv5MPvQkPkmANSS0truq0B6dXjv4oEHMfnp5bP/1T2SkxMTFy1alBb/V9FQoF9Uczc6f6yV+Lk6OjAzEQCzl9FnTz7qrvubuNkr2Ss6xj39j8JzNeSEq3q5/4m3J4bY0AfkOOew+FwGhsbx48fLylxc3NraGjw8PB457nt7e0vX77EhiQkJSU9fPjwh25J9lMAAAoYSURBVB9+ePOwjSsX616Ii7yzvUGDShJ2DCdzfjq4UfJQJ6NDX683PBp5KeVak7YZmd9ip8M5fvQrbW3td58pJfq77f/8en8Gc0AjdSSl/fXIzpcnD2wmkUhyVZJw+OulYXueEK0aDWyMWGWOvKKfv9sm7+zjV3/6dkvEsZT0ttZOIlWDP3uUxddhMAWxCpC9zaGlpQUhlJycLCkJCwtzcXGR5dza2lo6nW5kZGRqakqn06OioqT3StrWJNrb2zs7O2WPrVsNDQ0CgaA3NbBYrCdPnjQ0NEhK8vPzbW1t5ark1atXKX/8UVFR0ZtIxGJx7y+IsrBYLF1dXWVHISsZ29bkfp+TmpoqmRq3oKCgubn50qVLkr0UCsXfv5tHGlNT05qamsbGRpFI9OZrhJaWFg8PnG8J+5hAIKirq7O2tlZ2IOpELBZzOBx1uWgdHR00Gu2dhxHEMjQjYFpbW9+5niO+FXZLSkrUaAUVFov1tje54G3U6KKJRCIymcxgMHo+TI57jo6OTlxcXM/HyPudBPPm1G0AqDg57jmgZ3w+PzU1tbS01NbW1tvbW9nhqDQ2m33jxg0SiTRjxgw1nQNEbZ6RVN/p06evXbtGIBD27du3detWZYejusRi8dSpU0tKSu7fv79w4UJlh4MT3HMU7/Xr15MmTXrx4oWyA1FRKSkphw8fvnbtGkLIxcUlLi5OXRoPpME9R/Fu377t5uam7ChUV35+vmRckIODg3QHSDXSq/nW+qcnT5506Uji4OCwf/9+7GfsJe+dO3eUEZp64HK5knfKAwYM6OjoUG48+EDmdMXj8To6OrpM+srj8fLy8oyMjCwtLe3t7c+ePSu9V/J3kJ2dHRIScvXq1f489IXNZotEIj29v3VUbW5uLisrs7S0pNFo5ubmKSkpWHl1dfU7239V1Pt9H6tWnj175uzsrKmpSSKRpMtzcnIYDIa7u7upqem6det6ON3W1jYrK4vJZDY3N7//eFVOXFzcsGHDCATCxIkTpcvj4+OpVOrEiROpVGpUVBSTybS2tq6trS0qKrKxsellPw9lgcz5S3V1dUpKSmpqapfMmTZt2o4dO8RicW1tLY1Ge/ToUbennz171u//Zs+e3QcBq5rc3Nw///zz2LFj0pnD4/HodPqNGzfEYnFGRgaFQmlra0tOTvb3958xY0Z2drby4u0VyJyunj59Kp05LS0tRCKxsrIS21y6dGloaKiSQlMPkZGR0plz9+5dc3NzkUiEbdrZ2cXHxyspNEWCtrV3eP36tYaGxqBBg7BNKyuryspK5YakXiorK62srCQjXqysrLCRkeoOMucdOjo6SCSS5B9eW1ubzWYrNyT1gl1AySaZTFbTxrQuIHPewdTUlM1m83g8bLOpqQmmnJcLnU6XdK5HCDU2Nn4YFxAy5x0YDIa5uXlGxv+m3cjIyBg7dqxyQ1IvY8aMKSwsxKaB53K5jx49cnV1VXZQCqCxc+dOZcegKjo6Os6ePfvw4cO0tDQ6nV5UVOTo6EgkEgUCwf79+62srH755Ze7d+9GRkbKOzi0n6ioqIiLi0tPTy8oKNDW1sZanw0NDR8/fnzlyhUajbZz504KhbJly5Z316XyIHP+wuVyExMThUKhh4dHc3Mzl8udOHEiQsjDw4NAIMTExCCETp48aQbzLb1FTU1NSkqKnp6enZ1dc3Ozrq6uk5MTQmjmzJklJSWJiYnDhg07duyYvCPbVRP0+AQAD/ieAwAekDkA4AGZAwAekDkA4AGZAwAekDlKUF1dPW7cuN9+k2ONBtktWrQoNDRU9uOPHTs2aZJ86wIBBCPblILH42VlZTU1Nb37UPkVFhZyOJx3H/d/VVVV8q4pAhDccwDAB+45yicSiR49evTw4cPGxkYLC4uPPvpo+PDhkr2ZmZlsNtvHx+fKlSvPnz+3tLScP3++jo4Oi8W6dOlSVVWVq6vr9OnTu9TJZrMvXbpUWlo6bNiwoKAg6ZUmhELh5cuXsYGugYGBXU5sbm7+7bffiouLCQSCra3ttGnTPoxX/oqn7AFC/VFZWRlCKDY2FtvctWuXtra2s7Ozj48Pg8HQ0NA4efKk5OCAgABnZ+dp06ZZWFi4urpqamqOHz++rKxs6NChdnZ29vb2CKEtW7ZIjnd1dfX19bWzsxs5cqSHhweJRHJ0dGxqasL2crlcX19fIpE4duzY0aNHm5ubBwUFGRoaSk7X19c3MzPz9PR0c3PT0tKytbWtq6vrk6uiZiBzlKBL5uTm5kr+svl8/ooVK3R0dFpaWrCSgIAAhND69euFQqFYLL548SJCiE6nX7p0CTtg1apVJBJJ8veN9UT++uuvsc3MzExtbe2QkBBsMyIiAiEkWbD1zJkzCCHpzElNTZWM3ywsLDQyMlqzZs37uQzqDTJHCbpkThfY/GNpaWnYZkBAgK6uLofDwTaFQiGJRPL29pYc/+DBA4TQvXv3sE1XV1djY2NsyT7M0qVLKRQK9vPo0aO9vLykP2706NHSmdPF8uXL3dzc5Pz9+gVoIVC++vr6tWvXDh48WEtLi0AgjBw5EiEkPWbbyspK8mWDSCRSqdQRI0ZI9mJLVtTX/7XCnIODw4ABAySbLi4ura2t2AGlpaVdhse4uLhIbyYlJXl5eVGpVAKBQCAQTp8+/WEMflY4yBzlmzNnzpUrV8LDw7Ozs0tLS1NTUxFCAoFAckCX4UAEAkG6BBvpLZbq8y6dNgghrHmgo6NDJBLxeLwue6UbANLT02fMmGFpaXn16tUXL16UlpYuXry4s7NTIb/mBwba1pSsubn5wYMHJ0+eXLBgAVZSXl7eyzpLSkqkN4uLi0kkkoWFBbYe65t7JT/fvn3bwMAgKipKMu+C9K0MSIN7jpJhD83YYGOEkFAoPHjwYC/rLCkpSUpKwn5uaWk5d+6cl5cXtiSej49PYmLiy5cvsb35+fnJycmSE0UiEZ/Pl0xRkp2dfffu3V4G86GCe46SUanUKVOm7Ny5s66uztjY+MqVK5qavf1HcXBwWLRo0eLFi42MjC5cuMBisSTTXm/fvv3atWuenp7Lly8XCoUnT54cPXp0aWkptjcgIODAgQO+vr7z5s2rqqo6e/asq6sr1p4BuoDR1EogEolaW1v9/PzMzc0RQrNmzeLxeJmZmaWlpTNnzoyIiGhtbfX19bWwsEAINTc3W1pafvTRR5LT6+rqxo8fjy3
0jRASCoXStTU2Nnp7e2/ZsuX27dv379+3s7M7e/bsqFGjsIP19PQCAgKqqqrS09M7OjrCw8Pd3NwMDQ2nTp2KEDIzM/Py8iooKMjIyNDU1Dxy5IiTk5ORkVG3a7/2czCaGgA84HsOAHhA5gCAB2QOAHhA5gCAB2QOAHhA5gCAB2QOAHhA5gCAB2QOAHhA5gCAB2QOAHhA5gCAB2QOAHhA5gCAB2QOAHhA5gCAB2QOAHj8F8i9BuLvp7vWAAAAAElFTkSuQmCC", "image/svg+xml": "\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n", "text/html": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Plots.plot(tuned_lasso_machine)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately this doesn't work with `Makie` but we can take the convenience. \n", "\n", "`report` also gets us some useful information too:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(best_model = LassoRegressor(lambda = 0.02154434690031884, …),\n", " best_history_entry = (model = LassoRegressor(lambda = 0.02154434690031884, …),\n", " measure = [RootMeanSquaredError()],\n", " measurement = [5.00501639997187],\n", " per_fold = [[5.195550421954174, 5.594465814013773, 4.114753673922431, 4.854588100919187, 5.14401890179805]],),\n", " history = NamedTuple{(:model, :measure, :measurement, :per_fold), Tuple{MLJLinearModels.LassoRegressor, Vector{RootMeanSquaredError}, Vector{Float64}, Vector{Vector{Float64}}}}[(model = LassoRegressor(lambda = 0.007742636826811269, …), measure = [RootMeanSquaredError()], measurement = [5.00631391933102], per_fold = [[5.173141601266498, 5.583597171707226, 4.117843599890676, 4.876589616915048, 5.161412760748969]]), (model = LassoRegressor(lambda = 3.593813663804628, …), measure = [RootMeanSquaredError()], measurement = [7.037660609843977], per_fold = [[7.512438084636223, 7.063395454670881, 6.8661160065870375, 6.433322821349736, 7.265250266341788]]), (model = LassoRegressor(lambda = 0.02154434690031884, …), measure = [RootMeanSquaredError()], measurement = [5.00501639997187], per_fold = [[5.195550421954174, 5.594465814013773, 4.114753673922431, 4.854588100919187, 5.14401890179805]]), (model = LassoRegressor(lambda = 10.000000000000002, …), measure = [RootMeanSquaredError()], measurement = [9.313715174414954], per_fold = [[9.973453424001466, 8.781991133737549, 9.173940857089404, 8.922473856481114, 9.662379608973254]]), (model = LassoRegressor(lambda = 0.46415888336127803, …), measure = [RootMeanSquaredError()], measurement = [5.375361457189756], per_fold = [[5.877170727397991, 5.967933998434739, 4.578944338203925, 4.857603090906311, 5.4545524992795205]]), (model = LassoRegressor(lambda = 0.05994842503189412, …), measure = [RootMeanSquaredError()], measurement = [5.0109484736056835], per_fold = [[5.259321427353059, 5.623369801398444, 4.123251164019081, 4.795490781061791, 5.12516743466273]]), (model = LassoRegressor(lambda = 0.1668100537200059, …), measure = [RootMeanSquaredError()], measurement = [5.125607287752084], per_fold = [[5.524998247530005, 5.773465581399223, 4.2636201059791485, 4.717550048386918, 5.202595988478275]]), (model = LassoRegressor(lambda = 1.291549665014884, …), measure = [RootMeanSquaredError()], measurement = [5.67902723720285], per_fold = [[6.145369593492875, 6.098863598206641, 5.196840803974826, 
5.018965631235667, 5.839338731818849]]), (model = LassoRegressor(lambda = 0.002782559402207125, …), measure = [RootMeanSquaredError()], measurement = [5.009033242868468], per_fold = [[5.167453566111362, 5.579935696829209, 4.1218990712709935, 4.889586604591623, 5.168723467607311]]), (model = LassoRegressor(lambda = 0.0010000000000000002, …), measure = [RootMeanSquaredError()], measurement = [5.009675758229785], per_fold = [[5.1655412989846035, 5.5784343387348025, 4.123519354099901, 4.892197295073789, 5.171605475316245]])],\n", " best_report = nothing,\n", " plotting = (parameter_names = [\"lambda\"],\n", " parameter_scales = [:log],\n", " parameter_values = Any[0.007742636826811269; 3.593813663804628; … ; 0.002782559402207125; 0.0010000000000000002;;],\n", " measurements = [5.00631391933102, 7.037660609843977, 5.00501639997187, 9.313715174414954, 5.375361457189756, 5.0109484736056835, 5.125607287752084, 5.67902723720285, 5.009033242868468, 5.009675758229785],),)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "report(tuned_lasso_machine)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Since this is a `NamedTuple` we can get specific things out, like the RMSE for each value of $\\lambda$ in our search:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10-element Vector{Float64}:\n", " 5.00631391933102\n", " 7.037660609843977\n", " 5.00501639997187\n", " 9.313715174414954\n", " 5.375361457189756\n", " 5.0109484736056835\n", " 5.125607287752084\n", " 5.67902723720285\n", " 5.009033242868468\n", " 5.009675758229785" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "report(tuned_lasso_machine).plotting.measurements" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "And we can get the best history entry RMSE:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(model = LassoRegressor(lambda = 0.02154434690031884, …),\n", " measure = [RootMeanSquaredError()],\n", " measurement = [5.00501639997187],\n", " per_fold = [[5.195550421954174, 5.594465814013773, 4.114753673922431, 4.854588100919187, 5.14401890179805]],)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "report(tuned_lasso_machine).best_history_entry" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We see there under `measurement` the measured RMSE in the grid search at that `lambda` value." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best Lasso RMSE: 4.6343333341600985\n" ] } ], "source": [ "y_pred = predict(tuned_lasso_machine, X_test_standardized)\n", "best_lasso_rmse = rms(y_pred, y_test)\n", "println(\"Best Lasso RMSE: $best_lasso_rmse\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "So here we see we got a minor improvement in test RMSE compared to linear regression but still not as good as the default random forest." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Wrapping up the steps as a `Pipeline`\n", "\n", "So in the above lasso model, we applied the following steps to the `boston` dataset:\n", "\n", "1. Updated datatypes to be `Continuous` type\n", "2. Applied a `Standardizer` to the features\n", "3. Used a `Grid` and `TunedModel` to find an optimal parameter value of $\\lambda$\n", "4. 
{ "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best Lasso RMSE: 4.6343333341600985\n" ] } ], "source": [ "y_pred = predict(tuned_lasso_machine, X_test_standardized)\n", "best_lasso_rmse = rms(y_pred, y_test)\n", "println(\"Best Lasso RMSE: $best_lasso_rmse\")" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "So we got a minor improvement in test RMSE compared to linear regression, though still not as good as the default random forest." ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Wrapping up the steps as a `Pipeline`\n", "\n", "So in the above lasso model, we applied the following steps to the `boston` dataset:\n", "\n", "1. Updated datatypes to be `Continuous` type\n", "2. Applied a `Standardizer` to the features\n", "3. Used a `Grid` and `TunedModel` to find an optimal parameter value of $\\lambda$\n", "4. Calculated the test RMSE with the best model from step 3\n", "\n", "Reference:\n", "* [Composing Models](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's recap the steps in a single code cell:" ] },
{ "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best Lasso RMSE: 4.6343333341600985\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Training machine(Standardizer(features = Symbol[], …), …).\n", "└ @ MLJBase /Users/nelsontang/.julia/packages/MLJBase/9Nkjh/src/machines.jl:492\n" ] } ], "source": [ "# Load in the Boston housing market data\n", "boston = load_boston() |> DataFrame\n", "\n", "# Train/test split with `partition`\n", "train, test = partition(boston, 0.8, rng=42);\n", "\n", "# As before, unpack to horizontally split into y and X\n", "y_train, X_train = unpack(train, ==(:MedV))\n", "y_test, X_test = unpack(test, ==(:MedV));\n", "\n", "# 1. Coerce datatypes\n", "X_train = coerce(X_train, :Chas=>Continuous)\n", "X_test = coerce(X_test, :Chas=>Continuous)\n", "\n", "# 2. Standardize\n", "Standardizer = @load Standardizer pkg=MLJModels verbosity=0\n", "# Just like the other models, we instantiate it and bind data to it with a machine\n", "standardizer = Standardizer()\n", "standardizer_machine = machine(standardizer, X_train) |> fit!\n", "\n", "# And transform\n", "X_train_standardized = MLJ.transform(standardizer_machine, X_train)\n", "X_test_standardized = MLJ.transform(standardizer_machine, X_test)\n", "\n", "# 3. Load and tune the model\n", "# Load in our model and instantiate it\n", "LassoRegressor = @load LassoRegressor pkg=MLJLinearModels verbosity=0\n", "lasso = LassoRegressor()\n", "\n", "parameter_range = range(lasso, :lambda, lower=0.001, upper=10.0, scale=:log)\n", "tuned_lasso = TunedModel(lasso, \n", " resampling=CV(nfolds=5, rng=42), \n", " tuning=Grid(resolution=10), # Search over 10 values between `lower` and `upper` in `parameter_range`\n", " range=parameter_range, \n", " measure=rms)\n", "\n", "# As before we then need to take this model and bind it to data as a machine;\n", "# note that we don't pipe it with `|>` into fit! this time.\n", "tuned_lasso_machine = machine(tuned_lasso, X_train_standardized, y_train)\n", "# Here we call fit! separately so we can set verbosity to 0\n", "fit!(tuned_lasso_machine, verbosity=0)\n", "\n", "# 4. Predict\n", "y_pred = predict(tuned_lasso_machine, X_test_standardized)\n", "best_lasso_rmse = rms(y_pred, y_test)\n", "println(\"Best Lasso RMSE: $best_lasso_rmse\")" ] },
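{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As an aside, it can be useful to inspect what the winning lasso actually learned. The sketch below is an assumption about the MLJ tuning and MLJLinearModels APIs rather than something shown above: the field names `best_fitted_params`, `coefs`, and `intercept` are the assumed access points." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (assumed API): `fitted_params` on a TunedModel machine is assumed to expose\n", "# the refitted best model's learned parameters under `best_fitted_params`; for a\n", "# LassoRegressor these are assumed to be `coefs` (feature => coefficient pairs)\n", "# and `intercept`.\n", "best_fit = fitted_params(tuned_lasso_machine).best_fitted_params\n", "best_fit.coefs # near-zero coefficients were shrunk away by the L1 penalty\n", "best_fit.intercept" ] },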
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Method 1: Using the Julia base pipe `|>`" ] },
{ "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DeterministicPipeline(\n", " f = var\"#9#10\"(), \n", " standardizer = Standardizer(\n", " features = Symbol[], \n", " ignore = false, \n", " ordered_factor = false, \n", " count = false), \n", " deterministic_tuned_model = DeterministicTunedModel(\n", " model = LassoRegressor(lambda = 1.0, …), \n", " tuning = Grid(goal = nothing, …), \n", " resampling = CV(nfolds = 5, …), \n", " measure = RootMeanSquaredError(), \n", " weights = nothing, \n", " class_weights = nothing, \n", " operation = nothing, \n", " range = NumericRange(0.001 ≤ lambda ≤ 10.0; origin=5.0005, unit=4.9995; on log scale), \n", " selection_heuristic = MLJTuning.NaiveSelection(nothing), \n", " train_best = true, \n", " repeats = 1, \n", " n = nothing, \n", " acceleration = CPU1{Nothing}(nothing), \n", " acceleration_resampling = CPU1{Nothing}(nothing), \n", " check_measure = true, \n", " cache = true), \n", " cache = true)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Load in the Boston housing market data\n", "boston = load_boston() |> DataFrame\n", "\n", "# Train/test split with `partition`\n", "train, test = partition(boston, 0.8, rng=42);\n", "\n", "# As before, unpack to horizontally split into y and X\n", "y_train, X_train = unpack(train, ==(:MedV))\n", "y_test, X_test = unpack(test, ==(:MedV));\n", "\n", "# Specify the tuned_lasso model\n", "parameter_range = range(lasso, :lambda, lower=0.001, upper=10.0, scale=:log)\n", "tuned_lasso = TunedModel(lasso, \n", " resampling=CV(nfolds=5, rng=42), \n", " tuning=Grid(resolution=10), # Search over 10 values between `lower` and `upper` in `parameter_range`\n", " range=parameter_range, \n", " measure=rms)\n", "\n", "# Return a `DeterministicPipeline` model\n", "pipe = (X_train -> coerce(X_train, :Chas=>Continuous)) |> Standardizer() |> tuned_lasso" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "A pipeline is just another model, so we can bind it to a `machine` and `evaluate`, `predict`, etc.\n", "\n", "Here we bind a `machine`, fit it, and `predict` on `X_test` to calculate the RMSE on the test set:" ] },
{ "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.6343333341600985" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pipe_machine = machine(pipe, X_train, y_train)\n", "fit!(pipe_machine, verbosity=0)\n", "\n", "y_pred_pipeline = predict(pipe_machine, X_test)\n", "rms(y_pred_pipeline, y_test)" ] },
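{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Because the pipeline is itself a model, we can also hand it straight to `evaluate` for a cross-validated estimate of its performance, no machine required. A sketch, reusing the resampling settings from above:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: cross-validated RMSE of the whole pipeline on the training data.\n", "# Note this re-runs the inner lambda search on every fold (nested resampling),\n", "# so it is slower but gives an honest estimate of the full tuned workflow.\n", "evaluate(pipe, X_train, y_train,\n", " resampling=CV(nfolds=5, rng=42),\n", " measure=rms,\n", " verbosity=0)" ] },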
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Method 2: Using `Pipeline()`\n", "\n", "For clarity, it may be helpful to construct the model explicitly with the `Pipeline` constructor instead, which lets us give each component a name:" ] },
{ "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "# Using the `Pipeline` constructor instead of the base `|>` pipes\n", "pipe = Pipeline(\n", " transformer = (X_train -> coerce(X_train, :Chas=>Continuous)),\n", " standardizer = Standardizer(),\n", " tuner = tuned_lasso\n", ");" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This creates the same pipeline as before, and we can get our predictions out from a new `machine` again:" ] },
{ "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.6343333341600985" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pipe_machine = machine(pipe, X_train, y_train)\n", "fit!(pipe_machine, verbosity=0)\n", "\n", "y_pred_pipeline = predict(pipe_machine, X_test)\n", "rms(y_pred_pipeline, y_test)" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.8.3", "language": "julia", "name": "julia-1.8" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.8.3" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }