{ "metadata": { "name": "", "signature": "sha256:4522a4af5b5249335fbc13fc6ff380b3cda518b5023a1f217b0bd08639155ba8" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Example with ParsimonY\n", "====================\n", "\n", "In previous examples, we only use scikit-learn algorithms.\n", "In this example, we will learn how to use another Python machine learning library.\n", "You need to install ParsimonY to run this example: https://github.com/neurospin/pylearn-parsimony\n", "\n", "For ``Pipeline``, ``r2_score`` and ``StratifiedKFold`` we rely on scikit-learn objects.\n" ] }, { "cell_type": "code", "collapsed": true, "input": [ "from sklearn.cross_validation import StratifiedKFold\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.metrics import r2_score" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "from mempamal.configuration import JSONify_estimator, JSONify_cv, build_dataset\n", "from mempamal.workflow import create_wf, save_wf\n", "from mempamal.datasets import iris" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": true, "input": [ "# iris dataset as usual but with linear regression (Why not! :p).\n", "X, y = iris.get_data()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To ensure that ``parsimony.estimators.ElasticNet`` is compliant with the scikit-learn interface, we create a wrapper which inherits from ``sklearn.base.BaseEstimator``. Notice, that this wrapper must be accessible in your ``PYTHONPATH`` for the future tasks. In the MEmPaMaL examples we provide the ElasticNet wrapper for the sake of the example." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import inspect\n", "from mempamal.examples.elasticnet_parsimony import EnetWrap\n", "print(inspect.getsource(inspect.getmodule(EnetWrap)))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "from parsimony.estimators import ElasticNet\n", "from sklearn.base import BaseEstimator\n", "\n", "\n", "class EnetWrap(BaseEstimator, ElasticNet):\n", "\n", " def __init__(self, l=0., alpha=1., algorithm_params={},\n", " penalty_start=0, mean=True):\n", " super(EnetWrap, self).__init__(l=l, alpha=alpha,\n", " algorithm_params=algorithm_params,\n", " penalty_start=penalty_start, mean=mean)\n", "\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The estimator is only the ``EnetWrap`` and we create a multi-parameters grid (``enet__l`` and ``enet__alpha``)." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "est = Pipeline([(\"enet\", EnetWrap())])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "alphas = [1e-4, 1e-3, 1e-2, 0.1, 1., 10., 100., 1e3]\n", "ls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n", "grid = []\n", "for a in alphas:\n", " for l in ls:\n", " grid.append({\"enet__l\": l,\n", " \"enet__alpha\": a})\n", "print(\"The grid contains {} sets of parameters:\".format(len(grid)))\n", "grid" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "The grid contains 72 sets of parameters:\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "[{'enet__alpha': 0.0001, 'enet__l': 0.1},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.2},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.3},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.4},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.5},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.6},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.7},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.8},\n", " {'enet__alpha': 0.0001, 'enet__l': 0.9},\n", " {'enet__alpha': 0.001, 'enet__l': 0.1},\n", " {'enet__alpha': 0.001, 'enet__l': 0.2},\n", " {'enet__alpha': 0.001, 'enet__l': 0.3},\n", " {'enet__alpha': 0.001, 'enet__l': 0.4},\n", " {'enet__alpha': 0.001, 'enet__l': 0.5},\n", " {'enet__alpha': 0.001, 'enet__l': 0.6},\n", " {'enet__alpha': 0.001, 'enet__l': 0.7},\n", " {'enet__alpha': 0.001, 'enet__l': 0.8},\n", " {'enet__alpha': 0.001, 'enet__l': 0.9},\n", " {'enet__alpha': 0.01, 'enet__l': 0.1},\n", " {'enet__alpha': 0.01, 'enet__l': 0.2},\n", " {'enet__alpha': 0.01, 'enet__l': 0.3},\n", " {'enet__alpha': 0.01, 'enet__l': 0.4},\n", " {'enet__alpha': 0.01, 'enet__l': 0.5},\n", " {'enet__alpha': 0.01, 'enet__l': 0.6},\n", " {'enet__alpha': 0.01, 'enet__l': 0.7},\n", " {'enet__alpha': 0.01, 'enet__l': 0.8},\n", " {'enet__alpha': 0.01, 'enet__l': 0.9},\n", " {'enet__alpha': 0.1, 'enet__l': 0.1},\n", " {'enet__alpha': 0.1, 'enet__l': 0.2},\n", " {'enet__alpha': 0.1, 'enet__l': 0.3},\n", " {'enet__alpha': 0.1, 'enet__l': 0.4},\n", " {'enet__alpha': 0.1, 'enet__l': 0.5},\n", " {'enet__alpha': 0.1, 'enet__l': 0.6},\n", " {'enet__alpha': 0.1, 'enet__l': 0.7},\n", " {'enet__alpha': 0.1, 'enet__l': 0.8},\n", " {'enet__alpha': 0.1, 'enet__l': 0.9},\n", " {'enet__alpha': 1.0, 'enet__l': 0.1},\n", " {'enet__alpha': 1.0, 'enet__l': 0.2},\n", " {'enet__alpha': 1.0, 'enet__l': 0.3},\n", " {'enet__alpha': 1.0, 'enet__l': 0.4},\n", " {'enet__alpha': 1.0, 'enet__l': 0.5},\n", " {'enet__alpha': 1.0, 'enet__l': 0.6},\n", " {'enet__alpha': 1.0, 'enet__l': 0.7},\n", " {'enet__alpha': 1.0, 'enet__l': 0.8},\n", " {'enet__alpha': 1.0, 'enet__l': 0.9},\n", " {'enet__alpha': 10.0, 'enet__l': 0.1},\n", " {'enet__alpha': 10.0, 'enet__l': 0.2},\n", " {'enet__alpha': 10.0, 'enet__l': 0.3},\n", " {'enet__alpha': 10.0, 'enet__l': 0.4},\n", " {'enet__alpha': 10.0, 'enet__l': 0.5},\n", " {'enet__alpha': 10.0, 'enet__l': 0.6},\n", " {'enet__alpha': 10.0, 'enet__l': 0.7},\n", " {'enet__alpha': 10.0, 'enet__l': 0.8},\n", " {'enet__alpha': 10.0, 'enet__l': 0.9},\n", " {'enet__alpha': 100.0, 'enet__l': 0.1},\n", " {'enet__alpha': 100.0, 'enet__l': 0.2},\n", " {'enet__alpha': 100.0, 'enet__l': 0.3},\n", " {'enet__alpha': 100.0, 'enet__l': 0.4},\n", " {'enet__alpha': 100.0, 'enet__l': 0.5},\n", " {'enet__alpha': 100.0, 'enet__l': 0.6},\n", " {'enet__alpha': 100.0, 'enet__l': 0.7},\n", " 
{'enet__alpha': 100.0, 'enet__l': 0.8},\n", " {'enet__alpha': 100.0, 'enet__l': 0.9},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.1},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.2},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.3},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.4},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.5},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.6},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.7},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.8},\n", " {'enet__alpha': 1000.0, 'enet__l': 0.9}]" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We JSONify the estimator and the cross-validation configuration.\n", "Then we build the dataset in the current directory, which creates a ``dataset.joblib`` file.\n", "Next we create the workflow in MEmPaMaL's internal format (``create_wf``);\n", "with ``verbose=True``, the generated commands are printed on ``stdout``.\n", "Finally, we convert the workflow to the soma-workflow format (``save_wf``)\n", "and write it to ``workflow.json`` (this step requires soma-workflow)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "method_conf = JSONify_estimator(est, out=\"./est.json\")\n", "cv_conf = JSONify_cv(StratifiedKFold, cv_kwargs={\"n_folds\": 5},\n", " score_func=r2_score, \n", " stratified=True,\n", " inner_cv=StratifiedKFold,\n", " inner_cv_kwargs={\"n_folds\": 5},\n", " inner_score_func=r2_score,\n", " out=\"./cv.json\")\n", "dataset = build_dataset(X, y, method_conf, cv_conf, grid=grid, outputdir=\".\")\n", "wfi = create_wf(dataset['folds'], cv_conf, method_conf, \".\",\n", " verbose=True)\n", "wf = save_wf(wfi, \"./workflow.json\", mode=\"soma-workflow\")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_0_0.pkl 0 --inner 0\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_0_1.pkl 0 --inner 1\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_0_2.pkl 0 --inner 2\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_0_3.pkl 0 --inner 3\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_0_4.pkl 0 --inner 4\n", "\n", "python mempamal/scripts/inner_reducer.py ./cv.json ./est.json ./dataset.joblib ./red_res_0.pkl ./map_res_0_{inner}.pkl 0\n", "\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_1_0.pkl 1 --inner 0\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_1_1.pkl 1 --inner 1\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_1_2.pkl 1 --inner 2\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_1_3.pkl 1 --inner 3\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_1_4.pkl 1 --inner 4\n", "\n", "python mempamal/scripts/inner_reducer.py ./cv.json ./est.json ./dataset.joblib ./red_res_1.pkl ./map_res_1_{inner}.pkl 1\n", "\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_2_0.pkl 2 --inner 0\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_2_1.pkl 2 --inner 1\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_2_2.pkl 2 --inner 2\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_2_3.pkl 2 --inner 3\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_2_4.pkl 2 --inner 4\n", "\n", "python mempamal/scripts/inner_reducer.py ./cv.json ./est.json ./dataset.joblib ./red_res_2.pkl ./map_res_2_{inner}.pkl 2\n", "\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_3_0.pkl 3 --inner 0\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_3_1.pkl 3 --inner 1\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_3_2.pkl 3 --inner 2\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_3_3.pkl 3 --inner 3\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_3_4.pkl 3 --inner 4\n", "\n", "python mempamal/scripts/inner_reducer.py ./cv.json ./est.json ./dataset.joblib ./red_res_3.pkl ./map_res_3_{inner}.pkl 3\n", "\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_4_0.pkl 4 --inner 0\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_4_1.pkl 4 --inner 1\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_4_2.pkl 4 --inner 2\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_4_3.pkl 4 --inner 3\n", "python mempamal/scripts/mapper.py ./cv.json ./est.json ./dataset.joblib ./map_res_4_4.pkl 4 --inner 4\n", "\n", "python mempamal/scripts/inner_reducer.py ./cv.json ./est.json ./dataset.joblib ./red_res_4.pkl ./map_res_4_{inner}.pkl 4\n", "\n", "python mempamal/scripts/outer_reducer.py ./final_res.pkl ./red_res_{outer}.pkl\n" ] } ], "prompt_number": 7 },
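{ "cell_type": "markdown", "metadata": {}, "source": [ "Illustrative aside (an addition, not part of the original example): with 5 outer folds and 5 inner folds, the listing above fans out to 5 x 5 = 25 mapper tasks, plus 5 inner reducers and 1 outer reducer, i.e. 31 jobs in total. A hedged check, assuming the soma-workflow ``Workflow`` object exposes its job list as ``jobs``:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Illustrative sketch (not in the original notebook).\n", "n_outer, n_inner = 5, 5\n", "expected = n_outer * n_inner + n_outer + 1  # mappers + inner reducers + outer reducer\n", "print(expected)\n", "print(len(wf.jobs))  # assumes soma-workflow's Workflow stores its job list in `.jobs`" ], "language": "python", "metadata": {}, "outputs": [] },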
{ "cell_type": "markdown", "metadata": {}, "source": [ "Now we create a ``WorkflowController`` and submit the workflow. \n", "We wait for the workflow to complete, then read the final results." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from soma_workflow.client import WorkflowController\n", "\n", "import time\n", "import sklearn.externals.joblib as joblib\n", "\n", "controller = WorkflowController()\n", "wf_id = controller.submit_workflow(workflow=wf, name=\"third example\")\n", "\n", "# Poll every two seconds until the workflow is done, then load the results.\n", "while controller.workflow_status(wf_id) != 'workflow_done':\n", " time.sleep(2)\n", "print(joblib.load('./final_res.pkl'))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "light mode\n", "{'std': 0.020656789215473488, 'raw': array([ 0.90915875, 0.95577807, 0.93911917, 0.93473956, 0.89903827]), 'median': 0.93473955712380996, 'mean': 0.9275667625341375}" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 8 }
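, { "cell_type": "markdown", "metadata": {}, "source": [ "Illustrative follow-up (an addition, not part of the original example): the ``final_res.pkl`` written by the outer reducer can be reloaded at any time; the ``mean``, ``std`` and ``raw`` keys are visible in the output above." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Illustrative sketch (not in the original notebook): reload and summarize.\n", "res = joblib.load('./final_res.pkl')\n", "print(\"R2 = {0:.3f} +/- {1:.3f} over {2} outer folds\".format(res['mean'], res['std'], len(res['raw'])))" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }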