{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "# The RMVA Interface: TMVA and R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Required headers" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#include \"TRInterface.h\"\n", "#include \"TMVA/MethodC50.h\"\n", "#include \"TMVA/MethodRSNNS.h\"\n", "#include \"TMVA/MethodRXGB.h\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Declare Factory" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- Factory : You are running ROOT Version: 6.07/07, Apr 1, 2016\n", "--- Factory : \n", "--- Factory : _/_/_/_/_/ _| _| _| _| _|_| \n", "--- Factory : _/ _|_| _|_| _| _| _| _| \n", "--- Factory : _/ _| _| _| _| _| _|_|_|_| \n", "--- Factory : _/ _| _| _| _| _| _| \n", "--- Factory : _/ _| _| _| _| _| \n", "--- Factory : \n", "--- Factory : ___________TMVA Version 4.2.1, Feb 5, 2015\n", "--- Factory : \n" ] } ], "source": [ "TMVA::Tools::Instance();\n", "\n", "auto inputFile = TFile::Open(\"https://raw.githubusercontent.com/iml-wg/tmvatutorials/master/inputdata.root\");\n", "auto outputFile = TFile::Open(\"TMVAOutputCV.root\", \"RECREATE\");\n", "\n", "TMVA::Factory factory(\"TMVAClassification\", outputFile,\n", " \"!V:ROC:!Correlations:!Silent:Color:!DrawProgressBar:AnalysisType=Classification\" ); " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Declare DataLoader" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "TMVA::DataLoader loader(\"dataset\");\n", "\n", "//adding variables to dataset\n", "loader.AddVariable(\"var1\");\n", "loader.AddVariable(\"var2\");\n", "loader.AddVariable(\"var3\");\n", "loader.AddVariable(\"var4\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up Dataset" ] }, { "cell_type": "code", 
"execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- DataSetInfo : Dataset[dataset] : Added class \"Signal\"\t with internal class number 0\n", "--- dataset : Add Tree Sig of type Signal with 6000 events\n", "--- DataSetInfo : Dataset[dataset] : Added class \"Background\"\t with internal class number 1\n", "--- dataset : Add Tree Bkg of type Background with 6000 events\n", "--- dataset : Preparing trees for training and testing...\n" ] } ], "source": [ "TTree *tsignal, *tbackground;\n", "inputFile->GetObject(\"Sig\", tsignal);\n", "inputFile->GetObject(\"Bkg\", tbackground);\n", "\n", "TCut mycuts, mycutb;\n", " \n", "loader.AddSignalTree (tsignal, 1); //signal weight = 1\n", "loader.AddBackgroundTree (tbackground, 1); //background weight = 1 \n", "\n", "loader.PrepareTrainingAndTestTree(mycuts, mycutb,\n", "\"nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V\"); " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Booking methods\n", "RMVA provides the following booking methods; each link documents the options of that method:\n", "\n", "- C50 boosted decision trees: http://oproject.org/tiki-index.php?page=RMVA#C50Booking\n", "- RMLP neural networks (RSNNS package): http://oproject.org/tiki-index.php?page=RMVA#RSNNSMLP\n", "- eXtreme Gradient Boosted (RXGB) decision trees: http://oproject.org/tiki-index.php?page=RMVA#RXGBBooking\n", "\n", "The cell below also books a native TMVA BDT, so the R-based methods can be compared against it." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- Factory : Booking method: \u001b[1mC50\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n", "--- DataSetFactory : Dataset[dataset] : Splitmode is: \"RANDOM\" the mixmode is: \"SAMEASSPLITMODE\"\n", "--- DataSetFactory : Dataset[dataset] : Create training and testing trees -- looping over class \"Signal\" ...\n", "--- DataSetFactory : Dataset[dataset] : Weight expression for class 'Signal': \"\"\n", "--- DataSetFactory : Dataset[dataset] : Create training 
and testing trees -- looping over class \"Background\" ...\n", "--- DataSetFactory : Dataset[dataset] : Weight expression for class 'Background': \"\"\n", "--- DataSetFactory : Dataset[dataset] : Number of events in input trees (after possible flattening of arrays):\n", "--- DataSetFactory : Dataset[dataset] : Signal -- number of events : 6000 / sum of weights: 6000 \n", "--- DataSetFactory : Dataset[dataset] : Background -- number of events : 6000 / sum of weights: 6000 \n", "--- DataSetFactory : Dataset[dataset] : Signal tree -- total number of entries: 6000 \n", "--- DataSetFactory : Dataset[dataset] : Background tree -- total number of entries: 6000 \n", "--- DataSetFactory : Dataset[dataset] : Preselection: (will NOT affect number of requested training and testing events)\n", "--- DataSetFactory : Dataset[dataset] : No preselection cuts applied on event classes\n", "--- DataSetFactory : Dataset[dataset] : Weight renormalisation mode: \"NumEvents\": renormalises all event classes \n", "--- DataSetFactory : Dataset[dataset] : such that the effective (weighted) number of events in each class equals the respective \n", "--- DataSetFactory : Dataset[dataset] : number of events (entries) that you demanded in PrepareTrainingAndTestTree(\"\",\"nTrain_Signal=.. )\n", "--- DataSetFactory : Dataset[dataset] : ... i.e. such that Sum[i=1..N_j]{w_i} = N_j, j=0,1,2...\n", "--- DataSetFactory : Dataset[dataset] : ... (note that N_j is the sum of TRAINING events (nTrain_j...with j=Signal,Background..\n", "--- DataSetFactory : Dataset[dataset] : ..... Testing events are not renormalised nor included in the renormalisation factor! 
)\n", "--- DataSetFactory : Dataset[dataset] : --> Rescale Signal event weights by factor: 1\n", "--- DataSetFactory : Dataset[dataset] : --> Rescale Background event weights by factor: 1\n", "--- DataSetFactory : Dataset[dataset] : Number of training and testing events after rescaling:\n", "--- DataSetFactory : Dataset[dataset] : ---------------------------------------------------------------------------\n", "--- DataSetFactory : Dataset[dataset] : Signal -- training events : 1000 (sum of weights: 1000) - requested were 1000 events\n", "--- DataSetFactory : Dataset[dataset] : Signal -- testing events : 5000 (sum of weights: 5000) - requested were 0 events\n", "--- DataSetFactory : Dataset[dataset] : Signal -- training and testing events: 6000 (sum of weights: 6000)\n", "--- DataSetFactory : Dataset[dataset] : Background -- training events : 1000 (sum of weights: 1000) - requested were 1000 events\n", "--- DataSetFactory : Dataset[dataset] : Background -- testing events : 5000 (sum of weights: 5000) - requested were 0 events\n", "--- DataSetFactory : Dataset[dataset] : Background -- training and testing events: 6000 (sum of weights: 6000)\n", "--- DataSetFactory : Dataset[dataset] : Create internal training tree\n", "--- DataSetFactory : Dataset[dataset] : Create internal testing tree\n", "--- DataSetInfo : Dataset[dataset] : Correlation matrix (Signal):\n", "--- DataSetInfo : ----------------------------------------\n", "--- DataSetInfo : var1 var2 var3 var4\n", "--- DataSetInfo : var1: +1.000 +0.386 +0.597 +0.808\n", "--- DataSetInfo : var2: +0.386 +1.000 +0.696 +0.743\n", "--- DataSetInfo : var3: +0.597 +0.696 +1.000 +0.860\n", "--- DataSetInfo : var4: +0.808 +0.743 +0.860 +1.000\n", "--- DataSetInfo : ----------------------------------------\n", "--- DataSetInfo : Dataset[dataset] : Correlation matrix (Background):\n", "--- DataSetInfo : ----------------------------------------\n", "--- DataSetInfo : var1 var2 var3 var4\n", "--- DataSetInfo : var1: +1.000 
+0.856 +0.914 +0.964\n", "--- DataSetInfo : var2: +0.856 +1.000 +0.927 +0.937\n", "--- DataSetInfo : var3: +0.914 +0.927 +1.000 +0.971\n", "--- DataSetInfo : var4: +0.964 +0.937 +0.971 +1.000\n", "--- DataSetInfo : ----------------------------------------\n", "--- DataSetFactory : Dataset[dataset] : \n", "--- Factory : Booking method: \u001b[1mRMLP\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n", "--- RMLP : Dataset[dataset] : Create Transformation \"N\" with events from all classes.\n", "--- Norm : Transformation, Variable selection : \n", "--- Norm : Input : variable 'var1' (index=0). <---> Output : variable 'var1' (index=0).\n", "--- Norm : Input : variable 'var2' (index=1). <---> Output : variable 'var2' (index=1).\n", "--- Norm : Input : variable 'var3' (index=2). <---> Output : variable 'var3' (index=2).\n", "--- Norm : Input : variable 'var4' (index=3). <---> Output : variable 'var4' (index=3).\n", "--- Factory : Booking method: \u001b[1mRXGB\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n", "--- Factory : Booking method: \u001b[1mBDT\u001b[0m DataSet Name: \u001b[1mdataset\u001b[0m\n" ] } ], "source": [ "//C50 Boosted Decision Trees (BDTs)\n", "factory.BookMethod(&loader, TMVA::Types::kC50, \"C50\",\n", " \"!H:NTrials=5:Rules=kTRUE:ControlSubSet=kFALSE:ControlBands=10:ControlWinnow=kFALSE:ControlNoGlobalPruning=kTRUE:ControlCF=0.25:ControlMinCases=2:ControlFuzzyThreshold=kTRUE:ControlSample=0:ControlEarlyStopping=kTRUE:!V\" );\n", " \n", "//Neural Networks using RSNNS package\n", "factory.BookMethod(&loader, TMVA::Types::kRSNNS, \"RMLP\",\n", " \"!H:VarTransform=N:Size=c(5):Maxit=10:InitFunc=Randomize_Weights:LearnFunc=Std_Backpropagation:LearnFuncParams=c(0.2,0):!V\" );\n", "\n", "//eXtreme Gradient Boosted XGB Decision Trees\n", "factory.BookMethod(&loader, TMVA::Types::kRXGB, \"RXGB\",\"!V:NRounds=20:MaxDepth=2:Eta=1\" );\n", "\n", "//TMVA BDTs\n", "factory.BookMethod(&loader,TMVA::Types::kBDT, \"BDT\",\n", " 
\"!V:NTrees=50:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20\" );" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training the Methods" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- Factory : \n", "--- Factory : Train all methods for Classification ...\n", "--- Factory : \n", "--- Factory : current transformation string: 'I'\n", "--- Factory : Dataset[dataset] : Create Transformation \"I\" with events from all classes.\n", "--- Id : Transformation, Variable selection : \n", "--- Id : Input : variable 'var1' (index=0). <---> Output : variable 'var1' (index=0).\n", "--- Id : Input : variable 'var2' (index=1). <---> Output : variable 'var2' (index=1).\n", "--- Id : Input : variable 'var3' (index=2). <---> Output : variable 'var3' (index=2).\n", "--- Id : Input : variable 'var4' (index=3). <---> Output : variable 'var4' (index=3).\n", "--- Id : Preparing the Identity transformation...\n", "--- TFHandler_Factory : -----------------------------------------------------------\n", "--- TFHandler_Factory : Variable Mean RMS [ Min Max ]\n", "--- TFHandler_Factory : -----------------------------------------------------------\n", "--- TFHandler_Factory : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n", "--- TFHandler_Factory : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n", "--- TFHandler_Factory : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n", "--- TFHandler_Factory : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n", "--- TFHandler_Factory : -----------------------------------------------------------\n", "--- TFHandler_Factory : Plot event variables for Id\n", "--- TFHandler_Factory : Create scatter and profile plots in target-file directory: \n", "--- TFHandler_Factory : TMVAOutputCV.root:/dataset/InputVariables_Id/CorrelationPlots\n", "--- TFHandler_Factory : \n", "--- TFHandler_Factory 
: Ranking input variables (method unspecific)...\n", "--- IdTransformation : Ranking result (top variable is best ranked)\n", "--- IdTransformation : -----------------------------\n", "--- IdTransformation : Rank : Variable : Separation\n", "--- IdTransformation : -----------------------------\n", "--- IdTransformation : 1 : var4 : 3.458e-01\n", "--- IdTransformation : 2 : var3 : 2.817e-01\n", "--- IdTransformation : 3 : var1 : 2.640e-01\n", "--- IdTransformation : 4 : var2 : 2.173e-01\n", "--- IdTransformation : -----------------------------\n", "--- Factory : Train method: C50 for Classification\n", "--- C50 : Dataset[dataset] : Begin training\n", "--- C50 : \n", "--- C50 : \u001b[1m--- Saving State File In:\u001b[0mweights/C50Model.RData\n", "--- C50 : \n", "--- C50 : Dataset[dataset] : End of training \n", "--- C50 : Dataset[dataset] : Elapsed time for training with 2000 events: \u001b[1;31m0.873 sec\u001b[0m \n", "--- C50 : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n", "--- C50 : Dataset[dataset] : Evaluation of C50 on training sample (2000 events)\n", "--- C50 : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.0748 sec\u001b[0m \n", "--- C50 : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mweights/TMVAClassification_C50.weights.xml\u001b[0m\n", "--- Factory : Training finished\n", "--- Factory : Train method: RMLP for Classification\n", "--- Norm : Preparing the transformation.\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- TFHandler_RMLP : Variable Mean RMS [ Min Max ]\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- TFHandler_RMLP : var1: 0.085752 0.35735 [ -1.0000 1.0000 ]\n", "--- TFHandler_RMLP : var2: 0.10321 0.36589 [ -1.0000 1.0000 ]\n", "--- TFHandler_RMLP : var3: 0.10411 0.37276 [ -1.0000 1.0000 ]\n", "--- TFHandler_RMLP : var4: 0.17623 0.40650 [ -1.0000 
1.0000 ]\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- RMLP : Dataset[dataset] : Begin training\n", "--- RMLP : \n", "--- RMLP : \u001b[1m--- Saving State File In:\u001b[0mweights/RMLPModel.RData\n", "--- RMLP : \n", "--- RMLP : Dataset[dataset] : End of training \n", "--- RMLP : Dataset[dataset] : Elapsed time for training with 2000 events: \u001b[1;31m1.07 sec\u001b[0m \n", "--- RMLP : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n", "--- RMLP : Dataset[dataset] : Evaluation of RMLP on training sample (2000 events)\n", "--- RMLP : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.739 sec\u001b[0m \n", "--- RMLP : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mweights/TMVAClassification_RMLP.weights.xml\u001b[0m\n", "--- RMLP : Dataset[dataset] : Creating standalone response class: \u001b[0;36mweights/TMVAClassification_RMLP.class.C\u001b[0m\n", "--- Factory : Training finished\n", "--- Factory : Train method: RXGB for Classification\n", "--- RXGB : Dataset[dataset] : Begin training\n", "[0]\ttrain-rmse:0.393636\n", "[1]\ttrain-rmse:0.383795\n", "[2]\ttrain-rmse:0.371803\n", "[3]\ttrain-rmse:0.362196\n", "[4]\ttrain-rmse:0.357333\n", "[5]\ttrain-rmse:0.351344\n", "[6]\ttrain-rmse:0.346084\n", "[7]\ttrain-rmse:0.340683\n", "[8]\ttrain-rmse:0.332887\n", "[9]\ttrain-rmse:0.328419\n", "[10]\ttrain-rmse:0.326714\n", "[11]\ttrain-rmse:0.324244\n", "[12]\ttrain-rmse:0.321295\n", "[13]\ttrain-rmse:0.318570\n", "[14]\ttrain-rmse:0.315405\n", "[15]\ttrain-rmse:0.312603\n", "[16]\ttrain-rmse:0.310186\n", "[17]\ttrain-rmse:0.309424\n", "[18]\ttrain-rmse:0.306099\n", "[19]\ttrain-rmse:0.304363\n", "--- RXGB : \n", "--- RXGB : \u001b[1m--- Saving State File In:\u001b[0mweights/RXGBModel.RData\n", "--- RXGB : \n", "--- RXGB : Dataset[dataset] : End of training \n", "--- RXGB : Dataset[dataset] : Elapsed time for training with 2000 
events: \u001b[1;31m0.234 sec\u001b[0m \n", "--- RXGB : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n", "--- RXGB : Dataset[dataset] : Evaluation of RXGB on training sample (2000 events)\n", "--- RXGB : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.00485 sec\u001b[0m \n", "--- RXGB : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mweights/TMVAClassification_RXGB.weights.xml\u001b[0m\n", "--- Factory : Training finished\n", "--- Factory : Train method: BDT for Classification\n", "--- BDT : Dataset[dataset] : Begin training\n", "--- BDT : found and suggest the following possible pre-selection cuts \n", "--- BDT : as option DoPreselection was not used, these cuts however will not be performed, but the training will see the full sample\n", "--- BDT : found cut: Bkg if var 0 < -2.99038\n", "--- BDT : found cut: Bkg if var 2 < -2.88493\n", "--- BDT : found cut: Bkg if var 3 < -2.54088\n", "--- BDT : For classification trees, \n", "--- BDT : the effective number of backgrounds is scaled to match \n", "--- BDT : the signal. Othersise the first boosting step would do 'just that'!\n", "--- BDT : re-normlise events such that Sig and Bkg have respective sum of weights = 1\n", "--- BDT : sig->sig*1ev. bkg->bkg*1ev.\n", "--- BDT : #events: (reweighted) sig: 1000 bkg: 1000\n", "--- BDT : #events: (unweighted) sig: 1000 bkg: 1000\n", "--- BDT : Training 50 Decision Trees ... patience please\n", "--- BinaryTree : The minimal node size MinNodeSize=2.5 fMinNodeSize=2.5% is translated to an actual number of events = 25.7 for the training sample size of 1028\n", "--- BinaryTree : Note: This number will be taken as absolute minimum in the node, \n", "--- BinaryTree : in terms of 'weighted events' and unweighted ones !! 
\n", "--- BDT : elapsed time: \u001b[1;31m0.0508 sec\u001b[0m \n", "--- BDT : average number of nodes (w/o pruning) : 4\n", "--- BDT : Dataset[dataset] : End of training \n", "--- BDT : Dataset[dataset] : Elapsed time for training with 2000 events: \u001b[1;31m0.061 sec\u001b[0m \n", "--- BDT : Dataset[dataset] : Create MVA output for Dataset[dataset] : classification on training sample\n", "--- BDT : Dataset[dataset] : Evaluation of BDT on training sample (2000 events)\n", "--- BDT : Dataset[dataset] : Elapsed time for evaluation of 2000 events: \u001b[1;31m0.00845 sec\u001b[0m \n", "--- BDT : Dataset[dataset] : Creating weight file in xml format: \u001b[0;36mdataset/weights/TMVAClassification_BDT.weights.xml\u001b[0m\n", "--- BDT : Dataset[dataset] : Creating standalone response class: \u001b[0;36mdataset/weights/TMVAClassification_BDT.class.C\u001b[0m\n", "--- BDT : Write monitoring histograms to file: TMVAOutputCV.root:/dataset/Method_BDT/BDT\n", "--- Factory : Training finished\n", "--- Factory : \n", "--- Factory : Ranking input variables (method specific)...\n", "--- Factory : No variable ranking supplied by classifier: C50\n", "--- Factory : No variable ranking supplied by classifier: RMLP\n", "--- Factory : No variable ranking supplied by classifier: RXGB\n", "--- BDT : Ranking result (top variable is best ranked)\n", "--- BDT : --------------------------------------\n", "--- BDT : Rank : Variable : Variable Importance\n", "--- BDT : --------------------------------------\n", "--- BDT : 1 : var4 : 4.608e-01\n", "--- BDT : 2 : var1 : 2.923e-01\n", "--- BDT : 3 : var2 : 1.457e-01\n", "--- BDT : 4 : var3 : 1.012e-01\n", "--- BDT : --------------------------------------\n", "--- Factory : \n", "--- Factory : === Destroy and recreate all methods via weight files for testing ===\n", "--- Factory : \n", "--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mweights/TMVAClassification_C50.weights.xml\u001b[0m\n", "--- C50 : Dataset[dataset] : Read 
method \"C50\" of type \"C50\"\n", "--- C50 : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n", "--- C50 : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n", "--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mweights/TMVAClassification_RMLP.weights.xml\u001b[0m\n", "--- RMLP : Dataset[dataset] : Read method \"RMLP\" of type \"RSNNS\"\n", "--- RMLP : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n", "--- RMLP : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n", "--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mweights/TMVAClassification_RXGB.weights.xml\u001b[0m\n", "--- RXGB : Dataset[dataset] : Read method \"RXGB\" of type \"RXGB\"\n", "--- RXGB : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n", "--- RXGB : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n", "--- MethodBase : Dataset[dataset] : Reading weight file: \u001b[0;36mdataset/weights/TMVAClassification_BDT.weights.xml\u001b[0m\n", "--- BDT : Dataset[dataset] : Read method \"BDT\" of type \"BDT\"\n", "--- BDT : Dataset[dataset] : MVA method was trained with TMVA Version: 4.2.1\n", "--- BDT : Dataset[dataset] : MVA method was trained with ROOT Version: 6.07/07\n" ] } ], "source": [ "factory.TrainAllMethods();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing and Evaluating the data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- Factory : Test all methods...\n", "--- Factory : Test method: C50 for Classification performance\n", "--- C50 : Dataset[dataset] : Evaluation of C50 on testing sample (10000 events)\n", "--- C50 : \n", "--- C50 : \u001b[1m--- Loading State File From:\u001b[0mweights/C50Model.RData\n", "--- C50 : \n", "--- C50 : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m0.546 sec\u001b[0m 
\n", "--- Factory : Test method: RMLP for Classification performance\n", "--- RMLP : Dataset[dataset] : Evaluation of RMLP on testing sample (10000 events)\n", "--- RMLP : \n", "--- RMLP : \u001b[1m--- Loading State File From:\u001b[0mweights/RMLPModel.RData\n", "--- RMLP : \n", "--- RMLP : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m3.49 sec\u001b[0m \n", "--- Factory : Test method: RXGB for Classification performance\n", "--- RXGB : Dataset[dataset] : Evaluation of RXGB on testing sample (10000 events)\n", "--- RXGB : \n", "--- RXGB : \u001b[1m--- Loading State File From:\u001b[0mweights/RXGBModel.RData\n", "--- RXGB : \n", "--- RXGB : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m0.0286 sec\u001b[0m \n", "--- Factory : Test method: BDT for Classification performance\n", "--- BDT : Dataset[dataset] : Evaluation of BDT on testing sample (10000 events)\n", "--- BDT : Dataset[dataset] : Elapsed time for evaluation of 10000 events: \u001b[1;31m0.0235 sec\u001b[0m \n", "--- Factory : Evaluate all methods...\n", "--- Factory : Evaluate classifier: C50\n", "--- C50 : Testing Classification C50 METHOD \n", "--- C50 : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n", "--- Factory : Write evaluation histograms to file\n", "--- TFHandler_C50 : Plot event variables for C50\n", "--- TFHandler_C50 : -----------------------------------------------------------\n", "--- TFHandler_C50 : Variable Mean RMS [ Min Max ]\n", "--- TFHandler_C50 : -----------------------------------------------------------\n", "--- TFHandler_C50 : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n", "--- TFHandler_C50 : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n", "--- TFHandler_C50 : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n", "--- TFHandler_C50 : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n", "--- TFHandler_C50 : -----------------------------------------------------------\n", "--- TFHandler_C50 : Create 
scatter and profile plots in target-file directory: \n", "--- TFHandler_C50 : TMVAOutputCV.root:/dataset/Method_C50/C50/CorrelationPlots\n", "--- Factory : Evaluate classifier: RMLP\n", "--- RMLP : Testing Classification RMLP METHOD \n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- TFHandler_RMLP : Variable Mean RMS [ Min Max ]\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- TFHandler_RMLP : var1: 0.066774 0.35913 [ -1.2024 1.0914 ]\n", "--- TFHandler_RMLP : var2: 0.079492 0.36669 [ -1.1391 1.2044 ]\n", "--- TFHandler_RMLP : var3: 0.079125 0.37282 [ -1.0685 1.0783 ]\n", "--- TFHandler_RMLP : var4: 0.15120 0.40805 [ -1.1921 1.0737 ]\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- RMLP : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n", "--- Factory : Write evaluation histograms to file\n", "--- TFHandler_RMLP : Plot event variables for RMLP\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- TFHandler_RMLP : Variable Mean RMS [ Min Max ]\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- TFHandler_RMLP : var1: 0.066774 0.35913 [ -1.2024 1.0914 ]\n", "--- TFHandler_RMLP : var2: 0.079492 0.36669 [ -1.1391 1.2044 ]\n", "--- TFHandler_RMLP : var3: 0.079125 0.37282 [ -1.0685 1.0783 ]\n", "--- TFHandler_RMLP : var4: 0.15120 0.40805 [ -1.1921 1.0737 ]\n", "--- TFHandler_RMLP : -----------------------------------------------------------\n", "--- TFHandler_RMLP : Create scatter and profile plots in target-file directory: \n", "--- TFHandler_RMLP : TMVAOutputCV.root:/dataset/Method_RMLP/RMLP/CorrelationPlots\n", "--- Factory : Evaluate classifier: RXGB\n", "--- RXGB : Testing Classification RXGB METHOD \n", "--- RXGB : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n", 
"--- Factory : Write evaluation histograms to file\n", "--- TFHandler_RXGB : Plot event variables for RXGB\n", "--- TFHandler_RXGB : -----------------------------------------------------------\n", "--- TFHandler_RXGB : Variable Mean RMS [ Min Max ]\n", "--- TFHandler_RXGB : -----------------------------------------------------------\n", "--- TFHandler_RXGB : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n", "--- TFHandler_RXGB : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n", "--- TFHandler_RXGB : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n", "--- TFHandler_RXGB : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n", "--- TFHandler_RXGB : -----------------------------------------------------------\n", "--- TFHandler_RXGB : Create scatter and profile plots in target-file directory: \n", "--- TFHandler_RXGB : TMVAOutputCV.root:/dataset/Method_RXGB/RXGB/CorrelationPlots\n", "--- Factory : Evaluate classifier: BDT\n", "--- BDT : Dataset[dataset] : Loop over test events and fill histograms with classifier response...\n", "--- Factory : Write evaluation histograms to file\n", "--- TFHandler_BDT : Plot event variables for BDT\n", "--- TFHandler_BDT : -----------------------------------------------------------\n", "--- TFHandler_BDT : Variable Mean RMS [ Min Max ]\n", "--- TFHandler_BDT : -----------------------------------------------------------\n", "--- TFHandler_BDT : var1: 0.00077102 1.6695 [ -5.8991 4.7639 ]\n", "--- TFHandler_BDT : var2: -0.0063164 1.5765 [ -5.2454 4.8300 ]\n", "--- TFHandler_BDT : var3: -0.010870 1.7365 [ -5.3563 4.6430 ]\n", "--- TFHandler_BDT : var4: 0.14557 2.1608 [ -6.9675 5.0307 ]\n", "--- TFHandler_BDT : -----------------------------------------------------------\n", "--- TFHandler_BDT : Create scatter and profile plots in target-file directory: \n", "--- TFHandler_BDT : TMVAOutputCV.root:/dataset/Method_BDT/BDT/CorrelationPlots\n", "--- Factory : \n", "--- Factory : Evaluation results ranked by best signal efficiency and purity (area)\n", "--- Factory : 
-------------------------------------------------------------------------------------------------------------------\n", "--- Factory : DataSet MVA Signal efficiency at bkg eff.(error): | Sepa- Signifi- \n", "--- Factory : Name: Method: @B=0.01 @B=0.10 @B=0.30 ROC-integ ROCCurve| ration: cance: \n", "--- Factory : -------------------------------------------------------------------------------------------------------------------\n", "--- Factory : dataset RMLP : 0.343(06) 0.764(06) 0.954(02) 0.930 0.929 | 0.578 1.647\n", "--- Factory : dataset RXGB : 0.208(05) 0.693(06) 0.921(03) 0.901 0.902 | 0.507 1.387\n", "--- Factory : dataset BDT : 0.263(06) 0.642(06) 0.900(04) 0.894 0.894 | 0.483 1.248\n", "--- Factory : dataset C50 : 0.000(00) 0.689(06) 0.926(03) 0.892 0.898 | 0.521 1.411\n", "--- Factory : -------------------------------------------------------------------------------------------------------------------\n", "--- Factory : \n", "--- Factory : Testing efficiency compared to training efficiency (overtraining check)\n", "--- Factory : -------------------------------------------------------------------------------------------------------------------\n", "--- Factory : DataSet MVA Signal efficiency: from test sample (from training sample) \n", "--- Factory : Name: Method: @B=0.01 @B=0.10 @B=0.30 \n", "--- Factory : -------------------------------------------------------------------------------------------------------------------\n", "--- Factory : dataset RMLP : 0.343 (0.365) 0.764 (0.784) 0.954 (0.955)\n", "--- Factory : dataset RXGB : 0.208 (0.375) 0.693 (0.802) 0.921 (0.944)\n", "--- Factory : dataset BDT : 0.263 (0.225) 0.642 (0.671) 0.900 (0.902)\n", "--- Factory : dataset C50 : 0.000 (0.474) 0.689 (0.848) 0.926 (0.943)\n", "--- Factory : -------------------------------------------------------------------------------------------------------------------\n", "--- Factory : \n", "--- Dataset:dataset : Dataset[dataset] : Created tree 'TestTree' with 10000 
events\n", "--- Dataset:dataset : Dataset[dataset] : Created tree 'TrainTree' with 2000 events\n", "--- Factory : \n", "--- Factory : \u001b[1mThank you for using TMVA!\u001b[0m\n", "--- Factory : \u001b[1mFor citation information, please visit: http://tmva.sf.net/citeTMVA.html\u001b[0m\n" ] } ], "source": [ "factory.TestAllMethods();\n", "factory.EvaluateAllMethods(); " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting the ROC Curve\n", "We enable ROOT's interactive JavaScript visualisation with `%jsroot on`, then draw the ROC curves of the booked methods." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%jsroot on\n", "auto c = factory.GetROCCurve(&loader);\n", "c->Draw();" ] } ], "metadata": { "kernelspec": { "display_name": "ROOT C++", "language": "c++", "name": "root" }, "language_info": { "codemirror_mode": "text/x-c++src", "file_extension": ".C", "mimetype": "text/x-c++src", "name": "c++" } }, "nbformat": 4, "nbformat_minor": 1 }